Kalasa is a NoSQL database and provides more data structures for ease of use.

bottle-kv-storage

Kalasa



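Features

  • Embedded storage engine
  • Data can be stored encrypted
  • Custom storage encryptors can be implemented
  • Stored data stays protected even if the data files are copied
  • Custom index data structures may be supported in the future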

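Quick Start

1. Project introduction

2. Basic operations

3. Data encryption

4. Hash function

5. Index size

6. Configuration

7. Data directory

8. Future plans

9. Contribution guide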


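Introduction

First of all, Bottle is an embedded KV storage engine, not a KV database. Many people see "KV" and assume it means a database, but the two are easily confused: KV storage can hold many kinds of things and is not limited to the database domain. If a database is a car, then Bottle is the car's engine. Put simply, Bottle is a KV abstraction over the operating system's file system. With Bottle as the storage layer, you can build a database by adding data structures and an external service protocol on top of it.

[Figure: layered architecture diagram]

This project is implemented entirely on the basis of the Bitcask paper. Some of the knowledge it draws on is also close to the content of Carnegie Mellon University's CMU 15-445: Database Systems course, taught by Andy Pavlo, a leading figure in the database field. If you are interested, take a look at the course, and if you find this project useful, please give it a star. Thank you.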
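
To make the Bitcask idea concrete, here is a minimal sketch of its two core parts: an append-only log file and an in-memory key directory that points at the newest value for each key. This is not Bottle's code. The names miniCask, valuePos and openTempCask are hypothetical, the record layout is an assumption chosen for brevity, and there is no recovery, checksum or merge step.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

var errNotFound = errors.New("key not found")

// valuePos records where a value lives inside the log file.
type valuePos struct {
	offset int64
	size   int
}

// miniCask is a hypothetical Bitcask-style store: an append-only log file
// plus an in-memory key directory that points at the newest value per key.
type miniCask struct {
	file   *os.File
	keydir map[string]valuePos
	end    int64 // offset at which the next record is appended
}

// openTempCask creates a store backed by a fresh temporary file.
func openTempCask() (*miniCask, error) {
	f, err := os.CreateTemp("", "minicask-*.log")
	if err != nil {
		return nil, err
	}
	return &miniCask{file: f, keydir: make(map[string]valuePos)}, nil
}

// Put appends [keyLen][valueLen][key][value] and repoints the key directory.
func (c *miniCask) Put(key, value []byte) error {
	record := make([]byte, 8, 8+len(key)+len(value))
	binary.LittleEndian.PutUint32(record[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(record[4:8], uint32(len(value)))
	record = append(record, key...)
	record = append(record, value...)
	if _, err := c.file.WriteAt(record, c.end); err != nil {
		return err
	}
	c.keydir[string(key)] = valuePos{offset: c.end + 8 + int64(len(key)), size: len(value)}
	c.end += int64(len(record))
	return nil
}

// Get looks up the position in memory and reads the value with one ReadAt.
func (c *miniCask) Get(key []byte) ([]byte, error) {
	pos, ok := c.keydir[string(key)]
	if !ok {
		return nil, errNotFound
	}
	value := make([]byte, pos.size)
	if n, err := c.file.ReadAt(value, pos.offset); n != len(value) {
		return nil, err
	}
	return value, nil
}

// Close closes the log and deletes the temporary file used by this demo.
func (c *miniCask) Close() error {
	name := c.file.Name()
	if err := c.file.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	c, err := openTempCask()
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// The second Put does not overwrite the first record on disk. It appends
	// a new record, and the key directory then points at the new one.
	for _, v := range []string{"66.6", "77.7"} {
		if err := c.Put([]byte("foo"), []byte(v)); err != nil {
			panic(err)
		}
	}
	v, err := c.Get([]byte("foo"))
	fmt.Println(string(v), err)
}
```

Because old records are never rewritten in place, stale values accumulate in the log, which is why a Bitcask-style engine needs a periodic merge.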

View references


Installing Bottle

You only need to install the Bottle module in your project to use it:

go get -u github.com/auula/bottle

Basic API

Example code showing how to operate a Bottle instance:

package main

import (
	"fmt"
	"github.com/auula/bottle"
)

func init() {
	// Open a storage instance with the default configuration
	err := bottle.Open(bottle.DefaultOption)
	// and handle any error that may occur
	if err != nil {
		panic(err)
	}
}

// Userinfo is a test data structure
type Userinfo struct {
	Name  string
	Age   uint8
	Skill []string
}

func main() {

	// PUT Data
	bottle.Put([]byte("foo"), []byte("66.6"))

	// Converted with String(), the value is returned as a string
	fmt.Println(bottle.Get([]byte("foo")).String())

	// If the key does not exist, the default value is 0
	fmt.Println(bottle.Get([]byte("foo")).Int())

	// If the conversion fails, the result is false
	fmt.Println(bottle.Get([]byte("foo")).Bool())

	// If the conversion fails, the result is 0.0
	fmt.Println(bottle.Get([]byte("foo")).Float())

	user := Userinfo{
		Name:  "Leon Ding",
		Age:   22,
		Skill: []string{"Java", "Go", "Rust"},
	}

	var u Userinfo

	// Save the data object via Bson with a timeout of 5 seconds; the TTL is optional, depending on your needs
	bottle.Put([]byte("user"), bottle.Bson(&user), bottle.TTL(5))

	// Decode the struct with Unwrap
	bottle.Get([]byte("user")).Unwrap(&u)

	// Print the value
	fmt.Println(u)

	// Remove a key
	bottle.Remove([]byte("foo"))

	// Close the instance and handle any error that may occur
	if err := bottle.Close(); err != nil {
		fmt.Println(err)
	}
}

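Encryptor

The data encryptor works on the value of each data record, that is, field-level block encryption. It does not encrypt the whole file, because that design would cost performance, so encryption is applied per data block segment.

The following example uses the bottle.SetEncryptor(Encryptor,[]byte) function to set the data encryptor and configure a 16-byte encryption key.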

func init() {
    bottle.SetEncryptor(bottle.AES(), []byte("1234567890123456"))
}

You can also implement the data encryptor interface yourself:

// SourceData for encryption and decryption
type SourceData struct {
    Data   []byte
    Secret []byte
}

// Encryptor used for data encryption and decryption operation
type Encryptor interface {
    Encode(sd *SourceData) error
    Decode(sd *SourceData) error
}

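The following code is the implementation of the built-in AES encryptor. Implementing the bottle.Encryptor interface is all that is needed; the data source is the fields of the bottle.SourceData struct: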

// AESEncryptor Implement the Encryptor interface
type AESEncryptor struct{}

// Encode source data encode
func (AESEncryptor) Encode(sd *SourceData) error {
    sd.Data = aesEncrypt(sd.Data, sd.Secret)
    return nil
}

// Decode source data decode
func (AESEncryptor) Decode(sd *SourceData) error {
    sd.Data = aesDecrypt(sd.Data, sd.Secret)
    return nil
}

See encrypted.go for the full encryptor implementation.
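
As an illustration of a custom encryptor, the sketch below implements the same two-method interface with AES-GCM from the Go standard library. SourceData and Encryptor are redeclared locally, copied from the definitions above, so the example compiles on its own; in a real project you would use the types from the bottle package. GCMEncryptor and newGCM are hypothetical names, and this is one possible approach rather than the built-in implementation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// SourceData and Encryptor mirror the definitions shown in this README.
type SourceData struct {
	Data   []byte
	Secret []byte
}

type Encryptor interface {
	Encode(sd *SourceData) error
	Decode(sd *SourceData) error
}

// GCMEncryptor is a hypothetical custom encryptor based on AES-GCM.
type GCMEncryptor struct{}

// newGCM builds an AEAD from the secret; AES accepts 16, 24 or 32 byte keys.
func newGCM(secret []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(secret)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encode replaces sd.Data with nonce || ciphertext.
func (GCMEncryptor) Encode(sd *SourceData) error {
	gcm, err := newGCM(sd.Secret)
	if err != nil {
		return err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	// Seal appends the ciphertext to its first argument, here the nonce.
	sd.Data = gcm.Seal(nonce, nonce, sd.Data, nil)
	return nil
}

// Decode splits off the nonce, then authenticates and decrypts the rest.
func (GCMEncryptor) Decode(sd *SourceData) error {
	gcm, err := newGCM(sd.Secret)
	if err != nil {
		return err
	}
	n := gcm.NonceSize()
	if len(sd.Data) < n {
		return errors.New("ciphertext too short")
	}
	plain, err := gcm.Open(nil, sd.Data[:n], sd.Data[n:], nil)
	if err != nil {
		return err
	}
	sd.Data = plain
	return nil
}

func main() {
	var enc Encryptor = GCMEncryptor{}
	sd := &SourceData{Data: []byte("hello bottle"), Secret: []byte("1234567890123456")}
	if err := enc.Encode(sd); err != nil {
		panic(err)
	}
	fmt.Printf("encrypted: %d bytes\n", len(sd.Data))
	if err := enc.Decode(sd); err != nil {
		panic(err)
	}
	fmt.Println("decrypted:", string(sd.Data))
}
```

Unlike the built-in encryptor shown above, this version returns errors from Encode and Decode instead of always returning nil, so a wrong key or corrupted record is reported to the caller.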

Hash function

If you need a custom hash function, implement the bottle.Hashed interface:

type Hashed interface {
    Sum64([]byte) uint64
}

Then register it with the built-in bottle.SetHashFunc(hash Hashed) to complete your hash function configuration.
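
A minimal sketch of such an implementation, using FNV-1a from the Go standard library. The Hashed interface is redeclared locally, copied from the definition above, so the example is self-contained; FNVHasher is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hashed mirrors the interface shown in this README.
type Hashed interface {
	Sum64([]byte) uint64
}

// FNVHasher is a hypothetical Hashed implementation based on 64-bit FNV-1a.
type FNVHasher struct{}

// Sum64 hashes the key; the same key always yields the same value.
func (FNVHasher) Sum64(key []byte) uint64 {
	h := fnv.New64a()
	h.Write(key) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	var h Hashed = FNVHasher{}
	fmt.Println(h.Sum64([]byte("foo")))
}
```

Assuming the API described above, a value such as FNVHasher{} would then be passed to bottle.SetHashFunc during initialization.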

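Index size

The preset index size largely determines how fast your program stores and reads data. If you can estimate the index size the program will need at run time and configure it at initialization, you reduce the performance cost of data migration and index growth while the program is running.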

func init() {
    // Set the index size
    bottle.SetIndexSize(1000)
}

配置信息

You do not have to use the default configuration. You can initialize the storage engine with the built-in bottle.Option struct. A configuration example:

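func init() {
    // Custom configuration
    option := bottle.Option{
        // Working directory
        Directory:       "./data",
        // Enable encryption
        Enable:          true,
        // Custom secret key; the built-in key can be used
        Secret:          bottle.Secret,
        // Maximum data file size; the unit is KB
        DataFileMaxSize: 1048576,
    }
    // Open the storage instance with the custom configuration
    // and handle any error that may occur
    if err := bottle.Open(option); err != nil {
        panic(err)
    }
}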

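You can also start Bottle by loading a configuration file with the built-in bottle.Load(path string) function. The configuration file format is YAML, and the configurable options are: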

# Bottle config options
Enable: TRUE
Secret: "1234567890123456"
Directory: "./testdata"
DataFileMaxSize: 536870912

Note that the key for the built-in encryptor must be 16 bytes long. If you implement your own encryptor, set it with bottle.SetEncryptor(Encryptor,[]byte); in that case the key length is not restricted.

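Data directory

Because Bottle is designed as a single-process program, each storage instance corresponds to one data directory. data is the log-structured merge data directory, and index holds the versions of the index data.

In the current version, log-structured data is merged each time the storage engine starts. By default, a merge is triggered when the total size of all data files in the data folder exceeds 1 GB, and data that is no longer needed is discarded after the merge.

If the threshold for merging dirty data has not been reached, data files are archived at the size configured at startup. Each data file has a version number and is set to read-only. The working directory structure of the process is as follows: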

./testdata
├── data
│   └── 1.data
└── index
    ├── 1646378326.index
    └── 1646378328.index

2 directories, 3 files

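Once the storage engine is running, all folders and files under this directory may only be operated on by this process, which keeps the data safe.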
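
The merge step described in this section can be sketched as a pure function over log records: keep only the newest record for each key and drop keys whose newest record is a deletion. This is a simplified illustration, not Bottle's merge code; the record type, its Deleted flag and the compact function are hypothetical.

```go
package main

import "fmt"

// record is a hypothetical log entry; Deleted marks a removal.
type record struct {
	Key     string
	Value   string
	Deleted bool
}

// compact keeps only the newest live record per key, in log order.
func compact(log []record) []record {
	// First pass: remember the index of the newest record for each key.
	latest := make(map[string]int, len(log))
	for i, r := range log {
		latest[r.Key] = i
	}
	// Second pass: copy records that are both newest and not deleted.
	out := make([]record, 0, len(latest))
	for i, r := range log {
		if latest[r.Key] == i && !r.Deleted {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	log := []record{
		{Key: "a", Value: "1"},
		{Key: "b", Value: "1"},
		{Key: "a", Value: "2"},   // supersedes the first "a"
		{Key: "b", Deleted: true}, // removes "b"
		{Key: "c", Value: "3"},
	}
	for _, r := range compact(log) {
		fmt.Println(r.Key, r.Value)
	}
}
```

In this example five records shrink to two, "a" with value 2 and "c" with value 3, which is the space saving a merge is meant to achieve.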

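Future plans

  • Bottle does not currently support multiple data storage partitions. A later version will introduce a Bucket concept so that specified data can be stored in a specified partition, which reduces the granularity of index locking under concurrency.
  • Zero-copy techniques will be introduced later. File operations currently depend heavily on the operating system, and files must be synced to guarantee data consistency.
  • Dirty data could be merged and compacted while the engine is running, with the garbage collection worker thread notified through a semaphore.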
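
To illustrate why partitioning reduces lock granularity, the sketch below splits an in-memory index into shards, each guarded by its own sync.RWMutex, so writers to different shards do not block each other. It is only one possible approach and says nothing about how the planned Bucket feature will work: it picks the shard by hashing the key, whereas the plan above lets the caller choose the partition. All names (shardedIndex, shardCount, fillConcurrently) are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16

// shard is one partition of the index with its own lock.
type shard struct {
	mu    sync.RWMutex
	items map[string][]byte
}

type shardedIndex struct {
	shards [shardCount]*shard
}

func newShardedIndex() *shardedIndex {
	idx := &shardedIndex{}
	for i := range idx.shards {
		idx.shards[i] = &shard{items: make(map[string][]byte)}
	}
	return idx
}

// shardFor picks a shard by hashing the key.
func (idx *shardedIndex) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return idx.shards[h.Sum32()%shardCount]
}

// Put locks only the shard that owns the key.
func (idx *shardedIndex) Put(key string, value []byte) {
	s := idx.shardFor(key)
	s.mu.Lock()
	s.items[key] = value
	s.mu.Unlock()
}

func (idx *shardedIndex) Get(key string) ([]byte, bool) {
	s := idx.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[key]
	return v, ok
}

// Len sums the sizes of all shards.
func (idx *shardedIndex) Len() int {
	total := 0
	for _, s := range idx.shards {
		s.mu.RLock()
		total += len(s.items)
		s.mu.RUnlock()
	}
	return total
}

// fillConcurrently writes workers*perWorker distinct keys from separate goroutines.
func fillConcurrently(idx *shardedIndex, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				key := fmt.Sprintf("key-%d", w*perWorker+j)
				idx.Put(key, []byte(key))
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	idx := newShardedIndex()
	fillConcurrently(idx, 8, 1000)
	v, ok := idx.Get("key-42")
	fmt.Println(idx.Len(), string(v), ok)
}
```

With a single global lock every Put would serialize; here contention is limited to goroutines that happen to hit the same shard.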


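Other information

If you find a bug, feel free to open an issue or submit a pull request; I will reply as soon as I see the message. Gophers are also welcome to share their opinions or contribute code, and everyone is very welcome to join the group to discuss storage-related technology.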

WeChat

Acknowledgements and Contributions

Comments
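  • Crash under concurrent reads and writes

    1. The code below crashes on concurrent reads.
    2. After an abnormal exit the data is dirty and unusable; after the code below crashes, the files produced by the previous run have to be deleted.
    3. Load should panic when it fails, and every err should be handled.
    4. Where a relative path is used, as with Load, it should be converted to an absolute path internally, or at least the prefix of the relative path should be saved.

    The content of config.yaml is as follows: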

    # Bottle config options
    Enable: TRUE
    Secret: "1234567890123456"
    Directory: "./testdata"
    DataFileMaxSize: 536870912
    
    package main

    import (
    	"fmt"
    	"strconv"
    	"sync"

    	"github.com/auula/bottle"
    )

    func init() {
    	//bottle.Open(bottle.DefaultOption)
    	//
    	//option := bottle.Option{
    	//	Directory:       "./data",
    	//	Enable:          true,
    	//	Secret:          bottle.Secret,
    	//	DataFileMaxSize: 1048576,
    	//}

    	if err := bottle.Load("/home/fxy/go/src/TestTest/testBottle/config.yaml"); err != nil {
    		fmt.Println(err)
    	}
    	bottle.SetIndexSize(1000000)
    }

    func main() {
    	wg := sync.WaitGroup{}

    	// Write 100 x 1000 keys from 100 goroutines.
    	for i := 0; i < 100; i++ {
    		wg.Add(1)
    		go func(m int) {
    			defer wg.Done()
    			for j := 0; j < 1000; j++ {
    				k := strconv.Itoa(m*1000 + j)
    				v := strconv.Itoa(m*1000 + j)
    				if err := bottle.Put([]byte(k), []byte(v)); err != nil {
    					fmt.Println(err, k, v)
    				}
    			}
    		}(i)
    	}
    	wg.Wait()

    	// Read the same keys back concurrently and compare the values.
    	for i := 0; i < 100; i++ {
    		wg.Add(1)
    		go func(m int) {
    			defer wg.Done()
    			for j := 0; j < 1000; j++ {
    				k := strconv.Itoa(m*1000 + j)
    				v := strconv.Itoa(m*1000 + j)
    				d := bottle.Get([]byte(k))
    				if d.Err != nil {
    					fmt.Println("Get", d.Err, k, v)
    				} else if string(d.Value) != v {
    					fmt.Println("GGGGG", string(d.Value), v)
    				}
    			}
    		}(i)
    	}
    	wg.Wait()

    	if err := bottle.Close(); err != nil {
    		fmt.Println(err)
    	}
    }
    
    
  • Repeated Put calls cause a panic

    
    package main
    
    import (
    	"github.com/auula/bottle"
    	"log"
    )
    
    
    func main() {
    	err := bottle.Open(bottle.DefaultOption)
    	if err != nil {
    		log.Fatalln(err)
    	}
    
    	err = bottle.Put([]byte("username1"), []byte("yikela123"))
    	if err != nil {
    		log.Fatalln(err)
    	}
    
    }
    
    
    

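    On the first run, before the data directory exists, everything works. Once the first run has created the data directory, running the program again panics: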

    
     go run test27.go
    panic: runtime error: index out of range [-1]
    
    goroutine 1 [running]:
    github.com/auula/bottle.findLatestIndexFile(0x1400014fd48, 0x100eaea28, 0x0)
            /Users/yiwenshuo/WorkSpace/go/pkg/mod/github.com/auula/[email protected]/bottle.go:561 +0x448
    github.com/auula/bottle.readIndexItem(0x0, 0x0)
            /Users/yiwenshuo/WorkSpace/go/pkg/mod/github.com/auula/[email protected]/bottle.go:566 +0x30
    github.com/auula/bottle.buildIndex(0x1400000e070, 0x0)
            /Users/yiwenshuo/WorkSpace/go/pkg/mod/github.com/auula/[email protected]/bottle.go:512 +0x20
    github.com/auula/bottle.recoverData(0x1400001c978, 0x7)
            /Users/yiwenshuo/WorkSpace/go/pkg/mod/github.com/auula/[email protected]/bottle.go:410 +0xc0
    github.com/auula/bottle.Open(0x1400001c978, 0x7, 0x2800, 0x0, 0x0, 0x0, 0x1400004a768, 0x100e338b4)
            /Users/yiwenshuo/WorkSpace/go/pkg/mod/github.com/auula/[email protected]/bottle.go:132 +0xb4
    main.main()
            /Users/yiwenshuo/WorkSpace/go/src/github.com/zeusYi/myGolibs/临时测试/code/test27.go:20 +0x44
    exit status 2
    
    
    
  • Stat() is too time-consuming

    Time spent in Stat

    fileInfo, _ := active.Stat()
    

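    Profiling with pprof shows that this system call takes up a lot of time. Has the author considered tracking the file size in a counter after the file is opened, to optimize this code?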
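
A minimal sketch of the counter idea raised in this comment: call Stat once when the file is opened, then add the number of bytes written on every write, so the size check for file rotation needs no further system calls. This is not Bottle's code; sizedFile and newTempSized are hypothetical names, and the sketch assumes a single writer.

```go
package main

import (
	"fmt"
	"os"
)

// sizedFile wraps a file and tracks its size without repeated Stat calls.
type sizedFile struct {
	f    *os.File
	size int64
}

// newTempSized opens a temporary file and reads its size once.
func newTempSized() (*sizedFile, error) {
	f, err := os.CreateTemp("", "active-*.data")
	if err != nil {
		return nil, err
	}
	fi, err := f.Stat() // the only Stat call: establishes the starting size
	if err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return &sizedFile{f: f, size: fi.Size()}, nil
}

// Write appends p and counts the bytes actually written, even on a short write.
func (s *sizedFile) Write(p []byte) (int, error) {
	n, err := s.f.Write(p)
	s.size += int64(n)
	return n, err
}

// Size returns the tracked size; it performs no system call.
func (s *sizedFile) Size() int64 { return s.size }

// Close closes the file and deletes the temporary file used by this demo.
func (s *sizedFile) Close() error {
	name := s.f.Name()
	if err := s.f.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	s, err := newTempSized()
	if err != nil {
		panic(err)
	}
	defer s.Close()

	const maxSize = 64 // hypothetical rotation threshold in bytes
	for i := 0; i < 10; i++ {
		if _, err := s.Write([]byte("0123456789")); err != nil {
			panic(err)
		}
		if s.Size() >= maxSize {
			fmt.Println("would rotate the data file at", s.Size(), "bytes")
			break
		}
	}
}
```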

    pprof statistics

    [Image: pprof profile]

    Environment

    OS: Windows 10; CPU: i5-1135G7 @ 2.40GHz
    Memory: 16 GB

    Test code

    package main
    
    import (
    	"fmt"
    	"log"
    	"os"
    	"runtime/pprof"
    
    	"github.com/auula/bottle"
    )
    
    func main() {
    	f, err := os.Create("./cpu.pprof")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer f.Close()
    	pprof.StartCPUProfile(f)
    	defer pprof.StopCPUProfile()
    
    	if err := bottle.Load("../config.yaml"); err != nil {
    		panic(err)
    	}
    	defer bottle.Close()
    
    	for i := 0; i < 1e7; i++ {
    		err = bottle.Put([]byte(fmt.Sprintf("key%d", 1)), []byte{1, 23})
    		if err != nil {
    			log.Panic(err)
    
    		}
    	}
    
    }
    
    
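  • A question about dataFileVersion

    Hello, I have a question. Does dataFileVersion always increase? Suppose that before migrate the data directory contains 1.data, 2.data, 3.data and 4.data, and these four data files can be consolidated into three. After migrate, the four old data files are deleted and three new ones are generated, so the data directory then contains 5.data, 6.data and 7.data. Is that correct? I could not install the module locally, so I was unable to run it and observe.

    If so, is there a way to end up with 1.data, 2.data and 3.data in the data directory after migrate, instead of 5.data, 6.data and 7.data? If the three new data files are renamed after migrate, the entries in the index become invalid. Is there a good way to handle this?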
