A distributed key value store in under 1000 lines. Used in production at comma.ai

minikeyvalue

Fed up with the complexity of distributed filesystems?

minikeyvalue is a ~1000 line distributed key value store, with support for replication, multiple machines, and multiple drives per machine. Optimized for values between 1MB and 1GB. Inspired by SeaweedFS, but simple. Should scale to billions of files and petabytes of data. Used in production at comma.ai.

A key part of minikeyvalue's simplicity is using stock nginx as the volume server.

Even if this code is crap, the on-disk format is super simple! We rely on a filesystem for blob storage and a LevelDB for indexing. The index can be reconstructed with rebuild. Volumes can be added or removed with rebalance.
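To make the on-disk layout concrete, here is a minimal Go sketch of how a key could map to a file path on a volume server. It assumes the scheme suggested by a path that appears in one of the issue reports below, /sv07/60/08/L3dlaGF2ZQ==. That scheme uses two directory levels taken from the first two bytes of an MD5 of the key. The file name is the key itself in standard base64 (L3dlaGF2ZQ== decodes to /wehave). The name key2path is illustrative, and the sv07 subvolume prefix is left out here.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
)

// key2path maps a key to its file path on a volume server (hypothetical name).
// Assumed layout: /<md5[0]>/<md5[1]>/<base64(key)>, i.e. two levels of
// 256-way directory fan-out so no single directory grows too large.
func key2path(key string) string {
	mkey := md5.Sum([]byte(key))
	// standard base64 turns arbitrary key bytes into a printable file name
	b64key := base64.StdEncoding.EncodeToString([]byte(key))
	return fmt.Sprintf("/%02x/%02x/%s", mkey[0], mkey[1], b64key)
}

func main() {
	// prints /<hex>/<hex>/L3dlaGF2ZQ==
	fmt.Println(key2path("/wehave"))
}
```

Because the file name is just the encoded key, the files on disk presumably carry everything needed to rebuild the index without the LevelDB.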

API

  • GET /key
    • 302 redirect to nginx volume server.
  • PUT /key
    • Blocks. 201 = written, anything else = probably not written.
  • DELETE /key
    • Blocks. 204 = deleted, anything else = probably not deleted.

It also now supports a subset of S3 requests, so some S3 libraries will be somewhat compatible.

Start Volume Servers (default port 3001)

# this is just nginx under the hood
PORT=3001 ./volume /tmp/volume1/ &
PORT=3002 ./volume /tmp/volume2/ &
PORT=3003 ./volume /tmp/volume3/ &

Start Master Server (default port 3000)

./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdb/ server

Usage

# put "bigswag" in key "wehave" (will 403 if it already exists)
curl -v -L -X PUT -d bigswag localhost:3000/wehave

# get key "wehave" (should be "bigswag")
curl -v -L localhost:3000/wehave

# delete key "wehave"
curl -v -L -X DELETE localhost:3000/wehave

# unlink key "wehave", this is a virtual delete
curl -v -L -X UNLINK localhost:3000/wehave

# list keys starting with "we"
curl -v -L 'localhost:3000/we?list'

# list unlinked keys ripe for DELETE
curl -v -L 'localhost:3000/?unlinked'

# put file in key "file.txt"
curl -v -L -X PUT -T /path/to/local/file.txt localhost:3000/file.txt

# get file in key "file.txt"
curl -v -L -o /path/to/local/file.txt localhost:3000/file.txt

./mkv Usage

Usage: ./mkv <server, rebuild, rebalance>

  -db string
        Path to leveldb
  -fallback string
        Fallback server for missing keys
  -port int
        Port for the server to listen on (default 3000)
  -protect
        Force UNLINK before DELETE
  -replicas int
        Amount of replicas to make of the data (default 3)
  -subvolumes int
        Amount of subvolumes, disks per machine (default 10)
  -volumes string
        Volumes to use for storage, comma separated

Rebalancing (to change the number of volume servers)

# must shut down master first, since LevelDB can only be accessed by one process
./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdb/ rebalance

Rebuilding (to regenerate the LevelDB)

./mkv -volumes localhost:3001,localhost:3002,localhost:3003 -db /tmp/indexdbalt/ rebuild

Performance

# Fetching non-existent key: 116338 req/sec
wrk -t2 -c100 -d10s http://localhost:3000/key

# go run thrasher.go
starting thrasher
10000 write/read/delete in 2.620922675s
thats 3815.40/sec
Owner
George Hotz
We will win self driving cars.
Comments
  • Feature/basic auth

    I have added basic auth.

    Keep in mind this is my first attempt at writing Go. I'm sure there is an easier way to add this functionality. I tried to make it as clean as possible with zero Go experience. That being said, I don't expect the pull request to be accepted right away.

    I will accept every suggestion.

    For reading and validating basic auth files I used https://github.com/tg123/go-htpasswd

    I think https://github.com/abriosi/minikeyvalue/blob/feature/basicAuth/volume#L9-L123 can be improved but my bash scripting sucks.

    It passes the tests. I think this is the right time to ask for feedback.

  • Simplify error creation with `fmt.Errorf`

    Description

    Hi :wave: I ran the DeepSource static analyzer on the forked copy of this repo and found some interesting code quality issues. This PR fixes a few of them.

    Summary of fixes

    • Simplify error creation with fmt.Errorf
    • Added .deepsource.toml file

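Here is the pattern this PR removes, next to its replacement. The error message is illustrative, not taken from mkv. Both functions produce the same error text, but fmt.Errorf does it in one call and also supports %w for wrapping.

```go
package main

import (
	"errors"
	"fmt"
)

// Before: format the message, then wrap it in an error.
func oldStyle(key string, code int) error {
	return errors.New(fmt.Sprintf("replica write failed for %s: status %d", key, code))
}

// After: fmt.Errorf formats and creates the error in one step.
func newStyle(key string, code int) error {
	return fmt.Errorf("replica write failed for %s: status %d", key, code)
}

func main() {
	fmt.Println(oldStyle("/wehave", 500))
	fmt.Println(newStyle("/wehave", 500))
}
```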
  • refactor: move from io/ioutil to io and os packages

    This PR introduces two small changes:

    1. Use actions/setup-go instead of installing the golang package from apt. This speeds up the workflow and allows us to update the Go version easily (packages provided by Ubuntu are often several releases behind the latest version).

    2. The io/ioutil package has been deprecated in Go 1.16 (See https://golang.org/doc/go1.16#ioutil). This PR replaces the existing io/ioutil functions with their equivalents in the io and os packages.
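For reference, here is what the Go 1.16 replacements look like. The comments list the one-to-one mappings, and roundTrip is a hypothetical function that exercises a few of them on a temporary file.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// Go 1.16 replacements for deprecated io/ioutil helpers:
//   ioutil.ReadAll   -> io.ReadAll
//   ioutil.ReadFile  -> os.ReadFile
//   ioutil.WriteFile -> os.WriteFile
//   ioutil.TempDir   -> os.MkdirTemp
//   ioutil.TempFile  -> os.CreateTemp
//   ioutil.Discard   -> io.Discard
//   ioutil.NopCloser -> io.NopCloser
func roundTrip(value string) (string, error) {
	dir, err := os.MkdirTemp("", "mkv") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "value")
	if err := os.WriteFile(p, []byte(value), 0o644); err != nil { // was ioutil.WriteFile
		return "", err
	}
	f, err := os.Open(p)
	if err != nil {
		return "", err
	}
	defer f.Close()
	b, err := io.ReadAll(f) // was ioutil.ReadAll
	return string(b), err
}

func main() {
	v, err := roundTrip("bigswag")
	fmt.Println(v, err) // bigswag <nil>
}
```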

  • MD5

    There is no requirement to use a cryptographic hash, right? Then let's use a non-cryptographic hash like xxhash or murmur3, since they are much faster.

    I am happy to send a PR in case we agree :-)

  • Updated readme with docker-compose Instructions

    In order to test out minikeyvalue for my use case, I decided to try it out using docker-compose. I thought that the following addition could assist those in a similar position to get up and running quickly.

    Instead of adding a new docker-compose file, I thought just editing the README would be sufficient for such a change.

  • Just say Hi from SeaweedFS

    Hi, George,

    You are one of the guys that I respect. I was watching the youtube video https://www.youtube.com/watch?v=iwcYp-XT7UI where Lex Fridman interviewed you, and you mentioned SeaweedFS for 0.5 seconds. :)

    I work on SeaweedFS. And I wanted to learn your approach to file storage. In another coding session, you mentioned SeaweedFS has some bugs. If you still remember the exact bugs, please let me know.

    Thanks and keep up the nice work!

    Chris

  • Thank you. A quick question on adding a security layer

    Thank you very much for putting this repository together. Reading these lines of code has taught me a lot. Simple, scalable and structured.

    How do you add a security layer to this filesystem in case you need to access it from other services that are not on the same network?

    1. Do you create a VPN? (Wouldn't this bottleneck the distributed nature of PUTs and GETs, since all traffic would have to be routed through the VPN server?)
    2. A reverse proxy? (Same problem as 1.)
    3. Do you add authentication, such as Basic Authentication together with HTTPS?
    4. Is there a simpler solution I'm missing?

  • Question about stored file name

    It's not clear to me why the stored file name isn't set to the requested file name, like this:

    fmt.Sprintf("/%02x/%02x/%s", mkey[0], mkey[1], key)

    That way, downloaded files would keep their original names by default.

  • Please do Gofmt to *.go files in the project

    Please run gofmt on the project's *.go files. Some contributors use GoLand (with File Watchers + auto gofmt) or VS Code (whose Go extension runs gofmt when a file is saved), so their editors reformat the code automatically, and that can cause merge conflicts when the existing code was formatted by hand.

  • Fallback HEAD/GET requests faster when volume server is down

    When a node goes down, HEAD requests are used to randomly find a volume server that is up, and it seems to take about 5 seconds to time out by default. This change makes the timeout configurable.

    This was a lot easier on newer versions of Go, so I went ahead and updated things to Ubuntu 20.04.

  • allow 404 response from volume server when deleting

    When a PUT request fails because a volume server is down, a record is left in the database (in a soft-delete state). A subsequent DELETE request then fails, because it issues a request to the volume server that was down, which now returns a 404.

    It seems that if there is no file to delete (note that the node does have to be up to return a 404), the delete should succeed and clean up the record in the database.

  • optimize WriteToReplicas for better performance

    Creates a buffered reader for the value, computes the value's hash before writing to the replicas (if needed), and writes to the replicas in parallel using goroutines, with channels to wait for all of them to complete.

  • Data integrity feature?

    Just a suggestion for implementing (tell me if it doesn't make sense for the project):

    • File integrity: Append the hash (SHA-256) of the previous index value of data to the newest index value. This prevents tampering with file contents, because every machine can check that the appended block hash of some index is equal to the block hash of the (index - 1) contents.

    Further clarification with this image: [image]

    Keep in mind that h_0 is the hash of (newest data || h_1), h_1 is the hash of (next data || h_2), and so on. This nested check ensures file integrity.

    Mutability requires a re-computation of the hashes, but can only be done with a key.

    Feedback? Will this fit in the 1000-line requirement?

  • Parallelize writes to replicas

    • Writes are done sequentially; I don't see a reason why they should be;
    • Simplify sync.WaitGroup usage: no need to bump the counter on each task, only per goroutine;
    • Panic on failure to bind to a port instead of silently exiting;
  • do not write multi-part upload data to disk

    Writing to /tmp will wear out your OS drive pretty fast when you are pumping hundreds of terabytes into minikeyvalue. Since RAM is good enough for non-multipart uploads, it should be fine for multipart uploads, too. Maybe we should suggest using a RAM disk, and expire partial uploads whose final PUT never arrives within some time period?

  • replica 0 write failed: http://localhost:3001/sv07/60/08/L3dlaGF2ZQ==

    Hi, I ran this: curl -v -L -X PUT -d bigswag lo:3000/wehave, and the server printed: replica 0 write failed: http://localhost:3001/sv07/60/08/L3dlaGF2ZQ==

    Then I ran curl -v -L localhost:3000/bigswag or curl -v -L localhost:3000/wehave, and the result is:

    * About to connect() to localhost port 3000 (#0)
    *   Trying 127.0.0.1...
    * Connected to localhost (127.0.0.1) port 3000 (#0)
    > GET /wehave HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: localhost:3000
    > Accept: */*
    >
    < HTTP/1.1 404 Not Found
    < Content-Length: 0
    < Date: Sun, 31 Jan 2021 08:24:09 GMT
    <
    * Connection #0 to host localhost left intact
    

    I can't get the right value.

    Could you tell me what is wrong, please?

SeaweedFS a fast distributed storage system for blobs, objects, files, and data lake, for billions of files

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Jan 8, 2023
Kitten is a distributed file system optimized for small file storage, inspired by Facebook's Haystack.

Aug 18, 2022
A distributed key value store in under 1000 lines. Used in production at comma.ai

minikeyvalue Fed up with the complexity of distributed filesystems? minikeyvalue is a ~1000 line distributed key value store, with support for replica

Jan 9, 2023
Golang-key-value-store - Key Value Store API Service with Go DDD Architecture

This document specifies the tools used in the Key-Value store and reorganizes how to use them. In this created service, In-Memory Key-Value Service was created and how to use the API is specified in the HTML file in the folder named "doc"

Jul 31, 2022
Tapestry is an underlying distributed object location and retrieval system (DOLR) which can be used to store and locate objects. This distributed system provides an interface for storing and retrieving key-value pairs.

Tapestry This project implements Tapestry, an underlying distributed object location and retrieval system (DOLR) which can be used to store and locate

Mar 16, 2022
Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Olric Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service. With

Jan 4, 2023
A minimalist Go PDF writer in 1982 lines. Draws text, images and shapes. Helps understand the PDF format. Used in production for reports.

one-file-pdf - A minimalist PDF generator in <2K lines and 1 file The main idea behind this project was: "How small can I make a PDF generator for it

Dec 11, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

Dec 28, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The main branch may be in an unstable or even broken state during development. For stable versions, see releases. etcd is a distributed rel

Dec 30, 2022
Zero - If Google Drive says that 1 is under copyright, 0 must be under copyleft

zero Zero under copyleft license Google Drive's copyright detector says that fil

May 16, 2022
Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store.

Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.

Jan 7, 2023
A project that provides an in-memory key-value store as a REST API. Also, it's containerized and can be used as a microservice.

Easy to Use In-Memory Key-Value Store A project that provides an in-memory key-value store as a REST API. Also, it's containerized and can be used as

Mar 6, 2022
A distributed key-value store. On Disk. Able to grow or shrink without service interruption.

Vasto A distributed high-performance key-value store. On Disk. Eventual consistent. HA. Able to grow or shrink without service interruption. Vasto sca

Jan 6, 2023
GhostDB is a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale.

GhostDB is designed to speed up dynamic database or API driven websites by storing data in RAM in order to reduce the number of times an external data source such as a database or API must be read. GhostDB provides a very large hash table that is distributed across multiple machines and stores large numbers of key-value pairs within the hash table.

Jan 6, 2023
Distributed cache and in-memory key/value data store.

Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Dec 30, 2022
Distributed key-value store

Keva Distributed key-value store General Demo Start the server docker-compose up --build Insert data curl -XPOST http://localhost:5555/storage/test1

Nov 15, 2021
A simple distributed key-value store by using hashicorp/raft

raftkv This repository holds a simple distributed key-value store by using hashicorp/raft. raftkv provides gRPC and HTTP APIs. Please take a look API

Nov 30, 2022
A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments, comma separated string with urls and an interval.

uptime A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments: -interva

Dec 15, 2022