Badger - Fast Key-Value DB in Go

BadgerDB

This is a fork of dgraph-io/badger, maintained by the Outcaste team.

BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Use the outcaste-io/issues repository to file issues for Badger.

Project Status [March 24, 2020]

Badger is stable and is being used to serve data sets worth hundreds of terabytes. Badger supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for 8 hours with the --race flag to verify that transactional guarantees are maintained. Badger has also been tested against filesystem-level anomalies to ensure persistence and consistency. Badger is used by a number of projects, including Dgraph, Jaeger Tracing, UsenetExpress, and many more.

The list of projects using Badger can be found here.

Badger v1.0 was released in Nov 2017, and the latest version that is data-compatible with v1.0 is v1.6.0.

Badger v2.0 was released in Nov 2019 with a new storage format that is not compatible with any v1.x release. Badger v2.0 supports compression and encryption, and uses a cache to speed up lookups.

The Changelog is kept fairly up-to-date.

For more details on our version naming scheme, please read Choosing a version.


Getting Started

Installing

To start using Badger, install Go 1.12 or above. Badger v2 and later require Go modules. Run the following command to retrieve the library.

$ go get github.com/outcaste-io/badger/v3


Installing Badger Command Line Tool

Download and extract the latest Badger DB release from https://github.com/outcaste-io/badger/releases and then run the following commands.

$ cd badger-<version>/badger
$ go install

This will install the badger command line utility into your $GOBIN path.

Choosing a version

BadgerDB is unusual in that the most important changes we can make to it are not to its API but to how data is stored on disk.

This is why we follow a version naming scheme that differs from Semantic Versioning.

  • New major versions are released when the data format on disk changes in an incompatible way.
  • New minor versions are released whenever the API changes but data compatibility is maintained. Note that, unlike in Semantic Versioning, these API changes may be backward-incompatible.
  • New patch versions are released when there are no changes to the data format or the API.

Following these rules:

  • v1.5.0 and v1.6.0 can be used on top of the same files without any concerns, as their major version is the same, therefore the data format on disk is compatible.
  • v1.6.0 and v2.0.0 are data incompatible as their major version implies, so files created with v1.6.0 will need to be converted into the new format before they can be used by v2.0.0.
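
The rule above can be sketched as a small check: two versions can share the same files when their major versions match. This is only an illustration of the naming scheme. The helper names majorVersion and dataCompatible are hypothetical and are not part of Badger.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the major component from a version string such as
// "v1.6.0". Hypothetical helper for illustration; not part of Badger.
func majorVersion(v string) (int, error) {
	trimmed := strings.TrimPrefix(v, "v")
	parts := strings.SplitN(trimmed, ".", 2)
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, fmt.Errorf("invalid version %q: %w", v, err)
	}
	return major, nil
}

// dataCompatible reports whether two versions share an on-disk format under
// the scheme described above: same major version, same data format.
func dataCompatible(a, b string) (bool, error) {
	ma, err := majorVersion(a)
	if err != nil {
		return false, err
	}
	mb, err := majorVersion(b)
	if err != nil {
		return false, err
	}
	return ma == mb, nil
}

func main() {
	pairs := [][2]string{{"v1.5.0", "v1.6.0"}, {"v1.6.0", "v2.0.0"}}
	for _, p := range pairs {
		ok, err := dataCompatible(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s and %s data compatible: %v\n", p[0], p[1], ok)
	}
}
```

Note that this check says nothing about API compatibility, which under this scheme can break between minor versions.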

For a longer explanation of the reasons behind using a new version naming scheme, you can read VERSIONING.md.

Badger Documentation

Badger Documentation is available at https://dgraph.io/docs/badger

Resources

Blog Posts

  1. Introducing Badger: A fast key-value store written natively in Go
  2. Make Badger crash resilient with ALICE
  3. Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
  4. Concurrent ACID Transactions in Badger

Design

Badger was written with these design goals in mind:

  • Write a key-value database in pure Go.
  • Use the latest research to build the fastest KV database for data sets spanning terabytes.
  • Optimize for SSDs.

Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.
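
To make the key-value separation idea concrete, here is a toy sketch under loud assumptions: a Go map stands in for the LSM tree, and a byte slice stands in for the append-only value log. The types and method names are hypothetical, and this is not how Badger itself is implemented.

```go
package main

import "fmt"

// valuePointer records where a value lives inside the value log.
type valuePointer struct {
	offset int
	length int
}

// store is a toy illustration of key-value separation. The map stands in for
// the LSM tree (sorted and on disk in a real system), and the byte slice
// stands in for the append-only value log.
type store struct {
	index map[string]valuePointer
	vlog  []byte
}

func newStore() *store {
	return &store{index: make(map[string]valuePointer)}
}

// set appends the value to the log and keeps only a small pointer in the index.
func (s *store) set(key string, value []byte) {
	ptr := valuePointer{offset: len(s.vlog), length: len(value)}
	s.vlog = append(s.vlog, value...)
	s.index[key] = ptr
}

// get looks up the pointer in the index, then copies the value out of the log.
func (s *store) get(key string) ([]byte, bool) {
	ptr, ok := s.index[key]
	if !ok {
		return nil, false
	}
	out := make([]byte, ptr.length)
	copy(out, s.vlog[ptr.offset:ptr.offset+ptr.length])
	return out, true
}

func main() {
	s := newStore()
	s.set("answer", []byte("forty-two"))
	s.set("answer", []byte("42")) // the old value stays in the log until it is garbage collected
	v, _ := s.get("answer")
	fmt.Printf("value=%s, log bytes=%d\n", v, len(s.vlog))
}
```

Because the index holds only small pointers, rewriting the index (as LSM compaction does) moves far less data than rewriting keys together with their values.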

Comparisons

| Feature | Badger | RocksDB | BoltDB |
| --- | --- | --- | --- |
| Design | LSM tree with value log | LSM tree only | B+ tree |
| High Read throughput | Yes | No | Yes |
| High Write throughput | Yes | Yes | No |
| Designed for SSDs | Yes (with latest research) [1] | Not specifically [2] | No |
| Embeddable | Yes | Yes | Yes |
| Sorted KV access | Yes | Yes | Yes |
| Pure Go (no Cgo) | Yes | No | Yes |
| Transactions | Yes, ACID, concurrent with SSI [3] | Yes (but non-ACID) | Yes, ACID |
| Snapshots | Yes | Yes | Yes |
| TTL support | Yes | Yes | No |
| 3D access (key-value-version) | Yes [4] | No | No |

1 The WiscKey paper (on which Badger is based) saw big wins from separating values from keys, significantly reducing write amplification compared to a typical LSM tree.

2 RocksDB is an SSD-optimized version of LevelDB, which was designed specifically for rotating disks. As such, RocksDB's design isn't aimed at SSDs.

3 SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger.

4 Badger provides direct access to value versions via its Iterator API. Users can also specify how many versions to keep per key via Options.
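
As a rough illustration of what key-value-version access means, the toy store below keeps a bounded number of versions per key and serves reads as of a given version. Every name here (mvStore, getAt, the keep field) is hypothetical. It does not use Badger's Iterator API or Options.

```go
package main

import "fmt"

// versioned pairs a value with the version at which it was written.
type versioned struct {
	version uint64
	value   string
}

// mvStore keeps at most keep versions per key, newest first.
type mvStore struct {
	keep int
	data map[string][]versioned
}

func newMVStore(keep int) *mvStore {
	if keep < 1 {
		keep = 1
	}
	return &mvStore{keep: keep, data: make(map[string][]versioned)}
}

// set records a new version. This sketch assumes versions arrive in
// increasing order for each key.
func (s *mvStore) set(key, value string, version uint64) {
	vs := append([]versioned{{version: version, value: value}}, s.data[key]...)
	if len(vs) > s.keep {
		vs = vs[:s.keep] // drop the oldest versions beyond the limit
	}
	s.data[key] = vs
}

// getAt returns the newest value whose version is <= readTs.
func (s *mvStore) getAt(key string, readTs uint64) (string, bool) {
	for _, v := range s.data[key] {
		if v.version <= readTs {
			return v.value, true
		}
	}
	return "", false
}

func main() {
	s := newMVStore(2)
	s.set("k", "a", 1)
	s.set("k", "b", 2)
	s.set("k", "c", 3)
	for ts := uint64(1); ts <= 3; ts++ {
		v, ok := s.getAt("k", ts)
		fmt.Printf("read at %d: %q found=%v\n", ts, v, ok)
	}
}
```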

Benchmarks

We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).

Projects Using Badger

Below is a list of known projects that use Badger:

  • Dgraph - Distributed graph database.
  • Jaeger - Distributed tracing platform.
  • go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
  • Riot - An open-source, distributed search engine.
  • emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
  • OctoSQL - Query tool that allows you to join, analyse and transform data from multiple databases using SQL.
  • Dkron - Distributed, fault tolerant job scheduling system.
  • smallstep/certificates - Step-ca is an online certificate authority for secure, automated certificate management.
  • Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
  • TalariaDB - Grab's Distributed, low latency time-series database.
  • Sloop - Salesforce's Kubernetes History Visualization Project.
  • Immudb - Lightweight, high-speed immutable database for systems and applications.
  • Usenet Express - Serving over 300TB of data with Badger.
  • gorush - A push notification server written in Go.
  • 0-stor - Single device object store.
  • Dispatch Protocol - Blockchain protocol for distributed application data analytics.
  • GarageMQ - AMQP server written in Go.
  • RedixDB - A real-time persistent key-value store with the same redis protocol.
  • BBVA - Raft backend implementation using BadgerDB for Hashicorp raft.
  • Fantom - aBFT Consensus platform for distributed applications.
  • decred - An open, progressive, and self-funding cryptocurrency with a system of community-based governance integrated into its blockchain.
  • OpenNetSys - Create useful dApps in any software language.
  • HoneyTrap - An extensible and opensource system for running, monitoring and managing honeypots.
  • Insolar - Enterprise-ready blockchain platform.
  • IoTeX - The next generation of the decentralized network for IoT powered by scalability- and privacy-centric blockchains.
  • go-sessions - The sessions manager for Go net/http and fasthttp.
  • Babble - BFT Consensus platform for distributed applications.
  • Tormenta - Embedded object-persistence layer / simple JSON database for Go projects.
  • BadgerHold - An embeddable NoSQL store for querying Go types built on Badger
  • Goblero - Pure Go embedded persistent job queue backed by BadgerDB
  • Surfline - Serving global wave and weather forecast data with Badger.
  • Cete - Simple and highly available distributed key-value store built on Badger. Makes it easy to bring up a Badger cluster using the Raft consensus algorithm via hashicorp/raft.
  • Volument - A new take on website analytics backed by Badger.
  • KVdb - Hosted key-value store and serverless platform built on top of Badger.
  • Terminotes - Self hosted notes storage and search server - storage powered by BadgerDB
  • Pyroscope - Open source continuous profiling platform built with BadgerDB
  • Veri - A distributed feature store optimized for Search and Recommendation tasks.
  • bIter - A library and Iterator interface for working with the badger.Iterator, simplifying from-to, and prefix mechanics.
  • ld - (Lean Database) A very simple gRPC-only key-value database, exposing BadgerDB with key-range scanning semantics.
  • Souin - An RFC-compliant HTTP cache with lots of other features, using Badger for storage. Compatible with all existing reverse proxies.
  • Xuperchain - A highly flexible blockchain architecture with great transaction performance.
  • m2 - A simple http key/value store based on the raft protocol.
  • chaindb - A blockchain storage layer used by Gossamer, a Go client for the Polkadot Network.
  • vxdb - Simple schema-less Key-Value NoSQL database with simplest API interface.
  • Opacity - Backend implementation for the Opacity storage project
  • Vephar - A minimal key/value store using hashicorp-raft for cluster coordination and Badger for data storage.

If you are using Badger in a project please send a pull request to add it to the list.

Contributing

If you're interested in contributing to Badger see CONTRIBUTING.md.

Comments
  • chore(bloom): added Hash benchmark


    This adds the ability to replace hashing algorithms in the future and measure the impact.

    For example, xxh3 with AVX2 is capable of 20.1 ns/op (12.0 GB/s) on medium-sized keys, while the current performance is:

    name                time/op
    Hash/hasher-1-8       2.40ns ± 3%
    Hash/hasher-3-8       3.31ns ± 2%
    Hash/hasher-4-8       3.55ns ± 5%
    Hash/hasher-7-8       4.79ns ± 2%
    Hash/hasher-8-8       4.80ns ± 1%
    Hash/hasher-16-8      7.17ns ± 3%
    Hash/hasher-32-8      11.9ns ± 2%
    Hash/hasher-256-8      102ns ± 5%
    Hash/hasher-1024-8     427ns ± 3%
    Hash/hasher-4096-8    1.68µs ± 1%
    
    name                speed
    Hash/hasher-1-8      417MB/s ± 3%
    Hash/hasher-3-8      907MB/s ± 2%
    Hash/hasher-4-8     1.13GB/s ± 5%
    Hash/hasher-7-8     1.46GB/s ± 2%
    Hash/hasher-8-8     1.67GB/s ± 1%
    Hash/hasher-16-8    2.23GB/s ± 3%
    Hash/hasher-32-8    2.70GB/s ± 2%
    Hash/hasher-256-8   2.52GB/s ± 5%
    Hash/hasher-1024-8  2.40GB/s ± 4%
    Hash/hasher-4096-8  2.44GB/s ± 1%
    

    This is a slightly improved version of dgraph-io/badger#1774



  • Fix lint warnings


    I hope these are okay to do! I noticed there was a .golangci.yml lint config in the repo, but there are some issues, so I tried to fix some of them. There are still a number of warnings remaining which I haven't got to yet. I'm not sure if you run the linters often, or if there's a plan to add them to CI?

    I separated the changes as much as possible so hopefully it's easier to review.

    I also made an educated guess on which errors should continue to be ignored, looking at existing code that already does _ = tbl.DecrRef() and _ = buf.Release() in various places, but please let me know if something doesn't make sense here!



  • feat(vlog): Remove Value Log


    We haven't used the value log for quite some time in Dgraph. And Outserv doesn't use value log either. As part of bringing Badger back to the core features and shedding its weight, I'm removing the value log.


  • fix(bloom): BloomBitsPerKey does not depend on number of keys


    During the bits per key (m/n) calculation we first multiply the result by numEntries to get the size of the bloom filter (m), and then divide by numEntries to get m/n, so the number of keys cancels out. Also, as mentioned in #1763, the current implementation underestimates the result by 30% because of an erroneous multiplication by Ln2.

    Wikipedia suggests that bits per key calculation can be simplified to:

    While here, also fixed edge cases for very small and large ε to protect against misuse, and added tests.
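
As a rough illustration of why bits per key depends only on the target false positive rate, the sketch below uses the textbook Bloom filter relation m/n = -log2(ε) / ln 2 (about 1.44 · log2(1/ε)). Whether the PR uses exactly this form is an assumption. The function name bitsPerKey and the clamping bounds are hypothetical and are not taken from Badger's code.

```go
package main

import (
	"fmt"
	"math"
)

// bitsPerKey returns the Bloom filter bits per key (m/n) for a target false
// positive rate fp, using the textbook relation m/n = -log2(fp) / ln 2.
// It takes no key count: the result depends on fp alone.
func bitsPerKey(fp float64) int {
	// Hypothetical guard rails against misuse; the bounds are assumptions.
	const minFP, maxFP = 1e-9, 0.5
	switch {
	case math.IsNaN(fp) || fp > maxFP:
		fp = maxFP
	case fp < minFP:
		fp = minFP
	}
	return int(math.Ceil(-math.Log2(fp) / math.Ln2))
}

func main() {
	for _, fp := range []float64{0.5, 0.1, 0.01, 0.001} {
		fmt.Printf("fp=%g -> %d bits per key\n", fp, bitsPerKey(fp))
	}
}
```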

    This is a copy of: dgraph-io/badger#1773



  • feat: Improve bitmap, compaction efficiency. Add Lifetime stats.


    • Only create a bitmap if the space used per uint64 is less than 8.0. Otherwise, directly store the key.
    • Increase the base level size to 256 MiB, and file size to 8 MiB. This helps reduce the write amp from 7.2 to 6.5 on my bitmap run.
    • Add a new LifetimeStats library, which can track the usage of Badger over its lifetime. Useful for determining write amplification.
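
The two quantities mentioned above can be sketched as follows. A raw uint64 takes 8 bytes, so a bitmap only pays off when it averages fewer bytes than that per entry. Write amplification is taken here in its usual sense: bytes written to disk divided by bytes written by the user. The function names are hypothetical, and this does not reflect the actual LifetimeStats API.

```go
package main

import "fmt"

// bitmapThreshold mirrors the 8.0 bytes-per-uint64 cutoff described above.
const bitmapThreshold = 8.0

// useBitmap reports whether a bitmap of bitmapBytes bytes holding
// cardinality uint64s averages fewer than 8 bytes per entry.
func useBitmap(bitmapBytes, cardinality int) bool {
	if cardinality <= 0 {
		return false
	}
	return float64(bitmapBytes)/float64(cardinality) < bitmapThreshold
}

// writeAmplification is the ratio of bytes written to disk to bytes written
// by the user. It returns 0 when nothing has been written by the user.
func writeAmplification(diskBytes, userBytes uint64) float64 {
	if userBytes == 0 {
		return 0
	}
	return float64(diskBytes) / float64(userBytes)
}

func main() {
	fmt.Println(useBitmap(4096, 1000)) // roughly 4.1 bytes per uint64
	fmt.Println(useBitmap(4096, 100))  // roughly 41 bytes per uint64
	fmt.Printf("write amp: %.1f\n", writeAmplification(650, 100))
}
```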


  • feat(sroar): Add Roaring Bitmap support to Badger


    Badger now supports roaring bitmaps. A user can add a key with a uint64 version via WriteBatch. When compacting the data in the background, Badger merges all these uint64s into sroar. The finalized Bitmap can be queried via a new GetBitmap function. Note that there's no deletion for uint64s yet.

    This works well with other keys in the system. Badger can identify which keys need to be compacted into sroar and which don't.

    P.S. Fun fact. Almost all of the code (minus the test) was written on the plane to Lisbon: https://twitter.com/manishrjain/status/1584989187589410817


  • feat(simplify): Remove 25% of Badger code


    In this PR, I only keep the features that are needed by Outserv.

    • Managed mode is the only mode available.

    • Write transactions and Oracle are gone.

    • Memtable and WAL are gone.

    • SyncWrites option is gone. We no longer need it because we only have SSTables and we always sync them.

    • Backup and Load are gone.

    • WriteBatch is now using Skiplists.

    • A separate memtable held by the DB object is gone. It only holds immutable memtables now.

    • Sequence struct is gone.

    • TTL for keys is gone.

    • Various tools are gone:

      • bank.go
      • backup.go
      • restore.go
    • Commented out lots of tests from:

      • db_test.go
      • managed_db_test.go
      • stream_writer_test.go
      • txn_test.go
      • rotate_test.go

      Fix them later.


  • feat(db): Allow maxValueSize as option


    The maximum allowed value size was hardcoded to 1 MB. To allow storing bigger values, I added the MaxValueSize option, which can be set when initializing Badger.


