Fast key-value DB in Go.


Badger mascot

BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Use Discuss Issues for reporting issues about this repository.

Project Status [March 24, 2020]

Badger is stable and is being used to serve data sets worth hundreds of terabytes. Badger supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for 8h with the --race flag to ensure the maintenance of transactional guarantees. Badger has also been tested to work with filesystem-level anomalies, to ensure persistence and consistency. Badger is being used by a number of projects, including Dgraph, Jaeger Tracing, UsenetExpress, and many more.

The list of projects using Badger can be found here.

Badger v1.0 was released in Nov 2017, and the latest version that is data-compatible with v1.0 is v1.6.0.

Badger v2.0 was released in Nov 2019 with a new storage format that is not compatible with the v1.x format. Badger v2.0 supports compression, encryption, and uses a cache to speed up lookups.

The Changelog is kept fairly up-to-date.

For more details on our version naming schema please read Choosing a version.

Getting Started

Installing

To start using Badger, install Go 1.12 or above. Badger v2 and above need Go modules. Run the following command to retrieve the library.

$ go get github.com/dgraph-io/badger/v3

This will retrieve the library.
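If you just want a quick smoke test after installing, here is a minimal sketch (the directory and key are illustrative, not taken from the upstream docs) that opens a database and round-trips a value through a read-write and a read-only transaction:

package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Open (or create) a database in the given directory.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-quickstart"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write a key in a read-write transaction.
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("answer"), []byte("42"))
	}); err != nil {
		log.Fatal(err)
	}

	// Read it back in a read-only transaction.
	if err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("answer"))
		if err != nil {
			return err
		}
		val, err := item.ValueCopy(nil)
		if err != nil {
			return err
		}
		fmt.Printf("answer = %s\n", val)
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}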

Installing Badger Command Line Tool

Download and extract the latest Badger DB release from https://github.com/dgraph-io/badger/releases and then run the following commands.

$ cd badger-<version>/badger
$ go install

This will install the badger command line utility into your $GOBIN path.

Choosing a version

BadgerDB is a pretty special package in that the most important changes we can make to it are not to its API but to how data is stored on disk.

This is why we follow a version naming schema that differs from Semantic Versioning.

  • New major versions are released when the data format on disk changes in an incompatible way.
  • New minor versions are released whenever the API changes but data compatibility is maintained. Note that API changes could be backward-incompatible, unlike in Semantic Versioning.
  • New patch versions are released when there are no changes to the data format or the API.

Following these rules:

  • v1.5.0 and v1.6.0 can be used on top of the same files without any concerns, as their major version is the same, therefore the data format on disk is compatible.
  • v1.6.0 and v2.0.0 are data incompatible as their major version implies, so files created with v1.6.0 will need to be converted into the new format before they can be used by v2.0.0.

For a longer explanation on the reasons behind using a new versioning naming schema, you can read VERSIONING.md.

Badger Documentation

Badger Documentation is available at https://dgraph.io/docs/badger

Resources

Blog Posts

  1. Introducing Badger: A fast key-value store written natively in Go
  2. Make Badger crash resilient with ALICE
  3. Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
  4. Concurrent ACID Transactions in Badger

Design

Badger was written with these design goals in mind:

  • Write a key-value database in pure Go.
  • Use latest research to build the fastest KV database for data sets spanning terabytes.
  • Optimize for SSDs.

Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.

Comparisons

Feature                          Badger                            RocksDB             BoltDB
Design                           LSM tree with value log           LSM tree only       B+ tree
High Read throughput             Yes                               No                  Yes
High Write throughput            Yes                               Yes                 No
Designed for SSDs                Yes (with latest research 1)      Not specifically 2  No
Embeddable                       Yes                               Yes                 Yes
Sorted KV access                 Yes                               Yes                 Yes
Pure Go (no Cgo)                 Yes                               No                  Yes
Transactions                     Yes, ACID, concurrent with SSI 3  Yes (but non-ACID)  Yes, ACID
Snapshots                        Yes                               Yes                 Yes
TTL support                      Yes                               Yes                 No
3D access (key-value-version)    Yes 4                             No                  No

1 The WISCKEY paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.

2 RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such RocksDB's design isn't aimed at SSDs.

3 SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger

4 Badger provides direct access to value versions via its Iterator API. Users can also specify how many versions to keep per key via Options.
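As a sketch of what footnote 4 refers to, the iterator can be asked to yield every stored version of each key rather than only the latest one. This assumes db is an open *badger.DB (configured with, say, WithNumVersionsToKeep to retain more than one version) and the usual fmt, log and badger imports:

err := db.View(func(txn *badger.Txn) error {
	iopts := badger.DefaultIteratorOptions
	iopts.AllVersions = true // yield every stored version, not just the latest
	it := txn.NewIterator(iopts)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		fmt.Printf("key=%s version=%d deleted=%v\n",
			item.Key(), item.Version(), item.IsDeletedOrExpired())
	}
	return nil
})
if err != nil {
	log.Fatal(err)
}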

Benchmarks

We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).

Projects Using Badger

Below is a list of known projects that use Badger:

  • Dgraph - Distributed graph database.
  • Jaeger - Distributed tracing platform.
  • go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
  • Riot - An open-source, distributed search engine.
  • emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
  • OctoSQL - Query tool that allows you to join, analyse and transform data from multiple databases using SQL.
  • Dkron - Distributed, fault tolerant job scheduling system.
  • smallstep/certificates - Step-ca is an online certificate authority for secure, automated certificate management.
  • Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
  • TalariaDB - Grab's Distributed, low latency time-series database.
  • Sloop - Salesforce's Kubernetes History Visualization Project.
  • Immudb - Lightweight, high-speed immutable database for systems and applications.
  • Usenet Express - Serving over 300TB of data with Badger.
  • gorush - A push notification server written in Go.
  • 0-stor - Single device object store.
  • Dispatch Protocol - Blockchain protocol for distributed application data analytics.
  • GarageMQ - AMQP server written in Go.
  • RedixDB - A real-time persistent key-value store with the same redis protocol.
  • BBVA - Raft backend implementation using BadgerDB for Hashicorp raft.
  • Fantom - aBFT Consensus platform for distributed applications.
  • decred - An open, progressive, and self-funding cryptocurrency with a system of community-based governance integrated into its blockchain.
  • OpenNetSys - Create useful dApps in any software language.
  • HoneyTrap - An extensible and opensource system for running, monitoring and managing honeypots.
  • Insolar - Enterprise-ready blockchain platform.
  • IoTeX - The next generation of the decentralized network for IoT powered by scalability- and privacy-centric blockchains.
  • go-sessions - The sessions manager for Go net/http and fasthttp.
  • Babble - BFT Consensus platform for distributed applications.
  • Tormenta - Embedded object-persistence layer / simple JSON database for Go projects.
  • BadgerHold - An embeddable NoSQL store for querying Go types built on Badger
  • Goblero - Pure Go embedded persistent job queue backed by BadgerDB
  • Surfline - Serving global wave and weather forecast data with Badger.
  • Cete - Simple and highly available distributed key-value store built on Badger. Makes it easy to bring up a cluster of Badger with the Raft consensus algorithm by hashicorp/raft.
  • Volument - A new take on website analytics backed by Badger.
  • KVdb - Hosted key-value store and serverless platform built on top of Badger.
  • Terminotes - Self hosted notes storage and search server - storage powered by BadgerDB
  • Pyroscope - Open source continuous profiling platform built with BadgerDB
  • Veri - A distributed feature store optimized for Search and Recommendation tasks.
  • bIter - A library and Iterator interface for working with the badger.Iterator, simplifying from-to, and prefix mechanics.
  • ld - (Lean Database) A very simple gRPC-only key-value database, exposing BadgerDB with key-range scanning semantics.
  • Souin - An RFC-compliant HTTP cache with lots of other features, based on Badger for storage. Compatible with all existing reverse proxies.
  • Xuperchain - A highly flexible blockchain architecture with great transaction performance.
  • m2 - A simple HTTP key/value store based on the raft protocol.
  • chaindb - A blockchain storage layer used by Gossamer, a Go client for the Polkadot Network.
  • Opacity - Backend implementation for the Opacity storage project

If you are using Badger in a project please send a pull request to add it to the list.

Contributing

If you're interested in contributing to Badger see CONTRIBUTING.md.

Contact

Owner
Dgraph - The Only Native GraphQL Database With A Graph Backend.
Comments
  • RunValueLogGC crashed


    What version of Go are you using (go version)?

    $ go version
    go version go1.13.4 linux/amd64
    

    What version of Badger are you using?

    v2.0.0

    Does this issue reproduce with the latest master?

    Never tried

    What are the hardware specifications of the machine (RAM, OS, Disk)?

    Linux 64 SSD

    What did you do?

    	opts := badger.DefaultOptions(dir)
    	opts.SyncWrites = sync
    	db, err := badger.Open(opts)
    	if err != nil {
    		return nil, err
    	}
    	db.RunValueLogGC(0.1)
    
    	go func() {
    		ticker := time.NewTicker(1 * time.Minute)
    		defer ticker.Stop()
    		for range ticker.C {
    			lsm, vlog := db.Size()
    			if lsm > 1024*1024*8 || vlog > 1024*1024*32 {
    				db.RunValueLogGC(0.5)
    			}
    		}
    	}()
    

    What did you expect to see?

    Run value log gc should work

    What did you see instead?

    mixin[28404]: github.com/dgraph-io/badger/v2/y.AssertTrue
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/y/error.go:55
    mixin[28404]: github.com/dgraph-io/badger/v2.(*valueLog).doRunGC.func2
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:1591
    mixin[28404]: github.com/dgraph-io/badger/v2.(*valueLog).iterate
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:480
    mixin[28404]: github.com/dgraph-io/badger/v2.(*valueLog).doRunGC
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:1557
    mixin[28404]: github.com/dgraph-io/badger/v2.(*valueLog).runGC
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:1685
    mixin[28404]: github.com/dgraph-io/badger/v2.(*DB).RunValueLogGC
    mixin[28404]:         /home/one/GOPATH/pkg/mod/github.com/dgraph-io/badger/[email protected]/db.go:1129
    mixin[28404]: github.com/MixinNetwork/mixin/storage.openDB.func1
    mixin[28404]:         /home/one/github/mixin/storage/badger.go:68
    mixin[28404]: runtime.goexit
    mixin[28404]:         /snap/go/4762/src/runtime/asm_amd64.s:1357
    

    badger.go:68 db.RunValueLogGC(0.5)
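As general usage context (not a diagnosis of the crash above): RunValueLogGC returns an error whenever no log file was rewritten, so it is typically called in a loop with badger.ErrNoRewrite treated as a benign stop condition. A sketch, assuming db is an open *badger.DB:

    go func() {
    	ticker := time.NewTicker(5 * time.Minute)
    	defer ticker.Stop()
    	for range ticker.C {
    		// Keep collecting while Badger still finds value-log files worth rewriting.
    		for {
    			if err := db.RunValueLogGC(0.5); err != nil {
    				// badger.ErrNoRewrite (and ErrRejected while another GC is running)
    				// are expected here; anything else may deserve logging.
    				break
    			}
    		}
    	}
    }()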

  • ARMv7 segmentation fault in oracle.readTs when calling loadUint64


    I am facing an issue running badger on an ARMv7 architecture. The minimal test case below works quite fine on an amd64 machine but, unfortunately, not on ARMv7 32bit.

The trace below shows that the issue originates in atomic.loadUint64(), but I also ran basic atomic operations tests against the Go runtime, and they work fine on this architecture.

    It looks to me that the underlying memory of oracle.curRead somehow vanishes but I am not sure.

    Below you find also a strace trace. There the segmentation fault happens after the madvise, but I am not sure if this is related.

    Badger version: 1.0.1 (89689ef36cae26ae094cb5ea79b7400d839f2d68) golang version: 1.8.5 and 1.9.2
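For context, one plausible contributor on 32-bit platforms (an assumption on my part, not a confirmed diagnosis of this trace): Go's sync/atomic package documents that 64-bit atomic operations require 64-bit alignment on 32-bit architectures such as ARMv7, and only the first word of an allocated struct is guaranteed to be aligned. A tiny sketch of that rule, with hypothetical names and assuming the sync/atomic import:

    // Hypothetical struct illustrating the sync/atomic alignment rule on 32-bit CPUs.
    type counters struct {
    	n uint64 // first field of the struct: guaranteed 64-bit aligned
    	b bool
    	// A uint64 field placed after b could end up only 4-byte aligned on ARMv7,
    	// and a 64-bit atomic load of it could then fault or panic.
    }

    func read(c *counters) uint64 {
    	return atomic.LoadUint64(&c.n) // safe: &c.n is 64-bit aligned
    }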

    Test case:

    func TestPersistentCache_DirectBadger(t *testing.T) {
    	dir, err := ioutil.TempDir("", "")
    	if err != nil {
    		t.Fatal(err)
    	}
    	defer os.RemoveAll(dir)
    
    	config := badger.DefaultOptions
    	config.TableLoadingMode = options.MemoryMap
    	config.ValueLogFileSize = 16 << 20
    	config.LevelOneSize = 8 << 20
    	config.MaxTableSize = 2 << 20
    	config.Dir = dir
    	config.ValueDir = dir
    	config.SyncWrites = false
    
    	db, err := badger.Open(config)
    	if err != nil {
    		t.Fatalf("cannot open db at location %s: %v", dir, err)
    	}
    
    	err = db.View(func(txn *badger.Txn) error { return nil })
    
    	if err != nil {
    		t.Fatal(err)
    	}
    
    	if err = db.Close(); err != nil {
    		t.Fatal(err)
    	}
    }
    
    === RUN   TestPersistentCache_DirectBadger
    --- FAIL: TestPersistentCache_DirectBadger (0.01s)
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
            panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x4 pc=0x1150c]
    
    goroutine 5 [running]:
    testing.tRunner.func1(0x10a793b0)
            /usr/lib/go/src/testing/testing.go:711 +0x2a0
    panic(0x3e4bd8, 0x6bb478)
            /usr/lib/go/src/runtime/panic.go:491 +0x204
    sync/atomic.loadUint64(0x10a483cc, 0x200000, 0x0)
            /usr/lib/go/src/sync/atomic/64bit_arm.go:10 +0x3c
    github.com/grid-x/client/vendor/github.com/dgraph-io/badger.(*oracle).readTs(0x10a483c0, 0x14, 0x5)
            /home/robert/Projects/gridx/client/src/github.com/grid-x/client/vendor/github.com/dgraph-io/badger/transaction.go:87 +0x3c
    github.com/grid-x/client/vendor/github.com/dgraph-io/badger.(*DB).NewTransaction(0x10b06000, 0x0, 0x4cccc)
            /home/robert/Projects/gridx/client/src/github.com/grid-x/client/vendor/github.com/dgraph-io/badger/transaction.go:440 +0x20
    github.com/grid-x/client/vendor/github.com/dgraph-io/badger.(*DB).View(0x10b06000, 0x464e20, 0x0, 0x0)
            /home/robert/Projects/gridx/client/src/github.com/grid-x/client/vendor/github.com/dgraph-io/badger/transaction.go:457 +0x3c
    command-line-arguments.TestPersistentCache_DirectBadger(0x10a793b0)
            /home/robert/Projects/gridx/client/src/github.com/grid-x/client/pkg/cache/persistent_cache_test.go:64 +0x1e8
    testing.tRunner(0x10a793b0, 0x464e24)
            /usr/lib/go/src/testing/testing.go:746 +0xb0
    created by testing.(*T).Run
            /usr/lib/go/src/testing/testing.go:789 +0x258
    

    strace:

    [pid 15075] mmap2(NULL, 33554432, PROT_READ, MAP_SHARED, 6, 0) = 0xb4dff000                                   
    [pid 15075] madvise(0xb4dff000, 33554432, MADV_RANDOM) = 0                     
    [pid 15075] clock_gettime(CLOCK_MONOTONIC, {tv_sec=69709, tv_nsec=217038306}) = 0                            
    [pid 15075] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x4} ---
    [pid 15075] rt_sigreturn()              = 0                                       
    
  • Performance regression 1.6 to 2.0.2


    What version of Go are you using (go version)?

    go version go1.12.7 darwin/amd64

    What version of Badger are you using?

    2.0.2 (upgrading from 1.6.0)

    Does this issue reproduce with the latest master?

    Haven't tried.

    What are the hardware specifications of the machine (RAM, OS, Disk)?

    GCP 8 CPU (Intel Haswell), 32 GB RAM, 750 GB local ssd

    What did you do?

Running code which extracts data from Kafka and saves it to Badger DB. I'm running on the exact same hardware and disk, with my code against the exact same Kafka topic.

    What did you expect to see?

    Better or equal performance with Badger 2.

    What did you see instead?

    Severe slowdown after writing ~1,461,000 records. See below

    1.6.0 performance:

    Performance in 1.6.0 takes about 300-400ms to extract 1000 messages.

      Up to offset 1453000 Time[330ms] Events[1453001] UrisCreated[1975] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T07:35:07.000] EstTimeToFinish[4h17m58s]
      Up to offset 1454000 Time[360ms] Events[1454001] UrisCreated[1954] PathsCreated[0] Bytes[11.2 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T11:31:43.000] EstTimeToFinish[4h18m1s]
      Up to offset 1455000 Time[340ms] Events[1455001] UrisCreated[1969] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T15:33:31.000] EstTimeToFinish[4h18m4s]
      Up to offset 1456000 Time[360ms] Events[1456001] UrisCreated[1789] PathsCreated[0] Bytes[13.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T20:46:14.000] EstTimeToFinish[4h18m7s]
      Up to offset 1457000 Time[320ms] Events[1457001] UrisCreated[1720] PathsCreated[0] Bytes[13.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T06:56:07.000] EstTimeToFinish[4h18m9s]
      Up to offset 1458000 Time[300ms] Events[1458001] UrisCreated[1736] PathsCreated[1] Bytes[10.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T18:40:17.000] EstTimeToFinish[4h18m9s]
    badger 2020/02/17 15:10:15 DEBUG: Flushing memtable, mt.size=194491818 size of flushChan: 0
    badger 2020/02/17 15:10:15 DEBUG: Storing value log head: {Fid:1 Len:45 Offset:87078740}
      Up to offset 1459000 Time[380ms] Events[1459001] UrisCreated[2140] PathsCreated[0] Bytes[11.4 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T21:04:18.000] EstTimeToFinish[4h18m13s]
      Up to offset 1460000 Time[370ms] Events[1460001] UrisCreated[1776] PathsCreated[0] Bytes[10.4 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T00:02:01.000] EstTimeToFinish[4h18m17s]
    badger 2020/02/17 15:10:16 DEBUG: Flushing memtable, mt.size=119942867 size of flushChan: 0
    badger 2020/02/17 15:10:16 DEBUG: Storing value log head: {Fid:1 Len:45 Offset:87168065}
      Up to offset 1461000 Time[430ms] Events[1461001] UrisCreated[1753] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T06:01:21.000] EstTimeToFinish[4h18m23s]
      Up to offset 1462000 Time[370ms] Events[1462001] UrisCreated[1779] PathsCreated[0] Bytes[10.5 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T16:45:03.000] EstTimeToFinish[4h18m26s]
      Up to offset 1463000 Time[360ms] Events[1463001] UrisCreated[1735] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T20:10:04.000] EstTimeToFinish[4h18m29s]
      Up to offset 1464000 Time[370ms] Events[1464001] UrisCreated[1664] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T23:03:44.000] EstTimeToFinish[4h18m33s]
      Up to offset 1465000 Time[350ms] Events[1465001] UrisCreated[1732] PathsCreated[0] Bytes[10.2 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T02:38:13.000] EstTimeToFinish[4h18m35s]
      Up to offset 1466000 Time[380ms] Events[1466001] UrisCreated[1825] PathsCreated[0] Bytes[10.6 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T06:12:39.000] EstTimeToFinish[4h18m39s]
      Up to offset 1467000 Time[360ms] Events[1467001] UrisCreated[1868] PathsCreated[0] Bytes[11.1 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T10:08:51.000] EstTimeToFinish[4h18m42s]
      Up to offset 1468000 Time[380ms] Events[1468001] UrisCreated[1920] PathsCreated[1] Bytes[11.3 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T13:54:45.000] EstTimeToFinish[4h18m46s]
      Up to offset 1469000 Time[350ms] Events[1469001] UrisCreated[1875] PathsCreated[0] Bytes[11.5 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T17:20:47.000] EstTimeToFinish[4h18m48s]
      Up to offset 1470000 Time[350ms] Events[1470001] UrisCreated[1767] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T20:41:05.000] EstTimeToFinish[4h18m51s]
      Up to offset 1471000 Time[340ms] Events[1471001] UrisCreated[1768] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T23:51:59.000] EstTimeToFinish[4h18m54s]
      Up to offset 1472000 Time[370ms] Events[1472001] UrisCreated[1758] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-14T03:28:45.000] EstTimeToFinish[4h18m57s]
    
    

    2.0.2 performance:

    Notice that at approximately offset 1462000 (1,462,000 records), things start slowing down from a rate of 300-400ms per 1,000 records to 25-30 seconds per 1,000 records! It happens after the very first Flushing memtable debug message. If you look above, the Flushing happens at the exact same place, but things continue speedily after.

      Up to offset 1453000 Time[360ms] Events[1453001] UrisCreated[1975] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T07:35:07.000] EstTimeToFinish[4h19m33s]
      Up to offset 1454000 Time[330ms] Events[1454001] UrisCreated[1954] PathsCreated[0] Bytes[11.2 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T11:31:43.000] EstTimeToFinish[4h19m35s]
      Up to offset 1455000 Time[380ms] Events[1455001] UrisCreated[1969] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T15:33:31.000] EstTimeToFinish[4h19m39s]
      Up to offset 1456000 Time[320ms] Events[1456001] UrisCreated[1789] PathsCreated[0] Bytes[13.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T20:46:14.000] EstTimeToFinish[4h19m41s]
      Up to offset 1457000 Time[340ms] Events[1457001] UrisCreated[1720] PathsCreated[0] Bytes[13.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T06:56:07.000] EstTimeToFinish[4h19m43s]
      Up to offset 1458000 Time[310ms] Events[1458001] UrisCreated[1736] PathsCreated[1] Bytes[10.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T18:40:17.000] EstTimeToFinish[4h19m44s]
    badger 2020/03/09 17:36:39 DEBUG: Flushing memtable, mt.size=194487650 size of flushChan: 0
    badger 2020/03/09 17:36:39 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:74078864}
      Up to offset 1459000 Time[680ms] Events[1459001] UrisCreated[2140] PathsCreated[0] Bytes[11.4 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T21:04:18.000] EstTimeToFinish[4h20m0s]
      Up to offset 1460000 Time[500ms] Events[1460001] UrisCreated[1776] PathsCreated[0] Bytes[10.4 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T00:02:01.000] EstTimeToFinish[4h20m8s]
    badger 2020/03/09 17:36:40 DEBUG: Flushing memtable, mt.size=119942767 size of flushChan: 0
    badger 2020/03/09 17:36:40 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:74168111}
      Up to offset 1461000 Time[430ms] Events[1461001] UrisCreated[1753] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T06:01:21.000] EstTimeToFinish[4h20m14s]
      Up to offset 1462000 Time[4.74s] Events[1462001] UrisCreated[1779] PathsCreated[0] Bytes[10.5 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T16:45:03.000] EstTimeToFinish[4h23m6s]
      Up to offset 1463000 Time[14.45s] Events[1463001] UrisCreated[1735] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T20:10:04.000] EstTimeToFinish[4h32m12s]
      Up to offset 1464000 Time[19.38s] Events[1464001] UrisCreated[1664] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T23:03:44.000] EstTimeToFinish[4h44m27s]
      Up to offset 1465000 Time[24.52s] Events[1465001] UrisCreated[1732] PathsCreated[0] Bytes[10.2 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T02:38:13.000] EstTimeToFinish[4h59m59s]
      Up to offset 1466000 Time[27.25s] Events[1466001] UrisCreated[1825] PathsCreated[0] Bytes[10.6 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T06:12:39.000] EstTimeToFinish[5h17m15s]
      Up to offset 1467000 Time[31.8s] Events[1467001] UrisCreated[1868] PathsCreated[0] Bytes[11.1 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T10:08:51.000] EstTimeToFinish[5h37m24s]
      Up to offset 1468000 Time[32.87s] Events[1468001] UrisCreated[1920] PathsCreated[1] Bytes[11.3 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T13:54:45.000] EstTimeToFinish[5h58m12s]
      Up to offset 1469000 Time[28.9s] Events[1469001] UrisCreated[1875] PathsCreated[0] Bytes[11.5 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T17:20:47.000] EstTimeToFinish[6h16m27s]
      Up to offset 1470000 Time[27.58s] Events[1470001] UrisCreated[1767] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T20:41:05.000] EstTimeToFinish[6h33m49s]
      Up to offset 1471000 Time[30.04s] Events[1471001] UrisCreated[1768] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T23:51:59.000] EstTimeToFinish[6h52m44s]
      Up to offset 1472000 Time[34.09s] Events[1472001] UrisCreated[1758] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-14T03:28:45.000] EstTimeToFinish[7h14m13s]
    

    I tried the same with compression disabled and saw similar results. The options I'm using are DefaultOptions with the following tweaks:

    	actualOpts := opts.
    		WithMaxTableSize(256 << 20). // max size 256M
    		WithSyncWrites(false).       // don't sync writes for faster performance
    		WithCompression(options.None)
    

    I literally just started on the 2.0 migration today. I'm running the same code I've been running for 6 months.

  • Discard invalid versions of keys during compaction


    I'm hoping this is a configuration related issue but I've played around with the settings and I keep getting the same behavior. Tested on an i3.4XL in AWS, raid0 on the two SSD drives.

    Expected behavior of the code below:

    • keys/data are stored for 1hr, after a few hours the badger directory should stay fairly constant as you write/expire keys
    • I would expect to see sst files written and multiple size levels each level a larger file size
    • memory should stay fairly consistent

    Actual behavior seen:

    • OOMs after 12 hours
    • all sst files at 67MB (thousands of them)
    • disk fills up on a 4TB drive, no data appears to ttl out
    • file counts steadily increase until oom (there's no leveling off)
    • every hour the process stalls (assuming the stall count is being hit according to profiler)

    Please advise of what is wrong in the code below, thanks!

    3HRs of runtime you can see just linear growth https://imgur.com/a/2UUfIrG

UPDATE: I've also tried with these settings; memory doesn't grow as fast, but it still climbs linearly until OOM, with the same behavior as above.

    dir := "/raid0/badgertest"
    opts := badger.DefaultOptions
    opts.Dir = dir
    opts.ValueDir = dir
    opts.TableLoadingMode = options.FileIO
    opts.SyncWrites = false
    db, err := badger.Open(opts)
    
    package usecases
    
    import (
    	"github.com/dgraph-io/badger"
    	"github.com/dgraph-io/badger/options"
    	"time"
    	"fmt"
    	"encoding/binary"
    	"github.com/spaolacci/murmur3"
    	"path/filepath"
    	"os"
    	"github.com/Sirupsen/logrus"
    )
    
    type writable struct {
    	key   []byte
    	value []byte
    }
    
    
    type BadgerTest struct {
    	db *badger.DB
    }
    
    func NewBadgerTest() *BadgerTest {
    
    	dir := "/raid0/badgertest"
    	opts := badger.DefaultOptions
    	opts.Dir = dir
    	opts.ValueDir = dir
    	opts.TableLoadingMode = options.MemoryMap
    	opts.NumCompactors = 1
    	opts.NumLevelZeroTables = 20
    	opts.NumLevelZeroTablesStall = 50
    	opts.SyncWrites = false
    	db, err := badger.Open(opts)
    	if err != nil {
    		panic(fmt.Sprintf("unable to open badger db; %s", err))
    	}
    	bt := &BadgerTest{
    		db: db,
    	}
    
    	go bt.filecounts(dir)
    	return bt
    
    }
    
    func (b *BadgerTest) Start() {
    
    	workers := 4
    	for i := 0; i < workers; i++ {
    		go b.write()
    	}
    	go b.badgerGC()
    
    }
    func (b *BadgerTest) Stop() {
    
    	b.db.Close()
    	logrus.Infof("shut down badger test")
    	time.Sleep(1 * time.Second)
    }
    
    func (b *BadgerTest) filecounts(dir string) {
    
    	ticker := time.NewTicker(60 * time.Second)
    	for _ = range ticker.C {
    
    		logFiles := int64(0)
    		sstFiles := int64(0)
    		_ = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
    
    			if filepath.Ext(path) == ".sst" {
    				sstFiles++
    			}
    			if filepath.Ext(path) == ".vlog" {
    				logFiles++
    			}
    			return nil
    		})
    
    
    		logrus.Infof("updated gauges vlog=%d sst=%d", logFiles, sstFiles)
    
    	}
    
    }
    
    func (b *BadgerTest) write() {
    
    	data := `{"randomstring":"6446D58D6DFAFD58586D3EA85A53F4A6B3CC057F933A22BB58E188A74AC8F663","refID":12495494,"testfield1234":"foo bar baz","date":"2018-01-01 12:00:00"}`
    	batchSize := 20000
    	rows := []writable{}
    	var cnt uint64
    	for {
    		cnt++
    		ts := time.Now().UnixNano()
    		buf := make([]byte, 24)
    		offset := 0
    		binary.BigEndian.PutUint64(buf[offset:], uint64(ts))
    		offset = offset + 8
    		key := fmt.Sprintf("%d%d", ts, cnt)
    		mkey := murmur3.Sum64([]byte(key))
    		binary.BigEndian.PutUint64(buf[offset:], mkey)
    
    		offset = offset + 8
    		binary.BigEndian.PutUint64(buf[offset:], cnt)
    
    		w := writable{key: buf, value: []byte(data)}
    		rows = append(rows, w)
    		if len(rows) > batchSize {
    			b.saveRows(rows)
    			rows = []writable{}
    		}
    	}
    
    }
    
    func (b *BadgerTest) saveRows(rows []writable) {
    	ttl := 1 * time.Hour
    
    	_ = b.db.Update(func(txn *badger.Txn) error {
    		var err error
    		for _, row := range rows {
    			testMsgMeter.Mark(1)
    			if err := txn.SetWithTTL(row.key, row.value, ttl); err == badger.ErrTxnTooBig {
    				logrus.Infof("TX too big, committing...")
    				_ = txn.Commit(nil)
    				txn = b.db.NewTransaction(true)
    				err = txn.SetWithTTL(row.key, row.value, ttl)
    			}
    		}
    		return err
    	})
    }
    
    func (b *BadgerTest) badgerGC() {
    
    	ticker := time.NewTicker(1 * time.Minute)
    	for {
    		select {
    		case <-ticker.C:
    			logrus.Infof("CLEANUP starting to purge keys %s", time.Now())
    			err := b.db.PurgeOlderVersions()
    			if err != nil {
    				logrus.Errorf("badgerOps unable to purge older versions; %s", err)
    			}
    			err = b.db.RunValueLogGC(0.5)
    			if err != nil {
    				logrus.Errorf("badgerOps unable to RunValueLogGC; %s", err)
    			}
    			logrus.Infof("CLEANUP purge complete %s", time.Now())
    		}
    	}
    }
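A side note for readers on newer Badger releases (a hedged mapping, not a fix for the issue): Txn.SetWithTTL and DB.PurgeOlderVersions used in the snippet above were removed in later versions. The closest modern equivalents, reusing the row, ttl and txn variables from the snippet:

    // TTL writes now go through an Entry:
    e := badger.NewEntry(row.key, row.value).WithTTL(ttl)
    if err := txn.SetEntry(e); err != nil {
    	// handle ErrTxnTooBig the same way the snippet above does
    }

    // There is no PurgeOlderVersions any more; stale versions are dropped during
    // compaction (subject to Options.NumVersionsToKeep), while value-log space is
    // still reclaimed with db.RunValueLogGC.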
    
    
    
  • GC doesn't work? (not cleaning up SST files properly)


    What version of Go are you using (go version)?

    $ go version
    1.13.8
    

    What version of Badger are you using?

    v1.6.0

    opts := badger.DefaultOptions(fmt.Sprintf(dir + "/" + name))
    opts.SyncWrites = false
    opts.ValueLogLoadingMode = options.FileIO

    Does this issue reproduce with the latest master?

    With the latest master GC becomes much slower

    What are the hardware specifications of the machine (RAM, OS, Disk)?

    2TB NVME drive, 128 GB RAM

    What did you do?

I have a Kafka topic with 12 partitions. For every partition I create a database. Each database grows quite quickly (about 12*30GB per hour) and the TTL for most of the events is 1h, so the size should stay at a constant level. For every partition I create a separate transaction and process read and write operations sequentially; there is no concurrency. When the transaction is getting too big I commit it, and in a separate goroutine I start RunValueLogGC(0.5). Most GC runs end with ErrNoRewrite. I even tried to repeat RunValueLogGC until I had 5 errors in a row, but I was still running out of disk space quite quickly. My current fix is to patch the Badger GC to make it run on every fid that is before the head. This works fine, but eventually becomes slow when I have too many log files.

    What did you expect to see?

The size of each of the twelve databases I created should stay at a constant level and be less than 20 GB.

    What did you see instead?

    After running it for a day, if I look at one of twelve databases, I see 210 sst files, 68 vlog files, db size is 84 GB (and these numbers keep growing).

    If I run badger histogram it shows me this stats:

Histogram of key sizes (in bytes)
Total count: 4499955
Min value: 13
Max value: 108
Mean: 22.92
Range        Count
[  8,  16)         2
[ 16,  32)   4499939
[ 64, 128)        14

Histogram of value sizes (in bytes)
Total count: 4499955
Min value: 82
Max value: 3603
Mean: 2428.16
Range           Count
[  64,  128)        1
[ 256,  512)    19301
[ 512, 1024)      459
[1024, 2048)      569
[2048, 4096)  4479625

    2428*4479625=10GB

  • Use pure Go based ZSTD implementation


    Fixes https://github.com/dgraph-io/badger/issues/1162

    This PR proposes to use https://github.com/klauspost/compress/tree/master/zstd instead of CGO based https://github.com/DataDog/zstd .

    This PR also removes the CompressionLevel options since https://github.com/klauspost/compress/tree/master/zstd supports only two levels of ZSTD Compression. The default level is ZSTD Level 3 and the fastest level is ZSTD level 1. ZSTD level 1 will be the default level in badger.

I've experimented with all the suggestions mentioned in https://github.com/klauspost/compress/issues/196#issuecomment-568905095. Setting WithSingleSegment didn't seem to make much speed difference (~1 MB/s). WithNoEntropyCompression seemed to give a ~3% speed improvement (but that could also be because of the non-deterministic nature of benchmarks).

    name                                       old time/op      new time/op (NoEntropy set)   delta
    Compression/ZSTD_-_Go_-_level1-16           35.7µs ± 1%     36.9µs ± 5%                 +3.41%  (p=0.008 n=5+5)
    Decompression/ZSTD_-_Go-16                  16.0µs ± 0%     15.9µs ± 1%                 -0.77%  (p=0.016 n=5+5)
    
    name                                    old speed      new speed (NoEntropy set)      delta
    Compression/ZSTD_-_Go_-_level1-16      115MB/s ± 1%   111MB/s ± 5%                -3.24%  (p=0.008 n=5+5)
    Decompression/ZSTD_-_Go-16             256MB/s ± 0%   258MB/s ± 1%                 +0.78%  (p=0.016 n=5+5)
    

    Benchmarks

    1. Table Data (contains some randomly generated data).
    Compression Ratio Datadog ZSTD level 1 3.1993720565149135
    Compression Ratio Datadog ZSTD level 3 3.099619771863118
    
    Compression Ratio Go ZSTD 3.2170481452249406
    Compression Ratio Go ZSTD level 3 3.1474903474903475
    
    name                                        time/op
    Compression/ZSTD_-_Datadog-level1-16    17.6µs ± 3%
    Compression/ZSTD_-_Datadog-level3-16    20.7µs ± 3%
    
    Compression/ZSTD_-_Go_-_level1-16       27.8µs ± 2%
    Compression/ZSTD_-_Go_-_Default-16      39.1µs ± 1%
    
    Decompression/ZSTD_-_Datadog-16         7.12µs ± 2%
    Decompression/ZSTD_-_Go-16              13.7µs ± 2%
    
    name                                       speed
    Compression/ZSTD_-_Datadog-level1-16   231MB/s ± 3%
    Compression/ZSTD_-_Datadog-level3-16   197MB/s ± 3%
    
    Compression/ZSTD_-_Go_-_level1-16      147MB/s ± 2%
    Compression/ZSTD_-_Go_-_Default-16     104MB/s ± 1%
    
    Decompression/ZSTD_-_Datadog-16        573MB/s ± 2%
    Decompression/ZSTD_-_Go-16             298MB/s ± 2%
    
    2. 4KB of text taken from https://gist.github.com/StevenClontz/4445774
    Compression Ratio ZSTD level 1 1.9294781382228492
    Compression Ratio ZSTD level 3 1.9322033898305084
    
    Compression Ratio Go ZSTD 1.894736842105263
    Compression Ratio Go ZSTD level 3 1.927665570690465
    
    name                                       time/op
    Compression/ZSTD_-_Datadog-level1-16    22.7µs ± 4%
    Compression/ZSTD_-_Datadog-level3-16    29.6µs ± 4%
    
    Compression/ZSTD_-_Go_-_level1-16       35.7µs ± 1%
    Compression/ZSTD_-_Go_-_Default-16      97.9µs ± 1%
    
    Decompression/ZSTD_-_Datadog-16         8.36µs ± 0%
    Decompression/ZSTD_-_Go-16              16.0µs ± 0%
    
    name                                       speed
    Compression/ZSTD_-_Datadog-level1-16   181MB/s ± 4%
    Compression/ZSTD_-_Datadog-level3-16   139MB/s ± 4%
    
    Compression/ZSTD_-_Go_-_level1-16      115MB/s ± 1%
    Compression/ZSTD_-_Go_-_Default-16    41.9MB/s ± 1%
    
    Decompression/ZSTD_-_Datadog-16        489MB/s ± 2%
    Decompression/ZSTD_-_Go-16             256MB/s ± 0%
    

    Here's the script I've used https://gist.github.com/jarifibrahim/91920e93d1ecac3006b269e0c05d6a24
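For context on how compression is selected in Badger v2 and later, a minimal sketch (the helper name and directory are illustrative; assumes the usual badger/v2 and badger/v2/options imports):

    func openWithZSTD(dir string) (*badger.DB, error) {
    	// CompressionType is picked per DB; the other values are options.None and options.Snappy.
    	opts := badger.DefaultOptions(dir).WithCompression(options.ZSTD)
    	return badger.Open(opts)
    }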



  • Support encryption at rest


Hi. Currently there is no authentication support. It would be a great feature to have. We are using Badger to develop a banking solution and data privacy is a requirement. Kindly let me know if you can incorporate this security feature.

    Regards, Asim.
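For the record, encryption at rest later shipped in Badger v2. A minimal sketch of enabling it (key handling is illustrative; assumes crypto/rand, log and badger imports):

    // The key must be 16, 24 or 32 bytes (AES-128/192/256) and must be the same
    // across restarts; load it from a secret store rather than generating it
    // fresh on every run as this sketch does.
    key := make([]byte, 32)
    if _, err := rand.Read(key); err != nil {
    	log.Fatal(err)
    }

    opts := badger.DefaultOptions("/tmp/badger-encrypted").
    	WithEncryptionKey(key).
    	WithIndexCacheSize(100 << 20) // recent releases require a non-zero index cache when encryption is on

    db, err := badger.Open(opts)
    if err != nil {
    	log.Fatal(err)
    }
    defer db.Close()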

  • Improve GC strategy to reclaim multiple logs


    Hello,

    let's take the following scenario:

    • open a database
    • insert 1M key/values in badgers, with distinct keys
    • delete all the key values
    • run PurgeOlderVersions()
    • run RunValueLogGC(0.5)
    • close the database

Then the db directory still has a large size. It looks like disk space was not reclaimed. Am I doing something wrong?

Moreover, when I iterate over the now-empty database, iteration time is still quite long, but no result is returned of course.

    Thanks, Stephane

  • Mobile support.


I currently use boltdb on mobile. In Bolt's readme there are some minor adjustments required for mobile.

The code is then compiled into an aar or framework file for each platform using gomobile.

It's stupid easy to use :)

Would the team be open to looking into mobile support?

  • BadgerDB open() call takes long time (> 2 min) to complete


    What version of Go are you using (go version)?

    $ go version
    go version go1.13.3 linux/amd64
    

    What version of Badger are you using?

    github.com/dgraph-io/badger v1.6.0

    Does this issue reproduce with the latest master?

    Yes

    What are the hardware specifications of the machine (RAM, OS, Disk)?

    RAM - 16GB OS - Ubuntu 16.04 Disk - SSD

    What did you do?

    We are using BadgerDB for deduplication. We store message ID as the key and the value as nil. We open the badgerdb during the initialization.

    gateway.badgerDB, err = badger.Open(badger.DefaultOptions(path))
    

    Code that writes to badger DB

    		err := badgerDB.Update(func(txn *badger.Txn) error {
    			for _, messageID := range messageIDs {
    				e := badger.NewEntry([]byte(messageID), nil).WithTTL(dedupWindow * time.Second)
    				if err := txn.SetEntry(e); err == badger.ErrTxnTooBig {
    					_ = txn.Commit()
    					txn = badgerDB.NewTransaction(true)
    					_ = txn.SetEntry(e)
    				}
    			}
    			return nil
    		})
    
    $ du -ch -d 1 ./badgerdb
    18G	./badgerdb
    18G	total
    
    $ ls -l ./badgerdb/ | grep sst | wc -l
    270
    

    Over 1 day, we have 270 SST files and 18 GB data.

    What did you expect to see?

    The badger.Open call completing in a few seconds.

    What did you see instead?

    The badger.Open takes around 2.5 minutes to open 270 files.
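To round out the dedup flow described above (not part of the original report), the read side would be a Get that treats badger.ErrKeyNotFound as "not a duplicate". A sketch with a hypothetical helper name:

    // seen reports whether messageID was already recorded (and has not expired via TTL).
    func seen(db *badger.DB, messageID string) (bool, error) {
    	var found bool
    	err := db.View(func(txn *badger.Txn) error {
    		_, err := txn.Get([]byte(messageID))
    		if err == badger.ErrKeyNotFound {
    			return nil // not seen before
    		}
    		if err != nil {
    			return err
    		}
    		found = true
    		return nil
    	})
    	return found, err
    }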

  • Infinite recursion in Item.yieldItemValue ?


    Hi,

    I face a difficult to debug problem with badger. It happens in the following situation:

    • ingest a lot of data (say 1M key-values)
    • delete that data
    • stop the program (properly closing the badger database)
    • relaunch the program

    Then it can happen that when the program reopens the badger database, go panics with a "runtime: goroutine stack exceeds 1000000000-byte limit".

    Further tries to start the program then always face a panic.

    The problem might be in my code of course, but I can't find anything strange. I disabled everything except opening the database and iterating over key values, and panic still happens.

    The traces show:

    goroutine 1 [running]:
    runtime.makeslice(0xef4340, 0x28, 0x28, 0xc425764000, 0x0, 0x7ff73adb46c8)
            /usr/local/go/src/runtime/slice.go:46 +0xf7 fp=0xc44cd70348 sp=0xc44cd70340 pc=0x4470f7
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*blockIterator).parseKV(0xc42d3aa990, 0xf00140000, 0xffffffff)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:114 +0x4bf fp=0xc44cd70430 sp=0xc44cd70348 pc=0xc749cf
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*blockIterator).Next(0xc42d3aa990)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:154 +0x191 fp=0xc44cd70480 sp=0xc44cd70430 pc=0xc74bd1
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*blockIterator).Init(0xc42d3aa990)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:54 +0x3d fp=0xc44cd70498 sp=0xc44cd70480 pc=0xc7414d
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*blockIterator).Seek(0xc42d3aa990, 0xc42d3a4cc0, 0x2b, 0x30, 0x0)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:84 +0x153 fp=0xc44cd704e8 sp=0xc44cd70498 pc=0xc74303
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*Iterator).seekHelper(0xc42d3a2600, 0x0, 0xc42d3a4cc0, 0x2b, 0x30)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:270 +0x11f fp=0xc44cd70550 sp=0xc44cd704e8 pc=0xc7551f
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*Iterator).seekFrom(0xc42d3a2600, 0xc42d3a4cc0, 0x2b, 0x30, 0x0)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:300 +0x12f fp=0xc44cd705b8 sp=0xc44cd70550 pc=0xc756bf
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*Iterator).seek(0xc42d3a2600, 0xc42d3a4cc0, 0x2b, 0x30)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:316 +0x55 fp=0xc44cd705f0 sp=0xc44cd705b8 pc=0xc75815
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table.(*Iterator).Seek(0xc42d3a2600, 0xc42d3a4cc0, 0x2b, 0x30)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/table/iterator.go:417 +0x82 fp=0xc44cd70620 sp=0xc44cd705f0 pc=0xc75f92
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger.(*levelHandler).get(0xc4203ae8a0, 0xc42d3a4cc0, 0x2b, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/level_handler.go:253 +0x265 fp=0xc44cd706f8 sp=0xc44cd70620 pc=0xc8acc5
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger.(*levelsController).get(0xc420393e30, 0xc42d3a4cc0, 0x2b, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/levels.go:727 +0xf6 fp=0xc44cd70820 sp=0xc44cd706f8 pc=0xc90e76
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger.(*DB).get(0xc42040c700, 0xc42d3a4cc0, 0x2b, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/db.go:507 +0x1fd fp=0xc44cd70940 sp=0xc44cd70820 pc=0xc818fd
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger.(*Item).yieldItemValue(0xc4204202c0, 0xc42d3a4c30, 0x2b, 0x30, 0x2, 0x0, 0xc42d392c23)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/iterator.go:169 +0x414 fp=0xc44cd70aa8 sp=0xc44cd70940 pc=0xc86f94
    github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger.(*Item).yieldItemValue(0xc4204202c0, 0xc42d3a4ba0, 0x2b, 0x30, 0x2, 0x0, 0xc42d392c03)
            /home/stef/skewer-gopath/src/github.com/stephane-martin/skewer/vendor/github.com/dgraph-io/badger/iterator.go:178 +0x4d2 fp=0xc44cd70c10 sp=0xc44cd70aa8 pc=0xc87052
    

    And so on afterward. The calls to yieldItemValue stack until explosion.

  • options: `WithBlockCacheSize` documentation not correct about `BlockCacheSize` default value


    What version of Badger is the target?

    [email protected] and master

    Documentation.

BlockCacheSize: this is set to 256<<20 (256 MiB) when using the default options, but the WithBlockCacheSize method's documentation says the default value is zero.

    Additional information.

    https://github.com/dgraph-io/badger/blob/main/options.go#L163 https://github.com/dgraph-io/badger/blob/main/options.go#L710
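A quick way to see the discrepancy is to print what DefaultOptions actually sets and compare it with the WithBlockCacheSize doc comment; a sketch (the printed value is as reported above):

    package main

    import (
    	"fmt"

    	badger "github.com/dgraph-io/badger/v3"
    )

    func main() {
    	opts := badger.DefaultOptions("/tmp/badger-opts")
    	// Per the report, this prints 268435456 (256 << 20), while the
    	// WithBlockCacheSize documentation claims the default is zero.
    	fmt.Println(opts.BlockCacheSize)
    }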

  • chore(iterator): `yieldItemValue` no error return


    Problem

I was trying to trigger an error from func (item *Item) ValueCopy(dst []byte) ([]byte, error) in a unit test, but it turns out errors cannot happen. This change just makes it clear in the internal code that no error can happen.

    Solution

    1. Change func (item *Item) yieldItemValue() ([]byte, func(), error) -> func (item *Item) yieldItemValue() ([]byte, func())
    2. Keep exported methods with an error returned to keep compatibility, although some such as ValueCopy cannot produce a non-nil error (as it was before)
  • [BUG]: badger v3 memory leak


    What version of Badger are you using?

    No response

    What version of Go are you using?

    go 1.19.4

    Have you tried reproducing the issue with the latest release?

    None

    What is the hardware spec (RAM, CPU, OS)?

    8GB 4Core X86 Linux

    What steps will reproduce the bug?

    it will OOM

    Expected behavior and actual result.

    No response

    Additional information

package main

import (
	"fmt"
	"time"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
	"github.com/google/uuid"
)

func main() {
	ch := make(chan bool)

	db, err := badger.Open(badger.DefaultOptions("/pv_data/ryou.zhang/temp/data").
		WithCompression(options.None).
		WithIndexCacheSize(256 << 20).
		WithBlockCacheSize(0))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// 100 writers, each inserting a 64 KiB value under a fresh UUID key every millisecond.
	for i := 0; i < 100; i++ {
		go func() {
			for {
				key := uuid.NewString()
				raw := make([]byte, 1024*64)

				txn := db.NewTransaction(true)
				txn.SetEntry(badger.NewEntry([]byte(key), raw))
				txn.Commit()

				// txn := db.NewTransactionAt(uint64(time.Now().UnixNano()), true)
				// txn.SetEntry(badger.NewEntry([]byte(key), raw))
				// txn.Commit()
				<-time.After(1 * time.Millisecond)
			}
		}()
	}

	// Report index cache usage once a second.
	go func() {
		for {
			fmt.Println("cost:", (db.IndexCacheMetrics().CostAdded()-db.IndexCacheMetrics().CostEvicted())/1024.0/1024.0,
				"MB item:", db.IndexCacheMetrics().KeysAdded()-db.IndexCacheMetrics().KeysEvicted())
			<-time.After(1000 * time.Millisecond)
		}
	}()

	<-ch
}

With code like the above, when using v3 it will OOM, but with v2 it's OK.

  • feat(Publisher): Add DB.SubscribeAsync API.


    Problem

In one of my personal projects, I have an API that uses DB.Subscribe to subscribe to changes to the DB and add these changes to an unbounded queue. An over-simplified version of it would be:

    func (x *X) Watch() {
         go func() {
            _ = x.db.Subscribe(
              context.Background(),
              func(kvs *pb.KVList) error {
                  x.queue.Add(kvs)
                  return nil
              },
              []pb.Match{{Prefix: []byte{"foobar"}}})
         }()
    }
    

The way I test it, in pseudo-Go, is:

    func TestWatch() {
        x := ...
    
        x.Watch()
    
        doChangesToDb(x.db)
    
        verifyQueue(x.queue)
    }
    

    The problem, as I hope you can see, is a race condition. There's no guarantee I have actually subscribed before I exit x.Watch(). By the time I call doChangesToDb(x.db), depending on the timing of the goroutine in x.Watch(), I might miss some or even all changes. Because DB.Subscribe is blocking, there's no way to know for certain that you have actually subscribed, in case you need to know. The only guaranteed way is to wait for the first cb call, but that's not always convenient or even possible. The next best workaround is to wait for the moment just before the DB.Subscribe call:

    func (x *X) Watch() {
         wg := sync.WaitGroup{}
         wg.Add(1)
         go func() {
            wg.Done()
            _ = x.db.Subscribe(
              context.Background(),
              func(kvs *pb.KVList) error {
                  x.queue.Add(kvs)
                  return nil
              },
              []pb.Match{{Prefix: []byte{"foobar"}}})
         }()
         wg.Wait()
    }
    

    This workaround can be seen used extensively on publisher_test.go. The problem with it is that, although very likely to work, it is not guaranteed. You see, Golang reserves the right to preempt any goroutine, even if they aren't blocked. The Go scheduler will mark any goroutine that takes more than 10ms as preemptible. If the time between the wg.Done() call and the db.pub.newSubscriber(c, matches) call (inside DB.Subscribe) is just long enough, the goroutine might be preempted and you will end up with the same problem as before. Who knows. Maybe GC kicked in at the wrong time. Although this is very unlikely to happen, I would sleep much better if it were actually impossible (I wish to depend on this behaviour not only for the tests, but for the actual correctness of my project).

    Solution

    I hope it became clear that the problem is caused by the API being blocking. The solution then, is to add a non-blocking version of the API. The proposed API receives only the []pb.Match query, and returns a <-chan *KVList channel and a UnsubscribeFunc function. The channel is to be used by consumers to read the changes, while the function is how you cancel the operation. I believe this API to be much more idiomatic Go, as it uses channels for communication, making it possible for the caller to select and for range on it. You can see how much simpler the calling code becomes in the new publisher_test.go, where I add a new version of each test using the new API, while keeping the old tests intact.

I have also rewritten the original DB.Subscribe to use the new DB.SubscribeAsync underneath, so as to reuse code and make sure both behaviours are the same.

    This is my first PR to badger. Please, be kind :). Also, thank you for the awesome project and for any time spent reviewing this PR. You folks rock!

  • [BUG]: Building a plugin that uses badger fails


    What version of Badger are you using?

    github.com/dgraph-io/badger/v3 v3.2103.4

    What version of Go are you using?

    GOVERSION="go1.19.1"

    Have you tried reproducing the issue with the latest release?

    Yes

    What is the hardware spec (RAM, CPU, OS)?

    Macbook Pro 2021 Intel Core i5 w/ 16GB RAM

    What steps will reproduce the bug?

    create a plugin that uses badger, try to compile that plugin with go build -buildmode=plugin

    Expected behavior and actual result.

    Compilation should succeed and a shared object should be produced.

    Instead the following error is reported:

    # github.com/cespare/xxhash
    asm: xxhash_amd64.s:120: when dynamic linking, R15 is clobbered by a global variable access and is used here: 00092 (/Users/jasonfowler/go/pkg/mod/github.com/cespare/[email protected]/xxhash_amd64.s:120)       ADDQ    R15, AX
    asm: assembly failed
    

    Additional information

    No response

  • Revisit configurable logging


I was reading the comments and was quite disappointed in the solution from a few years back regarding logging. Just because I don't want to see info messages does not mean I don't want to see warnings and errors. The problem is that the Go logging that is built in by default is so weak that it is not really ready for true commercial work. Being able to set the logger to nil throws the baby out with the bath water. Has there been any attempt to revisit this solution, to use logrus or zap? Especially with a database, logging is too important to be this badly implemented.
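For what it's worth, the existing badger.Logger interface already allows this kind of filtering without dropping warnings and errors; a minimal sketch (type and prefix names are illustrative):

    package main

    import (
    	"log"
    	"os"

    	badger "github.com/dgraph-io/badger/v3"
    )

    // quietLogger implements badger.Logger: warnings and errors are forwarded to
    // the standard library logger, while info and debug output is dropped.
    type quietLogger struct{ l *log.Logger }

    func (q quietLogger) Errorf(f string, v ...interface{})   { q.l.Printf("ERROR: "+f, v...) }
    func (q quietLogger) Warningf(f string, v ...interface{}) { q.l.Printf("WARN: "+f, v...) }
    func (q quietLogger) Infof(string, ...interface{})        {}
    func (q quietLogger) Debugf(string, ...interface{})       {}

    func main() {
    	opts := badger.DefaultOptions("/tmp/badger-quiet").
    		WithLogger(quietLogger{l: log.New(os.Stderr, "badger ", log.LstdFlags)})
    	db, err := badger.Open(opts)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()
    }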
