A feature complete and high performance multi-group Raft library in Go.

Dragonboat - A Multi-Group Raft library in Go / 中文版 (Chinese version)


News

  • 2021-01-20 Dragonboat v3.3 has been released, please check CHANGELOG for all changes.
  • 2020-03-05 Dragonboat v3.2 has been released, please check CHANGELOG for details.

About

Dragonboat is a high performance multi-group Raft consensus library in pure Go.

Consensus algorithms such as Raft provide fault tolerance by allowing a system to continue to operate as long as a majority of its member servers are available. For example, a Raft cluster of 5 servers can make progress even if 2 servers fail. To clients, the cluster appears as a single entity that always provides strong data consistency. All Raft replicas can be used to handle read requests for aggregated read throughput.
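The quorum arithmetic above is simple enough to state in code. This is a minimal sketch; the function names are invented for illustration and are not part of Dragonboat's API:

```go
package main

import "fmt"

// quorum returns the minimum number of servers that must agree before
// an entry is considered committed in a Raft cluster of n servers.
func quorum(n int) int { return n/2 + 1 }

// faultsTolerated returns how many server failures a cluster of n
// servers can survive while still making progress.
func faultsTolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d servers: quorum=%d, tolerates %d failures\n",
			n, quorum(n), faultsTolerated(n))
	}
}
```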

Dragonboat handles all the technical difficulties associated with Raft so that users can focus on their application domains. It is also easy to use; our step-by-step examples can help new users master it in half an hour.

Features

  • Easy-to-use pure-Go APIs for building Raft-based applications
  • Feature complete and scalable multi-group Raft implementation
  • Disk-based and memory-based state machine support
  • Fully pipelined I/O and mutual TLS authentication support, ready for high-latency open environments
  • Custom Raft log storage and transport support, easy to integrate with the latest I/O technologies
  • Prometheus-based health metrics support
  • Built-in tool to repair Raft clusters that have permanently lost quorum
  • Extensively tested, including with Jepsen's Knossos linearizability checker; some results are here

Most features covered in Diego Ongaro's Raft thesis are supported -

  • leader election, log replication, snapshotting and log compaction
  • membership change
  • ReadIndex protocol for read-only queries
  • leadership transfer
  • non-voting member
  • witness member
  • idempotent update transparent to applications
  • batching and pipelining
  • disk based state machine
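Among the features above, the ReadIndex protocol for read-only queries deserves a sketch, since it is what keeps linearizable reads cheap. The following simplified, single-process illustration uses types and names invented for this example (they are not Dragonboat internals): the leader records its commit index, confirms it is still the leader via a quorum heartbeat, waits for the state machine to apply up to that index, then serves the read.

```go
package main

import (
	"errors"
	"fmt"
)

// leader models the pieces of leader state that matter for ReadIndex.
type leader struct {
	commitIndex  uint64
	appliedIndex uint64
	// confirmLeadership stands in for a heartbeat round that collects
	// acks from a quorum of followers.
	confirmLeadership func() bool
}

// readIndex implements the read-only query protocol: no log entry is
// appended and no fsync is needed, which is why reads cost less than writes.
func (l *leader) readIndex() (uint64, error) {
	idx := l.commitIndex        // 1. record the current commit index
	if !l.confirmLeadership() { // 2. quorum heartbeat: still the leader?
		return 0, errors.New("leadership not confirmed")
	}
	for l.appliedIndex < idx { // 3. wait for the state machine to catch up
		l.appliedIndex++ // stand-in for applying pending entries
	}
	return idx, nil // 4. now safe to serve the linearizable read
}

func main() {
	l := &leader{commitIndex: 42, appliedIndex: 40,
		confirmLeadership: func() bool { return true }}
	idx, err := l.readIndex()
	fmt.Println(idx, err)
}
```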

Performance

Dragonboat is the fastest open source multi-group Raft implementation on GitHub.

For a 3-node system using mid-range hardware (details here) and an in-memory state machine, with RocksDB as the storage engine, Dragonboat can sustain 9 million writes per second when each payload is 16 bytes, or 11 million mixed I/O operations per second at a 9:1 read:write ratio. High throughput is maintained in geographically distributed environments: when the RTT between nodes is 30ms, 2 million I/O operations per second can still be achieved using a much larger number of clients.

The number of concurrently active Raft groups affects overall throughput, as requests become harder to batch. On the other hand, having thousands of idle Raft groups has a much smaller impact on throughput.
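The effect described above can be pictured with a toy per-group batcher (illustrative only, not Dragonboat's actual batching code): proposals for the same Raft group merge into a single batch, so spreading the same request volume across many active groups yields more, smaller batches.

```go
package main

import "fmt"

type proposal struct {
	group uint64 // Raft group the command is proposed to
	cmd   []byte
}

// batchByGroup merges pending proposals into one batch per Raft group,
// preserving arrival order within each group.
func batchByGroup(pending []proposal) map[uint64][][]byte {
	batches := make(map[uint64][][]byte)
	for _, p := range pending {
		batches[p.group] = append(batches[p.group], p.cmd)
	}
	return batches
}

func main() {
	pending := []proposal{
		{1, []byte("a")}, {1, []byte("b")}, {2, []byte("c")}, {1, []byte("d")},
	}
	b := batchByGroup(pending)
	// Group 1 gets one batch of 3; with more active groups the same
	// load would split into many smaller batches.
	fmt.Println(len(b[1]), len(b[2]))
}
```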

The table below shows write latencies in milliseconds. Dragonboat has <5ms P99 write latency when handling 8 million writes per second at 16 bytes each. Read latency is lower than write latency, as the ReadIndex protocol employed for linearizable reads doesn't require fsync-ed disk I/O.

Ops   Payload Size   99.9% percentile   99% percentile   AVG
1m    16             2.24               1.19             0.79
1m    128            11.11              1.37             0.92
1m    1024           71.61              25.91            3.75
5m    16             4.64               1.95             1.16
5m    128            36.61              6.55             1.96
8m    16             12.01              4.65             2.13

When tested on a single Raft group, Dragonboat can sustain 1.25 million writes per second when each payload is 16 bytes, with an average latency of 1.3ms and a P99 latency of 2.6ms. This is achieved using an average of 3 cores (2.8GHz) on each server.

As visualized below, Stop-the-World pauses caused by Go 1.11's GC are sub-millisecond on highly loaded systems. These very short Stop-the-World pause times are further reduced in Go 1.12. Golang's runtime.ReadMemStats reports that less than 1% of the available CPU time is used by GC on highly loaded systems.

Requirements

  • x86_64/Linux, x86_64/macOS or ARM64/Linux, Go 1.15 or 1.14

Getting Started

Master is our unstable branch for development. Please use the latest released versions for any production purposes. For Dragonboat v3.3.x, please follow the instructions in v3.3.x's README.md.

Go 1.14 or above with Go module support is required.

To use Dragonboat, make sure to import the package github.com/lni/dragonboat/v3. Also add "github.com/lni/dragonboat/v3 v3.3.0" to the require section of your project's go.mod file.
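For example, a minimal go.mod for a project using Dragonboat might look like the following (the module name example.com/myapp is a placeholder):

```
module example.com/myapp

go 1.14

require github.com/lni/dragonboat/v3 v3.3.0
```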

By default, Pebble is used for storing Raft logs in Dragonboat. RocksDB and other storage engines are also supported; more info here.

You can also follow our examples on how to use Dragonboat.

Documents

FAQ, docs, step-by-step examples, DevOps doc, CHANGELOG and online chat are available.

Examples

Dragonboat examples are here.

Status

Dragonboat is production ready.

Contributing

For reporting bugs, please open an issue. For contributing improvements or new features, please send a pull request.

License

Dragonboat is licensed under the Apache License Version 2.0. See LICENSE for details.

Third-party code used in Dragonboat and the corresponding licenses are summarized here.

Comments
  • Reducing Memory


    Dragonboat version

    3.2

    Expected behavior

    Memory use significantly less than 1G

    Actual behavior

    Memory hovers around 1G

    Steps to reproduce the behavior

    Hi mate,

    One of my early adopters of roo using a small online machine ran out of memory while installing, and I realized that the resident memory of my setup is also around 1G. I noticed here https://github.com/lni/dragonboat/issues/103#issuecomment-543129283 that @caiwk mentions a few ways to manage the memory, but I was hoping you might share what you'd recommend as a best practice setup for a small db.

    Personally my app will probably never need more than 10,000 key values where the values don't get above 4kb using the rocks db example. My app is using the ondisk example.

    This would really help as I also use hundreds of AWS nano EC2 instances. I found them more cost-performant than bigger instances, so moving to my own infrastructure I'd imagine things will be the same.

    Thanks! Andrew

  • IRaftEventListener.LeaderUpdated Called Frequently


    My cluster seems bootstrapped into three healthy nodes, but I get very frequent updates on this channel.

    It seems that the leader does not get notifications when it is still the leader, but followers get frequent notifications that they are not the leader. Once per heartbeat?

    Normal?

  • Badger support as a pure-go alternative to RocksDB


    After discussions in #4, I've added some experimental code to make Badger work with Dragonboat. The code is available in the following branch -

    https://github.com/lni/dragonboat/tree/badger-logdb-experimental

    As mentioned above, this is experimental code used to see how Badger works. It should never be used in production.

    I'd like to point out that there are outstanding issues preventing Badger support from being added -

    • it leaks goroutines (https://github.com/dgraph-io/badger/issues/685)
    • sstables are not checksummed (https://github.com/dgraph-io/badger/issues/680)

    There are probably more; I will update this issue as they are identified.

  • fsync issues


    Tim Fox @purplefox mentioned the following concerns in the Gitter chatroom. As I think this might be of interest to other potential users as well, I've moved the discussion here so the information can later be searched & read by everyone, including those who don't use the Gitter chatroom.

    Tim Fox @purplefox mentioned that -

    Hi @lni thanks for getting back to me. If Raft really requires committed log entries to be 100% durably stored then I can't see how that would work in the real world.

    First of all, please note that the fsync usage issue you mentioned is not unique to Dragonboat or Raft; I am 100% sure that all consensus libraries do the same. As explained earlier on Gitter, rather than betting on probabilities, consensus libraries like Dragonboat provide stricter guarantees. This helps developers better reason about their systems and data: for a 3-replica Raft group, when two replicas are up and running, developers know that all their data is available. If durability is not guaranteed, they can only say that their data is available with very high probability.

    The real question here is not whether Dragonboat demands too much from durable storage; it is whether you actually need strongly consistent data like that provided by Raft & Paxos.

    In the real world disks fail, battery backed caches fail. Also, when running on Amazon EBS (most probably the most common place to run distributed systems) there's an annual failure rate of 0.1 to 0.2% per year https://docs.aws.amazon.com/whitepapers/latest/aws-storage-services-overview/durability-and-availability-3.html that means 1 in 500 of your disk volumes will fail per year. For a company with a lot of volumes (like us) that's a lot of failures every year. Any distributed systems that we build on top of that must be able to recover from failures.

    Indeed. In Dragonboat, we consider failure a fact of life, not an exceptional event. That is why we need to make sure that no data is lost when the majority quorum is available after such failures. Requiring fsync() to do what POSIX requires it to do is one of the basic requirements for realizing that goal.

    As I mentioned before, when using something like Amazon EBS, how do you think it gets fsync speeds of a few ms? EBS is a SAN, so it basically replicates writes to multiple replicas and writes into page cache on each replica. When you call fsync it does not actually fsync that data to the disks on the replicas. It's sufficient for EBS to just replicate in memory and fsync asynchronously in order to comply with the advertised failure rate of 0.1 to 0.2% per year of dataloss. It's all about mathematical probabilities! (These can be worked out quite easily if you know mean time between failures of the replicas). This is how cloud storage works! Performance would be terrible if they actually fsynced every time you called fsync.

    Modern SSDs can deliver fsync latencies of a few dozen microseconds. With that in mind, EBS's few-millisecond fsync delays seem pretty reasonable to me, as in theory they can always attach some SSDs to regular block storage backed by HDDs just to handle fsync-ed data.

    Please also note that Dragonboat does not require you to use EBS or any cloud service. Being able to durably store data is what it requires. When using Amazon EC2 nodes, you can certainly choose to store such durable data on instance store volumes or something similar.

    So, even if dragonboat fsyncs on each raft batch then there's no guarantee they are stored durably, and it's impossible for dragonboat to provide a lower failure rate than the one guaranteed by the underlying storage system (e.g. EBS).

    Please check the posix manpage on fsync(), it states that -

    fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted.

    https://man7.org/linux/man-pages/man2/fsync.2.html

    If the storage system refuses to be compatible with such a well-known & well-defined standard, then all bets are off and you are on your own.

    In order for dataloss to occur in a replicated system all replicas need to fail at the same time, and then all unsynced data in those replicas will be lost. This is a very unlikely event and the probability can be calculated from the statistics you gather on cloud about machine failures and also if you know the mean time between asynchronous syncs. In many cases the probability of this occurring leads to a failure rate of less than the EBS advertised failure rate! So, and here is the important point - there is absolutely no point in fsyncing any more as the chance of underlying storage system failure (EBS) is already higher than the chance of dataloss at our level. Again, it's all about probabilities.

    It is never about how low the probability is. For the guarantees provided by Raft & Paxos, losing data should never happen when a majority of nodes are available after such repeated reboots. If your system doesn't need such a strong guarantee, please feel free to choose a weaker protocol.

    What's the worst that can happen if the raft log is not fsynced for every committed record? We assume the underlying log store (e.g. Pebble or RocksDB) does not become corrupt in the case of a sudden machine failure, and this is a good assumption for both Pebble or RocksDB - when they restart after power failure, records may be missing from the end of the log but the log will not be corrupt.

    It will lead to total violation of linearizability, which is the single most important feature provided by Raft & Paxos.

    If a Raft replica fails after power loss and has lost some records from the end of its committed raft log (and please note this can also happen with EBS anyway, even if you fsync, and yes, your battery backed cache might fail and this could happen for that reason too!), then on recovery it attempts to join the group again, and it should sync with the leader and receive all missing records? Apologies, I am not a Raft expert but it's my understanding that's what happens.

    The problem is what happens if the leader also loses some entries for the same reason. In a worse situation, for a 3-replica Raft group, if one node loses some entries and two other nodes lose more entries, those two nodes can rewrite committed logs and cause diverged states on your replicas.

    Yes there is dataloss, but so what? If the probability of this event happening is less than the probability of underlying storage system failure (e.g. chance of EBS volume loss) it doesn't matter!

    Again, if EBS does cause fsynced data to be lost, you shouldn't be using it.

  • raft: introduce DisableProposalForwarding option


    DisableProposalForwarding set to true means that followers will drop proposals rather than forwarding them to the leader. Proposals from followers or observers will be dropped.

    One use case for this feature would be in a situation where the Raft leader is used to compute the data of a proposal, for example, adding a timestamp from a hybrid logical clock to data in a monotonically increasing way. Forwarding should be disabled to prevent a follower with an inaccurate hybrid logical clock from assigning the timestamp and then forwarding the data to the leader.
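    The intended behaviour can be modelled in a few lines of Go (an illustrative model, not the library's implementation): a leader appends the proposal, a follower forwards it, and a follower with forwarding disabled drops it.

```go
package main

import (
	"errors"
	"fmt"
)

type node struct {
	isLeader                  bool
	disableProposalForwarding bool
}

var errProposalDropped = errors.New("proposal dropped")

// propose models how a proposal is handled: leaders accept it, while
// non-leaders forward it unless forwarding has been disabled.
func (n *node) propose(cmd []byte) (string, error) {
	switch {
	case n.isLeader:
		return "appended to leader log", nil
	case n.disableProposalForwarding:
		// e.g. when the leader must stamp proposals with its own HLC
		// timestamp, a follower must not originate the data.
		return "", errProposalDropped
	default:
		return "forwarded to leader", nil
	}
}

func main() {
	follower := &node{disableProposalForwarding: true}
	_, err := follower.propose([]byte("cmd"))
	fmt.Println(err)
}
```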

  • module declares its path as: github.com/cockroachdb/pebble but was required as: github.com/petermattis/pebble


    Note: for reported bugs, please fill in the following details. Bug reports without detailed steps on how to reproduce will be automatically closed.

    Dragonboat version

    Expected behavior

    Successful retrieval of all dependencies

    Actual behavior

    That's uncertain

    Steps to reproduce the behavior

    $ go test ./...
    go: finding module for package github.com/petermattis/pebble
    go: found github.com/petermattis/pebble in github.com/petermattis/pebble v0.0.0-20200710160639-c9a380a7f499
    go: github.com/lni/dragonboat/v3/internal/logdb/kv/pebble imports
        github.com/petermattis/pebble: github.com/petermattis/[email protected]: parsing go.mod:
        module declares its path as: github.com/cockroachdb/pebble
                but was required as: github.com/petermattis/pebble

    My environment:

    $ go version && go env
    go version go1.15.2 gollvm LLVM 12.0.0git linux/amd64
    GO111MODULE=""
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/home/oceanfish81/.cache/go-build"
    GOENV="/home/oceanfish81/.config/go/env"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOINSECURE=""
    GOMODCACHE="/home/oceanfish81/go/pkg/mod"
    GONOPROXY=""
    GONOSUMDB=""
    GOOS="linux"
    GOPATH="/home/oceanfish81/go"
    GOPRIVATE=""
    GOPROXY="https://proxy.golang.org,direct"
    GOROOT="/home/oceanfish81/gollvm_dist"
    GOSUMDB="sum.golang.org"
    GOTMPDIR=""
    GOTOOLDIR="/home/oceanfish81/gollvm_dist/tools"
    GCCGO="/home/oceanfish81/gollvm_dist/bin/llvm-goc"
    AR="ar"
    CC="/usr/bin/clang"
    CXX="/usr/bin/clang++"
    CGO_ENABLED="1"
    GOMOD="/home/oceanfish81/golang_projects/dragonboat/go.mod"
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build698517013=/tmp/go-build -gno-record-gcc-switches -funwind-tables"

    I am on Ubuntu 20.04 (x86_64).

  • gorocksdb core dump on Raspberry Pi 4's BCM2711 processor


    Dragonboat version

    3.2.2

    Expected behavior

    gorocksdb writes successfully

    Actual behavior

    goroutine 0 [idle]:
    runtime: unknown pc 0xffb2ac
    stack: frame={sp:0x7f9582bbb0, fp:0x0} stack=[0x7f9502c980,0x7f9582c580)
    0000007f9582bab0: 0000007f9582bad0 0000007f9582bad0
    0000007f9582bac0: 0000000000000000 0000000000000020
    0000007f9582bad0: 0000000000000000 0000000000000000
    0000007f9582bae0: 0000000000000000 0000000000000000
    0000007f9582baf0: 0000000000000001 0000007f8c002797
    0000007f9582bb00: 0000000000000005 0000007f9582bb38
    0000007f9582bb10: 0000000000556394 <runtime.netpoll+444> 0000007f94ffdbc8
    0000007f9582bb20: 0000000100000077 0000000000000000
    0000007f9582bb30: 000000400016ea80 0000007f9582c3a8
    0000007f9582bb40: 00000000005612e4 <runtime.findrunnable+1804> 0000007f9582bb90
    0000007f9582bb50: 0000007f94ffdbc8 00000001000000e9
    0000007f9582bb60: 0000000000000001 0000007f9582bb00
    0000007f9582bb70: 0000007f00000001 0000000000000001
    0000007f9582bb80: 0000000000000000 0000000000000000
    0000007f9582bb90: 000000400016ea80 0000000000000000
    0000007f9582bba0: 0000000000000000 0000000000000005
    0000007f9582bbb0: <0000007f9582bc40 0000000000e7647c
    0000007f9582bbc0: 0000007f9582be10 0000000000003c37
    0000007f9582bbd0: 0000000000003c37 0000007f7436e740
    0000007f9582bbe0: 0000007f8c006e90 0000000000000001
    0000007f9582bbf0: 0000000000000007 0000000000000000
    0000007f9582bc00: 0000000000000000 00013c3700000000
    0000007f9582bc10: 0000000000000000 0000000000000000
    0000007f9582bc20: 0000000000000000 0000000000000000
    0000007f9582bc30: 0000000000000000 0000000000000000
    0000007f9582bc40: 0000007f9582bd00 0000000000e36308
    0000007f9582bc50: 0000007f74199e30 0000007f9582be10
    0000007f9582bc60: 0000000000000000 00000000159117c0
    0000007f9582bc70: 0000007f7419aad0 0000007f7436e740
    0000007f9582bc80: 0000000000000001 0000000000000000
    0000007f9582bc90: 0000000000000016 0000007f9582be10
    0000007f9582bca0: 000000000148ed98 0000007f9582bce8

    goroutine 4 [syscall]:
    runtime.cgocall(0xdb4da0, 0x4045a54918, 0x40002083f8)
        /usr/local/go/src/runtime/cgocall.go:133 +0x50 fp=0x4045a548e0 sp=0x4045a548a0 pc=0x52b7f8
    github.com/lni/dragonboat/v3/internal/logdb/kv/rocksdb/gorocksdb._Cfunc_rocksdb_write(0x7f7436e480, 0x7f74229810, 0x159117c0, 0x40002083f8)
        _cgo_gotypes.go:4066 +0x40 fp=0x4045a54910 sp=0x4045a548e0 pc=0xa2f088
    github.com/lni/dragonboat/v3/internal/logdb/kv/rocksdb/gorocksdb.(*DB).Write.func1(0x400031cb60, 0x40002082b0, 0x40000bc348, 0x40002083f8)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/logdb/kv/rocksdb/gorocksdb/db.go:381 +0xf4 fp=0x4045a54960 sp=0x4045a54910 pc=0xa3b4ac
    github.com/lni/dragonboat/v3/internal/logdb/kv/rocksdb/gorocksdb.(*DB).Write(0x400031cb60, 0x40002082b0, 0x40000bc348, 0x0, 0x0)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/logdb/kv/rocksdb/gorocksdb/db.go:381 +0x60 fp=0x4045a549c0 sp=0x4045a54960 pc=0xa32328
    github.com/lni/dragonboat/v3/internal/logdb/kv/rocksdb.(*KV).CommitWriteBatch(0x4000184bc0, 0x1441a40, 0x40000bc348, 0x100, 0x1441a40)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/logdb/kv/rocksdb/kv_rocksdb.go:323 +0x54 fp=0x4045a54a00 sp=0x4045a549c0 pc=0xa4458c
    github.com/lni/dragonboat/v3/internal/logdb.(*rdb).saveRaftState(0x4000200f00, 0x400025a000, 0x1, 0x100, 0x14478e0, 0x4000212100, 0x4045a55068, 0xa6b9e8)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/logdb/rdb.go:206 +0x1ec fp=0x4045a55020 sp=0x4045a54a00 pc=0xa4bef4
    github.com/lni/dragonboat/v3/internal/logdb.(*ShardedRDB).SaveRaftState(0x40002d6190, 0x400025a000, 0x1, 0x100, 0x14478e0, 0x4000212100, 0x1, 0x40530ac000)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/logdb/sharded_rdb.go:159 +0x90 fp=0x4045a55070 sp=0x4045a55020 pc=0xa500d8
    github.com/lni/dragonboat/v3.(*execEngine).execNodes(0x4000116090, 0x8, 0x400054a240, 0x4045a3b050, 0x400018b860)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/execengine.go:899 +0x53c fp=0x4045a55d90 sp=0x4045a55070 pc=0xa6ba84
    github.com/lni/dragonboat/v3.(*execEngine).nodeWorkerMain(0x4000116090, 0x8)
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/execengine.go:804 +0x1d0 fp=0x4045a55f20 sp=0x4045a55d90 pc=0xa6b188
    github.com/lni/dragonboat/v3.newExecEngine.func1()
        /home/pi/go/pkg/mod/github.com/lni/dragonboat/[email protected]/execengine.go:663 +0x30 fp=0x4045a55f40 sp=0x4045a55f20 pc=0xa8c7d8
    github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1(0x40002eae80, 0x4000164170, 0x0, 0x0, 0x400020a020)
        /home/pi/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:104 +0x50 fp=0x4045a55fb0 sp=0x4045a55f40 pc=0xa26868
    runtime.goexit()
        /usr/local/go/src/runtime/asm_arm64.s:1148 +0x4 fp=0x4045a55fb0 sp=0x4045a55fb0 pc=0x5896cc
    created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
        /home/pi/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:94 +0x88

    Steps to reproduce the behavior

    Run on Raspbian 64-bit

  • WAL Log Not Compacting After Snapshot


    Dragonboat version

    3.2.8

    Expected behavior

    Expecting WAL with PebbleDB to compact after Snapshotting.

    Actual behavior

    It does not appear that the WAL ever compacts. I ran a large enough load test on a 3-node / 3-process cluster running on my development Mac that it eventually filled the disk completely. It wasn't even possible to delete files. I had to reboot to free up some disk space so that I could delete the PebbleDB WAL logs.

    I added logging to my snapshotting code...

    WARN[0084] Dec 3 13:05:07.481502000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0084] Dec 3 13:05:07.481523000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0084] Dec 3 13:05:07.484007000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 885869770
    WARN[0084] Dec 3 13:05:07.484024000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0084] Dec 3 13:05:07.484075000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 885869770
    WARN[0084] Dec 3 13:05:07.484098000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0084] Dec 3 13:05:07.488356000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.318785ms
    WARN[0084] Dec 3 13:05:07.488410000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.313136ms
    WARN[0085] Dec 3 13:05:07.521470000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0085] Dec 3 13:05:07.523337000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 885869769
    WARN[0085] Dec 3 13:05:07.523350000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0085] Dec 3 13:05:07.526876000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 3.518845ms
    WARN[0096] Dec 3 13:05:18.923764000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0096] Dec 3 13:05:18.925574000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1181331156
    WARN[0096] Dec 3 13:05:18.925590000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0096] Dec 3 13:05:18.930094000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.494046ms
    WARN[0096] Dec 3 13:05:18.937993000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0096] Dec 3 13:05:18.939505000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1181325833
    WARN[0096] Dec 3 13:05:18.939519000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0096] Dec 3 13:05:18.944310000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.784278ms
    WARN[0096] Dec 3 13:05:19.053852000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0096] Dec 3 13:05:19.064988000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1181340644
    WARN[0096] Dec 3 13:05:19.065015000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0096] Dec 3 13:05:19.069520000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.502988ms
    WARN[0108] Dec 3 13:05:30.851621000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0108] Dec 3 13:05:30.853430000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1181517843
    WARN[0108] Dec 3 13:05:30.853445000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0108] Dec 3 13:05:30.858827000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 5.371486ms
    WARN[0108] Dec 3 13:05:30.882149000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0108] Dec 3 13:05:30.884022000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1181523364
    WARN[0108] Dec 3 13:05:30.884039000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0108] Dec 3 13:05:30.888981000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 4.939675ms
    WARN[0108] Dec 3 13:05:30.960826000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0108] Dec 3 13:05:30.962559000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1181526526
    WARN[0108] Dec 3 13:05:30.962573000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0108] Dec 3 13:05:30.967921000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 5.337168ms
    WARN[0119] Dec 3 13:05:42.293330000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0119] Dec 3 13:05:42.295541000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1181766366
    WARN[0119] Dec 3 13:05:42.295559000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0119] Dec 3 13:05:42.302124000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 6.549879ms
    WARN[0120] Dec 3 13:05:42.644760000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0120] Dec 3 13:05:42.646794000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1181772456
    WARN[0120] Dec 3 13:05:42.646811000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0120] Dec 3 13:05:42.653431000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 6.609562ms
    WARN[0120] Dec 3 13:05:43.004613000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0120] Dec 3 13:05:43.006518000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1181770872
    WARN[0120] Dec 3 13:05:43.006536000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0120] Dec 3 13:05:43.012994000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 6.4517ms
    WARN[0131] Dec 3 13:05:53.794408000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0131] Dec 3 13:05:53.805212000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1181893802
    WARN[0131] Dec 3 13:05:53.805230000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0131] Dec 3 13:05:53.812846000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 7.598356ms
    WARN[0131] Dec 3 13:05:54.108714000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0131] Dec 3 13:05:54.110414000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1181891839
    WARN[0131] Dec 3 13:05:54.110427000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0131] Dec 3 13:05:54.118722000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 8.280904ms
    WARN[0131] Dec 3 13:05:54.219610000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0131] Dec 3 13:05:54.221433000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1181907709
    WARN[0131] Dec 3 13:05:54.221447000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0131] Dec 3 13:05:54.229209000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 7.749574ms
    WARN[0143] Dec 3 13:06:05.915555000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0143] Dec 3 13:06:05.917438000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1182021337
    WARN[0143] Dec 3 13:06:05.917452000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0143] Dec 3 13:06:05.926135000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 8.670892ms
    WARN[0144] Dec 3 13:06:06.855846000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0144] Dec 3 13:06:06.857667000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1182018517
    WARN[0144] Dec 3 13:06:06.857690000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0144] Dec 3 13:06:06.866358000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 8.66585ms
    WARN[0144] Dec 3 13:06:06.884854000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0144] Dec 3 13:06:06.904235000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1182014067
    WARN[0144] Dec 3 13:06:06.904258000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0144] Dec 3 13:06:06.913431000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 9.159226ms
    WARN[0154] Dec 3 13:06:17.334577000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0154] Dec 3 13:06:17.336469000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1182152603
    WARN[0154] Dec 3 13:06:17.336485000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0154] Dec 3 13:06:17.346031000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 9.534638ms
    WARN[0155] Dec 3 13:06:18.400891000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0155] Dec 3 13:06:18.401140000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0155] Dec 3 13:06:18.403155000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1182153716
    WARN[0155] Dec 3 13:06:18.403173000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0155] Dec 3 13:06:18.412343000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 9.158987ms
    WARN[0155] Dec 3 13:06:18.422037000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1182149539
    WARN[0155] Dec 3 13:06:18.422055000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0155] Dec 3 13:06:18.431830000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 9.758411ms
    WARN[0166] Dec 3 13:06:29.277614000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0166] Dec 3 13:06:29.279512000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-2/raft/dragonboat/db is 1182274677
    WARN[0166] Dec 3 13:06:29.279527000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0166] Dec 3 13:06:29.289601000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 10.062588ms
    WARN[0167] Dec 3 13:06:30.147168000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0167] Dec 3 13:06:30.147482000 m/jkassis/[email protected]/logger.go: 192 Snapshot: Persist
    WARN[0167] Dec 3 13:06:30.149168000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-1/raft/dragonboat/db is 1182291227
    WARN[0167] Dec 3 13:06:30.149187000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
    WARN[0167] Dec 3 13:06:30.149512000 om/jkassis/[email protected]/entry.go: 313 Snapshot: Persist: Dirsize for /var/tickie/cluster/local-server-0/raft/dragonboat/db is 1182291045
    WARN[0167] Dec 3 13:06:30.149526000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: starting
WARN[0167] Dec 3 13:06:30.159857000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 10.662336ms WARN[0167] Dec 3 13:06:30.159889000 om/jkassis/[email protected]/entry.go: 313 DBBadgerSnapshot.Backup: took 10.358491ms

    Steps to reproduce the behavior

  • Why not choosing a naive WAL implementation for LogDB?

    Why not choosing a naive WAL implementation for LogDB?

    From past issues I notice that RocksDB was selected as the Raft log store, and other engines are temporarily not used due to durability issues. However, as a multi-Raft library, why not choose a plain log implementation instead of a KV engine? The non-negligible overhead of the transaction mechanism, as well as merge operations within RocksDB that often trigger write stalls, could be avoided with a plain WAL implementation. Most existing multi-Raft database solutions also use RocksDB as the log store, except for YugabyteDB, which claims overwhelming performance compared to competing solutions, although it cannot be concluded that the difference in log store alone accounts for the performance gap.

  • Stuck installing RocksDB when running `make install-rocksdb-ull`

    Stuck installing RocksDB when running `make install-rocksdb-ull`

    Stuck installing RocksDB when running `make install-rocksdb-ull`; can anyone help?

    gzip: stdin: unexpected end of file
    rocksdb-5.13.4/monitoring/instrumented_mutex.cc
    rocksdb-5.13.4/monitoring/instrumented_mutex.h
    rocksdb-5.13.4/monitoring/iostats_context.cc
    rocksdb-5.13.4/monitoring/iostats_context_imp.h
    rocksdb-5.13.4/monitoring/iostats_context_test.cc
    rocksdb-5.13.4/monitoring/perf_context.cc
    rocksdb-5.13.4/monitoring/perf_context_imp.h
    rocksdb-5.13.4/monitoring/perf_level.cc
    rocksdb-5.13.4/monitoring/perf_level_imp.h
    rocksdb-5.13.4/monitoring/perf_step_timer.h
    rocksdb-5.13.4/monitoring/statistics.cc
    rocksdb-5.13.4/monitoring/statistics.h
    rocksdb-5.13.4/monitoring/statistics_test.cc
    rocksdb-5.13.4/monitoring/thread_status_impl.cc
    rocksdb-5.13.4/monitoring/thread_status_updater.cc
    tar: Unexpected EOF in archive
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now
    make: *** [get-rocksdb] Error 2
    

    Originally posted by @funlake in https://github.com/lni/dragonboat/issues/4#issuecomment-453404053

  • mem never come down

    mem never come down

    Every entry is a 4MB []byte, using an on-disk state machine. While my app is working, memory grows until 100%, and it never comes down even after the app stops working; I have to kill the app to release the memory. Here is the memory pprof:

    (pprof) top
    Showing nodes accounting for 10062.18MB, 99.01% of 10163.01MB total
    Dropped 49 nodes (cum <= 50.82MB)
    Showing top 10 nodes out of 29
          flat  flat%   sum%        cum   cum%
     8297.58MB 81.64% 81.64%  8297.58MB 81.64%  github.com/lni/dragonboat/v3.prepareProposalPayload
     1026.76MB 10.10% 91.75%  1026.76MB 10.10%  github.com/lni/dragonboat/v3/internal/logdb.newRDBContext
         512MB  5.04% 96.79%      512MB  5.04%  github.com/lni/dragonboat/v3/internal/transport.NewTCPConnection
         204MB  2.01% 98.79%      204MB  2.01%  bytes.makeSlice
       21.34MB  0.21% 99.00%   225.34MB  2.22%  encoding/json.Marshal
        0.50MB 0.0049% 99.01%  8523.92MB 83.87%  main.wwork
             0     0% 99.01%      204MB  2.01%  bytes.(*Buffer).Write
             0     0% 99.01%      204MB  2.01%  bytes.(*Buffer).grow
             0     0% 99.01%      204MB  2.01%  encoding/base64.(*encoder).Write
             0     0% 99.01%      204MB  2.01%  encoding/json.(*encodeState).marshal
    (pprof) list github.com/lni/dragonboat/v3.prepareProposalPayload
    Total: 9.92GB
    ROUTINE ======================== github.com/lni/dragonboat/v3.prepareProposalPayload in /home/test/go/src/cfs/cfs/pkg/mod/github.com/lni/dragonboat/[email protected]/requests.go
        8.10GB     8.10GB (flat, cum) 81.64% of Total
             .          .   1106:           delete(p.pending, key)
             .          .   1107:   }
             .          .   1108:}
             .          .   1109:
             .          .   1110:func prepareProposalPayload(cmd []byte) []byte {
        8.10GB     8.10GB   1111:   dst := make([]byte, len(cmd))
             .          .   1112:   copy(dst, cmd)
             .          .   1113:   return dst
             .          .   1114:}
    (pprof) 
    

    Is this a bug, or is something wrong in my code?

  • Removing leader node

    Removing leader node

    We get ErrClusterClosed when removing the leader, but the node is actually removed. We use backoff on SyncRequestDeleteNode, and the second attempt fails with ErrClusterNotFound.

    It looks like deleting the node completes faster than the response to the request can be delivered, so the request exits with the Terminated event. Reproduced very rarely.

    Logs:

    msg="[12321:00001] applied REMOVE ccid 0 (79), n00001" component=dragonboat pkg=rsm
    msg="[f:1,l:79,t:3,c:79,a:78] [12321:00001] t3 became follower" component=dragonboat pkg=raft
    msg="[12321:00001] applied ConfChange Remove for itself" component=dragonboat pkg=dragonboat
    msg="leader updated" clid=12321 component=raft leader=0 mds=1 method=LeaderUpdated term=3
    msg="raft cluster already closed" clid=12321 component=raft mds=1 <-- first attempt (ErrClusterClosed)
    msg="failed to disjoin: failed to delete raft member: cluster not found" component=api <-- out on second attempt (ErrClusterNotFound)
    

    Dragonboat version

    v3.3.5

  • RSM close called twice

    RSM close called twice

    A flag was set in the RSM close handler, and the close sometimes happens twice.
    On the second close, the following panic is thrown:

    github.com/lni/dragonboat/v3/internal/rsm.(*OnDiskStateMachine).Close(0xc00032c000)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/adapter.go:338 +0x43
    github.com/lni/dragonboat/v3/internal/rsm.(*NativeSM).Close(0xc0004260d0)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/managed.go:150 +0x4a
    github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).Close(...)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/statemachine.go:233
    github.com/lni/dragonboat/v3.(*node).destroy(...)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/node.go:520
    github.com/lni/dragonboat/v3.(*closeWorker).handle(...)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:777
    github.com/lni/dragonboat/v3.(*closeWorker).workerMain(0xc000139520)
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:766 +0x86
    github.com/lni/dragonboat/v3.newCloseWorker.func1()
    	/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:755 +0x31
    github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
    	/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:79 +0x12f
    created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
    	/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:74 +0x19
    

    Dragonboat version

    v3.3.1

    Steps to reproduce the behavior

    Sometimes when closing RSM

  • Snapshot save error

    Snapshot save error

    panic: /home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3 doesn't exist when creating /home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3/snapshot-00000000000003E9-3.generating
    
    goroutine 350 [running]:
    github.com/lni/dragonboat/v3/internal/fileutil.Mkdir({0xc004358280, 0x97}, {0x1639918, 0x1d79520})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/fileutil/utils.go:122 +0x2dc
    github.com/lni/dragonboat/v3/internal/server.(*SSEnv).createDir(0xc01f9486f0, {0xc004358280, 0x97})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/server/snapshotenv.go:251 +0x86
    github.com/lni/dragonboat/v3/internal/server.(*SSEnv).CreateTempDir(0xc01f9486f0)
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/server/snapshotenv.go:200 +0x45
    github.com/lni/dragonboat/v3.(*snapshotter).Save(_, {_, _}, {0x3, 0x3e9, 0x169, 0x3e9, {0x0, 0x0, {0x0, ...}, ...}, ...})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/snapshotter.go:104 +0x125
    github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).doSave(_, {0x3, 0x3e9, 0x169, 0x3e9, {0x0, 0x0, {0x0, 0x0}, 0x0, ...}, ...})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/statemachine.go:802 +0x193
    github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).concurrentSave(_, {_, _, {_, _}, _, _})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/statemachine.go:758 +0x358
    github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).Save(_, {_, _, {_, _}, _, _})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/internal/rsm/statemachine.go:509 +0x2a5
    github.com/lni/dragonboat/v3.(*node).doSave(0xc000420800, {0x0, 0x0, {0x0, 0x0}, 0x0, 0x0})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/node.go:705 +0x2d6
    github.com/lni/dragonboat/v3.(*node).save(0xc000420800, {0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/node.go:684 +0x7b
    github.com/lni/dragonboat/v3.(*ssWorker).save(0xc0003a9f60, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...}, ...})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:296 +0x78
    github.com/lni/dragonboat/v3.(*ssWorker).handle(0xc0003a9f60, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...}, ...})
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:279 +0xba
    github.com/lni/dragonboat/v3.(*ssWorker).workerMain(0xc0003a9f60)
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:265 +0x1bb
    github.com/lni/dragonboat/v3.newSSWorker.func1()
            /home/user/go/pkg/mod/github.com/lni/dragonboat/[email protected]/engine.go:251 +0x25
    github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
            /home/user/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:79 +0x173
    created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
            /home/user/go/pkg/mod/github.com/lni/[email protected]/syncutil/stopper.go:74 +0x133
    

    Dragonboat version

    v3.3.1

    Steps to reproduce the behavior

    Couldn't reproduce again

  • Clarify config validation errors

    Clarify config validation errors

    Please consider this small PR that helps to find out what exactly is wrong with a NodeHost configuration. It also requires https://github.com/lni/goutils/pull/1 to be merged. Thank you!

  • Performance testing problem

    Performance testing problem

    Hi, I built a version based on dragonboat-example/helloworld, and Dragonboat can only sustain writes at about 140,000 per second when the payload is 16 bytes each.

    1.389829122s 6.949 micros/op, 143902.58 op/s, 0.000000 err/rate

    This is quite different from the figure in the documentation (Dragonboat can sustain writes at 1.25 million per second when the payload is 16 bytes each). Can you provide your benchmark program, or give me some suggestions?

    rc := config.Config{
    	// ClusterID and NodeID of the raft node
    	NodeID:    uint64(*nodeID),
    	ClusterID: exampleClusterID,
    	// In this example, we assume the end-to-end round trip time (RTT)
    	// between NodeHost instances (on different machines, VMs or
    	// containers) is 200 milliseconds; it is set in the RTTMillisecond
    	// field of the config.NodeHostConfig instance below.
    	// ElectionRTT is set to 10 in this example; it determines that the
    	// node should start an election if there is no heartbeat from the
    	// leader for 10 * RTT time intervals.
    	ElectionRTT: 10,
    	// HeartbeatRTT is set to 1 in this example; it determines that when
    	// the node is a leader, it should broadcast heartbeat messages to its
    	// followers every 1 * RTT time interval.
    	HeartbeatRTT: 1,
    	CheckQuorum:  true,
    	// SnapshotEntries determines how often a snapshot of the replicated
    	// state machine should be taken. It is set to 6000000 here, meaning a
    	// snapshot is captured for every 6000000 applied proposals (writes).
    	// You need to determine a suitable value based on how much space you
    	// are willing to use for Raft logs, how fast you can capture a
    	// snapshot of your replicated state machine, how often such snapshots
    	// are going to be used, etc.
    	SnapshotEntries: 6000000,
    	// Once a snapshot is captured and saved, how many Raft entries
    	// already covered by the new snapshot should be kept. This is useful
    	// when some followers are slightly behind; with such extra Raft
    	// entries, the leader can send them regular entries rather than a
    	// full snapshot image.
    	CompactionOverhead: 1000000,
    }
    