An implementation of a distributed KV store backed by Raft, tolerant of node failures and network partitions 🚣

barge

A simple implementation of a consistent, distributed Key:Value store that uses the Raft Consensus Algorithm.

This project launches a cluster of raft nodes that run as concurrent state machines. A client interface is also provided for finding the leader and sending data to the cluster. Each node is equipped with gRPC clients/servers for communication with the client interface and other nodes.

The cluster is tolerant to node failures and network partitions (a cluster of N nodes can tolerate ⌈N/2⌉ - 1 failures). The cluster is smart enough to elect a new leader if needed, and the client is smart enough to find the leader.
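
For a concrete sense of the numbers, here is a tiny Go sketch of that majority arithmetic (illustrative only, not code from this repository):

    package main

    import "fmt"

    // maxFailures returns how many nodes a Raft cluster of size n can lose
    // while a majority remains: ceil(n/2) - 1, which equals (n-1)/2 for n > 0.
    func maxFailures(n int) int {
        return (n - 1) / 2
    }

    func main() {
        for _, n := range []int{3, 5, 7} {
            fmt.Printf("a cluster of %d tolerates %d failure(s)\n", n, maxFailures(n))
        }
    }

So the three-node cluster used in the scenarios below keeps working with one node down, but not with two.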

Interesting Events

This project offers a debug flag which creates a log.html file that contains a table of cluster events. The log used to generate the report below is included as log-heartbeats-purged.html. It contains the data for all the interesting events below, but 8000+ lines of heartbeat events have been manually removed to keep the file short.

1. Initial Leader Election

All the nodes start as followers without any data. Node 1's randomized election timer runs out first, so it increments its term and starts an election. Nodes 0 and 2 grant their votes to node 1, and node 1 votes for itself. Now all nodes are in term 1 with node 1 as the leader.
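
The election trigger can be pictured with a short Go sketch (the types and names here are illustrative, not this repository's actual code): a follower waits for a heartbeat, and if its randomized timer fires first it bumps its term, votes for itself, and campaigns.

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    // node is a toy stand-in for a Raft follower; only the fields needed to
    // illustrate the election timeout are included.
    type node struct {
        id          int
        currentTerm int
        votedFor    int           // -1 means no vote cast in the current term
        heartbeatCh chan struct{} // signalled whenever a heartbeat arrives
    }

    // awaitElectionTimeout waits for either a heartbeat or a randomized
    // timeout; on timeout the node becomes a candidate for the next term.
    func (n *node) awaitElectionTimeout() {
        timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
        select {
        case <-n.heartbeatCh:
            fmt.Printf("node %d: heartbeat received, staying a follower\n", n.id)
        case <-time.After(timeout):
            n.currentTerm++
            n.votedFor = n.id // candidates always vote for themselves
            fmt.Printf("node %d: timed out, starting election for term %d\n", n.id, n.currentTerm)
            // ...RequestVote RPCs to the other nodes would be sent here...
        }
    }

    func main() {
        n := &node{id: 1, votedFor: -1, heartbeatCh: make(chan struct{})}
        n.awaitElectionTimeout() // no heartbeat arrives, so the timeout path runs
    }

Because each node draws its own random timeout, one node (node 1 in the log above) usually times out well before the others and wins the election uncontested.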

2. Heartbeats

As the leader, node 1 has the duty to indicate that there is a leader in the cluster. It sends empty, idempotent append requests to each node in the cluster to signal that it is functional. No new elections start because each follower resets its election timer upon receiving these heartbeats.
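
A rough sketch of that loop in Go (again illustrative; the repository's real implementation communicates over gRPC rather than channels):

    package main

    import (
        "fmt"
        "time"
    )

    // appendEntriesRequest carries log entries; an empty Entries slice is a
    // pure heartbeat, which is why heartbeats are idempotent.
    type appendEntriesRequest struct {
        Term    int
        Entries []string
    }

    // runHeartbeats has the leader broadcast an empty append request to every
    // peer on a fixed interval until told to stop.
    func runHeartbeats(term int, peers []chan appendEntriesRequest, stop <-chan struct{}) {
        ticker := time.NewTicker(50 * time.Millisecond)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                for _, peer := range peers {
                    peer <- appendEntriesRequest{Term: term} // no entries: heartbeat only
                }
            case <-stop:
                return
            }
        }
    }

    func main() {
        peer := make(chan appendEntriesRequest, 1)
        stop := make(chan struct{})
        go runHeartbeats(1, []chan appendEntriesRequest{peer}, stop)
        req := <-peer
        fmt.Printf("follower saw heartbeat for term %d, resetting its election timer\n", req.Term)
        close(stop)
    }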

3. Updating Data in the Cluster

Node 1, the leader, receives data {key: 100, value: "Some Value"} from the client. It tells nodes 0 and 2 to append that change to their logs. Because both succeed, node 1 knows the entry is duplicated on a majority of servers, so it commits the log entry.

In the following heartbeat, node 1 tells its followers that it has committed up to the last entry, so nodes 0 and 2 commit their log entries locally.
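
The commit decision itself is just a majority count, sketched below in Go (not the project's actual types); the leader counts itself plus every follower that acknowledged the append:

    package main

    import "fmt"

    // canCommit reports whether an entry acknowledged by acks followers is
    // stored on a majority of a cluster of clusterSize nodes (the leader
    // itself always holds the entry, hence the +1).
    func canCommit(clusterSize, acks int) bool {
        return acks+1 > clusterSize/2
    }

    func main() {
        fmt.Println(canCommit(3, 2)) // true: both followers appended {key: 100}
        fmt.Println(canCommit(3, 1)) // true: 2 of 3 is still a majority (one follower dead)
        fmt.Println(canCommit(3, 0)) // false: the leader alone cannot commit
    }

Followers never make this decision themselves; they simply adopt the leader's commit index from the next heartbeat, which is exactly what nodes 0 and 2 do here.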

4. Follower Node Failure

Node 0 dies and no longer receives heartbeats from node 1.

Node 1 receives a request to add a value {key: 200}. Although it cannot send it to node 0, it successfully sends the entry to node 2. It commits the entry knowing that it is duplicated on a majority of nodes.

Node 0 comes back to life and starts receiving heartbeats again.

Node 1 receives a request to add another value {key: 300}. When sending it to node 0, it sees that node 0 missed the previous value {key: 200} while it was dead. Node 1 catches node 0 up by giving it the log entries to append. On the following heartbeat, all the nodes see that the leader has committed up to {key: 300}, so they commit any missed log entries up to and including the entry with {key: 300}.
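
The catch-up relies on the consistency check built into appending entries: a follower rejects an append whose preceding entry it does not hold, and the leader then backs up and resends the missing suffix. A compact Go sketch of that check (names, types, and values are illustrative, not the repository's):

    package main

    import "fmt"

    type entry struct {
        Term  int
        Key   int32
        Value string
    }

    // tryAppend models the follower side: the append succeeds only if the
    // follower's log already contains everything before prevIndex.
    func tryAppend(log []entry, prevIndex int, entries []entry) ([]entry, bool) {
        if prevIndex > len(log) {
            return log, false // gap detected: ask the leader to back up
        }
        return append(log[:prevIndex], entries...), true
    }

    func main() {
        leaderLog := []entry{{1, 100, "Some Value"}, {1, 200, ""}, {1, 300, ""}}
        followerLog := []entry{{1, 100, "Some Value"}} // node 0 missed {key: 200} while dead

        // The leader first offers only the newest entry ({key: 300})...
        if _, ok := tryAppend(followerLog, 2, leaderLog[2:]); !ok {
            // ...node 0 rejects it, so the leader resends from {key: 200} onward.
            followerLog, _ = tryAppend(followerLog, 1, leaderLog[1:])
        }
        fmt.Println(len(followerLog)) // 3: node 0 now matches the leader's log
    }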

5. Leader Node Failure and Re-Election

Here, the leader node (node 1) dies. Node 2 starts an election for itself but fails to get elected, most likely because node 0 started an election at roughly the same time and voted for itself. Node 0 then starts another election and succeeds, getting votes from node 2 and itself.
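
The split vote comes from Raft's rule that each node may grant at most one vote per term. A small Go sketch of that rule (illustrative only, with made-up term numbers; not this project's RPC handler):

    package main

    import "fmt"

    type voter struct {
        currentTerm int
        votedFor    int // -1 means no vote granted in currentTerm
    }

    // handleRequestVote grants a vote only for an up-to-date term in which the
    // node has not already voted for a different candidate.
    func (v *voter) handleRequestVote(term, candidateID int) bool {
        if term < v.currentTerm {
            return false // stale candidate
        }
        if term > v.currentTerm {
            v.currentTerm = term
            v.votedFor = -1 // new term: the vote becomes available again
        }
        if v.votedFor == -1 || v.votedFor == candidateID {
            v.votedFor = candidateID
            return true
        }
        return false
    }

    func main() {
        // Node 2 has already voted for itself in term 2, so node 0's request
        // for the same term is refused; a retry in a later term succeeds.
        node2 := &voter{currentTerm: 2, votedFor: 2}
        fmt.Println(node2.handleRequestVote(2, 0)) // false: the term-2 vote is spent
        fmt.Println(node2.handleRequestVote(3, 0)) // true: node 0 wins the retry
    }

(The real handler also checks that the candidate's log is at least as up to date as the voter's; that check is omitted here for brevity.)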

As the new leader, node 0 receives a new entry modifying {key: 200}. It duplicates the entry onto node 2 and commits it. Node 2 commits it after the following heartbeat.

Node 1 comes back online and starts receiving heartbeats again.

Node 0 receives a request to add an entry modifying {key: 300}. When sending it to node 1, it sees that node 1 is out of date, so it catches node 1 up. Node 1 appends the entry it missed ({key: 200}) and the new entry ({key: 300}) to its log. Node 2 appends the new entry to its log. Node 0 commits its newest entry, knowing that it is duplicated on a majority of servers (all of them, in this case). During the next heartbeat, nodes 1 and 2 see that the leader has committed up to its newest entry {key: 300}, so they commit all uncommitted entries up to that point.

Resources

Usage

The configuration.yml file defines the networking configs needed to run the simulation.

To launch the cluster with the HTML log: go build && ./barge -debug

To launch the client: go build && ./barge -client

Client Commands:

  1. {key}={value} Set int32 key to string value
  2. kill {nodeId} Kill a node with a given id (must be a node id from the config file)
  3. revive {nodeId} Restore a dead node to the follower state
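
For example, a client session that exercises the node-failure scenarios above might look like this (the values and node IDs are illustrative):

    100=Some Value
    kill 0
    200=Another Value
    revive 0
    300=Yet Another Value

The kill and revive commands are what drive the failure events shown in the log above.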

Development

Download dependencies: go mod download

Generate protocol buffers for node <-> node RPCs: protoc -I node/ node/raft.proto --go_out=plugins=grpc:node

Generate protocol buffers for client <-> cluster RPCs: protoc -I api/ api/api.proto --go_out=plugins=grpc:api

Format the code: go fmt ./...

Owner
Shehjad Khan
Software & ML Engineer. Mathematics @uWaterloo.