A course to build distributed key-value service based on TiKV model

The TinyKV Course

This is a series of projects on a key-value storage system built with the Raft consensus algorithm. These projects are inspired by the famous MIT 6.824 course but aim to be closer to industry implementations. The whole course is pruned from TiKV and re-written in Go. After completing this course, you will have the knowledge to implement a horizontally scalable, highly available, key-value storage service with distributed transaction support and a better understanding of TiKV implementation.

The whole project is a skeleton code for a kv server and a scheduler server at the beginning, and you need to finish the core logic step by step:

  • Project1: build a standalone key-value server
  • Project2: build a highly available key-value server with Raft
  • Project3: support multi Raft group and balance scheduling on top of Project2
  • Project4: support distributed transaction on top of Project3

Important note: This course is still under development, and the documentation is incomplete. Any feedback and contribution is greatly appreciated. Please see help wanted issues if you want to join in the development.

Course

Here is a reading list for the knowledge of distributed storage system. Though not all of them are highly related with this course, they can help you construct the knowledge system in this field.

Also, you’d better read the overview of TiKV and PD's design to get a general impression on what you will build:

Getting started

First, please clone the repository with git to get the source code of the project.

git clone https://github.com/pingcap-incubator/tinykv.git

Then make sure you have go >= 1.13 toolchains installed. You should also have make installed. Now you can run make to check that everything is working as expected. You should see it runs successfully.

Overview of the code

overview

Similar to the architecture of TiDB + TiKV + PD that separates the storage and computation, TinyKV only focuses on the storage layer of a distributed database system. If you are also interested in the SQL layer, please see TinySQL. Besides that, there is a component called TinyScheduler acting as a center control of the whole TinyKV cluster, which collects information from the heartbeats of TinyKV. After that, the TinyScheduler can generate some scheduling tasks and distribute them to the TinyKV instances. All of them are communicated via RPC.

The whole project is organized into the following directories:

  • kv: implementation of the TinyKV key/value store.
  • proto: all communication between nodes and processes uses Protocol Buffers over gRPC. This package contains the protocol definitions used by TinyKV, and the generated Go code that you can use.
  • raft: implementation of the Raft distributed consensus algorithm, which is used in TinyKV.
  • scheduler: implementation of the TinyScheduler which is responsible for managing TinyKV nodes and generating timestamps.
  • log: log utility to output log based on level.

Course material

Please follow the course material to learn the background knowledge and finish code step by step.

Deploy to a cluster

After you finish the whole implementation, it becomes runnable. You can try TinyKV by deploying it onto a real cluster, and interact with it through TinySQL.

Build

make

It builds the binary of tinykv-server and tinyscheduler-server to bin dir.

Run

Put the binary of tinyscheduler-server, tinykv-server and tinysql-server into a single dir.

Under the binary dir, run the following commands:

mkdir -p data
./tinyscheduler-server
./tinykv-server -path=data
./tinysql-server --store=tikv --path="127.0.0.1:2379"

Play

mysql -u root -h 127.0.0.1 -P 4000
Owner
TiDB Incubator
TiDB Incubator
TiDB Incubator
Comments
  • Scope/requirements

    Scope/requirements

    I would like to define what is in and out of scope for the initial iteration of TinyKV. I'll start to make a list here, please comment if I got something wrong or if there are other things which should be in or out.

    cc @siddontang, @Connor1996

    In scope

    • [x] Test framework
    • [x] Distribution using Raft (Raft implemented as part of TinyKV, not using a library)
    • [x] Raw and transactional API
    • [x] Optimistic transactions (Percolator-style)
    • [x] PD to balance regions and provide timestamps
    • [x] Interoperable with TinySQL

    Out of scope

    • Any kind of engine abstraction
    • Debug or diagnostics services (for development only)
    • Metrics/statistics
    • Any GC
    • Pessimistic transactions
    • backup/restore; sst import
    • profiler
    • status server
    • coprocessor
    • batched commands
    • security
    • interoperable with real TiDB and real PD
  • 2b tests consume too much memory

    2b tests consume too much memory

    The memory usage of make project2b grows from 2GiB to 12GiB on my machine. It seems that a test like BasicTets won't release memory even after it passes. So the memory usage just grows monotonically. Also, splitting up the test by manually running go test won't take that much memory. Is there a potential memory leak?

    OS: Arch Linux Kernel: 5.10.10-arch1-1

  • `TestHandleHeartbeat2AA`  add test case

    `TestHandleHeartbeat2AA` add test case

    fix #191

    test 1: commit index and log term match, update committed test 2: match but decrease, don't update test 3: index 3's term not match, don't update test 4: committed index 3's term can be either 2 or 3, don't update

  • Question about handling KvPrewrite and KvCommit

    Question about handling KvPrewrite and KvCommit

    I had read this post https://pingcap.com/blog-cn/tikv-source-code-reading and this https://pingcap.com/blog-cn/tikv-source-code-reading-11/, they suggest that TiKV will

    1. latch every key in the KvPrewrite/KvCommit
    2. call RaftKV.async_snapshot() (i.e. propose a RaftSnapCmd) to get a snapshot of the state machine
    3. when the RaftSnapCmd is applied, read the keys from the snapshot and call the callback passed to RaftKV.async_snapshot()
    4. call RaftKV.async_write() (i.e. propose a RaftWriteCmd) to write the keys

    From the comment in https://github.com/tidb-incubator/tinykv/blob/course/kv/transaction/latches/latches.go, I can tell this is also the recommended way to implement KvPrewrite and KvCommit in TinyKV.

    Here is the question: Instead of dividing the procedure into two phases(read and write), why not use a single raft command and leverage badger.Txn, like this:

    1. propose a RaftPrewriteCmd
    2. when the RaftPrewriteCmd is applied, start a badger.Txn and do the read and write in the Txn

    In this way, we don't need latch any more and we cut half of the traffic.

  • running `go mod tidy` or `make project1` gives me an error

    running `go mod tidy` or `make project1` gives me an error

    go version go1.14.3 linux/amd64 I receive the following error when I run any of the above commands:

    go: github.com/pingcap/[email protected] requires github.com/pingcap-incubator/[email protected]: invalid version: unknown revision f660f803e910

  • Add split check test

    Add split check test

    • simplify split check logic
    • add split check test
    • fix ExceedEndKey
    • move kv/rowcodec to kv/coprocessor/rowcodec
    • extract EncodeKey and DecodeUserKey to kv/util/codec
    • move engine_util to kv/util/engine_util
  • fix test bug

    fix test bug

    func TestPeerStorageTerm(t *testing.T) {
    	ents := []eraftpb.Entry{
    		newTestEntry(3, 3), newTestEntry(4, 4), newTestEntry(5, 5), // <<<<<<<-------------------------
    	}
    	tests := []struct {
    		idx  uint64
    		term uint64
    		err  error
    	}{
    		{2, 0, raft.ErrCompacted},
    		{3, 3, nil},
    		{4, 4, nil},
    		{5, 5, nil},
    	}
    	for _, tt := range tests {
    		peerStore := newTestPeerStorageFromEnts(t, ents)
    		term, err := peerStore.Term(tt.idx)
    		if err != nil {
    			assert.Equal(t, tt.err, err)
    		} else {
    			assert.Equal(t, tt.term, term)
    		}
    		cleanUpTestData(peerStore)
    	}
    }
    

    In this test, entries [3, 3], [4, 4], [5, 5] were passed to init the peerstorage. But in fact, all newly created peerstorage have entries index & term starting with 5.

    I think the writer of this test function assumes that all entries slices pass to the ps.Append function are committed by the Raft group, but instead, I checked the entry index in the append function and refused it, so i failed the test.

    I think it is better to change the hard-encoded entries to [6, 6], [7, 7], [8, 8].

  • Confusion about election

    Confusion about election

    Why does candidate need to become follower after election failure?

    func TestLeaderElectionOverwriteNewerLogs2AB(t *testing.T) {
    	cfg := func(c *Config) {
    		c.peers = idsBySize(5)
    	}
    	// This network represents the results of the following sequence of
    	// events:
    	// - Node 1 won the election in term 1.
    	// - Node 1 replicated a log entry to node 2 but died before sending
    	//   it to other nodes.
    	// - Node 3 won the second election in term 2.
    	// - Node 3 wrote an entry to its logs but died without sending it
    	//   to any other nodes.
    	//
    	// At this point, nodes 1, 2, and 3 all have uncommitted entries in
    	// their logs and could win an election at term 3. The winner's log
    	// entry overwrites the losers'. (TestLeaderSyncFollowerLog tests
    	// the case where older log entries are overwritten, so this test
    	// focuses on the case where the newer entries are lost).
    	n := newNetworkWithConfig(cfg,
    		entsWithConfig(cfg, 1),     // Node 1: Won first election
    		entsWithConfig(cfg, 1),     // Node 2: Got logs from node 1
    		entsWithConfig(cfg, 2),     // Node 3: Won second election
    		votedWithConfig(cfg, 3, 2), // Node 4: Voted but didn't get logs
    		votedWithConfig(cfg, 3, 2)) // Node 5: Voted but didn't get logs
    
    	// Node 1 campaigns. The election fails because a quorum of nodes
    	// know about the election that already happened at term 2. Node 1's
    	// term is pushed ahead to 2.
    	n.send(pb.Message{From: 1, To: 1, MsgType: pb.MessageType_MsgHup})
    	sm1 := n.peers[1].(*Raft)
    	if sm1.State != StateFollower {
    		t.Errorf("state = %s, want StateFollower", sm1.State)
    	}
    	if sm1.Term != 2 {
    		t.Errorf("term = %d, want 2", sm1.Term)
    	}
    
    	// Node 1 campaigns again with a higher term. This time it succeeds.
    	n.send(pb.Message{From: 1, To: 1, MsgType: pb.MessageType_MsgHup})
    	if sm1.State != StateLeader {
    		t.Errorf("state = %s, want StateLeader", sm1.State)
    	}
    	if sm1.Term != 3 {
    		t.Errorf("term = %d, want 3", sm1.Term)
    	}
    
    	// Now all nodes agree on a log entry with term 1 at index 1 (and
    	// term 3 at index 2).
    	for i := range n.peers {
    		sm := n.peers[i].(*Raft)
    		entries := sm.RaftLog.entries
    		if len(entries) != 2 {
    			t.Fatalf("node %d: len(entries) == %d, want 2", i, len(entries))
    		}
    		if entries[0].Term != 1 {
    			t.Errorf("node %d: term at index 1 == %d, want 1", i, entries[0].Term)
    		}
    		if entries[1].Term != 3 {
    			t.Errorf("node %d: term at index 2 == %d, want 3", i, entries[1].Term)
    		}
    	}
    }
    
  • Snapshot will always unavailable if error occur when generate snapshot

    Snapshot will always unavailable if error occur when generate snapshot

    https://github.com/pingcap-incubator/tinykv/blob/d13ccb8785702ecdf17368bf2350f7f82c58cab5/kv/raftstore/runner/region_task.go#L81-L88 https://github.com/pingcap-incubator/tinykv/blob/d13ccb8785702ecdf17368bf2350f7f82c58cab5/kv/raftstore/peer_storage.go#L157-L162

    ps.snapState.Receiver will never recive msg if error happend. Snapshot method will always return ErrSnapshotTemporarilyUnavailable

  • `TestRawNodeRestart2AC` & `TestRawNodeRestartFromSnapshot2C` expected result don't have softstate and hardstate

    `TestRawNodeRestart2AC` & `TestRawNodeRestartFromSnapshot2C` expected result don't have softstate and hardstate

    DeepEqual will fail because want and rd differ in softstate and hardstate.

    maybe we need something like

    want := Ready{
    		&SoftState{
    			Lead:      None,
    			RaftState: StateFollower,
    		},
    		pb.HardState{
    			Term:   st.Term,
    			Commit: st.Commit,
    		},
    		[]pb.Entry{},
    		pb.Snapshot{},
    		// commit up to commit index in st
    		entries[:st.Commit],
    		make([]pb.Message, 0), // Messages []pb.Message should be empty slice or nil?
    	}
    
  • How about renaming `InnerServer` to `Storage`

    How about renaming `InnerServer` to `Storage`

    InnerServer actually gives some operations on the underlying storage, maybe we can rename InnerServer and DBReader to a meaningful name, like Storage and StorageReader

  • test-case疑问

    test-case疑问

    https://github.com/talent-plan/tinykv/blob/f251031837db8c286972568cd5108f5f694e98cb/kv/server/server_test.go#L199

    这个测试用例是不是应该改为判断err是否等于KeyNotFound,因为前面已经执行过删除了,所以此时再去查询应该是key不存在的。

  • Would transaction order conflict with apply order?

    Would transaction order conflict with apply order?

    Suppose we have two transactions T1 and T2. And T1 has a smaller start_ts . 1 .T1 modify key1 and start a 2pc. 2. T2 read key1 and blocked by T1. But in raftlog 2 is ahead of 1, so applying raftlog orderly will result in a deadlock. Am I miss something? Can I avoid this without using readindex/lease read?

  • Update TestProvideSnap2C in raft_test.go

    Update TestProvideSnap2C in raft_test.go

    handleSnapshot only affects states in raft, not affects states in storage. More specifically, the snapshot in memory storage will be as it is after handleSnapshot. As a result, when leader is willing to send a snapshot to follower, it is the empty snapshot being sent. However, in my implementation, such a sending will be aborted since the snapshot is empty. Therefore, this test fails. Sending an empty snapshot is definitely shall be rejected, and I update the test case to not confuse ones like me.

  • Update test_test.go

    Update test_test.go

    Update error info. Without client, the error info is something like <client_id> missing element ... which seems to report that there're <client_id> missing element.

TinySQL is a course designed to teach you how to implement a distributed relational database in Go

TinySQL TinySQL is a course designed to teach you how to implement a distributed relational database in Go. TinySQL is also the name of the simplifed

Nov 7, 2021
Labs from MIT's graduate-level Distributed Systems course

Labs from MIT's graduate-level Distributed Systems course Course website here Lab 1: MapReduce Lab 2: Raft Consensus Algorithm Lab 2A: Raft Leader Ele

Jun 20, 2022
This slide deck and supporting material is part of the Introduction to Go training course by Dave Cheney

This slide deck and supporting material is part of the Introduction to Go training course by Dave Cheney.

Nov 14, 2022
Crash Course about the programming language Go / Golang.
Crash Course about the programming language Go / Golang.

Crash Course about the programming language Go / Golang. In this course I have covered some important concepts and topics in programming.

Oct 10, 2022
This is from the udemy course: Go: The Complete Developer's Guide (Golang)

Go Udemy course - "Go: The Complete Developer's Guide (Golang)" How to run the file: go run hello-world.go go run <filename>.go GO CLI commands: go ru

Oct 22, 2021
Implementation diploma work for YANDEX course "GO Musthave"

go-musthave-diploma-tpl Шаблон репозитория для индивидуального дипломного проекта курса "Самостоятельный Go-разработчик" Начало работы Склонируйте реп

Apr 12, 2022
This is the repository for the LinkedIn Learning course Learning Go.
This is the repository for the LinkedIn Learning course Learning Go.

Learning Go This is the repository for the LinkedIn Learning course Learning Go. The full course is available from LinkedIn Learning. What is Go? Go i

Nov 2, 2021
Repo for the final proyect in the internal golang course in nearshore

golang_course Repo for the final proyect in the internal golang course in nearshore Basicamente, insert y lee el uuid insertado en una tabla de AWS Am

Nov 11, 2021
A Udemy course on how to create an industry standard REST API

go-rest-industry-standard This repository is for a Udemy course on how to create an industry standard REST API. It applies the MVC pattern, routing, a

Nov 23, 2021
A combination of work from docs and a udemy course

Learning Go Some scrath work while learning go Motivation I think I need a bit of a break from dynamically typed scripting langauges. I've enjoyed pla

Nov 25, 2021
Walkthrough of Course Project

CourseProj1660Final Walkthrough of How I Complete Project Option 1 for 1660 Build Main App I built the main application by altering the example applic

Nov 29, 2021
Learn the Go programming language (Golang) in this step-by-step tutorial course for beginners

Learn the Go programming language (Golang) in this step-by-step tutorial course for beginners. Go is an open source programming language designed at Google that makes it easy to build simple, reliable, and efficient software.

Dec 16, 2021
Go: The Complete Developer's Guide (Golang) Udemy Course by Stephen Grider

Go-The-Complete-Developers-Guide Go Command line tools 1. go build - compiles a bunch of go source code files go build

Dec 29, 2021
Ssba - Sandbox for exercises from Software Systems: Behind the Abstractions course

Software Systems: Behind the Abstractions This is my messy exercise sandbox for the Bradfield course Software Systems: Behind the Abstractions. The go

Aug 23, 2022
Rps-game-in-go - Learn Go for Beginners Crash Course (Golang)

rps-game-in-go This rock-paper-scissors game was based on the Udemy course "Lear

Mar 20, 2022
A snapshot of the assets for the Learn Go course on FreeCodeCamp's youtube

Assets for "Learn Go" on FreeCodeCamp This is a snapshot of the code samples for the "Learn Go" course on Boot.dev at the time the video for FreeCodeC

May 12, 2023
📖 Build a RESTful API on Go: Fiber, PostgreSQL, JWT and Swagger docs in isolated Docker containers.
📖 Build a RESTful API on Go: Fiber, PostgreSQL, JWT and Swagger docs in isolated Docker containers.

?? Tutorial: Build a RESTful API on Go Fiber, PostgreSQL, JWT and Swagger docs in isolated Docker containers. ?? The full article is published on Marc

Dec 27, 2022
An open source programming language that makes it easy to build simple
An open source programming language that makes it easy to build simple

The Go Programming Language Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. Gopher ima

Oct 15, 2021
7 days golang programs from scratch (web framework Gee, distributed cache GeeCache, object relational mapping ORM framework GeeORM, rpc framework GeeRPC etc) 7天用Go动手写/从零实现系列

7 days golang programs from scratch README 中文版本 7天用Go从零实现系列 7天能写什么呢?类似 gin 的 web 框架?类似 groupcache 的分布式缓存?或者一个简单的 Python 解释器?希望这个仓库能给你答案

Jan 5, 2023