Consistent hashing with bounded loads in Golang


This library provides a consistent hashing function which simultaneously achieves both uniformity and consistency.

For detailed information about the concept, you should take a look at the original paper, Consistent Hashing with Bounded Loads, and Google's research blog post about it.

Overview

In this package's context, keys are distributed among partitions, and partitions are in turn distributed among members.

When you create a new consistent instance or call Add/Remove:

  • The member's name is hashed and inserted into the hash ring,
  • Average load is calculated by the algorithm defined in the paper,
  • Partitions are distributed among members by hashing partition IDs, and no member exceeds the average load.

The average load cannot be exceeded, so if all members are already loaded to the maximum when you try to add a new member, Add panics.
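
As a rough illustration of the bound, the paper caps each member at ceil(Load * partition count / member count) partitions. The sketch below plugs in the configuration used in the Examples section further down; it illustrates the formula only and is not the library's internal code:

package main

import (
	"fmt"
	"math"
)

func main() {
	// Values mirror the example configuration later in this README:
	// 7 partitions, 2 members, Load 1.25.
	partitionCount := 7.0
	memberCount := 2.0
	loadFactor := 1.25

	// Bounded loads: no member may own more partitions than this.
	maxPartitions := math.Ceil(loadFactor * partitionCount / memberCount)
	fmt.Println("max partitions per member:", maxPartitions) // 5
}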

When you want to locate a key by calling LocateKey:

  • The key (byte slice) is hashed,
  • The result of the hash is reduced modulo the number of partitions,
  • The result of this modulo - MOD(hash result, partition count) - is the partition the key belongs to,
  • The owner of that partition was already determined when members were added or removed, so LocateKey returns it immediately (see the sketch below).
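
A minimal sketch of that lookup path, using a hypothetical owners table in place of the library's internal data structures:

// Hypothetical sketch of LocateKey's two steps; owners stands in for
// the partition-to-member table that Add/Remove maintain internally.
func locate(key []byte, h consistent.Hasher, partitionCount uint64, owners map[uint64]consistent.Member) consistent.Member {
	hkey := h.Sum64(key)            // 1. hash the key
	partID := hkey % partitionCount // 2. reduce modulo the partition count
	return owners[partID]           // 3. the owner was fixed by Add/Remove
}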

consistent allocates no memory when locating a key, apart from what the hash function itself allocates.

Note that the number of partitions cannot be changed after creation.

Install

With a correctly configured Go environment:

go get github.com/buraksezer/consistent

You will find useful usage samples in the _examples folder.

Configuration

type Config struct {
	// Hasher is responsible for generating an unsigned, 64-bit hash of the provided byte slice.
	Hasher Hasher

	// Keys are distributed among partitions. Prime numbers are good to
	// distribute keys uniformly. Select a big PartitionCount if you have
	// too many keys.
	PartitionCount int

	// Members are replicated on the consistent hash ring. This number
	// controls how many times each member is replicated on the ring.
	ReplicationFactor int

	// Load is used to calculate average load. See the code, the paper and Google's 
	// blog post to learn about it.
	Load float64
}

Any hash algorithm that implements the Hasher interface can be used as the hasher. See the sketch below and the Examples section for sample implementations.
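
For instance, a Hasher backed by the standard library's hash/fnv could look like the sketch below (the example further down uses xxhash instead); any function that produces a uniformly distributed, unsigned 64-bit hash will do:

import "hash/fnv"

// fnvHasher implements consistent.Hasher with the standard library.
type fnvHasher struct{}

func (fnvHasher) Sum64(data []byte) uint64 {
	h := fnv.New64a()
	_, _ = h.Write(data) // Write on a hash.Hash never returns an error
	return h.Sum64()
}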

Usage

LocateKey function finds a member in the cluster for your key:

// With a properly configured and initialized consistent instance
key := []byte("my-key")
member := c.LocateKey(key)

It returns a thread-safe copy of the member you added before.

The second most frequently used function is GetClosestN.

// With a properly configured and initialized consistent instance

key := []byte("my-key")
members, err := c.GetClosestN(key, 2)

This may be useful to find backup nodes to store your key.
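
GetClosestN returns an error when the ring cannot satisfy the request, for example when it holds fewer members than asked for (reported as ErrInsufficientMemberCount in the comments below), so check the error before using the result:

members, err := c.GetClosestN(key, 2)
if err != nil {
	// e.g. the ring holds fewer members than requested
	log.Fatal(err)
}
for _, member := range members {
	fmt.Println("backup candidate:", member.String())
}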

Benchmarks

On an early 2015 MacBook:

BenchmarkAddRemove-4     	  100000	     22006 ns/op
BenchmarkLocateKey-4     	 5000000	       252 ns/op
BenchmarkGetClosestN-4   	  500000	      2974 ns/op

Examples

The most basic use of the consistent package looks like this. For a detailed list of functions, visit godoc.org. More sample code can be found under _examples.

package main

import (
	"fmt"

	"github.com/buraksezer/consistent"
	"github.com/cespare/xxhash"
)

// In your code, you probably have a custom data type 
// for your cluster members. Just add a String function to implement 
// consistent.Member interface.
type myMember string

func (m myMember) String() string {
	return string(m)
}

// consistent package doesn't provide a default hashing function. 
// You should provide a proper one to distribute keys/members uniformly.
type hasher struct{}

func (h hasher) Sum64(data []byte) uint64 {
	// you should use a proper hash function for uniformity.
	return xxhash.Sum64(data)
}

func main() {
	// Create a new consistent instance
	cfg := consistent.Config{
		PartitionCount:    7,
		ReplicationFactor: 20,
		Load:              1.25,
		Hasher:            hasher{},
	}
	c := consistent.New(nil, cfg)

	// Add some members to the consistent hash table.
	// Add function calculates average load and distributes partitions over members
	node1 := myMember("node1.olric.com")
	c.Add(node1)

	node2 := myMember("node2.olric.com")
	c.Add(node2)

	key := []byte("my-key")
	// calculates partition id for the given key
	// partID := hash(key) % partitionCount
	// the partitions are already distributed among members by Add function.
	owner := c.LocateKey(key)
	fmt.Println(owner.String())
	// Prints node2.olric.com
}

Another useful example is _examples/relocation_percentage.go. It creates a consistent instance with 8 members and distributes partitions among them, then adds a 9th member. Here is the result with a proper configuration and hash function:

bloom:consistent burak$ go run _examples/relocation_percentage.go
partID: 218 moved to node2.olric from node0.olric
partID: 173 moved to node9.olric from node3.olric
partID: 225 moved to node7.olric from node0.olric
partID:  85 moved to node9.olric from node7.olric
partID: 220 moved to node5.olric from node0.olric
partID:  33 moved to node9.olric from node5.olric
partID: 254 moved to node9.olric from node4.olric
partID:  71 moved to node9.olric from node3.olric
partID: 236 moved to node9.olric from node2.olric
partID: 118 moved to node9.olric from node3.olric
partID: 233 moved to node3.olric from node0.olric
partID:  50 moved to node9.olric from node4.olric
partID: 252 moved to node9.olric from node2.olric
partID: 121 moved to node9.olric from node2.olric
partID: 259 moved to node9.olric from node4.olric
partID:  92 moved to node9.olric from node7.olric
partID: 152 moved to node9.olric from node3.olric
partID: 105 moved to node9.olric from node2.olric

6% of the partitions are relocated

The number of moved partitions depends heavily on your configuration and the quality of the hash function. You should tune the configuration to find an optimal set of parameters for your system.

_examples/load_distribution.go is also useful for understanding load distribution. It creates a consistent instance with 8 members and locates 1M keys. It also calculates the average load, which no member may exceed. Here is the result:

Maximum key count for a member should be around this:  147602
member: node2.olric, key count: 100362
member: node5.olric, key count: 99448
member: node0.olric, key count: 147735
member: node3.olric, key count: 103455
member: node6.olric, key count: 147069
member: node1.olric, key count: 121566
member: node4.olric, key count: 147932
member: node7.olric, key count: 132433

Average load can be calculated by using the following formula:

load := (consistent.AverageLoad() * float64(keyCount)) / float64(config.PartitionCount)
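
For example, plugging in the numbers from the load-distribution run above (a sketch; c and config refer to the instance and the configuration it was created with):

keyCount := 1000000 // the run above locates 1M keys
maxKeys := (c.AverageLoad() * float64(keyCount)) / float64(config.PartitionCount)
fmt.Printf("maximum key count for a member should be around %.0f\n", maxKeys)
// With the configuration used in that example, this is the ~147602
// figure printed in the output above.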

Contributions

Please don't hesitate to fork the project and send a pull request or just e-mail me to ask questions and share ideas.

License

MIT License - see LICENSE for more details.

Comments
  • Possibly incorrect implementation


    I may be wrong, but there doesn't seem to be any place in the code where the load of a member (i.e. a node) is incremented. The only place where load distribution happens is when a new node is added or removed. This doesn't let the ring keep track of how much load each member has, especially once new clients/load are being allocated to the members. However, the point of bounded loads is that the load of each member is getting kept track of, so that if a new client comes in and would be allocated to a member with above-average load, the client will be added to another member instead.

    In effect, I think your implementation may just be the classic consistent hash but not necessarily with bounded loads.

  • Uneven distribution of keys


    I'm seeing uneven distribution of keys with certain parameters:

    package main
    
    import (
    	"fmt"
    
    	"github.com/buraksezer/consistent"
    	"github.com/cespare/xxhash"
    )
    
    type hasher struct{}
    
    func (h hasher) Sum64(data []byte) uint64 {
    	return xxhash.Sum64(data)
    }
    
    type myMember string
    
    func (m myMember) String() string {
    	return string(m)
    }
    
    func main() {
    	h := hasher{}
    
    	letters := []string{
    		"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
    	}
    
    	for i := 2; i < 20; i++ {
    		cfg := consistent.Config{
    			PartitionCount:    2,
    			ReplicationFactor: i,
    			Load:              1.25,
    			Hasher:            h,
    		}
    
    		allocator := consistent.New(nil, cfg)
    		allocator.Add(myMember("shard-0"))
    		allocator.Add(myMember("shard-1"))
    
    		shardCounts := make(map[string]int)
    		for _, l := range letters {
    			shard := allocator.LocateKey([]byte(l))
    		shardCounts[shard.String()]++
    		}
    
    		fmt.Printf("with replication factor of %d shard-0 gets %d keys and shard-1 gets %d\n", i, shardCounts["shard-0"], shardCounts["shard-1"])
    	}
    }
    

    output

    with replication factor of 2 shard-0 gets 14 keys and shard-1 gets 12
    with replication factor of 3 shard-0 gets 0 keys and shard-1 gets 26
    with replication factor of 4 shard-0 gets 0 keys and shard-1 gets 26
    with replication factor of 5 shard-0 gets 12 keys and shard-1 gets 14
    with replication factor of 6 shard-0 gets 12 keys and shard-1 gets 14
    with replication factor of 7 shard-0 gets 12 keys and shard-1 gets 14
    with replication factor of 8 shard-0 gets 12 keys and shard-1 gets 14
    with replication factor of 9 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 10 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 11 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 12 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 13 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 14 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 15 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 16 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 17 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 18 shard-0 gets 26 keys and shard-1 gets 0
    with replication factor of 19 shard-0 gets 14 keys and shard-1 gets 12
    

    There are a lot of 26 vs 0 results, which feels weird to me. Is it expected, a coincidence, a coding error, or perhaps a bug?

  • Add support to access the Consistent struct in consistent.go


    My use case: I am planning to store a bunch of containers as nodes in the hashring. Then, each container would poll periodically to look at its position in the hashring and decide what operations to do based on that. In order to do this, I need to access the hashring itself. Would it be possible for you to add a function that returns the ring on the struct, or to expose the struct itself?


    Thanks, Sai

  • I have some questions about this project


    Does this project really follow the original paper on Consistent Hashing with Bounded Loads? I did not see any partition layer in the paper; there are only two roles: ball & bin.

    consistent hashing only balances loads about as well as choosing a random server for each request, when the distribution of requests is equal. But if some content is much more popular than others (as usual for the internet), it can be worse than that.

    Why wasn’t there a way to say “use consistent hashing, but please don’t overload any servers”?

    https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed

    keys -> partitions -> members can only balance load between partitions and members. If too many keys map to the same partition, the member that owns that partition will be significantly overloaded.

    Besides, I think the result of _examples/load_distribution.go depends entirely on how well hash(key) % partitionNum balances. So, what's the difference between this solution and Dynamo-style hashing?

    Or is this project targeted at storage systems rather than connection load balancing?

  • [Bug] RLock called twice in GetClosestN and cause the deadlock


    We recently encountered a deadlock with the library. After some digging, we found out that it tries to acquire the read lock in getClosestN, and then again in GetPartitionOwner.

    https://github.com/buraksezer/consistent/blob/f0660af7331feefacfe92d272ecd86deb7a87c0a/consistent.go#L302-L312

    https://github.com/buraksezer/consistent/blob/f0660af7331feefacfe92d272ecd86deb7a87c0a/consistent.go#L284-L287

    It's not safe to call RLock and RUnlock twice in one goroutine while another goroutine may call Lock simultaneously. The discussions here and here explain it better than I can. In addition, to quote from the RWMutex documentation:

    If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking.

    You can also go to this Go playground and replicate it yourself: https://go.dev/play/p/pEhnPbsnDG

    The probability is low, since it requires a Lock to arrive immediately after the first RLock, but it can happen when you have a large number of reads/writes and run for a sufficiently long time.
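
    A minimal standalone sketch of the pattern (not the library's code): the first RLock is held, a writer queues up, and the second, recursive RLock then blocks forever. Running it triggers Go's deadlock detector.

    package main
    
    import (
    	"sync"
    	"time"
    )
    
    func main() {
    	var mu sync.RWMutex
    
    	mu.RLock() // first read lock, analogous to the one in GetClosestN
    	go func() {
    		mu.Lock() // a writer arrives and queues behind the reader
    		mu.Unlock()
    	}()
    	time.Sleep(100 * time.Millisecond) // give the writer time to queue
    
    	// Second (recursive) read lock, analogous to GetPartitionOwner:
    	// it must wait for the queued writer, which in turn waits for the
    	// first read lock to be released, so both block.
    	mu.RLock()
    }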

  • AverageLoad() function panics with "divide by zero" when no members are in the hash ring


    If there are no members in the hashring, calling the AverageLoad() function results in a "divide-by-zero" panic.

    For example, this will result in a panic:

    package main
    
    import (
    	"fmt"
    
    	"github.com/buraksezer/consistent"
    	"github.com/cespare/xxhash"
    )
    
    type myMember string
    
    func (m myMember) String() string {
    	return string(m)
    }
    
    type hasher struct{}
    
    func (h hasher) Sum64(data []byte) uint64 {
    	return xxhash.Sum64(data)
    }
    
    func main() {
    	cfg := consistent.Config{
    		PartitionCount:    7,
    		ReplicationFactor: 20,
    		Load:              1.25,
    		Hasher:            hasher{},
    	}
    	c := consistent.New(nil, cfg)
            fmt.Printf("average load: %f", c.AverageLoad())
    }
    

    The AverageLoad() function should check whether the number of members is > 0 and, if it is not, simply return a fixed value (such as 0 or -1).
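
    A hypothetical sketch of that guard, using a stand-in struct rather than the library's actual Consistent type (requires the math package):

    // ring is a stand-in for the library's Consistent type.
    type ring struct {
    	members        []string
    	partitionCount uint64
    	load           float64
    }
    
    func (r *ring) AverageLoad() float64 {
    	if len(r.members) == 0 {
    		// empty ring: return a fixed value instead of dividing by zero
    		return 0
    	}
    	return math.Ceil(float64(r.partitionCount) / float64(len(r.members)) * r.load)
    }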

  • Add receives a Member but Remove receives a string


    // Add adds a new member to the consistent hash circle.
    func (c *Consistent) Add(member Member) {
    }
    
    // Remove removes a member from the consistent hash circle.
    func (c *Consistent) Remove(name string) {
    }
    
  • `ErrInsufficientMemberCount` when calling `GetClosestN(key, 2)` on a Consistent with 2 members


    I've been testing this library as a possible solution and came across this curious behavior. I have a Consistent object with 3 members, and then .Remove()'d one of them, and started getting ErrInsufficientMemberCount, which surprised me.

    I see in the code that you check against len(c.members)-1; is this a typo, or is there some reason why all the members shouldn't be returned?

    https://github.com/buraksezer/consistent/blob/master/consistent.go#L308

  • about delSlice


    Hi, at line 231 in the delSlice function, it seems it would be better to either build a new slice and assign it to c.sortedSet, or to change it in place like below:

    package main
    
    import "fmt"
    
    func main() {
    	// Remove every 4 from the slice in place.
    	m := []int{1, 2, 3, 4, 45, 6, 7, 5, 3, 4, 5, 4, 4, 4, 4, 4}
    	for i := 0; i < len(m)-1; {
    		if m[i] == 4 {
    			m = append(m[:i], m[i+1:]...)
    			continue
    		}
    		i++
    	}
    	if m[len(m)-1] == 4 {
    		m = m[:len(m)-1]
    	}
    	fmt.Println(m)
    }
    
  • Need guidance for optimum configuration recommendation


    Hi @buraksezer

    Thanks for the implementation. I need some guidance for my use case and would be glad if you could provide a recommendation on how I should configure the hash ring.

    Here are some details: number of servers: 2-10; number of clients: 200-500; frequency of client addition/removal: daily/weekly/monthly.

  • Potential collision between member name and replica ID


    Replica names are created by smashing the member name and the replica number together.

    https://github.com/buraksezer/consistent/blob/a1a9c4ab8d2da437ecbf64219cc0b6cf58639b7e/consistent.go#L250

    Member names that end in numbers may collide with replicas of other members. For example, with member names of member and member1 with a ReplicationFactor of 11, there will be a collision with the 11th replica of member: member10 and the 1st replica of member1: member10.
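
    A tiny illustration of the collision, with the concatenation format inferred from the linked line:

    name1 := "member" + strconv.Itoa(10) // 11th replica of "member"
    name2 := "member1" + strconv.Itoa(0) // 1st replica of "member1"
    fmt.Println(name1 == name2)          // true: both are "member10"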

  • Better Understanding the PartitionCount Configuration


    @buraksezer First of all, I want to say thanks for this awesome implementation! I've started to use it in one of my projects.

    I did, however, have a question about the PartitionCount configuration. I haven't seen a lot of information in the literature about how this impacts the hashing and, more importantly, any strategies on how to best choose the value for an arbitrary type of query workload. What I'm trying to do is uniformly distribute queries for any given resourceID across N cluster members. I like this library because if one of the cluster members fails, the load is redistributed such that any one of the remaining members will receive <= (Load * 100) percent of the residual traffic.

    But I don't quite understand yet how that goal relates to the PartitionCount, and if I choose a lower value such as 3, 5, or 9, I experience issues with the library being able to distribute partitions when I add more members. Your input/guidance would be appreciated :)

  • Why add the partitionCount parameter?


    I read your implementation, but I don't quite understand the PartitionCount variable. Why not directly use the sortedSet to locate the member where the data is located?

  • A corner case will cause panic.


    This demo will panic:

    package main
    
    import (
    	"github.com/buraksezer/consistent"
    	"github.com/cespare/xxhash"
    )
    
    type hasher struct{}
    
    type myMember string
    
    func (m myMember) String() string {
    	return string(m)
    }
    
    func (h hasher) Sum64(data []byte) uint64 {
    	// you should use a proper hash function for uniformity.
    	return xxhash.Sum64(data)
    }
    
    func main() {
    	cfg := consistent.Config{
    		PartitionCount:    1,
    		ReplicationFactor: 1,
    		Load:              1,
    		Hasher:            hasher{},
    	}
    	c := consistent.New(nil, cfg)
    
    	node1 := myMember("node1")
    	c.Add(node1)
    }
    

    Here is the error message:

    panic: not enough room to distribute partitions
    
    goroutine 1 [running]:
    github.com/buraksezer/consistent.(*Consistent).distributeWithLoad(0x450060, 0x0, 0x0, 0x43e2e0, 0x43e2c0, 0x34c96acd)
    	/tmp/gopath552882815/pkg/mod/github.com/buraksezer/[email protected]/consistent.go:165 +0x3a0
    github.com/buraksezer/consistent.(*Consistent).distributePartitions(0x450060, 0x1604d0)
    	/tmp/gopath552882815/pkg/mod/github.com/buraksezer/[email protected]/consistent.go:196 +0xc0
    github.com/buraksezer/consistent.(*Consistent).Add(0x450060, 0x1604d0, 0x40c150, 0x2b5f)
    	/tmp/gopath552882815/pkg/mod/github.com/buraksezer/[email protected]/consistent.go:227 +0x180
    main.main()
    	/tmp/sandbox195891616/prog.go:30 +0xe0
    

    So is this expected behavior or a bug?
