A faster RWLock primitive in Go, 2-3 times faster than RWMutex. A Go implementation of concurrency control algorithm in paper

Last update: Jan 6, 2023

Comments: 4

Go Left Right Concurrency

A Go implementation of the left-right concurrency control algorithm in paper

This library provides a concurrency primitive for high concurrency reads over a single-writer data structure. The micro benchmark shows the left-right pattern is 2-3x faster than the RWMutex.

Why

In Go, RWMutex is your best choice in most cases when handling concurrent read/write situation. However, in some special extreme cases: you do not want the high-concurrent reads to be blocked by infrequent writes.

In this library, an example left-right-pattern map was implemented to compare with the classic RWMutex-based map.

You can apply the same technique on slice, list, stack etc, to make them concurrent-safe if it meets your case: high concurrency reads over a single-writer.

// Only writes: RWMutex-Map is 2x faster than LR-Map 
BenchmarkLRMap_Write                  	12231741	        93.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkLRMap_Write-2                	12355836	        93.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkLRMap_Write-4                	12448788	        92.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkLockMap_Write                	20728842	        54.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkLockMap_Write-2              	20953489	        54.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkLockMap_Write-4              	20412688	        54.0 ns/op	       0 B/op	       0 allocs/op

// Only reads: RWMutex-Map is slightly faster than LR-Map
BenchmarkLRMap_Read                   	 4505625	       224 ns/op	      39 B/op	       0 allocs/op
BenchmarkLRMap_Read-2                 	 5114546	       212 ns/op	      34 B/op	       0 allocs/op
BenchmarkLRMap_Read-4                 	 5271961	       212 ns/op	      33 B/op	       0 allocs/op
BenchmarkLockMap_Read                 	 7331290	       158 ns/op	      11 B/op	       0 allocs/op
BenchmarkLockMap_Read-2               	 6270297	       162 ns/op	      14 B/op	       0 allocs/op
BenchmarkLockMap_Read-4               	 6244382	       162 ns/op	      14 B/op	       0 allocs/op

// 5 readers vs 1 writer: on 2.4 CPUs, LR-map is 2-3x faster than RWMutex-map
BenchmarkLRMap_Read_Write_5_1         	    5794	    205062 ns/op	      46 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_5_1-2       	    3088	    405067 ns/op	      72 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_5_1-4       	    3014	    415573 ns/op	      81 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_5_1       	    7491	    160224 ns/op	      27 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_5_1-2     	    1210	   1031627 ns/op	      94 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_5_1-4     	    1106	   1067645 ns/op	     101 B/op	       1 allocs/op


// 10 readers vs 1 writer: on 2.4 CPUs, LR-map is 2-3x faster than RWMutex-map
BenchmarkLRMap_Read_Write_10_1        	    3236	    354466 ns/op	      70 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_10_1-2      	    1609	    772303 ns/op	     123 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_10_1-4      	    1593	    726723 ns/op	     132 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_10_1      	    4263	    271487 ns/op	      36 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_10_1-2    	     676	   1726200 ns/op	     154 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_10_1-4    	     604	   2007962 ns/op	     172 B/op	       1 allocs/op

// 50 readers vs 1 writer: on 2.4 CPUs, LR-map is 2x faster than RWMutex-map
BenchmarkLRMap_Read_Write_50_1        	     586	   1733163 ns/op	     310 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_50_1-2      	     340	   3532511 ns/op	     527 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_50_1-4      	     367	   3192197 ns/op	     493 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_50_1      	    1004	   1177003 ns/op	     101 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_50_1-2    	     223	   5803691 ns/op	     433 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_50_1-4    	     195	   5741745 ns/op	     466 B/op	       1 allocs/op

// 100 readers vs 1 writer: on 2.4 CPUs, LR-map is 2x faster than RWMutex-map
BenchmarkLRMap_Read_Write_100_1       	     384	   3126528 ns/op	     471 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_100_1-2     	     182	   6503080 ns/op	     974 B/op	       1 allocs/op
BenchmarkLRMap_Read_Write_100_1-4     	     182	   6345014 ns/op	     969 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_100_1     	     500	   2351661 ns/op	     188 B/op	       1 allocs/op
BenchmarkLockMap_Read_Write_100_1-2   	     100	  10701998 ns/op	     973 B/op	       2 allocs/op
BenchmarkLockMap_Read_Write_100_1-4   	     124	  10646446 ns/op	     776 B/op	       1 allocs/op

What

Left-Right is a technique with some similarities with Double Instance Locking because it uses two instances, and has a mutex to serialize mutations. In this sense, it is reminiscent of the “Double Buffering” technique used in computer graphics.
Left-Right is a generic, linearizable, technique that can provide mutual exclusivity for any object in memory or data structure
Left-Right is non-blocking for read-only operations, and fast
it is Wait-Free Population Oblivious

See all the details: link

How

go get github.com/csimplestring/go-left-right

Then you can use it to wrap any data structures. See the below example.

import github.com/csimplestring/go-left-right/primitive

type LRMap struct {
	*primitive.LeftRightPrimitive

    // you have to provides 2 identical instances
	left  map[int]int
	right map[int]int
}

func newIntMap() *LRMap {

	m := &LRMap{
		left:  make(map[int]int),
		right: make(map[int]int),
	}

	m.LeftRightPrimitive = primitive.New()

	return m
}

func (lr *LRMap) Get(k int) (val int, exist bool) {

    // Go does not have generics, so have to use interface{} for lambda's arguments
	lr.ApplyReadFn(lr.left, lr.right, func(instance interface{}) {
		m := instance.(map[int]int)
		val, exist = m[k]
	})

	return
}

func (lr *LRMap) Put(key, val int) {
	lr.ApplyWriteFn(lr.left, lr.right, func(instance interface{}) {
		m := instance.(map[int]int)
		m[key] = val
	})
}

Owner

wangyi

https://github.com/csimplestring/go-left-right

Similar Resources

A simple thread-safe, fixed size LRU written in Go. Based on dominictarr's Hashlru Algorithm. 🔃

go-hashlru A simple thread-safe, fixed size LRU written in Go. Based on dominictarr's Hashlru Algorithm. 🔃 Uses map[interface{}]interface{} to allow

Dec 5, 2022

A concurrent rate limiter library for Golang based on Sliding-Window rate limiter algorithm.

ratelimiter A generic concurrent rate limiter library for Golang based on Sliding-window rate limitng algorithm. The implementation of rate-limiter al

Jan 6, 2023

golang implement gossip algorithm

Dec 1, 2022

Scalable golang ratelimiter using the sliding window algorithm. Currently supports only Redis.

go-ratelimiter Scalable golang ratelimiter using the sliding window algorithm. Currently supports only Redis. Example usage client := redis.NewClient

Oct 19, 2021

EVM-compatible chain secured by the Lachesis consensus algorithm.

ICICB-Galaxy EVM-compatible chain secured by the Lachesis consensus algorithm. Building the source Building galaxy requires both a Go (version 1.14 or

Oct 7, 2021

Faster golang reflection

In order to solve a problem that can work with any golang type, one has no choice but to use reflection. Native golang reflection comes with hefty performance price, on benchmarking simple getter/setter case to manipulate struct dynamically I've seen around 100 time worst performance comparing to statically typed code.

Dec 21, 2022

Package ethtool allows control of the Linux ethtool generic netlink interface.

ethtool Package ethtool allows control of the Linux ethtool generic netlink interface.

Dec 14, 2022

sigbypass4xx is a utility to automate well-know techniques used to bypass access control restrictions.

sigbypass4xx sigbypass4xx is a utility to automate well-know techniques used to bypass access control restrictions. Resources Usage Installation From

Nov 9, 2022

go implementation of timsort

timsort timsort is a Go implementation of Tim Peters's mergesort sorting algorithm. For many input types it is 2-3 times faster than Go's built-in sor

Nov 7, 2022

Comments

moves primitives to package primitive
fixes requiring namespace overriding of this package as an import

import primitive "github.com/csimplestring/go-left-right"

becomes

import "github.com/csimplestring/go-left-right/primitive"

This also had to fix the issue that I have addressed in my earlier PR (moving to primitive) exposes the bug
suggested changes

A testrun and tried out tevino/abool in stead of atomic int.

you have a bug in ApplyWriteFn which is attached to the LRMap, not the primitive

I moved the code into folders, this also fixed having to rename import as you'd have to with using the root (dashes in the repo name)

GL

[Discussion] readerIndicator is not conclusively faster than RWMutex

Hi,

I tried rewriting the project but used a linked left-right approach. also I thought the readerInicator looked a lot like an RWMutex, so I tried with that in stead.

I think the resulting code is a lot more readable: linked-left-right

But I also found that this solution performed quite a lot better in moderate load situations (break-even about 50 goroutines)

your master branch:

goos: linux
goarch: amd64
pkg: github.com/csimplestring/go-left-right
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
BenchmarkLRMap_Write-8                   9962917               111.4 ns/op
BenchmarkLockMap_Write-8                18546358                62.40 ns/op
BenchmarkLRMap_Read-8                    3806511               272.0 ns/op
BenchmarkLockMap_Read-8                  6499641               170.4 ns/op
BenchmarkLRMap_Read_Write_5_1-8             1575            812337 ns/op
BenchmarkLockMap_Read_Write_5_1-8           1669            675591 ns/op
BenchmarkLRMap_Read_Write_10_1-8            1281           1380485 ns/op
BenchmarkLockMap_Read_Write_10_1-8          1042           1107362 ns/op
BenchmarkLRMap_Read_Write_50_1-8             531           2365397 ns/op
BenchmarkLockMap_Read_Write_50_1-8           268           4396724 ns/op
BenchmarkLRMap_Read_Write_100_1-8            331           3951561 ns/op
BenchmarkLockMap_Read_Write_100_1-8          136           8815394 ns/op
BenchmarkLRMap_Read_Write_500_1-8             70          15398926 ns/op
BenchmarkLockMap_Read_Write_500_1-8          139           8690447 ns/op
PASS
ok      github.com/csimplestring/go-left-right  31.352s

my branch MikkelHJuul/linked-left-right:

goos: linux
goarch: amd64
pkg: github.com/csimplestring/go-left-right
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
BenchmarkLRMap_Write-8                   9037975               124.6 ns/op
BenchmarkLockMap_Write-8                19078170                61.88 ns/op
BenchmarkLRMap_Read-8                    4329091               248.6 ns/op
BenchmarkLockMap_Read-8                  6337088               182.4 ns/op
BenchmarkLRMap_Read_Write_5_1-8             3476            359224 ns/op
BenchmarkLockMap_Read_Write_5_1-8           1406            801319 ns/op
BenchmarkLRMap_Read_Write_10_1-8            2132            548529 ns/op
BenchmarkLockMap_Read_Write_10_1-8          1041           1186699 ns/op
BenchmarkLRMap_Read_Write_50_1-8             526           2256797 ns/op
BenchmarkLockMap_Read_Write_50_1-8           290           4430084 ns/op
BenchmarkLRMap_Read_Write_100_1-8            254           4453970 ns/op
BenchmarkLockMap_Read_Write_100_1-8          148           8315832 ns/op
PASS
ok      github.com/csimplestring/go-left-right  29.527s

It's a fun data structure to work with :)

New reader indicator

In this PR, a new technique called 'ingress/egress' is used to reduce the read contention on the read-indicator. The core idea is to use the LongAdder (similar JDK LongAdder), which spread the contention in different cells. This is very useful and scalable when there are too many readers.

The benchmark comparison results are:

Benchmark-LongAdder-LRMap_Read_Write_100_1       	     249	   4649508 ns/op	     707 B/op	       1 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_100_1-2     	     172	   7267487 ns/op	    1055 B/op	       2 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_100_1-4     	     254	   4801497 ns/op	     975 B/op	       3 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_100_1-8     	     315	   3792883 ns/op	    1004 B/op	       4 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_500_1       	      66	  23151452 ns/op	    2647 B/op	       3 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_500_1-2     	      33	  40375654 ns/op	    5535 B/op	       7 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_500_1-4     	      46	  24362344 ns/op	    6045 B/op	      18 allocs/op
Benchmark-LongAdder-LRMap_Read_Write_500_1-8     	      67	  17489811 ns/op	    5068 B/op	      17 allocs/op


Benchmark-AtomicAdder-LRMap_Read_Write_100_1       	     368	   3177685 ns/op	     481 B/op	       1 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_100_1-2     	     174	   6815903 ns/op	    1011 B/op	       1 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_100_1-4     	     181	   6525143 ns/op	     973 B/op	       1 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_100_1-8     	     201	   6011937 ns/op	     922 B/op	       1 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_500_1       	      75	  15888525 ns/op	    2306 B/op	       2 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_500_1-2     	      31	  33664236 ns/op	    5605 B/op	       5 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_500_1-4     	      34	  32456822 ns/op	    5108 B/op	       5 allocs/op
Benchmark-AtomicAdder-LRMap_Read_Write_500_1-8     	      38	  31595880 ns/op	    5873 B/op	       8 allocs/op

As is shown, the throughput is linear along with the cpu core number and the LongAdder is more scalable then AtomicAdder.

A faster RWLock primitive in Go, 2-3 times faster than RWMutex. A Go implementation of concurrency control algorithm in paper

Go Left Right Concurrency

Why

What

How

Owner

wangyi

Similar Resources

A simple thread-safe, fixed size LRU written in Go. Based on dominictarr's Hashlru Algorithm. 🔃

A concurrent rate limiter library for Golang based on Sliding-Window rate limiter algorithm.

golang implement gossip algorithm

Scalable golang ratelimiter using the sliding window algorithm. Currently supports only Redis.

EVM-compatible chain secured by the Lachesis consensus algorithm.

Faster golang reflection

Package ethtool allows control of the Linux ethtool generic netlink interface.

sigbypass4xx is a utility to automate well-know techniques used to bypass access control restrictions.

go implementation of timsort

Comments

moves primitives to package primitive

suggested changes

[Discussion] readerIndicator is not conclusively faster than RWMutex

New reader indicator

Related tags

Go implementation of the Heaven's Gate technique

ms - 'my story' creates a secure password string which can be memorized with a technique shared by Max.

Slice conversion between primitive types

Cogger is a standalone binary and a golang library that reads an internally tiled geotiff

Generic Free List implementation to reuse memory and avoid allocations

A tool to determine the highest version number that's smaller than a target version number

A small & fast dependency-free library for parsing micro expressions.

Code Generation for Functional Programming, Concurrency and Generics in Golang

Helper library for full uint64 randomness, pool backed for efficient concurrency

Concurrency in Go video course with in depth explanations & examples