
Ristretto


Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph.

Use Discuss Issues for reporting issues about this repository.

Features

  • High Hit Ratios - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - we use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many goroutines as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal Config values and you're off and running.

Status

Ristretto is usable but still under active development. We expect it to be production ready in the near future.

Usage

Example

package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // number of keys to track frequency of (10M).
		MaxCost:     1 << 30, // maximum cost of cache (1GB).
		BufferItems: 64,      // number of keys per Get buffer.
	})
	if err != nil {
		panic(err)
	}

	// set a value with a cost of 1
	cache.Set("key", "value", 1)

	// wait for value to pass through buffers
	time.Sleep(10 * time.Millisecond)

	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)
	cache.Del("key")
}

Config

The Config struct is passed to NewCache when creating Ristretto instances (see the example above).

NumCounters int64

NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the MaxCost value.

MaxCost int64

MaxCost is how eviction decisions are made. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted.

MaxCost could be anything as long as it matches how you're using the cost values when calling Set.

BufferItems int64

BufferItems is the size of the Get buffers. The best value we've found for this is 64.

If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.

Metrics bool

Metrics is true when you want real-time logging of a variety of stats. The reason this is a Config flag is because there's a 10% throughput performance overhead.

OnEvict func(hashes [2]uint64, value interface{}, cost int64)

OnEvict is called for every eviction.

KeyToHash func(key interface{}) [2]uint64

KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of defaults depending on the underlying interface type.

Note that if you want 128-bit hashes you should use the full [2]uint64; otherwise, just fill the uint64 at position 0 and it will behave like any 64-bit hash.

Cost func(value interface{}) int64

Cost is an optional function you can pass to the Config in order to evaluate item cost at runtime, and only for the Set calls that aren't dropped (this is useful if calculating item cost is particularly expensive and you don't want to waste time on items that will be dropped anyways).

To signal to Ristretto that you'd like to use this Cost function:

  1. Set the Cost field to a non-nil function.
  2. When calling Set for new items or item updates, use a cost of 0.

Benchmarks

The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.

Hit Ratios

Search

This trace is described as "disk read accesses initiated by a large commercial search engine in response to various web search requests."

Database

This trace is described as "a database server running at a commercial site running an ERP application on top of a commercial database."

Looping

This trace demonstrates a looping access pattern.

CODASYL

This trace is described as "references to a CODASYL database for a one hour period."

Throughput

All throughput benchmarks were run on an Intel Core i7-8700K (3.7GHz) with 16 GB of RAM.

Mixed

Read

Write

FAQ

How are you achieving this performance? What shortcuts are you taking?

We go into detail in the Ristretto blog post, but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent admission policy and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all, as we expect the most popular items to be Set multiple times and eventually make it into the cache.

Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.

Comments
  • cmSketch not benefitting from four rows


    I believe the commit f823dc4a broke the purpose of the cmSketch having four rows. The bitwise-xor and mask are not creating the mix of indexes expected by the design. The same Increment/Estimate results, the same values, are achieved from a single row with or without the bitwise-xor. The earlier implementation seems to have been a good and very fast approximation to a distinct hash for each row, except when the high 32 bits were all zero. One solution to fixing the earlier version could be to bitwise-or a single bit into the top 32-bit half. I can provide my testing and benchmarking on this offline if interested.

    For small values of row length as used in the current unit tests, this doesn't matter. As the row length gets larger, the gap between the earlier algorithm and the current one widens and I believe becomes significant.

  • When using `string` as keys, there could be hash collisions, and thus `Get` could return the wrong data


    This is more a "design question" than an actual "issue", however given its implications I think it is an important question which impacts either the design or the usage constraints.

    Given that the KeyToHash (1) function supports string (and []byte), and it returns uint64, it is possible that there are hash collisions.

    However the cache.Get (2) method doesn't check if the found entry (if any) actually has the given key. (In fact the store doesn't even support storing the key.)

    (1) https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L20 (2) https://github.com/dgraph-io/ristretto/blob/master/cache.go#L83

  • MaxCost sometimes ignored


    I have written a simple program that Gets and Sets entries in Ristretto. The program sets a MaxCost (in bytes) and passes a value for the cost to Set based on the size in bytes of the value.

    Code can be found here: https://gist.github.com/erikdubbelboer/bd904c51f00079421fd3eb2a061a50c0

    As you can see from the source Ristretto should be limited to 1GB of memory. The program doesn't do anything strange.

    In most cases this program works fine and Ristretto limits the memory usage correctly. But in the case of the exact code as above the memory usage keeps growing.

    Also keep in mind that runtime.GC() is run every second, so it's not just uncollected allocations we're seeing here. Not running runtime.GC() has the same result, just with more variation in the amount allocated.

    This is the output when run for a couple of seconds (just go run ristretto-bug.go):

    time: 1s alloc: 1033.72 MiB hits: 43 misses: 2614
    time: 2s alloc: 1035.74 MiB hits: 71 misses: 3230
    time: 3s alloc: 1038.76 MiB hits: 98 misses: 3210
    time: 4s alloc: 1048.77 MiB hits: 87 misses: 3236
    time: 5s alloc: 1065.79 MiB hits: 109 misses: 3229
    ...
    time: 45s alloc: 5893.52 MiB hits: 642 misses: 2773
    time: 46s alloc: 6049.54 MiB hits: 704 misses: 2689
    

    The output of http://localhost:6060/debug/pprof/heap?debug=1 shows that all memory is allocated inside strings.Repeat.

    My guess is that in some cases Ristretto keeps references to memory for entries that aren't part of the cache anymore.

  • Port TinyLFU eviction policy from Caffeine


    The implementation included in this change is non-concurrent. It allows injection of admission policies, but provides none yet. Further, storage of key/value is left as a separate exercise; this tracks only whether a key should be cached or evicted. Some small improvements have been made over Caffeine to reduce allocation of list nodes.



  • Question - Keys added and Keys evicted graphs are similar??


    Hey Folks,

    I am testing Ristretto in a QA environment and am still trying to understand a few things. One thing I noticed is that the lines for Keys Added and Keys Evicted are super close to each other.


    I have a feeling that I am doing something wrong here.

    Cache size = 10GB

  • Improve memory performance


    • Use an int64 instead of a time.Time struct to represent the time.
    • By default, include the cost of the storeItem in the cost calculation.

    Related to DGRAPH-1378



  • New release?


    The last release was 112 days ago. Some useful commits have been merged into master since then. Maybe it's time for a new release?

    Changes since the last version:

    • 49dc42c Add Anurag as codeowner (#158) (Anurag)
    • 7a3f2d3 z: use MemHashString and xxhash.Sum64String (#153) (Koichi Shiraishi)
    • 9c31bb2 Check conflict key before updating expiration map. (#154) (Martin Martinez Rivera)
    • 62cb731 Fix race condition in Cache.Clear (#133) (Martin Martinez Rivera)
    • 9af1934 Docs and whitespace changes for readability. (#143) (Martin Martinez Rivera)
  • Why is ristretto using so much memory?


    package main
    
    import (
    	"fmt"
    	"math/rand"
    	"strconv"
    
    	"github.com/dgraph-io/ristretto"
    )
    
    const (
    	maxCnt  = 1e7
    	maxLoop = 1e8
    )
    
    func main() {
    	cahchePool, err := ristretto.NewCache(&ristretto.Config{
    		NumCounters: 1 * 1e7,
    		MaxCost:     200 * (1 << 20), // allocate 200MB, but running it needs 2GB. Why? (When I run this program, it gets killed by OOM.)
    		BufferItems: 64,
    		Metrics:     true,
    	})
    
    	if err != nil {
    		panic(err)
    	}
    
    	key := "def48b5abb6388d3cbbb18e550f47b4cYB6eRI3cK1VN2qCEHp8kvuMuH20dq10cYDcG2e"
    	for i := 0; i < maxLoop; i++ {
    		suffix := strconv.Itoa(rand.Intn(maxCnt))
    		cahchePool.Set(key+suffix, suffix, int64(len(key+suffix)))
    		if (i % maxCnt) == 0 {
    			fmt.Println(cahchePool.Metrics)
    		}
    	}
    
    	cnt := 0
    	for i := 0; i < maxCnt; i++ {
    		if _, found := cahchePool.Get(key + strconv.Itoa(i)); found {
    			cnt++
    		}
    	}
    	fmt.Println("cnt:", cnt, "\n", cahchePool.Metrics)
    }
    
    
  • Ristretto causing memory corruption


    After adding Ristretto to our micro-service orchestrator written in go, we have been having data corruption issues. The cached data has randomly been inaccurate, forcing me to remove this in-memory cache implementation. Not sure how I would go about logging this information. I can try and take some time over the weekend to create an example repo to exhibit the issue.

  • Add ability to hook into the "eviction" event

    Please provide another member onEvict in the Config struct which would be a function that would be called on eviction if it is present. This is needed so that users would be able to do some more, extra house-keeping from their side.

  • feat: support msync for windows


    Hi, I encountered a Windows power-loss problem while using badger, and saw that msync on Windows is not supported.

    I added the implementation and tested it in a Windows 10 VM; it just works.

    see: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-flushviewoffile



  • fix(cmSketch)


    Fixes https://github.com/dgraph-io/ristretto/issues/108.

    Actually there is one fix and one improvement.

    Fix:

    The fix addresses the initial concern raised in https://github.com/dgraph-io/ristretto/issues/108, namely that there had been a prior commit that removed the influence of the row index i on the index used to access the counter field. So while there were four rows, the results were like there was just one row. The benefit of three extra rows in the sketch was lost. This commit fixes that by using part of the spread key differently for each row again. Now the minimum that is computed by the Estimate function really has four distinct values to work with, rather than four copies of the same counter value.

    Improvement:

    A few years ago I had the discussion in https://github.com/dgraph-io/ristretto/issues/108 and ended up getting xor and shift operations from the Caffeine author and the hash-prospector tool to be able to index into the cmSketch rows much more uniformly, even when the keys being given had low entropy.

    In further testing, this spread the indexes being used much more evenly, meaning there would be fewer times the counter value had to be clipped at the max value, and this should logically make the value returned by Estimate more accurate.

    I used the hash-prospector tool for 64-bit hashes to come up with the order of xor and shift operations and the constants to use, but given the size of the search space, of course could not get an exhaustive answer. And in the years since, the tool has improved. I see a lot of progress was made for 32-bit hashes. Someone is welcome to look for new constants for an even better result, or even just to document where the constants come from better than I can now.

  • fix(cmSketch test) and add better cmSketch test


    One Sketch test was intended to check that the seeds had caused each sketch row to be unique but the way the test was iterating, the failure wasn't going to be triggered.

    With this fix, the test passes, seeming to indicate the seeds are working as intended on row creation. But there is a problem.

    A new Sketch test is added that goes to the trouble of sorting the counters of each row to ensure that each row isn't just essentially a copy of another row, where the only difference is which position the counters occupy.

    And the new Sketch test is performed twice: once with high-entropy hashes, because they come from a PRNG, and once with low-entropy hashes, which in many cases will be normal, because they are all small numbers, good for indexing but not good for getting a spread when simply applied to a bitwise XOR operation.

    These new tests show a problem with the counter increment logic within the cmSketch.Increment method, which was most likely introduced by commit f823dc4a.

    A subsequent commit addresses the problems surfaced. But as the discussion from issue #108 shows (discussion later moved to https://discuss.dgraph.io/t/cmsketch-not-benefitting-from-four-rows/8712), other ideas for incrementing the counters were considered by previous authors as well.

    (Marked as draft because this introduces a test that fails for now. I can commit a fix to the cmSketch increment method to get the new test to pass - if a maintainer agrees there is a problem to be fixed. See #108. I tried a few years ago.)
  • fix(Buffer): uint32 -> uint64 in slice methods


    Problem

    I'm currently on Dgraph 21.12 and unable to export my data. The export fails on just two nodes out of 7 with an "Unexpected EOF" error:

    Dec 14 11:31:51 dm-dgraph-04 dgraph[224355]: I1214 11:31:51.840130  224355 log.go:34] Export [01h26m22s] Scan (12): ~2.1 TiB/2.5 TiB at 177 MiB/sec. Sent: 801.5 GiB at 231 MiB/sec. jemalloc: 7.5 GiB
    Dec 14 11:31:55 dm-dgraph-04 dgraph[224355]: W1214 11:31:55.408201  224355 log.go:36] Error while sending: unexpected EOF
    

    Skipping the rather long investigation of this issue: I found that the length of a slice written to the Buffer during export exceeds the size of uint32 (I decoded the varint before the Value field in Badger's KV struct with RDFs to get something around 4.5GB, which is expected for a rather bloated reverse edge to one of the most common nodes in my DB; a count query also returns 72,105,794 connected nodes, which is quite a lot).

    Not to mention that working with int, which is almost always int64, and then casually casting it to uint32 without any checks or warnings is as bad as it gets.

    Solution

    Find any 4 and Uint32 and carefully replace them with 8 and Uint64. As this happens only in slice-related methods, the fix is quite easy. Locally, tests run just fine, but I had to patch the sort one to accommodate the size changes. I also tested the 21.12-related badger version, and its tests run fine too.

    Afterword

    I'm somewhat in a hurry, so sorry for not being able to provide more info; I'll write something here some time later. Also, as far as I can tell, any version of Dgraph should be affected by this. Not sure how I turned out to be the first one.

  • [BUG]: ristretto used about 2 x MaxCost memory??


    What version of Ristretto are you using?

    0.11

    What version of Go are you using?

    go 1.19.3

    Have you tried reproducing the issue with the latest release?

    Yes

    What is the hardware spec (RAM, CPU, OS)?

    4c/8g Linux 5.4.219-126.411.amzn2.x86_64 #1

    What steps will reproduce the bug?

    package main

    import (
    	"fmt"
    	"time"

    	ro "github.com/dgraph-io/ristretto"
    	"github.com/google/uuid"
    )

    func main() {
    	ch := make(chan bool)

    	cache, err := ro.NewCache(&ro.Config{
    		NumCounters:        1000000,           // number of keys to track frequency of (1M).
    		MaxCost:            1024 * 1024 * 100, // maximum cost of cache (100MB).
    		BufferItems:        64,                // number of keys per Get buffer.
    		Metrics:            true,
    		IgnoreInternalCost: false,
    	})
    	if err != nil {
    		panic(err)
    	}

    	for i := 0; i < 10; i++ {
    		go func() {
    			for {
    				raw := make([]byte, 1024*16)
    				cache.Set(uuid.NewString(), raw, int64(len(raw)))
    				<-time.After(1 * time.Millisecond)
    			}
    		}()
    	}
    	go func() {
    		for {
    			fmt.Println("cost:", (cache.Metrics.CostAdded()-cache.Metrics.CostEvicted())/1024.0/1024.0, "MB item:", cache.Metrics.KeysAdded()-cache.Metrics.KeysEvicted())
    			<-time.After(1000 * time.Millisecond)
    		}
    	}()
    	<-ch
    }

    Expected behavior and actual result.

    In my demo the MaxCost is 100MB, but if you top the process you will find it's using 200MB of memory.

    Additional information

    No response

  • chore: use actual allocated size as internal cost


    Currently we calculate the internal item struct cost using unsafe.Sizeof; this MR tries to use roundupsize to return the size of the memory block that mallocgc will allocate.
