Rend: Memcached-Compatible Server and Proxy

A memcached proxy that manages data chunking and L1/L2 caches.

Dev chat: https://gitter.im/Netflix/rend

Rend is a proxy whose primary use case is to sit on the same server as both a memcached process and an SSD-backed L2 cache. It is written in Go and is under active development at Netflix. Some more points about Rend:

  • Designed to handle tens of thousands of concurrent connections
  • Speaks a subset of the memcached text and binary protocols
  • Comes with a load testing and correctness testing client package
  • Modular design to allow different pieces to be replaced

Rend is currently in production at Netflix and serving live member traffic.

Motivation

Caching is used in several ways at Netflix. Some teams use it as a true working-set cache, others as a session cache, and still others as the only storage mechanism for their service. This means that some services can continue as usual with some data loss, while others will permanently lose data and start serving fallbacks. Rend is built to complement EVCache, the primary caching solution in use at Netflix.

The genesis of Rend starts with Memcached memory management. Internally, Memcached keeps a set of slab classes for different data sizes. Slabs are logical groupings of pages, whose size is fixed at startup. Pages map to physical memory and are split into chunks based on the slab class's data size. In versions 1.4.24 and prior, pages were permanently allocated to a particular slab class and never released, even if empty. As well, if the data in RAM developed many holes, there was no compaction, so memory could become badly fragmented over time. Memcached has since improved, and this is less of a problem than it once was.

The second half of the story is within Netflix. Every night, a big batch process computes recommendations for each of our members in multiple steps. Each of these steps loads their output into EVCache. An underlying data source changed one day in such a way that caused the output of one batch compute process to change drastically in size. When the data set was being written to the cache, it was different enough in size to land in a different Memcached slab. The cache was sized to hold one copy of the data, not two, so when the new data was written, the memory filled completely. Once full, Memcached started evicting large portions of newly-computed data while holding on to mostly empty memory in a different slab.

So what was the solution? Take the incoming data and split it into fixed-size chunks prior to inserting into Memcached. This bypassed the complication of the slab allocator. If everything is the same size, there will never be holes that are out of reach for new data. This hardened us against future data changes, which are inevitable. Rend (which means "to tear apart") is the server-side solution to this problem, which also enables much more intelligence on the server.

Components

Rend is a server and a set of libraries that can be used to compose a Memcached-compatible server of your own. It consists of packages for protocol parsing, a server loop, request orchestration (for L1 / L2 logic), and a set of handlers that communicate with the backing storage. It also includes a metrics library that is unintrusive and fast. The memproxy.go file acts as the main function for Rend as a server and showcases the usage of all of the available components.

(Diagram: Rend internals)

Setup and Prerequisites

Dependencies

Everything needed to get started is in this repository. The Basic Server section shows how to stand up a simple server.

To use the proxy in L1-only mode, a Memcached-compatible server must be running on the local machine. For our production deployment, this is Memcached itself. Using the latest version is recommended: it has the full set of features used by the proxy as well as many performance and stability improvements. The version that ships with Mac OS X is very old and does not work. Installation instructions for Memcached are at https://memcached.org.

To run the project in L1/L2 mode, a Rend-based server must be running as the L2. The logic within Rend uses a Memcached protocol extension (the gete command) to retrieve the TTL from the L2. There are plans to make this optional, but it is not yet.

Building Rend also requires a working Go distribution. The latest Go version is used for development.

Get the Source Code

go get github.com/netflix/rend

Build and Run

Rend doesn't require any special build steps. It also does not have any external dependencies. The Go toolchain is used to build and run.

go build github.com/netflix/rend
./rend

Basic Server

Using the default Rend server (memproxy.go)

Getting a basic Rend server running is easy:

go get github.com/netflix/rend
go build github.com/netflix/rend
./rend --l1-inmem

To test it, open another console window and use netcat to try it out (lines prefixed with > are typed input). Note that touch foo 2 sets a 2-second TTL, so the final get, issued after the TTL elapses, misses:

$ nc localhost 11211
> get foo
END
> set foo 0 0 6
> foobar
STORED
> get foo
VALUE foo 0 6
foobar
END
> touch foo 2
TOUCHED
> get foo
VALUE foo 0 6
foobar
END
> get foo
END
> quit
Bye

It should be noted here that the in-memory L1 implementation is functionally correct, but it is for debugging only: it does not free memory when an entry expires and keeps everything in a simple map guarded by an RWMutex.

Using Rend as a set of libraries

To get a working debug server using the Rend libraries, it takes 21 lines of code, including imports and whitespace:

package main

import (
    "github.com/netflix/rend/handlers"
    "github.com/netflix/rend/handlers/inmem"
    "github.com/netflix/rend/orcas"
    "github.com/netflix/rend/server"
)

func main() {
    server.ListenAndServe(
        server.ListenArgs{
            Type: server.ListenTCP,
            Port: 11211,
        },
        server.Default,
        orcas.L1Only,
        inmem.New,
        handlers.NilHandler(""),
    )
}

Testing

Rend comes with a separately developed client library under the client directory. It is used to do load and functional testing of Rend during development.

blast.go

The blast script sends random requests of all types to the target, including:

  • set
  • add
  • replace
  • append
  • prepend
  • get
  • batch get
  • touch
  • get-and-touch
  • delete

The following uses the binary Memcached protocol with 10 worker goroutines (i.e. 10 connections) to send 1,000,000 requests with a key length of 5:

go run blast.go --binary -n 1000000 -p 11211 -w 10 -kl 5

setget.go

Runs sets followed by gets, verifying the contents. Values are between 5 bytes and 20 KiB in length:

go run setget.go --binary -n 100000 -p 11211 -w 10

sizes.go

Runs sets of steadily increasing size to catch errors with specific data sizes. Values range from 0 bytes all the way up to 100 KiB:

go run sizes.go --binary -p 11211

fill.go

Simply sends sets into the cache to test set rate and eviction policy. The following sends 1 billion sets with random 10-character keys on 100 connections:

go run fill.go --binary -p 11211 -h localhost -kl 10 -w 100 -n 1000000000

setops.go

Sends all the different kinds of set operations at the target, including:

  • set
  • add
  • replace
  • append
  • prepend

go run setops.go --binary -p 11211 -n 1000000 -w 10 -kl 3
Comments
  • Add connection options to all the handlers to allow for TCP connections

    It seems that L2 only works with a Unix socket; can't we pass a TCP port?

    func Regular(sock string) handlers.HandlerConst {
    	return func() (handlers.Handler, error) {
    		conn, err := net.Dial("unix", sock)
    
  • Keep track of hot keys

    We need the ability to identify hot keys that are repeatedly hit within a short duration, and to report them so that we can take corrective action.

  • Use bucket histograms as well for collecting latency metrics

    These aren't as accurate in terms of percentiles but aggregate much better across fleets. The Atlas metrics system inside Netflix also has support for bucketized metrics aggregation that we currently can't take advantage of because the data is not placed into fixed buckets.

    This would help us get fleet-wide percentiles that are still approximations, but at least have some basis in sound math and statistics instead of our current average-of-95th method.

  • fix: client textprot Get always returns "END\r\n"

    Hi !

    I was testing my custom handler then I spotted a weird bug :)

    > go run setget.go -p 11211 -w 1 -n 1
    2017/02/09 10:50:42 Done generating keys
    Connected to memcached.
    Setting key AAAB to value of length 8101
    STORED
    
    Set key AAAB
    Getting key AAAB
    VALUE AAAB 0 8101
    
    ZJQFS
    [...]
    LOXFW
    
    END
    
    Got key AAAB
    
    return = [69 78 68 13 10] --> "END\r\n"
    
    2017/02/09 10:50:42 Data returned from server does not match! 
    Data len sent: 8101 
    Data len recv: 5
    2017/02/09 10:50:42 Total comm time: 3421833
    

    This patch should make the textprot Get function return what is expected (the trimmed value).

    Thanks for your work !

  • Add support for Redis protocol

    Thank you for writing and open-sourcing this project! It looks very interesting.

    It looks like Rend only supports the memcached protocol at this time. @tyagihas and I were wondering if there are any plans to add Redis protocol as well, or if not, whether you would accept contributions to generalize the system to support multiple protocols, and add support for Redis.

    What do you think?

  • Add .travis.yml config for Travis CI support.

    The default options for Go projects might be sufficient as per instructions. This config simply adds the setting to use the most-recent Linux distro available on Travis CI (Trusty) and disables sudo to use the more efficient container-based infrastructure.

    Additional customizations can be added once this config is enabled and it's easy to test new iterations via the GitHub PR mechanism, which will get Travis to test each PR automatically.

    Note that a repo admin needs to activate this repo on Travis CI by going to this page: https://travis-ci.org/Netflix/rend

  • Support append and prepend commands

    The append and prepend commands should be supported in both the text and binary protocols. They are part of the subset of commands used at Netflix that will be supported out of the gate. This implies that the external protocols support append and prepend in their form (text or binary) and that the local handlers are able to perform these operations.

  • Batching connection fix

    This change adds error handling to the batching connection handler to recover when memcached crashes or is killed. This was a somewhat large change as most things needed to gain some extra retry logic.

  • Better handling of problematic sets

    Previously, something like an out-of-memory error could leave L1 in an inconsistent state after a set operation. Now, on any kind of error during a set, a delete is sent to L1 afterwards and the operation succeeds even though L1 had an error.

    This PR is the beginning of fixing #102

  • Add SASL support to proxy side

    Hi, are there any plans to add SASL support on the proxy side? It would be nice to move auth from the memcached nodes to the proxy, because proxy servers usually have more CPU headroom and it gives a more unified way to authenticate requests.

  • Use monotonic clocks for timing

    Currently the timing metrics collected use a realtime clock provided by time.Now(). This is inaccurate for timing purposes where we care about the actual amount of time that has passed vs. the time that has passed in the "real" wall-clock time. Clock adjustments can provide wildly inaccurate timing for short-term actions. The answer, unfortunately, is to create a package that can provide a raw monotonic clock read by calling clock_gettime through VDSO.

    This must be done in assembly to conform to the C calling conventions. This is a paved path, which is really nice for us. spacemonkeygo has a proper package to do this:

    https://github.com/spacemonkeygo/monotime/blob/master/mono_linux_amd64.s

    This code can be added to the rend project (along with proper license handling) to handle raw monotonic time.

  • Can't build: problem with timer.Now

    I'm trying to build, but I'm probably missing some variable; could you kindly help me? I am using Linux on amd64 (Debian). The error is the following:

    debian@rend1:~$ go get github.com/netflix/rend
    github.com/netflix/rend
    github.com/netflix/rend/timer.Now: relocation target runtime.__vdso_clock_gettime_sym not defined

  • .travis.yml: The 'sudo' tag is now deprecated in Travis CI

  • Add if statement to remote.Close() call

    After an error in listener.Accept(), the remote object can be nil (https://github.com/Netflix/rend/blob/d3db570668d3ecd97cbdf0988c92145a9040e235/server/listen.go#L122). Calling remote.Close() on that line causes a panic (https://github.com/Netflix/rend/blob/d3db570668d3ecd97cbdf0988c92145a9040e235/server/listen.go#L125).

    We noticed this behaviour because the error "Error accepting connection from remote: accept tcp [::]:11211: accept4: too many open files" causes our implementation to panic.

    Since the wanted behavior is to continue the listener.Accept() loop in case of error, we should wrap this particular remote.Close() call in an if remote != nil check.

    Fixes https://github.com/Netflix/rend/issues/129

  • CAS command

    Hi guys, I've seen that the CAS commands are not implemented since they were not used: https://github.com/Netflix/rend/blob/992c5314c3d257f5ee3489583cdaecca4cfdc101/protocol/binprot/headers.go#L193

    Do you have any plans about this or would you accept PRs?

    Thanks!
