Distributed reliable key-value store for the most critical data of a distributed system

etcd

Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order to get stable binaries.

etcd is a distributed reliable key-value store for the most critical data of a distributed system, with a focus on being:

  • Simple: well-defined, user-facing API (gRPC)
  • Secure: automatic TLS with optional client cert authentication
  • Fast: benchmarked 10,000 writes/sec
  • Reliable: properly distributed using Raft

etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.

etcd is used in production by many companies, and the development team stands behind it in critical deployment scenarios, where etcd is frequently teamed with applications such as Kubernetes, locksmith, vulcand, Doorman, and many others. Reliability is further ensured by rigorous testing.

See etcdctl for a simple command line client.

Community meetings

etcd contributors and maintainers have monthly (every four weeks) meetings at 11:00 AM (USA Pacific) on Thursday.

An initial agenda will be posted to the shared Google docs a day before each meeting, and everyone is welcome to suggest additional topics or other agendas.

Time:

Join Hangouts Meet: meet.google.com/umg-nrxn-qvs

Join by phone: +1 405-792-0633 PIN: 299 906#

Getting started

Getting etcd

The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, and Docker on the release page.

For more installation guides, please check out play.etcd.io and operating etcd.

For those wanting to try the very latest version, build etcd from the master branch, as sketched below. This requires Go (version 1.13+). All development occurs on master, including new features and bug fixes. Bug fixes are first targeted at master and subsequently ported to release branches, as described in the branch management guide.
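
A minimal sketch of that build, assuming git and a Go 1.13+ toolchain are already installed (the build script name can differ between branches; ./build is used here):

git clone https://github.com/etcd-io/etcd.git
cd etcd
./build
./bin/etcd --version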

Running etcd

First start a single-member cluster of etcd.

If etcd is installed using the pre-built release binaries, run it from the installation location as below:

/tmp/etcd-download-test/etcd

If the etcd binary is moved onto the system path as below, the etcd command can be run directly:

mv /tmp/etcd-download-test/etcd /usr/local/bin/
etcd

If etcd is built from the master branch, run it as below:

./bin/etcd

This will bring up etcd listening on port 2379 for client communication and on port 2380 for server-to-server communication.

Next, let's set a single key, and then retrieve it:

etcdctl put mykey "this is awesome"
etcdctl get mykey

etcd is now running and serving client requests. For more, please check out the etcd documentation and play.etcd.io.
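
As a quick follow-up, here is a small sketch of watching that key for changes; etcdctl watch blocks and prints each update as it arrives, so run the put from a second terminal:

etcdctl watch mykey
# in another terminal:
etcdctl put mykey "new value"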

etcd TCP ports

The official etcd ports are 2379 for client requests, and 2380 for peer communication.
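
For illustration, a hedged sketch of setting these ports explicitly when starting a single member (the name infra0 and the address 192.168.1.10 are placeholders for the host's actual values):

etcd --name infra0 \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://192.168.1.10:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-advertise-peer-urls http://192.168.1.10:2380 \
  --initial-cluster infra0=http://192.168.1.10:2380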

Running a local etcd cluster

First install goreman, which manages Procfile-based applications.

Our Procfile script will set up a local example cluster. Start it with:

goreman start

This will bring up 3 etcd members infra1, infra2 and infra3, plus an optional etcd grpc-proxy, all running locally and composing a cluster.

Every cluster member and proxy accepts key value reads and key value writes.
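
For example, a write sent to one member should be readable from the others. A small sketch, assuming the example Procfile's default client ports of 2379, 22379 and 32379 (check the Procfile if they differ):

etcdctl --endpoints=localhost:2379 put foo bar
etcdctl --endpoints=localhost:22379 get foo
etcdctl --endpoints=localhost:32379 get foo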

Follow the steps in Procfile.learner to add a learner node to the cluster. Start the learner node with:

goreman -f ./Procfile.learner start
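
Once the learner has caught up with the leader's log, it can be promoted to a voting member. A hedged sketch using etcdctl (v3.4+); the member ID is whatever the member list reports for the learner:

etcdctl member list -w table      # the learner shows IS LEARNER = true
etcdctl member promote <member-ID>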

Next steps

Now it's time to dig into the full etcd API and other guides.

Contact

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

Reporting bugs

See reporting bugs for details about reporting any issues.

Reporting a security vulnerability

See security disclosure and release process for details on how to report a security vulnerability and how the etcd team manages it.

Issue and PR management

See issue triage guidelines for details on how issues are managed.

See PR management for guidelines on how pull requests are managed.

etcd Emeritus Maintainers

These emeritus maintainers dedicated a part of their career to etcd and reviewed code, triaged bugs, and pushed the project forward over a substantial period of time. Their contribution is greatly appreciated.

  • Fanmin Shi
  • Anthony Romano

License

etcd is under the Apache 2.0 license. See the LICENSE file for details.

Comments
  • Random performance issue on etcd 3.4

    Hello,

    We are running a 5 node etcd cluster:

    $ etcdctl endpoint status 
    etcd01, 507905ef22a349ce, 3.4.7, 3.1 GB, true, false, 785, 17694912855, 17694912855, 
    etcd02, 96622104eaa8652d, 3.4.7, 3.1 GB, false, false, 785, 17694912881, 17694912880, 
    ectd03, e91fce12ee84c080, 3.4.7, 3.1 GB, false, false, 785, 17694912903, 17694912903, 
    etcd04, 400fc14411f50272, 3.4.7, 3.1 GB, false, false, 785, 17694912989, 17694912985, 
    etcd05, 87c46f0b178dc777, 3.4.7, 3.1 GB, false, false, 785, 17694913043, 17694913028, 
    

    And we're seeing some weird performance issues, e.g.:

    # etcdctl endpoint health
    etcd01 is healthy: successfully committed proposal: took = 12.462058ms
    etcd03 is healthy: successfully committed proposal: took = 18.826686ms
    etcd02 is healthy: successfully committed proposal: took = 19.418745ms
    etcd04 is healthy: successfully committed proposal: took = 24.314474ms
    etcd05 is healthy: successfully committed proposal: took = 244.761598ms
    
    # etcdctl endpoint health
    etcd01 is healthy: successfully committed proposal: took = 13.505405ms
    etcd03 is healthy: successfully committed proposal: took = 21.905048ms
    etcd04 is healthy: successfully committed proposal: took = 22.569332ms
    etcd02 is healthy: successfully committed proposal: took = 23.10597ms
    etcd05 is healthy: successfully committed proposal: took = 24.182998ms
    
    # etcdctl endpoint health
    etcd05 is healthy: successfully committed proposal: took = 24.854541ms
    etcd01 is healthy: successfully committed proposal: took = 86.045049ms
    etcd03 is healthy: successfully committed proposal: took = 171.771975ms
    etcd04 is healthy: successfully committed proposal: took = 576.218846ms
    etcd02 is healthy: successfully committed proposal: took = 1.06666032s
    

    I'm not sure how to debug it; it looks pretty random. Feel free to ask for more info!
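
    Not an answer, but a hedged sketch of the first checks that might narrow this down, looking for slow disk syncs on the outlier member (the endpoint name is a placeholder; add whatever TLS flags the cluster needs):

    # built-in latency check against one member
    etcdctl --endpoints=https://etcd05:2379 check perf

    # disk-related latency histograms exposed by each member
    curl -s https://etcd05:2379/metrics | grep -E 'wal_fsync_duration|backend_commit_duration'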

  • Data inconsistency in etcd version 3.3.11

    The etcdctl get command sometimes returns a value and sometimes does not, even though the key is present in etcd. You can see the following command outputs, executed immediately one after another.

    bash-4.4$ etcdctl put /test thisistestvalue
    OK
    bash-4.4$ etcdctl get /test
    bash-4.4$
    bash-4.4$ etcdctl get /test
    bash-4.4$ etcdctl get /test
    /test
    thisistestvalue
    bash-4.4$ etcdctl get /test
    /test
    thisistestvalue

    From the commands below, we can see the inconsistency happen: each time we query using etcdctl get, the create_revision is different, sometimes giving different values.

    bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
    {"header":{"cluster_id":10661059405016682411,"member_id":7511149175418186860,"revision":36793,"raft_term":16}}
    bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
    {"header":{"cluster_id":10661059405016682411,"member_id":14491470182485552592,"revision":10495,"raft_term":16},"kvs":[{"key":"L3Rlc3Q=","create_revision":6830,"mod_revision":6830,"version":1,"value":"dGVzdHZhbHVl"}],"count":1}
    bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
    {"header":{"cluster_id":10661059405016682411,"member_id":12240595110633392601,"revision":36802,"raft_term":16}}
    bash-4.4$
    bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
    {"header":{"cluster_id":10661059405016682411,"member_id":12240595110633392601,"revision":36818,"raft_term":16},"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":33064,"mod_revision":33064,"version":1,"value":"dmFsdWV0ZXN0"}],"count":1}
    bash-4.4$
    bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
    {"header":{"cluster_id":10661059405016682411,"member_id":14491470182485552592,"revision":10511,"raft_term":16},"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":3641,"mod_revision":3641,"version":1,"value":"bXl0ZXN0dmFsdWU="}],"count":1}
    bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
    {"header":{"cluster_id":10661059405016682411,"member_id":7511149175418186860,"revision":36819,"raft_term":16},"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":33064,"mod_revision":33064,"version":1,"value":"dmFsdWV0ZXN0"}],"count":1}

    Check the delete operation test below: even after performing a delete, we are sometimes still able to get a value for the deleted key.

    bash-4.4$ etcdctl put /temp/test mytestvalue
    OK
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$
    bash-4.4$ etcdctl del /temp/test
    1
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$ etcdctl del /temp/test
    0
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue
    bash-4.4$ etcdctl del /temp/test
    0
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    bash-4.4$ etcdctl get /temp/test
    /temp/test
    mytestvalue

    This kind of data inconsistency is seen in etcd, even though etcd guarantees data consistency. Could you please help us understand the issue here? What is happening exactly?
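
    The JSON outputs above already show different members answering with very different revisions (~36.8k vs ~10.5k), which suggests one member is far behind or has diverged. A hedged way to confirm this is to compare per-member revisions and KV hashes; $ENDPOINTS is a placeholder for the full comma-separated member list (endpoint hashkv requires etcd 3.3+):

    ETCDCTL_API=3 etcdctl --endpoints=$ENDPOINTS endpoint status -w table
    ETCDCTL_API=3 etcdctl --endpoints=$ENDPOINTS endpoint hashkv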

  • 3.5 release next steps: Code freeze Monday 5/17

    etcd 3.5 has seen a tremendous amount of new features, bug fixes, and performance and stability improvements. At this time I would like to thank everyone for the hard work and dedication. As per the community meeting on May 6th, we have outlined Monday, May 17th as the code freeze for the 3.5 release. Below is the proposed series of events, for input.

    Steps:

    • [x] cut release-3.5 branch from master (now main). All future merges to main will be 3.6 moving forward (Monday AM 5/17).
    • [x] cut new v3.5.0-beta.3
    • [x] cut new modularized release v3.5.0-beta.3 (Monday AM 5/17).
    • [x] begin correctness and performance/stress testing (ideas, ownership?)

    After testing and validation of the beta, we can consider a sequential beta release or move towards a v3.5 release candidate.

    Your input is greatly appreciated.

    cc @xiang90 @wenjiaswe @ptabor @gyuho @jingyih @lilic @jpbetz @spzala

  • Bad latency (>100ms) of storage

    https://github.com/coreos/etcd/pull/4070 added a new benchmark for stressing etcd storage. It found very high latency in the put process.

    When the number of written keys is small (e.g. 1000), the result looks like this:

    total: 2.686489ms
    average: 2.686µs
    minimum latency: 1.545µs
    maximum latency: 18.209µs
    

    When the number of keys is larger (e.g. 10000), the result will be like this:

    total: 184.821125ms
    average: 18.482µs
    rate: 54106.3691
    minimum latency: 1.719µs
    maximum latency: 142.160798ms
    

    The high latency is also confirmed by @xiang90 : https://github.com/coreos/etcd/pull/4070#issuecomment-168287088

    The result can be easily reproduced with the benchmark e.g. ./benchmark storage put --total 10000.

  • client: file and environment variables based configuration

    The current etcd client library doesn't provide a mechanism for creating a client.Client object from a file or from environment variables, so users must implement this themselves in their applications.

    To avoid duplicated implementations, this PR adds new functions for creating a client from a file or from environment variables, based on viper.

    NewWithFile(client.Config, configPath string): create a new Client from a file. Any file format supported by viper is accepted.

    NewWithEnv(client.Config): create a new Client from values in environment variables.

    Like the existing New(), the application must pass a client.Config object because some fields (e.g. Transport) aren't related to configuration.

    Fixes https://github.com/coreos/etcd/issues/4008

  • Move etcd to github.com/etcd-io/etcd*

    We are moving etcd and other sub-projects to their own GitHub organization.

    The new org will be https://github.com/etcd-io.

    Many popular Go projects have done this, for better project management:

    • https://github.com/google/protobuf/issues/4796
    • https://github.com/kubernetes/kubernetes/issues/12211, https://github.com/kubernetes/kubernetes/issues/29014
    • https://github.com/vitessio/vitess/pull/3702, https://github.com/vitessio/vitess/pull/3725

    Some of our motivations are:

    • Better team management.
    • Better CI resource utilization; currently, etcd relies on a free-tier public CI service, and sharing all resources with other github.com/coreos projects slows down the development process.
    • More visibility for sub-projects, and the ability to adopt more community projects under the etcd organization.

    Move github.com/coreos/etcd and github.com/coreos/bbolt:

    • [x] Decide which namespace we will be using (1 ~ 2 weeks)
    • [x] Announce to etcd and Kubernetes communities (Mon, August 6, 2018)
    • [x] Update all internal github.com/coreos/etcd import paths in all branches (5PM PST, Mon, August 27, 2018)
    • [x] Disable CI integration with current org (5PM PST, Mon, August 27, 2018)
    • [x] Transfer ownership (5PM PST, Mon, August 27, 2018)
    • [x] Set up new CIs (5PM PST, Mon, August 27, 2018)
      • Add environment variable ETCD_ELECTION_TIMEOUT_TICKS=600
    • [x] Make sure old URL redirects to new URL (5PM PST, Mon, August 27, 2018)
    • [x] Set up vanity import paths to go.etcd.io/$proj

    Sub-projects that do not have downstream projects can be transferred right away:

    • [x] https://github.com/coreos/cetcd (9PM PST, Tue, August 7, 2018)
    • [x] https://github.com/coreos/zetcd (9PM PST, Tue, August 7, 2018)
    • [x] https://github.com/coreos/gofail (9PM PST, Tue, August 7, 2018)
    • [x] https://github.com/coreos/dbtester (Mon, August 6, 2018)
    • [x] https://github.com/coreos/etcdlabs (Mon, August 6, 2018)
    • [x] https://github.com/coreos/etcd-play (Mon, August 6, 2018)
    • [x] https://github.com/coreos/protodoc (Mon, August 6, 2018)
    • [x] https://github.com/coreos/jetcd (9PM PST, Tue, August 7, 2018)

    Projects that won't be moved:

    • https://github.com/coreos/etcd-operator
    • https://github.com/coreos/discovery.etcd.io

    Note: GitHub will redirect all requests to new URL.

    /cc @xiang90 @jpbetz @lburgazzoli @philips @jberkus

  • Durability API guarantee broken in single node cluster

    I observed the possibility of data loss and I would like the community to comment / correct me otherwise.

    Before explaining that, I would like to walk through the happy path when a user does a PUT <key, value>. I have tried to include only the necessary steps to keep the focus on this issue, and I consider a single etcd instance.

    ==================================================================================== ----------api thread --------------

    User calls etcdctl PUT k v

    It lands in v3_server.go::put function with the message about k,v

    Call delegates to series of function calls and enters v3_server.go::processInternalRaftRequestOnce

    It registers for a signal with wait utility against this keyid

    Call delegates further to series of function calls and enters raft/node.go::stepWithWaitOption(..message..)

    It wraps this message in a msgResult channel and updates its result channel; then sends this message to propc channel.

    After sending it waits on msgResult.channel ----------api thread waiting --------------

    On seeing a message in propc channel, raft/node.go::run(), it wakes up and sequence of calls adds the message.Entries to raftLog

    Notifies the msgResult.channel

    ---------- api thread wakes ----------

    10. Upon seeing the msgResult.channel, the api thread wakes, returns down the stack back to v3_server.go::processInternalRaftRequestOnce, and waits for the signal that it registered at step 4.

    ---------- api thread waiting ----------

    In the next iteration of raft/node.go::run(), it gets the entry from the raftLog and adds it to readyc.

    etcdserver/raft.go::start wakes up on seeing this entry in readyc, adds the entry to the applyc channel, and synchronously writes it to the wal log ---------------------> wal log

    etcdserver/server.go wakes up on seeing the entry in the applyc channel (added in step 12).

    From step 14, the call goes through a series of calls and lands in server.go::applyEntryNormal.

    applyEntryNormal calls applyV3.apply, which eventually puts the KV into the mvcc kvstore txn kvindex.

    applyEntryNormal now sends the signal for this key, which wakes up the api thread that is waiting in step 7.

    ---------- api thread wakes ----------

    18. The user thread here wakes and sends back the acknowledgement.

    ---------- user sees ok ----------

    The batcher flushes the entries added to the kvstore txn kvindex to the database file. (This can also happen before step 18, based on its timer.)

    Here, if the thread in step 13 is pre-empted by the underlying operating system and only rescheduled after step 18 completes, and there is a power failure at the end of step 18, right after the user sees ok, then the kv is written neither to the wal nor to the database file.

    I think this is not seen today because the window is small: the server has to restart immediately after step 18 (and immediately after step 12 the underlying OS must have pre-empted etcdserver/raft.go::start and added it to the end of the runnable queue). Given these multiple conditions, it appears that we don't see data loss.

    But it appears from the code that it is possible. To simulate it, I added a sleep after step 12 (also an exit) and after step 19. I was able to see ok, but the data is in neither the wal nor the db.

    If I am not correct, my apologies; please correct my understanding.

    Before the repro, please make these changes:

    1. Apply the code changes in raft.go (see attached image).

    2. Apply the code changes in tx.go (see attached image).

    3. Rebuild the etcd server.

    Now follow the steps to repro:

    // 1. Start the etcd server with the changes

    // 2. Add a key value. Allow etcdserver to acknowledge and exit immediately (just a sleep and exit to simulate the explanation)
    $ touch /tmp/exitnow; ./bin/etcdctl put /k1 v1
    OK

    // 3. Remove the control flag file and restart the etcd server
    $ rm /tmp/exitnow

    // 4. Check if the key is present
    $ ./bin/etcdctl get /k --prefix
    $

    // We can see no key-value

  • etcd not compatible with grpc v1.30.0

    To reproduce:

    $ go get google.golang.org/grpc@latest
    go: google.golang.org/grpc latest => v1.30.0
    $ go mod tidy
    ...
    go.etcd.io/etcd/v3/clientv3/naming imports
            google.golang.org/grpc/naming: module google.golang.org/grpc@latest found (v1.30.0), but does not contain package google.golang.org/grpc/naming
    

    This is a problem for dependents of etcd who want to use the latest grpc.

    The naming package was deprecated in favour of resolver and removed in https://github.com/grpc/grpc-go/pull/3314.

    I can work on a pull request if that's helpful.
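
    Until the clientv3/naming code stops depending on grpc/naming, a hedged workaround for dependents is to pin grpc to a release that still ships the package (anything below v1.30.0, e.g. v1.29.1):

    go get google.golang.org/grpc@v1.29.1
    go mod tidy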

  • clientv3: grpc-go (v1.27.0) made API changes to balancer / resolver.

    After the release of grpc-go v1.27.0, an error occurred while pulling etcd / clientv3. The steps to reproduce it are as follows:

    1. go.mod:
    go 1.13
    
    require (
    	github.com/coreos/etcd v3.3.18+incompatible // indirect
    	github.com/coreos/go-systemd v0.0.0-00010101000000-000000000000 // indirect
    	github.com/coreos/pkg v0.0.0-20180928190104-399ea9e2e55f // indirect
    	github.com/gogo/protobuf v1.3.1 // indirect
    	github.com/google/uuid v1.1.1 // indirect
    	go.etcd.io/etcd v3.3.18+incompatible // indirect
    	go.uber.org/zap v1.13.0 // indirect
    	google.golang.org/grpc v1.27.0 // indirect
    )
    
    replace github.com/coreos/go-systemd => github.com/coreos/go-systemd/v22 v22.0.0
    
    2. command:
    $ go get go.etcd.io/etcd/clientv3
    # github.com/coreos/etcd/clientv3/balancer/resolver/endpoint
    ../../go/pkg/mod/github.com/coreos/etcd@v3.3.18+incompatible/clientv3/balancer/resolver/endpoint/endpoint.go:114:78: undefined: resolver.BuildOption
    ../../go/pkg/mod/github.com/coreos/etcd@v3.3.18+incompatible/clientv3/balancer/resolver/endpoint/endpoint.go:182:31: undefined: resolver.ResolveNowOption
    # github.com/coreos/etcd/clientv3/balancer/picker
    ../../go/pkg/mod/github.com/coreos/etcd@v3.3.18+incompatible/clientv3/balancer/picker/err.go:37:44: undefined: balancer.PickOptions
    ../../go/pkg/mod/github.com/coreos/etcd@v3.3.18+incompatible/clientv3/balancer/picker/roundrobin_balanced.go:55:54: undefined: balancer.PickOptions
    
    3. PR: https://github.com/etcd-io/etcd/pull/11564
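
    A hedged workaround that was widely used before that PR is to pin grpc back to v1.26.0 with a replace directive, since the released etcd clientv3 still expects the older balancer/resolver API:

    go mod edit -replace google.golang.org/grpc=google.golang.org/grpc@v1.26.0
    go mod tidy
    go get go.etcd.io/etcd/clientv3
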
  • Proxies & Config API

    Overview

    This pull request adds the ability for nodes to join as proxies when the cluster size limit is reached.

    Changes

    • Peer & Proxy Management added to Registry.
    • Mode added to PeerServer. This can be either PeerMode or ProxyMode.
    • JoinCommand sets overflow nodes as proxies.
    • RemoveCommand removes proxies in addition to peers.
    • Leader responses return X-Leader-Peer-URL & X-Leader-Client-URL headers.
    • New Error (402): Proxy Internal Error

    Questions

    How should we update the proxy urls?

    I would like to minimize the internal state that proxies have to hold and make them as dumb as possible. Currently it retrieves the proxy url via the initial join command. It can also update the proxy urls if the leader changes but the original leader is still available. However, it doesn't handle a leader failure.

    My thought is to simply have a new leader continually ping the proxies after leader change until all proxies have been notified.

    The other option would be to maintain a cluster configuration within the proxies but that means that we're maintaining more state.

    /cc @philips @xiangli-cmu

  • Etcd size sometimes starts growing and grows until "mvcc: database space exceeded"

    We have already observed a few cases where suddenly (after days of running a GKE cluster) the size of the database starts growing.

    As an example, we have a cluster that was running without any issues for ~2 weeks (the database size was ~16MB), and then its database started growing. The growth wasn't immediate - it took ~2 days before it reached the 4GB limit, and the growth happened in steps. For reference, we have backups (snapshots, taken via etcdctl snapshot) that reflect the growth speed (the name contains the time when each was made):

      ... // all snapshots are roughly 16MB
      2017-05-24T04:57:24-07:00_snapshot.db 16,27 MB
      2017-05-24T05:57:26-07:00_snapshot.db 29,06 MB
      2017-05-24T06:57:30-07:00_snapshot.db 108,98 MB
      2017-05-24T07:57:36-07:00_snapshot.db 177,57 MB
      2017-05-24T08:57:51-07:00_snapshot.db 308,4 MB        
      2017-05-24T09:58:32-07:00_snapshot.db 534,54 MB
      2017-05-24T11:00:16-07:00_snapshot.db 655,73 MB
      2017-05-24T12:00:55-07:00_snapshot.db 764,22 MB
      ... // all snapshots of the same size
      2017-05-25T15:15:10-07:00_snapshot.db 764,22 MB
      2017-05-25T16:16:25-07:00_snapshot.db 818,14 MB
      2017-05-25T17:26:35-07:00_snapshot.db 963,93 MB
      ... // all snapshots of the same size
      2017-05-25T22:25:08-07:00_snapshot.db 963,93 MB
      2017-05-25T23:27:03-07:00_snapshot.db 1,56 GB
      2017-05-26T00:30:13-07:00_snapshot.db 1,56 GB
      2017-05-26T01:05:24-07:00_snapshot.db 1,56 GB
      2017-05-26T02:24:21-07:00_snapshot.db 2,18 GB
      ... // all snapshots of the same size
      2017-05-26T08:43:07-07:00_snapshot.db 2,18 GB
      2017-05-26T09:46:47-07:00_snapshot.db 2,19 GB
      ... // all snapshots of the same size
      2017-05-26T16:11:31-07:00_snapshot.db 2,19 GB
      2017-05-26T17:16:47-07:00_snapshot.db 2,65 GB
      2017-05-26T18:22:37-07:00_snapshot.db 3,12 GB
      2017-05-26T19:29:07-07:00_snapshot.db 3,86 GB
      2017-05-26T20:33:24-07:00_snapshot.db 4,6 GB
      <boom>
    

    We've checked that we were doing compaction regularly, every 5m, for the whole time - so it doesn't seem to be the same as https://github.com/coreos/etcd/issues/7944. I'm attaching the relevant lines from the etcd logs in etcd-compaction.txt.

    [Note: times in those logs are UTC, and times in the snapshot names are PST, so there is a 7 hour difference.]

    To summarize, each compaction covered at most a few thousand transactions (so it's not that we did a lot during some 5m period), though there were some longer compactions, up to ~7s.

    I started digging into individual snapshots and found something strange (I was using bolt):

    1. 16MB snapshot:
    Aggregate statistics for 10 buckets
    
    Page count statistics
            Number of logical branch pages: 10
            Number of physical branch overflow pages: 0
            Number of logical leaf pages: 789
            Number of physical leaf overflow pages: 518
    Tree statistics
            Number of keys/value pairs: 1667
            Number of levels in B+tree: 3
    Page size utilization
            Bytes allocated for physical branch pages: 40960
            Bytes actually used for branch data: 26494 (64%)
            Bytes allocated for physical leaf pages: 5353472
            Bytes actually used for leaf data: 3411680 (63%)
    Bucket statistics
            Total number of buckets: 10
            Total number on inlined buckets: 9 (90%)
            Bytes used for inlined buckets: 536 (0%)
    
    2. 534MB snapshot (5 hours later):
    Aggregate statistics for 10 buckets
    
    Page count statistics
            Number of logical branch pages: 65
            Number of physical branch overflow pages: 0
            Number of logical leaf pages: 5559
            Number of physical leaf overflow pages: 107743
    Tree statistics
            Number of keys/value pairs: 13073
            Number of levels in B+tree: 3
    Page size utilization
            Bytes allocated for physical branch pages: 266240
            Bytes actually used for branch data: 186912 (70%)
            Bytes allocated for physical leaf pages: 464084992
            Bytes actually used for leaf data: 451590110 (97%)
    Bucket statistics
            Total number of buckets: 10
            Total number on inlined buckets: 9 (90%)
            Bytes used for inlined buckets: 536 (0%)
    
    3. 1.56GB snapshot (another ~36 hours later):
    Aggregate statistics for 10 buckets
    
    Page count statistics
            Number of logical branch pages: 70
            Number of physical branch overflow pages: 0
            Number of logical leaf pages: 4525
            Number of physical leaf overflow pages: 115179
    Tree statistics
            Number of keys/value pairs: 10978
            Number of levels in B+tree: 3
    Page size utilization
            Bytes allocated for physical branch pages: 286720
            Bytes actually used for branch data: 152723 (53%)
            Bytes allocated for physical leaf pages: 490307584
            Bytes actually used for leaf data: 478196884 (97%)
    Bucket statistics
            Total number of buckets: 10
            Total number on inlined buckets: 9 (90%)
            Bytes used for inlined buckets: 536 (0%)
    
    4. 3.86GB snapshot (another ~18 hours later):
    Aggregate statistics for 10 buckets
    
    Page count statistics
            Number of logical branch pages: 90
            Number of physical branch overflow pages: 0
            Number of logical leaf pages: 6219
            Number of physical leaf overflow pages: 6791
    Tree statistics
            Number of keys/value pairs: 15478
            Number of levels in B+tree: 3
    Page size utilization
            Bytes allocated for physical branch pages: 368640
            Bytes actually used for branch data: 209621 (56%)
            Bytes allocated for physical leaf pages: 53288960
            Bytes actually used for leaf data: 36704465 (68%)
    Bucket statistics
            Total number of buckets: 10
            Total number on inlined buckets: 9 (90%)
            Bytes used for inlined buckets: 536 (0%)
    
    5. 4.6GB snapshot (1 hour later, right before exceeding space):
    Aggregate statistics for 10 buckets
    
    Page count statistics
            Number of logical branch pages: 89
            Number of physical branch overflow pages: 0
            Number of logical leaf pages: 6074
            Number of physical leaf overflow pages: 6713
    Tree statistics
            Number of keys/value pairs: 15173
            Number of levels in B+tree: 3
    Page size utilization
            Bytes allocated for physical branch pages: 364544
            Bytes actually used for branch data: 204788 (56%)
            Bytes allocated for physical leaf pages: 52375552
            Bytes actually used for leaf data: 36092789 (68%)
    Bucket statistics
            Total number of buckets: 10
            Total number on inlined buckets: 9 (90%)
            Bytes used for inlined buckets: 564 (0%)
    

    What is extremely interesting to me is that both:

    • Number of physical leaf overflow pages
    • Bytes allocated for physical leaf pages

    dropped by an order of magnitude in the 3.86GB snapshot, but the total size of the database didn't drop.

    Unfortunately I can't provide any of those snapshots due to privacy reasons, but maybe you can see something we could investigate (or commands whose results we could share) that would help with debugging?
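
    One note that may explain the "usage drops but file size doesn't" pattern: compaction only frees pages inside the bolt file, it never shrinks the file itself; only defragmentation returns the space. A hedged sketch of the usual compact-then-defrag sequence, run per member (the revision extraction one-liner follows the etcd maintenance docs):

    rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
    ETCDCTL_API=3 etcdctl compact $rev
    ETCDCTL_API=3 etcdctl defrag
    ETCDCTL_API=3 etcdctl alarm disarm    # clear a NOSPACE alarm if one was raised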

    @xiang90 @hongchaodeng @mml @lavalamp

  • Reimplement the tool `rw-heatmaps` using golang and rename it to `rw-benchmark`

    Points:

    • The original plot_data.py is out of maintenance.
    • Remove the dependency on python and related python modules.
    • It isn't best practice to compare benchmark results of two different branches (e.g. main vs dev) in two separate charts; it's better to display the benchmarks being compared in one chart.

    Please see the example HTML page at rw_benchmark.html. One of the line charts is below,

    etcd rw benchmark (RW Ratio 0 500000, Value Size 16)

    The read QPS is displayed in blue, and the write QPS in red. The data from the second CSV file, if present, is rendered as dashed lines. Note that each line in the chart can be hidden or shown by clicking on its legend entry.

    Note: although there are 2.5K+ lines of change, actually only tools/rw-benchmark/plot_data.go and the tools/rw-benchmark/README.md need to be carefully reviewed.

    Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

  • Add a Code based member for cancel reason in WatchResponse

    What would you like to be added?

    Currently the WatchResponse message has a string-typed member, CancelReason, which is used for storing an error that happened during the watch. It would be better to add a new Code-based member for representing an error in the watch API.

    Why is this needed?

    Using the existing member for error handling isn't good, because it is fragile and makes the behavior of the etcd wire protocol hard to track.

    See also:

    • https://github.com/kubernetes/kubernetes/pull/114403
    • https://github.com/etcd-io/etcd/pull/14995
    • https://github.com/etcd-io/etcd/discussions/14992
  • ETCD node skyrocketing to the maximum of available memory on the node (16GB) in a matter of seconds

    What happened?

    We saw the memory on an etcd node skyrocket to the maximum available memory on the node (16GB) in a matter of seconds.

    Is there anything in the etcd architecture that would cause it to consume 16GB of memory in seconds? Our etcd DB size is less than 800MB.

    What did you expect to happen?

    The cluster to start up normally without a large spike in memory usage.

    How can we reproduce it (as minimally and precisely as possible)?

    Restarting the kubernetes cluster.

    Anything else we need to know?

    No response

    Etcd version (please run commands below)

    $ etcd --version
    # paste output here
    etcd Version: 3.4.13
    Git SHA: GitNotFound
    Go Version: go1.15.8
    Go OS/Arch: linux/amd64
    
    $ etcdctl version
    # paste output here
    etcdctl version: 3.4.13
    API version: 3.4
    

    Etcd configuration (command line flags or environment variables)

    paste your configuration here

    Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

    $ etcdctl member list -w table
    # paste output here
    +------------------+---------+------------------+---------------------------+---------------------------+------------+
    |        ID        | STATUS  |       NAME       |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
    +------------------+---------+------------------+---------------------------+---------------------------+------------+
    | 20db1dfdde84d300 | started | <obfuscated> | https://<obfuscated>:2380 | https://<obfuscated>:2379 |      false |
    | 46fa1903d0e73fed | started | <obfuscated> | https://<obfuscated>:2380 | https://<obfuscated>:2379 |      false |
    | 621e0ad80f4b7107 | started | <obfuscated> | https://<obfuscated>:2380 | https://<obfuscated>:2379 |      false |
    | 790a135280172971 | started | <obfuscated> | https://<obfuscated>:2380 | https://<obfuscated>:2379 |      false |
    | bff106870c7c1525 | started | <obfuscated> | https://<obfuscated>:2380 | https://<obfuscated>:2379 |      false |
    +------------------+---------+------------------+---------------------------+---------------------------+------------+
    
    $ etcdctl --endpoints=<member list> endpoint status -w table
    # paste output here
    +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://<obfuscated> | 790a135280172971 |  3.4.13 |  817 MB |     false |      false |      1145 | 3225584887 |         3225584887 |        |
    +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    
    

    Relevant log output

    No response
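
    Not an explanation, but a hedged first step would be to compare etcd's own memory metrics against the DB size during the spike, to see whether the growth is Go heap (for example snapshot or watch buffers) rather than the keyspace itself (the member address and any TLS flags are placeholders):

    curl -s http://<member>:2379/metrics | \
      grep -E 'process_resident_memory_bytes|go_memstats_heap_inuse_bytes|etcd_mvcc_db_total_size_in_bytes'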
