Grafana Dskit

This library contains utilities that are useful for building distributed services.

Current state

This library is still in development. During the first stage we plan to move over utilities from the Cortex project.

Contributing

If you're interested in contributing to this project:

License

Apache 2.0 License

Owner
Grafana Labs
Grafana Labs is behind leading open source projects Grafana and Loki, and the creator of the first open & composable observability platform.
Comments
  • Adding min version and cipher suite variables to tls client configuration

    What this PR does: This PR adds MinVersion and CipherSuites to the TLS client configuration. This makes these TLS parameters configurable instead of always using the defaults set by Go.

    Which issue(s) this PR fixes:

    Checklist

    • [x] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • Parallelize memberlist notified message processing

    What this PR does:

    This PR adapts KV memberlist to process notified messages in parallel.

    It aims to facilitate vertical scalability in conditions where UDP packet pressure is high due to a high number of instances in a memberlist cluster.

    Which issue(s) this PR fixes:

    N/A

    Checklist

    • [X] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • Empty ring right after startup when using memberlist

    In Mimir, we're occasionally seeing "empty ring" right after a process starts (e.g. the querier). The issue started after the migration to memberlist.

    Possible root cause

    I think the issue is caused by the ring client implementation not guaranteeing to wait to get the initial ring state before switching to Running state. In the following I share some thoughts about the code.

    The ring client service is expected to switch to Running state only after it initialized its internal state with the ring data structure. This is why it calls r.KVClient.Get() in the Ring.starting(): https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/ring/ring.go#L252-L256

    When using Consul or etcd as backend, the r.KVClient.Get() guarantees to return the state of the ring, but I think this guarantee has been lost in the memberlist implementation and it could return a zero data structure.

    The memberlist client Get() is implemented here: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L63-L70

    It waits until the backend KV client is running. But does waiting for it to be running guarantee the ring data structure to be populated before that? I don't think so.

    The memberlist KV.starting() just initialises memberlist but doesn't join the cluster: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L426-L453

    The memberlist cluster is joined only in KV.running(), but that's too late, because at that point our code assumes the ring data structure is already populated: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L457-L472

  • change the ruler ring key

    What this PR does:

    Changes the ruler's ring key from ring to ruler so it does not conflict with the ingester's ring key.

    Which issue(s) this PR fixes:

    If the ruler and ingester's rings have the same prefix, then they will both register to the same ring. This can result in the querier trying to query the ruler, which fails. As I don't think there's any need to explicitly prevent rings from sharing a common prefix (for instance, if the KV store is being used for other types of data, they may want to prefix everything grafana related under grafana, or loki, for example), we should make sure each ring key is unique.

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
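
The collision described above can be shown with plain string concatenation. This is a toy model in which `kvPath` is a hypothetical helper and the prefix value is an arbitrary example.

```go
package main

import "fmt"

// kvPath models how a KV store key is formed from a shared prefix and a ring key.
func kvPath(prefix, ringKey string) string {
	return prefix + ringKey
}

func main() {
	const prefix = "example/"
	ingester := kvPath(prefix, "ring")
	rulerOld := kvPath(prefix, "ring")  // same key: both components share one ring
	rulerNew := kvPath(prefix, "ruler") // distinct key: separate rings
	fmt.Println(ingester == rulerOld, ingester == rulerNew)
}
```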
  • add util/strings

    What this PR does: This PR ports util/strings from Cortex. It contains utility functions that look for a string in a collection of strings, and a function that builds a map from a collection of strings. These are used by the ring package.

    Checklist

    • [-] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
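
For readers unfamiliar with these helpers, the sketch below shows the kind of functions being described. The names `contains` and `toSet` are placeholders chosen for this example and are not claimed to match the ported API.

```go
package main

import "fmt"

// contains reports whether search is present in values.
func contains(values []string, search string) bool {
	for _, v := range values {
		if v == search {
			return true
		}
	}
	return false
}

// toSet builds a lookup map with every value set to true.
func toSet(values []string) map[string]bool {
	out := make(map[string]bool, len(values))
	for _, v := range values {
		out[v] = true
	}
	return out
}

func main() {
	zones := []string{"zone-a", "zone-b"}
	fmt.Println(contains(zones, "zone-b"), toSet(zones)["zone-c"])
}
```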
  • Add support in the manager to load multiple runtime config files

    What this PR does: This PR lets users provide a comma-separated list of YAML runtime config files. The files are merged into one YAML document, which is then passed to the underlying service.

    Which issue(s) this PR fixes:

    Fixes https://github.com/grafana/mimir/issues/1798

    Checklist

    • [X] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
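
The merge step can be pictured as a recursive merge of already-decoded documents. The sketch below works on plain maps to stay within the standard library. The rule that later files win on conflicting keys is an assumption of this example, and the PR may resolve conflicts differently. `splitFileList` and `mergeMaps` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFileList turns "a.yaml, b.yaml" into its individual file names.
func splitFileList(list string) []string {
	var files []string
	for _, f := range strings.Split(list, ",") {
		if f = strings.TrimSpace(f); f != "" {
			files = append(files, f)
		}
	}
	return files
}

// mergeMaps returns a copy of dst with src merged in. Nested maps are merged
// recursively. For any other conflict the value from src wins (assumption).
func mergeMaps(dst, src map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(dst))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]interface{})
		dstMap, dstIsMap := out[k].(map[string]interface{})
		if srcIsMap && dstIsMap {
			out[k] = mergeMaps(dstMap, srcMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(splitFileList("base.yaml, overrides.yaml"))
	base := map[string]interface{}{"overrides": map[string]interface{}{"tenant-a": 1}}
	extra := map[string]interface{}{"overrides": map[string]interface{}{"tenant-b": 2}}
	fmt.Println(mergeMaps(base, extra))
}
```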
  • Allow custom status page templates

    What this PR does:

    Upstream projects may want to render pages in different ways, applying custom branding where necessary. This allows passing a custom page template to both memberlist and ring status handlers through the configuration.

    I also extracted the templates into separate .gohtml files, which enables proper syntax highlighting in IDEs. Sorry for the noise here, as it's not really important for the change I'm proposing, but I just couldn't stand seeing those long constants alongside the code.

    Which issue(s) this PR fixes:

    First approach for https://github.com/grafana/dskit/issues/148

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • Add LimitedConcurrencySingleFlight

    What this PR does:

    Makes an addition to the concurrency package. WorkerPool allows a function to be run repeatedly and concurrently for each element in an input slice. WorkerPool ensures that only one function is running for each unique element across all WorkerPool invocations. WorkerPool also limits the concurrency of all function invocations across all WorkerPool invocations.

    From the docs of WorkerPool:

    WorkerPool ensures that for any number of concurrent ForEachNotInFlight calls each with any number of tokens only up to numWorkers f invocations are executing concurrently. See the docs of ForEachNotInFlight for the uniqueness semantics of tokens.

    ForEachNotInFlight invokes f for every token in tokens that is not in-flight (not still being executed) in a different concurrent call to ForEachNotInFlight. ForEachNotInFlight returns when invocations to f for all such tokens have returned. Upon context cancellation ForEachNotInFlight stops making new invocations of f for tokens and waits for all already started invocations of f to return. ForEachNotInFlight returns the combined errors from all f invocations.

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [x] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • Allow to override default of `final-sleep` and change default to `0`

    What this PR does: Allows overriding the default of final-sleep and changes the default to 0.

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [ n/a ] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • Expose memberlist label configs

    This makes three changes:

    1. Switches the vendored version of memberlist to a branch in our fork where we fixed the skip-inbound-label-check (PR)
    2. Exposes the relevant options to allow users of dskit to configure the memberlist label feature
    3. Adds unit tests which test 4 configurations of memberlist clusters:
      1. TestMultipleClients Cluster with no labels and skip-inbound-label-check disabled, expected to succeed in joining
      2. TestMultipleClientsWithMixedLabelsAndExpectFailure Cluster where some members have labels and some don't and skip-inbound-label-check is disabled, this cluster is expected to fail because the members can't join each other
      3. TestMultipleClientsWithMixedLabelsAndSkipLabelCheck Cluster where some members have labels and some don't and skip-inbound-label-check is enabled, this cluster is expected to succeed in joining
      4. TestMultipleClientsWithSameLabelWithoutSkipLabelCheck Cluster where all members have the same label and skip-inbound-label-check is disabled, this cluster is expected to succeed in joining

    The above unit tests basically test a migration scenario where we migrate an existing cluster which currently doesn't use labels to use labels by going through the following steps:

    1. Enable skip-inbound-label-check so that members with different labels (some without any label) can join each other
    2. Roll out a label to all processes, since this change gets rolled out slowly across the pods it is important that pods with different labels can join each other into one cluster
    3. Disable skip-inbound-label-check

    Furthermore the unit tests also verify that if skip-inbound-label-check is disabled, processes which have different labels cannot join each other, providing isolation between the processes.
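
The join rule exercised by these tests can be summarised in a deliberately simplified model. `canJoin` is a hypothetical function written for this page. The real check lives inside the memberlist library and operates on packets and streams rather than on whole members.

```go
package main

import "fmt"

// canJoin models whether a member accepts traffic from a peer:
// labels must match unless the inbound label check is skipped.
func canJoin(localLabel, remoteLabel string, skipInboundLabelCheck bool) bool {
	return skipInboundLabelCheck || localLabel == remoteLabel
}

func main() {
	fmt.Println(canJoin("", "", false))         // no labels, check enabled
	fmt.Println(canJoin("", "prod", false))     // mixed labels, check enabled
	fmt.Println(canJoin("", "prod", true))      // mixed labels, check skipped
	fmt.Println(canJoin("prod", "prod", false)) // same label, check enabled
}
```

In this model the migration works because step 1 makes every pairing succeed, step 2 brings all members to the same label, and step 3 then re-enables the check without splitting the cluster.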

  • Change moduleService to implement NamedService

    What this PR does:

    The moduleService has a name, so it's trivial to implement the ServiceName method.

    I came across this while trying to add debug information about running services in Mimir.

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
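
In shape, the change amounts to something like the sketch below. The interface here is reduced to the single method named in the PR description, and the struct is a stand-in with only a name field, so neither should be read as the actual dskit definitions.

```go
package main

import "fmt"

// NamedService is reduced here to the one method relevant to this change.
type NamedService interface {
	ServiceName() string
}

// moduleService is a stand-in that only carries the module name.
type moduleService struct {
	name string
}

// ServiceName returns the module name, satisfying NamedService.
func (m *moduleService) ServiceName() string {
	return m.name
}

func main() {
	var svc NamedService = &moduleService{name: "querier"}
	fmt.Println("running service:", svc.ServiceName())
}
```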
  • [modules]: add test for ordered shutdown

    What this PR does:

    I had some questions about the shutdown process with regard to ordering. I see some of the code is ordering by name, and so here I included a test that checks the shutdown order of the various services.

    Checklist

    • [x] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • etcd client set log

    The etcd client's logger can't be configured, so the client's log output goes to stderr. This may cause SIGPIPE problems when running under systemd. Could the logger be made configurable?

  • Flaky `TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy`

    The test failed in this run, but succeeded in the retry. Revision was https://github.com/grafana/dskit/pull/214/commits/f73a2dbcda33f09b8e800dcc411b4650da30ce0d

    --- FAIL: TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy (88.81s)
        --- FAIL: TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy/num_instances_=_9,_num_zones_=_3,_update_oldest_registered_timestamp_=_true (0.02s)
            ring_test.go:1840: random generator seed: 1663955757664614088
            ring_test.go:1958: subring generated after event 4 is expected to include instance instance-6 from ring state at time 2022-09-23 19:11:57.669186493 +0000 UTC m=+4673.344713179 but it's missing (actual instances are: instance-9, instance-10, instance-2)
    
  • Flaky `TestMultipleClientsWithSameLabelWithClusterLabelVerification`

    The test failed the first run, but succeeded the second. Revision is https://github.com/grafana/dskit/commit/126e8a816b1701654dd6ef3977f513fc67ac5c80

    --- FAIL: TestMultipleClientsWithSameLabelWithClusterLabelVerification (24.37s)
        memberlist_client_test.go:672: Waiting before start
        memberlist_client_test.go:680: Observing ring ...
        memberlist_client_test.go:694: Update 68.522298ms : Ring has 2 members, and 256 tokens, oldest timestamp: 825.359625ms avg timestamp: 825.359625ms youngest timestamp: 825.359625ms
        memberlist_client_test.go:694: Update 153.817701ms : Ring has 3 members, and 384 tokens, oldest timestamp: 910.655027ms avg timestamp: 910.655027ms youngest timestamp: 910.655027ms
        memberlist_client_test.go:694: Update 459.555858ms : Ring has 4 members, and 512 tokens, oldest timestamp: 1.216393192s avg timestamp: 1.216393192s youngest timestamp: 1.216393192s
        memberlist_client_test.go:694: Update 460.579224ms : Ring has 5 members, and 640 tokens, oldest timestamp: 1.217416552s avg timestamp: 1.217416552s youngest timestamp: 1.217416552s
        memberlist_client_test.go:694: Update 461.713676ms : Ring has 6 members, and 768 tokens, oldest timestamp: 1.218551007s avg timestamp: 1.218551007s youngest timestamp: 1.218551007s
        memberlist_client_test.go:694: Update 463.242127ms : Ring has 7 members, and 896 tokens, oldest timestamp: 1.220079455s avg timestamp: 1.220079455s youngest timestamp: 1.220079455s
        memberlist_client_test.go:694: Update 503.895559ms : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.260732883s avg timestamp: 2.260732883s youngest timestamp: 1.260732883s
        memberlist_client_test.go:694: Update 558.913038ms : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.315750343s avg timestamp: 2.315750343s youngest timestamp: 1.315750343s
        memberlist_client_test.go:694: Update 1.002496549s : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.759333874s avg timestamp: 1.759333874s youngest timestamp: 759.333874ms
        memberlist_client_test.go:694: Update 1.084007026s : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.840844352s avg timestamp: 1.840844352s youngest timestamp: 840.844352ms
        memberlist_client_test.go:694: Update 1.13045055s : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.887287876s avg timestamp: 1.887287876s youngest timestamp: 887.287876ms
        memberlist_client_test.go:694: Update 1.136422462s : Ring has 10 members, and 1280 tokens, oldest timestamp: 2.89325979s avg timestamp: 1.89325979s youngest timestamp: 893.25979ms
        memberlist_client_test.go:694: Update 1.258210064s : Ring has 10 members, and 1280 tokens, oldest timestamp: 3.015047392s avg timestamp: 2.015047392s youngest timestamp: 1.015047392s
        memberlist_client_test.go:694: Update 1.278908586s : Ring has 10 members, and 1280 tokens, oldest timestamp: 3.035745902s avg timestamp: 2.035745902s youngest timestamp: 1.035745902s
        memberlist_client_test.go:702: Ring updates observed: 14
        memberlist_client_test.go:723: KV 0: number of known members: 10
        memberlist_client_test.go:620: 
            	Error Trace:	memberlist_client_test.go:620
            	Error:      	Received unexpected error:
            	            	Member 0: invalid state of member Member-9 in the ring: 2 
            	Test:       	TestMultipleClientsWithSameLabelWithClusterLabelVerification
    
  • Add net.LookupIP DNS provider implementation

    As we are looking forward to using dskit inside Grafana for HA, we would require an implementation for the kv/memberlist.DNSProvider. Currently it looks like the interface closely matches the API from Thanos, but we would need something lighter, possibly built over net.LookupIP. This PR introduces the "builtin" DNSProvider implementation and a basic test to make sure it works.

An experimental library for building clustered services in Go

Donut is a library for building clustered applications in Go. Example package main import ( "context" "log" "os" // Wait for etcd client v3.4, t

Nov 17, 2022
The repository aims to share some useful about distributed system

Dec 14, 2021
Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.

Dapr is a portable, serverless, event-driven runtime that makes it easy for developers to build resilient, stateless and stateful microservices that run on the cloud and edge and embraces the diversity of languages and developer frameworks.

Jan 5, 2023
Skynet is a framework for distributed services in Go.

Introduction: Skynet is a communication protocol for building massively distributed apps in Go. It is not constrained to Go, so it will lend itself n

Nov 18, 2022
A distributed, proof of stake blockchain designed for the financial services industry.

Provenance Blockchain Provenance is a distributed, proof of stake blockchain designed for the financial services industry.

Dec 14, 2022
Distributed lock manager. Warning: very hard to use it properly. Not because it's broken, but because distributed systems are hard. If in doubt, do not use this.

What Dlock is a distributed lock manager [1]. It is designed after flock utility but for multiple machines. When client disconnects, all his locks are

Dec 24, 2019
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The main branch may be in an unstable or even broken state during development. For stable versions, see releases. etcd is a distributed rel

Dec 30, 2022
Full-featured BitTorrent client package and utilities

torrent This repository implements BitTorrent-related packages and command-line utilities in Go. The emphasis is on use as a library from other projec

Jan 4, 2023
Dec 27, 2022
A Go library for master-less peer-to-peer autodiscovery and RPC between HTTP services

sleuth sleuth is a Go library that provides master-less peer-to-peer autodiscovery and RPC between HTTP services that reside on the same network. It w

Dec 28, 2022
Lockgate is a cross-platform locking library for Go with distributed locks using Kubernetes or lockgate HTTP lock server as well as the OS file locks support.

Lockgate Lockgate is a locking library for Go. Classical interface: 2 types of locks: shared and exclusive; 2 modes of locking: blocking and non-block

Dec 16, 2022
A distributed systems library for Kubernetes deployments built on top of spindle and Cloud Spanner.

hedge A library built on top of spindle and Cloud Spanner that provides rudimentary distributed computing facilities to Kubernetes deployments. Featur

Nov 9, 2022
A distributed locking library built on top of Cloud Spanner and TrueTime.

Sep 13, 2022
Easy to use Raft library to make your app distributed, highly available and fault-tolerant

An easy to use customizable library to make your Go application Distributed, Highly available, Fault Tolerant etc... using Hashicorp's Raft library wh

Nov 16, 2022
distributed data sync with operational transformation/transforms

DOT The DOT project is a blend of operational transformation, CmRDT, persistent/immutable datastructures and reactive stream processing. This is an im

Dec 16, 2022
High performance, distributed and low latency publish-subscribe platform.

Emitter: Distributed Publish-Subscribe Platform Emitter is a distributed, scalable and fault-tolerant publish-subscribe platform built with MQTT proto

Jan 2, 2023
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

Gleam Gleam is a high performance and efficient distributed execution system, and also simple, generic, flexible and easy to customize. Gleam is built

Jan 1, 2023
Go Micro is a framework for distributed systems development

Go Micro Go Micro is a framework for distributed systems development. Overview Go Micro provides the core requirements for distributed systems develop

Jan 8, 2023