Durable time-series database that's API-compatible with Prometheus.

Project status

Timbala is in a very early stage of development and is not yet production-ready. Please do not use it yet for any data that you care about.

There are several known issues that prevent any serious use of Timbala at this stage; please see the MVP milestone for details.


Timbala is a distributed, fault-tolerant time-series database intended to provide durable long-term storage for multi-dimensional metrics.

It is designed to integrate easily with Prometheus, supporting PromQL and the Prometheus API, but it can also be used standalone.

Data stored in Timbala can be visualised using Grafana by configuring a Prometheus data source pointing to Timbala.

Design goals

Ease of operation

  • one server binary
  • no external dependencies
  • all nodes have the same role
  • all nodes can serve read and write requests

Fault-tolerant

  • no single points of failure
  • data is replicated and sharded across multiple nodes
  • planned features for read repair and active anti-entropy

Highly available

  • high write throughput and availability
Owner

Matt Bostock (formerly @Cloudflare, @alphagov)
Issues
  • Cluster tests run very slowly when race detection is enabled

    The tests in internal/cluster run very slowly when race detection is enabled, especially since 831cd72 added matrix tests for varying cluster sizes and replication factors. The tests now take around a minute to run.

    Consider disabling race detection when running just these tests, which will reduce the test run time significantly.

  • Partition and replicate ingested data samples using hashring

    Build on the work done in #54 to partition and replicate ingested data samples to peers in the cluster using the consistent hashing algorithm chosen in #27 and the partition key schema chosen in #12.

  • Design API for data repair

    When a node in the cluster has corrupted or missing data, it must be able to receive a known good copy of data from another node.

    The aim of this issue is to determine what the interface for sending/receiving data for the purpose of repair should look like.

    The primary requirement for such an interface is that it must be efficient; the CPU and memory cost of serialising and deserialising the data as well as the size of data sent over the wire must be kept to a minimum.

  • Use command-line flags for configuration

    Command-line flags are self-documenting via the flag.Usage() function, so use those instead of environment variables for configuration.

    I may also allow the flag values to be set using environment variables.

    It is probably worth evaluating flag libraries other than the one in Go's standard library, as one of them may already support this.

  • Implement Prometheus remote read API

    The Prometheus 'remote read' API should be supported to improve interoperability with Prometheus, allowing Timbala to be used as a storage backend for Prometheus.

  • Add MkDocs stub for documentation and website

    Use the MkDocs static site generator to add stub documentation.

    Use the Material theme for MkDocs because:

    • it has a clean design
    • the search functionality is very slick
    • it has a responsive layout; works on phones/tablets
    • it supports box-out notes and warnings
    • it supports Markdown

    https://github.com/squidfunk/mkdocs-material

    To view the documentation in a development environment, use:

    make servedocs
    

    Include building the documentation in strict mode as part of the test make target to ensure that we have not broken the docs when making changes. Add the generated site directory to .gitignore.

    Add Docker as a dependency for Travis tests so that we can use Docker to test the documentation builds correctly.

    Pin the version of the squidfunk/mkdocs-material Docker image to ensure the build is reproducible.

  • Determine data to use for benchmarking

    Options:

    • https://github.com/prometheus/prombench/blob/master/apps/load-generator/main.py
    • https://github.com/prometheus/tsdb/blob/master/testdata/20k.series

  • Basic acceptance tests for Prometheus v1 API

    Add basic acceptance tests for the Prometheus v1 API, i.e. for simple queries. In contrast to the API unit tests already in master, these tests should treat the server as a black box: they should start the server and query the API endpoints as a normal client would.

    Potential options:

    • use the webdriver protocol (e.g. Selenium, PhantomJS), which will work for acceptance-testing the API and also for testing any future web UI (e.g. for viewing the status of the cluster)
    • use Go's net/http library
  • Vendor dependencies

    Vendor the current project dependencies so that we can have reproducible builds and add a Makefile so that future dependencies can be vendored easily.

  • Simplify read fanout

    • Rename fanoutQuerier to remoteQuerier, which is more descriptive of its role

    • Only pass a URL to remoteQuerier to improve encapsulation

    • Rename Read() to remoteRead() and make it a method of remoteQuerier since most of its arguments are fields belonging to remoteQuerier d4e2a2e

    • Remove duplicated error handling

  • Upgrade Prometheus library dependencies

    Update to the latest version of the Prometheus libraries and make necessary changes to satisfy API changes:

    • Use NewMergeSeriesSet() instead of DeduplicateSeriesSet(), which was removed in prometheus/prometheus@bb724f1.

    • Explicitly specify the options for the PromQL engine to ensure that PromQL metrics are registered following the change in prometheus/prometheus@f8fccc7. The options used here are copied from the defaults.

    Prometheus vendors its own dependencies but also provides library packages, meaning that we have to manage its dependencies as transitive dependencies. Using the dep tool, we cannot manage transitive dependencies using regular constraints, so I had to pin the versions in Gopkg.toml using overrides.

    See:

    • https://github.com/golang/dep/blob/master/docs/FAQ.md#how-do-i-constrain-a-transitive-dependencys-version
    • golang/dep#999 (comment)
    • golang/dep#302

  • Current partition key schema requires that queries must be run against all nodes

    The current partition key schema, in addition to the lack of any centralised index, requires that queries must be run against all nodes in the cluster.

    The current schema can be represented as:

    <salt>:<bucket_end_time_as_YYYYMMDD>:<metric_name>:[<label_name>,<label_name>...]

    Since the label names are often not known at query time, and PromQL allows querying without the metric name, the current schema means that all nodes must be queried to ensure that all matching time series are retrieved.

    The partition key schema should be improved to limit the number of nodes that must be queried. There is an inherent tension between limiting the number of nodes that need to be queried and balancing ingestion across as many nodes in the cluster as possible.

  • Nodes must persist which bucket in the hashring they belong to

    Using Jumphash, nodes are represented by buckets.

    In order to minimise the amount of data that moves when nodes are added or removed, nodes must remember which bucket they belong to and use that same bucket when they reconnect to the cluster.

    Found during the implementation of #149.

    This limitation is mentioned in the abstract for Jumphash:

    Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.

    ...the consequences of which were not clear to me when deciding on #27.
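
    For reference, the jump consistent hash algorithm is only a few lines; this is a transcription of the published algorithm from the Lamping & Veach paper, not Timbala's implementation:

```go
package main

import "fmt"

// jumpHash maps a 64-bit key to one of numBuckets sequentially numbered
// buckets ("Jump Consistent Hash", Lamping & Veach, 2014). When a bucket is
// added, only ~1/numBuckets of keys move, but buckets must stay numbered
// 0..n-1, which is the limitation quoted above.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(1<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

func main() {
	// Growing the ring from 9 to 10 buckets should move roughly 1/10 of keys.
	moved := 0
	for k := uint64(0); k < 10000; k++ {
		if jumpHash(k, 9) != jumpHash(k, 10) {
			moved++
		}
	}
	fmt.Printf("keys moved when growing 9 -> 10 buckets: %d of 10000\n", moved)
}
```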

  • Determine whether to support deleting series

    The API endpoint to delete series is disabled currently: https://github.com/mattbostock/timbala/blob/8a2f093838778f5e84d1355970ed339eea918c17/internal/api/v1/api.go#L346-L347

    Determine whether deleting series is something Timbala should support; doing so has implications for consistency in a distributed system.
