Durable time-series database that's API-compatible with Prometheus.

Project status

Timbala is in a very early stage of development and is not yet production-ready. Please do not use it yet for any data that you care about.

There are several known issues that prevent any serious use of Timbala at this stage; please see the MVP milestone for details.


Timbala is a distributed, fault-tolerant time-series database intended to provide durable long-term storage for multi-dimensional metrics.

It is designed to integrate easily with Prometheus, supporting PromQL and the Prometheus API, but it can also be used standalone.

Data stored in Timbala can be visualised using Grafana by configuring a Prometheus data source pointing to Timbala.

Design goals

Ease of operation

  • one server binary
  • no external dependencies
  • all nodes have the same role
  • all nodes can serve read and write requests

Fault-tolerant

  • no single points of failure
  • data is replicated and sharded across multiple nodes
  • planned features for read repair and active anti-entropy

Highly available

  • high write throughput and availability
Owner

Matt Bostock (formerly @Cloudflare, @alphagov)
Issues
  • Cluster tests run very slowly when race detection is enabled

    The tests in internal/cluster run very slowly when race detection is enabled, especially since 831cd72 added matrix tests for varying cluster sizes and replication factors. The tests now take around a minute to run.

    Consider disabling race detection when running just these tests, which will reduce the test run time significantly.

  • Partition and replicate ingested data samples using hashring

    Build on the work done in #54 to partition and replicate ingested data samples to peers in the cluster using the consistent hashing algorithm chosen in #27 and the partition key schema chosen in #12.

  • Design API for data repair

    When a node in the cluster has corrupted or missing data, it must be able to receive a known good copy of data from another node.

    The aim of this issue is to determine what the interface for sending/receiving data for the purpose of repair should look like.

    The primary requirement for such an interface is that it must be efficient; the CPU and memory cost of serialising and deserialising the data as well as the size of data sent over the wire must be kept to a minimum.

  • Use command-line flags for configuration

    Command-line flags are self-documenting via the flag.Usage() function, so use those instead of environment variables for configuration.

    I may also allow the flag values to be set using environment variables.

    It is probably worth evaluating flag libraries other than the one in Go's standard library, as one of them may already support this.

  • Implement Prometheus remote read API

    The Prometheus 'remote read' API should be supported to improve interoperability with Prometheus, allowing Timbala to be used as a storage backend for Prometheus.

  • Add MkDocs stub for documentation and website

    Use the MkDocs static site generator to add stub documentation.

    Use the Material theme for MkDocs because:

    • it has a clean design
    • the search functionality is very slick
    • it has a responsive layout; works on phones/tablets
    • it supports box-out notes and warnings
    • it supports Markdown

    https://github.com/squidfunk/mkdocs-material

    To view the documentation in a development environment, use:

    make servedocs
    

    Include building the documentation in strict mode as part of the test make target to ensure that we have not broken the docs when making changes. Add the generated site directory to .gitignore.

    Add Docker as a dependency for Travis tests so that we can use Docker to test the documentation builds correctly.

    Pin the version of the squidfunk/mkdocs-material Docker image to ensure the build is reproducible.

  • Determine data to use for benchmarking

    Options:

    • https://github.com/prometheus/prombench/blob/master/apps/load-generator/main.py
    • https://github.com/prometheus/tsdb/blob/master/testdata/20k.series

  • Basic acceptance tests for Prometheus v1 API

    Add basic acceptance tests for the Prometheus v1 API, i.e. for simple queries. In contrast to the API unit tests already in master, these tests should treat the server as a black box: they should start the server and query the API endpoints as a normal client would.

    Potential options:

    • use the webdriver protocol (e.g. Selenium, PhantomJS), which will work for acceptance-testing the API and also for testing any future web UI (e.g. for viewing the status of the cluster)
    • use Go's net/http library
  • Vendor dependencies

    Vendor the current project dependencies so that we can have reproducible builds and add a Makefile so that future dependencies can be vendored easily.

  • Simplify read fanout

    • Rename fanoutQuerier to remoteQuerier, which is more descriptive of its role

    • Only pass a URL to remoteQuerier to improve encapsulation

    • Rename Read() to remoteRead() and make it a method of remoteQuerier since most of its arguments are fields belonging to remoteQuerier d4e2a2e

    • Remove duplicated error handling

  • Upgrade Prometheus library dependencies

    Update to the latest version of the Prometheus libraries and make necessary changes to satisfy API changes:

    • Use NewMergeSeriesSet() instead of DeduplicateSeriesSet(), which was removed in prometheus/prometheus@bb724f1.

    • Explicitly specify the options for the PromQL engine to ensure that PromQL metrics are registered following the change in prometheus/prometheus@f8fccc7. The options used here are copied from the defaults.

    Prometheus vendors its own dependencies but also provides library packages, meaning that we have to manage its dependencies as transitive dependencies. Using the dep tool, we cannot manage transitive dependencies using regular constraints, so I had to pin the versions in Gopkg.toml using overrides.

    See:

    • https://github.com/golang/dep/blob/master/docs/FAQ.md#how-do-i-constrain-a-transitive-dependencys-version
    • golang/dep#999 (comment)
    • golang/dep#302

  • Current partition key schema requires that queries must be run against all nodes

    The current partition key schema, in addition to the lack of any centralised index, requires that queries must be run against all nodes in the cluster.

    The current schema can be represented as:

    <salt>:<bucket_end_time_as_YYYYMMDD>:<metric_name>:[<label_name>,<label_name>...]

    Since the label names are often not known at query time, and PromQL allows querying without the metric name, the current schema means that all nodes must be queried to ensure that all matching time series are retrieved.

    The partition key schema should be improved to limit the number of nodes that must be queried. There is an inherent tension between limiting the number of nodes that need to be queried and balancing ingestion across as many nodes in the cluster as possible.

  • Nodes must persist which bucket in the hashring they belong to

    Using Jumphash, nodes are represented by buckets.

    In order to minimise the amount of data that moves when nodes are added or removed, nodes must remember which bucket they belong to and use that same bucket when they reconnect to the cluster.

    Found during the implementation of #149.

    This limitation is mentioned in the abstract for Jumphash:

    Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.

    ...the consequences of which were not clear to me when deciding on #27.
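
    For reference, the jump consistent hash algorithm is only a few lines; this is a transcription of the published algorithm from the Lamping & Veach paper, not Timbala's implementation:

```go
package main

import "fmt"

// jumpHash maps a 64-bit key to one of numBuckets sequentially numbered
// buckets ("Jump Consistent Hash", Lamping & Veach, 2014). When a bucket is
// added, only ~1/numBuckets of keys move, but buckets must stay numbered
// 0..n-1, which is the limitation quoted above.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(1<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

func main() {
	// Growing the ring from 9 to 10 buckets should move roughly 1/10 of keys.
	moved := 0
	for k := uint64(0); k < 10000; k++ {
		if jumpHash(k, 9) != jumpHash(k, 10) {
			moved++
		}
	}
	fmt.Printf("keys moved when growing 9 -> 10 buckets: %d of 10000\n", moved)
}
```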

  • Determine whether to support deleting series

    The API endpoint to delete series is disabled currently: https://github.com/mattbostock/timbala/blob/8a2f093838778f5e84d1355970ed339eea918c17/internal/api/v1/api.go#L346-L347

    Determine whether deleting series is something Timbala should support; doing so has implications for consistency in a distributed system.
