TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Talaria

TalariaDB is a distributed, highly available, and low latency time-series database that stores real-time data. It's built on top of Badger DB.
Blog: https://engineering.grab.com/big-data-real-time-presto-talariadb

Announcement

We've moved! This repository is no longer maintained actively. Feel free to contribute to Kelindar/Talaria that is maintained actively.

Grab has migrated to the latest build of that repository and will continue to contribute there.

About TalariaDB

In Grab, millions and millions of transactions and connections take place every day on our platform, which requires data-driven decision making. And these decisions need to be made based on real-time data. For example, an experiment might inadvertently cause a significant increase of waiting time for riders.

To overcome the challenge of retrieving information from large amounts of data, we designed and built TalariaDB. It addresses our need to query at least 2-3 terabytes of data per hour with predictable low query latency and low cost. Most importantly, it plays very nicely with the different tools’ ecosystems and lets us query data using SQL.

Architecture

alt text

The diagram above shows how TalariaDB ingests and serves data.

  • The upstream ETL pipeline prepares the ORC files and store them in the S3 bucket as input.
  • AWS SQS is created as the event notification service of the S3 bucket and notifies TalariaDB of any new object uploaded on S3.
  • Presto is connected to TalariaDB through Thrift since it implements PrestoThriftService https://prestodb.github.io/docs/current/connector/thrift.html.
  • AWS Route53 is used as DNS web service of TalariaDB, whose domain is registered on Presto cluster. The IP addresses of the DNS record is registered by TalariaDB using Gossip protocol in bootstrap phase.

Currently this project is currently highly coupled with AWS services like SQS, S3 and Route53. We will make these components (storage, DNS) pluggable and make TalariaDB useful for more generic case.

Quick Start

Preconditions

  • setup AWS profile and make sure your machine is accessible to AWS and has enough permission to read from S3, SQS and manipulate Route53 records.

Steps

  1. Set env vars
export X_TALARIA_CONF=(path-to-this-repo)/config-ci.json
  1. Edit config-ci.json with your own configurations (including AWS Route53 and SQS configs)
  2. Start application
go run (path-to-this-repo)/main.go

About Us

TalariaDB is maintained by:

License

TalariaDB is licensed under the --- (LICENSE.md)

Owner
Grab
Driving South-east Asia forward!
Grab
Similar Resources

Beerus-DB: a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic operations

Beerus-DB · Beerus-DB is a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic

Oct 29, 2022

CockroachDB - the open source, cloud-native distributed SQL database.

CockroachDB - the open source, cloud-native distributed SQL database.

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. What is CockroachDB? Docs Quickstart C

Jan 2, 2023

The lightweight, distributed relational database built on SQLite.

The lightweight, distributed relational database built on SQLite.

rqlite is a lightweight, distributed relational database, which uses SQLite as its storage engine. Forming a cluster is very straightforward, it grace

Jan 5, 2023

TiDB is an open source distributed HTAP database compatible with the MySQL protocol

TiDB is an open source distributed HTAP database compatible with the MySQL protocol

Slack Channel Twitter: @PingCAP Reddit Mailing list: lists.tidb.io For support, please contact PingCAP What is TiDB? TiDB ("Ti" stands for Titanium) i

Jan 9, 2023

LBADD: An experimental, distributed SQL database

LBADD: An experimental, distributed SQL database

LBADD Let's build a distributed database. LBADD is an experimental distributed SQL database, written in Go. The goal of this project is to build a dat

Nov 29, 2022

A course to build the SQL layer of a distributed database.

TinySQL TinySQL is a course designed to teach you how to implement a distributed relational database in Go. TinySQL is also the name of the simplifed

Jan 8, 2023

decentralized,distributed,peer-to-peer database.

P2PDB 中文 | English 简介 P2PDB(p2p数据库),是一个去中心化、分布式、点对点数据库、P2PDB使用IPFS为其数据存储和IPFS Pubsub自动与对等方同步数据。P2PDB期望打造一个去中心化的分布式数据库,使P2PDB 成为去中心化应用程序 (dApps)、区块链应用程

Jan 1, 2023

Couchbase - distributed NoSQL cloud database

couchbase Couchbase is distributed NoSQL cloud database. create Scope CREATE SCO

Feb 16, 2022

This is a simple graph database in SQLite, inspired by "SQLite as a document database".

About This is a simple graph database in SQLite, inspired by "SQLite as a document database". Structure The schema consists of just two structures: No

Jan 3, 2023
Comments
  • Bump github.com/aws/aws-sdk-go from 1.25.31 to 1.33.0

    Bump github.com/aws/aws-sdk-go from 1.25.31 to 1.33.0

    Bumps github.com/aws/aws-sdk-go from 1.25.31 to 1.33.0.

    Changelog

    Sourced from github.com/aws/aws-sdk-go's changelog.

    Release v1.33.0 (2020-07-01)

    Service Client Updates

    • service/appsync: Updates service API and documentation
    • service/chime: Updates service API and documentation
      • This release supports third party emergency call routing configuration for Amazon Chime Voice Connectors.
    • service/codebuild: Updates service API and documentation
      • Support build status config in project source
    • service/imagebuilder: Updates service API and documentation
    • service/rds: Updates service API
      • This release adds the exceptions KMSKeyNotAccessibleFault and InvalidDBClusterStateFault to the Amazon RDS ModifyDBInstance API.
    • service/securityhub: Updates service API and documentation

    SDK Features

    • service/s3/s3crypto: Introduces EncryptionClientV2 and DecryptionClientV2 encryption and decryption clients which support a new key wrapping algorithm kms+context. (#3403)
      • DecryptionClientV2 maintains the ability to decrypt objects encrypted using the EncryptionClient.
      • Please see s3crypto documentation for migration details.

    Release v1.32.13 (2020-06-30)

    Service Client Updates

    • service/codeguru-reviewer: Updates service API and documentation
    • service/comprehendmedical: Updates service API
    • service/ec2: Updates service API and documentation
      • Added support for tag-on-create for CreateVpc, CreateEgressOnlyInternetGateway, CreateSecurityGroup, CreateSubnet, CreateNetworkInterface, CreateNetworkAcl, CreateDhcpOptions and CreateInternetGateway. You can now specify tags when creating any of these resources. For more information about tagging, see AWS Tagging Strategies.
    • service/ecr: Updates service API and documentation
      • Add a new parameter (ImageDigest) and a new exception (ImageDigestDoesNotMatchException) to PutImage API to support pushing image by digest.
    • service/rds: Updates service documentation
      • Documentation updates for rds

    Release v1.32.12 (2020-06-29)

    Service Client Updates

    • service/autoscaling: Updates service documentation and examples
      • Documentation updates for Amazon EC2 Auto Scaling.
    • service/codeguruprofiler: Updates service API, documentation, and paginators
    • service/codestar-connections: Updates service API, documentation, and paginators
    • service/ec2: Updates service API, documentation, and paginators
      • Virtual Private Cloud (VPC) customers can now create and manage their own Prefix Lists to simplify VPC configurations.

    Release v1.32.11 (2020-06-26)

    Service Client Updates

    • service/cloudformation: Updates service API and documentation
      • ListStackInstances and DescribeStackInstance now return a new StackInstanceStatus object that contains DetailedStatus values: a disambiguation of the more generic Status value. ListStackInstances output can now be filtered on DetailedStatus using the new Filters parameter.
    • service/cognito-idp: Updates service API

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2

    Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2

    Bumps github.com/gogo/protobuf from 1.3.1 to 1.3.2.

    Release notes

    Sourced from github.com/gogo/protobuf's releases.

    Release v.1.3.2

    Tested versions:

    go 1.15.6 protoc 3.14.0

    Bug fixes:

    skippy peanut butter

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

Related tags
The Prometheus monitoring system and time series database.

Prometheus Visit prometheus.io for the full documentation, examples and guides. Prometheus, a Cloud Native Computing Foundation project, is a systems

Dec 31, 2022
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
VictoriaMetrics: fast, cost-effective monitoring solution and time series database

VictoriaMetrics VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database. It is available in binary release

Jan 8, 2023
LinDB is an open-source Time Series Database which provides high performance, high availability and horizontal scalability.
LinDB is an open-source Time Series Database which provides high performance, high availability and horizontal scalability.

LinDB is an open-source Time Series Database which provides high performance, high availability and horizontal scalability. LinDB stores all monitoring data of ELEME Inc, there is 88TB incremental writes per day and 2.7PB total raw data.

Jan 1, 2023
Export output from pg_stat_activity and pg_stat_statements from Postgres into a time-series database that supports the Influx Line Protocol (ILP).

pgstat2ilp pgstat2ilp is a command-line program for exporting output from pg_stat_activity and pg_stat_statements (if the extension is installed/enabl

Dec 15, 2021
Time Series Database based on Cassandra with Prometheus remote read/write support

SquirrelDB SquirrelDB is a scalable high-available timeseries database (TSDB) compatible with Prometheus remote storage. SquirrelDB store data in Cass

Oct 20, 2022
🤔 A minimize Time Series Database, written from scratch as a learning project.
🤔 A minimize Time Series Database, written from scratch as a learning project.

mandodb ?? A minimize Time Series Database, written from scratch as a learning project. 时序数据库(TSDB: Time Series Database)大多数时候都是为了满足监控场景的需求,这里先介绍两个概念:

Jan 3, 2023
Redwood is a highly-configurable, distributed, realtime database that manages a state tree shared among many peers

Redwood is a highly-configurable, distributed, realtime database that manages a state tree shared among many peers. Imagine something like a Redux store, but distributed across all users of an application, that offers offline editing and is resilient to poor connectivity.

Jan 8, 2023
Low-level key/value store in pure Go.
Low-level key/value store in pure Go.

Description Package slowpoke is a simple key/value store written using Go's standard library only. Keys are stored in memory (with persistence), value

Jan 2, 2023
Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures.

Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures. capabilities which owl provides include Process approval、sql Audit、sql execute and execute as crontab、data backup and recover .

Nov 9, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

Jan 9, 2023