Fast, specialized time-series database for IoT, real-time internet-connected devices, and AI analytics.


Unitdb is a blazing-fast, specialized time-series database for microservices, IoT, and real-time internet-connected devices. Because Unitdb satisfies the requirements for low latency and binary messaging, it is a good fit for applications such as the internet of things and internet-connected devices.

Don't forget to ⭐ this repo if you like Unitdb!

About unitdb

Key characteristics

  • 100% Go
  • Can store larger-than-memory data sets
  • Optimized for fast lookups and writes
  • Supports writing billions of records per hour
  • Supports opening database with immutable flag
  • Supports database encryption
  • Supports time-to-live on message entries
  • Supports writing to wildcard topics
  • Data is safely written to disk using an accurate, high-performance block sync technique

Quick Start

To build Unitdb from source, use the go get command:

go get github.com/unit-io/unitdb

Usage

Detailed API documentation is available using the go.dev service.

Use the package by importing it in your Go source code. For example,

import "github.com/unit-io/unitdb"

Unitdb supports Get, Put, and Delete operations. It also supports encryption, batch operations, and writing to wildcard topics. See the usage guide.

Samples are available in the examples directory for reference.

Clustering

To bring up a Unitdb cluster, start two or more nodes. For fault tolerance, three or more nodes are recommended.

> ./bin/unitdb -listen=:6060 -grpc_listen=:6080 -cluster_self=one -db_path=/tmp/unitdb/node1
> ./bin/unitdb -listen=:6061 -grpc_listen=:6081 -cluster_self=two -db_path=/tmp/unitdb/node2

The example above runs each Unitdb node on the same host, so each node must listen on a different port. This would not be necessary if each node ran on a different host.
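Cluster membership can also be declared in the configuration file. A minimal sketch of the cluster_config section (node names and addresses here are examples; adjust them to your deployment):

```
"cluster_config": {
    // Name of this node. Can be assigned from the command line.
    // Empty string disables clustering.
    "self": "one",

    // Name and TCP address of every node in the cluster.
    "nodes": [
        {"name": "one", "addr": "localhost:12001"},
        {"name": "two", "addr": "localhost:12002"},
        {"name": "three", "addr": "localhost:12003"}
    ]
}
```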

Architecture Overview

The unitdb engine handles data from the moment a put request is received until the data is written to physical disk. Data is compressed, encrypted if encryption is enabled, and then written to a write-ahead log (WAL) for immediate durability. Entries are also written to the memdb and become immediately queryable. The memdb entries are periodically written to log files in the form of blocks.
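The write path above can be sketched in a few lines of Go. This is a simplified illustration, not unitdb's implementation: it uses stdlib gzip as a stand-in for the engine's compression, an in-memory slice as a stand-in for the fsync'd WAL file, and a plain map as the memdb index.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// walEntry is a simplified WAL record: a compressed payload keyed by topic.
type walEntry struct {
	topic   string
	payload []byte
}

// store sketches the write path: compress, append to the WAL, index in memdb.
type store struct {
	mu    sync.Mutex
	wal   []walEntry          // durable log (a real WAL is an fsync'd file)
	memdb map[string][][]byte // immediately queryable in-memory index
}

func newStore() *store {
	return &store{memdb: make(map[string][][]byte)}
}

// compress gzips a payload, standing in for the engine's compression step.
func compress(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(p)
	zw.Close()
	return buf.Bytes()
}

// Put compresses the payload, appends it to the WAL for durability, then
// writes it to memdb so it is immediately queryable (compressed entries
// are uncompressed again on read).
func (s *store) Put(topic string, payload []byte) {
	c := compress(payload)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.wal = append(s.wal, walEntry{topic, c}) // durability first
	s.memdb[topic] = append(s.memdb[topic], c) // queryable next
}

func main() {
	s := newStore()
	s.Put("teams.alpha.ch1", []byte("hello"))
	fmt.Println(len(s.wal), len(s.memdb["teams.alpha.ch1"])) // 1 1
}
```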

To compact and store data efficiently, the unitdb engine groups entry sequences by topic key and orders those sequences by time; each block keeps the offset of the previous block, forming a chain in reverse time order. The index block offset is calculated from the entry sequence in the time-window block. Data is read from the data block using the index entry information, then uncompressed on read (and decrypted on read, if the encryption flag was set).

Unitdb stores compressed data (live records) in the memdb store. Records in the memdb are partitioned into (live) time-blocks of a configured capacity. New time-blocks are created at ingestion, while old time-blocks are appended to the log files and later synced to the disk store.

When Unitdb receives a put or delete request, it first writes the record into a tiny-log for recovery. Tiny-logs are added to the log queue to be written to the log file. A tiny-log write is triggered either by time or, in case of backoff under heavy load, by the size of the tiny-log.

The tiny-log queue is maintained in memory with a pre-configured size; under heavy load, the memdb backoff process blocks incoming requests until the tiny-log queue is cleared by a write operation. After records are appended to a tiny-log and written to the log files, they are synced to the disk store using the block sync technique.
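A bounded Go channel captures the essence of this backoff behavior. This is a sketch of the technique, not unitdb's code: a buffered channel of pre-configured size stands in for the tiny-log queue, and a blocked send stands in for the backoff that holds incoming requests.

```go
package main

import "fmt"

// tinyLog is a simplified batch of records awaiting a log-file write.
type tinyLog struct{ records [][]byte }

// logQueue sketches the bounded tiny-log queue as a buffered channel.
type logQueue struct{ ch chan tinyLog }

func newLogQueue(size int) *logQueue {
	return &logQueue{ch: make(chan tinyLog, size)}
}

// enqueue blocks when the queue is full: this blocking send is the
// backoff that stops incoming requests until the queue is drained.
func (q *logQueue) enqueue(l tinyLog) { q.ch <- l }

// drain simulates the writer clearing the queue by appending the queued
// tiny-logs to the log file; it returns the number of records flushed.
func (q *logQueue) drain() int {
	n := 0
	for {
		select {
		case l := <-q.ch:
			n += len(l.records)
		default:
			return n
		}
	}
}

func main() {
	q := newLogQueue(4) // pre-configured queue size
	for i := 0; i < 4; i++ {
		q.enqueue(tinyLog{records: [][]byte{[]byte("rec")}})
	}
	fmt.Println(q.drain()) // 4
}
```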

Next steps

In the future, we intend to enhance Unitdb with the following features:

  • Distributed design: We are working on building out the distributed design of Unitdb, including replication and sharding management to improve its scalability.
  • Developer support and tooling: We are working on building more intuitive tooling, refactoring code structures, and enriching documentation to improve the onboarding experience, enabling developers to quickly integrate Unitdb into their time-series database stack.
  • Expanding feature set: We also plan to expand our query feature set to include functionality such as window functions and nested loop joins.
  • Query engine optimization: We will also be looking into developing more advanced ways to optimize query performance such as GPU memory caching.

Contributing

Unitdb is under active development and is not currently seeking major changes or new features from new contributors. However, small bug fixes are encouraged.

Licensing

This project is licensed under the Apache-2.0 License.

Owner

Saffat Technologies
