BanyanDB

BanyanDB, as an observability database, aims to ingest, analyze and store Metrics, Tracing and Logging data. It's designed to handle observability data generated by observability platforms and APM systems, such as Apache SkyWalking.

Resource

Contributing

For developers who want to contribute to this project, see the Contribution Guide.

License

Apache 2.0 License.

Comments
  • Add streaming API and topN aggregator

    This PR introduces a simple stream processing API and an implementation of TopN aggregation.

    Design

    Flow is an abstraction of the streaming process, with the following operators:

    • Source: provides the data stream for the Flow. As we've discussed before, it should be a listener consuming Measure write requests continuously. Later, we could use a global binlog/WAL.
    • Mapper: func(T) R, which transforms an element from T to R
    • Filter: func(T) bool, which predicates whether an element should be passed downstream
    • Windows: currently only SlidingEventTimeWindows is implemented
    • Sink: the place to write the final result, e.g. the TopN ranks, into a separate Measure storage.

    Filter

    s := flow.New(tt.input).
        Filter(func(i int) bool {
            return i%2 == 0
        }).
        To(snk)
    

    The Filter operator allows us to filter by criteria set in TopNAggregation.

    Mapper

    s := flow.New(tt.input).
        Map(func(i int) int {
            return i * 2
        }).
        To(snk)
    

    The Mapper operator allows us to extract a field from the record and transform it for the groupBy operation.

    For simplicity, we currently do not have a separate keyBy operation (as in Apache Flink) to perform groupBy.
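
    To make the idea concrete, here is a minimal, self-contained sketch (illustrative only; Record, Keyed and the tag names are hypothetical, not the flow API) of folding group-key extraction into a Map-style transform:

    package main

    import "fmt"

    // Record is a hypothetical measure data point.
    type Record struct {
        ServiceID string // the tag we group by
        Value     int
    }

    // Keyed pairs a record with its extracted group key; a Map-style
    // transform can emit this instead of relying on a separate keyBy step.
    type Keyed struct {
        Key    string
        Record Record
    }

    func main() {
        input := []Record{{"svc-a", 1}, {"svc-b", 2}, {"svc-a", 3}}

        // "Map": extract the group key while transforming the element.
        keyed := make([]Keyed, 0, len(input))
        for _, r := range input {
            keyed = append(keyed, Keyed{Key: r.ServiceID, Record: r})
        }

        // Downstream grouping simply buckets on the pre-extracted key.
        groups := make(map[string][]Record)
        for _, k := range keyed {
            groups[k.Key] = append(groups[k.Key], k.Record)
        }
        fmt.Println(groups) // map[svc-a:[{svc-a 1} {svc-a 3}] svc-b:[{svc-b 2}]]
    }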

    Windows

    Generally, the splitting of windows is related to "time", which could be either of the following concepts:

    • Event Time
    • Processing Time
    [figure: event time vs. processing time, from the Flink community]

    The above graph from the Flink community distinguishes these concepts. In our case, however, the only "time" we care about is EventTime, which represents the exact moment the record is produced, since we need EventTime to drive the timely flush of the aggregation results, e.g. the TopN ranks.

    [figure: sliding windows]

    Sliding windows fulfill our requirement of flushing the data more frequently.

    This means the flush interval should be much smaller than the interval of the real data points. For example, in OAP the downsampling rate can be MINUTE while the flush timer is set to 25 seconds by default.

    Technically, SlidingEventTimeWindows is built on the following, as sketched below:

    • A PriorityQueue maintains records that have not yet been emitted
    • A PriorityQueueSet maintains all registered (deduplicated) timers, which will be triggered later
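
    A standalone sketch of those two structures (a simplified illustration, not the actual BanyanDB types):

    package main

    import (
        "container/heap"
        "fmt"
    )

    type record struct {
        eventTime int64 // epoch millis
        value     int
    }

    // recordHeap orders buffered records by event time (earliest first),
    // playing the role of the PriorityQueue of not-yet-emitted records.
    type recordHeap []record

    func (h recordHeap) Len() int            { return len(h) }
    func (h recordHeap) Less(i, j int) bool  { return h[i].eventTime < h[j].eventTime }
    func (h recordHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
    func (h *recordHeap) Push(x interface{}) { *h = append(*h, x.(record)) }
    func (h *recordHeap) Pop() interface{} {
        old := *h
        x := old[len(old)-1]
        *h = old[:len(old)-1]
        return x
    }

    // timerSet plays the role of the PriorityQueueSet: registering the same
    // trigger time twice only fires once.
    type timerSet map[int64]struct{}

    func (ts timerSet) register(t int64) bool {
        if _, ok := ts[t]; ok {
            return false // deduplicated
        }
        ts[t] = struct{}{}
        return true
    }

    func main() {
        buf, timers := &recordHeap{}, timerSet{}
        for _, r := range []record{{3000, 1}, {1000, 2}, {2000, 3}} {
            heap.Push(buf, r)
            // assume a 1s slide: align the flush timer to the next boundary
            timers.register((r.eventTime/1000 + 1) * 1000)
        }
        for buf.Len() > 0 {
            fmt.Println(heap.Pop(buf).(record)) // emitted in event-time order
        }
    }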

    TopN

    With the above semantics, we can implement TopN as a window aggregation function:

    flow.New(tt.input).
        Filter(...). // where
        Map(...). // select and groupBy
        Window(NewSlidingTimeWindows(time.Minute*1, time.Second*15)).
        TopN(10, OrderBy(modelv1.SORT_DESC), ...). // TopN with parameters
        To(snk)
    

    TopN is implemented with the help of a TreeMap that maps the sort key to the collection of records.
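
    As a rough illustration of that structure (Go's standard library has no TreeMap, so this sketch pairs a map with a sorted key slice; the names are mine, not BanyanDB's):

    package main

    import (
        "fmt"
        "sort"
    )

    // topNMap stands in for the TreeMap: it maps a sort key to the records
    // sharing that key, and keeps the keys sorted so the top N can be read off.
    type topNMap struct {
        byKey map[int][]string
        keys  []int // kept sorted ascending
    }

    func newTopNMap() *topNMap { return &topNMap{byKey: map[int][]string{}} }

    func (m *topNMap) put(key int, rec string) {
        if _, ok := m.byKey[key]; !ok {
            i := sort.SearchInts(m.keys, key)
            m.keys = append(m.keys, 0)
            copy(m.keys[i+1:], m.keys[i:])
            m.keys[i] = key
        }
        m.byKey[key] = append(m.byKey[key], rec)
    }

    // topN returns up to n records in descending key order.
    func (m *topNMap) topN(n int) []string {
        out := make([]string, 0, n)
        for i := len(m.keys) - 1; i >= 0 && len(out) < n; i-- {
            for _, rec := range m.byKey[m.keys[i]] {
                if len(out) == n {
                    break
                }
                out = append(out, rec)
            }
        }
        return out
    }

    func main() {
        m := newTopNMap()
        m.put(300, "svc-a")
        m.put(100, "svc-b")
        m.put(500, "svc-c")
        fmt.Println(m.topN(2)) // [svc-c svc-a]
    }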

  • Add elementUI, sass and sass-loader@7.3.1

    • Add elementUI, sass and sass-loader@7.3.1
    • Initialize page structure
      • Add Database.vue and Structure.vue
      • Delete Laws.vue
    • Add Header Component
      • Add NavMenu from ElementUI
  • Add groupBy to the measure query request

    Add groupBy and an aggregation function to the query request:

    • the query request doesn't support sub- or nested aggregation
    • the response's timestamp field is null when returning the aggregated result
    • the result is the same as the order-by result if the request doesn't specify an aggregation function on grouping
  • Add docs

    Fixes https://github.com/apache/skywalking/issues/8989

    I've left the CRUD examples empty for the future CLI tools.

    Signed-off-by: Gao Hongtao [email protected]

  • Benchmark flatbuffers and protobuf

    Benchmark env

    • CPU: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    • Memory: 8 GB 2133 MHz LPDDR3
    • Java: JDK8u292b10
    • protoc: 3.17.3
    • protobuf-java: 3.17.2
    • flatc: 2.0.0
    • flatbuffers-java: 2.0.2

    Performance: Serialization+Java

    /**
     * # JMH version: 1.32
     * # VM version: JDK 1.8.0_292, OpenJDK 64-Bit Server VM, 25.292-b10
     * # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
     * # VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=55698:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
     * # Blackhole mode: full + dont-inline hint
     * # Warmup: 5 iterations, 10 s each
     * # Measurement: 5 iterations, 10 s each
     * # Timeout: 10 min per iteration
     * # Threads: 1 thread, will synchronize iterations
     * # Benchmark mode: Average time, time/op
     * <p>
     * Benchmark                                                                  Mode  Cnt     Score    Error   Units
     * WriteEntitySerializationTest.flatbuffers                                   avgt   25  3044.054 ± 34.837   ns/op
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate                    avgt   25   911.872 ± 10.301  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate.norm               avgt   25  3056.000 ±  0.001    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space           avgt   25   912.394 ± 10.261  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space.norm      avgt   25  3057.783 ± 10.009    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space       avgt   25     0.190 ±  0.018  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space.norm  avgt   25     0.637 ±  0.059    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.count                         avgt   25  3878.000           counts
     * WriteEntitySerializationTest.flatbuffers:·gc.time                          avgt   25  2168.000               ms
     * WriteEntitySerializationTest.protobuf                                      avgt   25   514.010 ± 12.638   ns/op
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate                       avgt   25  3833.648 ± 90.162  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate.norm                  avgt   25  2168.000 ±  0.001    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space              avgt   25  3835.530 ± 94.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space.norm         avgt   25  2168.989 ±  6.134    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space          avgt   25     0.195 ±  0.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space.norm     avgt   25     0.110 ±  0.011    B/op
     * WriteEntitySerializationTest.protobuf:·gc.count                            avgt   25  3629.000           counts
     * WriteEntitySerializationTest.protobuf:·gc.time                             avgt   25  2227.000               ms
     */
    

    Performance: Deserialization+Go

    goos: darwin
    goarch: amd64
    pkg: github.com/apache/skywalking-banyandb/benchmark/go-bench
    cpu: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    Benchmark_Deser_Flatbuffers-8   	100000000	       730.5 ns/op	      64 B/op	       2 allocs/op
    Benchmark_Deser_Protobuf-8      	14826262	      5044 ns/op	    1944 B/op	      49 allocs/op
    PASS
    ok  	github.com/apache/skywalking-banyandb/benchmark/go-bench	153.927s
    

    Size

    For the same entity as illustrated in WriteEntitySerializationTest.EntityModel (unit: bytes),

    • Flatbuffers: 512
    • Protobuf: 169

    which means Protobuf is much more compact.

    Conclusion

    From the perspective of bandwidth and write performance, protobuf is definitely a better choice.

    However, FlatBuffers has better deserialization performance, in particular for partial reads. The deserialization entry point (i.e. GetRootAs***) actually does nothing; the real deserialization happens when users read from the byte buffer.
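
    The access pattern can be sketched as follows (a simplified, hypothetical illustration, not the real FlatBuffers API, which dispatches through a vtable of field offsets):

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // lazyEntity mimics the FlatBuffers access pattern: construction keeps a
    // reference to the raw bytes, and each accessor decodes on demand.
    type lazyEntity struct{ buf []byte }

    // getRootAsEntity does no parsing at all, just like GetRootAs***.
    func getRootAsEntity(buf []byte) lazyEntity { return lazyEntity{buf: buf} }

    // Timestamp decodes only the bytes it needs, only when called.
    func (e lazyEntity) Timestamp() uint64 {
        return binary.LittleEndian.Uint64(e.buf[0:8])
    }

    func main() {
        buf := make([]byte, 8)
        binary.LittleEndian.PutUint64(buf, 1623203253604)
        e := getRootAsEntity(buf)  // "deserialization": zero work done here
        fmt.Println(e.Timestamp()) // actual decoding happens per field read
    }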

    References

    https://www.ida.liu.se/~nikca89/papers/networking20c.pdf

    Our conclusion is similar to what is described in the above paper; see Figs. 3-5.

  • Feat: query module

    Design

    This is a very preliminary PR for the query module. Many things need to be considered further.

    Since it will be a principal module, I want to discuss the current design/implementation ASAP to avoid improper design and to find better ideas before proceeding.

    Logical Plan

    A Logical Plan is a DAG (Directed Acyclic Graph) of Params. A Param defines the necessary parameters for query execution. The parameters are prepared while composing the logical plan in order to reduce extra cost when executing the physical plan.

    Plot

    The logical plan can be plotted as a Dot graph. For example,

    digraph  {
    
    	n2[label="ChunkIDsFetch{metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n1[label="ChunkIDsMerge{}"];
    	n7[label="IndexScan{begin=1623203253604099000,end=1623214053604099000,KeyName=duration,conditions=[<=1000],metadata={group=skywalking,name=trace}}"];
    	n4[label="Pagination{Offset=0,Limit=0}"];
    	n5[label="Root{}"];
    	n3[label="SortMerge{fieldName=startTime,sort=DESC}"];
    	n6[label="TraceIDFetch{TraceID=aaaaaaaa,metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n2->n3;
    	n1->n2;
    	n7->n1;
    	n5->n6;
    	n5->n7;
    	n3->n4;
    	n6->n3;
    }
    

    We can leverage an online Graphviz toolkit to visualize the logical plan.

    Physical Plan

    A Physical Plan contains the logical plan and the Transform(s) corresponding to each logical.Op.

    When the plan is triggered to run, a reverse topologically-sorted slice (with Futures as items) is generated from the logical plan.
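
    A sketch of how such an execution order can be derived (Kahn's algorithm followed by a reversal so that leaf operators run first; the operator names echo the plan above, but the code is illustrative, not BanyanDB's):

    package main

    import "fmt"

    // topoSort runs Kahn's algorithm over an adjacency list (parent -> children)
    // and returns the nodes in topological order.
    func topoSort(edges map[string][]string) []string {
        indeg := map[string]int{}
        for n, children := range edges {
            if _, ok := indeg[n]; !ok {
                indeg[n] = 0
            }
            for _, c := range children {
                indeg[c]++
            }
        }
        queue := []string{}
        for n, d := range indeg {
            if d == 0 {
                queue = append(queue, n)
            }
        }
        order := []string{}
        for len(queue) > 0 {
            n := queue[0]
            queue = queue[1:]
            order = append(order, n)
            for _, c := range edges[n] {
                indeg[c]--
                if indeg[c] == 0 {
                    queue = append(queue, c)
                }
            }
        }
        return order
    }

    func main() {
        // A toy plan DAG: Root -> IndexScan -> ChunkIDsMerge -> SortMerge.
        plan := map[string][]string{
            "Root":          {"IndexScan"},
            "IndexScan":     {"ChunkIDsMerge"},
            "ChunkIDsMerge": {"SortMerge"},
        }
        order := topoSort(plan)
        // Reverse it so that leaf operators execute first.
        for i, j := 0, len(order)-1; i < j; i, j = i+1, j-1 {
            order[i], order[j] = order[j], order[i]
        }
        fmt.Println(order) // [SortMerge ChunkIDsMerge IndexScan Root]
    }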

    Tasks

    • [x] client utils for building EntityCriteria
    • [x] logical plan: Ops such as Sort, OffsetAndLimit, ChunkIDMerge, TableScan and IndexScan
    • [x] physical plan: topology sort, Transform
    • [ ] complete API to connect with Liaison (Add handlers) (Maybe next PR)

    To be discussed

    Index selection and optimization stage

    For now, I only use single-value indexes. But in the current implementation, we may be able to improve index selection during the process of generating the Logical Plan.

    Any better ideas? Traditional databases normally have an optimization stage (usually after generating the hierarchical logical plan?) for index selection. How can we fit this optimization stage into our implementation?

    Sort and field orderliness

    I believe we have to impose stronger preconditions on the sort field, since it is not possible to sort on a sparse field.

    Sort also requires the fields to be arranged in a strict order, i.e. we have to use the fieldIndex number to quickly access the field to be sorted. Otherwise, it may cost considerable resources to find the specific field every time.
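
    The cost difference is the usual index-versus-scan trade-off, sketched below with hypothetical field layouts:

    package main

    import "fmt"

    // A flat record whose fields are laid out in a strict, known order.
    type record struct {
        fields []string
    }

    // byIndex is O(1): the sort field's position is fixed at plan time.
    func (r record) byIndex(fieldIndex int) string { return r.fields[fieldIndex] }

    // byName is O(number of fields): scan for the field on every access.
    func (r record) byName(names []string, want string) string {
        for i, n := range names {
            if n == want {
                return r.fields[i]
            }
        }
        return ""
    }

    func main() {
        names := []string{"trace_id", "start_time", "duration"}
        r := record{fields: []string{"aaaa", "1623203253", "42"}}
        fmt.Println(r.byIndex(1))                  // fieldIndex resolved once, up front
        fmt.Println(r.byName(names, "start_time")) // linear search on every access
    }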

  • Add bydbctl's examples

    The CRUD and query examples are based on bydbctl.

    @lujiajing1126 @wankai123 you could use this command line tool to query schemas and data from the server.

  • Update go 1.19

    Lint issues

    Several lint errors occur after upgrading to Go 1.19:

    • Missing package comments: https://tip.golang.org/doc/go1.19#go-doc
    • Migrate github.com/golang/protobuf/jsonpb -> google.golang.org/protobuf/encoding/protojson (see the sketch below)
    • Fully deprecate io/ioutil
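
    The jsonpb-to-protojson migration is mostly mechanical; a minimal before/after sketch, using a well-known type as the message:

    package main

    import (
        "fmt"
        "time"

        "google.golang.org/protobuf/encoding/protojson"
        "google.golang.org/protobuf/types/known/durationpb"
    )

    func main() {
        msg := durationpb.New(1500 * time.Millisecond) // any proto message works here

        // Before (deprecated): jsonpb.Marshaler{}.MarshalToString(msg)
        // After: protojson from the google.golang.org/protobuf module.
        b, err := protojson.Marshal(msg)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(b)) // "1.500s"
    }
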
  • Add measure query

    This PR introduces the basic measure query feature with a local index scan.

    The implementation is based on,

    1. No global index for measure
    2. No limit and offset for measure

    As we've discussed, GroupBy and Aggregation will come after this PR.

  • Reload stream when metadata changes

    This PR supersedes the previous PR #65 to allow metadata reload. It puts the logic mostly in the stream module instead of pursuing strongly-consistent metadata as the previous PR did.

    As a result, the stream module starts a serially-running background job to continuously reconcile the opened streams with the underlying storage.
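
    A skeletal version of such a serially-running reconcile loop (the revision check and reopen step are placeholders, not the real stream module code):

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // reconcileLoop runs serially in a single goroutine: each tick compares the
    // desired schema revision against the opened streams and reopens what drifted.
    func reconcileLoop(ctx context.Context, desired func() int64, opened *int64) {
        ticker := time.NewTicker(1 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if rev := desired(); rev != *opened {
                    // Placeholder for closing and reopening the stream
                    // against the underlying storage.
                    fmt.Printf("reconciling stream: %d -> %d\n", *opened, rev)
                    *opened = rev
                }
            }
        }
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 3500*time.Millisecond)
        defer cancel()
        var openedRev int64
        desiredRev := int64(2) // pretend metadata changed to revision 2
        reconcileLoop(ctx, func() int64 { return desiredRev }, &openedRev)
    }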

    Please review the new design, @hanahmily.

    More test cases will be added later. I suppose an Eventually method is necessary for these kinds of tests.
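
    For reference, Gomega's Eventually fits this pattern, polling an assertion until it holds or times out; a sketch with a hypothetical streamCount accessor:

    package stream_test

    import (
        "testing"

        "github.com/onsi/gomega"
    )

    // streamCount is a stand-in for querying how many streams are open
    // after the background job has reconciled a metadata change.
    func streamCount() int { return 2 }

    func TestReload(t *testing.T) {
        g := gomega.NewWithT(t)
        // Poll until the reconcile job has caught up, or time out after 10s.
        g.Eventually(func() int {
            return streamCount()
        }, "10s", "100ms").Should(gomega.Equal(2))
    }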

  • Introduce bytebuffer pool

    In this PR, I've introduced a very simple bytebuffer pool to optimize byte manipulation.

    A benchmark of the query path has been added; the result shows roughly 10% less allocation.
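
    In spirit, such a pool is a thin wrapper over sync.Pool; a minimal sketch (not necessarily the PR's actual implementation):

    package main

    import (
        "fmt"
        "sync"
    )

    // ByteBuffer is a reusable byte slice wrapper.
    type ByteBuffer struct{ B []byte }

    var pool = sync.Pool{
        New: func() interface{} { return &ByteBuffer{} },
    }

    // Get borrows a buffer from the pool.
    func Get() *ByteBuffer { return pool.Get().(*ByteBuffer) }

    // Put resets the buffer and returns it, so the backing array is reused.
    func Put(b *ByteBuffer) {
        b.B = b.B[:0]
        pool.Put(b)
    }

    func main() {
        buf := Get()
        buf.B = append(buf.B, []byte("chunk-id")...)
        fmt.Println(string(buf.B))
        Put(buf) // the next Get may reuse this allocation
    }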

  • Add UI for creating and editing tagfamilies and tags. Change some UI styles.

    This PR mainly adds UI for creating and editing tagfamilies and tags, and changes some UI styles.

    1. Add UI for creating and editing tagfamilies and tags.

    • The original 'new resources' UI:

    [screenshot]

    • The old UI does not provide functions for adding and editing tagfamilies and tags, which is not good for the user experience. So I added a UI for this, which provides creation and editing functions.
    • First, you can right-click the group and click 'new resources' to enter the dialog box for adding 'resources':

    [screenshots]

    • Obviously, it provides add, delete, edit and batch delete functions.
    • You can click the 'Add the tagfamilies' button to enter the dialog box for adding 'tagfamilies':

    [screenshot]

    It provides add, delete, edit and batch delete functions too.

    Tip: note that the UI is not yet connected to the backend interface, but this does not affect the original 'new resources' function. I need to know whether the newly added 'resources' interface provides the newly added tagfamily function, and what its data structure is. Maybe you can help me?

    2. Change some UI styles.

    • The old right-click menu:

    [screenshot]

    • The new right-click menu:

    [screenshot]

    • The old dialog:

    [screenshots]

    • The new dialog:

    [screenshots]
