BanyanDB

BanyanDB, as an observability database, aims to ingest, analyze and store Metrics, Tracing and Logging data. It's designed to handle observability data generated by observability platforms and APM systems, such as Apache SkyWalking.

Resource

Contributing

For developers who want to contribute to this project, see the Contribution Guide.

License

Apache 2.0 License.

Comments
  • Add streaming API and topN aggregator

    This PR introduces a simple stream processing API and an implementation of TopN aggregation.

    Design

    Flow is an abstraction of the streaming process, with the following operators:

    • Source: provides the data stream for the Flow. As we've discussed before, it should be a listener consuming Measure write requests continuously. Later, we could use a global binlog/WAL.
    • Mapper: func(T) R, which transforms an element from T to R
    • Filter: func(T) bool, which predicates whether an element should be passed downstream
    • Windows: currently only SlidingEventTimeWindows is implemented
    • Sink: the place to write the final result, e.g. the TopN ranks, into a separate Measure storage.

    Filter

    s := flow.New(tt.input).
        Filter(func(i int) bool {
            return i%2 == 0
        }).
        To(snk)
    

    The Filter operator allows us to filter by criteria set in TopNAggregation.

    Mapper

    s := flow.New(tt.input).
        Map(func(i int) int {
            return i * 2
        }).
        To(snk)
    

    The Mapper operator allows us to extract a field from the record and transform it for the groupBy operation.

    For simplicity, we currently do not have a separate keyBy operation (as in Apache Flink) to perform groupBy.
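
    To make the idea concrete, here is a minimal, self-contained sketch (illustrative only; Record, Keyed and the tag names are hypothetical, not the flow API) of folding group-key extraction into a Map-style transform:

    package main

    import "fmt"

    // Record is a hypothetical measure data point.
    type Record struct {
        ServiceID string // the tag we group by
        Value     int
    }

    // Keyed pairs a record with its extracted group key; a Map-style
    // transform can emit this instead of relying on a separate keyBy step.
    type Keyed struct {
        Key    string
        Record Record
    }

    func main() {
        input := []Record{{"svc-a", 1}, {"svc-b", 2}, {"svc-a", 3}}

        // "Map": extract the group key while transforming the element.
        keyed := make([]Keyed, 0, len(input))
        for _, r := range input {
            keyed = append(keyed, Keyed{Key: r.ServiceID, Record: r})
        }

        // Downstream grouping simply buckets on the pre-extracted key.
        groups := make(map[string][]Record)
        for _, k := range keyed {
            groups[k.Key] = append(groups[k.Key], k.Record)
        }
        fmt.Println(groups) // map[svc-a:[{svc-a 1} {svc-a 3}] svc-b:[{svc-b 2}]]
    }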

    Windows

    Generally, the splitting of windows is related to "time", which could be either of the following concepts:

    • Event Time
    • Processing Time
    [figure: event time vs. processing time, from the Flink community]

    The above graph from the Flink community distinguishes these concepts. In our case, however, the only "time" we care about is EventTime, which represents the exact moment the record is produced, since we need EventTime to drive the timely flush of the aggregation results, e.g. the TopN ranks.

    [figure: sliding windows]

    Sliding windows fulfill our requirement of flushing the data more frequently.

    This means the flush interval should be much smaller than the interval of the real data points. For example, in OAP the downsampling rate can be MINUTE while the flush timer is set to 25 seconds by default.

    Technically, SlidingEventTimeWindows is built on the following, as sketched below:

    • A PriorityQueue maintains records that have not yet been emitted
    • A PriorityQueueSet maintains all registered (deduplicated) timers, which will be triggered later
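
    A standalone sketch of those two structures (a simplified illustration, not the actual BanyanDB types):

    package main

    import (
        "container/heap"
        "fmt"
    )

    type record struct {
        eventTime int64 // epoch millis
        value     int
    }

    // recordHeap orders buffered records by event time (earliest first),
    // playing the role of the PriorityQueue of not-yet-emitted records.
    type recordHeap []record

    func (h recordHeap) Len() int            { return len(h) }
    func (h recordHeap) Less(i, j int) bool  { return h[i].eventTime < h[j].eventTime }
    func (h recordHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
    func (h *recordHeap) Push(x interface{}) { *h = append(*h, x.(record)) }
    func (h *recordHeap) Pop() interface{} {
        old := *h
        x := old[len(old)-1]
        *h = old[:len(old)-1]
        return x
    }

    // timerSet plays the role of the PriorityQueueSet: registering the same
    // trigger time twice only fires once.
    type timerSet map[int64]struct{}

    func (ts timerSet) register(t int64) bool {
        if _, ok := ts[t]; ok {
            return false // deduplicated
        }
        ts[t] = struct{}{}
        return true
    }

    func main() {
        buf, timers := &recordHeap{}, timerSet{}
        for _, r := range []record{{3000, 1}, {1000, 2}, {2000, 3}} {
            heap.Push(buf, r)
            // assume a 1s slide: align the flush timer to the next boundary
            timers.register((r.eventTime/1000 + 1) * 1000)
        }
        for buf.Len() > 0 {
            fmt.Println(heap.Pop(buf).(record)) // emitted in event-time order
        }
    }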

    TopN

    With the above semantics, we can implement TopN as a window aggregation function:

    flow.New(tt.input).
        Filter(...). // where
        Map(...). // select and groupBy
        Window(NewSlidingTimeWindows(time.Minute*1, time.Second*15)).
        TopN(10, OrderBy(modelv1.SORT_DESC), ...). // TopN with parameters
        To(snk)
    

    TopN is implemented with the help of a TreeMap that maps the sort key to the collection of records.
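
    As a rough illustration of that structure (Go's standard library has no TreeMap, so this sketch pairs a map with a sorted key slice; the names are mine, not BanyanDB's):

    package main

    import (
        "fmt"
        "sort"
    )

    // topNMap stands in for the TreeMap: it maps a sort key to the records
    // sharing that key, and keeps the keys sorted so the top N can be read off.
    type topNMap struct {
        byKey map[int][]string
        keys  []int // kept sorted ascending
    }

    func newTopNMap() *topNMap { return &topNMap{byKey: map[int][]string{}} }

    func (m *topNMap) put(key int, rec string) {
        if _, ok := m.byKey[key]; !ok {
            i := sort.SearchInts(m.keys, key)
            m.keys = append(m.keys, 0)
            copy(m.keys[i+1:], m.keys[i:])
            m.keys[i] = key
        }
        m.byKey[key] = append(m.byKey[key], rec)
    }

    // topN returns up to n records in descending key order.
    func (m *topNMap) topN(n int) []string {
        out := make([]string, 0, n)
        for i := len(m.keys) - 1; i >= 0 && len(out) < n; i-- {
            for _, rec := range m.byKey[m.keys[i]] {
                if len(out) == n {
                    break
                }
                out = append(out, rec)
            }
        }
        return out
    }

    func main() {
        m := newTopNMap()
        m.put(300, "svc-a")
        m.put(100, "svc-b")
        m.put(500, "svc-c")
        fmt.Println(m.topN(2)) // [svc-c svc-a]
    }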

  • Add elementUI, sass and sass-loader@7.3.1

    • Add elementUI, sass and sass-loader@7.3.1
    • Initialize page structure
      • Add Database.vue and Structure.vue
      • Delete Laws.vue
    • Add Header Component
      • Add NavMenu from ElementUI
  • Add groupBy to the measure query request

    Add groupBy and an aggregation function to the query request:

    • the query request doesn't support sub- or nested aggregation
    • the response's timestamp field is null when returning the aggregated result
    • the result is the same as the order-by result if the request doesn't specify an aggregation function on grouping
  • Add docs

    Fixes https://github.com/apache/skywalking/issues/8989

    I've left the CRUD examples empty for the future CLI tools.

    Signed-off-by: Gao Hongtao [email protected]

  • Benchmark flatbuffers and protobuf

    Benchmark env

    • CPU: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    • Memory: 8 GB 2133 MHz LPDDR3
    • Java: JDK8u292b10
    • protoc: 3.17.3
    • protobuf-java: 3.17.2
    • flatc: 2.0.0
    • flatbuffers-java: 2.0.2

    Performance: Serialization+Java

    /**
     * # JMH version: 1.32
     * # VM version: JDK 1.8.0_292, OpenJDK 64-Bit Server VM, 25.292-b10
     * # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
     * # VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=55698:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
     * # Blackhole mode: full + dont-inline hint
     * # Warmup: 5 iterations, 10 s each
     * # Measurement: 5 iterations, 10 s each
     * # Timeout: 10 min per iteration
     * # Threads: 1 thread, will synchronize iterations
     * # Benchmark mode: Average time, time/op
     * <p>
     * Benchmark                                                                  Mode  Cnt     Score    Error   Units
     * WriteEntitySerializationTest.flatbuffers                                   avgt   25  3044.054 ± 34.837   ns/op
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate                    avgt   25   911.872 ± 10.301  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.alloc.rate.norm               avgt   25  3056.000 ±  0.001    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space           avgt   25   912.394 ± 10.261  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Eden_Space.norm      avgt   25  3057.783 ± 10.009    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space       avgt   25     0.190 ±  0.018  MB/sec
     * WriteEntitySerializationTest.flatbuffers:·gc.churn.PS_Survivor_Space.norm  avgt   25     0.637 ±  0.059    B/op
     * WriteEntitySerializationTest.flatbuffers:·gc.count                         avgt   25  3878.000           counts
     * WriteEntitySerializationTest.flatbuffers:·gc.time                          avgt   25  2168.000               ms
     * WriteEntitySerializationTest.protobuf                                      avgt   25   514.010 ± 12.638   ns/op
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate                       avgt   25  3833.648 ± 90.162  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.alloc.rate.norm                  avgt   25  2168.000 ±  0.001    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space              avgt   25  3835.530 ± 94.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Eden_Space.norm         avgt   25  2168.989 ±  6.134    B/op
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space          avgt   25     0.195 ±  0.020  MB/sec
     * WriteEntitySerializationTest.protobuf:·gc.churn.PS_Survivor_Space.norm     avgt   25     0.110 ±  0.011    B/op
     * WriteEntitySerializationTest.protobuf:·gc.count                            avgt   25  3629.000           counts
     * WriteEntitySerializationTest.protobuf:·gc.time                             avgt   25  2227.000               ms
     */
    

    Performance: Deserialization+Go

    goos: darwin
    goarch: amd64
    pkg: github.com/apache/skywalking-banyandb/benchmark/go-bench
    cpu: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
    Benchmark_Deser_Flatbuffers-8   	100000000	       730.5 ns/op	      64 B/op	       2 allocs/op
    Benchmark_Deser_Protobuf-8      	14826262	      5044 ns/op	    1944 B/op	      49 allocs/op
    PASS
    ok  	github.com/apache/skywalking-banyandb/benchmark/go-bench	153.927s
    

    Size

    For the same entity as illustrated in WriteEntitySerializationTest.EntityModel (unit: bytes),

    • Flatbuffers: 512
    • Protobuf: 169

    which means Protobuf is much more compact.

    Conclusion

    From the perspective of bandwidth and write performance, protobuf is definitely a better choice.

    However, FlatBuffers has better deserialization performance, in particular for partial reads. The deserialization entry point (i.e. GetRootAs***) actually does nothing; the real deserialization happens when users read from the byte buffer.
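
    The access pattern can be sketched as follows (a simplified, hypothetical illustration, not the real FlatBuffers API, which dispatches through a vtable of field offsets):

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // lazyEntity mimics the FlatBuffers access pattern: construction keeps a
    // reference to the raw bytes, and each accessor decodes on demand.
    type lazyEntity struct{ buf []byte }

    // getRootAsEntity does no parsing at all, just like GetRootAs***.
    func getRootAsEntity(buf []byte) lazyEntity { return lazyEntity{buf: buf} }

    // Timestamp decodes only the bytes it needs, only when called.
    func (e lazyEntity) Timestamp() uint64 {
        return binary.LittleEndian.Uint64(e.buf[0:8])
    }

    func main() {
        buf := make([]byte, 8)
        binary.LittleEndian.PutUint64(buf, 1623203253604)
        e := getRootAsEntity(buf)  // "deserialization": zero work done here
        fmt.Println(e.Timestamp()) // actual decoding happens per field read
    }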

    References

    https://www.ida.liu.se/~nikca89/papers/networking20c.pdf

    Our conclusion is similar to what is described in the above paper; see Figs. 3-5.

  • Feat: query module

    Design

    This is a very preliminary PR for the query module. Many things need to be considered further.

    Since it will be a principal module, I want to discuss the current design/implementation ASAP to avoid improper design and to find better ideas before proceeding.

    Logical Plan

    A Logical Plan is a DAG (Directed Acyclic Graph) of Params. A Param defines the necessary parameters for query execution. The parameters are prepared while composing the logical plan in order to reduce extra cost when executing the physical plan.

    Plot

    The logical plan can be plotted as a Dot graph. For example,

    digraph  {
    
    	n2[label="ChunkIDsFetch{metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n1[label="ChunkIDsMerge{}"];
    	n7[label="IndexScan{begin=1623203253604099000,end=1623214053604099000,KeyName=duration,conditions=[<=1000],metadata={group=skywalking,name=trace}}"];
    	n4[label="Pagination{Offset=0,Limit=0}"];
    	n5[label="Root{}"];
    	n3[label="SortMerge{fieldName=startTime,sort=DESC}"];
    	n6[label="TraceIDFetch{TraceID=aaaaaaaa,metadata={group=skywalking,name=trace},projection=[TraceID startTime]}"];
    	n2->n3;
    	n1->n2;
    	n7->n1;
    	n5->n6;
    	n5->n7;
    	n3->n4;
    	n6->n3;
    }
    

    We can leverage an online Graphviz toolkit to visualize the logical plan.

    Physical Plan

    A Physical Plan contains the logical plan and the Transform(s) corresponding to each logical.Op.

    When the plan is triggered to run, a reverse topologically-sorted slice (with Futures as items) is generated from the logical plan.
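
    A sketch of how such an execution order can be derived (Kahn's algorithm followed by a reversal so that leaf operators run first; the operator names echo the plan above, but the code is illustrative, not BanyanDB's):

    package main

    import "fmt"

    // topoSort runs Kahn's algorithm over an adjacency list (parent -> children)
    // and returns the nodes in topological order.
    func topoSort(edges map[string][]string) []string {
        indeg := map[string]int{}
        for n, children := range edges {
            if _, ok := indeg[n]; !ok {
                indeg[n] = 0
            }
            for _, c := range children {
                indeg[c]++
            }
        }
        queue := []string{}
        for n, d := range indeg {
            if d == 0 {
                queue = append(queue, n)
            }
        }
        order := []string{}
        for len(queue) > 0 {
            n := queue[0]
            queue = queue[1:]
            order = append(order, n)
            for _, c := range edges[n] {
                indeg[c]--
                if indeg[c] == 0 {
                    queue = append(queue, c)
                }
            }
        }
        return order
    }

    func main() {
        // A toy plan DAG: Root -> IndexScan -> ChunkIDsMerge -> SortMerge.
        plan := map[string][]string{
            "Root":          {"IndexScan"},
            "IndexScan":     {"ChunkIDsMerge"},
            "ChunkIDsMerge": {"SortMerge"},
        }
        order := topoSort(plan)
        // Reverse it so that leaf operators execute first.
        for i, j := 0, len(order)-1; i < j; i, j = i+1, j-1 {
            order[i], order[j] = order[j], order[i]
        }
        fmt.Println(order) // [SortMerge ChunkIDsMerge IndexScan Root]
    }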

    Tasks

    • [x] client utils for building EntityCriteria
    • [x] logical plan: Ops such as Sort, OffsetAndLimit, ChunkIDMerge, TableScan and IndexScan
    • [x] physical plan: topology sort, Transform
    • [ ] complete API to connect with Liaison (Add handlers) (Maybe next PR)

    To be discussed

    Index selection and optimization stage

    For now, I only use single-value indexes. But in the current implementation, we may be able to improve index selection during the process of generating the Logical Plan.

    Any better ideas? Traditional databases normally have an optimization stage (usually after generating the hierarchical logical plan?) for index selection. How can we fit this optimization stage into our implementation?

    Sort and field orderliness

    I believe we have to impose stronger preconditions on the sort field, since it is not possible to sort on a sparse field.

    Sort also requires the fields to be arranged in a strict order, i.e. we have to use the fieldIndex number to quickly access the field to be sorted. Otherwise, it may cost considerable resources to find the specific field every time.
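
    The cost difference is the usual index-versus-scan trade-off, sketched below with hypothetical field layouts:

    package main

    import "fmt"

    // A flat record whose fields are laid out in a strict, known order.
    type record struct {
        fields []string
    }

    // byIndex is O(1): the sort field's position is fixed at plan time.
    func (r record) byIndex(fieldIndex int) string { return r.fields[fieldIndex] }

    // byName is O(number of fields): scan for the field on every access.
    func (r record) byName(names []string, want string) string {
        for i, n := range names {
            if n == want {
                return r.fields[i]
            }
        }
        return ""
    }

    func main() {
        names := []string{"trace_id", "start_time", "duration"}
        r := record{fields: []string{"aaaa", "1623203253", "42"}}
        fmt.Println(r.byIndex(1))                  // fieldIndex resolved once, up front
        fmt.Println(r.byName(names, "start_time")) // linear search on every access
    }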

  • Add bydbctl's examples

    The CRUD and query examples are based on bydbctl.

    @lujiajing1126 @wankai123 you could use this command line tool to query schemas and data from the server.

  • Update go 1.19

    Lint issues

    Several lint errors occur after upgrading to Go 1.19:

    • Missing package comments: https://tip.golang.org/doc/go1.19#go-doc
    • Migrate github.com/golang/protobuf/jsonpb -> google.golang.org/protobuf/encoding/protojson (see the sketch below)
    • Fully deprecate io/ioutil
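
    The jsonpb-to-protojson migration is mostly mechanical; a minimal before/after sketch, using a well-known type as the message:

    package main

    import (
        "fmt"
        "time"

        "google.golang.org/protobuf/encoding/protojson"
        "google.golang.org/protobuf/types/known/durationpb"
    )

    func main() {
        msg := durationpb.New(1500 * time.Millisecond) // any proto message works here

        // Before (deprecated): jsonpb.Marshaler{}.MarshalToString(msg)
        // After: protojson from the google.golang.org/protobuf module.
        b, err := protojson.Marshal(msg)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(b)) // "1.500s"
    }
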
  • Add measure query

    This PR introduces the basic measure query feature with a local index scan.

    The implementation is based on,

    1. No global index for measure
    2. No limit and offset for measure

    As we've discussed, GroupBy and Aggregation will come after this PR.

  • Reload stream when metadata changes

    This PR supersedes the previous PR #65 to allow metadata reload. It puts the logic mostly in the stream module instead of pursuing strongly-consistent metadata as the previous PR did.

    As a result, the stream module starts a serially-running background job to continuously reconcile the opened streams with the underlying storage.
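
    A skeletal version of such a serially-running reconcile loop (the revision check and reopen step are placeholders, not the real stream module code):

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // reconcileLoop runs serially in a single goroutine: each tick compares the
    // desired schema revision against the opened streams and reopens what drifted.
    func reconcileLoop(ctx context.Context, desired func() int64, opened *int64) {
        ticker := time.NewTicker(1 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if rev := desired(); rev != *opened {
                    // Placeholder for closing and reopening the stream
                    // against the underlying storage.
                    fmt.Printf("reconciling stream: %d -> %d\n", *opened, rev)
                    *opened = rev
                }
            }
        }
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 3500*time.Millisecond)
        defer cancel()
        var openedRev int64
        desiredRev := int64(2) // pretend metadata changed to revision 2
        reconcileLoop(ctx, func() int64 { return desiredRev }, &openedRev)
    }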

    Please review the new design, @hanahmily.

    More test cases will be added later. I suppose an Eventually method is necessary for these kinds of tests.
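
    For reference, Gomega's Eventually fits this pattern, polling an assertion until it holds or times out; a sketch with a hypothetical streamCount accessor:

    package stream_test

    import (
        "testing"

        "github.com/onsi/gomega"
    )

    // streamCount is a stand-in for querying how many streams are open
    // after the background job has reconciled a metadata change.
    func streamCount() int { return 2 }

    func TestReload(t *testing.T) {
        g := gomega.NewWithT(t)
        // Poll until the reconcile job has caught up, or time out after 10s.
        g.Eventually(func() int {
            return streamCount()
        }, "10s", "100ms").Should(gomega.Equal(2))
    }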

  • Introduce bytebuffer pool

    In this PR, I've introduced a very simple bytebuffer pool to optimize byte manipulation.

    A benchmark of the query path has been added; the result shows roughly 10% less allocation.
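
    In spirit, such a pool is a thin wrapper over sync.Pool; a minimal sketch (not necessarily the PR's actual implementation):

    package main

    import (
        "fmt"
        "sync"
    )

    // ByteBuffer is a reusable byte slice wrapper.
    type ByteBuffer struct{ B []byte }

    var pool = sync.Pool{
        New: func() interface{} { return &ByteBuffer{} },
    }

    // Get borrows a buffer from the pool.
    func Get() *ByteBuffer { return pool.Get().(*ByteBuffer) }

    // Put resets the buffer and returns it, so the backing array is reused.
    func Put(b *ByteBuffer) {
        b.B = b.B[:0]
        pool.Put(b)
    }

    func main() {
        buf := Get()
        buf.B = append(buf.B, []byte("chunk-id")...)
        fmt.Println(string(buf.B))
        Put(buf) // the next Get may reuse this allocation
    }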

  • Add UI for creating and editing tagfamilies and tags. Change some UI styles.

    This PR mainly adds UI for creating and editing tagfamilies and tags, and changes some UI styles.

    1. Add UI for creating and editing tagfamilies and tags.

    • The original 'new resources' UI:

    [screenshot]

    • The old UI does not provide functions for adding and editing tagfamilies and tags, which is not good for the user experience. So I added a UI for this, which provides creation and editing functions.
    • First, you can right-click the group and click 'new resources' to enter the dialog box for adding 'resources':

    [screenshots]

    • Obviously, it provides add, delete, edit and batch delete functions.
    • You can click the 'Add the tagfamilies' button to enter the dialog box for adding 'tagfamilies':

    [screenshot]

    It provides add, delete, edit and batch delete functions too.

    Tip: note that the UI is not yet connected to the backend interface, but this does not affect the original 'new resources' function. I need to know whether the newly added 'resources' interface provides the newly added tagfamily function, and what its data structure is. Maybe you can help me?

    2. Change some UI styles.

    • The old right-click menu:

    [screenshot]

    • The new right-click menu:

    [screenshot]

    • The old dialog:

    [screenshots]

    • The new dialog:

    [screenshots]
