Jaeger ClickHouse storage plugin implementation

Jaeger ClickHouse

This is an implementation of Jaeger's storage plugin for ClickHouse. See also jaegertracing/jaeger/issues/1438 for the historical discussion regarding the ClickHouse plugin.

Note that this project is community maintained. If it is not up to date or is missing any features, please open an issue or submit a pull request.


Refer to config.yaml for all supported configuration options.

Build & Run

Docker database example

docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server:21
GOOS=linux make build run
make run-hotrod

Open localhost:16686 (Jaeger UI) and localhost:8080 (HotROD).

Custom database

Specify the connection options in config.yaml, then run:

make build
SPAN_STORAGE_TYPE=grpc-plugin {Jaeger binary address} --query.ui-config=jaeger-ui.json --grpc-storage-plugin.binary=./{name of built binary} --grpc-storage-plugin.configuration-file=config.yaml --grpc-storage-plugin.log-level=debug


This project is based on https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse.

Jaeger - Distributed Tracing Platform
  • Explanation of '{cluster}'

    Our replication and sharding guide (https://github.com/pavolloffay/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md#replication) uses the '{cluster}' substitution when creating a distributed table, e.g.:

    CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' AS jaeger_spans_local ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID));

    I am not sure I understand exactly what it does. Could somebody explain it? @EinKrebs @chhetripradeep

    Let's say my CH deployment defines two clusters


    So if the CREATE command is executed, would it create tables on all clusters?
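    As we understand ClickHouse, '{cluster}' is a macro: each server replaces it with the value defined in the <macros> section of its own configuration before running the query. ON CLUSTER '{cluster}' should therefore target only the cluster named by that macro on the node where the statement is executed, not every cluster the deployment defines. The Go sketch below only imitates that textual substitution step; expandMacros and the macro values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// expandMacros replaces every {name} placeholder in query with the value
// configured for that macro, a simplified imitation of how a ClickHouse
// server applies the <macros> section of its config.
func expandMacros(query string, macros map[string]string) string {
	pairs := make([]string, 0, 2*len(macros))
	for name, value := range macros {
		pairs = append(pairs, "{"+name+"}", value)
	}
	return strings.NewReplacer(pairs...).Replace(query)
}

func main() {
	// Hypothetical macro values as they might be configured on one node.
	macros := map[string]string{"cluster": "jaeger_cluster", "shard": "01", "replica": "replica-1"}
	ddl := "CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' AS jaeger_spans_local " +
		"ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID))"
	fmt.Println(expandMacros(ddl, macros))
}
```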

  • TLS support, connection options & some refactoring

    • Added database name, username, and password options for connecting to the database;
    • enabled connections using TLS;
    • did some code refactoring;
    • changed the README.

    Could you please review the code and suggest some ideas for the README content?

    Resolves #18
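    A minimal sketch of how the new options could be combined into a clickhouse-go v1 style DSN. It assumes the driver's database, username, password and secure query parameters; ConnOptions and buildDSN are hypothetical names, not the plugin's code, and port 9440 is used only because it is ClickHouse's usual secure native port.

```go
package main

import (
	"fmt"
	"net/url"
)

// ConnOptions is a hypothetical struct holding the connection options
// described in this PR; it is not the plugin's actual config type.
type ConnOptions struct {
	Address  string // host:port of the ClickHouse native endpoint
	Database string
	Username string
	Password string
	TLS      bool
}

// buildDSN encodes the options as a clickhouse-go v1 style DSN, assuming
// the driver's "secure" parameter turns on TLS. Empty options are omitted.
func buildDSN(o ConnOptions) string {
	q := url.Values{}
	if o.Database != "" {
		q.Set("database", o.Database)
	}
	if o.Username != "" {
		q.Set("username", o.Username)
	}
	if o.Password != "" {
		q.Set("password", o.Password)
	}
	if o.TLS {
		q.Set("secure", "true")
	}
	// url.Values.Encode sorts keys, so the output is deterministic.
	u := url.URL{Scheme: "tcp", Host: o.Address, RawQuery: q.Encode()}
	return u.String()
}

func main() {
	fmt.Println(buildDSN(ConnOptions{Address: "localhost:9440", Database: "jaeger", Username: "default", Password: "secret", TLS: true}))
}
```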

  • Document and add support for deleting data/TTL

    We should document how old data can be removed (alter table jaeger_spans drop partition 20201) and add support for TTL https://clickhouse.tech/docs/en/sql-reference/statements/alter/ttl/ (the user could specify the number of days in the config).


    CREATE TABLE IF NOT EXISTS jaeger_index_local (
         timestamp DateTime CODEC(Delta, ZSTD(1)),
         traceID String CODEC(ZSTD(1)),
         service LowCardinality(String) CODEC(ZSTD(1)),
         operation LowCardinality(String) CODEC(ZSTD(1)),
         durationUs UInt64 CODEC(ZSTD(1)),
         tags Array(String) CODEC(ZSTD(1)),
         INDEX idx_tags tags TYPE bloom_filter(0.01) GRANULARITY 64,
         INDEX idx_duration durationUs TYPE minmax GRANULARITY 1
    ) ENGINE MergeTree()
    PARTITION BY toDate(timestamp)
    ORDER BY (service, -toUnixTimestamp(timestamp))
    TTL timestamp + INTERVAL 90 DAY
    SETTINGS index_granularity=1024

    cc) @chhetripradeep could you please chime in and document how you delete old data?

  • Bump clickhouse-go: v1.5.4 -> v2.3.0

    Signed-off-by: Pradeep Chhetri

    Which problem is this PR solving?

    Resolves https://github.com/jaegertracing/jaeger-clickhouse/issues/113

    Short description of the changes

    This change will bring a good performance gain.

  • Durable database writes

    Hi! Thanks for the project, I believe it's of a great value to the community.

    Currently, this plugin accumulates data and then writes it to the database. I think it's important to do several things to ensure more durable writes:

    1. Retry on network and database failures. Use exponential backoff when the database cannot serve a write immediately.
    2. Buffer data that has not been written to the DB, and ensure the buffer does not overflow: intentionally sacrifice data if it cannot be stored in the DB.
    3. Reload the connection string when requested: a user can add new shards to the ClickHouse installation.

    What do you think?

  • Make replicated deployment work without user explicitly creating tables

    The replication guide (https://github.com/pavolloffay/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md#replication) requires users to run SQL scripts on one node (because we use ON CLUSTER).

    We could add a new config option, replication: true, to indicate that replication is enabled. The plugin would then:

    • use replicated merge trees for the local tables;
    • create the global tables.

    cc) @EinKrebs is this something that interests you?
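    With such an option, table creation could pick the engine from the flag. In the sketch below, the ZooKeeper path layout and the use of the {shard} and {replica} macros are assumptions modelled on common ClickHouse setups, and localEngine and globalTable are hypothetical helpers; the Distributed DDL mirrors the statement from the replication guide.

```go
package main

import "fmt"

// localEngine returns the engine clause for a local table: ReplicatedMergeTree
// when replication is enabled (ZooKeeper path built from the {shard} macro and
// the table name, replica name from {replica}), plain MergeTree otherwise.
func localEngine(table string, replication bool) string {
	if replication {
		return fmt.Sprintf("ReplicatedMergeTree('/clickhouse/tables/{shard}/%s', '{replica}')", table)
	}
	return "MergeTree()"
}

// globalTable returns DDL for a Distributed table over the local table,
// sharded by cityHash64(traceID) as in the replication guide.
func globalTable(name, local, database, cluster string) string {
	return fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s ON CLUSTER '%s' AS %s ENGINE = Distributed('%s', %s, %s, cityHash64(traceID))",
		name, cluster, local, cluster, database, local)
}

func main() {
	fmt.Println(localEngine("jaeger_spans_local", true))
	fmt.Println(globalTable("jaeger_spans", "jaeger_spans_local", "default", "{cluster}"))
}
```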

  • Looking for maintainers

    This project does not seem to have an active maintainer. There are a couple of open PRs from @nickbp and @bocharovf. Would any of you be willing to take part in the project and maintain it?

    cc) @EinKrebs

  • Expose metrics

    Closes https://github.com/pavolloffay/jaeger-clickhouse/issues/19

    Adds metrics for batch size and flush interval, along with their counts, in Prometheus exposition format.

    Signed-off-by: Pradeep Chhetri
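    For illustration, a summary that exposes only its _sum and _count series looks like this in the Prometheus text exposition format. The metric name is hypothetical, and a real implementation would more likely register metrics through the prometheus/client_golang library than format text by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// summaryText renders one metric as a Prometheus text-format summary with
// only the _sum and _count series (no quantiles).
func summaryText(name, help string, observations []float64) string {
	var sum float64
	for _, v := range observations {
		sum += v
	}
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s summary\n", name)
	fmt.Fprintf(&b, "%s_sum %g\n", name, sum)
	fmt.Fprintf(&b, "%s_count %d\n", name, len(observations))
	return b.String()
}

func main() {
	// Hypothetical metric: number of spans in each written batch.
	fmt.Print(summaryText("jaeger_clickhouse_batch_size", "Number of spans per written batch.", []float64{100, 250, 50}))
}
```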

  • Running with HotROD results in 'Too many simultaneous queries. Maximum: 100'

    2021.07.14 17:06:49.783711 [ 219 ] {11925d3b-7684-4919-827b-319af811c400} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
    2021.07.14 17:06:49.783769 [ 1010 ] {d4de6e5e-6305-4802-842e-13c660886ef2} <Error> TCPHandler: Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100, Stack trace:
    0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8d31b5a in /usr/bin/clickhouse
    1. DB::ProcessList::insert(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::IAST const*, std::__1::shared_ptr<DB::Context const>) @ 0xfcd6802 in /usr/bin/clickhouse
    2. ? @ 0xfe21ab3 in /usr/bin/clickhouse
    3. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, bool) @ 0xfe208e3 in /usr/bin/clickhouse
    4. DB::TCPHandler::runImpl() @ 0x1069f6c2 in /usr/bin/clickhouse
    5. DB::TCPHandler::run() @ 0x106b25d9 in /usr/bin/clickhouse
    6. Poco::Net::TCPServerConnection::start() @ 0x1338b30f in /usr/bin/clickhouse
    7. Poco::Net::TCPServerDispatcher::run() @ 0x1338cd9a in /usr/bin/clickhouse
    8. Poco::PooledThread::run() @ 0x134bfc19 in /usr/bin/clickhouse
    9. Poco::ThreadImpl::runnableEntry(void*) @ 0x134bbeaa in /usr/bin/clickhouse
    10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
    11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so

    The DB is started with docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server:21
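    The server rejects new queries once the limit in the log (100, controlled by max_concurrent_queries) is reached. If the plugin itself opens too many concurrent queries, one possible client-side mitigation is to cap its concurrency with a semaphore. This is a hypothetical sketch using a buffered channel; querySemaphore and runQueries are not part of the plugin.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// querySemaphore caps how many queries a client runs at once.
type querySemaphore chan struct{}

// Do runs query once a slot is free and releases the slot afterwards.
func (s querySemaphore) Do(query func()) {
	s <- struct{}{}        // acquire; blocks while every slot is taken
	defer func() { <-s }() // release when query returns
	query()
}

// runQueries starts n goroutines that each run a simulated query through a
// semaphore with limit slots and returns the peak number running at once.
func runQueries(n, limit int) int64 {
	sem := make(querySemaphore, limit)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.Do(func() {
				cur := atomic.AddInt64(&inFlight, 1)
				for { // raise peak to cur if cur is a new maximum
					p := atomic.LoadInt64(&peak)
					if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
						break
					}
				}
				atomic.AddInt64(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// 200 simulated queries with at most 10 in flight (a hypothetical limit).
	fmt.Println("peak concurrent queries:", runQueries(200, 10))
}
```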

  • Add Operation.SpanKind support

    Requirement - what kind of business use case are you trying to solve?

    I ran Jaeger's grpc-plugin integration tests against this plugin, and they failed.

    Problem - what in Jaeger blocks you from solving the requirement?

    The integration tests failed because this plugin doesn't support jaeger/spanstore.Operation.SpanKind.
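    A sketch of the behaviour the tests expect, using local types that mirror the Name and SpanKind fields of Jaeger's spanstore.Operation and OperationQueryParameters. As we understand Jaeger's in-memory store, an empty SpanKind means "all kinds"; a ClickHouse-backed plugin would more likely store the kind and filter in SQL, so filterOperations is only an illustration.

```go
package main

import "fmt"

// Operation and OperationQueryParameters mirror the shape of Jaeger's
// spanstore types; they are redeclared so the sketch compiles on its own.
type Operation struct {
	Name     string
	SpanKind string
}

type OperationQueryParameters struct {
	ServiceName string
	SpanKind    string
}

// filterOperations keeps operations of the requested kind; an empty
// SpanKind in the query returns every operation.
func filterOperations(ops []Operation, q OperationQueryParameters) []Operation {
	if q.SpanKind == "" {
		return ops
	}
	var out []Operation
	for _, op := range ops {
		if op.SpanKind == q.SpanKind {
			out = append(out, op)
		}
	}
	return out
}

func main() {
	ops := []Operation{
		{Name: "HTTP GET /dispatch", SpanKind: "server"},
		{Name: "SQL SELECT", SpanKind: "client"},
	}
	fmt.Println(filterOperations(ops, OperationQueryParameters{ServiceName: "frontend", SpanKind: "client"}))
}
```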

  • Fix development env for os other than linux

    The Docker container expects Linux binaries. We always use the host's GOOS and GOARCH, which are different on macOS, so it fails with /data/jaeger-clickhouse-darwin-amd64: exec format error
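    The fix amounts to always targeting Linux for the container while keeping the host architecture, as the README's GOOS=linux make build run already does. pluginBinaryName is a hypothetical helper that follows the jaeger-clickhouse-<os>-<arch> naming seen in the log below.

```go
package main

import (
	"fmt"
	"runtime"
)

// pluginBinaryName returns the binary to mount into the Jaeger container.
// The container always runs Linux, so the OS is pinned to "linux" while the
// host's architecture is kept.
func pluginBinaryName(goarch string) string {
	return fmt.Sprintf("jaeger-clickhouse-linux-%s", goarch)
}

func main() {
	fmt.Printf("host %s/%s -> %s\n", runtime.GOOS, runtime.GOARCH, pluginBinaryName(runtime.GOARCH))
}
```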

    On OSX:

    ❯ make run
    docker run --rm --name jaeger -e JAEGER_DISABLED=true --link some-clickhouse-server -it -u 502 -p16686:16686 -p14250:14250 -p14268:14268 -p6831:6831/udp -v "/Users/pradeep/gh/jaeger-clickhouse:/data" -e SPAN_STORAGE_TYPE=grpc-plugin jaegertracing/all-in-one:1.24.0 --query.ui-config=/data/jaeger-ui.json --grpc-storage-plugin.binary=/data/jaeger-clickhouse-darwin-amd64 --grpc-storage-plugin.configuration-file=/data/config.yaml --grpc-storage-plugin.log-level=debug
    2021/07/17 04:44:26 maxprocs: Leaving GOMAXPROCS=6: CPU quota undefined
    {"level":"info","ts":1626497066.4456441,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
    {"level":"info","ts":1626497066.445714,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
    {"level":"info","ts":1626497066.4459236,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
    {"level":"info","ts":1626497066.445999,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14269"}
    {"level":"info","ts":1626497066.446192,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
    2021-07-17T04:44:26.446Z [DEBUG] starting plugin: path=/data/jaeger-clickhouse-darwin-amd64 args=["/data/jaeger-clickhouse-darwin-amd64", "--config", "/data/config.yaml"]
    {"level":"fatal","ts":1626497066.4515028,"caller":"command-line-arguments/main.go:103","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error attempting to connect to plugin rpc client: fork/exec /data/jaeger-clickhouse-darwin-amd64: exec format error","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:103\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:838\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:943\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:883\nmain.main\n\tcommand-line-argument

    Signed-off-by: Pradeep Chhetri

  • Bump clickhouse-go: v2.3.0 -> v2.4.3 and jaeger: 1.38.2 -> 1.39.0

    Signed-off-by: Pradeep Chhetri

    Which problem is this PR solving?

    Bumping dependencies

    Short description of the changes

    • Replace BindMounts with Mounts for testcontainers
    • Fix linter issues due to version bump

  • Add alter statements for altering table TTL after table creation

    Signed-off-by: Pradeep Chhetri

    Tested in a local development environment by updating the ttl field in the configuration file.

    Which problem is this PR solving?

    Resolves https://github.com/jaegertracing/jaeger-clickhouse/issues/120

    Short description of the changes

    We added alter statements for each of the three tables; they are executed if ttl is greater than 0. Since materialized views don't support altering the TTL configuration, we weren't able to do this for the operations table.
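    The logic described above could be sketched as follows: one ALTER TABLE ... MODIFY TTL statement per table when ttl is greater than 0, skipping the materialized view. The table list, column names and ttlAlterStatements are hypothetical; the plugin's actual tables and SQL scripts may differ.

```go
package main

import "fmt"

// table describes a table the plugin manages (hypothetical fields).
type table struct {
	Name             string
	TimestampColumn  string
	MaterializedView bool
}

// ttlAlterStatements returns one ALTER TABLE ... MODIFY TTL statement per
// table when ttlDays > 0. Materialized views are skipped because their TTL
// cannot be altered this way.
func ttlAlterStatements(tables []table, ttlDays int) []string {
	if ttlDays <= 0 {
		return nil
	}
	var stmts []string
	for _, t := range tables {
		if t.MaterializedView {
			continue
		}
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s MODIFY TTL %s + INTERVAL %d DAY", t.Name, t.TimestampColumn, ttlDays))
	}
	return stmts
}

func main() {
	tables := []table{
		{Name: "jaeger_index_local", TimestampColumn: "timestamp"},
		{Name: "jaeger_spans_local", TimestampColumn: "timestamp"},
		{Name: "jaeger_operations_local", TimestampColumn: "date", MaterializedView: true}, // hypothetical name
	}
	for _, s := range ttlAlterStatements(tables, 30) {
		fmt.Println(s)
	}
}
```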

  • [Feature]: Allow changing TTL configuration on existing tables

    As a Jaeger operator, I want to be able to modify the TTL configuration of my tables/databases so that I can change these settings after the initial database creation.


    Currently, TTL is set ONLY at database creation. Changes to the TTL config values made after database creation are not propagated to the database or its tables.


    We can add SQL scripts that perform the TTL adjustment independently of database creation.

    For the spans table, this new script would look similar to:

    ALTER TABLE {{.SpansTable}}
        MODIFY {{.TTLTimestamp}}

    We should make sure this script runs AFTER the one that creates the table, so it won't fail on new installs.
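    The proposed script uses Go text/template placeholders, so it could be rendered as in the sketch below. It assumes TTLTimestamp expands to a full TTL clause (as the MODIFY {{.TTLTimestamp}} form suggests) and that scripts run in list order; ttlParams, renderAlterTTL and the values are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// alterTTLTemplate is the script proposed above.
const alterTTLTemplate = `ALTER TABLE {{.SpansTable}}
    MODIFY {{.TTLTimestamp}}`

// ttlParams holds hypothetical values for the template fields.
type ttlParams struct {
	SpansTable   string
	TTLTimestamp string
}

// renderAlterTTL parses and executes the template with p.
func renderAlterTTL(p ttlParams) (string, error) {
	tmpl, err := template.New("alter_ttl").Parse(alterTTLTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, p); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// Placeholder for the existing CREATE script; it must run first so the
	// ALTER never targets a missing table on a fresh install.
	scripts := []string{"CREATE TABLE IF NOT EXISTS jaeger_spans_local ..."}
	alter, err := renderAlterTTL(ttlParams{SpansTable: "jaeger_spans_local", TTLTimestamp: "TTL timestamp + INTERVAL 30 DAY"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	scripts = append(scripts, alter)
	for _, s := range scripts {
		fmt.Println(s)
	}
}
```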

    Open questions

    No response

  • [Feature]: Add ttl_only_drop_parts to the table settings or make it configurable

    If we want to change the TTL days, by default ClickHouse will merge a lot of data row by row: https://clickhouse.com/docs/en/operations/settings/settings/#ttl_only_drop_parts


    It would be great not to waste resources merging rows expired by TTL; right now this setting cannot be set at creation time. If you change the TTL later, it will consume a lot of CPU.


    Set ttl_only_drop_parts to 1 for the tables by default (dropping by parts means dropping whole days), or make it configurable in advance.
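    A minimal sketch of how the SETTINGS clause could include the setting behind a config flag. ttlOnlyDropParts and tableSettings are hypothetical; index_granularity=1024 matches the jaeger_index_local schema earlier in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// tableSettings builds the SETTINGS clause for a MergeTree table. When the
// hypothetical ttlOnlyDropParts flag is set, expired data is removed by
// dropping whole parts instead of rewriting them row by row.
func tableSettings(indexGranularity int, ttlOnlyDropParts bool) string {
	settings := []string{fmt.Sprintf("index_granularity=%d", indexGranularity)}
	if ttlOnlyDropParts {
		settings = append(settings, "ttl_only_drop_parts=1")
	}
	return "SETTINGS " + strings.Join(settings, ", ")
}

func main() {
	fmt.Println(tableSettings(1024, true))
}
```

    For tables that already exist, ClickHouse's ALTER TABLE ... MODIFY SETTING is, as far as we know, the way to change this setting afterwards.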

    Open questions

    No response

  • [Feature]: Support Native JSON columns in Clickhouse

    As a ClickHouse analytics user, I want the jaeger-clickhouse schema to allow using ClickHouse native JSON columns so that we can query data in ClickHouse more efficiently (both in terms of performance and query simplicity).


    Currently, jaeger-clickhouse stores JSON span data in a String column, so querying fields within that column with ClickHouse's JSON functions is quite verbose, especially past two levels of nesting.

    This is very evident when you want to query the ingested data to generate your own analytics/insights. It would be nice if jaeger-clickhouse added support for ClickHouse native JSON columns.


    One solution may be to start supporting the native JSON data type (it's still "experimental", but the spec has been quite stable for a while).

    Open questions

    The major open question is how this would affect the split between protobuf- and JSON-encoded data (currently, the String column supports both) and whether it would add more complexity to the project. The impact needs more investigation, but I wanted to raise this with the community/maintainers to get their thoughts.

  • Model alternative for jaeger_index table

    In the jaeger_index tables, tags are encoded as a Nested array of keys and values. This works well when Jaeger Query is the only consumer, but at our company we also use Jaeger for analytics. Since ClickHouse 21.3, the Map type (https://clickhouse.com/docs/en/sql-reference/data-types/map/) is available, and I think it could be a good alternative to Nested.

    Have you already run any performance (time and storage) tests with Map? Would this be an acceptable contribution (behind a flag that is off by default)?
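    To illustrate the difference, the sketch below turns Nested-style parallel key and value arrays into one map, which is roughly what a Map(String, String) column would hold. Reading one tag would then change from something like tags.value[indexOf(tags.key, 'k')] to tags['k']. The tags.key/tags.value layout and tagsToMap are hypothetical.

```go
package main

import "fmt"

// tagsToMap converts the parallel key/value arrays of a Nested-style tags
// column into a single map. Later duplicate keys overwrite earlier ones in
// this sketch.
func tagsToMap(keys, values []string) (map[string]string, error) {
	if len(keys) != len(values) {
		return nil, fmt.Errorf("tags.key has %d entries but tags.value has %d", len(keys), len(values))
	}
	m := make(map[string]string, len(keys))
	for i, k := range keys {
		m[k] = values[i]
	}
	return m, nil
}

func main() {
	tags, err := tagsToMap([]string{"http.status_code", "span.kind"}, []string{"200", "server"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(tags["span.kind"])
}
```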

