Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator

Trickster is an HTTP reverse proxy/cache for HTTP applications and a dashboard query accelerator for time series databases.

Learn more below, and check out our roadmap to find out what else is in the works.

Trickster is hosted by the Cloud Native Computing Foundation (CNCF) as a sandbox level project. If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF.

Note: Trickster v1.1 is the production release, sourced from the v1.1.x branch. The main branch sources Trickster 2.0, which is currently in beta.

HTTP Reverse Proxy Cache

Trickster is a fully-featured HTTP Reverse Proxy Cache for HTTP applications like static file servers and web APIs.

Proxy Feature Highlights

Time Series Database Accelerator

Trickster dramatically improves dashboard chart rendering times for end users by eliminating redundant computations on the TSDBs it fronts. In short, Trickster makes read-heavy Dashboard/TSDB environments, as well as those with highly-cardinalized datasets, significantly more performant and scalable.

Compatibility

Trickster works with virtually any dashboard application that makes queries to any of these TSDBs:

Prometheus

ClickHouse

InfluxDB

Circonus IRONdb

See the Supported TSDB Providers document for full details.

How Trickster Accelerates Time Series

1. Time Series Delta Proxy Cache

Most dashboards request from a time series database the entire time range of data they wish to present, every time a user's dashboard loads, as well as on every auto-refresh. Trickster's Delta Proxy inspects the time range of a client query to determine which data points are already cached, and requests from the TSDB only the data points still needed to service the client request. This results in dramatically faster chart load times for everyone, since the TSDB is queried only for tiny incremental changes on each dashboard load, rather than several hundred data points of duplicative data.
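
To illustrate the idea, here is a simplified sketch in Go (illustrative names only, not Trickster's actual delta engine) of computing which sub-ranges of a request must still be fetched upstream:

    package main

    import (
        "fmt"
        "time"
    )

    // extent is a contiguous time range of cached or requested data.
    type extent struct{ start, end time.Time }

    // missing returns the sub-ranges of req that are not covered by cached,
    // i.e. the only ranges that must be fetched from the TSDB.
    func missing(req, cached extent) []extent {
        if cached.end.Before(req.start) || cached.start.After(req.end) {
            return []extent{req} // no overlap: fetch the whole request
        }
        var gaps []extent
        if req.start.Before(cached.start) {
            gaps = append(gaps, extent{req.start, cached.start})
        }
        if req.end.After(cached.end) {
            gaps = append(gaps, extent{cached.end, req.end})
        }
        return gaps
    }

    func main() {
        now := time.Now()
        cached := extent{now.Add(-24 * time.Hour), now.Add(-time.Minute)}
        request := extent{now.Add(-24 * time.Hour), now}
        // Only the most recent minute is requested upstream.
        fmt.Println(missing(request, cached))
    }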

2. Step Boundary Normalization

When Trickster requests data from a TSDB, it adjusts the client's requested time range slightly to ensure that all data points returned are aligned to normalized step boundaries. For example, if the step is 300s, all data points will fall on the clock 0's and 5's. This ensures that the data is highly cacheable, that it is conveyed visually to users in a more familiar way, and that all dashboard users see identical data on their screens.
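
A minimal sketch of the same normalization (an assumed helper, not Trickster's internal code), widening a requested range outward so both edges land on step boundaries:

    package main

    import (
        "fmt"
        "time"
    )

    // normalize floors start and ceils end to multiples of step, so every
    // point in the widened range falls on a cacheable step boundary.
    func normalize(start, end time.Time, step time.Duration) (time.Time, time.Time) {
        ns := start.Truncate(step)
        ne := end.Truncate(step)
        if ne.Before(end) {
            ne = ne.Add(step)
        }
        return ns, ne
    }

    func main() {
        step := 300 * time.Second
        start := time.Date(2021, 6, 1, 13, 21, 17, 0, time.UTC)
        ns, ne := normalize(start, start.Add(time.Hour), step)
        fmt.Println(ns, ne) // 13:20:00 and 14:25:00 -- aligned to the 0's and 5's
    }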

3. Fast Forward

Trickster's Fast Forward feature ensures that even with step boundary normalization, real-time graphs still always show the most recent data, regardless of how far away the next step boundary is. For example, if your chart step is 300s and the time is currently 1:21pm, you would normally be waiting another four minutes for a new data point at 1:25pm. Trickster will break the step interval for the most recent data point and always include it in the response to clients requesting real-time data.
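
As a rough sketch under assumed types (not Trickster's actual data model), fast forward amounts to appending one extra, un-aligned point carrying the latest live value to the end of the step-aligned series:

    package main

    import (
        "fmt"
        "time"
    )

    // point is a single timestamp/value pair in a series.
    type point struct {
        ts  time.Time
        val float64
    }

    // withFastForward appends the most recent instant value to a step-aligned
    // series so the chart's right edge reflects "now", not the last boundary.
    func withFastForward(aligned []point, latest point) []point {
        if len(aligned) == 0 || latest.ts.After(aligned[len(aligned)-1].ts) {
            return append(aligned, latest)
        }
        return aligned
    }

    func main() {
        now := time.Now()
        series := []point{{now.Truncate(5 * time.Minute), 42}}
        fmt.Println(withFastForward(series, point{now, 43.5}))
    }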

Trying Out Trickster

Check out our end-to-end Docker Compose demo composition for a zero-configuration running environment.

Installing

Docker

Docker images are available on Docker Hub:

    $ docker run --name trickster -d -v /path/to/trickster.yaml:/etc/trickster/trickster.yaml -p 0.0.0.0:8480:8480 trickstercache/trickster

See the 'deploy' directory for more information about using or creating Trickster Docker images.

Kubernetes

See the 'deploy' directory for Kubernetes deployment files and examples.

Helm

Trickster Helm Charts are located at https://helm.tricksterproxy.io for installation, and maintained at https://github.com/trickstercache/helm-charts. We welcome chart contributions.

Building from source

To build Trickster from the source code yourself, you need a working Go environment with version 1.17 or greater installed.

You can use the go tool directly to download and install the trickster binary into your GOPATH's bin directory:

    $ go install github.com/trickstercache/trickster/cmd/trickster@latest
    # this starts a prometheus accelerator proxy for the provided endpoint
    $ trickster -origin-url http://prometheus.example.com:9090 -provider prometheus

You can also clone the repository yourself and build using make:

    $ mkdir -p $GOPATH/src/github.com/trickstercache
    $ cd $GOPATH/src/github.com/trickstercache
    $ git clone https://github.com/trickstercache/trickster.git
    $ cd trickster
    $ make build
    $ ./OPATH/trickster -origin-url http://prometheus.example.com:9090 -provider prometheus

The Makefile provides several targets, including:

  • build: build the trickster binary
  • docker: build a Docker container for the current HEAD
  • clean: delete previously-built binaries and object files
  • test: run unit tests
  • bench: run benchmark tests
  • rpm: build a Trickster RPM

More information

  • Refer to the docs directory for additional info.

Contributing

Refer to CONTRIBUTING.md

Who Is Using Trickster

As the Trickster community grows, we'd like to keep track of who is using it in their stack. We invite you to submit a PR with your company name and @githubhandle to be included on the list.

  1. Comcast [@jranson]
  2. Selfnet e.V. [@ThoreKr]
  3. swarmstack [@mh720]
  4. Hostinger [@ton31337]
  5. The Remote Company (MailerLite, MailerSend, MailerCheck, YCode) [@aorfanos]

Comments
  • Trickster provokes grafana display quirks

    I noticed a few grafana display bugs in my graphs, which at first I blamed on Thanos, forgetting I had a Trickster in front of it.

    Expected graph: [screenshot grafana-prometheus]

    Thanos + Trickster graph: [screenshot grafana-thanos-2]

    The data are the same; they just come from a different origin/path.

    This only happens when I set the end of the range to now in grafana. If I fix the end of the range to a specific time, I can't reproduce it.

    Taking a look at the data coming from Prometheus and from Thanos + Trickster, I got the following diff:

    [screenshot: diff of the two responses]

    As you can see, the last timestamp is not aligned with the step in the Thanos + Trickster case, and it confuses grafana's output.

    Bypassing Trickster fixes the problem.

  • Unable to handle scalar responses

    If I send a simple scalar query such as /api/v1/query?query=5, trickster now returns errors. I bisected this, and it seems the error was introduced in ec4eff34d5532b1907723eeaabe620f02dd25b32. The basic problem here is that trickster assumes the response type from prometheus is a vector, when in reality it can be (1) scalar, (2) vector, or (3) string. The way this worked before was that the unmarshal error was ignored and the response handed back (I have a patch that puts it back to that behavior).

    From what I can see, it looks like caching won't work on those scalar values, since they couldn't be unmarshaled. Alternatively, we could change the model to use model.Value and then add some type-switching (roughly along the lines of the sketch below). But since that is a potentially large change, I figured I'd get others' input first.
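
    A rough sketch of that type-switching approach, using github.com/prometheus/common/model; the names here are illustrative only, not Trickster's actual types:

        package main

        import (
            "encoding/json"
            "fmt"

            "github.com/prometheus/common/model"
        )

        // queryResult mirrors the "data" object of a Prometheus query response.
        type queryResult struct {
            Type   model.ValueType `json:"resultType"`
            Result json.RawMessage `json:"result"`
        }

        // decode type-switches on resultType instead of assuming a vector.
        func decode(qr queryResult) (model.Value, error) {
            switch qr.Type {
            case model.ValScalar:
                var s model.Scalar
                err := json.Unmarshal(qr.Result, &s)
                return &s, err
            case model.ValVector:
                var v model.Vector
                err := json.Unmarshal(qr.Result, &v)
                return v, err
            case model.ValMatrix:
                var m model.Matrix
                err := json.Unmarshal(qr.Result, &m)
                return m, err
            case model.ValString:
                var str model.String
                err := json.Unmarshal(qr.Result, &str)
                return &str, err
            default:
                return nil, fmt.Errorf("unsupported result type %q", qr.Type)
            }
        }

        func main() {
            raw := []byte(`{"resultType":"scalar","result":[1435781451.781,"5"]}`)
            var qr queryResult
            if err := json.Unmarshal(raw, &qr); err != nil {
                panic(err)
            }
            v, err := decode(qr)
            fmt.Println(v, err)
        }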

  • Redis configuration is not caching any request

    I configured Trickster to use AWS ElastiCache/Redis 5.0.4, deployed as a master + 2 replicas without cluster mode enabled.

    I'm using the Trickster 1.0-beta8 image.

    It is deployed on K8s with 3 replicas, exposing Trickster through an Ingress so that Grafana can use it as a datasource.

    Every time Grafana runs a query, Trickster misses the cache; even copying the request from Grafana and running it with curl multiple times, it keeps missing the cache.

        [caches]
    
            [caches.default]
            # cache_type defines what kind of cache Trickster uses
            # options are 'bbolt', 'filesystem', 'memory', 'redis' and 'redis_cluster'
            # The default is 'memory'.
            type = 'redis'
    
            # compression determines whether the cache should be compressed. default is true
            # changing the compression setting will leave orphans in your cache for the duration of timeseries_ttl_secs
            compression = true
    
            # timeseries_ttl_secs defines the relative expiration of cached timeseries. default is 6 hours (21600 seconds)
            timeseries_ttl_secs = 21600
    
            # fastforward_ttl_secs defines the relative expiration of cached fast forward data. default is 15s
            fastforward_ttl_secs = 15
    
            # object_ttl_secs defines the relative expiration of generically cached (non-timeseries) objects. default is 30s
            object_ttl_secs = 30
    
                ### Configuration options for the Cache Index
                # The Cache Index handles key management and retention for bbolt, filesystem and memory
                # Redis handles those functions natively and does not use the Trickster's Cache Index
                [caches.default.index]
    
                # reap_interval_secs defines how long the Cache Index reaper sleeps between reap cycles. Default is 3 (3s)
                reap_interval_secs = 3
    
                # flush_interval_secs sets how often the Cache Index saves its metadata to the cache from application memory. Default is 5 (5s)
                flush_interval_secs = 5
    
                # max_size_bytes indicates how large the cache can grow in bytes before the Index evicts least-recently-accessed items. default is 512MB
                max_size_bytes = 536870912
    
                # max_size_backoff_bytes indicates how far below max_size_bytes the cache size must be to complete a byte-size-based eviction exercise. default is 16MB
                max_size_backoff_bytes = 16777216
    
                # max_size_objects indicates how large the cache can grow in objects before the Index evicts least-recently-accessed items. default is 0 (infinite)
                max_size_objects = 0
    
                # max_size_backoff_objects indicates how far under max_size_objects the cache size must be to complete object-size-based eviction exercise. default is 100
                max_size_backoff_objects = 100
    
    
    
                ### Configuration options when using a Redis Cache
                [caches.default.redis]
                # protocol defines the protocol for connecting to redis ('unix' or 'tcp') 'tcp' is default
                protocol = 'tcp'
                # endpoint defines the fqdn+port or path to a unix socket file for connecting to redis
                # default is 'redis:6379'
                endpoint = 'redis-common.external-service.svc.cluster.local:6379'
                # password provides the redis password
                # default is empty
                password = ''
    
  • Ability to cache older-but-frequently-accessed data

    Setting value_retention_factor = 1536 in the conf file has no impact. Only responses that have fewer than the default 1024 samples are fully cached. The config file is definitely being used, because I can switch to filesystem caching with it.

    This is an issue when the Grafana Prometheus datasource uses Resolution 1/1. That results in the smallest possible range_query step (one sample per pixel). For example, when querying 24h of data the step is set to 60s --> 1441 samples are returned and only 1024 are cached, always resulting in kmiss and phit.

    Workaround: when the Grafana resolution is set to 1/2 --> the step size increases to 120s --> 721 samples are returned and Trickster has a 100% cache hit rate.

  • Panic with v0.1.3

    Hi,

    I installed v0.1.3 in order to see if it fixes https://github.com/Comcast/trickster/issues/92 but I encountered this.

    panic: runtime error: index out of range
    goroutine 493 [running]:
    main.(*PrometheusMatrixEnvelope).cropToRange(0xc0001c06f8, 0x5c06535c, 0x0)
    	/go/src/github.com/Comcast/trickster/handlers.go:1014 +0x4ee
    main.(*TricksterHandler).originRangeProxyHandler(0xc000062140, 0xc000312640, 0x41, 0xc00014c8a0)
    	/go/src/github.com/Comcast/trickster/handlers.go:840 +0x615
    created by main.(*TricksterHandler).queueRangeProxyRequest
    	/go/src/github.com/Comcast/trickster/handlers.go:660 +0x276
    
  • [Question] Use with Prometheus High Availability?

    I'm super excited about this project! Thanks for sharing it with the community!

    I had a question about this part of the docs:

    In a Multi-Origin placement, you have one dashboard endpoint, one Trickster endpoint, and multiple Prometheus endpoints. Trickster is aware of each Prometheus endpoint and treats them as unique databases to which it proxies and caches data independently of each other.

    Could this work for load balancing multiple Prometheus servers in an HA setup? We currently have a pair of Prometheus servers in each region, redundantly scraping the same targets. Currently our Grafana is just pinned to one Prometheus server in each region, meaning that if that one goes down, our dashboards go down until we manually change the datasource to point to the other one (and by that point we would have just restored the first server anyway). It's kind of a bummer, because it means that while HA works great for alerting itself, it doesn't work for dashboards.

    Would be awesome if there was a way to achieve this with Trickster!

  • issues with helm deployment

    I faced an issue yesterday with Chart 1.1.2. It seems the 1.0-beta image tag changed to be a copy of 1.0-beta10, which caused trickster to fail to start after I updated my kubernetes cluster nodes and pulled the docker image. I then tried to update to the latest chart.

    For 1.3.0, some values in values.yaml are also duplicated, like the service section, which causes an issue with the config file having an empty listen_port.

    The PVC template also contains deprecated values, for example https://github.com/Comcast/trickster/blob/6ac009f29aeeea9476da9db6311d0aa7cf39033c/deploy/helm/trickster/templates/pvc.yaml#L1 (there is no section called config in the current values.yaml).

    Workaround: I used tag 1.0.9-beta9.

  • High memory usage with redis cache backend

    We're using the redis cache backend in our instance of trickster, and we're seeing surprisingly high memory usage.

    We are running trickster in kubernetes, with memory limited to 4GB. Trickster only needs to be up for half an hour before it's killed for using more than 4GB memory (OOM).

    Has anyone seen any similar behavior? Any idea what could be going on? We've tested increasing the memory limits, they were originally set to 2GB, but the problem persists.

  • 1.1 performance regression

    It seems that when we load test trickster with a prometheus backend, 1.1 has a reasonably large performance regression compared to 1.0.

    Both 1.1.2 and 1.1.3 seem to max out at around 100 requests per second when we load test them. 1.0 doesn't seem to get throttled in the same way and can go to several hundred (we haven't tried higher yet). We also see much higher CPU and memory usage for 1.1.

    We're trying about 15 different queries set to query over the last hour.

  • alignStepBoundaries edge cases

    This method is not very large, but does a few things which IMO are questionable.

    1. Reversing start/end: if the user flipped start and end, it is not the responsibility of trickster to flip them back. This is additionally confusing because things will work, but if trickster is removed from the call path, all the queries this fixes will break. IMO it is not the place of the cache to "correct" user queries.

    2. time.Now() checks: I think it's fine to cut off queries from going into the future, but this doesn't handle the case where both start and end are in the future; IMO that case should return an error. In the remaining cases I think it's fine to leave it truncating end, as the query results will remain unaffected, although I'd rather it didn't (the time.Now() constraint is a cache issue, not a query issue).

    3. Default step param: if the user didn't define one, it is not the place of a caching proxy to correct it. Here we should just return an error and make the user correct their query (a sketch of this stricter behavior is below).
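
    For illustration, a sketch of the stricter validation proposed above (a hypothetical helper, not the current alignStepBoundaries):

        package main

        import (
            "errors"
            "fmt"
            "time"
        )

        // validateRange rejects malformed ranges instead of silently fixing them.
        func validateRange(start, end time.Time, step time.Duration) error {
            if step <= 0 {
                return errors.New("step must be provided and positive")
            }
            if end.Before(start) {
                return errors.New("end must not precede start")
            }
            if now := time.Now(); start.After(now) && end.After(now) {
                return errors.New("range lies entirely in the future")
            }
            return nil
        }

        func main() {
            now := time.Now()
            err := validateRange(now.Add(time.Hour), now.Add(2*time.Hour), time.Minute)
            fmt.Println(err) // range lies entirely in the future
        }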

  • Add warning for potential Redis misconfigurations

    Hi - I've set trickster to cache to redis; I'm using beta 8 (but have tried multiple versions). Trickster seems to connect fine and even tries to store data:

    time=2019-06-19T12:56:57.101537228Z app=trickster caller=proxy/engines/cache.go:71 level=debug event="compressing cached data" cacheKey=thanos-query:9090.0a9332e4c9046613a62ba8a6e4a2e78a.sz
    time=2019-06-19T12:56:57.101899271Z app=trickster caller=cache/redis/redis.go:82 level=debug event="redis cache store" key=thanos-query:9090.0a9332e4c9046613a62ba8a6e4a2e78a.sz

    However, I get no keys in redis and the dashboard(s) load no quicker. I have tried Trickster with the in-memory option and that works as expected.

    I am also able to write to redis using both the CLI and an external test application, just to rule redis out.

    I've also tried standing up multiple Redis deployment types (e.g. standard, cluster, and sentinel).

    Thanks!

  • Logger not flushed before exitFatal()

    https://github.com/trickstercache/trickster/blob/2eeb4ba048ed1676105cf954c849a39278ce38cc/pkg/proxy/listener/listener.go#L185-L189

    This calls f() to exitFatal(), but the log message does not show on the terminal because the loggers are not flushed. I wasted an hour trying to understand why the program wouldn't start. It turned out a port was already bound from a previous run, but the log message was never printed, and since exitFatal() does not show a stack trace there was no way to know what happened. (The general flush-before-exit pattern is sketched below.)
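
    The general pattern being asked for, sketched here with go.uber.org/zap as a stand-in (this is not Trickster's actual logger API):

        package main

        import (
            "errors"
            "os"

            "go.uber.org/zap"
        )

        func main() {
            logger, _ := zap.NewProduction()

            exitFatal := func(msg string, err error) {
                logger.Error(msg, zap.Error(err))
                _ = logger.Sync() // flush buffered log entries before exiting
                os.Exit(1)
            }

            // e.g. a failed listener bind should surface in the logs before exit
            if err := listen(); err != nil {
                exitFatal("could not start listener", err)
            }
        }

        // listen stands in for the real startup path that may fail to bind a port.
        func listen() error { return errors.New("address already in use") }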

  • Frontend and Backend should be able to handle tls independently

    https://github.com/trickstercache/trickster/blob/main/pkg/proxy/tls/options/options.go#L79-L109

    This validation check requires that TLS be used for both the frontend and the backend. It should be possible to use TLS only for the backend and not for the frontend (see the sketch below).
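
    For illustration, a minimal sketch of the asymmetric setup being requested (plaintext frontend, TLS-only backend); this is generic Go, not Trickster's actual option wiring:

        package main

        import (
            "crypto/tls"
            "log"
            "net/http"
            "net/http/httputil"
            "net/url"
        )

        func main() {
            origin, err := url.Parse("https://prometheus.example.com:9090")
            if err != nil {
                log.Fatal(err)
            }

            proxy := httputil.NewSingleHostReverseProxy(origin)
            // TLS is configured only on the backend transport...
            proxy.Transport = &http.Transport{
                TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
            }

            // ...while the frontend listener stays plain HTTP.
            log.Fatal(http.ListenAndServe(":8480", proxy))
        }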

  • Request Status Always "proxy-only" and General Improvements

    Trickster Version

    Trickster version: 1.1.5, buildInfo: 2022-11-10T14:56:50+0000, goVersion: go1.17.12, copyright: © 2018 Comcast Corporation

    Problem

    Every request made to an influxdb origin is returning X-Trickster-Result: engine=HTTPProxy; status=proxy-only.

    Below are my configuration (left) and two consecutive attempts at querying trickster (right): [screenshot: proxy-only]

    Is there something I am doing incorrectly in my configuration?

    On that note: I've been trying to get Trickster to work with ClickHouse, Prometheus, and InfluxDB, and have only gotten it to work with Prometheus. Are there plans to maintain this project more actively? I've been told that the ClickHouse plugin does not work, and in general it would be great if the logging were improved to explain why states like proxy-only occur.

    My company is at a point where we're considering new tools to help with load balancing and caching in front of ClickHouse, Prometheus, and Influx. Trickster appears to satisfy our requirements perfectly based on what I've read, and it would be incredibly useful for our use case. Thank you for your work so far!

  • Non ts cache

    Ports the short-term caching of SELECT requests, previously available for IRONdb, over to InfluxDB. Support for InfluxDB 2.0 and Flux is incoming as part of these changes; this is a draft to keep the changes public and up to date, since this is one of the roadmap items.

  • Purge from cache by key, or by path on local admin router

    Quick notes on changes:

    • Purge by key uses Gorilla mux with path matching, e.g. /trickster/purge/key/{backend}/{key}. Purge by path uses http.ServeMux with query parameters, where the path is URL-encoded, e.g. /trickster/purge/path?backend={backend}&path={path}. This is to help manage URL-encoded items and to avoid altering too much of the existing admin router code.
    • Purging by path uses a recreation of the cache key derivation for requests, calculating an MD5 sum of the passed path (roughly as illustrated in the sketch below).
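
    A rough illustration of that path-to-key derivation (a hypothetical helper, not the exact code in this PR):

        package main

        import (
            "crypto/md5"
            "encoding/hex"
            "fmt"
        )

        // keyForPath derives a purge key by MD5-hashing the request path.
        func keyForPath(path string) string {
            sum := md5.Sum([]byte(path))
            return hex.EncodeToString(sum[:])
        }

        func main() {
            fmt.Println(keyForPath("/api/v1/query_range"))
        }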