Service for firewalling graphite metrics


Hadrianus

Block incoming graphite metrics if they come in too fast for downstream carbon-relay/carbon-cache to handle.

Building

Hadrianus is written in Go, so all you need to build it is: go build

As a convenience there's a small Makefile. To build:

  • make all <- Generates a Linux amd64 executable
  • make all-mac <- Generates a Darwin amd64 executable

Usage

Basic usage

hadrianus listeningport outport1...

  • listeningport is the port number on which to listen for incoming newline-delimited graphite protocol messages.
  • outport1 denotes the first (out of possibly many) output destinations for carbon-relay process instances, given either as a bare port number or as host:port. Metrics are distributed to the destinations in a round-robin fashion.

Options

  • -cleanupmaxage Maximum time in seconds since last message before metric path is removed from memory.
  • -cleanuptimegranularity Seconds between memory cleanup events (default 601)
  • -enablenewmetrics Initially enable new metrics and block them later if needed.
  • -maxdrymessages Maximum allowed consecutive identical values before marking metric as stale.
  • -minimumtimeinterval Minimum allowed time interval between incoming metrics in seconds. Lower values make hadrianus more "generous" in how often applications may send a specific metric.
  • -mirrordestination Secondary destination(s) to mirror traffic to.
  • -override Filename of a per-path override file used for allowlisting metric paths.
  • -staleresendinterval Time in seconds after which stale messages are resent.
  • -statstimegranularity Time between statistics messages in seconds.
  • -tertiarydestination Tertiary destination(s) to mirror traffic to.

What

A newline-delimited graphite message has the form metric_path value timestamp\n. Hadrianus can limit the number of messages per metric path within a time period. For example, setting a minimum time interval of 60 seconds would result in a given metric path being transmitted at most once per minute.
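A minimal sketch of that per-path limiting, assuming each plaintext line is split into its three fields and that the interval is measured with the message's own timestamp (Hadrianus may use arrival time instead). The names Message, parseLine and limiter are hypothetical, and whether a gap of exactly the minimum interval is allowed is an assumption here:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Message is one plaintext graphite message: "metric_path value timestamp\n".
type Message struct {
	Path      string
	Value     float64
	Timestamp int64
}

// parseLine splits one newline-delimited graphite line into its three fields.
func parseLine(line string) (Message, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Message{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return Message{}, fmt.Errorf("bad value: %w", err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return Message{}, fmt.Errorf("bad timestamp: %w", err)
	}
	return Message{Path: fields[0], Value: value, Timestamp: ts}, nil
}

// limiter remembers when each metric path was last forwarded (hypothetical type).
type limiter struct {
	minInterval int64            // seconds
	lastSent    map[string]int64 // path -> timestamp of last forwarded message
}

func newLimiter(minInterval int64) *limiter {
	return &limiter{minInterval: minInterval, lastSent: make(map[string]int64)}
}

// allow reports whether msg may be forwarded. Assumption: a gap of exactly
// minInterval seconds is allowed, so minInterval=60 means "once per minute".
func (l *limiter) allow(msg Message) bool {
	last, seen := l.lastSent[msg.Path]
	if seen && msg.Timestamp-last < l.minInterval {
		return false // too soon after the last forwarded message: drop it
	}
	l.lastSent[msg.Path] = msg.Timestamp
	return true
}

func main() {
	lim := newLimiter(60)
	for _, line := range []string{
		"server.web01.cpu 0.5 1642598100",
		"server.web01.cpu 0.6 1642598130", // 30 s after the first: dropped
		"server.web01.cpu 0.7 1642598160", // 60 s after the first: forwarded
	} {
		msg, err := parseLine(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(msg.Path, msg.Timestamp, lim.allow(msg))
	}
}
```

Only the last forwarded timestamp per path is stored, so memory grows with the number of distinct paths rather than with message volume; that is presumably what the -cleanupmaxage and -cleanuptimegranularity options keep in check.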

This can be useful to increase stability and reliability if you have applications producing more messages than the graphite/carbon system can handle.

Example commandline usage

Typical usage

hadrianus -minimumtimeinterval=14 2003 2103 2203 server01.iambk.com:2303

This tells hadrianus to discard messages for a metric path that arrive 14 seconds or less after the last transmitted message for the same path. It listens for plaintext graphite messages on port 2003 and attempts to distribute the incoming messages to 127.0.0.1:2103, 127.0.0.1:2203 and server01.iambk.com:2303 in a round-robin fashion.
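The destination handling described above can be sketched like this, assuming that a bare port means 127.0.0.1 (as the example states) and that round-robin is applied per message. normalizeDestination and roundRobin are hypothetical names, not Hadrianus internals:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// normalizeDestination turns a bare port such as "2103" into "127.0.0.1:2103"
// and validates that anything else has the form host:port.
func normalizeDestination(dest string) (string, error) {
	if _, err := strconv.Atoi(dest); err == nil {
		return net.JoinHostPort("127.0.0.1", dest), nil
	}
	if _, _, err := net.SplitHostPort(dest); err != nil {
		return "", err
	}
	return dest, nil
}

// roundRobin hands out destinations in turn. It is not safe for concurrent
// use as written; guard it with a mutex if several goroutines call pick.
type roundRobin struct {
	dests []string
	next  int
}

func (r *roundRobin) pick() string {
	d := r.dests[r.next]
	r.next = (r.next + 1) % len(r.dests)
	return d
}

func main() {
	var dests []string
	for _, arg := range []string{"2103", "2203", "server01.iambk.com:2303"} {
		d, err := normalizeDestination(arg)
		if err != nil {
			panic(err)
		}
		dests = append(dests, d)
	}
	rr := &roundRobin{dests: dests}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // the fourth pick wraps around to 127.0.0.1:2103
	}
}
```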

Allowing a metric path to pass through unmodified

It's possible to allow a metric path to pass through without being touched by the blocking logic by adding a matching pattern in a separate configuration file. In this example the configuration file allowlist.conf is as follows:

[hadrianus]
pattern = ^server\.hadrianus\.
allowunmodified = true

Refer to the "allowlist" file on the command line with the -override flag as follows: hadrianus -override=allowlist.conf -minimumtimeinterval=14 2003 2103 2203 server01.iambk.com:2303

Comments
  • Improve stability & observability

    Add:

    • buffered channel writes
    • mechanism to recover from when all channel buffers are full
    • internal metrics to give additional insight into the stability of the service
    • documentation on internal metrics
    • enable "Nagle's algorithm" for output to consumers to improve network performance

    This change will improve the service's stability, speed, and observability. However, the added buffering may increase latency for the metrics consumers.
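One common way to get "buffered channel writes" plus a recovery path when buffers are full is a non-blocking send that drops and counts the message instead of stalling the reader. This is a sketch of that general Go pattern, not the change itself; output and trySend are hypothetical names, and atomic.Int64 needs Go 1.19 or later. On the Nagle point: Go disables Nagle's algorithm on TCP connections by default, and (*net.TCPConn).SetNoDelay(false) turns it back on.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// output wraps a buffered channel feeding one downstream connection.
type output struct {
	queue   chan string
	dropped atomic.Int64 // messages discarded because the buffer was full
}

func newOutput(bufferSize int) *output {
	return &output{queue: make(chan string, bufferSize)}
}

// trySend enqueues line without blocking. If the buffer is full the line is
// dropped and counted, so a slow consumer cannot stall the reading goroutine.
func (o *output) trySend(line string) bool {
	select {
	case o.queue <- line:
		return true
	default:
		o.dropped.Add(1)
		return false
	}
}

func main() {
	out := newOutput(2)
	for i := 0; i < 5; i++ {
		out.trySend(fmt.Sprintf("a.b %d 1642598151", i))
	}
	// Nothing drains the queue here, so 2 lines are buffered and 3 are dropped.
	fmt.Println(len(out.queue), out.dropped.Load())
}
```

The dropped counter is the kind of internal metric the list above asks for: it makes buffer exhaustion visible instead of silent.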

  • Specify internal Hadrianus metrics as Go template

    Currently, internal Hadrianus metrics are published on a hard-coded graphite metrics path. However, to suit how different organizations or individuals organize their metrics, it should be possible to customize where the metrics are published. The most flexible way to accomplish this could be to specify the path on the command line (or in configuration) using Go templates.

  • Improve "choppy" graphs

    Many of our graphs become very choppy for metrics whose value rarely changes.

    We can solve the problem like this:

    • Remember how long it took before Hadrianus removed the value from the "stale" state.
    • Use the stale time interval to decide how much additional time should pass until the metric is placed in a "stale" state next time.

    It may also be a good idea to set a maximum "additional stale time" limit. When the limit is 0, the feature is disabled.

  • Add Grafana Cloud output support

    Since one of the current uses of Hadrianus is to send metrics to Grafana Cloud, it might simplify the setup to add Grafana Cloud support to Hadrianus. That way, we'd be able to spin up Hadrianus as a scratch container in EKS with minimal fuss, without using carbon-relay-ng.

    The integration of the code seems super simple: https://github.com/grafana/cloud-graphite-scripts/blob/master/send/main.go

    The above example uses "plain json" (content-type "application/json"), which is pretty inefficient. Instead, it should use "binary protocol, snappy compressed" (content-type "rt-metric-binary-snappy"). Some work needs to be spent on understanding how this is done.

    We still need to think about how the UI/config should look for it (config maps? env variables?).

    One way to handle this is to define named output aliases in a configuration file. When such a name is encountered as a destination, it overrides the standard hostname/port resolution mechanism. Each output alias needs to have:

    • name
    • type (for example "grafanaNet" or "plaintext")
    • optional configuration details (for example, "address URL", "API key", "schemas file", and "aggregation file" for "grafanaNet")

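A rough sketch of the alias lookup described above. The OutputAlias type, its option keys, and the resolveOutput helper are all hypothetical; the only behavior taken from the proposal is that a known alias name takes precedence over normal host:port resolution:

```go
package main

import "fmt"

// OutputAlias is a named output defined in a configuration file.
// All field names and option keys below are hypothetical.
type OutputAlias struct {
	Name    string
	Type    string            // e.g. "grafanaNet" or "plaintext"
	Options map[string]string // e.g. "addr", "apiKey", "schemasFile", "aggregationFile"
}

// resolveOutput returns the alias for dest if one is defined; ok == false
// means dest should go through the normal hostname/port resolution.
func resolveOutput(aliases map[string]OutputAlias, dest string) (OutputAlias, bool) {
	a, ok := aliases[dest]
	return a, ok
}

func main() {
	aliases := map[string]OutputAlias{
		"grafanacloud": {
			Name:    "grafanacloud",
			Type:    "grafanaNet",
			Options: map[string]string{"addr": "https://example.invalid/metrics", "apiKey": "REDACTED"},
		},
	}
	for _, dest := range []string{"grafanacloud", "server01.iambk.com:2303"} {
		if a, ok := resolveOutput(aliases, dest); ok {
			fmt.Printf("%s: alias of type %s\n", dest, a.Type)
		} else {
			fmt.Printf("%s: plain host:port\n", dest)
		}
	}
}
```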
  • Add Prometheus endpoint

    Internal Hadrianus metrics should also (optionally?) be exposed as a Prometheus endpoint. This could have the additional benefit of being usable as a service health check in Kubernetes or similar.

  • Fix clientConnectionsActive data

    server.hadrianus.*.clientConnectionsActive will sometimes drift out of sync and give strange results.

    Possibly, this could be fixed by replacing the existing naïve counters with "atomic counters" (https://gobyexample.com/atomic-counters).

  • Implement "cardinality guard" functionality

    It can be problematic for a graphite setup if metrics producers create new metrics paths at a high rate.

    There should be a way to limit the creation of new metrics paths for a metrics producer to stop the downstream systems from being overwhelmed.

    Suggestions:

    • The production of new metrics should be possible to throttle by producer IP address.
    • Once throttled, a producer should only be allowed to create a limited number of new metrics paths per measurement period.

    A mock example of how this could be tracked looks like the following. The first level is a 128-bit IPv6 address. The second level contains the 32-bit hash of a graphite metrics path, mapped to a 64-bit UNIX timestamp. The idea is that every N seconds (whatever your measurement time interval is), you count, for every IP address, how many paths have a timestamp greater than or equal to the current timestamp minus N. This gives you the number of unique metric paths produced during the time interval N for every IP address:

    {
        "::ffff:90.16.154.34": {
            "c65a134b": 1642598151,
            "c296b298": 1642598151
        },
        "dc8b:334c:b60c:d107:47d3:ca23:2860:df1a": {
            "dd7ee3a6": 1642598153,
            "a3cf9f1b": 1642598153,
            "d4c8af8d": 1642598153
        }
    }
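The same two-level structure can be sketched in Go as follows. Two assumptions: the 32-bit hash is FNV-1a from hash/fnv (the proposal only says "32 bit hash"), and the address string is used as the first-level key for readability (a real implementation might key on the 16-byte form from net/netip). The type and method names are hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pathHash returns a 32-bit FNV-1a hash of a metric path (assumed hash choice).
func pathHash(path string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(path)) // writing to a hash.Hash never returns an error
	return h.Sum32()
}

// cardinalityTracker maps producer IP -> path hash -> last-seen UNIX timestamp.
type cardinalityTracker struct {
	seen map[string]map[uint32]int64
}

func newCardinalityTracker() *cardinalityTracker {
	return &cardinalityTracker{seen: make(map[string]map[uint32]int64)}
}

// observe records that producer ip sent path at UNIX time now.
func (t *cardinalityTracker) observe(ip, path string, now int64) {
	paths, ok := t.seen[ip]
	if !ok {
		paths = make(map[uint32]int64)
		t.seen[ip] = paths
	}
	paths[pathHash(path)] = now
}

// uniquePaths counts, for every IP, the paths whose timestamp is >= now - n.
func (t *cardinalityTracker) uniquePaths(now, n int64) map[string]int {
	counts := make(map[string]int)
	for ip, paths := range t.seen {
		for _, ts := range paths {
			if ts >= now-n {
				counts[ip]++
			}
		}
	}
	return counts
}

func main() {
	t := newCardinalityTracker()
	t.observe("::ffff:90.16.154.34", "app.a.requests", 1642598151)
	t.observe("::ffff:90.16.154.34", "app.a.errors", 1642598151)
	t.observe("dc8b:334c:b60c:d107:47d3:ca23:2860:df1a", "app.b.latency", 1642598153)
	fmt.Println(t.uniquePaths(1642598160, 60))
}
```

Storing the last-seen time counts paths that were active during the interval. Storing the first-seen time instead (only writing when the hash is absent) would count newly created paths, which is closer to what a cardinality guard needs to throttle. Either way, old entries should be pruned periodically to bound memory.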
    
  • clientConnectionsActive could be more helpful

    The internal Hadrianus metric server.hadrianus.*.clientConnectionsActive isn't very useful when you have a small number of connections, since it only samples the number of connections at a single point in time.

    Instead, it should perhaps show the average number of concurrent active network connections during the sampling interval.

Type-safe Prometheus metrics builder library for golang

gotoprom A Prometheus metrics builder gotoprom offers an easy to use declarative API with type-safe labels for building and using Prometheus metrics.

Dec 5, 2022
Go port of Coda Hale's Metrics library

go-metrics Go port of Coda Hale's Metrics library: https://github.com/dropwizard/metrics. Documentation: http://godoc.org/github.com/rcrowley/go-metri

Dec 30, 2022
A tool to run queries in defined frequency and expose the count as prometheus metrics.

A tool to run queries in defined frequency and expose the count as prometheus metrics. Supports MongoDB and SQL

Jul 1, 2022
Prometheus support for go-metrics

go-metrics-prometheus This is a reporter for the go-metrics library which will post the metrics to the prometheus client registry . It just updates th

Nov 13, 2022
a tool for getting metrics in containers

read metrics in container if environment is container, the cpu ,memory is relative to container, else the metrics is relative to host. juejing link :

Oct 13, 2022
Collect and visualize metrics from Brigade 2

Brigade Metrics: Monitoring for Brigade 2 Brigade Metrics adds monitoring capabilities to a Brigade 2 installation. It utilizes Brigade APIs to export

Sep 8, 2022
Count Dracula is a fast metrics server that counts entries while automatically expiring old ones

In-Memory Expirable Key Counter This is a fast metrics server, ideal for tracking throttling. Put values to the server, and then count them. Values ex

Jun 17, 2022
rsync wrapper (or output parser) that pushes metrics to prometheus

rsync-prom An rsync wrapper (or output parser) that pushes metrics to prometheus. This allows you to then build dashboards and alerting for your rsync

Dec 11, 2022
mackerel-agent is an agent program to post your hosts' metrics to mackerel.io.

mackerel-agent mackerel-agent is a client software for Mackerel. Mackerel is an online visualization and monitoring service for servers. Once mackerel

Jan 7, 2023
⛑ Gatus - Automated service health dashboard

A service health dashboard in Go that is meant to be used as a docker image with a custom configuration file. I personally deploy it in my Kubernetes

Dec 31, 2022
HTTP service to generate PDF from Json requests

pdfgen HTTP service to generate PDF from Json requests Install and run The recommended method is to use the docker container by mounting your template

Dec 2, 2022
Kratos Service Layout

Kratos Layout Install Kratos

Jan 1, 2023
An HTTP service for customizing import path of your Go packages.

Go Packages A self-host HTTP service that allow customizing your Go package import paths. Features Reports. Badges. I18N. Preview I launch up a free H

Nov 27, 2022
A service for predicting the order of keys to use for opening doors in Ladder Slasher

A service for predicting the order of keys to use for opening doors in Ladder Slasher.

Oct 29, 2021
Example hello-world service uses go-fx-grpc-starter boilerplate code

About Example hello-world service uses https://github.com/srlk/go-fx-grpc-starter boilerplate code. Implementation A hello world grpc service is creat

Nov 14, 2021
Typesafe lazy instantiation to improve service start time

Package lazy is a light wrapper around sync.Once providing support for return values. It removes the burden of capturing return values via closures from the caller.

May 30, 2022
Implement a toy in-memory store information service for a delivery company

Implement a toy in-memory store information service for a delivery company

Nov 22, 2021
An in-memory, key-value store HTTP API service

This is an in-memory key-value store HTTP API service, with the following endpoints: /get/{key} : GET method. Returns the value of a previously set ke

May 23, 2022
A Simple Bank Web Service implemented in Go, HTTP & GRPC, PostgreSQL, Docker, Kubernetes, GitHub Actions CI

simple-bank Based on this Backend Master Class by TECH SCHOOL: https://youtube.com/playlist?list=PLy_6D98if3ULEtXtNSY_2qN21VCKgoQAE Requirements Insta

Dec 9, 2021