Streaming approximate histograms in Go

gohistogram - Histograms in Go

This package provides Streaming Approximate Histograms for efficient quantile approximations.

The histograms in this package are based on the algorithms found in Ben-Haim & Yom-Tov's A Streaming Parallel Decision Tree Algorithm (PDF). Histogram bins do not have a preset size. As values stream into the histogram, bins are dynamically added and merged.
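The add-and-merge behaviour described above can be sketched as follows. This is a minimal illustration of the Ben-Haim & Yom-Tov update step, not the package's actual source: the `bin` and `sketchHistogram` types and their methods are hypothetical names, and the sketch assumes `maxBins` is at least 1.

```go
package main

import (
	"fmt"
	"sort"
)

// bin is a hypothetical (value, count) pair; the names are illustrative
// and are not taken from the gohistogram source.
type bin struct {
	value float64
	count float64
}

// sketchHistogram keeps at most maxBins bins, sorted by value.
// Assumption: maxBins >= 1.
type sketchHistogram struct {
	bins    []bin
	maxBins int
}

// add inserts v as a new bin of count 1 (or bumps an existing bin with the
// same value), then merges the closest neighbouring bins while the
// histogram holds more than maxBins bins.
func (h *sketchHistogram) add(v float64) {
	// Find the insertion point that keeps bins sorted by value.
	i := sort.Search(len(h.bins), func(j int) bool { return h.bins[j].value >= v })
	if i < len(h.bins) && h.bins[i].value == v {
		h.bins[i].count++
		return
	}
	h.bins = append(h.bins, bin{})
	copy(h.bins[i+1:], h.bins[i:])
	h.bins[i] = bin{value: v, count: 1}

	for len(h.bins) > h.maxBins {
		h.mergeClosest()
	}
}

// mergeClosest replaces the two adjacent bins with the smallest gap by one
// bin at their count-weighted mean, carrying the summed count.
func (h *sketchHistogram) mergeClosest() {
	best := 0
	for i := 1; i < len(h.bins)-1; i++ {
		if h.bins[i+1].value-h.bins[i].value < h.bins[best+1].value-h.bins[best].value {
			best = i
		}
	}
	a, b := h.bins[best], h.bins[best+1]
	total := a.count + b.count
	h.bins[best] = bin{value: (a.value*a.count + b.value*b.count) / total, count: total}
	h.bins = append(h.bins[:best+1], h.bins[best+2:]...)
}

func main() {
	h := &sketchHistogram{maxBins: 3}
	for _, v := range []float64{1, 2, 10, 11, 50} {
		h.add(v)
	}
	fmt.Println(h.bins) // prints [{1.5 2} {10.5 2} {50 1}]
}
```

Five values end up in three bins: 1 and 2 collapse into a bin at 1.5, 10 and 11 into a bin at 10.5, and 50 stays on its own.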

Another implementation can be found in the Apache Hive project (see NumericHistogram).

An example:

histogram

Calculating quantiles (such as percentiles) exactly requires the data to be sorted. Streaming histograms make it possible to approximate quantiles without sorting (or even individually storing) the values.
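For contrast, the exact approach looks roughly like this. It is a minimal sketch that assumes the nearest-rank definition of a quantile; `exactQuantile` is a hypothetical helper and is not part of this package. Note that it has to hold and sort every value, which is what a streaming histogram avoids.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactQuantile returns the q-quantile (0 < q <= 1) of values using the
// nearest-rank definition. It copies and sorts the input, so it needs
// O(n) extra memory and O(n log n) time. It returns NaN for empty input.
func exactQuantile(values []float64, q float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	values := []float64{5, 1, 9, 3, 7}
	fmt.Println(exactQuantile(values, 0.5)) // prints 5
}
```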

NumericHistogram is the more basic implementation of a streaming histogram. WeightedHistogram implements bin values as exponentially-weighted moving averages.

The maximum number of bins is passed as an argument to the constructor functions. A larger number of bins yields more accurate approximations at the cost of increased memory use and slower operations.

A picture of kittens:

stack of kittens

Getting started

Using in your own code

$ go get github.com/VividCortex/gohistogram
import "github.com/VividCortex/gohistogram"

Running tests and making modifications

Get the code into your workspace:

$ cd $GOPATH
$ git clone git@github.com:VividCortex/gohistogram.git ./src/github.com/VividCortex/gohistogram

You can run the tests now:

$ cd src/github.com/VividCortex/gohistogram
$ go test .

API Documentation

Full source documentation can be found here.

Contributing

We only accept pull requests for minor fixes or improvements. This includes:

  • Small bug fixes
  • Typos
  • Documentation or comments

Please open issues to discuss new features. Pull requests for new features will be rejected, so we recommend forking the repository and making changes in your fork for your use case.

License

Copyright (c) 2013 VividCortex

Released under the MIT License. See the LICENSE file for details.

Comments
  • Set a version and tag a release

    I think this project is very useful, and I've noticed that other Go projects use it as well. Merely a suggestion, but I think you could make things simpler for people pulling this in if you had a tagged release to pull from. (See https://dave.cheney.net/2016/06/24/gophers-please-tag-your-releases) It is super quick to add in GitHub.

    Thanks for writing this.

  • Merge performance and serialization changes

    My main contribution is a benchmark to quantify the proposed performance enhancements by @nemothekid

    Before:
    < go test -run none -bench . -benchtime 2s
    PASS
    BenchmarkNumericHistogram20-8    3000000               891 ns/op
    BenchmarkNumericHistogram50-8    2000000              1396 ns/op
    BenchmarkNumericHistogram100-8   1000000              2120 ns/op
    ok      github.com/signalfx/gohistogram 9.901s
    
    After:
    < go test -run none -bench . -benchtime 2s
    PASS
    BenchmarkNumericHistogram20-8   20000000               205 ns/op
    BenchmarkNumericHistogram50-8   10000000               340 ns/op
    BenchmarkNumericHistogram100-8   5000000               482 ns/op
    ok      github.com/signalfx/gohistogram 10.964s
    
  • question on ewma function not using existing value

    Hello, I find the following a bit odd. I'll go take a look at the Java code, but either it's wrong or it can be simplified.

    func ewma(existingVal float64, newVal float64, alpha float64) (result float64) {
        result = newVal*(1-alpha) + existingVal*alpha  // *** existingVal is always 0 ??
        return
    }
    
    func (h *WeightedHistogram) scaleDown(except int) {
        for i := range h.bins {
            if i != except {
                h.bins[i].count = ewma(h.bins[i].count, 0, h.alpha) //****  <--- 0 ??
            }
        }
    }
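Reading only the quoted snippet: `scaleDown` passes the bin count as `existingVal` and 0 as `newVal`, so the call reduces to `count * alpha`. The short program below reuses the quoted `ewma` function to show that decay. The starting count and the alpha value are illustrative assumptions, not values from the package.

```go
package main

import "fmt"

// ewma is copied from the snippet quoted above.
func ewma(existingVal float64, newVal float64, alpha float64) (result float64) {
	result = newVal*(1-alpha) + existingVal*alpha
	return
}

func main() {
	count := 100.0 // illustrative starting bin count
	alpha := 0.5   // illustrative smoothing factor
	for i := 0; i < 3; i++ {
		// With newVal == 0 the first term vanishes, leaving count*alpha.
		count = ewma(count, 0, alpha)
		fmt.Println(count)
	}
	// prints 50, 25 and 12.5 on separate lines
}
```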
    
  • Is there a way to print the histogram?

    You display in the README a printed histogram, but I can't find any instructions in the API docs for outputting the histogram to console. Is there an easy way to do this from your package?
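This page does not show the package's own printing API. Independently of it, a text rendering of (value, count) bins could be sketched as below. The `bin` type, the `render` function and the `perDot` parameter are hypothetical names introduced for illustration, and `perDot` is assumed to be greater than zero.

```go
package main

import (
	"fmt"
	"strings"
)

// bin is a hypothetical (value, count) pair, not a type from the package.
type bin struct {
	value float64
	count float64
}

// render writes one line per bin: the bin value, a tab, and one dot per
// perDot counted values (at least one dot for any non-empty bin).
// Assumption: perDot > 0.
func render(bins []bin, perDot float64) string {
	var sb strings.Builder
	for _, b := range bins {
		dots := int(b.count / perDot)
		if dots < 1 && b.count > 0 {
			dots = 1
		}
		fmt.Fprintf(&sb, "%g\t%s\n", b.value, strings.Repeat(".", dots))
	}
	return sb.String()
}

func main() {
	bins := []bin{{-1000, 1}, {0, 6}, {1000, 1}}
	fmt.Print(render(bins, 2))
}
```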

  • Got panic under unknown circumstances

    When I use numeric histograms from different goroutines, I get a panic from time to time: runtime error: index out of range

    Panic is here: github.com/VividCortex/gohistogram/numerichistogram.go:124
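One possible workaround, assuming the histogram is not safe for concurrent use (this thread does not confirm it), is to serialise access with a mutex. The sketch below is self-contained: `adder`, `counter`, `lockedAdder` and `runConcurrent` are hypothetical names, and `counter` is only a stand-in for any type that exposes an `Add(float64)` method.

```go
package main

import (
	"fmt"
	"sync"
)

// adder is a hypothetical minimal interface; it assumes the wrapped
// histogram exposes Add(float64).
type adder interface {
	Add(v float64)
}

// counter is a stand-in for a histogram so that the sketch runs on its
// own. Like the assumed histogram, it is not safe for concurrent use.
type counter struct{ n int }

func (c *counter) Add(float64) { c.n++ }

// lockedAdder serialises calls to Add with a mutex.
type lockedAdder struct {
	mu    sync.Mutex
	inner adder
}

func (l *lockedAdder) Add(v float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.inner.Add(v)
}

// runConcurrent calls h.Add perGoroutine times from each of the given
// number of goroutines and waits for all of them to finish.
func runConcurrent(h adder, goroutines, perGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				h.Add(float64(i))
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := &counter{}
	runConcurrent(&lockedAdder{inner: c}, 8, 1000)
	fmt.Println(c.n) // prints 8000
}
```

Reads such as quantile queries would need to take the same lock. A single mutex is the simplest option; per-goroutine histograms merged afterwards would be an alternative if lock contention matters.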

  • Make bin merging more robust

    Outliers significantly affect the current merging algorithm. We can end up with situations like this:

    Total: 109
    -1000    .
    -900     .
    -800     .
    -700     .
    -600     .
    -0.041725873527114175    .......................................
    700      .
    800      .
    900      .
    1000     .
    
  • How to get count number at specific bin value?

    Not a math person here, been looking into the code but no clue. I assume it must be pretty simple, so please bear with me.

    How to get a count number for a specific bin value (NumericHistogram)? Thank you 🙏🏻
