The Prometheus monitoring system and time series database.

Prometheus

Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • PromQL, a powerful and flexible query language to leverage this dimensionality
  • No dependency on distributed storage; single server nodes are autonomous
  • An HTTP pull model for time series collection
  • Pushing time series is supported via an intermediary gateway for batch jobs
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support
  • Support for hierarchical and horizontal federation
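
As a rough illustration of the multi-dimensional data model, the sketch below identifies a time series by its metric name plus a set of key/value labels. The `seriesKey` helper and the metric and label names are hypothetical and chosen only for this example; they are not part of the Prometheus code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders a metric name and its label set in name{key="value",...}
// notation. Label names are sorted so that the same label set always yields
// the same key, regardless of map iteration order.
func seriesKey(name string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Two series share a metric name but differ in one dimension.
	a := seriesKey("http_requests_total", map[string]string{"method": "GET", "code": "200"})
	b := seriesKey("http_requests_total", map[string]string{"method": "POST", "code": "200"})
	fmt.Println(a)
	fmt.Println(b)
}
```

Each distinct combination of label values is a separate series, which is what lets a query language slice and aggregate along any dimension.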

Architecture overview

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container for trying it out with

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.

Building from source

To build Prometheus from source code, first ensure that you have a working Go environment with version 1.14 or greater installed. You also need Node.js and Yarn installed in order to build the frontend assets.

You can directly use the go tool to download and install the prometheus and promtool binaries into your GOPATH:

$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus --config.file=your_config.yml

However, when using go get to build Prometheus, Prometheus will expect to be able to read its web assets from local filesystem directories under web/ui/static and web/ui/templates. In order for these assets to be found, you will have to run Prometheus from the root of the cloned repository. Note also that these directories do not include the new experimental React UI unless it has been built explicitly using make assets or make build.

An example of the above configuration file can be found here.

You can also clone the repository yourself and build using make build, which will compile in the web assets so that Prometheus can be run from anywhere:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries (includes building and compiling in web assets)
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • docker: build a docker container for the current HEAD
  • assets: build the new experimental React UI

React UI Development

For more information on building, running, and developing on the new React-based UI, see the React app's README.md.

More information

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.

Comments
  • TSDB data import tool for OpenMetrics format.

    Created a tool to import data formatted according to the Prometheus exposition format. The tool can be accessed via the TSDB CLI.

    closes prometheus/prometheus#535

    Signed-off-by: Dipack P Panjabi

    (Port of https://github.com/prometheus/tsdb/pull/671)

  • Add mechanism to perform bulk imports

    Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on https://github.com/prometheus/prometheus/issues/481.

    EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.

  • Create a section ANNOTATIONS with user-defined payload and generalize RUNBOOK, DESCRIPTION, SUMMARY into fields therein.

    RUNBOOK was added in a hurry in #843 for an internal demo of one of our users, which didn't give it enough time to be fully discussed. The demo has been done, so we can reconsider this.

    I think we should revert this change, and remove RUNBOOK:

    • Our general policy is that if it can be done with labels, do it with labels
    • All notification methods in the alertmanager will need extra code to deal with this
    • In future, all alertmanager notification templates will need extra code to deal with this
    • In general, all user code touching the alertmanager will need extra code to deal with this
    • This presumes a certain workflow in that you have something called a "runbook" (and not any other name - playbook is also common) and that you have exactly one of them

    Runbooks are not a fundamental aspect of an alert, are not in use by all of our users and thus I don't believe they meet the bar for first-class support within prometheus. This is especially true considering that they don't add anything that isn't already possible with labels.

  • Implement strategies to limit memory usage.

    Currently, Prometheus simply limits the chunks in memory to a fixed number.

    However, this number doesn't directly imply the total memory usage as many other things take memory as well.

    Prometheus could measure its own memory consumption and (optionally) evict chunks early if it needs too much memory.

    It's non-trivial to measure "actual" memory consumption in a platform independent way.
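
A minimal sketch of the idea, assuming the Go runtime's heap statistics are an acceptable proxy for memory consumption. The `chunksToEvict` helper, the limit, and the chunk size are hypothetical values for illustration, not Prometheus settings. Note that `runtime.ReadMemStats` only sees memory managed by the Go runtime, which is one reason measuring "actual" consumption is hard.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUse reports the bytes of heap spans currently in use, as seen by the
// Go runtime. It does not include memory-mapped files or non-Go allocations.
func heapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

// chunksToEvict returns how many chunks of chunkSize bytes would have to be
// dropped to bring used back under limit, capped at the total chunks held.
func chunksToEvict(used, limit uint64, chunkSize, total int) int {
	if used <= limit || chunkSize <= 0 {
		return 0
	}
	over := used - limit
	// Round up so that a partial chunk's worth of excess still evicts one chunk.
	n := int((over + uint64(chunkSize) - 1) / uint64(chunkSize))
	if n > total {
		n = total
	}
	return n
}

func main() {
	// Illustrative numbers only: a 64 MiB budget and 1 KiB chunks.
	const limit = 64 << 20
	used := heapInUse()
	fmt.Println("heap in use:", used)
	fmt.Println("chunks to evict:", chunksToEvict(used, limit, 1024, 1000))
}
```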

  • '@ <timestamp>' modifier

    This PR implements @ <timestamp> modifier as per this design doc.

    An example query:

    rate(process_cpu_seconds_total[1m]) 
      and
    topk(7, rate(process_cpu_seconds_total[1h] @ 1234))
    

    This ranks the series by their 1h rate evaluated at Unix timestamp 1234, but plots the 1m rate.

    Closes #7903

    This PR is to be followed up with an easier way to represent the start, end, range of a query in PromQL so that we could do @ <end>, metric[<range>] easily.

  • Port isolation from old TSDB PR

    The original PR was https://github.com/prometheus/tsdb/pull/306 .

    I tried to carefully adjust to the new world order, but please give this a very careful review, especially around iterator reuse (marked with a TODO).

    On the bright side, I definitely found and fixed a bug in txRing.

  • 2.3.0 significant memory usage increase.

    Bug Report

    What did you do? Upgraded to 2.3.0

    What did you expect to see? General improvements.

    What did you see instead? Under which circumstances? Memory usage, possibly driven by queries, has considerably increased. The upgrade was at 09:27; the memory usage drops on the graph after that are from container restarts due to OOM.

    container_memory_usage_bytes

    Environment

    Prometheus in kubernetes 1.9

    • System information: Standard docker containers, on docker kubelet on linux.

    • Prometheus version: 2.3.0

  • Support for environment variable substitution in configuration file

    I think it would be a good idea to substitute environment variables in the configuration file.

    That could be done really easily by calling os.ExpandEnv on the configuration string when loading it.

    It would be cleaner to substitute environment variables only in configuration values. go-ini provides a valueMapper, but yaml.v2 doesn't have such a mechanism.

  • React UI: Implement more sophisticated autocomplete

    It would be great to have more sophisticated expression field autocompletion in the new React UI.

    Currently it only autocompletes metric names, and only when the expression field doesn't contain any other sub-expressions yet.

    Things that would be nice to autocomplete:

    • metric names anywhere within an expression
    • label names
    • label values
    • function names
    • etc.

    For autocomplete functionality not to annoy users, it needs to be as highly performant, correct, and unobtrusive as possible. Grafana does many things right here already, but they also have a few really annoying bugs, like inserting closing parentheses in incorrect locations of an expression.

    Currently @slrtbtfs has indicated interest in building a language-server-based autocomplete implementation.

  • Benchmark tsdb master

    DO NOT MERGE

    Benchmark 1

    Benchmark the following PRs against 2.11.1

    1. For queries: https://github.com/prometheus/tsdb/pull/642
    2. For compaction: https://github.com/prometheus/tsdb/pull/643 https://github.com/prometheus/tsdb/pull/654 https://github.com/prometheus/tsdb/pull/653
    3. Opening block: https://github.com/prometheus/tsdb/pull/645

    Results

    Did not test compaction from on-disk blocks. Could not really see the allocation optimizations in compaction; that might be because the savings are mostly in the number of allocations and not in the size of allocations (size is what is shown in the dashboards). That would mean CPU is saved, but it did not make a huge difference, only a slight increase in the gap during compaction.

    The gains looked good in

    1. Allocations
    2. CPU (because of allocations?)
    3. RSS was also lower (up to 10 GiB lower! ~60 vs ~70).
    4. Also a small improvement in query inner_eval times.
    5. Compaction time (this should help the increase in compaction time that https://github.com/prometheus/tsdb/pull/627 is going to bring).
    6. System load.

    And bad in

    1. result_sort for the queries. Not sure why.

    Benchmark 2

    Benchmark https://github.com/prometheus/tsdb/pull/627 (which includes all the PRs from above Benchmark 1) against 2.11.1

  • M-map full chunks of Head from disk

    tl-dr desc for the PR from @krasi-georgiev


    When appending to the head, a chunk that becomes full is flushed to disk and m-mapped (memory-mapped) to free up memory.

    Prometheus startup now happens in these stages:

    • Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks.
    • Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series.

    If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss.

    M-mapped chunks format: the main difference is that the chunk for m-mapping now also includes the series reference, because there is no index for mapping series to chunks. The block chunks are accessed from the index, which includes the offsets for the chunks in the chunks file (for example, chunks of series ID have offsets 200, 500 etc in the chunk files). In the case of m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, matching the series ID present in the chunk header with the offset of that chunk in its file.
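
The two startup stages can be sketched roughly as follows. All types and function names here (`mmappedChunk`, `series`, `indexChunks`, `replaySeries`) are hypothetical simplifications for illustration and do not reflect the actual head implementation.

```go
package main

import "fmt"

// mmappedChunk is a simplified view of an on-disk chunk header: the series
// reference it belongs to and the chunk's offset in the chunk file.
type mmappedChunk struct {
	seriesRef uint64
	offset    uint64
}

// series is a simplified in-memory series that remembers the offsets of its
// m-mapped chunks.
type series struct {
	ref     uint64
	offsets []uint64
}

// indexChunks is stage one: iterate the m-mapped chunks and group their
// offsets by series reference.
func indexChunks(chunks []mmappedChunk) map[uint64][]uint64 {
	byRef := make(map[uint64][]uint64)
	for _, c := range chunks {
		byRef[c.seriesRef] = append(byRef[c.seriesRef], c.offset)
	}
	return byRef
}

// replaySeries is stage two: for every series record seen while iterating the
// WAL, create the series and attach the chunks found in stage one.
func replaySeries(walRefs []uint64, byRef map[uint64][]uint64) []*series {
	out := make([]*series, 0, len(walRefs))
	for _, ref := range walRefs {
		out = append(out, &series{ref: ref, offsets: byRef[ref]})
	}
	return out
}

func main() {
	chunks := []mmappedChunk{
		{seriesRef: 1, offset: 200},
		{seriesRef: 2, offset: 300},
		{seriesRef: 1, offset: 500},
	}
	byRef := indexChunks(chunks)
	for _, s := range replaySeries([]uint64{1, 2, 3}, byRef) {
		fmt.Println("series", s.ref, "chunk offsets", s.offsets)
	}
}
```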

    Prombench results

    WAL Replay

    • 1h WAL replay time: 30% less (4m31 vs 3m36)
    • 2h WAL replay time: 20% less (8m16 vs 7m)

    Memory During WAL Replay

    High Churn

    • 10-15% less RAM: 32gb vs 28gb
    • 20% less RAM after compaction: 34gb vs 27gb

    No Churn

    • 20-30% less RAM: 23gb vs 18gb
    • 40% less RAM after compaction: 32.5gb vs 20gb

    Screenshots are in this comment


    Prerequisite: https://github.com/prometheus/prometheus/pull/6830 (Merged)

    Closes https://github.com/prometheus/prometheus/issues/6377. More info in the linked issue and the doc in that issue and the doc inside that doc inside that issue :)

    • [x] Add tests
    • [x] Explore possible ways to get rid of new globals added in head.go
    • [x] Wait for https://github.com/prometheus/prometheus/pull/6830 to be merged
    • [x] Fix windows tests
  • Add sum of squares for histogram and summary for stddev calculation

    Proposal

    Stddev is the third most important analytic metric for data (after avg and count). As far as I can see, Prometheus currently does not calculate stddev for histograms. Surprisingly, https://github.com/prometheus/prometheus/issues/7030 and https://github.com/prometheus/prometheus/issues/3998 were closed without a solution; no way was found to calculate this.

    But the solution is very simple: the collector should return a _sum_of_squares metric in addition to the _count and _sum metrics. Example:

    First collection:
    prometheus_http_request_duration_seconds_count 1000
    prometheus_http_request_duration_seconds_sum 1000
    prometheus_http_request_duration_seconds_sum_of_squares 1000

    Data values 0 8 8 2 3 are processed between collections.

    Second collection:
    prometheus_http_request_duration_seconds_count 1005
    prometheus_http_request_duration_seconds_sum 1021
    prometheus_http_request_duration_seconds_sum_of_squares 1141

    Count diff = 5, sum diff = 21, sum of squares diff = 141, avg = 21/5 = 4.2, average square = 141/5 = 28.2, stdvar = average square - square of average = 28.2 - 4.2^2 = 10.56, stddev = sqrt(stdvar) = 3.249615.

    Let's assume we had a next collection (using the same values, for example, which does not change avg and stddev).

    Data values 0 0 8 8 8 8 2 2 3 3 are processed between collections.

    Third collection:
    prometheus_http_request_duration_seconds_count 1015
    prometheus_http_request_duration_seconds_sum 1063
    prometheus_http_request_duration_seconds_sum_of_squares 1423

    For the second interval: count diff = 10, sum diff = 42, sum of squares diff = 282, avg = 42/10 = 4.2, average square = 282/10 = 28.2, stdvar = average square - square of average = 28.2 - 4.2^2 = 10.56, stddev = sqrt(stdvar) = 3.249615.

    For both intervals together: count diff = 15, sum diff = 63, sum of squares diff = 423, avg = 63/15 = 4.2, average square = 423/15 = 28.2, stdvar = average square - square of average = 28.2 - 4.2^2 = 10.56, stddev = sqrt(stdvar) = 3.249615.

    These are the correctly calculated stddev and stdvar for the given dataset of 15 numbers, with only 3 counters transferred from the collector to the server. But if we try to calculate stddev on the server from the per-interval averages, we get 0, because the avg value is 4.2 for both intervals, meaning no deviation, and that's incorrect.

  • Alert Unit Test integration for certain Query Functions

    I know certain query functions, e.g. time(), can be easily mocked when unit testing alert rules within promtool. Others are not supported, though. In my specific case, I am using predict_linear in an expression and would like to be able to write a unit test within the promtool suite. This could be extended to some of the other query functions as well, but I am not familiar with which ones are supported.

    So really asking for 2 big things:

    • Can the alerting unit test docs provide more info on unit tests for expressions utilizing query functions?
    • Can the team look into adding support for unit testing all of the query functions (and if not, explicitly state which ones can and can't be tested in the unit testing docs)?

    Here is the specific expression I am using if that provides any value: node_filesystem_avail_bytes{fstype="nfs"} / node_filesystem_size_bytes{fstype="nfs"} - (predict_linear(node_filesystem_avail_bytes{fstype="nfs"}[12h], 604800) < 0) * 604800
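
For readers unfamiliar with predict_linear, the sketch below shows the general technique it is based on: fit a straight line to the samples by simple linear regression and extrapolate it a number of seconds past the evaluation time. The `sample` type and `predictLinear` function are hypothetical illustrations, not the PromQL engine's code, and edge-case handling may differ.

```go
package main

import "fmt"

// sample is a (timestamp in seconds, value) pair.
type sample struct {
	t, v float64
}

// predictLinear fits a least-squares line through the samples and returns the
// value that line takes `ahead` seconds after `now`. It reports false when
// fewer than two samples are given or all timestamps are equal.
func predictLinear(samples []sample, now, ahead float64) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	n := float64(len(samples))
	var sumT, sumV, sumTV, sumTT float64
	for _, s := range samples {
		dt := s.t - now // measure time relative to the evaluation time
		sumT += dt
		sumV += s.v
		sumTV += dt * s.v
		sumTT += dt * dt
	}
	den := n*sumTT - sumT*sumT
	if den == 0 {
		return 0, false
	}
	slope := (n*sumTV - sumT*sumV) / den
	intercept := (sumV - slope*sumT) / n // fitted value at the evaluation time
	return intercept + slope*ahead, true
}

func main() {
	// A made-up series losing one unit per second.
	samples := []sample{{t: 0, v: 100}, {t: 10, v: 90}, {t: 20, v: 80}}
	if v, ok := predictLinear(samples, 20, 80); ok {
		fmt.Println("predicted value in 80s:", v)
	}
}
```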

  • Verify checksum on downloads with curl in Makefiles

    @roidelapluie This implements #10660 and verifies checksums in files downloaded by curl in makefiles, to prevent attacks.
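
The PR itself does this with a shell script invoked from the Makefiles. Purely as an illustration of the underlying check, here is a sketch in Go that compares the SHA-256 digest of downloaded bytes with an expected hex digest. The `verifySHA256` helper is hypothetical and is not part of the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256 reports whether data hashes to the expected hex-encoded
// SHA-256 digest. Surrounding whitespace and letter case in the expected
// digest are ignored.
func verifySHA256(data []byte, expectedHex string) bool {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	return strings.EqualFold(got, strings.TrimSpace(expectedHex))
}

func main() {
	// The payload stands in for a downloaded file; the digest is the
	// well-known SHA-256 of the ASCII string "abc".
	const want = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

	fmt.Println("intact download ok:  ", verifySHA256([]byte("abc"), want))
	fmt.Println("tampered download ok:", verifySHA256([]byte("abd"), want))
}
```

A caller would delete the downloaded file and abort when the check fails, which matches the behaviour shown in the failing runs of the manual test.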

    Manual testing

    When checksum is ok then files are downloaded as expected

    rm /Users/jrodrig/go/bin/promu ; make promu
    $ echo $?
    0
    # ok
    promu --help
    
    rm /Users/jrodrig/go/bin/golangci-lint ; make /Users/jrodrig/go/bin/golangci-lint
    $ echo $?
    0
    golangci-lint -v
    

    When the checksum is wrong (after modifying scripts/checksums/promu_sha256sums.txt and scripts/checksums/golangci-lint_install_sha256sums.txt to fail the checksum verification), the make targets fail:

    $ rm /Users/jrodrig/go/bin/promu ; make promu
    if ! ./scripts/download_and_checksum.sh https://github.com/prometheus/promu/releases/download/v0.13.0/promu-0.13.0.darwin-amd64.tar.gz /var/folders/n0/sd11nx1s0mnb94yvvk_f1rcr0000gn/T/tmp.NSH9d02k scripts/checksums/promu_sha256sums.txt ; then \
                rm -rf /var/folders/n0/sd11nx1s0mnb94yvvk_f1rcr0000gn/T/tmp.NSH9d02k; \
                    exit 1; \
            fi
    Error: https://github.com/prometheus/promu/releases/download/v0.13.0/promu-0.13.0.darwin-amd64.tar.gz failed checksum
    make: *** [/Users/jrodrig/go/bin/promu] Error 1
    $ echo $?
    2
    $ promu
    bash: /Users/jrodrig/go/bin/promu: No such file or directory
    
    $ rm /Users/jrodrig/go/bin/golangci-lint ; make /Users/jrodrig/go/bin/golangci-lint
    mkdir -p /Users/jrodrig/go/bin
    if ! ./scripts/download_and_checksum.sh "https://raw.githubusercontent.com/golangci/golangci-lint/v1.45.2/install.sh" /Users/jrodrig/go scripts/checksums/golangci-lint_install_sha256sums.txt ; then \
                rm -f /Users/jrodrig/go/install.sh; \
                    exit 1; \
            fi
    Error: https://raw.githubusercontent.com/golangci/golangci-lint/v1.45.2/install.sh failed checksum
    make: *** [/Users/jrodrig/go/bin/golangci-lint] Error 1
    $ echo $?
    2
    $ golangci-lint -v
    bash: /Users/jrodrig/go/bin/golangci-lint: No such file or directory
    

    Signed-off-by: Juan Rodriguez Hortala

  • build(deps): bump github.com/Azure/go-autorest/autorest from 0.11.27 to 0.11.28

    Bumps github.com/Azure/go-autorest/autorest from 0.11.27 to 0.11.28.

    Release notes

    Sourced from github.com/Azure/go-autorest/autorest's releases.

    autorest/v0.11.28

    Update crypto dependency to latest version #708

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • build(deps): bump github.com/ionos-cloud/sdk-go/v6 from 6.1.1 to 6.1.2

    Bumps github.com/ionos-cloud/sdk-go/v6 from 6.1.1 to 6.1.2.

    Release notes

    Sourced from github.com/ionos-cloud/sdk-go/v6's releases.

    Release v6.1.2

    Fixes :

    • Changed from manageDbaas to manageDBaaS field in model_group_properties.go : provides privilege for a group to manage DBaaS related functionality. Admin users already have it enabled by default.
    • Issue #26
Time Series Database based on Cassandra with Prometheus remote read/write support

SquirrelDB SquirrelDB is a scalable high-available timeseries database (TSDB) compatible with Prometheus remote storage. SquirrelDB store data in Cass

Jun 18, 2022
Fast specialized time-series database for IoT, real-time internet connected devices and AI analytics.

unitdb Unitdb is blazing fast specialized time-series database for microservices, IoT, and realtime internet connected devices. As Unitdb satisfy the

Aug 2, 2022
LinDB is an open-source Time Series Database which provides high performance, high availability and horizontal scalability.

LinDB is an open-source Time Series Database which provides high performance, high availability and horizontal scalability. LinDB stores all monitoring data of ELEME Inc, there is 88TB incremental writes per day and 2.7PB total raw data.

Jul 30, 2022
TalariaDB is a distributed, highly available, and low latency time-series database for Presto

TalariaDB is a distributed, highly available, and low latency time-series database that stores real-time data. It's built on top of Badger DB.

Jun 29, 2022
Export output from pg_stat_activity and pg_stat_statements from Postgres into a time-series database that supports the Influx Line Protocol (ILP).

pgstat2ilp pgstat2ilp is a command-line program for exporting output from pg_stat_activity and pg_stat_statements (if the extension is installed/enabl

Dec 15, 2021
🤔 A minimize Time Series Database, written from scratch as a learning project.

mandodb 🤔 A minimize Time Series Database, written from scratch as a learning project. A time series database (TSDB) is mostly used to meet the needs of monitoring scenarios. Two concepts are introduced first:

Aug 2, 2022
Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures.

Owl is a db manager platform,committed to standardizing the data, index in the database and operations to the database, to avoid risks and failures. capabilities which owl provides include Process approval、sql Audit、sql execute and execute as crontab、data backup and recover .

Jun 17, 2022
Beerus-DB: a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic operations

Beerus-DB · Beerus-DB is a database operation framework, currently only supports Mysql, Use [go-sql-driver/mysql] to do database connection and basic

Mar 5, 2022
This is a simple graph database in SQLite, inspired by "SQLite as a document database".

About This is a simple graph database in SQLite, inspired by "SQLite as a document database". Structure The schema consists of just two structures: No

Aug 1, 2022
Hard Disk Database based on a former database

Hard Disk Database based on a former database

Nov 1, 2021
Simple key value database that use json files to store the database

KValDB Simple key value database that use json files to store the database, the key and the respective value. This simple database have two gRPC metho

Nov 13, 2021
Vitess is a database clustering system for horizontal scaling of MySQL through generalized sharding.

Vitess is a database clustering system for horizontal scaling of MySQL through generalized sharding.

Aug 7, 2022
Go reproduction of Bustub--a simple relational database system.

Bustub in Golang Bustub is the course project of CMU15-445 Database System, which is a simple relational database system. This repo is a golang reprod

Dec 18, 2021
Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on commands and key-regexes written by GO

Welcome to NIPO Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on command

Jun 13, 2022
Scalable datastore for metrics, events, and real-time analytics

InfluxDB InfluxDB is an open source time series platform. This includes APIs for storing and querying data, processing it in the background for ETL or

Aug 8, 2022
Scalable datastore for metrics, events, and real-time analytics

InfluxDB InfluxDB is an open source time series platform. This includes APIs for storing and querying data, processing it in the background for ETL or

Aug 9, 2022
A GPU-powered real-time analytics storage and query engine.

AresDB AresDB is a GPU-powered real-time analytics storage and query engine. It features low query latency, high data freshness and highly efficient i

Jul 30, 2022
Real-time Geospatial and Geofencing

Tile38 is an open source (MIT licensed), in-memory geolocation data store, spatial index, and realtime geofence. It supports a variety of object types

Jul 31, 2022
:handbag: Cache arbitrary data with an expiration time.

cache Cache arbitrary data with an expiration time. Features 0 dependencies About 100 lines of code 100% test coverage Usage // New cache c := cache.N

Jul 27, 2022