mtail - extract internal monitoring data from application logs for collection into a timeseries database

mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding.

It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, so that system operators do not need to patch those applications to instrument them, or write custom extraction code for every such application.

The extraction is controlled by mtail programs which define patterns and actions:

# simple line counter
counter lines_total
/$/ {
  lines_total++
}

Metrics are exported for scraping by a collector as JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket.
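
For example, if mtail is listening on its default port of 3903 (the port shown in the startup logs quoted later in this document), the Prometheus-format metrics can be fetched by hand. A minimal sketch, assuming a local instance and the /metrics path mentioned in the issues below:

# Scrape the Prometheus-format metrics from a locally running mtail.
curl http://localhost:3903/metrics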

Read the programming guide if you want to learn how to write mtail programs.

Ask general questions on the users mailing list: https://groups.google.com/g/mtail-users

Installation

There are various ways of installing mtail.

Precompiled binaries

Precompiled binaries for released versions are available on the Releases page on GitHub. Using the latest production release binary is the recommended way of installing mtail.

Windows, OSX and Linux binaries are available.

Building from source

The simplest way to get mtail is to go get it directly.

go get github.com/google/mtail/cmd/mtail

This assumes you have a working Go environment with a recent Go version. mtail is usually tested against the last two minor Go versions (e.g. Go 1.12 and Go 1.11).

If you want to fetch everything, you need to turn on Go Modules, because Go Modules changed the way go get treats source trees with no Go code at the top level.

GO111MODULE=on go get -u github.com/google/mtail
cd $GOPATH/src/github.com/google/mtail
make install

If you develop the compiler you will need some additional tools, such as goyacc, to be able to rebuild the parser.
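
For example, goyacc can be installed with go get; a minimal sketch, assuming the standard golang.org/x/tools import path (not something this repository documents):

# Install goyacc so the parser can be regenerated by the build.
go get golang.org/x/tools/cmd/goyacc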

See the Build instructions for more details.

If you would rather not install Go in your environment, a Dockerfile is included in this repository for local development as an alternative; it takes care of installing all the build dependencies.

Deployment

mtail works best when it is paired with a timeseries-based calculator and alerting tool, like Prometheus.
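
As a rough illustration, a typical deployment runs one mtail per host, pointing it at a directory of programs and at the logs to follow, and lets Prometheus scrape the HTTP port. A minimal sketch using only flags that appear elsewhere in this document; the paths are hypothetical:

# Watch one application log with programs from /etc/mtail,
# exporting metrics over HTTP for Prometheus to scrape.
mtail --progs /etc/mtail --logs /var/log/myapp.log --port 3903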

So what you do is you take the metrics from the log files and you bring them down to the monitoring system?

It deals with the instrumentation so the engineers don't have to! It has the extraction skills! It is good at dealing with log files!!

Read More

Full documentation at http://google.github.io/mtail/

Read more about writing mtail programs.

Read more about hacking on mtail

Read more about deploying mtail and your programs in a monitoring environment

After that, if you have any questions, please email (and optionally join) the mailing list: https://groups.google.com/forum/#!forum/mtail-users or file a new issue.

Comments
  • file tailing not working when a relative log path name is passed to --logs flag

    Hi!

    Using the count_lines sample, since version rc4 I can't get tailing to work: I get the right results if I run mtail in one_shot mode, but no metric appears on the Prometheus metrics endpoint.

    log:

    $ ./mtail_rc10 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr
    I0405 06:10:19.350138   62463 main.go:108] mtail version v3.0.0-rc10 git revision 60ed333f3672ec835c09c507f095fdd0be6ec1d7 go version go1.9.4
    I0405 06:10:19.350205   62463 main.go:109] Commandline: ["./mtail_rc10" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
    I0405 06:10:19.350429   62463 lexer.go:179] Emitting COUNTER at count_lines.mtail:2:23
    I0405 06:10:19.350455   62463 lexer.go:179] Emitting ID at count_lines.mtail:2:9-18
    I0405 06:10:19.350467   62463 lexer.go:179] Emitting NL at count_lines.mtail:3:20
    I0405 06:10:19.350478   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:1
    I0405 06:10:19.350506   62463 yaccpar:954] position marked at count_lines.mtail:3:1
    I0405 06:10:19.350515   62463 driver.go:76] Entering regex
    I0405 06:10:19.350521   62463 lexer.go:179] Emitting REGEX at count_lines.mtail:3:2
    I0405 06:10:19.350530   62463 lexer.go:555] Exiting regex
    I0405 06:10:19.350535   62463 lexer.go:556] Regex at line 2, startcol 2, col 2
    I0405 06:10:19.350542   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:3
    I0405 06:10:19.350554   62463 lexer.go:179] Emitting LCURLY at count_lines.mtail:3:5
    I0405 06:10:19.350564   62463 lexer.go:179] Emitting NL at count_lines.mtail:4:7
    I0405 06:10:19.350574   62463 lexer.go:179] Emitting ID at count_lines.mtail:4:3-12
    I0405 06:10:19.350585   62463 lexer.go:179] Emitting INC at count_lines.mtail:4:13-14
    I0405 06:10:19.350595   62463 lexer.go:179] Emitting NL at count_lines.mtail:5:16
    I0405 06:10:19.350605   62463 lexer.go:179] Emitting RCURLY at count_lines.mtail:5:1
    I0405 06:10:19.350618   62463 lexer.go:179] Emitting EOF at count_lines.mtail:5:2
    I0405 06:10:19.350660   62463 checker.go:84] found sym &{line_count variable typeVar4 count_lines.mtail:2:9-18 <nil> 0 false}
    I0405 06:10:19.350691   62463 types.go:285] Unifying Int and typeVar4
    I0405 06:10:19.350700   62463 types.go:285] Unifying typeVar4 and Int
    I0405 06:10:19.350707   62463 types.go:300] Making "typeVar4" type "Int"
    I0405 06:10:19.350746   62463 loader.go:200] Loaded program count_lines.mtail
    I0405 06:10:19.350755   62463 loader.go:219] Program count_lines.mtail has goroutine marker 0x636f756e
    I0405 06:10:19.350785   62463 vm.go:640] Starting program count_lines.mtail
    I0405 06:10:19.350800   62463 loader.go:223] Started count_lines.mtail
    I0405 06:10:19.350849   62463 mtail.go:55] Tail pattern "logs/a.log"
    I0405 06:10:19.350872   62463 tail.go:129] glob matches: [logs/a.log]
    I0405 06:10:19.350931   62463 tail.go:222] Read: 0 EOF
    I0405 06:10:19.350942   62463 tail.go:227] Suspected truncation.
    I0405 06:10:19.350948   62463 tail.go:194] current seek position at 629148964
    I0405 06:10:19.350955   62463 tail.go:204] File size is 629148964
    I0405 06:10:19.350961   62463 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
    I0405 06:10:19.350975   62463 tail.go:404] EOF on first read
    I0405 06:10:19.351001   62463 tail.go:423] Tailing logs/a.log
    I0405 06:10:19.351137   62463 mtail.go:227] Listening on port :3903
    I0405 06:10:27.916078   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
    I0405 06:10:27.916197   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
    I0405 06:10:27.916243   62463 loader.go:97] Skipping logs/a.log due to file extension.
    I0405 06:10:27.916260   62463 loader.go:97] Skipping logs/a.log due to file extension.
    I0405 06:10:27.916307   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}
    I0405 06:10:27.916334   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}
    

    At 06:10:27 I added a line to the tailed file using echo "Hello" >> logs/a.log. On the metrics endpoint I get:

    # TYPE line_count counter
    # line_count defined at count_lines.mtail:2:9-18
    line_count{prog="count_lines.mtail"} 0
    

    The same scenario works perfectly under version rc2:

    $ ./mtail_rc2 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr
    I0405 06:14:47.558366   63852 main.go:101] mtail version v3.0.0-rc2 git revision 5e6d38908091a8648c0f26c44ebd708e241f3814 go version go1.9.4
    I0405 06:14:47.558424   63852 main.go:102] Commandline: ["./mtail_rc2" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
    I0405 06:14:47.558702   63852 loader.go:196] Loaded program count_lines.mtail
    I0405 06:14:47.558713   63852 loader.go:215] Program count_lines.mtail has goroutine marker 0x636f756e
    I0405 06:14:47.558730   63852 loader.go:218] Started count_lines.mtail
    I0405 06:14:47.558772   63852 mtail.go:103] Tail pattern "logs/a.log"
    I0405 06:14:47.558796   63852 tail.go:125] glob matches: [logs/a.log]
    I0405 06:14:47.559063   63852 tail.go:349] Tailing /apps/solidmon/agent/mtail/logs/a.log
    I0405 06:14:47.559087   63852 vm.go:715] Starting program count_lines.mtail
    I0405 06:14:47.559131   63852 mtail.go:290] Listening on port :3903
    

    with metrics output:

    # TYPE line_count counter
    # line_count defined at count_lines.mtail:2:9-18
    line_count{prog="count_lines.mtail"} 1
    

    In both versions, running with the one_shot option works perfectly.

    Am I doing something wrong or is tailing broken currently?

    I've tried with the precompiled binaries and with builds I made myself. I've tested on Ubuntu 16.04, an old Red Hat, and WSL on Windows with Ubuntu 16.04.

  • Suggestion: Continuous Fuzzing

    Hi, I'm Yevgeny, founder of Fuzzit - a continuous fuzzing as a service platform.

    I saw that you implemented fuzz targets, but they are currently not running as part of the CI.

    We have a free plan for OSS and I would be happy to contribute a PR if that's interesting. The PR will include the following:

    • Continuous fuzzing of the master branch, which will generate new corpus and look for new crashes
    • Regression on every PR that will run the fuzzers through all the generated corpus and the fixed crashes from the previous step. This will prevent new or old bugs from creeping into master.

    You can see our basic example here and you can see an example of "in the wild" integration here.

    Let me know if this is something worth working on.

    Cheers, Yevgeny

  • Adding UNIX Socket support option for HTTP endpoints

    Exposing Prometheus metrics through a UNIX socket endpoint is a crucial need for my infrastructure. Please consider this modification for the main project, or add a similar option.

  • Split pollInterval option to reduce filepath.Glob overhead

    Hi, I'm using mtail to monitor the Docker logs of the machine, and every machine has 500+ matched log files. Below is the configuration:

    mtail --progs /etc/mtail --logs "/home/docker/logs/*/*/*/*/*/platform-*.log" --expired_metrics_gc_interval 10m
    

    I notice that mtail always sits above 60% CPU usage, even when no logs are being produced. After profiling the process, I found that the primary reason is that tailer#PollLogPatterns is called frequently (every 250ms by default). We can't separately control the poll interval for log patterns and for log streams. I split the option, deployed the fixed mtail in production for a few weeks, and it works well. Could you help me review the feature?

  • Stream log lines with a goroutine per log source.

    Instead of juggling files and their offsets, tracking the filesystem for updates to files, and trying to blend both fsnotify and polling, let us open a goroutine that reads a single file descriptor until it's completed. This allows us to avoid polling the filesystem for updates, because the goroutine can just try to read more from the descriptor. It also means that truncation/rotation testing is simpler, reducing the amount of tracking code required and possibly also the number of Stat calls per round.

    Issue: #339 #338 #300 #276 #249 #289

  • Delay in processing the logs with mtail polling

    We are processing the following:

    Number of log lines per second: 75000
    Polling interval: 1s

    With mtail polling I saw mismatched data in Prometheus. I am not sure, but I think there is a delay in processing such a huge volume of logs.

    The spike we confirmed was 75k lines per second, but with mtail we only saw a spike of 29k. The mtail endpoint showed no errors reading the log file.

    We are running the process with the following options. The logs are written to NFS here, which means we are reading from an NFS file system.

    nice -n 19 /home/mtail \
    -poll_interval 1s \
    -port 3906 \
    -progs progs/accessLogs.mtail \
    -logs /prod/access.log \
    &
    

    How can we get the correct numbers? Also, how can we improve performance when reading files at 70k lines per second? Your comments will be helpful.

  • mtail stops sending metrics to graphite and answering prometheus metrics endpoint

    Hi! I thought initially this was similar to #50, though there are no logs from mtail itself this time. We are running a simple mtail instance to tail three log files (ranging roughly from 300 lines/s to 1.2k lines/s) on a central syslog host. The three files are rotated daily by logrotate; we see this in the mtail logs at rotation time:

    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.635592   28177 tail.go:144] read /srv/syslog/syslog.log: file already closed
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.670146   28177 tail.go:144] read /srv/syslog/swift.log: file already closed
    ...
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.704315   28177 tail.go:144] read /srv/syslog/syslog.log: file already closed
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.707416   28177 tail.go:315] Tailing /srv/syslog/syslog.log
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.708463   28177 tail.go:315] Tailing /srv/syslog/swift.log
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.708490   28177 tail.go:315] Tailing /srv/syslog/apache2.log
    

    However, after about a week of mtail running (from git 9ae83e2c), no more logs are emitted and mtail is effectively hung as far as metrics reporting is concerned (datapoints are not pushed to Graphite, and requesting Prometheus metrics on /metrics just hangs).

    I've dumped the goroutines while mtail is hanging at https://phabricator.wikimedia.org/P5569 and the respective heap at https://phabricator.wikimedia.org/P5568

    Happy to do more testing if needed, hope this helps!

  • unable to read from unix socket

    It seems like systemd-journald support was left out because of the possibility of reading from a named pipe or unix socket. While trying to configure mtail to read from /run/systemd/journal/syslog, where systemd-journald forwards all messages, I got the following error:

    ...
    I1108 13:21:25.043783    2636 loader.go:224] Loaded program linecount.mtail
    I1108 13:21:25.043926    2636 mtail.go:99] Tail pattern "/run/systemd/journal/syslog"
    I1108 13:21:25.044052    2636 tail.go:134] AddPattern: /run/systemd/journal/syslog
    I1108 13:21:25.044189    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal"
    I1108 13:21:25.044335    2636 log_watcher.go:248] No abspath in watched list, added new one for /run/systemd/journal
    I1108 13:21:25.044479    2636 tail.go:158] glob matches: [/run/systemd/journal/syslog]
    I1108 13:21:25.044620    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal/syslog"
    I1108 13:21:25.044744    2636 log_watcher.go:248] No abspath in watched list, added new one for /run/systemd/journal/syslog
    I1108 13:21:25.044886    2636 tail.go:265] openlogPath /run/systemd/journal/syslog false
    I1108 13:21:25.045035    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal"
    I1108 13:21:25.045174    2636 log_watcher.go:253] Found this processor in watched list
    I1108 13:21:25.045315    2636 file.go:52] file.New(/run/systemd/journal/syslog, false)
    I1108 13:21:25.045454    2636 file.go:111] open failed all retries
    W1108 13:21:25.045577    2636 mtail.go:101] attempting to tail "/run/systemd/journal/syslog": open /run/systemd/journal/syslog: no such device or address
    ...
    

    mtail version and command line

    $ sudo mtail --logs /run/systemd/journal/syslog --progs . --logtostderr -v 5
    I1108 13:21:25.037480    2636 main.go:100] mtail version v3.0.0-rc33 git revision aedde73f9c304e4d558a53ece22a5472c87a7fdb go version go1.12.7 go arch amd64 go os linux
    

    Related Issues: #58

  • Memory usage grows over time

    We're using mtail with a local ruleset to process logs from the Exim MTA. It has been running since Mar 30th and is now using 1.8G of RAM. The ruleset is a fairly simple set of regexps incrementing some counters - I will attach examples and a /debug/pprof dump.

    Any thoughts?

  • Added Configurable Semantic Checker Threshold Values

    Previously, the semantic checker had static threshold values that were added for performance reasons. However, some users require higher threshold values and are willing to pay the performance penalty.

    This commit makes those values configurable but sets defaults that are performant.

    Fixes #471

  • Pattern negation

    Is there any way to have an action happen if a pattern /doesn't/ match? Go's regexp library doesn't support negative lookahead, but I can't find any syntax in the mtail language which allows me to work around this.

    What I'm trying to do is something like this:

    /pattern1/ {
      do pattern1 stuff
    }
    /pattern2/ {
      do pattern2 stuff
    }
    otherwise {
      do other stuff
    }
    

    In my case the otherwise bit happens in the absence of the first two. Ideally, being able to chain if/else would be great, but in a pinch a simple 'else' keyword would allow this to be represented as:

    /pattern1/ {
      do pattern1 stuff
    } else {
      /pattern2/ {
        do pattern2 stuff
      } else {
        do other stuff
      }
    }
    