mtail - extract internal monitoring data from application logs for collection into a timeseries database

mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding.

It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, so that system operators do not need to patch those applications to instrument them, or write custom extraction code for every such application.

The extraction is controlled by mtail programs which define patterns and actions:

# simple line counter
counter lines_total
/$/ {
  lines_total++
}

Metrics are exported for scraping by a collector as JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket.
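
For example, if mtail is listening on its default port of 3903 (the port shown in the startup logs quoted later in this document), the Prometheus-format metrics can be fetched by hand. A minimal sketch, assuming a local instance and the /metrics path mentioned in the issues below:

# Scrape the Prometheus-format metrics from a locally running mtail.
curl http://localhost:3903/metrics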

Read the programming guide if you want to learn how to write mtail programs.

Ask general questions on the users mailing list: https://groups.google.com/g/mtail-users

Installation

There are various ways of installing mtail.

Precompiled binaries

Precompiled binaries for released versions are available on the Releases page on GitHub. Using the latest production release binary is the recommended way of installing mtail.

Windows, OSX and Linux binaries are available.

Building from source

The simplest way to get mtail is to go get it directly.

go get github.com/google/mtail/cmd/mtail

This assumes you have a working Go environment with a recent Go version. mtail is usually tested against the last two minor Go versions (e.g. Go 1.12 and Go 1.11).

If you want to fetch everything, you need to turn on Go Modules, because Go Modules changed the way go get treats source trees with no Go code at the top level.

GO111MODULE=on go get -u github.com/google/mtail
cd $GOPATH/src/github.com/google/mtail
make install

If you develop the compiler you will need some additional tools, such as goyacc, to be able to rebuild the parser.
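
For example, goyacc can be installed with go get; a minimal sketch, assuming the standard golang.org/x/tools import path (not something this repository documents):

# Install goyacc so the parser can be regenerated by the build.
go get golang.org/x/tools/cmd/goyacc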

See the Build instructions for more details.

If you would rather not install Go in your environment, a Dockerfile is included in this repository for local development as an alternative; it takes care of installing all the build dependencies.

Deployment

mtail works best when it is paired with a timeseries-based calculator and alerting tool, like Prometheus.
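
As a rough illustration, a typical deployment runs one mtail per host, pointing it at a directory of programs and at the logs to follow, and lets Prometheus scrape the HTTP port. A minimal sketch using only flags that appear elsewhere in this document; the paths are hypothetical:

# Watch one application log with programs from /etc/mtail,
# exporting metrics over HTTP for Prometheus to scrape.
mtail --progs /etc/mtail --logs /var/log/myapp.log --port 3903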

So what you do is you take the metrics from the log files and you bring them down to the monitoring system?

It deals with the instrumentation so the engineers don't have to! It has the extraction skills! It is good at dealing with log files!!

Read More

Full documentation at http://google.github.io/mtail/

Read more about writing mtail programs.

Read more about hacking on mtail

Read more about deploying mtail and your programs in a monitoring environment

After that, if you have any questions, please email (and optionally join) the mailing list: https://groups.google.com/forum/#!forum/mtail-users or file a new issue.

Comments
  • file tailing not working when a relative log path name is passed to --logs flag

    Hi!

    Using the count_lines sample, since version rc4 I can't get tailing to work: I get the right results if I run mtail in one_shot mode, but no metric appears on the Prometheus metrics endpoint.

    log:

    $ ./mtail_rc10 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr
    I0405 06:10:19.350138   62463 main.go:108] mtail version v3.0.0-rc10 git revision 60ed333f3672ec835c09c507f095fdd0be6ec1d7 go version go1.9.4
    I0405 06:10:19.350205   62463 main.go:109] Commandline: ["./mtail_rc10" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
    I0405 06:10:19.350429   62463 lexer.go:179] Emitting COUNTER at count_lines.mtail:2:23
    I0405 06:10:19.350455   62463 lexer.go:179] Emitting ID at count_lines.mtail:2:9-18
    I0405 06:10:19.350467   62463 lexer.go:179] Emitting NL at count_lines.mtail:3:20
    I0405 06:10:19.350478   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:1
    I0405 06:10:19.350506   62463 yaccpar:954] position marked at count_lines.mtail:3:1
    I0405 06:10:19.350515   62463 driver.go:76] Entering regex
    I0405 06:10:19.350521   62463 lexer.go:179] Emitting REGEX at count_lines.mtail:3:2
    I0405 06:10:19.350530   62463 lexer.go:555] Exiting regex
    I0405 06:10:19.350535   62463 lexer.go:556] Regex at line 2, startcol 2, col 2
    I0405 06:10:19.350542   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:3
    I0405 06:10:19.350554   62463 lexer.go:179] Emitting LCURLY at count_lines.mtail:3:5
    I0405 06:10:19.350564   62463 lexer.go:179] Emitting NL at count_lines.mtail:4:7
    I0405 06:10:19.350574   62463 lexer.go:179] Emitting ID at count_lines.mtail:4:3-12
    I0405 06:10:19.350585   62463 lexer.go:179] Emitting INC at count_lines.mtail:4:13-14
    I0405 06:10:19.350595   62463 lexer.go:179] Emitting NL at count_lines.mtail:5:16
    I0405 06:10:19.350605   62463 lexer.go:179] Emitting RCURLY at count_lines.mtail:5:1
    I0405 06:10:19.350618   62463 lexer.go:179] Emitting EOF at count_lines.mtail:5:2
    I0405 06:10:19.350660   62463 checker.go:84] found sym &{line_count variable typeVar4 count_lines.mtail:2:9-18 <nil> 0 false}
    I0405 06:10:19.350691   62463 types.go:285] Unifying Int and typeVar4
    I0405 06:10:19.350700   62463 types.go:285] Unifying typeVar4 and Int
    I0405 06:10:19.350707   62463 types.go:300] Making "typeVar4" type "Int"
    I0405 06:10:19.350746   62463 loader.go:200] Loaded program count_lines.mtail
    I0405 06:10:19.350755   62463 loader.go:219] Program count_lines.mtail has goroutine marker 0x636f756e
    I0405 06:10:19.350785   62463 vm.go:640] Starting program count_lines.mtail
    I0405 06:10:19.350800   62463 loader.go:223] Started count_lines.mtail
    I0405 06:10:19.350849   62463 mtail.go:55] Tail pattern "logs/a.log"
    I0405 06:10:19.350872   62463 tail.go:129] glob matches: [logs/a.log]
    I0405 06:10:19.350931   62463 tail.go:222] Read: 0 EOF
    I0405 06:10:19.350942   62463 tail.go:227] Suspected truncation.
    I0405 06:10:19.350948   62463 tail.go:194] current seek position at 629148964
    I0405 06:10:19.350955   62463 tail.go:204] File size is 629148964
    I0405 06:10:19.350961   62463 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
    I0405 06:10:19.350975   62463 tail.go:404] EOF on first read
    I0405 06:10:19.351001   62463 tail.go:423] Tailing logs/a.log
    I0405 06:10:19.351137   62463 mtail.go:227] Listening on port :3903
    I0405 06:10:27.916078   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
    I0405 06:10:27.916197   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
    I0405 06:10:27.916243   62463 loader.go:97] Skipping logs/a.log due to file extension.
    I0405 06:10:27.916260   62463 loader.go:97] Skipping logs/a.log due to file extension.
    I0405 06:10:27.916307   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}
    I0405 06:10:27.916334   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}
    

    At 06:10:27 I added a line to the tailed file using echo "Hello" >> logs/a.log. On the metrics endpoint I get:

    # TYPE line_count counter
    # line_count defined at count_lines.mtail:2:9-18
    line_count{prog="count_lines.mtail"} 0
    

    The same scenario works perfectly under version rc2:

    $ ./mtail_rc2 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr
    I0405 06:14:47.558366   63852 main.go:101] mtail version v3.0.0-rc2 git revision 5e6d38908091a8648c0f26c44ebd708e241f3814 go version go1.9.4
    I0405 06:14:47.558424   63852 main.go:102] Commandline: ["./mtail_rc2" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
    I0405 06:14:47.558702   63852 loader.go:196] Loaded program count_lines.mtail
    I0405 06:14:47.558713   63852 loader.go:215] Program count_lines.mtail has goroutine marker 0x636f756e
    I0405 06:14:47.558730   63852 loader.go:218] Started count_lines.mtail
    I0405 06:14:47.558772   63852 mtail.go:103] Tail pattern "logs/a.log"
    I0405 06:14:47.558796   63852 tail.go:125] glob matches: [logs/a.log]
    I0405 06:14:47.559063   63852 tail.go:349] Tailing /apps/solidmon/agent/mtail/logs/a.log
    I0405 06:14:47.559087   63852 vm.go:715] Starting program count_lines.mtail
    I0405 06:14:47.559131   63852 mtail.go:290] Listening on port :3903
    

    with metrics output:

    # TYPE line_count counter
    # line_count defined at count_lines.mtail:2:9-18
    line_count{prog="count_lines.mtail"} 1
    

    In both versions, running with the one_shot option works perfectly.

    Am I doing something wrong or is tailing broken currently?

    I've tried with the precompiled binaries and with builds I made myself. I've tested on Ubuntu 16.04, an old Red Hat, and WSL on Windows with Ubuntu 16.04.

  • Suggestion: Continuous Fuzzing

    Hi, I'm Yevgeny, founder of Fuzzit - a continuous fuzzing as a service platform.

    I saw that you implemented fuzz targets, but they are currently not running as part of the CI.

    We have a free plan for OSS and I would be happy to contribute a PR if that's interesting. The PR will include the following:

    • Continuous fuzzing of the master branch, which will generate new corpus and look for new crashes
    • Regression on every PR that will run the fuzzers through all the generated corpus and the fixed crashes from the previous step. This will prevent new or old bugs from creeping into master.

    You can see our basic example here and you can see an example of "in the wild" integration here.

    Let me know if this is something worth working on.

    Cheers, Yevgeny

  • Adding UNIX Socket support option for HTTP endpoints

    Exposing Prometheus metrics through a UNIX socket endpoint is a crucial need for my infrastructure. Please consider this modification for the main project, or add a similar option.

  • Split pollInterval option to reduce filepath.Glob overhead

    Hi, I'm using mtail to monitor the Docker logs of the machine, and every machine has 500+ matched log files. Below is the configuration:

    mtail --progs /etc/mtail --logs "/home/docker/logs/*/*/*/*/*/platform-*.log" --expired_metrics_gc_interval 10m
    

    I notice that mtail always sits above 60% CPU usage, even when no logs are being produced. After profiling the process, I found that the primary reason is that tailer#PollLogPatterns is called frequently (every 250ms by default). We can't separately control the poll interval for log patterns and for log streams. I split the option, deployed the fixed mtail in production for a few weeks, and it works well. Could you help me review the feature?

  • Stream log lines with a goroutine per log source.

    Instead of juggling files and their offsets, tracking the filesystem for updates to files, and trying to blend both fsnotify and polling, let us open a goroutine that reads a single file descriptor until it's completed. This allows us to avoid polling the filesystem for updates, because the goroutine can just try to read more from the descriptor. It also means that truncation/rotation testing is simpler, reducing the amount of tracking code required and possibly also the number of Stat calls per round.

    Issue: #339 #338 #300 #276 #249 #289

  • Delay in processing the logs with mtail polling

    We are processing the following:

    Number of log lines per second: 75000
    Polling interval: 1s

    With mtail polling I saw mismatched data in Prometheus. I am not sure, but I think there is a delay in processing such a huge volume of logs.

    The spike we confirmed was 75k lines per second, but with mtail we only saw a spike of 29k. The mtail endpoint showed no errors reading the log file.

    We are running the process with the following options. The logs are written to NFS here, which means we are reading from an NFS file system.

    nice -n 19 /home/mtail \
    -poll_interval 1s \
    -port 3906 \
    -progs progs/accessLogs.mtail \
    -logs /prod/access.log \
    &
    

    How can we get the correct numbers? Also, how can we improve performance when reading files at 70k lines per second? Your comments will be helpful.

  • mtail stops sending metrics to graphite and answering prometheus metrics endpoint

    Hi! I thought initially this was similar to #50, though there are no logs from mtail itself this time. We are running a simple mtail instance to tail three log files (ranging roughly from 300 lines/s to 1.2k lines/s) on a central syslog host. The three files are rotated daily by logrotate; we see this in the mtail logs at rotation time:

    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.635592   28177 tail.go:144] read /srv/syslog/syslog.log: file already closed
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.670146   28177 tail.go:144] read /srv/syslog/swift.log: file already closed
    ...
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.704315   28177 tail.go:144] read /srv/syslog/syslog.log: file already closed
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.707416   28177 tail.go:315] Tailing /srv/syslog/syslog.log
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.708463   28177 tail.go:315] Tailing /srv/syslog/swift.log
    Jun 05 06:34:27 lithium mtail[28177]: I0605 06:34:27.708490   28177 tail.go:315] Tailing /srv/syslog/apache2.log
    

    However, after about a week of mtail running (from git 9ae83e2c), no more logs are emitted and mtail is effectively hung as far as metrics reporting is concerned (datapoints are not pushed to Graphite, and requesting Prometheus metrics on /metrics just hangs).

    I've dumped the goroutines while mtail is hanging at https://phabricator.wikimedia.org/P5569 and the respective heap at https://phabricator.wikimedia.org/P5568

    Happy to do more testing if needed, hope this helps!

  • unable to read from unix socket

    It seems like systemd-journald support was left out because of the possibility of reading from a named pipe or unix socket. While trying to configure mtail to read from /run/systemd/journal/syslog, where systemd-journald forwards all messages, I got the following error:

    ...
    I1108 13:21:25.043783    2636 loader.go:224] Loaded program linecount.mtail
    I1108 13:21:25.043926    2636 mtail.go:99] Tail pattern "/run/systemd/journal/syslog"
    I1108 13:21:25.044052    2636 tail.go:134] AddPattern: /run/systemd/journal/syslog
    I1108 13:21:25.044189    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal"
    I1108 13:21:25.044335    2636 log_watcher.go:248] No abspath in watched list, added new one for /run/systemd/journal
    I1108 13:21:25.044479    2636 tail.go:158] glob matches: [/run/systemd/journal/syslog]
    I1108 13:21:25.044620    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal/syslog"
    I1108 13:21:25.044744    2636 log_watcher.go:248] No abspath in watched list, added new one for /run/systemd/journal/syslog
    I1108 13:21:25.044886    2636 tail.go:265] openlogPath /run/systemd/journal/syslog false
    I1108 13:21:25.045035    2636 log_watcher.go:268] Adding a watch on resolved path "/run/systemd/journal"
    I1108 13:21:25.045174    2636 log_watcher.go:253] Found this processor in watched list
    I1108 13:21:25.045315    2636 file.go:52] file.New(/run/systemd/journal/syslog, false)
    I1108 13:21:25.045454    2636 file.go:111] open failed all retries
    W1108 13:21:25.045577    2636 mtail.go:101] attempting to tail "/run/systemd/journal/syslog": open /run/systemd/journal/syslog: no such device or address
    ...
    

    mtail version and command line

    $ sudo mtail --logs /run/systemd/journal/syslog --progs . --logtostderr -v 5
    I1108 13:21:25.037480    2636 main.go:100] mtail version v3.0.0-rc33 git revision aedde73f9c304e4d558a53ece22a5472c87a7fdb go version go1.12.7 go arch amd64 go os linux
    

    Related Issues: #58

  • Memory usage grows over time

    We're using mtail with a local ruleset to process logs from the Exim MTA. It has been running since Mar 30th and is now using 1.8G of RAM. The ruleset is a fairly simple set of regexps incrementing some counters - I will attach examples and a /debug/pprof dump.

    Any thoughts?

  • Added Configurable Semantic Checker Threshold Values

    Previously, the semantic checker had static threshold values that were added for performance reasons. However, some users require higher threshold values and are willing to pay the performance penalty.

    This commit makes those values configurable but sets defaults that are performant.

    Fixes #471

  • Pattern negation

    Is there any way to have an action happen if a pattern /doesn't/ match? Go's regexp library doesn't support negative lookahead, but I can't find any syntax in the mtail language which allows me to work around this.

    What I'm trying to do is something like this:

    /pattern1/ {
      do pattern1 stuff
    }
    /pattern2/ {
      do pattern2 stuff
    }
    otherwise {
      do other stuff
    }
    

    In my case the otherwise bit happens in the absence of the first two. Ideally, being able to chain if/else would be great, but in a pinch a simple 'else' keyword would allow this to be represented as:

    /pattern1/ {
      do pattern1 stuff
    } else {
      /pattern2/ {
        do pattern2 stuff
      } else {
        do other stuff
      }
    }
    