Time Series Alerting Framework

Bosun

Bosun is a time series alerting framework developed by Stack Exchange. Scollector is a metric collection agent. Learn more at bosun.org.

Building

bosun and scollector are found under the cmd directory. Run go build in the corresponding directories to build each project. There's also a Makefile available for most tasks.
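
For example, to build both projects from the repository root (a minimal sketch of the go build route):

$ cd cmd/bosun && go build
$ cd ../scollector && go build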

Running

For a full stack with all dependencies, run docker-compose up from the docker directory. Don't forget to rebuild images and containers if you change the code:

$ cd docker
$ docker-compose down
$ docker-compose up --build

If you only need the dependencies (Redis, OpenTSDB, HBase) and would like to run Bosun on your machine directly (e.g. to attach a debugger), you can bring up the dependencies with these three commands from the repository's root:

$ docker run -p 6379:6379 --name redis redis:6
$ docker build -f docker/opentsdb.Dockerfile -t opentsdb .
$ docker run -p 4242:4242 --name opentsdb opentsdb

The OpenTSDB container will be reachable at http://localhost:4242. Redis listens on its default port 6379. Bosun, if brought up in a Docker container, is available at http://localhost:8070.
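
To verify the dependencies are up, you can hit OpenTSDB's version endpoint and ping Redis (standard OpenTSDB/Redis tooling, not project-specific scripts):

$ curl http://localhost:4242/api/version
$ docker exec redis redis-cli ping    # should print PONG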

Developing

Install:

  • Run make deps and make testdeps to set up all dependencies.
  • Run make generate when new static assets (like JS and CSS files) are added or changed.

The w.sh script builds and runs bosun in a loop, rebuilding whenever Go/JS/TS files change. It runs in read-only mode and does not send any alerts.

$ cd cmd/bosun
$ ./w.sh

Go Version:

  • See .travis.yml in the root of this repo for the version of Go to use. Newer versions of Go should generally work, as long as Bosun builds without error.

Miniprofiler:

  • Bosun includes miniprofiler in the web UI which can help with debugging. The key combination ALT-P will show miniprofiler. This allows you to see timings, as well as the raw queries sent to TSDBs.

Comments
  • Support influxdb

    It would help if bosun supported influxdb. I didn't find a bug tracking this, so here it is.

    I have multiple data sources (collectd, statsite) sending data to influxdb, so it would keep my dependencies low if I could point bosun at influxdb rather than migrate the entire system to OpenTSDB.

  • Multiple backends of the same type?

    Is it possible to have multiple instances of the same type of backend, for example multiple InfluxDB backends or multiple Elasticsearch backends? I ask because I'm trying to pull in data from two separate instances, but simply creating a duplicate key results in a config error: fatal: main.go:88: conf: bosun.config:2:0: at <influxHost = xx.xx.x...>: duplicate key: influxHost
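
    The rejected config looks something like this (reconstructed from the error message; the addresses are placeholders):

    influxHost = xx.xx.x.x:8086
    influxHost = yy.yy.y.y:8086    # the second entry triggers "duplicate key: influxHost"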

  • Distributed alert checks to prevent high load spikes

    This is a solution for #2065

    The idea behind this is simple: every check run is slightly shifted so that the checks are distributed uniformly.

    For the subset of checks that run with period T, a shift is added to every check. The shift ranges from 0 to T-1, and shifts are assigned incrementally. For example, if we have 6 checks every 5 mins (T=5), the shifts will be 0, 1, 2, 3, 4, 0. Without the patch, all 6 checks happen at times 0 and 5; with the patch, two checks happen at time 0, one at 1, one at 2, and so on. The total number of checks and the check period stay the same.

    Here is a test that shows the effect of the patch on system load; note that the majority of checks in this system have a 5 min period. [graph: system load before and after the patch]
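
    A minimal Go sketch of the shifting idea (illustrative names, not the scheduler code from this PR):

    package main

    import "fmt"

    // shiftFor returns the offset, in the same units as the period, for the
    // i-th check with period T, spreading the checks uniformly over the period.
    func shiftFor(i, period int) int {
        return i % period
    }

    func main() {
        // Six checks with a 5-minute period get shifts 0, 1, 2, 3, 4, 0.
        for i := 0; i < 6; i++ {
            fmt.Printf("check %d runs at minute offset %d\n", i, shiftFor(i, 5))
        }
    }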

  • Config management

    I want to deploy bosun as a dashboard & alerting system within my organization, but having config management be completely external to bosun feels like a major drawback. It would be super fantastic if it were possible, entirely through the web interface, to define, test, and commit a new alert, or to update an existing alert to tweak its parameters.

    Is anything like this in the works? How do you manage this in your existing deployments?

  • Support Dependencies

    Problem: something goes down which results in lots of other things being down, and because of this we get a lot of alerts.

    Common Examples:

    • A network partition: some portion of hosts becomes unavailable from bosun's perspective
    • Host goes down: everything monitored on that host becomes unavailable
    • Service dependencies: we expect some service to go down if another service goes down
    • Bosun can't query its database (this is probably a different feature, but noting it here nonetheless)

    Things I want to be able to do based on our config at Stack Exchange:

    • Have our host-based alert macro detect whether the host is in Oregon (because the host name contains "or-"), i.e. a dependency based on a lookup table
    • Have our host-based alerts not trigger if bosun is unable to ping the host (which would most likely be another alert)
    • Be able to have dependencies for alerts that may have no group

    The status of any alert instance suppressed by a dependency should be "unevaluated". Unevaluated instances won't show up on the dashboard or trigger notifications.

    Two general approaches come to mind. The first is that a dependency is another alert: the other alert is run first, and whether the dependent alert triggers depends on its result. The second is that a dependency is an expression; the expression route only really makes sense if an alert itself can be used as an expression.

    Another possibility, which I haven't thought much about, is that alerts generate dependencies rather than the other way around: for example, an alert marks some tagset as something that should not be evaluated.

    Making Stuff Up....

    macro ping_location {
        template = ping.location
        $pq = max(q("sum:bosun.ping.timeout{dst_host=$loc*,host=$source}", "5m", ""))
        $grouped = t($pq, "")
        $hosts_timing_out = sum($grouped)
        $total_hosts = len($grouped)
        $percent_timeout = $hosts_timing_out / $total_hosts * 100
        crit = $percent_timeout > 10
    }

    #group is empty
    alert or_hosts_down {
        $source = ny-bosun01
        $loc = or-
        $name = OR Peak
        macro = ping_location
    }

    #Group is {dst_host=*}
    alert host_down {
        template = host_down
        crit = max(q("sum:bosun.ping.timeout{dst_host=*}", "5m", ""))
    }

    lookup location {
        entry host=or-* {
            alert = alert("or_hosts_down")
        }
        ...
    }

    macro host_based {
        #This makes it so host-based alerts built on this macro won't trigger
        #while their location alert (or the host_down alert) is already firing.
        dependency = lookup("location", "alert") || alert("host_down")
        #Another idea here is that you can create tag synonyms for an alert. So instead of having to add this lookup function that translates, have a synonym feature of alerts (and also global) that says "consider this tag key to be the same as this tag key". This would also solve an issue with silences (i.e. silencing host=ny-web11 doesn't do anything for the haproxy alert that has hosts as svname). Another issue with that is that those alerts are not tag based, so we actually need inhibit in that case.
    }
    
    
  • Bosun sending notifications for closed and inactive alerts

    We have a very simple rule file with 3 notifications (http post to PD and slack, and email) and a bunch of alert rules which trigger them. We are facing a weird issue wherein the following happens:

    • alert triggers, sends notifications
    • a human acks the alert
    • human solves problem, alert becomes inactive
    • human closes the alert
    • notifications still keep triggering (the alert is nowhere to be seen in the bosun UI/api) - forever!

    To explain it through logs, this is quite literally what we're seeing:

    2016/04/01 07:56:37 info: check.go:513: check alert masked.masked.write.rate.too.low start
    2016/04/01 07:26:38 info: check.go:537: check alert masked.masked.write.rate.too.low done (1.378029647s): 0 crits, 0 warns, 0 unevaluated, 0 unknown
    2016/04/01 07:26:38 info: alertRunner.go:55: runHistory on masked.masked.write.rate.too.low took 54.852815ms
    2016/04/01 07:26:39 info: search.go:205: Backing up last data to redis
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:115: relayed alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka} to [[email protected]] sucessfully. Subject: 148 bytes. Body: 3500 bytes.
    2016/04/01 07:28:20 info: notify.go:115: relayed alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka} to [[email protected]] sucessfully. Subject: 147 bytes. Body: 3497 bytes.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}. Response code 200.

  • Use templates body as payload for notifications and subject for other HTML related stuff

    Hi all, as described in the docs, I'm using the template's subject as the body for POSTing stuff to our hipchat bot. The problem I encounter is in Bosun's main view (the list of alerts), where the template subject is presented when clicking an alert for details.

    The suggestion is to use the template's body as the payload for notifications (POST notifications mainly). A flag could also be added to let the user choose which templates use the subject as payload and which use the body.

    Thanks, Yarden

  • Add Recovery Emails

    When an alert instance goes from (Unknown, Warning, or Critical) to Normal, a recovery email should be sent.

    Considerations:

    • Should recovery templates be their own template? I think they should, and repeated logic can be done via include templates
    • Who to notify? The same notifications that were notified of the previous state
    • Notifications will need a no_recovery option. This is needed if we want to hook up alerts to pagerduty (we don't want our phones being dialed to let us know that an issue has recovered; at that point we can rely on email)

    My main reservation about this feature is that users are more likely not to investigate an alert that has recovered; this is dangerous because the alert could be a latent issue. However, it is better to provide a frictionless workflow than a roadblock. Bosun aims to provide all the tools needed for very informative notifications so that good judgements can often be made without needing to go to a console. Furthermore, we should also add acknowledgement notifications, as a way to inform all recipients of an alert that someone has made a decision about it and hopefully committed to an action (fixing the actual problem, or tuning the alert).

    Ack emails will be described in another issue.

    This feature needs discussion and review prior to implementation.

  • Memory leak in Bosun

    I updated our test servers to the latest version of bosun from https://github.com/bosun-monitor/bosun/releases/download/20150428222252/bosun-linux-amd64. After running for slightly less than a day, it stopped responding.

    The command line where I started it revealed:

     ./bosun-linux-amd64 -c=/data/bosun.conf
    2015/05/04 16:21:54 enabling syslog
    Killed
    

    Syslog (cat /var/log/messages |grep bosun) did not reveal any log messages in the hours before the crash.

    It looks like a memory leak. The graph of bosun.collect.alloc grew gradually from 200MB after deploying the new version to 12GB just before the "crash". [graph: rapid memory growth]

    Looking back over the last week at the memory behaviour of the previous version, there was a similar memory growth pattern, but at a much slower rate. The bottom graph shows memory increasing gradually over the course of a week, followed by two rapid increases for the newer version. [graphs: memory over the last 7 days]

    Just for interest's sake, here is a general Bosun dashboard; the other stats look reasonable. Although there is a high number of goroutines after restarting Bosun, this appears unrelated to the leak. [screenshot: Bosun dashboard]
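
    A general way to investigate this kind of growth is a heap profile. This assumes the bosun binary exposes Go's default net/http/pprof handlers on its listen address, which is not confirmed here:

    $ go tool pprof http://localhost:8070/debug/pprof/heap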

    More information about our setup:

    • Backend: OpenTSDB
    • Data is being passed through Bosun to OpenTSDB (as visible from the dashboard)
    • We send data points every minute at a rate of about 37000 per minute
    • In addition, scollector is submitting data from one machine, monitoring OpenTSDB, Elasticsearch, Bosun, and the OS (Linux)
    • The rule file is still a small prototype:
    httpListen = :8070
    tsdbHost = localhost:4242
    
    smtpHost = ******
    emailFrom = ******
    
    macro grafanaConfig {
        $grafanaHost = ******
    }
    
    notification emailIzak {
        email = [email protected]
        next = emailIzak
        timeout = 24h
    }
    
    
    ##################### Templates #######################
    
    
    template generic {
        body = `{{template "genericHeader" .}}
        {{template "genericDef" .}}
    
        {{template "genericTags" .}}
    
        {{template "genericComputation" .}}
    
         {{if .Alert.Vars.graph}}
         <h3>{{.Alert.Vars.graphTitle}}</h3>
        <p>{{.Graph .Alert.Vars.graph}}
        {{end}}`
    
        subject =  {{.Last.Status}}: {{.Alert.Name}} on instance {{.Group.serviceinstance}}
    }
    
    template genericHeader {   
        body = `
        <h3> Possible actions </h3>   
        {{if .Alert.Vars.note}}
            <p>{{.Alert.Vars.note}}
        {{end}}
         <p><a href="{{.Ack}}">Acknowledge alert</a>
    
        {{if .Alert.Vars.grafanaDash}}
        <p><a href="{{.Alert.Vars.grafanaDash}}"> View the relevant statistics dashboard </a>
        {{end}}
        `
    }
    
    template genericDef {
        body = `
        <h3> Details </h3>
        <p><strong>Alert definition:</strong>
        <table>
            <tr>
                <td>Name:</td>
                <td>{{replace .Alert.Name "." " " -1}}</td></tr>
            <tr>
                <td>Warn:</td>
                <td>{{.Alert.Warn}}</td></tr>
            <tr>
                <td>Crit:</td>
                <td>{{.Alert.Crit}}</td></tr>
        </table>`
    }
    
    template genericTags {
        body = `<p><strong>Tags</strong>
    
        <table>
            {{range $k, $v := .Group}}
                {{if eq $k "host"}}
                    <tr><td>{{$k}}</td><td><a href="{{$.HostView $v}}">{{$v}}</a></td></tr>
                {{else}}
                    <tr><td>{{$k}}</td><td>{{$v}}</td></tr>
                {{end}}
            {{end}}
        </table>`
    }
    
    template genericComputation {
        body = `
        <p><strong>Computation</strong>
    
        <table>
            {{range .Computations}}
                <tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
            {{end}}
        </table>`
    }
    
    template unknown {
        subject = {{.Name}}: {{.Group | len}} unknown alerts. 
        body = `
        <p>Unknown alerts imply no data is being recorded for their monitored time series. Therefore we cannot know what is happening. 
        <p>Time: {{.Time}}
        <p>Name: {{.Name}}
        <p>Alerts:
        {{range .Group}}
            <br>{{.}}
        {{end}}`
    }
    
    unknownTemplate = unknown
    
    
    #################### alerts #######################
    
    
    alert FlowRouterBytesZero {
        template = generic
        $query = "sum:bytes.bytes.counter.value{serviceinstance=*}"
    
        $note = The flow router has reported zero bytes in the last 2 minutes. This note should contain extra information specifying what action the operator should take to resolve it. 
        $graph = q($query, "24h", "")
        $graphTitle = Flow router traffic in the last 24 hours
        macro = grafanaConfig
        $grafanaDash = $grafanaHost/dashboard/db/per-flow-route-bytes-drill-down
    
        $avgBytesPer2Min = avg(q($query, "2m", ""))
        $avgBytesPer5Min = avg(q($query, "5m", ""))
    
        warn =  $avgBytesPer2Min == 0
        crit =  $avgBytesPer5Min == 0
        critNotification = emailIzak
    }
    
    
  • Add series aggregation DSL function `aggregate`

    This PR adds an aggregate DSL function, which allows one to combine different series in a seriesSet using a specified aggregator (currently min, max, p50, avg).

    This is particularly useful when comparing data across different weeks (using the over function). In our case, for anomaly detection, we want to compare the current day's data with an aggregated view of the same day in previous weeks. In particular, we want to compare each point in the last day to the median of the corresponding points in the same day over the last 3 weeks, so that any anomalies that occurred in a previous week are ignored. This way we compare with a hypothetical "perfect" day.

    For example:

    $weeks = over("avg:10m-avg-zero:os.cpu", "24h", "1w", 3)
    $a = aggregate($weeks, "", "p50")
    merge($a, $q)
    

    Which looks like this:

    [screenshot of the resulting graph]

    Or, if we wanted to combine series but maintain the region and color groups, that query would look like this:

    $weeks = over("avg:10m-avg-zero:os.cpu{region=*,color=*}", "24h", "1w", 3)
    aggregate($weeks, "region,color", "p50")
    

    which would result in one merged series for each unique region/color combination.

    I am very happy to take suggestions for changes / improvements. With regards to naming the function, I would have probably chosen "merge", but since that is already taken, I went with the OpenTSDB terminology and used "aggregate".

  • Unable to query bosun after running for a minute

    I have installed HBase, OpenTSDB and bosun on a machine running CentOS 7. I can see the bosun website fine, but any query I try to run from the graph page gives an error. I've put the bosun output into a log file, and there are 2 kinds of errors that pop up. Sometimes it's too many open files:

    2016/03/04 11:10:23 error: queue.go:102: Post http://localhost:8070/api/put: dial tcp 127.0.0.1:8070: socket: too many open files

    Sometimes it's just a timeout.

    2016/03/04 11:14:06 error: queue.go:102: Post http://localhost:8070/api/put: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

    Sometimes restarting seems to help, other times not so much. The longest I've had bosun running without these errors is a day.
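
    A common first step for "too many open files" is to check and raise the file-descriptor limit in the shell that starts bosun (a general troubleshooting suggestion, not a confirmed fix for this issue):

    $ ulimit -n           # show the current per-process limit
    $ ulimit -n 65536     # raise it for this shell before starting bosun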

  • build(deps): bump github.com/aws/aws-sdk-go from 1.31.12 to 1.33.0

    Bumps github.com/aws/aws-sdk-go from 1.31.12 to 1.33.0.

    Changelog

    Sourced from github.com/aws/aws-sdk-go's changelog.

    Release v1.33.0 (2020-07-01)

    Service Client Updates

    • service/appsync: Updates service API and documentation
    • service/chime: Updates service API and documentation
      • This release supports third party emergency call routing configuration for Amazon Chime Voice Connectors.
    • service/codebuild: Updates service API and documentation
      • Support build status config in project source
    • service/imagebuilder: Updates service API and documentation
    • service/rds: Updates service API
      • This release adds the exceptions KMSKeyNotAccessibleFault and InvalidDBClusterStateFault to the Amazon RDS ModifyDBInstance API.
    • service/securityhub: Updates service API and documentation

    SDK Features

    • service/s3/s3crypto: Introduces EncryptionClientV2 and DecryptionClientV2 encryption and decryption clients which support a new key wrapping algorithm kms+context. (#3403)
      • DecryptionClientV2 maintains the ability to decrypt objects encrypted using the EncryptionClient.
      • Please see s3crypto documentation for migration details.

    Release v1.32.13 (2020-06-30)

    Service Client Updates

    • service/codeguru-reviewer: Updates service API and documentation
    • service/comprehendmedical: Updates service API
    • service/ec2: Updates service API and documentation
      • Added support for tag-on-create for CreateVpc, CreateEgressOnlyInternetGateway, CreateSecurityGroup, CreateSubnet, CreateNetworkInterface, CreateNetworkAcl, CreateDhcpOptions and CreateInternetGateway. You can now specify tags when creating any of these resources. For more information about tagging, see AWS Tagging Strategies.
    • service/ecr: Updates service API and documentation
      • Add a new parameter (ImageDigest) and a new exception (ImageDigestDoesNotMatchException) to PutImage API to support pushing image by digest.
    • service/rds: Updates service documentation
      • Documentation updates for rds

    Release v1.32.12 (2020-06-29)

    Service Client Updates

    • service/autoscaling: Updates service documentation and examples
      • Documentation updates for Amazon EC2 Auto Scaling.
    • service/codeguruprofiler: Updates service API, documentation, and paginators
    • service/codestar-connections: Updates service API, documentation, and paginators
    • service/ec2: Updates service API, documentation, and paginators
      • Virtual Private Cloud (VPC) customers can now create and manage their own Prefix Lists to simplify VPC configurations.

    Release v1.32.11 (2020-06-26)

    Service Client Updates

    • service/cloudformation: Updates service API and documentation
      • ListStackInstances and DescribeStackInstance now return a new StackInstanceStatus object that contains DetailedStatus values: a disambiguation of the more generic Status value. ListStackInstances output can now be filtered on DetailedStatus using the new Filters parameter.
    • service/cognito-idp: Updates service API

    ... (truncated)


    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Fix false return error message for binary node validation for #2505

    https://github.com/bosun-monitor/bosun/issues/2505

    Description

    Fixes #2505

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    • [ ] Test A
    • [ ] Test B

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
  • Added "* L4TOUT" to haproxyCheckStatus


    Description

    Scollector did not manage to collect data from HAProxy (HAProxy version 2.0.13-2ubuntu0.5). Got error:

    Apr 28 16:26:34 ServerName scollector[1741859]: error: interval.go:65: haproxy-1-http://localhost:1936/;csv: unknown check status * L4TOUT
    Apr 28 16:26:49 ServerName scollector[1741859]: error: interval.go:65: haproxy-1-http://localhost:1936/;csv: unknown check status * L4TOUT
    

    Printout from HAProxy: [screenshot of the HAProxy stats page]

    Simply added "* L4TOUT" so that it's a valid check status for haproxyCheckStatus.
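
    The change is presumably along these lines (a sketch only: haproxyCheckStatus is assumed to map HAProxy check-status strings to numeric codes, and the neighboring entries and values are illustrative):

    package main

    import "fmt"

    // haproxyCheckStatus maps HAProxy check-status strings to numeric codes.
    // HAProxy prefixes the status with "* " while a check is in progress.
    var haproxyCheckStatus = map[string]int{
        "L4OK":     0,
        "L4TOUT":   1,
        "* L4TOUT": 1, // the new entry from this PR
    }

    func main() {
        fmt.Println(haproxyCheckStatus["* L4TOUT"]) // recognized instead of "unknown check status"
    }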

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    • [x] HAProxy collection now works again for HAProxy version 2.0.13-2ubuntu0.5

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [ ] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
  • Clarify release status

    We package this for NixOS, and we like to use the latest stable release from upstream.

    https://github.com/bosun-monitor/bosun/releases/tag/0.8.0-preview is listed as the latest release on GitHub. Is it a stable release, or should it be marked pre-release? I ask because the "-preview" suffix makes me think it is an unstable release.
