A simple, standalone, and lightweight tool for health/status checking, written in Go.

EaseProbe

EaseProbe is a simple, standalone, and lightweight tool for health/status checking, written in Go.

1. Overview

EaseProbe does three kinds of work: Probe, Notify, and Report.

1.1 Probe

EaseProbe supports the following probing methods: HTTP, TCP, Shell Command, SSH Command, Host Resource Usage, and Native Client.

Note:

Keep in mind that the prober name must be unique among probes. If multiple probes are defined with the same name, it could lead to corruption of the metrics data and the behavior of the application will be non-deterministic.

  • HTTP. Checks the HTTP status code. Supports mTLS and HTTP Basic Auth, and can set the request header and body. ( HTTP Probe Configuration )

    http:
      # Some of the Software support the HTTP Query
      - name: ElasticSearch
        url: http://elasticsearch.server:9200
      - name: Prometheus
        url: http://prometheus:9090/graph
  • TCP. Checks whether a TCP connection can be established. ( TCP Probe Configuration )

    tcp:
      - name: Kafka
        host: kafka.server:9093
  • Shell. Run a Shell command and check the result. ( Shell Command Probe Configuration )

    shell:
      # run redis-cli ping and check the "PONG"
      - name: Redis (Local)
        cmd: "redis-cli"
        args:
          - "-h"
          - "127.0.0.1"
          - "ping"
        env:
          # set the `REDISCLI_AUTH` environment variable for redis password
          - "REDISCLI_AUTH=abc123"
        # check the command output, if does not contain the PONG, mark the status down
        contain : "PONG"
  • SSH. Run a remote command via SSH and check the result. Supports a bastion/jump server. ( SSH Command Probe Configuration )

    ssh:
      servers:
        - name : ServerX
          host: [email protected]:22
          password: xxxxxxx
          key: /Users/user/.ssh/id_rsa
          cmd: "ps auxwe | grep easeprobe | grep -v grep"
          contain: easeprobe
  • Host. Run an SSH command on a remote host and check the CPU, Memory, and Disk usage. ( Host Load Probe )

    host:
      servers:
        - name : server
          host: [email protected]:22
          key: /path/to/server.pem
          threshold:
            cpu: 0.80  # cpu usage  80%
            mem: 0.70  # memory usage 70%
            disk: 0.90  # disk usage 90%
  • Client. Currently supports the following native clients. mTLS is supported. ( Native Client Probe Configuration )

    • MySQL. Connects to the MySQL server and runs the SHOW STATUS SQL.
    • Redis. Connects to the Redis server and runs the PING command.
    • MongoDB. Connects to the MongoDB server and pings it.
    • Kafka. Connects to the Kafka server and lists all topics.
    • PostgreSQL. Connects to the PostgreSQL server and runs the SELECT 1 SQL.
    • Zookeeper. Connects to the Zookeeper server and runs the get / command.
    client:
      - name: Kafka Native Client (local)
        driver: "kafka"
        host: "localhost:9093"
        # mTLS
        ca: /path/to/file.ca
        cert: /path/to/file.crt
        key: /path/to/file.key

1.2 Notification

EaseProbe supports the following notifications:

  • Slack. Using Webhook for notification
  • Discord. Using Webhook for notification
  • Telegram. Using Telegram Bot for notification
  • Email. Support multiple email addresses.
  • AWS SNS. Support AWS Simple Notification Service.
  • WeChat Work. Support Enterprise WeChat Work notification.
  • DingTalk. Support the DingTalk notification.
  • Lark. Support the Lark(Feishu) notification.
  • Log File. Write the notification into a log file
  • SMS. Supports SMS notification through multiple SMS service providers: Twilio, Vonage (Nexmo), Yunpian

Note:

  • Notifications are edge-triggered: a notification is sent only when the status changes.
# Notification Configuration
notify:
  slack:
    - name: "MegaEase#Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
  discord:
    - name: "MegaEase#Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
  telegram:
    - name: "MegaEase Alert Group"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Channel / Group ID
  email:
    - name: "DevOps Mailing List"
      server: smtp.email.example.com:465
      username: [email protected]
      password: ********
      to: "[email protected];[email protected]"
  aws_sns:
    - name: AWS SNS
      region: us-west-2
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx
      endpoint: https://sns.us-west-2.amazonaws.com
      credential:
        id: AWSXXXXXXXID
        key: XXXXXXXX/YYYYYYY
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"
  sms:
    - name: "sms alert service"
      provider: "yunpian"
      key: xxxxxxxxxxxx # yunpian apikey
      mobile: 123456789,987654321 # mobile phone number, multiple phone number joint by `,`
      sign: "xxxxxxxx" # need to register; usually brand name

Check the Notification Configuration to see how to configure it.
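
The edge-triggered behavior noted above can be illustrated with a minimal Go sketch. This is not EaseProbe's actual implementation: the `Status` type, `edgeNotifier`, and `shouldNotify` are hypothetical names, and treating the first observation as "no notification" is an assumption of this sketch.

```go
package main

import "fmt"

// Status is a hypothetical probe status type used only in this sketch.
type Status string

const (
	StatusUp   Status = "up"
	StatusDown Status = "down"
)

// edgeNotifier remembers the last status seen for each probe name.
type edgeNotifier struct {
	last map[string]Status
}

func newEdgeNotifier() *edgeNotifier {
	return &edgeNotifier{last: make(map[string]Status)}
}

// shouldNotify returns true only when the status differs from the
// previously recorded status of the same probe (an "edge").
// The first observation is recorded without notifying (assumption).
func (e *edgeNotifier) shouldNotify(name string, s Status) bool {
	prev, seen := e.last[name]
	e.last[name] = s
	return seen && prev != s
}

func main() {
	n := newEdgeNotifier()
	for _, s := range []Status{StatusUp, StatusUp, StatusDown, StatusDown, StatusUp} {
		fmt.Println(s, n.shouldNotify("Kafka", s))
	}
}
```

With this approach a probe that stays down produces one notification at the moment it goes down, rather than one per probe interval.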

1.3 Report

  • SLA Report Notify. EaseProbe sends a daily, weekly, or monthly SLA report.

    settings:
      # SLA Report schedule
      sla:
        #  daily, weekly (Sunday), monthly (Last Day), none
        schedule: "weekly"
        # UTC time, the format is 'hour:min:sec'
        time: "23:59"
  • SLA Live Report. You can query the live SLA report over HTTP.

    EaseProbe listens on 0.0.0.0:8181 by default. You can access the live SLA report at the following URLs:

    • HTML: http://localhost:8181/ or http://localhost:8181/?refresh=30s
    • JSON: http://localhost:8181/api/v1/sla

    Refer to the Global Setting Configuration to see how to configure the access log.

  • SLA Data Persistence. Save the SLA statistics data on the disk.

    The SLA data is persisted in $CWD/data/data.yaml by default. The path can be changed in the settings section.

    When EaseProbe starts, it looks for data.yaml at the configured location. If the file is found, EaseProbe loads it and removes any probes that are no longer present in the configuration file. Setting data: to "-" disables SLA persistence (e.g. data: "-").

    settings:
      sla:
        # SLA data persistence file path.
        # The default location is `$CWD/data/data.yaml`
        data: /path/to/data/file.yaml
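
As a rough illustration of the "remove any probes that are no longer present" step, here is a minimal Go sketch. It is not taken from the EaseProbe source: `pruneStale` is a hypothetical helper, and the saved data is simplified to a map from probe name to a counter.

```go
package main

import (
	"fmt"
	"sort"
)

// pruneStale deletes entries from saved whose probe name is not listed
// in configured, and returns the removed names in sorted order.
func pruneStale(saved map[string]int, configured []string) []string {
	keep := make(map[string]bool, len(configured))
	for _, name := range configured {
		keep[name] = true
	}
	var removed []string
	for name := range saved {
		if !keep[name] {
			removed = append(removed, name)
			delete(saved, name) // deleting during range is safe in Go
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	saved := map[string]int{"Kafka": 10, "Redis": 7, "OldService": 3}
	removed := pruneStale(saved, []string{"Kafka", "Redis"})
	fmt.Println(removed, len(saved))
}
```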

For more information, please check the Global Setting Configuration

1.4 Administration

There are some administration configuration options:

1) PID file

EaseProbe creates a PID file (default $CWD/easeprobe.pid) when it starts. The path can be configured by:

settings:
  pid: /var/run/easeprobe.pid
  • If the file already exists, EaseProbe would overwrite it.
  • If the file cannot be written, EaseProbe would exit with an error.

If you want to disable the PID file, set pid to an empty string.

settings:
  pid: "" # EaseProbe won't create a PID file

2) Log file Rotation

There are two types of log files: Application Log and HTTP Access Log.

Both the Application Log and the HTTP Access Log are written to stdout by default. Both can be configured by:

log:
  file: /path/to/log/file
  self_rotate: true # default: true

If self_rotate is true, EaseProbe would rotate the log automatically, and the following options are available:

  size: 10 # max size of log file. default: 10M
  age: 7 # max age days of log file. default: 7 days
  backups: 5 # max backup log files. default: 5
  compress: true # compress. default: true

If self_rotate is false, EaseProbe will not rotate the log, and the log file will have to be rotated by a 3rd-party tool (such as logrotate) or manually by the administrator.

mv /path/to/easeprobe.log /path/to/easeprobe.log.0
kill -HUP `cat /path/to/easeprobe.pid`

EaseProbe accepts the HUP signal to rotate the log.

1.5 Prometheus Metrics

EaseProbe supports Prometheus metrics. The Prometheus endpoint is http://localhost:8181/metrics by default.

The following snapshot is the Grafana panel for host CPU metrics

Refer to the Global Setting Configuration to see how to configure the HTTP server.

2. Getting Started

You can get started with EaseProbe by any of the following methods:

2.1 Build

Requires Go 1.18+ (for generics support).

Use make to build the binary. The output is placed under the build/bin directory.

$ make

2.2 Configure

Read the Configuration Guide to learn how to configure EaseProbe.

Create the configuration file - $CWD/config.yaml.

2.3 Run

Run the following command for a local test:

$ build/bin/easeprobe -f config.yaml
  • -f configuration file or URL. Can also be achieved by setting the environment variable PROBE_CONFIG
  • -d dry run. Can also be achieved by setting the environment variable PROBE_DRY

3. Configuration

EaseProbe can be configured by supplying a YAML file, or a URL to fetch the configuration from. By default, EaseProbe looks for config.yaml in the current directory; this can be changed with the -f parameter.

easeprobe -f path/to/config.yaml
easeprobe -f https://example.com/config

The following environment variables can be used to fine-tune the request to the configuration file

  • HTTP_AUTHORIZATION
  • HTTP_TIMEOUT

The configuration file should be versioned, and the version should align with the EaseProbe binary version.

version: v1.5.0

The following example configurations illustrate the EaseProbe supported features.

Notes: All probes have the following options:

  • timeout - the maximum time to wait for the probe to complete. default: 30s.
  • interval - the interval time to run the probe. default: 1m.

3.1 HTTP Probe Configuration

# HTTP Probe Configuration

http:
  # A Website
  - name: MegaEase Website (Global)
    url: https://megaease.com

  # Some of the Software support the HTTP Query
  - name: ElasticSearch
    url: http://elasticsearch.server:9200
  - name: Eureka
    url: http://eureka.server:8761
  - name: Prometheus
    url: http://prometheus:9090/graph

  # Spring Boot Application with Actuator Health API
  - name: EaseService-Governance
    url: http://easeservice-mgmt-governance:38012/actuator/health
  - name: EaseService-Control
    url: http://easeservice-mgmt-control:38013/actuator/health
  - name: EaseService-Mesh
    url: http://easeservice-mgmt-mesh:38013/actuator/health

  # A complete HTTP Probe configuration
  - name: Special Website
    url: https://megaease.cn
    # Request Method
    method: GET
    # Request Header
    headers:
      X-head-one: xxxxxx
      X-head-two: yyyyyy
      X-head-THREE: zzzzzzX-
    content_encoding: text/json
    # Request Body
    body: '{ "FirstName": "Mega", "LastName" : "Ease", "UserName" : "megaease", "Email" : "[email protected]"}'
    # HTTP Basic Auth
    username: username
    password: password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
    # HTTP successful response code range, default is [0, 499].
    success_code:
      - [200,206] # the code >=200 and <= 206
      - [300,308] # the code >=300 and <= 308
    # Response Checking
    contain: "success" # response body must contain this string, if not the probe is considered failed.
    not_contain: "failure" # response body must NOT contain this string, if it does the probe is considered failed.
    # configuration
    timeout: 10s # default is 30 seconds
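
To make the `success_code`, `contain`, and `not_contain` semantics concrete, here is a minimal Go sketch of how such checks could be evaluated. It is an illustration, not EaseProbe's implementation: `codeRange`, `codeOK`, and `bodyOK` are hypothetical names, and treating an empty string as "check disabled" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// codeRange is an inclusive [min, max] pair, mirroring one success_code entry.
type codeRange [2]int

// codeOK reports whether code falls inside any of the inclusive ranges.
func codeOK(code int, ranges []codeRange) bool {
	for _, r := range ranges {
		if code >= r[0] && code <= r[1] {
			return true
		}
	}
	return false
}

// bodyOK applies the contain / not_contain checks to a response body.
// An empty string disables the corresponding check (assumption).
func bodyOK(body, contain, notContain string) bool {
	if contain != "" && !strings.Contains(body, contain) {
		return false
	}
	if notContain != "" && strings.Contains(body, notContain) {
		return false
	}
	return true
}

func main() {
	ranges := []codeRange{{200, 206}, {300, 308}}
	fmt.Println(codeOK(204, ranges), codeOK(404, ranges))
	fmt.Println(bodyOK(`{"status":"success"}`, "success", "failure"))
}
```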

3.2 TCP Probe Configuration

# TCP Probe Configuration
tcp:
  - name: SSH Service
    host: example.com:22
    timeout: 10s # default is 30 seconds
    interval: 2m # default is 60 seconds

  - name: Kafka
    host: kafka.server:9093

3.3 Shell Command Probe Configuration

The shell command probe is used to execute a shell command and check the output.

The following example shows how to configure the shell command probe.

# Shell Probe Configuration
shell:
  # A proxy curl shell script
  - name: Google Service
    cmd: "./resources/probe/scripts/proxy.curl.sh"
    args:
      - "socks5://127.0.0.1:1085"
      - "www.google.com"

  # run redis-cli ping and check the "PONG"
  - name: Redis (Local)
    cmd: "redis-cli"
    args:
      - "-h"
      - "127.0.0.1"
      - "ping"
    env:
      # set the `REDISCLI_AUTH` environment variable for redis password
      - "REDISCLI_AUTH=abc123"
    # check the command output, if does not contain the PONG, mark the status down
    contain : "PONG"

  # Run Zookeeper command `stat` to check the zookeeper status
  - name: Zookeeper (Local)
    cmd: "/bin/sh"
    args:
      - "-c"
      - "echo stat | nc 127.0.0.1 2181"
    contain: "Mode:"

3.4 SSH Command Probe Configuration

The SSH probe is similar to the Shell probe.

  • Supports password and private key authentication.
  • Supports a bastion host tunnel.

The host field takes an address with a port, optionally prefixed by the login user, as the examples below show.

The following are examples of SSH probe configuration.

# SSH Probe Configuration
ssh:
  # SSH bastion host configuration
  bastion:
    aws: # bastion host ID      ◄──────────────────────────────┐
      host: aws.basition.com:22 #
      username: ubuntu # login user                            │
      key: /path/to/aws/basion/key.pem # private key file      │
    gcp: # bastion host ID                                     │
      host: [email protected]:22 # bastion host          │
      key: /path/to/gcp/basion/key.pem # private key file      │
  # SSH Probe configuration                                    │
  servers:   #
    # run redis-cli ping and check the "PONG"                  │
    - name: Redis (AWS) # Name                                 │
      bastion: aws  # bastion host id ------------------------─┘
      host: 172.20.2.202:22
      username: ubuntu  # SSH Login username
      password: xxxxx   # SSH Login password
      key: /path/to/private.key # SSH login private file
      cmd: "redis-cli"
      args:
        - "-h"
        - "127.0.0.1"
        - "ping"
      env:
        # set the `REDISCLI_AUTH` environment variable for redis password
        - "REDISCLI_AUTH=abc123"
      # check the command output, if does not contain the PONG, mark the status down
      contain : "PONG"

    # Check the process status of `Kafka`
    - name:  Kafka (GCP)
      bastion: gcp         #  ◄------ bastion host id
      host: 172.10.1.100:22
      username: ubuntu
      key: /path/to/private.key
      cmd: "ps -ef | grep kafka"

3.5 Host Resource Usage Probe Configuration

The host probe is supported; a configuration example is shown below.

The probe checks CPU, memory, and disk usage. If any of them exceeds its threshold, the host is marked as down.

Note:

  • The thresholds are OR conditions: exceeding any one of them marks the host as down.
  • The host probe requires the following commands on the remote server: top, df, free, awk, grep, tr, and hostname (check the source code to see how it works).
  • Disk usage is checked for the root disk only.
host:
  bastion: # bastion server configuration
    aws: # bastion host ID      ◄──────────────────┐
      host: [email protected] # bastion host      │
      key: /path/to/bastion.pem # private key file │
  # Servers List                                   │
  servers: #
    - name : aws server   #
      bastion: aws #  <-- bastion server id ------─┘
      host: [email protected]:22
      key: /path/to/server.pem
      threshold:
        cpu: 0.80  # cpu usage  80%
        mem: 0.70  # memory usage 70%
        disk: 0.90  # disk usage 90%

    # Using the default threshold
    # cpu 80%, mem 80% and disk 95%
    - name : My VPS
      host: [email protected]:22
      key: /Users/user/.ssh/id_rsa
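
The OR condition on thresholds can be expressed in a few lines of Go. This sketch is illustrative only: the `usage` type and `exceeded` function are hypothetical, the default values are the ones stated in the example above (cpu 80%, mem 80%, disk 95%), and "exceeds" is assumed to mean strictly greater than.

```go
package main

import "fmt"

// usage holds resource usage (or thresholds) as fractions in [0, 1].
type usage struct {
	CPU, Mem, Disk float64
}

// defaultThreshold mirrors the defaults mentioned in the example above.
var defaultThreshold = usage{CPU: 0.80, Mem: 0.80, Disk: 0.95}

// exceeded returns the names of the metrics above their thresholds.
// The host counts as down when the result is non-empty (OR condition).
func exceeded(u, t usage) []string {
	var out []string
	if u.CPU > t.CPU {
		out = append(out, "cpu")
	}
	if u.Mem > t.Mem {
		out = append(out, "mem")
	}
	if u.Disk > t.Disk {
		out = append(out, "disk")
	}
	return out
}

func main() {
	current := usage{CPU: 0.35, Mem: 0.91, Disk: 0.40}
	over := exceeded(current, defaultThreshold)
	fmt.Println("down:", len(over) > 0, "exceeded:", over)
}
```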

3.6 Native Client Probe Configuration

# Native Client Probe
client:
  - name: Redis Native Client (local)
    driver: "redis"  # driver is redis
    host: "localhost:6379"  # server and port
    password: "abc123" # password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

  - name: MySQL Native Client (local)
    driver: "mysql"
    host: "localhost:3306"
    username: "root"
    password: "pass"

  - name: MongoDB Native Client (local)
    driver: "mongo"
    host: "localhost:27017"
    username: "admin"
    password: "abc123"
    timeout: 5s

  - name: Kafka Native Client (local)
    driver: "kafka"
    host: "localhost:9093"
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

  - name: PostgreSQL Native Client (local)
    driver: "postgres"
    host: "localhost:5432"
    username: "postgres"
    password: "pass"

  - name: Zookeeper Native Client (local)
    driver: "zookeeper"
    host: "localhost:2181"
    timeout: 5s
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

3.7 Notification Configuration

# Notification Configuration
notify:
  # Notify to Slack Channel
  slack:
    - name: "Organization #Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
      # dry: true   # dry notification, print the Slack JSON in log(STDOUT)
  telegram:
    - name: "Group Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Group ID
    - name: "Channel Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -1001234567890 # Channel ID
  # Notify to Discord Text Channel
  discord:
    - name: "Server #Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
      # the avatar and thumbnail setting for notify block
      avatar: "https://img.icons8.com/ios/72/appointment-reminders--v1.png"
      thumbnail: "https://freeiconshop.com/wp-content/uploads/edd/notification-flat.png"
      # dry: true # dry notification, print the Discord JSON in log(STDOUT)
      retry: # sometimes the network is unreliable and a retry is needed.
        times: 3
        interval: 10s
  # Notify to email addresses
  email:
    - name: "XXX Mail List"
      server: smtp.email.example.com:465
      username: [email protected]
      password: ********
      to: "[email protected];[email protected]"
      from: "[email protected]" # Optional
      # dry: true # dry notification, print the Email HTML in log(STDOUT)
  # Notify to AWS Simple Notification Service
  aws_sns:
    - name: AWS SNS
      region: us-west-2 # AWS Region
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx # SNS ARN
      endpoint: https://sns.us-west-2.amazonaws.com # SNS Endpoint
      credential: # AWS Access Credential
        id: AWSXXXXXXXID  # AWS Access Key ID
        key: XXXXXXXX/YYYYYYY # AWS Access Key Secret
  # Notify to Wecom(WeChatwork) robot.
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  # Notify to Dingtalk
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  # Notify to Lark
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"
  # Notify to a local log file
  log:
    - name: "Local Log"
      file: "/tmp/easeprobe.log"
      dry: true
  # Notify by sms using yunpian  https://www.yunpian.com/official/document/sms/zh_cn/domestic_single_send
  sms:
    - name: "sms alert service - yunpian"
      provider: "yunpian"
      key: xxxxxxxxxxxx # yunpian apikey
      mobile: 123456789,987654321 # mobile phone number, multi phone number joint by `,`
      sign: "xxxxx" # get this from yunpian

Notes: All of the notifications can have the following optional configuration.

  dry: true # dry notification, print the notification content in log(STDOUT)
  timeout: 20s # the timeout for sending out the notification, default: 30s
  retry: # sometimes the network is unreliable and a retry is needed.
    times: 3 # default: 3
    interval: 10s # default: 5s

3.8 Global Setting Configuration

# Global settings for all probes and notifiers.
settings:

  # The customized name and icon
  name: "Easeprobe" # the name of the probe: default: "EaseProbe"
  icon: "https://path/to/icon.png" # the icon of the probe. default: "https://megaease.com/favicon.png"
  # Daemon settings

  # pid file path,  default: $CWD/easeprobe.pid,
  # if set to "", will not create pid file.
  pid: /var/run/easeprobe.pid

  # A HTTP Server configuration
  http:
    ip: 127.0.0.1 # the IP address of the server. default:"0.0.0.0"
    port: 8181 # the port of the server. default: 8181
    refresh: 5s # the auto-refresh interval of the server. default: the minimum value of the probes' interval.
    log:
      file: /path/to/access.log # access log file. default: Stdout
      # Log Rotate Configuration (optional)
      self_rotate: true # true: self rotate log file. default: true
                        # false: managed by outside  (e.g logrotate)
                        #        the below settings will be ignored.
      size: 10 # max of access log file size. default: 10m
      age: 7 #  max of access log file age. default: 7 days
      backups: 5 # max of access log file backups. default: 5
      compress: true # compress the access log file. default: true

  # SLA Report schedule
  sla:
    #  daily, weekly (Sunday), monthly (Last Day), none
    schedule : "daily"
    # UTC time, the format is 'hour:min:sec'
    time: "23:59"
    # debug mode
    # - true: send the SLA report every minute
    # - false: send the SLA report in schedule
    debug: false
    # SLA data persistence file path.
    # The default location is `$CWD/data/data.yaml`
    data: /path/to/data/file.yaml
    # Use the following to disable SLA data persistence
    # data: "-"
    backups: 5 # max of SLA data file backups. default: 5
               # if set to a negative value, keep all backup files

  notify:
    # dry: true # Global settings for dry run
    retry: # Global settings for retry
      times: 5
      interval: 10s

  probe:
    timeout: 30s # the time out for all probes
    interval: 1m # probe every minute for all probes

  # easeprobe program running log file.
  log:
    file: "/path/to/easeprobe.log" # default: stdout
    # Log Level Configuration
    # can be: panic, fatal, error, warn, info, debug.
    level: "debug"
    # Log Rotate Configuration (optional)
    self_rotate: true # true: self rotate log file. default: true
                        # false: managed by outside  (e.g logrotate)
                        #        the below settings will be ignored.
    size: 10 # max of log file size. default: 10m
    age: 7 #  max of log file age. default: 7 days
    backups: 5 # max of log file backups. default: 5
    compress: true # compress the log file. default: true

  # Date format
  # Date
  #  - January 2, 2006
  #  - 01/02/06
  #  - Jan-02-06
  #
  # Time
  #   - 15:04:05
  #   - 3:04:05 PM
  #
  # Date Time
  #   - Jan _2 15:04:05                   (Timestamp)
  #   - Jan _2 15:04:05.000000            (with microseconds)
  #   - 2006-01-02T15:04:05-0700          (ISO 8601 (RFC 3339))
  #   - 2006-01-02 15:04:05
  #   - 02 Jan 06 15:04 MST               (RFC 822)
  #   - 02 Jan 06 15:04 -0700             (with numeric zone)
  #   - Mon, 02 Jan 2006 15:04:05 MST     (RFC 1123)
  #   - Mon, 02 Jan 2006 15:04:05 -0700   (with numeric zone)
  timeformat: "2006-01-02 15:04:05 UTC"

4. Community

5. License

EaseProbe is under the Apache 2.0 license. See the LICENSE file for details.

Owner
MegaEase
Open Source, Freedom, Low Cost, High Availability Cloud Native Platform.
Comments
  • [feature request]Hot reloading support

    Background & X Problem

    Currently (2022-05-05), easeprobe does not support hot reloading. If I modify the config.yaml file (which can be a frequent action), I need to restart the application for the change to take effect.

    Proposal and Expectation

    We can add a reload option to the configuration file to enable it:

    reload:
      enabled: true # default is false
      period: 10s # scan interval, default is 10s
    

    Solutions & Y Problems

    When reload is turned on, we must not lose the SLA information of probes whose configuration has not changed.

  • shell content error, no dingding message send

    Environment (please complete the following information):

    • OS: linux
    • EaseProbe Version: latest version

    Describe the bug

    A shell probe is configured and the script content has an error, but no DingTalk notification is sent, and no log about sending the DingTalk notification is printed.

    To Reproduce

    Expected behavior

    When the script fails, a DingTalk message should be sent. If the DingTalk configuration is wrong, the error should be printed in the log.

  • Fix the concurrent map access crashing problem

    Background

    We have a map that collects the probe results of all probers. The probers set their results into this map concurrently, each under a different key (this is not a problem). However, another goroutine persists this map periodically, which causes the "concurrent map iteration and map write" problem and makes EaseProbe crash.

    The stack trace of the crash is as follows:

    fatal error: concurrent map iteration and map write
    
    goroutine 50 [running]:
    runtime.throw({0xfd17f1?, 0x0?})
            /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0172745f0 sp=0xc0172745c0 pc=0x438b31
    runtime.mapiternext(0x0?)
            /usr/local/go/src/runtime/map.go:871 +0x4eb fp=0xc017274660 sp=0xc0172745f0 pc=0x41058b
    runtime.mapiterinit(0xc0172747e8?, 0xc0265008d0?, 0xc017274710?)
            /usr/local/go/src/runtime/map.go:861 +0x228 fp=0xc017274680 sp=0xc017274660 pc=0x410048
    reflect.mapiterinit(0xc0172746c8?, 0x8c24b3?, 0xc000303400?)
    ....
    ....
    ....
    ....
    gopkg.in/yaml%2ev3.(*encoder).marshalDoc(0xc000303400, {0x0, 0x0}, {0xe7b4e0?, 0xc02a155740?, 0x0?})
            /home/ubuntu/go/pkg/mod/gopkg.in/[email protected]/encode.go:105 +0x185 fp=0xc017275ab8 sp=0xc017275838 pc=0x8c2765
    gopkg.in/yaml%2ev3.Marshal({0xe7b4e0?, 0xc02a155740})
            /home/ubuntu/go/pkg/mod/gopkg.in/[email protected]/yaml.go:222 +0x370 fp=0xc017275de8 sp=0xc017275ab8 pc=0x8e2e70
    gopkg.in/yaml%2ev3.Marshal({0xe7b4e0?, 0xc02a155740})
            /home/ubuntu/go/pkg/mod/gopkg.in/[email protected]/yaml.go:222 +0x370 fp=0xc017275de8 sp=0xc017275ab8 pc=0x8e2e70
    github.com/megaease/easeprobe/probe.SaveDataToFile({0xc0015858c0, 0x18})
            /home/ubuntu/hchen/easeprobe/probe/data.go:102 +0x85 fp=0xc017275e50 sp=0xc017275de8 pc=0x8e6ba5
    main.saveData.func1()
            /home/ubuntu/hchen/easeprobe/cmd/easeprobe/report.go:34 +0x3d fp=0xc017275ee8 sp=0xc017275e50 pc=0xda975d
    main.saveData(0xc000090780)
            /home/ubuntu/hchen/easeprobe/cmd/easeprobe/report.go:59 +0x16f fp=0xc017275fc8 sp=0xc017275ee8 pc=0xda958f
    main.main.func3()
            /home/ubuntu/hchen/easeprobe/cmd/easeprobe/main.go:159 +0x26 fp=0xc017275fe0 sp=0xc017275fc8 pc=0xda7f66
    runtime.goexit()
            /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc017275fe8 sp=0xc017275fe0 pc=0x46b6a1
    created by main.main
            /home/ubuntu/hchen/easeprobe/cmd/easeprobe/main.go:159 +0x5fd
    
    goroutine 1 [chan receive, 328 minutes]:
    main.main()
            /home/ubuntu/hchen/easeprobe/cmd/easeprobe/main.go:205 +0x82b
    

    The crashing code is shown below:

    // SaveDataToFile save the results to file
    func SaveDataToFile(filename string) error {
            metaData.file = filename
            if strings.TrimSpace(filename) == "-" {
                    return nil
            }
    
            dataBuf, err := yaml.Marshal(resultData)   //<========= Panic at this line
            if err != nil {
                    return err
            }
    
            genMetaBuf()
            buf := append(metaBuf, dataBuf...)
    
            if err := ioutil.WriteFile(filename, []byte(buf), 0644); err != nil {
                    return err
            }
            return nil
    }
    

    Investigation

    The problem is that the prober's result (statistics data) is consumed by the following modules:

    • Data Saving
    • SLA Report
    • Web Server

    All of the above modules only read the statistics data, while the probers update it.

    So, this is a concurrent read-write problem.

    Solution

    We transfer all of the probers' statistics data into the Save Manager (via a channel), and all of the consumers (Saving, SLA Report, Web Server) retrieve the statistics data from the Save Manager instead of from the probers directly.

    1. Each Prober retrieves the persistent data from the Save Manager as a cloned object.
    2. Each Prober clones & reports its Result to the Save Manager via a Go channel.
    3. The Web Server and SLA Report query the result data from the Save Manager.
    4. A read-write lock is added to the Save Manager's Get/Set methods.
                                                                  ┌────────────┐
                                                       Query      │            │
      Result┌────────┐ retrieve                 ┌────────────────►│ Web Server │
    ┌───────┤ Prober │◄────────┐                │                 │            │
    │       └────────┘         │                │                 └────────────┘
    │                          │ Result. ┌──────┴──────┐
    │ Result┌────────┐ retrieve│ Clone() │             │          ┌────────────┐
    ├───────┤ Prober │◄────────┼─────────┤ Save Manager│  Query   │            │
    │       └────────┘         │         │   data.go   ├─────────►│ SLA Report │
    │                          │    ┌────►             │          │            │
    │ Result┌────────┐ retrieve│    │    └─────▲─┬─────┘          └────────────┘
    ├───────┤ Prober │◄────────┘    │          │ │
    │       └────────┘              │          │ │  map[name]*Result
    │                               │          │ │
    │                               │     Load │ │ Save
    │    Result.                    │      ┌───┴─▼────┐
    │    Clone()  ┌──────────────┐  │      │          │
    └────────────►│    Channel   ├──┘      │   File   │
                  └──────────────┘         │          │
                                           └──────────┘
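
The design in the diagram can be sketched in Go as follows. This is a simplified illustration of the idea (clone on the way in, clone on the way out, a read-write lock around the map, and a channel feeding the manager), not the actual code of the PR; `Result`, `saveManager`, and `consume` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a simplified stand-in for a prober's statistics.
type Result struct {
	Name   string
	Counts map[string]int // e.g. "up" / "down"
}

// Clone returns a deep copy, so no memory is shared with the caller.
func (r *Result) Clone() *Result {
	c := &Result{Name: r.Name, Counts: make(map[string]int, len(r.Counts))}
	for k, v := range r.Counts {
		c.Counts[k] = v
	}
	return c
}

// saveManager owns the map; all access goes through Get/Set.
type saveManager struct {
	mu   sync.RWMutex
	data map[string]*Result
}

func newSaveManager() *saveManager {
	return &saveManager{data: make(map[string]*Result)}
}

func (m *saveManager) Set(r *Result) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[r.Name] = r.Clone()
}

func (m *saveManager) Get(name string) (*Result, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	r, ok := m.data[name]
	if !ok {
		return nil, false
	}
	return r.Clone(), true
}

// consume stores every result reported on ch until ch is closed.
func (m *saveManager) consume(ch <-chan *Result, done chan<- struct{}) {
	for r := range ch {
		m.Set(r)
	}
	close(done)
}

func main() {
	m := newSaveManager()
	ch := make(chan *Result)
	done := make(chan struct{})
	go m.consume(ch, done)

	var wg sync.WaitGroup
	for _, name := range []string{"http", "tcp", "shell"} {
		wg.Add(1)
		go func(name string) { // one goroutine per prober
			defer wg.Done()
			r := &Result{Name: name, Counts: map[string]int{}}
			for i := 0; i < 100; i++ {
				r.Counts["up"]++
				ch <- r.Clone() // report a clone, keep mutating our own copy
				m.Get(name)     // concurrent reader, like the web server
			}
		}(name)
	}
	wg.Wait()
	close(ch)
	<-done
	r, _ := m.Get("tcp")
	fmt.Println(r.Name, r.Counts["up"])
}
```

Because readers only ever receive clones, serializing a result (for example with a YAML encoder) can no longer race with a prober updating its statistics.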
    
  • every hourly Restart stat up and down times

    In $CMD/data/data.yaml, I want the up and down counts to start again from zero every hour. How should this be configured?

    ---
    name: EaseProbe
    version: v1.7.0
    ---
    "":
        name: ""
        endpoint: http://xxxxx:8000/accounts
        time: 2022-07-25T03:40:09.881474243Z
        timestamp: 1658720409881
        rtt: 354.042µs
        status: down
        prestatus: down
        message: 'Error (http): Error: Get "http://xxxx:8000/accounts": dial tcp xxxxx:8000: connect: connection refused'
        latestdowntime: 2022-07-25T02:44:48.241836769Z
        recoverytime: 0s
        stat:
            since: 2022-07-07T02:49:31.152596481Z
            total: 1454
            status:
                up: 1412
                down: 42
            uptime: 23h32m0s
            downtime: 42m0s
        timeformat: 2006-01-02 15:04:05 UTC
    
  • Connect probers and notifiers by Channel

    This PR introduces the channel to connect the probers and notifiers.

    For more information, please take a look at discussion #82.

    Example:

    If we have the following configuration:

    http:
      - name: probe A
        channels: [Chann_A, Chann_B]
    shell:
      - name: probe B
        channels: [Chann_C]
    notify:
      - discord: Discord
        channels: [Chann_A, Chann_C]
      - email: Gmail
        channels: [Chann_B]
    

    Then, we will have the following diagram

    ┌───────┐          ┌─────────┐
    │Probe B├─────────►│ Chann_C ├─────┐
    └───────┘          └─────────┘     │
                                       │
                                       │
                       ┌─────────┐     │    ┌─────────┐
                ┌─────►│ Chann_A ├─────▼────► Discord │
                │      └─────────┘          └─────────┘
    ┌───────┐   │
    │Probe A├───┤
    └───────┘   │
                │      ┌─────────┐          ┌─────────┐
                └─────►│ Chann_B ├──────────►  Gmail  │
                       └─────────┘          └─────────┘
    

    Test

    I've run the following tests:

    • [x] Case 1: backward compatibility - no channels defined for any prober or notifier.
    • [x] Case 2: 1 prober -> 1 channel -> 1 notifier.
    • [x] Case 3: 5 probers and 3 notifiers, with 3 channels defined.
    • [x] Case 4: a channel has no probers - everything works, but a warning message is logged.
    • [x] Case 5: a channel has no notifiers - everything works, but a warning message is logged.
    • [x] Case 6: probers and notifiers without channels can be connected together.
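
The routing in the example above can be sketched in a few lines of Go. This is only an illustration of the idea, not the code from the PR: the `Prober` and `Notifier` types and the `route` function are hypothetical names, and the sketch does not cover the case where no channels are defined.

```go
package main

import (
	"fmt"
	"sort"
)

// Prober and Notifier are hypothetical, trimmed-down stand-ins for the
// configuration entries in the example; they are not EaseProbe's real types.
type Prober struct {
	Name     string
	Channels []string
}

type Notifier struct {
	Name     string
	Channels []string
}

// route returns, for every prober, the sorted names of the notifiers that
// share at least one channel with it.
func route(probers []Prober, notifiers []Notifier) map[string][]string {
	// Index the notifiers by channel name.
	byChannel := make(map[string][]string)
	for _, n := range notifiers {
		for _, ch := range n.Channels {
			byChannel[ch] = append(byChannel[ch], n.Name)
		}
	}

	routes := make(map[string][]string)
	for _, p := range probers {
		seen := make(map[string]bool) // list each notifier only once
		targets := []string{}
		for _, ch := range p.Channels {
			for _, name := range byChannel[ch] {
				if !seen[name] {
					seen[name] = true
					targets = append(targets, name)
				}
			}
		}
		sort.Strings(targets)
		routes[p.Name] = targets
	}
	return routes
}

func main() {
	probers := []Prober{
		{Name: "probe A", Channels: []string{"Chann_A", "Chann_B"}},
		{Name: "probe B", Channels: []string{"Chann_C"}},
	}
	notifiers := []Notifier{
		{Name: "Discord", Channels: []string{"Chann_A", "Chann_C"}},
		{Name: "Gmail", Channels: []string{"Chann_B"}},
	}
	routes := route(probers, notifiers)
	fmt.Println("probe A ->", routes["probe A"]) // probe A -> [Discord Gmail]
	fmt.Println("probe B ->", routes["probe B"]) // probe B -> [Discord]
}
```

With the configuration from the example, probe A reaches both Discord and Gmail, while probe B reaches only Discord, which matches the diagram.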
  • Support keyword found/not found for HTTP probe

    When using HTTP probing, it is common to check for the presence or absence of a keyword in the response. I would love to have such a feature in EaseProbe.

    Implementation suggestion:

    http:
    - name: ...
      url: ...
      success_keyword: "Welcome to my site"
      failure_keyword: "Unauthorized"
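
One way the suggested check could behave is sketched below. This is a guess at the semantics, not an existing EaseProbe API: `checkKeywords` is a hypothetical helper, and it assumes that an empty keyword means "not configured", that the success keyword must appear in the response body, and that the failure keyword must not.

```go
package main

import (
	"fmt"
	"strings"
)

// checkKeywords is a hypothetical helper, not part of EaseProbe.
// It returns nil when the body passes both checks. An empty keyword is
// treated as "not configured" and is skipped (an assumption of this sketch).
func checkKeywords(body, successKeyword, failureKeyword string) error {
	if successKeyword != "" && !strings.Contains(body, successKeyword) {
		return fmt.Errorf("success keyword %q not found", successKeyword)
	}
	if failureKeyword != "" && strings.Contains(body, failureKeyword) {
		return fmt.Errorf("failure keyword %q found", failureKeyword)
	}
	return nil
}

func main() {
	// The body contains the success keyword and not the failure keyword.
	fmt.Println(checkKeywords("<h1>Welcome to my site</h1>", "Welcome to my site", "Unauthorized"))
	// The body contains the failure keyword, so the probe would be marked down.
	fmt.Println(checkKeywords("401 Unauthorized", "", "Unauthorized"))
}
```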
    
  • Does EaseProbe support a per-minute schedule?

    Currently, the schedule only supports daily, weekly (Sunday), monthly (last day), and none. If the schedule could be set to every minute, EaseProbe could support real-time monitoring.

  • Versioning the binary, config, and data file

    fixes #98

    • [x] easeprobe -v
    $ ./easeprobe -v
    EaseProbe v1.5.0
    
    • [x] easeprobe -h
    $ ./easeprobe -h
    Usage of build/bin/easeprobe:
      -d    dry notification mode
      -f string
            configuration file (default "config.yaml")
      -v    prints version
    
    • [x] config.yaml
    version : 1.5.0
    
    • [x] data.yaml
    ---
    name: EaseProbe
    version: v1.5.0
    ---
    Probe:
       ...
      ....
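
The three flags in the usage output above can be reproduced with Go's standard `flag` package. The sketch below is not taken from the PR: the `options` struct, `parseArgs`, and `versionString` are hypothetical names, and the version constant simply mirrors the sample output.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Version mirrors the sample output above; a real build would likely set it
// at build time (an assumption of this sketch).
const Version = "v1.5.0"

// options holds the three flags listed in the usage text.
type options struct {
	dryNotify   bool
	configFile  string
	showVersion bool
}

// parseArgs parses the command-line arguments with a private FlagSet, so it
// can be called with any argument slice.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("easeprobe", flag.ContinueOnError)
	fs.BoolVar(&o.dryNotify, "d", false, "dry notification mode")
	fs.StringVar(&o.configFile, "f", "config.yaml", "configuration file")
	fs.BoolVar(&o.showVersion, "v", false, "prints version")
	err := fs.Parse(args)
	return o, err
}

// versionString formats the version the way the sample output shows it.
func versionString() string {
	return "EaseProbe " + Version
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error or the usage text
	}
	if o.showVersion {
		fmt.Println(versionString())
		return
	}
	fmt.Println("would load", o.configFile, "dry notification:", o.dryNotify)
}
```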
    
  • If there are many sites, can I include conf files like nginx?

    If there are many sites, for example thousands, writing them all in a single config.yaml makes the file too long and unwieldy.

    Could EaseProbe include conf files the way nginx does? That is, include files from a directory so that the sites are easier to tell apart.

  • easeprobe start error

    /opt # easeprobe -f config.yaml
    INFO[0000] The data file data/data.yaml, was not found!
    INFO[0000] Load the configuration file successfully!
    INFO[0000] Successfully created the PID file: /opt/easeprobe.pid
    INFO[0000] Application Log File [Stdout] - Self-Rotate
    INFO[0000] Web Access Log File [Stdout] - Self-Rotate
    INFO[2022-07-05T03:44:22Z] [Web] HTTP server is listening on 0.0.0.0:8181
    INFO[2022-07-05T03:44:22Z] Probe [http] - [ElasticSearch] base options are configured!
    INFO[2022-07-05T03:44:22Z] [Metric] Counter <EaseProbe_http_total> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_status> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_sla> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Counter <EaseProbe_http_status_code> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_content_len> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_dns_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_connect_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_tls_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_send_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_wait_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_transfer_duration> is created!
    INFO[2022-07-05T03:44:22Z] [Metric] Gauge <EaseProbe_http_total_duration> is created!
    FATA[2022-07-05T03:44:22Z] No notifies configured, exiting...

    config.yaml

    http:
      - name: test
        url: http://xxxx.com
    
  • fix undefined processExists error

    This is just a copy of daemon_darwin to fix the build error daemon/daemon.go:90:5: undefined: processExists.

    I tried changing the build-constraint headers so that daemon_darwin would also be used, but it kept failing.

    //go:build darwin && openbsd
    // +build darwin,openbsd
    

    Is the command I'm using to cross-compile easeprobe wrong?

    GOOS=openbsd GOARCH=amd64 go build -a -ldflags '-s -w -extldflags "-static"' -gcflags=-G=3 -o $PWD/build/bin/easeprobe.obsd $PWD/cmd/easeprobe
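
A note on why the headers above probably never matched: `darwin && openbsd` (and the comma form `darwin,openbsd`) requires the target OS to be both at once, which is never true, so `darwin || openbsd` (or `// +build darwin openbsd`) would be the form that selects either OS. In addition, if the file is named `daemon_darwin.go`, the `_darwin` suffix is itself an implicit constraint, so the file would need a name without an OS suffix to build for OpenBSD. The sketch below shows what a shared Unix implementation of `processExists` might look like. It is an assumption about the function's purpose based on its name, not the project's actual code, and the build constraint is left out so that the sketch runs on any Unix-like system.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// In a real shared file (hypothetically named daemon_unix.go), the first line
// would carry the constraint "darwin || openbsd" in //go:build form.

// processExists reports whether a process with the given PID is alive.
// Sending signal 0 performs the existence and permission checks without
// actually delivering a signal.
func processExists(pid int) bool {
	if pid <= 0 {
		return false
	}
	p, err := os.FindProcess(pid) // on Unix this succeeds even for dead PIDs
	if err != nil {
		return false
	}
	err = p.Signal(syscall.Signal(0))
	// EPERM means the process exists but belongs to another user.
	return err == nil || errors.Is(err, syscall.EPERM)
}

// selfExists is a small sanity check: the current process must exist.
func selfExists() bool {
	return processExists(os.Getpid())
}

func main() {
	fmt.Println("current process exists:", selfExists())
}
```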
    
    
  • Golang monkey patching library utilized against terms of license

    The Golang monkey patching library, https://github.com/bouk/monkey, is being utilized directly against its license:

    Copyright Bouke van der Bijl

    I do not give anyone permissions to use this tool for any purpose. Don't use it.

    I’m not interested in changing this license. Please don’t ask.

    It is also archived and not maintained.

Implementing SPEEDEX price computation engine in Golang as a standalone binary that exchanges can call

speedex-standalone Implementing SPEEDEX price computation engine in Golang as a standalone binary that exchanges can call. Notes from Geoff About Tato

Dec 1, 2021
A tool for checking the accessibility of your data by IPFS peers

ipfs-check Check if you can find your content on IPFS A tool for checking the accessibility of your data by IPFS peers Documentation Build go build wi

Dec 17, 2022
A tool for capturing newly issued x.509 from Certificate Transparency logs & performing periodic revocation checking.

ct-logster This repository contains the tools for collecting newly issued x509 certificates from Certificate Transparency logs, as well as performing

May 4, 2022
Go-aspell - GNU Aspell spell checking library bindings for golang

Aspell library bindings for Go GNU Aspell is a spell checking tool written in C/

Nov 14, 2022
Keeps track of Steam Deck Verifications. On first run, it reports all games with their respective Steam Deck Verification status. On subsequent runs, the tool will report newly tested and updated games.

Keeps track of Steam Deck Verifications. On first run, it reports all games with their respective Steam Deck Verification status. On subsequent runs, the tool will report newly tested and updated games.

Feb 2, 2022
Event driven modular status-bar for dwm; written in Go & uses Unix sockets for signaling.

dwmstat A simple event-driven modular status-bar for dwm. It is written in Go & uses Unix sockets for signaling. The status bar is conceptualized as a

Dec 25, 2021
httpstream provides HTTP handlers for simultaneous streaming uploads and downloads of objects, as well as persistence and a standalone server.

httpfstream httpfstream provides HTTP handlers for simultaneous streaming uploads and downloads of files, as well as persistence and a standalone serv

May 1, 2021
Cross check makes health checks on PostgreSQL and MySQL database servers

Cross Check Cross check makes health checks on PostgreSQL and MySQL database servers, it also performs master & slave control for clusters in H/A Acti

Jan 14, 2022
🏥 Barebones, detailed health check library for Go

go-health 🏥 Barebones, detailed health check library for Go go-health does away with the kitchen sink mentality of other health check libraries. You

Oct 19, 2021
Health check for instances behind the ipvs

ipvshc Health check for instances behind the ipvs Features: Health check instances (curl from srcip) Change state (ipvsadm) Config in sql (sqlite) Sta

Nov 4, 2021
This repo contains a sample app exposing a gRPC health endpoint to demo Kubernetes gRPC probes.

This repo contains a sample app exposing a health endpoint by implementing grpc_health_v1. Usecase is to demo the gRPC readiness and liveness probes introduced in Kubernetes 1.23.

Feb 9, 2022
Go library for writing standalone Map/Reduce jobs or for use with Hadoop's streaming protocol

dmrgo is a Go library for writing map/reduce jobs. It can be used with Hadoop's streaming protocol, but also includes a standalone map/reduce impleme

Nov 27, 2022
Standalone client for proxies of Opera VPN

opera-proxy Standalone Opera VPN client. Younger brother of hola-proxy. Just run it and it'll start a plain HTTP proxy server forwarding traffic throu

Jan 9, 2023
Standalone client for proxies of Windscribe browser extension

windscribe-proxy Standalone Windscribe proxy client. Younger brother of opera-proxy. Just run it and it'll start a plain HTTP proxy server forwarding

Dec 29, 2022
A standalone ipfs gateway

rainbow Because ipfs should just work like unicorns and rainbows Building go build Running rainbow Configuration NAME: rainbow - a standalone ipf

Nov 9, 2022
A standalone Web Server developed with the standard http library, support reverse proxy & flexible configuration

paddy Introduction paddy is a single-process, standalone web server built on Go's standard library net/http. paddy provides the following features: directly configured HTTP responses, a directory file server, proxy_pass proxying, HTTP reverse proxying, support for request and response plugins. Deployment Build $ go build ./main/p

Oct 18, 2022
Scout is a standalone open source software solution for DIY video security.

scout Scout is a standalone open source software solution for DIY video security. https://www.jonoton-innovation.com Features No monthly fees! Easy In

Oct 25, 2022
Check DNS and optionally Consul and serve the status from a Web page

dns-checker Table of contents Preamble Compiling the program Keepalived and LVS Available options Setting up systemd Preamble This application checks

Nov 7, 2021