Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

[Screenshots: Scrutiny dashboard]

WebUI for smartd S.M.A.R.T monitoring

NOTE: Scrutiny is a Work-in-Progress and still has some rough edges.

WARNING: Once the InfluxDB branch is merged, Scrutiny will use both sqlite and InfluxDB for data storage. Unfortunately, this may not be backwards compatible with the database structures in the master (sqlite only) branch.

Introduction

If you run a server with more than a couple of hard drives, you're probably already familiar with S.M.A.R.T and the smartd daemon. If not, it's an incredible open source project described as follows:

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.

These S.M.A.R.T hard drive self-tests can help you detect and replace failing hard drives before they cause permanent data loss. However, there are a couple of issues with smartd:

  • There are more than a hundred S.M.A.R.T attributes; however, smartd does not differentiate between critical and informational metrics.
  • smartd does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
  • S.M.A.R.T attribute thresholds are set by the manufacturer. In some cases these thresholds are unset, or are so high that they can only be used to confirm a failed drive, rather than detecting a drive about to fail.
  • smartd is a command-line-only tool. For headless servers, a web UI would be more valuable.

Scrutiny is a Hard Drive Health Dashboard & Monitoring solution, merging manufacturer provided S.M.A.R.T metrics with real-world failure rates.

Features

Scrutiny is a simple but focused application, with a couple of core features:

  • Web UI Dashboard - focused on Critical metrics
  • smartd integration (no re-inventing the wheel)
  • Auto-detection of all connected hard-drives
  • S.M.A.R.T metric tracking for historical trends
  • Customized thresholds using real world failure rates
  • Temperature tracking
  • Provided as an all-in-one Docker image (but can be installed manually)
  • (Future) Configurable Alerting/Notifications via Webhooks
  • (Future) Hard Drive performance testing & tracking

Getting Started

RAID/Virtual Drives

Scrutiny uses smartctl --scan to detect devices/drives.

  • All RAID controllers supported by smartctl are automatically supported by Scrutiny.
    • While some RAID controllers support passing through the underlying SMART data to smartctl, others do not.
    • In some cases --scan does not correctly detect the device type, returning incomplete SMART data. Scrutiny will eventually support overriding detected device type via the config file.
  • If you use Docker, you must pass through the RAID virtual disk to the container using --device (see below)
    • This device may be in /dev/* or /dev/bus/*.
    • If you're unsure, run smartctl --scan on your host, and pass all listed devices to the container.
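To illustrate, here is what that workflow might look like (the device names and megaraid type below are made-up examples; your scan output will differ):

smartctl --scan
# example output:
# /dev/sda -d scsi # /dev/sda, SCSI device
# /dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
#
# => pass both /dev/sda and /dev/bus/0 to the container via --device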

Docker

If you're using Docker, getting started is as simple as running the following command:

docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--name scrutiny \
analogj/scrutiny
  • /run/udev is necessary to provide the Scrutiny collector with access to your device metadata
  • --cap-add SYS_RAWIO is necessary to allow smartctl permission to query your device SMART data
    • NOTE: If you have NVMe drives, you must add --cap-add SYS_ADMIN as well. See issue #26
  • --device entries are required to ensure that your hard disk devices are accessible within the container.
  • analogj/scrutiny is an omnibus image, containing both the webapp server (frontend & api) as well as the S.M.A.R.T metric collector. (see below)
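If you prefer docker-compose, the same omnibus deployment can be sketched as follows (a minimal example mirroring the docker run flags above; adjust the device list to match your system):

version: "3.4"
services:
  scrutiny:
    container_name: scrutiny
    image: analogj/scrutiny
    cap_add:
      - SYS_RAWIO              # required for smartctl to query SMART data
    ports:
      - "8080:8080"
    volumes:
      - /run/udev:/run/udev:ro # device metadata for the collector
    devices:
      - /dev/sda
      - /dev/sdb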

Hub/Spoke Deployment

In addition to the Omnibus image (available under the latest tag) there are 2 other Docker images available:

  • analogj/scrutiny:collector - Contains the Scrutiny data collector, smartctl binary and cron-like scheduler. You can run one collector on each server.
  • analogj/scrutiny:web - Contains the Web UI, API and Database. Only one container is necessary.

docker run -it --rm -p 8080:8080 \
--name scrutiny-web \
analogj/scrutiny:web

docker run -it --rm \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e SCRUTINY_API_ENDPOINT=http://SCRUTINY_WEB_IPADDRESS:8080 \
--name scrutiny-collector \
analogj/scrutiny:collector

Manual Installation (without-Docker)

While the easiest way to get started with Scrutiny is using Docker, it is possible to run it manually without much work. You can even mix and match, using Docker for one component and a manual installation for the other.

See docs/INSTALL_MANUAL.md for instructions.

Usage

Once scrutiny is running, you can open your browser to http://localhost:8080 and take a look at the dashboard.

If you're using the omnibus image, the collector should already have run, and your dashboard should be populated with every drive that Scrutiny detected. The collector is configured to run once a day, but you can trigger it manually by running the command below.

For users of the docker Hub/Spoke deployment or manual install: initially the dashboard will be empty. After the first collector run, you'll be greeted with a list of all your hard drives and their current smart status.

docker exec scrutiny /scrutiny/bin/scrutiny-collector-metrics run
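For the Hub/Spoke deployment, you can trigger the collector container in the same way (assuming the container name from the example above and the same binary path inside the collector image):

docker exec scrutiny-collector /scrutiny/bin/scrutiny-collector-metrics run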

Configuration

By default Scrutiny looks for its YAML configuration files in /scrutiny/config

There are two configuration files available:

  • scrutiny.yaml - configuration for the webapp server (frontend & API)
  • collector.yaml - configuration for the S.M.A.R.T metric collector

Neither file is required; however, if provided, they allow you to configure how Scrutiny functions.

Notifications

Scrutiny supports sending SMART device failure notifications via the following services:

  • Custom Script (data provided via environment variables)
  • Email
  • Webhooks
  • Discord
  • Gotify
  • Hangouts
  • IFTTT
  • Join
  • Mattermost
  • Pushbullet
  • Pushover
  • Slack
  • Teams
  • Telegram
  • Zulip

Check the notify.urls section of example.scrutiny.yml for more information and documentation for service-specific setup.
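As a rough sketch, notification targets are configured as a list of provider URLs under notify.urls in scrutiny.yaml. The Gotify URL below is taken from the hub/spoke example later on this page; other services use their own URL schemes, so treat the commented lines as placeholders and check example.scrutiny.yml for the exact formats:

notify:
  urls:
    - "http://gotify:80/message?token=a-gotify-token"
    # - "discord://..."   # each service has its own URL scheme
    # - "telegram://..."  # see example.scrutiny.yml for details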

Testing Notifications

You can test that your notifications are configured correctly by posting an empty payload to the notifications health check API.

curl -X POST http://localhost:8080/api/health/notify

Debug mode & Log Files

Scrutiny provides various methods to change the log level to debug and generate log files.

Web Server/API

You can use environment variables to enable debug logging and/or log files for the web server:

DEBUG=true
SCRUTINY_LOG_FILE=/tmp/web.log

You can configure the log level and log file in the config file:

log:
  file: '/tmp/web.log'
  level: DEBUG

Or if you're not using docker, you can pass CLI arguments to the web server during startup:

scrutiny start --debug --log-file /tmp/web.log

Collector

You can use environment variables to enable debug logging and/or log files for the collector:

DEBUG=true
COLLECTOR_LOG_FILE=/tmp/collector.log

Or if you're not using docker, you can pass CLI arguments to the collector during startup:

scrutiny-collector-metrics run --debug --log-file /tmp/collector.log
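When the collector runs inside Docker, the same settings can be passed as container environment variables (a sketch based on the Hub/Spoke collector example above; the log path is just an example):

docker run -it --rm \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
-e DEBUG=true \
-e COLLECTOR_LOG_FILE=/tmp/collector.log \
-e SCRUTINY_API_ENDPOINT=http://SCRUTINY_WEB_IPADDRESS:8080 \
--name scrutiny-collector \
analogj/scrutiny:collector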

Contributing

Please see CONTRIBUTING.md for instructions on how to develop and contribute to the Scrutiny codebase.

Work your magic and then submit a pull request. We love pull requests!

If you find the documentation lacking, help us out and update this README.md. If you don't have the time to work on Scrutiny, but found something we should know about, please submit an issue.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Jason Kulatunga - Initial Development - @AnalogJ

Licenses

Sponsors

Scrutiny is only possible with the help of my GitHub Sponsors.

They read a simple Reddit announcement post and decided to trust & finance a developer they'd never met. It's an exciting and incredibly humbling experience.

If you found Scrutiny valuable, please consider supporting my work.

Comments
  • NVMe drives not correctly detected by Scrutiny

    NVMe drives not correctly detected by Scrutiny

    Output of

    root@c43726ce532e:/scrutiny# smartctl -j -x /dev/nvme0
    
    {
      "json_format_version": [
        1,
        0
      ],
      "smartctl": {
        "version": [
          7,
          0
        ],
        "svn_revision": "4883",
        "platform_info": "x86_64-linux-4.19.107-Unraid",
        "build_info": "(local build)",
        "argv": [
          "smartctl",
          "-j",
          "-x",
          "/dev/nvme0"
        ],
        "exit_status": 0
      },
      "device": {
        "name": "/dev/nvme0",
        "info_name": "/dev/nvme0",
        "type": "nvme",
        "protocol": "NVMe"
      },
      "model_name": "Force MP510",
      "serial_number": "yes",
      "firmware_version": "ECFM12.3",
      "nvme_pci_vendor": {
        "id": 6535,
        "subsystem_id": 6535
      },
      "nvme_ieee_oui_identifier": 6584743,
      "nvme_total_capacity": 480103981056,
      "nvme_unallocated_capacity": 0,
      "nvme_controller_id": 1,
      "nvme_number_of_namespaces": 1,
      "nvme_namespaces": [
        {
          "id": 1,
          "size": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "capacity": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "utilization": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "formatted_lba_size": 512,
          "eui64": {
            "oui": 6584743,
            "ext_id": 171819811633
          }
        }
      ],
      "user_capacity": {
        "blocks": 937703088,
        "bytes": 480103981056
      },
      "logical_block_size": 512,
      "local_time": {
        "time_t": 1600380529,
        "asctime": "Thu Sep 17 22:08:49 2020 Europe"
      },
      "smart_status": {
        "passed": true,
        "nvme": {
          "value": 0
        }
      },
      "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "temperature": 38,
        "available_spare": 100,
        "available_spare_threshold": 5,
        "percentage_used": 1,
        "data_units_read": 6734413,
        "data_units_written": 15749028,
        "host_reads": 29048027,
        "host_writes": 17155968,
        "controller_busy_time": 298,
        "power_cycles": 4,
        "power_on_hours": 6420,
        "unsafe_shutdowns": 4,
        "media_errors": 0,
        "num_err_log_entries": 8382,
        "warning_temp_time": 0,
        "critical_comp_time": 0
      },
      "temperature": {
        "current": 38
      },
      "power_cycle_count": 4,
      "power_on_time": {
        "hours": 6420
      }
    }
    
  • [BUG] Crashes on boot

    [BUG] Crashes on boot

    Describe the bug After updating to 0.4.9-omnibus, scrutiny can no longer boot - it always crashes as it starts up.

    System

    • Ubuntu 22.04 arm64
    • Docker Compose
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:v0.4.9-omnibus
        privileged: true
        volumes:
          - /run/udev:/run/udev:ro
          - /dev:/dev
          - scrutiny-config:/opt/scrutiny/config
          - scrutiny-db:/opt/scrutiny/influxdb
        networks:
          - scrutiny-nginx
    

    Log Files

     ___   ___  ____  __  __  ____  ____  _  _  _  _
    / __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
    \__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
    (___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
    github.com/AnalogJ/scrutiny                             dev-0.4.9
    
    Start the scrutiny server
    time="2022-06-05T18:02:42Z" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n"
    panic: failed to check influxdb setup status - Get "http://localhost:8086/api/v2/setup": dial tcp: lookup localhost: device or resource busy
    
    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0x103a6c0, 0x4000010ce8}, {0x1043620, 0x4000430070})
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xd4
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0x400042a630, {0x1043620, 0x4000430070})
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:27 +0x90
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0x400042a630)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:105 +0x530
    main.main.func2(0x40003ffcc0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x288
    github.com/urfave/cli/v2.(*Command).Run(0x400042e120, 0x40003ffb40)
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/command.go:164 +0x648
    github.com/urfave/cli/v2.(*App).RunContext(0x40002fc480, {0x1026870, 0x400003a028}, {0x4000032060, 0x2, 0x2})
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:306 +0x840
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x73c
    
  • [BUG] Latest image crashes on startup

    [BUG] Latest image crashes on startup

    I just switched over to the docker images ghcr.io/analogj/scrutiny:master-web and ghcr.io/analogj/scrutiny:master-collector since the docker hub ones have been taken down. Now my web instance is crashing on startup with this error message:

    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc00038a070, 0x12a4b00, 0xc0003faa80, 0x129f9a0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000385610, 0x12a4b00, 0xc0003faa80, 0x1)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000385610, 0x0, 0x0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
    main.main.func2(0xc000387340, 0x4, 0x6)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
    github.com/urfave/cli/v2.(*Command).Run(0xc0003ef200, 0xc0003871c0, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0
    github.com/urfave/cli/v2.(*App).RunContext(0xc0003fe000, 0x128e820, 0xc0000c8010, 0xc0000be020, 0x2, 0x2, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
    2022/05/13 14:38:05 Loading configuration file: /opt/scrutiny/config/scrutiny.yaml
    time="2022-05-13T14:38:05Z" level=info msg="Trying to connect to scrutiny sqlite db: \n"
    time="2022-05-13T14:38:05Z" level=info msg="Successfully connected to scrutiny sqlite db: \n"
    panic: a username and password is required for a setup
    

    There is no mention in the readme or the example configs of a username/password, so what are the credentials that the application is missing and crashing over? Also, this error feels like something that should be handled by the application, with an informative error presented to the user.

  • [Feature] Add support for additional arguments when smartctl is executed - Seagate drives use 48 bit raw values and only the first 16 bits are the error data

    [Feature] Add support for additional arguments when smartctl is executed - Seagate drives use 48 bit raw values and only the first 16 bits are the error data

    Describe the bug Seagate Ironwolf drives show as FAILED with high seek and read error counts

    Expected behavior

    Some way to configure, per drive, extra arguments for the smartctl calls.

    Seagate IronWolfs use a 48-bit raw value that is made up of 16 bits of error count and 32 bits of total count of read or seek events.

    For smartctl I have to manually specify the correct bits to read from: smartctl /dev/sdb -a -v 1,raw48:54 -v 7,raw48:54

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       0
    

    And smartctl without the specification:

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       200450784
      3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       12399940
    

    The 200450784 value above is 0xBF2A2E0, which is only 28 bits of data (so only part of the count, not the error count). The full 48-bit hex value would be 00000BF2A2E0, which splits as [0000][0BF2A2E0], so 0 is the actual value of Raw_Read_Error_Rate.
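    A quick way to reproduce that split yourself (a bash sketch for illustration, not part of Scrutiny; the value is the one from the table above):

    raw=200450784
    printf 'full 48-bit value: 0x%012X\n' "$raw"                       # => 0x00000BF2A2E0
    printf 'error count (top 16 bits):  %d\n' $(( raw >> 32 ))         # => 0
    printf 'event count (low 32 bits):  %d\n' $(( raw & 0xFFFFFFFF ))  # => 200450784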

    Screenshots: [attached images]

  • [BUG]smartctl checksum errors

    [BUG]smartctl checksum errors

    Hi,

    I'm using the linuxserver.io docker image (latest tag) and am currently getting the following errors when running scrutiny-collector-metrics run:

    `root@abc9cc899866:/# scrutiny-collector-metrics run

    [scrutiny ASCII banner]  AnalogJ/scrutiny/metrics dev-0.1.13

    INFO[0000] Verifying required tools type=metrics INFO[0000] Sending detected devices to API, for filtering & validation type=metrics INFO[0000] Main: Waiting for workers to finish type=metrics INFO[0000] Collecting smartctl results for sdd type=metrics INFO[0000] Collecting smartctl results for sda type=metrics INFO[0000] Collecting smartctl results for sdb type=metrics INFO[0000] Collecting smartctl results for sdc type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sda" ], "exit_status": 4 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJDN6SD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } } ERRO[0000] smartctl returned an error code (4) while processing sda type=metrics ERRO[0000] smartctl detected a checksum error type=metrics INFO[0000] Publishing smartctl results for unknown type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdb" ], "exit_status": 4 }, "device": { "name": "/dev/sdb", "info_name": "/dev/sdb", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJ8S5BD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } } ERRO[0000] smartctl returned an error code (4) while processing sdb type=metrics ERRO[0000] smartctl detected a checksum error type=metrics INFO[0000] Publishing smartctl results for unknown type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdd" ], "exit_status": 4 }, "device": { "name": "/dev/sdd", "info_name": "/dev/sdd", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJDUTKD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } } ERRO[0000] smartctl returned an error code (4) 
while processing sdd type=metrics ERRO[0000] smartctl detected a checksum error type=metrics INFO[0000] Publishing smartctl results for unknown type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdc" ], "exit_status": 4 }, "device": { "name": "/dev/sdc", "info_name": "/dev/sdc", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "JEHN4M1N", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } } ERRO[0000] smartctl returned an error code (4) while processing sdc type=metrics ERRO[0000] smartctl detected a checksum error type=metrics INFO[0000] Publishing smartctl results for unknown type=metrics INFO[0001] Main: Completed type=metrics root@abc9cc899866:/# `

    After running, I can only see /dev/sda in the web UI and it has no details (SMART reports as failed).

    I'm running this on a QNAP TS453Be.

    Thanks,

  • [BUG] Cron not working

    [BUG] Cron not working

    Describe the bug The cron job doesn't work (anymore). The collector / webapp combo still works if I trigger it manually, but for some reason the cron job doesn't work anymore (it was working before). I haven't changed my setup (apart from updating the images) and I can't see anything wrong in the logs. Happy to provide you with logs or anything that you need.

    Expected behavior The webapp should update daily with new metrics

    Current behavior The webapp does not get updated unless you run scrutiny-collector-metrics run in each collector

  • [BUG] Collector: pfsense

    [BUG] Collector: pfsense "Command not found."

    Describe the bug When running the collector from the pfsense shell, as admin or root, after making it executable, I get the error "Command not found." To be fair, I'm not fluent with *BSD, so I'm not sure if this is even possible on pfsense.

    Expected behavior Collector should run.

    Screenshots NA

    Log Files

    [2.5.2-RELEASE][admin@pfsense]/opt/scrutiny/bin: ls -l
    total 5
    -rwxrwxrwx  1 root  wheel  648 Nov  8 11:08 scrutiny-collector-metrics-freebsd-amd64
    
    [2.5.2-RELEASE][admin@pfsense]/opt/scrutiny/bin: scrutiny-collector-metrics-freebsd-amd64 
    scrutiny-collector-metrics-freebsd-amd64: Command not found.
    

    and

    [2.5.2-RELEASE][admin@pfsense]/opt/scrutiny/bin: su root
    # scrutiny-collector-metrics-freebsd-amd64 
    su: scrutiny-collector-metrics-freebsd-amd64: not found
    
  • [BUG] No Data for LSI MegaRaid /dev/bus/0

    [BUG] No Data for LSI MegaRaid /dev/bus/0

    Describe the bug I tried configuring my LSI MegaRAID controller with Scrutiny by using the following configurations. I can't add /dev/bus/0 to the devices directly, because then I get: ERROR: for scrutiny Cannot start service scrutiny: error gathering device information while adding custom device "/dev/bus/0": no such file or directory

    What am I doing wrong?

    docker-compose.yml

    version: '3.5'
    services:
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:master-omnibus
        cap_add:
          - SYS_RAWIO
        volumes:
          - /run/udev:/run/udev:ro
          - /storage/volumes/scrutiny-config:/opt/scrutiny/config
          - /storage/volumes/scrutiny-data:/opt/scrutiny/influxdb
        devices:
          - "/dev/sda"
          - "/dev/sdb"
          - "/dev/bus" # /dev/bus/0 can not be found because its a pseudo device
    

    and

    collector.yaml

    version: 1
    host:
      id: ""
    devices:
      - device: /dev/sda
        type: "scsi"
      - device: /dev/sdb
        type: "scsi"
      - device: /dev/bus/0
        type:
          - megaraid,1
          - megaraid,2
          - megaraid,3
          - megaraid,6
          - megaraid,7
          - megaraid,8
    

    Expected behavior Should detect 6 raid disks and one physical disk.

    Screenshots: [attached image]

    Smartctl scan

    {
      "json_format_version": [
        1,
        0
      ],
      "smartctl": {
        "version": [
          7,
          3
        ],
        "svn_revision": "5338",
        "platform_info": "x86_64-linux-5.17.0-2-amd64",
        "build_info": "(local build)",
        "argv": [
          "smartctl",
          "--scan",
          "-j"
        ],
        "exit_status": 0
      },
      "devices": [
        {
          "name": "/dev/sda",
          "info_name": "/dev/sda",
          "type": "scsi",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/sdb",
          "info_name": "/dev/sdb",
          "type": "scsi",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_01]",
          "type": "megaraid,1",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_02]",
          "type": "megaraid,2",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_03]",
          "type": "megaraid,3",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_06]",
          "type": "megaraid,6",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_07]",
          "type": "megaraid,7",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_08]",
          "type": "megaraid,8",
          "protocol": "SCSI"
        }
      ]
    }
    
  • [FEAT] Add support to FreeBSD

    [FEAT] Add support to FreeBSD

    Hi, I'm running several FreeNAS systems and this tool looks great. The problem is that FreeNAS runs on FreeBSD, and from what I have seen it is not currently possible to run Scrutiny on FreeBSD. FreeNAS seems to be among the systems best suited for such a tool. I would love to know if there is a plan to support FreeBSD systems in the near future. Thanks, Itay. By the way, is there a plan for adding Dark Mode?

  • Testing influx version and get password and username not found

    Testing influx version and get password and username not found

    This is run in a new docker container; no settings from the old version were left.

    `Docker:~/docker-compose$ docker logs scrutiny [s6-init] making user provided files available at /var/run/s6/etc...exited 0. [s6-init] ensuring user provided files have correct perms...exited 0. [fix-attrs.d] applying ownership & permissions fixes... [fix-attrs.d] done. [cont-init.d] executing container initialization scripts... [cont-init.d] 01-timezone: executing... [cont-init.d] 01-timezone: exited 0. [cont-init.d] 50-config: executing... [cont-init.d] 50-config: exited 0. [cont-init.d] done. [services.d] starting services waiting for influxdb waiting for scrutiny service to start starting cron [services.d] done. starting influxdb influxdb not ready scrutiny api not ready ts=2022-05-08T15:12:02.531330Z lvl=info msg="Welcome to InfluxDB" log_id=0aL3UZtl000 version=v2.2.0 commit=a2f8538837 build_date=2022-04-06T17:36:40Z ts=2022-05-08T15:12:02.535848Z lvl=info msg="Resources opened" log_id=0aL3UZtl000 service=bolt path=/scrutiny/influxdb/influxd.bolt ts=2022-05-08T15:12:02.535900Z lvl=info msg="Resources opened" log_id=0aL3UZtl000 service=sqlite path=/scrutiny/influxdb/influxd.sqlite ts=2022-05-08T15:12:02.536757Z lvl=info msg="Bringing up metadata migrations" log_id=0aL3UZtl000 service="KV migrations" migration_count=19 ts=2022-05-08T15:12:02.615243Z lvl=info msg="Bringing up metadata migrations" log_id=0aL3UZtl000 service="SQL migrations" migration_count=5 ts=2022-05-08T15:12:02.629517Z lvl=info msg="Using data dir" log_id=0aL3UZtl000 service=storage-engine service=store path=/scrutiny/influxdb/engine/data ts=2022-05-08T15:12:02.629583Z lvl=info msg="Compaction settings" log_id=0aL3UZtl000 service=storage-engine service=store max_concurrent_compactions=8 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648 ts=2022-05-08T15:12:02.629595Z lvl=info msg="Open store (start)" log_id=0aL3UZtl000 service=storage-engine service=store op_name=tsdb_open op_event=start ts=2022-05-08T15:12:02.629632Z lvl=info msg="Open store (end)" log_id=0aL3UZtl000 service=storage-engine service=store op_name=tsdb_open op_event=end op_elapsed=0.039ms ts=2022-05-08T15:12:02.629654Z lvl=info msg="Starting retention policy enforcement service" log_id=0aL3UZtl000 service=retention check_interval=30m ts=2022-05-08T15:12:02.629659Z lvl=info msg="Starting precreation service" log_id=0aL3UZtl000 service=shard-precreation check_interval=10m advance_period=30m ts=2022-05-08T15:12:02.630081Z lvl=info msg="Starting query controller" log_id=0aL3UZtl000 service=storage-reads concurrency_quota=1024 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=1024 ts=2022-05-08T15:12:02.631198Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0aL3UZtl000 max_select_point=0 max_select_series=0 max_select_buckets=0 ts=2022-05-08T15:12:02.636081Z lvl=info msg=Listening log_id=0aL3UZtl000 service=tcp-listener transport=http addr=:8086 port=8086 scrutiny api not ready starting scrutiny 2022/05/08 09:12:07 No configuration file found at /scrutiny/config/scrutiny.yaml. Using Defaults. time="2022-05-08T09:12:07-06:00" level=info msg="Trying to connect to scrutiny sqlite db: \n"


    [scrutiny ASCII banner]  github.com/AnalogJ/scrutiny dev-0.3.12

    Start the scrutiny server [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

    • using env: export GIN_MODE=release
    • using code: gin.SetMode(gin.ReleaseMode)

    time="2022-05-08T09:12:07-06:00" level=info msg="Successfully connected to scrutiny sqlite db: \n" panic: a username and password is required for a setup

    goroutine 1 [running]: github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129e540, 0xc000114078, 0x12a3720, 0xc000482230, 0x129e5c0) /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6 github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000113290, 0x12a3720, 0xc000482230, 0x1) /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xcf github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000113290, 0x0, 0x0) /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:91 +0x234 main.main.func2(0xc00011b380, 0x4, 0x6) /go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198 github.com/urfave/cli/v2.(*Command).Run(0xc000484480, 0xc00011b200, 0x0, 0x0) /go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0 github.com/urfave/cli/v2.(*App).RunContext(0xc000102600, 0x128d440, 0xc0001a8010, 0xc0001a0020, 0x2, 0x2, 0x0, 0x0) /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814 github.com/urfave/cli/v2.(*App).Run(...) /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215 main.main() /go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a waiting for influxdb starting scrutiny 2022/05/08 09:12:07 No configuration file found at /scrutiny/config/scrutiny.yaml. Using Defaults. `

  • [FEAT] Add Instructions for Bring-your-own-InfluxDB with restricted access token

    [FEAT] Add Instructions for Bring-your-own-InfluxDB with restricted access token

    I want to use my existing InfluxDB on the same server, but it is not working and I need some help.

    This is my compose file:

     scrutiny:
       image: ghcr.io/analogj/scrutiny:master-omnibus
       container_name: scrutiny
       cap_add:
         - SYS_RAWIO
       volumes:
         - /docker/scrutiny:/opt/scrutiny/config
         - /run/udev:/run/udev:ro
       ports:
         - 8017:8080
       devices:
         - /dev/sda:/dev/sda
         - /dev/sdb:/dev/sdb
         - /dev/sdc:/dev/sdc
         - /dev/sdd:/dev/sdd
         - /dev/sde:/dev/sde
       restart: unless-stopped   
    

    This is my scrutiny.yaml file

    log:
      file: ""
      level: INFO
    notify:
      urls: []
    web:
      database:
        location: /opt/scrutiny/config/scrutiny.db
      influxdb:
        bucket: scrutiny
        host: MY-SERVER-LOCAL-IP
        org: MYORG
        port: "8086"
        retention_policy: true
        token: MY-TOKEN
      listen:
        basepath: ""
        host: 0.0.0.0
        port: "8080"
      src:
        frontend:
          path: /opt/scrutiny/web
    

    I have configured a new bucket in my influxdb instance called scrutiny and a new API token with read and write permissions in that bucket.

    In logs, there is this error:

    panic: organization 'MYORG' not found
    
    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc000408088, 0x12a4b00, 0xc000471650, 0x129f9a0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000405ad0, 0x12a4b00, 0xc000471650, 0x14)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000405ad0, 0x0, 0x0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
    main.main.func2(0xc00040f400, 0x4, 0x6)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
    github.com/urfave/cli/v2.(*Command).Run(0xc0004737a0, 0xc00040f280, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0
    github.com/urfave/cli/v2.(*App).RunContext(0xc000484000, 0x128e820, 0xc000130010, 0xc000126020, 0x2, 0x2, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
    

    It seems it cannot find my organization? The name is correct; I have other services working fine with that InfluxDB instance.

    Am I missing something?

  • Tutorial: SMART Monitoring with Scrutiny across machines

    Tutorial: SMART Monitoring with Scrutiny across machines

    S.M.A.R.T. Monitoring with Scrutiny across machines


    🤔 The problem:

    Scrutiny offers a nice Docker package called "Omnibus" that can monitor HDDs attached to a Docker host with relative ease. Scrutiny can also be installed in a Hub-Spoke layout where Web interface, Database and Collector come in 3 separate packages. The official documentation assumes that the spokes in the "Hub-Spokes layout" run Docker, which is not always the case. The third approach is to install Scrutiny manually, entirely outside of Docker.

    💡 The solution:

    This tutorial provides a hybrid configuration where the Hub lives in a Docker instance while the spokes have only the Scrutiny Collector installed manually. The Collector periodically sends data to the Hub. It's not mind-bogglingly hard to understand, but someone might struggle with the setup. This is for them.

    🖥️ My setup:

    I have a Proxmox cluster where one VM runs Docker and all monitoring services - Grafana, Prometheus, various exporters, InfluxDB and so forth. Another VM runs the NAS - OpenMediaVault v6, where all hard drives reside. The Scrutiny Collector is triggered every 30min to collect data on the drives. The data is sent to the Docker VM, running InfluxDB.

    Setting up the Hub


    The Hub consists of Scrutiny Web - a web interface for viewing the SMART data - and InfluxDB, where the smartmon data is stored.

    🔗This is the official Hub-Spoke layout in docker-compose. We are going to reuse parts of it. The ENV variables provide the necessary configuration for the initial setup, both for InfluxDB and Scrutiny.

    If you are working with an existing InfluxDB instance, you can forgo all the INIT variables as they already exist.

    The official Scrutiny documentation has a sample scrutiny.yaml file that normally contains the connection and notification details, but I always find it easier to configure as much as possible in the docker-compose.

    version: "3.4"
    
    networks:
      monitoring:       # A common network for all monitoring services to communicate into
        external: true
      notifications:    # To Gotify or another Notification service
        external: true
    
    services:
      influxdb:
        container_name: influxdb
        image: influxdb:2.1-alpine
        ports:
          - 8086:8086
        volumes:
          - ${DIR_CONFIG}/influxdb2/db:/var/lib/influxdb2
          - ${DIR_CONFIG}/influxdb2/config:/etc/influxdb2
        environment:
          - DOCKER_INFLUXDB_INIT_MODE=setup
          - DOCKER_INFLUXDB_INIT_USERNAME=Admin
          - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
          - DOCKER_INFLUXDB_INIT_ORG=homelab
          - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
          - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=your-very-secret-token
        restart: unless-stopped
        networks:
          - monitoring
    
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:master-web
        ports:
          - 8080:8080
        volumes:
          - ${DIR_CONFIG}/scrutiny/config:/opt/scrutiny/config
        environment:
          - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
          - SCRUTINY_WEB_INFLUXDB_PORT=8086
          - SCRUTINY_WEB_INFLUXDB_TOKEN=your-very-secret-token
          - SCRUTINY_WEB_INFLUXDB_ORG=homelab
          - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
          # Optional but highly recommended to notify you in case of a problem
          - SCRUTINY_WEB_NOTIFY_URLS=["http://gotify:80/message?token=a-gotify-token"]
        depends_on:
          - influxdb
        restart: unless-stopped
        networks:
          - notifications
          - monitoring
    

    A freshly initialized Scrutiny instance can be accessed on port 8080, e.g. 192.168.0.100:8080. The interface will be empty because no metrics have been collected yet.

    Setting up a Spoke without Docker


    A spoke consists of the Scrutiny Collector binary that is run on a set interval via crontab and sends the data to the Hub. The official documentation describes the manual setup of the Collector - dependencies and step by step commands. I have a shortened version that does the same thing but in one line of code.

    # Installing dependencies
    apt install smartmontools -y 
    
    # 1. Create directory for the binary
    # 2. Download the binary into that directory
    # 3. Make it executable
    # 4. List the contents of the directory for confirmation
    mkdir -p /opt/scrutiny/bin && \
    curl -L https://github.com/AnalogJ/scrutiny/releases/download/v0.5.0/scrutiny-collector-metrics-linux-amd64 > /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 && \
    chmod +x /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 && \
    ls -lha /opt/scrutiny/bin
    

    When downloading GitHub release assets, make sure that you have the correct version. The provided example is with release v0.5.0. The release list can be found at https://github.com/analogj/scrutiny/releases.

    Once the Collector is installed, you can run it with the following command. Make sure to add the correct address and port of your Hub as --api-endpoint.

    /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://192.168.0.100:8080"
    

    This will run the Collector once and populate the Web interface of your Scrutiny instance. In order to collect metrics for a time series, you need to run the command repeatedly. Here is an example for crontab, running the Collector every 15min.

    # open crontab
    crontab -e
    
    # add a line for Scrutiny
    */15 * * * * /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://192.168.0.100:8080"
    

    The Collector has its own independent config file that lives in /opt/scrutiny/config/collector.yaml but I did not find a need to modify it. A default collector.yaml can be found in the official documentation.
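    For reference, a minimal collector.yaml might look like the sketch below. Treat the exact keys as an assumption (they are based on the collector.yaml shown elsewhere on this page plus an api endpoint setting) and compare against the default file in the official documentation:

    version: 1
    host:
      id: ""   # optional label for this machine
    api:
      endpoint: "http://192.168.0.100:8080"   # the Hub address, equivalent to --api-endpoint (assumed key)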

    Setting up a Spoke with Docker


    Setting up a remote Spoke in Docker requires you to split the official Hub-Spoke layout docker-compose.yml. In the following docker-compose you need to provide the ${API_ENDPOINT}, in my case http://192.168.0.100:8080. Also all drives that you wish to monitor need to be presented to the container under devices.

    The image handles the periodic scanning of the drives.

    version: "3.4"
    
    services:
    
      collector:
        image: 'ghcr.io/analogj/scrutiny:master-collector'
        cap_add:
          - SYS_RAWIO
        volumes:
          - '/run/udev:/run/udev:ro'
        environment:
          COLLECTOR_API_ENDPOINT: ${API_ENDPOINT}
        devices:
          - "/dev/sda"
          - "/dev/sdb"
    
  • [BUG] No temps logged between hours of 0100 and 1700

    [BUG] No temps logged between hours of 0100 and 1700

    Describe the bug Scrutiny doesn't log temperatures between the hours of 0100 and 1700. It DOES log temperatures hourly from 1700-0100.

    Expected behavior Temperatures are logged hourly, 24 hours a day.

    Log Files I'm running hub-and-spoke, with one spoke on the same machine as the hub and one spoke on a remote machine connected via VPN. Hub is an RPI 4, spoke is an RPI 3, both running aarch64 Arch Linux. Log files are attached for each, and the output of /api/summary. rpi3-collector.log rpi4-collector.log api-summary.txt

    docker info RPI 4 (hub and spoke) Client: Context: default Debug Mode: false Plugins: compose: Docker Compose (Docker Inc., 2.13.0)

    Server: Containers: 3 Running: 3 Paused: 0 Stopped: 0 Images: 3 Server Version: 20.10.21 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true userxattr: false Logging Driver: json-file Cgroup Driver: systemd Cgroup Version: 2 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2 Default Runtime: runc Init Binary: docker-init containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661.m runc version: init version: de40ad0 Security Options: seccomp Profile: default cgroupns Kernel Version: 5.19.7-1-aarch64-ARCH Operating System: Arch Linux ARM OSType: linux Architecture: aarch64 CPUs: 4 Total Memory: 7.614GiB Name: rpi4 ID: TGJZ:IXFZ:ZOAI:3AKS:D3JZ:FOIE:UT6S:QRTI:Y6DW:EZR7:77FE:KXKH Docker Root Dir: /var/lib/docker Debug Mode: false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false

    RPI 3 (spoke) Client: Context: default Debug Mode: false Plugins: compose: Docker Compose (Docker Inc., 2.13.0)

    Server: Containers: 1 Running: 1 Paused: 0 Stopped: 0 Images: 1 Server Version: 20.10.21 Storage Driver: devicemapper Pool Name: docker-179:2-1443081-pool Pool Blocksize: 65.54kB Base Device Size: 10.74GB Backing Filesystem: ext4 Udev Sync Supported: true Data file: /dev/loop0 Metadata file: /dev/loop1 Data loop file: /var/lib/docker/devicemapper/devicemapper/data Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata Data Space Used: 465.5MB Data Space Total: 107.4GB Data Space Available: 28.3GB Metadata Space Used: 17.79MB Metadata Space Total: 2.147GB Metadata Space Available: 2.13GB Thin Pool Minimum Free Space: 10.74GB Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.187 (2022-11-10) Logging Driver: json-file Cgroup Driver: systemd Cgroup Version: 2 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2 Default Runtime: runc Init Binary: docker-init containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661.m runc version: init version: de40ad0 Security Options: seccomp Profile: default cgroupns Kernel Version: 5.19.8-1-aarch64-ARCH Operating System: Arch Linux ARM OSType: linux Architecture: aarch64 CPUs: 4 Total Memory: 894.5MiB Name: rpi3 ID: 4ZJX:WUJX:XF3A:L4NJ:3QTR:U6RP:QZBQ:PYCD:TWPX:QDCE:IDYJ:FWII Docker Root Dir: /var/lib/docker Debug Mode: false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false

  • [FEAT] mdadm RAID status

    [FEAT] mdadm RAID status

    Is your feature request related to a problem? Please describe.

    It would be awesome if Scrutiny could show the basic status of any standard mdadm RAID devices.

    Describe the solution you'd like

    If mdadm RAID array is detected it could be shown similar to an individual hard drive with the basic raid health information, e.g.

    • Device name (e.g. /dev/md0)
    • RAID topology (1, 6, 1+0, etc...)
    • Disks in the array (could even link to the physical disks already shown)
      • Disks active in the array
      • Disks marked as failed in the array
      • Disks spare in the array
    • RAID rebuild rate
    • Bitmap enabled (t/f)
    • Which disk has which sync set (e.g. set-A /dev/sdb1)

    Example mdstat output:

    cat /proc/mdstat
    Personalities : [raid10]
    md0 : active raid10 sde1[4] sdf1[5] sdc1[2] sdb1[0] sdd1[1] sda1[3]
          29298911232 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
          bitmap: 0/110 pages [0KB], 131072KB chunk
    

    Example mdadm --detail output:

    mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Fri Jul  5 18:06:18 2019
            Raid Level : raid10
            Array Size : 29298911232 (27.29 TiB 30.00 TB)
         Used Dev Size : 9766303744 (9.10 TiB 10.00 TB)
          Raid Devices : 6
         Total Devices : 6
           Persistence : Superblock is persistent
    
         Intent Bitmap : Internal
    
           Update Time : Thu Dec 15 11:03:24 2022
                 State : clean
        Active Devices : 6
       Working Devices : 6
        Failed Devices : 0
         Spare Devices : 0
    
                Layout : near=2
            Chunk Size : 512K
    
    Consistency Policy : bitmap
    
                  Name : nas:0  (local to host nas)
                  UUID : 18a081b2:0264df3b:88465833:1bf34071
                Events : 315607
    
        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync set-A   /dev/sdb1
           1       8       49        1      active sync set-B   /dev/sdd1
           2       8       33        2      active sync set-A   /dev/sdc1
           3       8        1        3      active sync set-B   /dev/sda1
           5       8       81        4      active sync set-A   /dev/sdf1
           4       8       65        5      active sync set-B   /dev/sde1
    
  • Bump express from 4.17.1 to 4.18.2 in /webapp/frontend

    Bump express from 4.17.1 to 4.18.2 in /webapp/frontend

    Bumps express from 4.17.1 to 4.18.2.

    Release notes

    Sourced from express's releases.

    4.18.2

    4.18.1

    • Fix hanging on large stack of sync routes

    4.18.0

    ... (truncated)

    Changelog

    Sourced from express's changelog.

    4.18.2 / 2022-10-08

    4.18.1 / 2022-04-29

    • Fix hanging on large stack of sync routes

    4.18.0 / 2022-04-25

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • [FEAT] Add the possibility to run and submit metrics immediately/with delay after collector startup

    [FEAT] Add the possibility to run and submit metrics immediately/with delay after collector startup

    Is your feature request related to a problem? Please describe. When you deploy the collector, you have no feedback that it really works besides the lonely message starting cron. You can play with the cron schedule to test it faster and then change it back, which is quite a bit of useless work. You can also access the container and run it manually with scrutiny-collector-metrics run, which is also useless extra work.

    Describe the solution you'd like It would be better if the collector ran once immediately at deployment to provide immediate feedback.

    Well, maybe run it after a certain delay (20-30s), since in swarm mode depends_on is not supported, so if it starts immediately it might be unable to reach the endpoint, which is still starting up.

    An ENV_VAR could be used to enable the first run as well as to set the delay duration (in seconds).

    Additional context -/-

  • Bump qs from 6.5.2 to 6.5.3 in /webapp/frontend

    Bump qs from 6.5.2 to 6.5.3 in /webapp/frontend

    Bumps qs from 6.5.2 to 6.5.3.

    Changelog

    Sourced from qs's changelog.

    6.5.3

    • [Fix] parse: ignore __proto__ keys (#428)
    • [Fix] utils.merge: avoid a crash with a null target and a truthy non-array source
    • [Fix] correctly parse nested arrays
    • [Fix] stringify: fix a crash with strictNullHandling and a custom filter/serializeDate (#279)
    • [Fix] utils: merge: fix crash when source is a truthy primitive & no options are provided
    • [Fix] when parseArrays is false, properly handle keys ending in []
    • [Fix] fix for an impossible situation: when the formatter is called with a non-string value
    • [Fix] utils.merge: avoid a crash with a null target and an array source
    • [Refactor] utils: reduce observable [[Get]]s
    • [Refactor] use cached Array.isArray
    • [Refactor] stringify: Avoid arr = arr.concat(...), push to the existing instance (#269)
    • [Refactor] parse: only need to reassign the var once
    • [Robustness] stringify: avoid relying on a global undefined (#427)
    • [readme] remove travis badge; add github actions/codecov badges; update URLs
    • [Docs] Clean up license text so it’s properly detected as BSD-3-Clause
    • [Docs] Clarify the need for "arrayLimit" option
    • [meta] fix README.md (#399)
    • [meta] add FUNDING.yml
    • [actions] backport actions from main
    • [Tests] always use String(x) over x.toString()
    • [Tests] remove nonexistent tape option
    • [Dev Deps] backport from main
    Commits
    • 298bfa5 v6.5.3
    • ed0f5dc [Fix] parse: ignore __proto__ keys (#428)
    • 691e739 [Robustness] stringify: avoid relying on a global undefined (#427)
    • 1072d57 [readme] remove travis badge; add github actions/codecov badges; update URLs
    • 12ac1c4 [meta] fix README.md (#399)
    • 0338716 [actions] backport actions from main
    • 5639c20 Clean up license text so it’s properly detected as BSD-3-Clause
    • 51b8a0b add FUNDING.yml
    • 45f6759 [Fix] fix for an impossible situation: when the formatter is called with a no...
    • f814a7f [Dev Deps] backport from main
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.
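
    If you would rather check or apply this bump by hand instead of merging the PR, a rough equivalent (assuming the frontend in /webapp/frontend is an npm project, which this PR does not state explicitly) might be:

    ```
    cd webapp/frontend
    # Show which qs version is currently resolved (it may only be a transitive
    # dependency of another frontend package).
    npm ls qs
    # Refresh qs within its allowed semver range; 6.5.2 -> 6.5.3 is a patch
    # release, so a ^/~ range should pick it up and update the lockfile,
    # which is essentially what this Dependabot PR does.
    npm update qs
    npm ls qs
    ```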
