A docker container that can be deployed as a sidecar on any kubernetes pod to monitor PSI metrics


CgroupV2 PSI Sidecar

CgroupV2 PSI Sidecar can be deployed on any Kubernetes pod with access to cgroupv2 PSI (Pressure Stall Information) metrics.

About

This is a Docker container that runs as a sidecar next to the container you want to observe, reads that container's cgroupv2 PSI files, and reports them to Prometheus.

Getting Started

To deploy the sidecar, follow these steps.

Prerequisites

The host machine of every node in the cluster must be using cgroupv2.

Minimum versions:

  • Docker 20.10
  • Linux 5.2
  • Kubernetes 1.17

Check Availability

Ensure that your machine has cgroupv2 available:

$ grep cgroup /proc/filesystems
nodev	cgroup
nodev	cgroup2
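
The same check can be scripted. The following is a minimal Go sketch, not code from this repository: hasCgroup2 is a hypothetical helper that scans the text of /proc/filesystems for a cgroup2 entry, exactly as the grep above does by eye.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasCgroup2 reports whether the text of /proc/filesystems lists cgroup2.
// (Hypothetical helper, introduced only for this example.)
func hasCgroup2(filesystems string) bool {
	sc := bufio.NewScanner(strings.NewReader(filesystems))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Each line is "[nodev] <fstype>"; the type is the last field.
		if len(fields) > 0 && fields[len(fields)-1] == "cgroup2" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/filesystems")
	if err != nil {
		fmt.Println("cannot read /proc/filesystems:", err)
		return
	}
	fmt.Println("cgroup2 supported by kernel:", hasCgroup2(string(data)))
}
```

A true result only means the kernel supports cgroupv2, not that the unified hierarchy is in use.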

Having cgroupv2 available does not mean you are using it. Check that the unified cgroup hierarchy is enabled by listing /sys/fs/cgroup/:

$ ll /sys/fs/cgroup/
total 0
dr-xr-xr-x   5 root root 0 Oct 31 14:52 ./
drwxr-xr-x  10 root root 0 Oct 31 14:52 ../
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.controllers
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.depth
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.descendants
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.procs
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.stat
-rw-r--r--   1 root root 0 Oct 31 14:52 cgroup.subtree_control
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.threads
-rw-r--r--   1 root root 0 Nov  1 08:45 cpu.pressure
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.cpus.effective
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.mems.effective
drwxr-xr-x   2 root root 0 Nov  1 08:45 init.scope/
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.model
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.qos
-rw-r--r--   1 root root 0 Nov  1 08:45 io.pressure
-rw-r--r--   1 root root 0 Nov  1 08:45 memory.pressure
drwxr-xr-x 106 root root 0 Nov  1 08:45 system.slice/
drwxr-xr-x   3 root root 0 Oct 31 14:52 user.slice/

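Note the slice directories (system.slice, user.slice) and the cpu.pressure, io.pressure, and memory.pressure files directly under /sys/fs/cgroup/.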

If you have cgroupv2 but it isn't enabled, the above structure will instead be available in /sys/fs/cgroup/unified.

Enable cgroupv2

Edit /etc/default/grub and add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX. Then run sudo update-grub and reboot the system.

If cgroupv2 is not available on the system, you will have to update the kernel to a version that meets the prerequisites above.

Build Image

There are two Dockerfiles: one for regular deployment and one for debugging. To run the server locally, without a container or Kubernetes deployment, edit sidecar_pid_lookup.go so that it resolves the system's cgroup directory.

Regular image

  1. docker build -f ./Dockerfile . -t evankrul/cgroup-sc:v.1.2
  2. docker push evankrul/cgroup-sc:v.1.2

Debug image

  1. docker build -f ./Dockerfile.debug . -t evankrul/cgroup-sc:v.1.2-debug
  2. docker push evankrul/cgroup-sc:v.1.2-debug

Usage

Once the prerequisites are met and the image is built and pushed to your registry of choice, follow these steps to deploy the sidecar.

In this section, the monitoring container is called the sidecar and the container being monitored is called the host container. The sidecar relies on the pod's shareProcessNamespace option to reach the host container's cgroup metrics: with a shared process namespace, the sidecar can see the process directories in /proc, and it finds the host container's PID directory by searching them.

For each directory, the sidecar reads /proc/{id}/root/etc/pid_flag and, if the file exists, compares its contents with /etc/pid_flag_sc. A match identifies the host container. Both pid_flag and pid_flag_sc are mounted in the deployment configuration from a ConfigMap using a volumeMount.

The Service exposes the sidecar's web server, which hosts the metrics. Unless you are using some kind of service mesh, make sure your Prometheus deployment is in the same namespace as your sidecar deployment. Then point Prometheus at the /metrics endpoint of your pod on the metrics port.

- job_name: 'cgroup_monitor_sc'
  scrape_interval: 1s
  static_configs:
    - targets: ['cgroup-monitor-sc:2333']

Example Kubernetes YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng
  namespace: default
spec:
  selector:
    matchLabels:
      app: stress-ng
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: stress-ng
    spec:
      terminationGracePeriodSeconds: 5
      shareProcessNamespace: true
      containers:
        - name: CONTAINER_TO_BE_MONITORED
          ...
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag
        - name: cgroup-monitor-sc
          image: evankrul/cgroup-sc:prom.v.1.2
          imagePullPolicy: Always
          ports:
            - containerPort: 2333
              name: metrics
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE
          env:
            - name: PORT
              value: "2333"
          resources:
            requests:
              cpu: 1
              memory: "500Mi"
            limits:
              cpu: 1
              memory: "500Mi"
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag_sc
      volumes:
        - name: pid-flag-volume
          configMap:
            name: pid-flag-config-map
---
#Cgroup config map
kind: ConfigMap
apiVersion: v1
metadata:
  name: pid-flag-config-map
data:
  pid_flag: stress-ng-1
---
#Cgroup Monitor SC Service
apiVersion: v1
kind: Service
metadata:
  name: cgroup-monitor-sc #this will be the Domain name
  namespace: default
spec:
  selector:
    app: stress-ng
  ports:
    - name: stress
      port: 2335
      targetPort: 2335
    - name: metrics
      port: 2333
      targetPort: 2333
  type: LoadBalancer

Data Available

The following PSI metrics are reported to Prometheus and are available for querying.

# HELP cgroup_monitor_sc_monitored_cpu_psi CPU PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_cpu_psi gauge
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="total"} 385

# HELP cgroup_monitor_sc_monitored_io_psi IO PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_io_psi gauge
cgroup_monitor_sc_monitored_io_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="total"} 330809
cgroup_monitor_sc_monitored_io_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="total"} 330815

# HELP cgroup_monitor_sc_monitored_mem_psi Mem PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_mem_psi gauge
cgroup_monitor_sc_monitored_mem_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="total"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="total"} 0
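
For orientation, the kernel's *.pressure files contain lines of the form "some avg10=0.00 avg60=0.00 avg300=0.00 total=385", where the avg fields are percentages over 10, 60, and 300 second windows and total is the cumulative stall time in microseconds. The Go sketch below shows one way such a file could be turned into the lines above. It is an illustration only: renderPSI is a hypothetical function, and the mapping of avg10 to window="10s" (and so on) is inferred from the sample output rather than taken from the sidecar's source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// renderPSI converts the contents of a *.pressure file into Prometheus
// text-format lines for the given metric name. (Hypothetical helper.)
func renderPSI(metric, pressure string) []string {
	var out []string
	for _, line := range strings.Split(strings.TrimSpace(pressure), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		kind := fields[0] // "some" or "full"
		for _, kv := range fields[1:] {
			parts := strings.SplitN(kv, "=", 2)
			if len(parts) != 2 {
				continue
			}
			value, err := strconv.ParseFloat(parts[1], 64)
			if err != nil {
				continue // skip fields that are not numeric
			}
			// Assumed label mapping: avg10 -> "10s", total -> "total".
			window := parts[0]
			if strings.HasPrefix(window, "avg") {
				window = strings.TrimPrefix(window, "avg") + "s"
			}
			out = append(out, fmt.Sprintf("%s{type=%q,window=%q} %s",
				metric, kind, window, strconv.FormatFloat(value, 'f', -1, 64)))
		}
	}
	return out
}

func main() {
	sample := "some avg10=0.00 avg60=0.00 avg300=0.00 total=385\n"
	for _, l := range renderPSI("cgroup_monitor_sc_monitored_cpu_psi", sample) {
		fmt.Println(l)
	}
}
```

Read this way, the "some" series track time in which at least one task was stalled on the resource, and the "full" series track time in which all non-idle tasks were stalled at once.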

FAQ

Why aren't there any FAQs?

Because I haven't written this section yet.

Will there be FAQs?

Yes, there will be.

When will there be FAQs?

Soon.

Contact

Evan Krul - Website
