A docker container that can be deployed as a sidecar on any kubernetes pod to monitor PSI metrics

CgroupV2 PSI Sidecar

CgroupV2 PSI Sidecar can be deployed on any Kubernetes pod that has access to cgroupv2 PSI (Pressure Stall Information) metrics.

About

This is a docker container that can be deployed as a sidecar on any kubernetes pod to monitor PSI metrics.

Built With

Getting Started

To deploy the sidecar, follow these steps.

Prerequisites

Every node in the cluster must be running cgroupv2 on its host machine.

Minimum versions:

Docker 20.10
Linux 5.2
Kubernetes 1.17

Check Availability

Ensure that your machine has cgroupv2 available:

$ grep cgroup /proc/filesystems
nodev	cgroup
nodev	cgroup2

Having cgroupv2 available does not mean the system is using it. Check that the unified cgroup hierarchy is enabled by listing the cgroup root.

$ ls -alF /sys/fs/cgroup/
total 0
dr-xr-xr-x   5 root root 0 Oct 31 14:52 ./
drwxr-xr-x  10 root root 0 Oct 31 14:52 ../
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.controllers
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.depth
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.descendants
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.procs
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.stat
-rw-r--r--   1 root root 0 Oct 31 14:52 cgroup.subtree_control
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.threads
-rw-r--r--   1 root root 0 Nov  1 08:45 cpu.pressure
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.cpus.effective
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.mems.effective
drwxr-xr-x   2 root root 0 Nov  1 08:45 init.scope/
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.model
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.qos
-rw-r--r--   1 root root 0 Nov  1 08:45 io.pressure
-rw-r--r--   1 root root 0 Nov  1 08:45 memory.pressure
drwxr-xr-x 106 root root 0 Nov  1 08:45 system.slice/
drwxr-xr-x   3 root root 0 Oct 31 14:52 user.slice/

Note the slice directories (system.slice and user.slice) at the top level.

If cgroupv2 is available but not enabled, the structure above will be found under /sys/fs/cgroup/unified instead.

Enable cgroupv2

Edit /etc/default/grub and add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX. Then run sudo update-grub and reboot the system.

If cgroupv2 is not available on the system, you will have to update the kernel to a version that meets the prerequisites above.

Build Image

There are two Dockerfiles: one for regular deployment and one for debugging. If you want to run the server locally, without a container or Kubernetes deployment, edit sidecar_pid_lookup.go so that it resolves the system's cgroup directory.
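For a local run, one way to resolve a process's cgroup directory is to read /proc/<pid>/cgroup, where a cgroupv2 entry has the form 0::<path>. The sketch below is an illustration under that assumption and is not the contents of sidecar_pid_lookup.go; the function name cgroupDirFromProc and the /sys/fs/cgroup mount point are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupDirFromProc takes the contents of /proc/<pid>/cgroup and returns the
// process's cgroup v2 directory under mountPoint. It looks for the cgroup v2
// entry, which starts with "0::".
func cgroupDirFromProc(procCgroup, mountPoint string) (string, error) {
	for _, line := range strings.Split(procCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			rel := strings.TrimPrefix(line, "0::")
			return filepath.Join(mountPoint, rel), nil
		}
	}
	return "", errors.New("no cgroup v2 entry (0::) found")
}

func main() {
	// Resolve the cgroup directory of the current process.
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read /proc/self/cgroup:", err)
		return
	}
	dir, err := cgroupDirFromProc(string(data), "/sys/fs/cgroup")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("cgroup dir:", dir)
}
```

The cpu.pressure, io.pressure and memory.pressure files would then be expected inside the returned directory.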

Regular image

docker build -f ./Dockerfile . -t evankrul/cgroup-sc:v.1.2
docker push evankrul/cgroup-sc:v.1.2

Debug image

docker build -f ./Dockerfile.debug . -t evankrul/cgroup-sc:v.1.2-debug
docker push evankrul/cgroup-sc:v.1.2-debug

Usage

Assuming all the prerequisites have been met and the image has been built and pushed to your favorite repository, follow these steps to deploy the sidecar.

In this section I will refer to the monitoring container as the sidecar and the container being monitored as the host container. The sidecar makes use of the shareProcessNamespace option to access the host container's cgroup metrics. With the process namespace shared, the sidecar can see the process directories of every container in the pod under /proc, and it finds the host container's PID directory by searching those directories.

For each dir the sidecar looks at the contents of /proc/{id}/root/etc/pid_flag and checks that it exists and matches the contents of /etc/pid_flag_sc. If a match is found then this is the host container. The pid_flag and pid_flag_sc are mounted in the deployment configuration as a ConfigMap using a VolumeMount.

The Service exposes the sidecar's web server, where the metrics are hosted. If you are not using a service mesh, make sure your Prometheus deployment is in the same namespace as your sidecar deployment. Then point Prometheus at the /metrics endpoint of your pod on the metrics port.

- job_name: 'cgroup_monitor_sc'
  scrape_interval: 1s
  static_configs:
    - targets: ['cgroup-monitor-sc:2333']

Example kubernetes yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng
  namespace: default
spec:
  selector:
    matchLabels:
      app: stress-ng
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: stress-ng
    spec:
      terminationGracePeriodSeconds: 5
      shareProcessNamespace: true
      containers:
        - name: CONTAINER_TO_BE_MONITORED
          ...
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag
        - name: cgroup-monitor-sc
          image: evankrul/cgroup-sc:prom.v.1.2
          imagePullPolicy: Always
          ports:
            - containerPort: 2333
              name: metrics
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE
          env:
            - name: PORT
              value: "2333"
          resources:
            requests:
              cpu: 1
              memory: "500Mi"
            limits:
              cpu: 1
              memory: "500Mi"
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag_sc
      volumes:
        - name: pid-flag-volume
          configMap:
            name: pid-flag-config-map
---
#Cgroup config map
kind: ConfigMap
apiVersion: v1
metadata:
  name: pid-flag-config-map
data:
  pid_flag: stress-ng-1
---
#Cgroup Monitor SC Service
apiVersion: v1
kind: Service
metadata:
  name: cgroup-monitor-sc #this will be the Domain name
  namespace: default
spec:
  selector:
    app: stress-ng
  ports:
    - name: stress
      port: 2335
      targetPort: 2335
    - name: metrics
      port: 2333
      targetPort: 2333
  type: LoadBalancer

Data Available

The following PSI metrics are reported to Prometheus and are available for querying.

# HELP cgroup_monitor_sc_monitored_cpu_psi CPU PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_cpu_psi gauge
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="total"} 385

# HELP cgroup_monitor_sc_monitored_io_psi IO PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_io_psi gauge
cgroup_monitor_sc_monitored_io_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="total"} 330809
cgroup_monitor_sc_monitored_io_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="total"} 330815

# HELP cgroup_monitor_sc_monitored_mem_psi Mem PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_mem_psi gauge
cgroup_monitor_sc_monitored_mem_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="total"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="total"} 0
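These values come from the kernel's pressure files (cpu.pressure, io.pressure, memory.pressure), whose lines look like "some avg10=0.00 avg60=0.00 avg300=0.00 total=385". The avg fields are percentages over the given window and total is cumulative stall time in microseconds. The sketch below shows one way to parse such a file. It is not the sidecar's own parser, and the mapping from avg10/avg60/avg300 to the window labels 10s/60s/300s is an assumption based on the label names above; parsePSI and windowLabels are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// windowLabels maps kernel field names to the window label values used in
// the metrics above (assumed mapping).
var windowLabels = map[string]string{
	"avg10": "10s", "avg60": "60s", "avg300": "300s", "total": "total",
}

// parsePSI parses the contents of a cgroup v2 pressure file into
// result[type][window], for example result["some"]["10s"].
func parsePSI(content string) (map[string]map[string]float64, error) {
	result := make(map[string]map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		kind := fields[0] // "some" or "full"
		values := make(map[string]float64)
		for _, f := range fields[1:] {
			key, raw, ok := strings.Cut(f, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", f)
			}
			label, known := windowLabels[key]
			if !known {
				continue // ignore fields this sketch does not know about
			}
			v, err := strconv.ParseFloat(raw, 64)
			if err != nil {
				return nil, fmt.Errorf("field %q: %w", f, err)
			}
			values[label] = v
		}
		result[kind] = values
	}
	return result, sc.Err()
}

func main() {
	sample := "some avg10=0.00 avg60=0.00 avg300=0.00 total=385\n"
	psi, err := parsePSI(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("cpu some total:", psi["some"]["total"])
}
```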

FAQ

Why aren't there any FAQs?

Because I haven't written this section yet.

Will there be FAQs?

Yes, there will be.

When will there be FAQs?

Soon.

Contact

Evan Krul - Website
