A tool based on eBPF, prometheus and grafana to monitor network connectivity.

Connectivity Monitor

Tracks the connectivity of a kubernetes cluster to its api server and exposes meaningful connectivity metrics.

Uses eBPF to observe all TCP connection establishments from the shoot cluster to the kubernetes api server. Derives meaningful connectivity metrics (an upper bound for meaningful availability) for the kubernetes api server running in the seed cluster.

Can be deployed in two different modes:

  • Deployed in a shoot cluster (or a normal kubernetes cluster) to track the connectivity to the api server.
  • Deployed in a seed cluster to track the connectivity of all shoot clusters hosted on the seed.

The network path

[Figure: the network path from the shoot cluster to the api server]

The shoot cluster's api server is hosted in the seed cluster and the network path involves several hops:

  • the NAT gateway in the shoot cluster,
  • the load balancer in the seed cluster,
  • a k8s service hop and
  • the envoy reverse proxy.

The reverse proxy terminates the TCP connection and starts the TLS negotiation. It chooses the api server of the shoot cluster based on the server name indication (SNI) extension in the TLS ClientHello message. The TLS negotiation is relayed to the chosen api server, so the client actually establishes a TLS session directly with the api server. (See the SNI GEP for details.)
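SNI-based routing depends on locating the server_name extension (RFC 6066) inside the ClientHello. The following user-space Go sketch walks the ClientHello layout to extract it.

It is an illustration, not the exporter's own parser; the commits further down indicate that the exporter does this in eBPF. `parseSNI` and the test helper `buildClientHello` are hypothetical names. Offsets are kept as plain `int`s, which sidesteps the 8-bit offset overflow described in one of those commits.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated ClientHello")

// parseSNI returns the host name from the server_name extension of a TLS
// record carrying a ClientHello. Offsets are ints, so values above 255 are fine.
func parseSNI(rec []byte) (string, error) {
	// Record header: content type 0x16 (handshake), version (2), length (2).
	if len(rec) < 5 || rec[0] != 0x16 {
		return "", errors.New("not a TLS handshake record")
	}
	// Handshake header: type 0x01 (ClientHello), length (3).
	if len(rec) < 9 || rec[5] != 0x01 {
		return "", errors.New("not a ClientHello")
	}
	p := 9 + 2 + 32 // skip client_version and random
	if len(rec) < p+1 {
		return "", errTruncated
	}
	p += 1 + int(rec[p]) // session_id (1-byte length prefix)
	if len(rec) < p+2 {
		return "", errTruncated
	}
	p += 2 + int(binary.BigEndian.Uint16(rec[p:])) // cipher_suites (2-byte prefix)
	if len(rec) < p+1 {
		return "", errTruncated
	}
	p += 1 + int(rec[p]) // compression_methods (1-byte prefix)
	if len(rec) < p+2 {
		return "", errTruncated
	}
	end := p + 2 + int(binary.BigEndian.Uint16(rec[p:])) // extensions block
	if end > len(rec) {
		return "", errTruncated
	}
	p += 2
	for p+4 <= end { // each extension: type (2), length (2), data
		typ := binary.BigEndian.Uint16(rec[p:])
		n := int(binary.BigEndian.Uint16(rec[p+2:]))
		p += 4
		if p+n > end {
			return "", errTruncated
		}
		if typ == 0 { // server_name
			// list length (2), name_type (1, 0 = host_name), name length (2), name
			ext := rec[p : p+n]
			if len(ext) < 5 || ext[2] != 0 {
				return "", errors.New("malformed server_name extension")
			}
			l := int(binary.BigEndian.Uint16(ext[3:]))
			if len(ext) < 5+l {
				return "", errTruncated
			}
			return string(ext[5 : 5+l]), nil
		}
		p += n
	}
	return "", errors.New("no server_name extension")
}

// buildClientHello is a hypothetical test helper: a minimal ClientHello with
// nSuites zeroed cipher suites and a single server_name extension.
func buildClientHello(sni string, nSuites int) []byte {
	u16 := func(v int) []byte { return []byte{byte(v >> 8), byte(v)} }
	ext := append([]byte{0, 0}, u16(len(sni)+5)...) // type server_name, data length
	ext = append(ext, u16(len(sni)+3)...)          // server_name_list length
	ext = append(ext, 0)                           // name_type host_name
	ext = append(append(ext, u16(len(sni))...), sni...)
	body := append([]byte{3, 3}, make([]byte, 32)...) // client_version, random
	body = append(body, 0)                              // empty session_id
	body = append(append(body, u16(2*nSuites)...), make([]byte, 2*nSuites)...)
	body = append(body, 1, 0) // one compression method: null
	body = append(append(body, u16(len(ext))...), ext...)
	hs := append([]byte{1, byte(len(body) >> 16), byte(len(body) >> 8), byte(len(body))}, body...)
	return append(append([]byte{0x16, 3, 1}, u16(len(hs))...), hs...)
}

func main() {
	// 200 cipher suites push the extensions well past byte 256 of the record.
	sni, err := parseSNI(buildClientHello("api.example.com", 200))
	fmt.Println(sni, err) // api.example.com <nil>
}
```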

Possible failure types

We can distinguish multiple failure types:

  • There is no network connectivity to the api server.

    This is the focus of the connectivity-monitor component.

    New TCP connections to the kubernetes api server are observed to confirm that the kubernetes api server, and all the components along the network path to it, are working as expected. Many things can break along the network path:

      • the DNS resolution of the load balancer's domain name can fail,
      • packets can be dropped due to a misconfigured connection tracking table, or
      • the reverse proxy might be too overloaded to accept new connections.

    The mundane failure case in which no api server processes are running is also covered by the connectivity monitor.

  • The api server reports an internal server error.

    Detecting this failure type is not feasible for the connectivity-monitor component; it can be achieved by processing the access logs of the api server.

    Some failures happen after the connection is successfully established: the api server detects an error and returns an error response (4xx: user error, 5xx: internal error). These are counted as successful connection attempts, hence the connectivity monitor yields an upper bound for meaningful availability. Such situations can be detected on the server side by parsing the access logs. Because the connections succeeded, matching access log entries can be expected.

  • The api server doesn't comply with the specification.

    Detecting this failure type requires test cases with a known expected outcome.

    The trickiest failure case is when the api server cannot itself detect the error and returns an incorrect answer as a success (2xx: OK). This failure case can only be detected by running test cases against the api server whose results are known ahead of time, and asserting that the expected and actual results are equivalent.

Observe all the connections from the shoot cluster to the api server

To capture all connection attempts by:

  • system components managed by Gardener: kubelet, kube-proxy, calico, ... and
  • any user workload that is talking to the api server

the connectivity-exporter must be deployed as a daemonset in the host network of the node, in the shoot cluster.

Deploying the connectivity-exporter directly in the shoot cluster is motivated by:

  • the connectivity-exporter is closer to the clients that initiate the connection and hence it can even capture failed attempts that don't reach the seed cluster at all (e.g. due to DNS misconfiguration),

  • the load is considerably smaller: deployed in the shoot cluster, the connectivity-exporter tracks only the connections from a single shoot cluster (1-1k/s). It does not have to track all the connections from all the shoot clusters of a seed cluster (300x).

Later, we plan to also deploy the connectivity exporter in the seed cluster. There it will centrally monitor the connections from all shoot clusters that at least reach the reverse proxy (envoy).

Annotate time based on state of connections

The connectivity-exporter assesses each connection attempt based on the packet sequence it observes in a certain time window:

  • unacknowledged connection: SYN (packet sent to the api server), no acknowledgment received

  • rejected connection: SYN packet sent, SYN+ACK packet received, but e.g. during the TLS negotiation the server responds with an RST+ACK packet to abort the connection

  • successful connection: SYN (packet sent to the api server), SYN+ACK (packet received from the api server)
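The assessment above can be sketched in Go as a small classifier. It assumes the packets of one connection attempt are already grouped and ordered within the observation window.

The type and function names are hypothetical. Treating any RST+ACK that follows a SYN+ACK in the window as a rejection is a simplification of the "e.g. during the TLS negotiation" case.

```go
package main

import "fmt"

// packetKind is a TCP packet observed on a single connection.
type packetKind int

const (
	syn    packetKind = iota // sent to the api server
	synAck                   // received from the api server
	rstAck                   // received from the api server, aborts the connection
)

// connState is the assessment of one connection attempt.
type connState string

const (
	unacknowledged connState = "unacknowledged"
	rejected       connState = "rejected"
	successful     connState = "successful"
)

// classify assesses a connection attempt from the packets seen within the
// observation window, in arrival order, starting with the SYN.
func classify(seq []packetKind) connState {
	acked := false
	for _, k := range seq {
		switch k {
		case synAck:
			acked = true
		case rstAck:
			// Assumption: an RST+ACK only counts as a rejection after a
			// SYN+ACK; without one, the attempt stays unacknowledged.
			if acked {
				return rejected
			}
		}
	}
	if acked {
		return successful
	}
	return unacknowledged
}

func main() {
	fmt.Println(classify([]packetKind{syn}))                 // unacknowledged
	fmt.Println(classify([]packetKind{syn, synAck}))         // successful
	fmt.Println(classify([]packetKind{syn, synAck, rstAck})) // rejected
}
```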

The connectivity exporter annotates each 1s time bucket only after a certain offset has passed, to tolerate late arrivals and avoid issues at second boundaries:

  • active (/inactive) second: active if there were some new connection attempts, inactive if there were no new connection attempts,

  • failed (/successful) second: failed if there was at least one failed connection attempt (unacknowledged or rejected), or if there were no connection attempts and the preceding bucket was assessed as failed; successful otherwise.

If packets arrive too late (beyond a certain time window) or out of sequence (e.g. a SYN+ACK packet without a preceding SYN packet on the same connection), they are counted as orphan packets.

Prometheus metrics

The state of the connectivity exporter is exposed with prometheus counter metrics, which can be comfortably scraped without losing the 1s granularity.

# HELP connectivity_exporter_connections_total Total number of new connections.
# TYPE connectivity_exporter_connections_total counter
connectivity_exporter_connections_total{kind="rejected"} 0
connectivity_exporter_connections_total{kind="successful"} 544
connectivity_exporter_connections_total{kind="unacknowledged"} 0

# HELP connectivity_exporter_packets_total Total number of new packets.
# TYPE connectivity_exporter_packets_total counter
connectivity_exporter_packets_total{kind="orphan"} 0

# HELP connectivity_exporter_seconds_total Total number of seconds.
# TYPE connectivity_exporter_seconds_total counter
connectivity_exporter_seconds_total{kind="active"} 337
connectivity_exporter_seconds_total{kind="active_failed"} 0
connectivity_exporter_seconds_total{kind="clock"} 2354
connectivity_exporter_seconds_total{kind="failed"} 0
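These are monotonically increasing counters. The number of failed seconds between any two scrapes is therefore the difference of the two samples, so the count at 1s resolution survives any scrape interval. In Prometheus itself, `increase()` over the counters serves this purpose.

The Go sketch below parses the `connectivity_exporter_seconds_total` lines of the text format. It then computes availability as the share of non-failed clock seconds between two scrapes. That definition, the function names and the second scrape's values are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSeconds collects connectivity_exporter_seconds_total samples from the
// Prometheus text format, keyed by the kind label. Only lines whose sole
// label is kind are handled; anything else is skipped.
func parseSeconds(exposition string) (map[string]float64, error) {
	const prefix = `connectivity_exporter_seconds_total{kind="`
	out := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // comments and other metrics
		}
		kind, value, ok := strings.Cut(line[len(prefix):], `"} `)
		if !ok || strings.Contains(kind, `"`) {
			continue // extra labels, e.g. sni in seed mode
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		out[kind] = v
	}
	return out, sc.Err()
}

// availability is the share of clock seconds between two scrapes that were
// not failed, computed from counter deltas (assumed definition).
func availability(prev, cur map[string]float64) float64 {
	clock := cur["clock"] - prev["clock"]
	if clock <= 0 {
		return 1 // assumption: no elapsed clock time (or a reset) counts as available
	}
	return 1 - (cur["failed"]-prev["failed"])/clock
}

func main() {
	// First scrape: values from the example output above.
	prev, err := parseSeconds(`connectivity_exporter_seconds_total{kind="clock"} 2354
connectivity_exporter_seconds_total{kind="failed"} 0`)
	if err != nil {
		panic(err)
	}
	// Second scrape 60s later (hypothetical values).
	cur, err := parseSeconds(`connectivity_exporter_seconds_total{kind="clock"} 2414
connectivity_exporter_seconds_total{kind="failed"} 3`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f\n", availability(prev, cur)) // 0.95
}
```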

When the connectivity exporter is deployed in the seed, an SNI label is added to the metrics above to differentiate the connections to the different api servers.

Inspiration

This work is motivated by the meaningful availability paper and the SRE books by Google.

The failed seconds counter metric is meaningful according to the definition of the paper: it captures what users experience. In every counted failed second, there was at least one failed connection attempt by a user or there weren't any successful connection attempts since the last failure. During the uptime of the monitoring stack itself, any failed connection attempt by a user (running in the shoot cluster) will be reported as a failed second.

Overview

The following sketch shows where the TCP connections are captured and how time is annotated based on the assessed connection states.

[Figure: overview of where TCP connections are captured and how time is annotated]

The big picture of meaningful availability also includes application level access logs on the server side. Connectivity monitoring is a first step on the path to meaningful availability that yields an upper bound: availability requires connectivity.

Note that this is a low-level and hence very generic approach with potential for widespread adoption. It works under two conditions: the service is delivered via TCP/IP (i.e. all the services of our concern), and service instances can be differentiated by the SNI TLS extension. Under these conditions, this approach measures connectivity with 1s resolution.

The connectivity exporter can be deployed anywhere along the path between the clients and the servers. This choice is a tradeoff:

  • deployed close to the clients, it can cover more failure cases and needs to handle less load;
  • deployed closer to the server, it might cover all the clients but miss certain failure cases.

In the Gardener architecture, we have the unique situation that all the relevant clients of the api server are running in the shoot cluster and we can deploy the connectivity exporter next to some other Gardener managed system components in the shoot cluster as well.

Owner: Gardener (Universal Kubernetes at Scale)
Comments
  • build: container images and helm charts

    This adds Makefile targets:

    • docker/build
    • docker/push
    • helm/generate
    • helm/install
    • helm/uninstall

    The following environment variables can be redefined before running 'make':

    • REGISTRY
    • IMAGE_NAME
    • IMAGE_TAG

    For example, I run

    export REGISTRY=xxxxx.azurecr.io
    export IMAGE_NAME=connectivity-monitor
    export IMAGE_TAG=albantest
    

    With this, I can test:

    $ time make docker/build docker/push helm/install
    

    And then, the pod is deployed:

    $ kubectl logs -n connectivity-monitor connectivity-exporter-65rz4
    2021/11/08 14:45:01 maxprocs: Updating GOMAXPROCS=2: determined from CPU quota
    I1108 14:45:02.076150       9 metrics.go:24] Starting connectivity-exporter
    I1108 14:45:24.076205       9 packet.go:245] sni: dc.services.visualstudio.com, connections: 1
    ...
    

    There are some errors, but they can be debugged later:

    • packet.go:159] Empty SNI
    • sni: , connections: 2

    TODO:

    • [ ] Add missing helm charts
  • CI: initial GitHub Action

    What this PR does / why we need it:

    This builds, runs the unit tests, creates a docker image and pushes it to the GitHub Container Registry.

    Which issue(s) this PR fixes: Fixes #

    Special notes for your reviewer:

    Release note:

    
    
  • Fix Prometheus counters for each SNI

    What this PR does / why we need it:

    • Rename {succeeded,failed}_seconds to {succeeded,failed}_connections in the BPF map sni_stats
    • Separate stats for each SNI
    • On inactive seconds, carry over failed second state

    Which issue(s) this PR fixes: Fixes #

    Special notes for your reviewer:

    Release note:

    
    
  • ebpf: fix integer overflow with offsets

    Offsets should not be stored in __u8 because they might exceed 255, the largest value a __u8 can hold. For example, when the client (wget) supports a large number of cipher suites, the offset of the SNI becomes larger than 255.

  • connectivity-exporter: add CLI flag -metrics-addr

    connectivity-exporter was previously listening on port 19100 on all network interfaces and this was not configurable.

    This patch adds a CLI flag -metrics-addr to make this configurable. The default is still ":19100" to keep the previous behaviour unchanged.

    This is useful when the Kubernetes cluster already has something listening on port 19100.

  • connectivity-exporter: monitoring all interfaces

    The network interface name could be specified with the "-i" CLI flag but it was not possible to monitor all network interfaces.

    With this patch, connectivity-exporter will monitor all network interfaces when the "-i" flag is empty or missing.

    It works by setting sll_ifindex to zero: see "man 7 packet":

    sll_ifindex is the interface index of the interface (see netdevice(7)); 0 matches any interface (only permitted for binding).

  • Simplify metric expiration

    What this PR does / why we need it:

    SNIs should be "expired" after 15 minutes of inactivity. Previously, each SNI received its own goroutine that started a timer, which was reset whenever there was activity. However, this added unnecessary complexity and made the code more difficult to test. With this PR, the SNIs are expired in a single goroutine that checks when each SNI was last updated. This should simplify the code and make it more readable and testable.

  • Reset the weekly metrics on Sundays at midnight

    What this PR does / why we need it:

    Previously, they were reset on Thursdays at midnight, because the start of the Unix epoch, January 1, 1970, was a Thursday.

    Special notes for your reviewer:

    May 9, 2022 is a Monday.

  • Add some panels to show the cluster downtimes in seconds

    What this PR does / why we need it:

    Adds panels to show the downtime in seconds for specific SNIs. This is useful if you want to see how many seconds a downtime lasted rather than only a percentage.

  • Rename instances of connectivity-monitor to connectivity-exporter

    Clean up any remaining instances of connectivity-monitor and replace them with connectivity-exporter, following the renaming of the repository to gardener/connectivity-exporter.

  • Fix BPF verifier issue

    What this PR does / why we need it:

    On kernel 5.15, the BPF verifier rejects the current version after 354 iterations of the unrolled for loop. TLS_MAX_SERVER_NAME_LEN (128) is smaller than that. When the for loop is rewritten in this equivalent way, the BPF verifier accepts the program on both 5.13 and 5.15.

Internet connectivity for your VPC-attached Lambda functions without a NAT Gateway

lambdaeip Internet connectivity for your VPC-attached Lambda functions without a NAT Gateway Background I occasionally have serverless applications th

Nov 9, 2022
eBPF based TCP observability.

TCPDog is a total solution from exporting TCP statistics from Linux kernel by eBPF very efficiently to store them at your Elasticsearch or InfluxDB da

Jan 3, 2023
eBPF library for Go based on Linux libbpf

libbpfgo libbpfgo is a Go library for working with Linux's eBPF. It was created for Tracee, our open source Runtime Security and eBPF tracing tools wr

Jan 5, 2023
eBPF-based EDR for Linux

ebpf-edr A proof-of-concept eBPF-based EDR for Linux Seems to be working fine with the 20 basic rules implemented. Logs the alerts to stdout at the mo

Nov 9, 2022
An ebpf's tool to watch traffic

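watch-dog watch-dog uses eBPF to listen to the traffic of a specified network interface card for out-of-band traffic detection, and stores the traffic relationships between nodes in the graph database neo4j. Get go get github.com/TomatoMr/watch-dog Install make build Usage sudo ./w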

Feb 5, 2022
Trace Go program execution with uprobes and eBPF

Weaver PLEASE READ! - I am currently refactoring Weaver to use libbpf instead of bcc which would include various other major improvements. If you're c

Dec 28, 2022
SailFirewall - Linux firewall powered by eBPF and XDP

SailFirewall Linux firewall powered by eBPF and XDP Requirements Go 1.16+ Linux

May 4, 2022
Library to work with eBPF programs from Go

Go eBPF A nice and convenient way to work with eBPF programs / perf events from Go. Requirements Go 1.10+ Linux Kernel 4.15+ Supported eBPF features e

Dec 29, 2022
eBPF Library for Go

eBPF eBPF is a pure Go library that provides utilities for loading, compiling, and debugging eBPF programs. It has minimal external dependencies and i

Dec 29, 2022
A distributed Layer 2 Direct Server Return (L2DSR) load balancer for Linux using XDP/eBPF

VC5 A distributed Layer 2 Direct Server Return (L2DSR) load balancer for Linux using XDP/eBPF This is very much a proof of concept at this stage - mos

Dec 22, 2022
Edb - An eBPF program debugger

EDB (eBPF debugger) edb is a debugger(like gdb and dlv) for eBPF programs. Norma

Dec 31, 2022
Prometheus exporter for counting connected devices to a network using nmap

nmapprom Prometheus exporter for counting the hosts connected to a network using nmap · Report Bug · Request Feature Table of Contents About The Proje

Oct 17, 2021
Package socket provides a low-level network connection type which integrates with Go's runtime network poller to provide asynchronous I/O and deadline support. MIT Licensed.

socket Package socket provides a low-level network connection type which integrates with Go's runtime network poller to provide asynchronous I/O and d

Dec 14, 2022
Magma is an open-source software platform that gives network operators an open, flexible and extendable mobile core network solution.

Connecting the Next Billion People Magma is an open-source software platform that gives network operators an open, flexible and extendable mobile core

Dec 31, 2022
Zero Trust Network Communication Sentinel provides peer-to-peer, multi-protocol, automatic networking, cross-CDN and other features for network communication.

Thank you for your interest in ZASentinel ZASentinel helps organizations improve information security by providing a better and simpler way to protect

Nov 1, 2022
May 8, 2022
Optimize Windows's network/NIC driver settings for NewTek's NDI(Network-Device-Interface).

windows-ndi-optimizer[WIP] Optimize Windows's network/NIC driver settings for NewTek's NDI(Network-Device-Interface). How it works This is batchfile d

Apr 15, 2022
A simple network analyzer that capture http network traffic

httpcap A simple network analyzer that captures http network traffic. support Windows/MacOS/Linux/OpenWrt(x64) https only capture clienthello colorful

Oct 25, 2022
A client can monitor OceanBase

OBAgent OBAgent is a monitor collection framework. OBAgent supplies pull and push mode data collection to meet different applications. By default, OBA

Dec 24, 2022