Reworking kube-proxy's architecture

Kubernetes Proxy NG

The Kubernetes Proxy NG is a new design of kube-proxy aimed at:

  • allowing Kubernetes business logic to evolve with minimal to no impact on backend implementations,
  • improving scalability,
  • improving the ability to integrate with 3rd-party environments,
  • being library-oriented to allow packaging logic at the distributor's will,
  • providing gRPC endpoints for lean integration, extensibility and observability.

The project will provide multiple components, with the core being the API watcher that will serve the global and node-specific sets of objects.

More context can be found in the project's KEP.

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

Comments
  • Change endpoint key and use anonymous set depending on presence of PodName

    Replacing a service could (and can) result in the following sequence of actions:

    1. ipvssink.Backend.SetEndpoint(IP = 10.0.0.1, Key = A)
    2. ipvssink.Backend.SetEndpoint(IP = 10.0.0.1, Key = B)
    3. ipvssink.Backend.DeleteEndpoint(IP = 10.0.0.1, Key = A)

    Ideally, one would expect the IP 10.0.0.1 to still exist after these operations complete, but since duplicates are ignored when adding Real Servers to an IPVS Virtual Server, the result is that the IP is removed entirely.

    Solution

    Refer to the discussion below on how we arrived at this solution.

    Discarded Solution

    The initial proposal was to keep a count of the usages that an IP has in a service and only remove it from the IPVS Virtual Server when its usage drops to zero.

    Final solution

    We have now changed the key for an endpoint so that it changes less often (only when the endpoint's configuration changes). The effect is that if the key remains the same, then instead of deleting the old endpoint and creating a new one, we simply update the existing endpoint, meaning no delete event is triggered.

    The key calculation relies on the name of the pod corresponding to the endpoint. The pod name might be absent in some cases; to deal with that scenario, we introduce another diffing set, and changes to this new set result in delete events being sent before the update events.
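
    Purely as an illustration of the rule above (the type and field names here are hypothetical, not the actual kpng code), the keying logic looks roughly like this:

    // endpointInfo and its fields are hypothetical stand-ins for the real types.
    type endpointInfo struct {
        PodName string
        IP      string
    }

    // endpointKey sketches the rule described above: key on the pod name when
    // it is known, so the key stays stable across updates and no delete event
    // is emitted; otherwise the endpoint falls back to the separate "anonymous"
    // diffing set, whose deletes are sent before updates.
    func endpointKey(ep endpointInfo) string {
        if ep.PodName != "" {
            return ep.PodName
        }
        return ep.IP
    }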

    This also closes https://github.com/kubernetes-sigs/kpng/issues/290 by properly comparing ports so that new IPPort additions are propagated.

  • implementation of local e2e-tests

    Based on Antonio's PR, this PR introduces local e2e tests. It is also meant to provide a base to build CI from, so that local tests and CI tests are equivalent.

    To test this, run:

    ./hack/test_e2e.sh
    
  • Make hack/test_e2e assume less about environment around it

    I updated test_e2e to make fewer assumptions. It is now possible to:

    • specify where the e2e files will be stored
    • specify which Dockerfile to use
    • run the command from anywhere, not just from the kpng directory

    It also no longer assumes that /usr/local/bin exists or is part of the $PATH.

  • Tilt setup for kpng

    This PR adds a Tilt setup for kpng.

    How it works, e.g.:

    root@realistic-planarian:/home/ubuntu/kpng# make tilt-setup i=ipv4 b=nft
    
    root@realistic-planarian:/home/ubuntu/kpng# make tilt-up
    
    

    make tilt-setup will create a cluster and install tools such as kubectl, bpf2go, and kind. It will also create a file called tilt.env with the environment variables (including IP family, backend, etc.).

    make tilt-up will run the tilt up command, and the environment variables set previously will be loaded into the Tilt session. The kpng DaemonSet YAML is generated based on these environment variables.

    To change the backend, modify the tilt.env file; this will trigger a Tilt update. An update can also be triggered from the web UI.

    Currently, Tilt is not configured to build and re-deploy on source code changes, but it can be done. I avoided it because running a Tilt update on every code change may not be effective.

    Appreciate your feedback.

    Fixes #383

  • Kpng ebpf backend POC

    This PR contains the code implementing a POC of an eBPF-based backend for KPNG. Currently it supports only ClusterIP TCP and UDP services; additional functionality will be added in the future. See the backend's README.md for more information.

    Additionally, this makes some changes and fixes to other parts of the KPNG codebase, including:

    • Implements setup() for the fullstate client
    • Uses j2 for smarter YAML templating, since we need to spin up additional debugging containers for the eBPF backend
    • Fixes a bug in the lightdiffstore tests
    • Adds support for building and running the eBPF backend locally with KIND
  • test_e2e can now run parallel e2e tests on multiple clusters with the same -i and -b configuration

    Moved all binaries to the bin directory. Changed the installation to check whether kind, kubectl, and ginkgo are in that directory instead of checking whether they are in the existing PATH.

    Added an -s (suffix) argument; the suffix will be appended to the e2e directory name and to the kind cluster names. Added an -n (test_run_count) argument, which controls how many clusters will be created in parallel and how many tests will execute in parallel for the same -i and -b combination.

  • make: add support for build multi-arch bin

    This patch adds:

         make windows - build windows binary
         make linux - build linux binary
         make darwin - build darwin binary
         make release - build all supported platforms
    
    • All binaries will be generated in kpng-bin/
    • Additionally, it sets VERSION (now kpng version works).

    Signed-off-by: Douglas Schilling Landgraf [email protected]

  • e2e: improvements

    • e2e: use kubectl --all-namespaces
    • e2e: specify image registry for kindest/node
    • e2e: replace --timeout with --request-timeout
    • e2e: show error if kind cannot load image
    • e2e: fail if kind create cluster fails
  • e2e: add initial support to multiple CNI

    PLEASE NOTE: this branch is based on PR #169 (which must be merged first).

    Testing KPNG with multiple CNIs will increase the number of people on the adoption road.

    Signed-off-by: Douglas Schilling Landgraf [email protected]

  • documentation on our vendoring strategy

    There are docs about the general vendoring problem in KPNG:

    • https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/vendor.md

    We also have some really crude, hacky explanations of how we vendor:

    • https://github.com/kubernetes-sigs/kpng/blob/master/go-mod-tidy-all

    We need to:

    • understand what we import and why
    • have a section in CONTRIBUTING.md about how to update go modules
    • possibly improve the vendor.md from sig-arch if we can

    XRef: https://github.com/kubernetes/community/issues/6225

  • Backend ipvs only works on the local node

    When using the ipvs backend there is no masquerading when the call is routed to another node, so only calls to the local node work from the main netns on the node. There must be some rule that uses the KUBE-MARK-MASQ chain:

    # iptables -t nat -S
    -P PREROUTING ACCEPT
    -P INPUT ACCEPT
    -P OUTPUT ACCEPT
    -P POSTROUTING ACCEPT
    -N KUBE-KUBELET-CANARY
    -N KUBE-MARK-DROP
    -N KUBE-MARK-MASQ
    -N KUBE-POSTROUTING
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
    -A POSTROUTING -s 11.0.0.0/16 ! -d 11.0.0.0/16 -j MASQUERADE
    -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
    -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
    -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
    -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
    
  • Make btree item, transaction and constructor for NewEventHandler and …

    Was making sequence diagrams and realized this would be much cleaner to document. Added comments along the way.

    fixes https://github.com/kubernetes-sigs/kpng/pull/418

  • IPVS Fullstate Implementation

    This implementation roughly follows the bridge design pattern: IpvsController acts as the abstraction and Handler acts as the implementer.
    It can be broken down into three major steps.

    1. Registration

    • register.go

      init() registers backend with the brain

      Sink() implements the full state sink

      Setup() initializes required components - proxier, ipvs and dummy interfaces

    • setup.go

      performs sanity checks and prepares the kernel for the IPVS implementation

    • ipvs.go

      type IpvsController struct {
          mu sync.Mutex
      
          ipFamily v1.IPFamily
      
          // service store for storing ServicePortInfo object to diffstore
          svcStore *lightdiffstore.DiffStore
      
          // endpoint store for storing EndpointInfo object to diffstore
          epStore *lightdiffstore.DiffStore
      
          iptables util.IPTableInterface
          ipset    util.Interface
          exec     exec.Interface
          proxier  *proxier
      
          // Handlers hold the actual networking logic and interactions with kernel modules.
          // We need handlers for all types of services; see the Handler interface for reference.
          handlers map[ServiceType]Handler
      }
      

      IpvsController takes care of the higher-order logic: what needs to be done after receiving fullstate callbacks.

    • proxier.go

      proxier directly interacts with iptables, ipvs and ipsets.

      proxier has no business logic and acts as an adapter for IpvsController's interaction with the networking layer.

    2. Callback, Prepare diffs [ What to do? ]

    • types.go

      ServicePortInfo contains the base information of a service and a port in a structure that can be directly consumed by the proxier.

      EndpointInfo contains the base information of an endpoint in a structure that can be directly consumed by the proxier.

    • patch.go

      Here we leverage the diffstore to get the deltas for services and endpoints and store them in the form of patches. A patch is basically a combination of a resource and an operation.

      type Operation int32
      
      const (
           NoOp Operation = iota
           Create
           Update
           Delete
      )
      

      The following structures are used to organize client.ServiceEndpoints diffs into patches:

      • ServicePatch
      type ServicePatch struct {
            servicePortInfo *ServicePortInfo    
            op          Operation
      }
      
      func (p *ServicePatch) apply(handler map[Operation]func(servicePortInfo *ServicePortInfo)) {
            handler[p.op](p.servicePortInfo)
      }
      
      • EndpointPatch
      type EndpointPatch struct {
            endpointInfo *EndpointInfo
            servicePortInfo  *ServicePortInfo
            op           Operation
      }
      
      func (p *EndpointPatch) apply(handler map[Operation]func(endpointInfo *EndpointInfo, servicePortInfo *ServicePortInfo)) {
            handler[p.op](p.endpointInfo, p.servicePortInfo)
      }
      
      • EndpointPatches
      type EndpointPatches []EndpointPatch
      
      func (e EndpointPatches) apply(handler map[Operation]func(*EndpointInfo, *ServicePortInfo)) {
            for _, patch := range e {
                  patch.apply(handler)
            }
      }
      
      • ServiceEndpointsPatch
      type ServiceEndpointsPatch struct {
            svc ServicePatch
            eps EndpointPatches
      }
      

    ServiceEndpointsPatch couples all mutually dependent operations together. Thus, the individual ServiceEndpointsPatch values are mutually exclusive and could be applied in parallel in the future. Each one represents the delta in a fullstate.ServiceEndpoints state transition:

        ServiceEndpointsPatch = fullstate.ServiceEndpoints(after callback) - fullstate.ServiceEndpoints(before callback)
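
    As a purely illustrative example (using only the types defined above; the svcInfo and epInfo variables are assumed), a patch for a newly created service with one endpoint would look like:

      // Sketch: one mutually-exclusive unit of work produced from the diffstores.
      patch := ServiceEndpointsPatch{
            // create the service itself ...
            svc: ServicePatch{servicePortInfo: svcInfo, op: Create},
            // ... and then its endpoint, which depends on the service existing
            eps: EndpointPatches{
                  {endpointInfo: epInfo, servicePortInfo: svcInfo, op: Create},
            },
      }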

    • ipvs.go

      generatePatches() returns the list of ServiceEndpointsPatch needed to transition client.ServiceEndpoints from state A to state B.

    3. Implementation, Execute diffs [ How to do? ]

    • (clusterip | nodeport | loadbalancer)_handler.go

      handlers directly interact with the proxier to implement the network patches

    type Handler interface {
    	getServiceHandlers() map[Operation]func(*ServicePortInfo)
    	getEndpointHandlers() map[Operation]func(*EndpointInfo, *ServicePortInfo)
    }
    

    getServiceHandlers() and getEndpointHandlers() of the Handler interface return maps of functions that actually implement the low-level networking logic and interact with the kernel.

    • patch.go

      The apply() methods of the patches take the maps returned by getServiceHandlers() and getEndpointHandlers() as arguments and use them to apply the change to the networking stack.
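
    Putting the pieces together, here is a minimal sketch (the applyPatch method and the serviceType argument are assumptions, not the actual controller code) of how one ServiceEndpointsPatch could be applied through a Handler:

    func (c *IpvsController) applyPatch(patch ServiceEndpointsPatch, serviceType ServiceType) {
        // Pick the handler matching the service type (ClusterIP, NodePort, LoadBalancer).
        handler := c.handlers[serviceType]

        // Apply the service-level change first (Create/Update/Delete/NoOp) ...
        patch.svc.apply(handler.getServiceHandlers())

        // ... then the endpoint-level changes that depend on it.
        patch.eps.apply(handler.getEndpointHandlers())
    }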

  • kpng local up.sh isn't watching the APIServer

    KPNG local up dies on startup because it isn't invoking the brain/backend separate modes correctly. I've partially fixed this, but something is still off with the apiserver watches.

    I have a quick fix to add

      containers:
      - args:
        - kube
        - to-api
        env:
        - name: GOLANG_P
    

    into the kpng local up script, which fixes the problem of it dying on startup.

  • tilt up issue

    Looks like tilt up isn't parsing args right...

    tilt-dev
    user must specify the supported backend
    
    Usage: hack/tilt/setup.sh [i=ip_family] [b=backend] [m=deployment_model]
            -m set the KPNG deployment model, can either be:
                * split-process-per-node [legacy/debug] -> (To run KPNG server + client in separate containers/processes per node)
                * single-process-per-node [default] -> (To run KPNG server + client in a single container/process per node)
    Example:
             hack/tilt/setup.sh i=ipv4 b=iptables
  • NFT Proxy should send the node name

    @daman1807 showed me this recently when moving ipvs to fullstate... the NFT proxy doesn't send the node name when registering.

    • we should error in the brain when the backend doesn't send this on register
    • we should fix the issue in the NFT backend also
    • part of why I filed this issue was so that I could say the word "brain"