Cluster API k3s

Cluster API bootstrap provider k3s (CABP3) is a component of Cluster API that is responsible for generating a cloud-init script to turn a Machine into a Kubernetes Node; this implementation brings up k3s clusters instead of full Kubernetes clusters.

CABP3 is the bootstrap component of Cluster API for k3s and brings in the following CRDs and controllers:

  • k3s bootstrap provider (KThrees, KThreesTemplate)

Cluster API ControlPlane provider k3s (CACP3) is a component of Cluster API that is responsible for managing the lifecycle of control plane machines for k3s; this implementation brings up k3s clusters instead of full Kubernetes clusters.

CACP3 is the control plane component of Cluster API for k3s and brings in the following CRDs and controllers:

  • k3s controlplane provider (KThreesControlPlane)

Together, these two components make up Cluster API k3s.
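
For orientation, a rough sketch of how these resources look in a cluster template is shown below. The KThreesControlPlane version comes from the AWS sample discussed in the comments further down, while the KThreesConfigTemplate layout just follows the usual Cluster API template pattern and may not match the CRD schema exactly; check the samples/ directory for authoritative manifests.

    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KThreesControlPlane
    metadata:
      name: example-control-plane
    spec:
      version: v1.21.5+k3s2   # fully qualified k3s version (see the versioning discussion below)
      # replicas, infrastructure reference and k3s config omitted -- see samples/
    ---
    apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
    kind: KThreesConfigTemplate
    metadata:
      name: example-workers
    spec:
      template:
        spec:
          agentConfig:
            nodeName: '{{ ds.meta_data.local_hostname }}'   # as in the KThreesConfig shown in the comments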

Testing it out

Warning: The project and documentation are at an early stage; it is assumed that a user of this provider is already familiar with Cluster API.

Prerequisites

Check out the Cluster API Quickstart page to see the prerequisites for Cluster API.

The three main pieces are:

  1. Bootstrap cluster. In the samples/azure/azure-setup.sh script, I use k3d, but feel free to use kind as well (a minimal kind config is sketched at the end of this section).

  2. clusterctl. Please check out the Cluster API Quickstart for instructions.

  3. Infrastructure Specific Prerequisites:

Cluster API k3s has been tested on AWS, Azure, and AzureStackHCI environments.

To try out the Azure flow, fork the repo and look at samples/azure/azure-setup.sh.

To try out the AWS flow, fork the repo and look at samples/aws/aws-setup.sh.
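
If you use kind for the bootstrap cluster, a plain default cluster is enough; the config below is only a minimal, optional sketch (a default kind create cluster works just as well):

    # bootstrap-kind.yaml -- optional; a plain `kind create cluster` is equally fine
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
      - role: control-plane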

Known Issues

Roadmap

  • Support for External Databases
  • Fix Token Logic
  • Setup CAPV samples
  • Clean up Control Plane Provider Code
  • Post an issue!
Comments
  • Control plane load balancer SSL health check fails

    After applying the sample config:

    $ kubectl apply -f samples/aws/k3s-cluster.yaml
    

    cluster-api-k3s successfully creates the VPC, the control plane instance, and the load balancer.

    However, the load balancer doesn't like how the apiserver on the control plane machine talks HTTPS:

    (screenshots of the failing health check omitted)

    When I change the health check type to TCP it works: the rest of the CAPI machinery successfully connects to the apiserver and proceeds with bootstrapping the worker node just fine. The CA and the certificates appear to be correct.
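
    For reference, a minimal sketch of that workaround on the AWSCluster side (the same healthCheckProtocol field appears in the versioning sample further down; the resource name here is illustrative):

    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    metadata:
      name: k3-test
    spec:
      controlPlaneLoadBalancer:
        healthCheckProtocol: TCP   # the default SSL health check fails against the k3s apiserver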

  • Mark DataSecretAvailable condition to true when setting dataSecretName

    Closes #15

    I manually tried to edit the status subresource with the kubectl edit-status plugin, and I can confirm that once the DataSecretAvailable condition flips, the Ready condition flips too and the cluster is fully initialized:

    $ clusterctl describe cluster k3-test-2
    NAME                                                          READY  SEVERITY  REASON  SINCE  MESSAGE
    Cluster/k3-test-2                                             True                     143m
    ├─ClusterInfrastructure - AWSCluster/k3-test-2                True                     5h45m
    ├─ControlPlane - KThreesControlPlane/k3-test-2-control-plane  True                     143m
    │ └─Machine/k3-test-2-control-plane-4s42f                     True                     5h45m
    └─Workers
      └─MachineDeployment/k3-test-2-md-0                          True                     141m
        └─Machine/k3-test-2-md-0-6454966fcd-mvbnt                 True                     143m
    

    (I still need to actually test the code by building my branch into an image and trying it out)

  • Fix networking in aws k3s-template

    I noticed that networking wasn't working for pods scheduled on the worker node.

    k3s uses flannel by default, which implements an overlay network using VXLAN, which uses a UDP port (the port number is documented in https://docs.k3s.io/installation/requirements#networking).

    I also noticed that using the AWS CCM is incompatible with servicelb (see #20) starting with k3s versions >= 1.23.

    IIUC how networking is supposed to work here, servicelb is meant as a "low tech" load balancer for when you expose the node directly to the public internet. But since the sample also sets up the AWS CCM, I think we should disable servicelb as well (a sketch of both changes follows this item).


    I tested this with this sample app:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 2 # tells deployment to run 2 pods matching the template
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.14.2
            ports:
            - containerPort: 80
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - nginx
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: shell
    spec:
      selector:
        matchLabels:
          app: shell
      replicas: 2 # tells deployment to run 2 pods matching the template
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: shell
        spec:
          containers:
          - name: shell
            image: debian
            command: [ "/bin/bash", "-c", "--" ]
            args: [ "apt-get update; apt install curl; while true; do sleep 30; done;" ]
          terminationGracePeriodSeconds: 1
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - shell
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
    spec:
      selector:
        app: nginx
      ports:
      - port: 80
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: demo
    spec:
      ingressClassName: traefik
      rules:
      - http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
    

    It deploys two nginx replicas and two shell pods, spread across both nodes (control plane and worker):

    • I manually issued requests to the nginx service from both shell pods, and also directly to the pod IPs of the two replicas.
    • I verified that pods on both nodes can access the internet.
    • I verified that I could access my nginx demo pod through the external load balancer:
    $ kubectl --kubeconfig <(clusterctl get kubeconfig k3-test-24) get ingress -A
    NAMESPACE   NAME   CLASS     HOSTS   ADDRESS                                                                   PORTS   AGE
    default     demo   traefik   *       ad248e6fc6f2f4ceca6db3a90bbbc269-1341700786.us-east-1.elb.amazonaws.com   80      9s
    
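    Roughly, the kind of change involved looks like the sketch below (this is not the actual diff from this PR; cniIngressRules is a standard CAPA field, and the port is the flannel VXLAN port from the k3s docs linked above):

    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    metadata:
      name: k3-test
    spec:
      network:
        cni:
          cniIngressRules:
            - description: flannel VXLAN
              protocol: udp
              fromPort: 8472   # UDP, per docs.k3s.io/installation/requirements#networking
              toPort: 8472

    On the k3s side, disabling servicelb would then surface as a disable: [servicelb] entry in the generated /etc/rancher/k3s/config.yaml (the same file that shows up in the cloud-controller issue below).
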
  • `Ready` and `DataSecretAvailable` conditions are false despite there being a `dataSecretName`

      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KThreesConfig
      metadata:
        creationTimestamp: "2022-12-09T11:00:38Z"
        generation: 2
        labels:
          cluster.x-k8s.io/cluster-name: k3-test-2
          cluster.x-k8s.io/control-plane: ""
        name: k3-test-2-control-plane-pl7ln
        namespace: default
        ownerReferences:
        - apiVersion: cluster.x-k8s.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: Machine
          name: k3-test-2-control-plane-4s42f
          uid: ad4312fa-bf76-4b9e-999a-3cc6e0a12182
        resourceVersion: "1142181"
        uid: 5390ddd9-f4b3-4ec7-b94a-3156b365dfce
      spec:
        agentConfig:
          nodeName: '{{ ds.meta_data.local_hostname }}'
        serverConfig: {}
        version: v1.21.5+k3s2
      status:
        conditions:
        - lastTransitionTime: "2022-12-09T11:00:38Z"
          reason: WaitingForControlPlaneAvailable
          severity: Info
          status: "False"
          type: Ready
        - lastTransitionTime: "2022-12-09T11:00:38Z"
          status: "True"
          type: CertificatesAvailable
        - lastTransitionTime: "2022-12-09T11:00:38Z"
          reason: WaitingForControlPlaneAvailable
          severity: Info
          status: "False"
          type: DataSecretAvailable
        dataSecretName: k3-test-2-control-plane-pl7ln   # <----------
        observedGeneration: 2
        ready: true
    

    Quickly skimming through the sources, I noticed that the code that should turn this condition to true has been commented out since the initial commit and never uncommented:

    	case configOwner.DataSecretName() != nil && (!config.Status.Ready || config.Status.DataSecretName == nil):
    		config.Status.Ready = true
    		config.Status.DataSecretName = configOwner.DataSecretName()
    		//conditions.MarkTrue(config, bootstrapv1.DataSecretAvailableCondition)
    		return ctrl.Result{}, nil
    
  • Sample for Nutanix CAPI provider

    Add a first working sample for Nutanix CAPI Provider (CAPX)

    kubectl get nodes
    NAME                        STATUS   ROLES                       AGE     VERSION
    capx-control-plane-vxvwr   Ready    control-plane,etcd,master   9m41s   v1.24.8+k3s1
    capx-mt-0-chfx4            Ready    <none>                      8m15s   v1.24.8+k3s1
    capx-mt-0-jsx7z            Ready    <none>                      8m25s   v1.24.8+k3s1
    
  • custom cni (disable flannel)

    According to https://docs.k3s.io/installation/network-options#custom-cni, if we want to install a custom CNI we need to pass --flannel-backend=none.

    In order to do that, we need to surface the option in K3sServerConfig and in the relevant CRDs:

    type K3sServerConfig struct {
    	DisableCloudController    bool     `json:"disable-cloud-controller,omitempty"`
    	KubeAPIServerArgs         []string `json:"kube-apiserver-arg,omitempty"`
    	KubeControllerManagerArgs []string `json:"kube-controller-manager-arg,omitempty"`
    	TLSSan                    []string `json:"tls-san,omitempty"`
    ....
    }
    
    
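    For reference, the end state being asked for in the rendered k3s config is just this fragment of /etc/rancher/k3s/config.yaml (the file edited by hand in the cloud-controller issue below); how to expose it through K3sServerConfig and the CRDs is exactly what this issue is about:

    # target fragment of /etc/rancher/k3s/config.yaml once the option is surfaced
    flannel-backend: none
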
  • cloud controller port clash on k3s >=v1.23.x

    When using k3s with version >= v1.23.x I get this error when spinning up the cloud controller (which blocks any other component due to the cloud controller readiness taint):

    cloud-controller-manager
    I1124 09:28:48.381554 1 serving.go:313] Generated self-signed cert in-memory
    cloud-controller-manager
    failed to create listener: failed to listen on 0.0.0.0:10258: listen tcp 0.0.0.0:10258: bind: address already in use
    

    It turns out this is caused by a change in k3s: https://github.com/k3s-io/k3s/issues/6554

    I tested the workaround mentioned in that ticket by manually editing /etc/rancher/k3s/config.yaml:

     cluster-init: true
     disable-cloud-controller: true
     kube-apiserver-arg:
     - anonymous-auth=true
     - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384
     kube-controller-manager-arg:
     - cloud-provider=external
     kubelet-arg:
     - cloud-provider=external
    +kube-cloud-controller-manager-arg:
    +- secure-port=0
     node-name: 'ip-10-0-193-85.ec2.internal'
     tls-san:
     - k3-test-16-apiserver-1867539897.us-east-1.elb.amazonaws.com
    

    A quick look at the server config schema doesn't reveal any trick I can use to set that arg:

    type K3sServerConfig struct {
    	DisableCloudController    bool     `json:"disable-cloud-controller,omitempty"`
    	KubeAPIServerArgs         []string `json:"kube-apiserver-arg,omitempty"`
    	KubeControllerManagerArgs []string `json:"kube-controller-manager-arg,omitempty"`
    	TLSSan                    []string `json:"tls-san,omitempty"`
    	BindAddress               string   `json:"bind-address,omitempty"`
    	HttpsListenPort           string   `json:"https-listen-port,omitempty"`
    	AdvertiseAddress          string   `json:"advertise-address,omitempty"`
    	AdvertisePort             string   `json:"advertise-port,omitempty"`
    	ClusterCidr               string   `json:"cluster-cidr,omitempty"`
    	ServiceCidr               string   `json:"service-cidr,omitempty"`
    	ClusterDNS                string   `json:"cluster-dns,omitempty"`
    	ClusterDomain             string   `json:"cluster-domain,omitempty"`
    	DisableComponents         []string `json:"disable,omitempty"`
    	ClusterInit               bool     `json:"cluster-init,omitempty"`
    	K3sAgentConfig            `json:",inline"`
    }
    

    Should I add KubeCloudControllerManagerArgs?

  • k3s vs k8s versioning

    With CAPA you need to pass a k8s version string like 1.21.5, while with cluster-api-k3s you need to pass the fully qualified version including the k3s revision, like v1.21.5+k3s2.

    This breaks the automatic AMI image lookup logic and requires you to fiddle with imageLookupFormat or to add an explicit AMI ID (which is region-specific).

    Example:

    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KThreesControlPlane
    metadata:
      name: k3-test-13-control-plane
    spec:
      version: v1.21.5+k3s2
    ...
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    metadata:
      name: k3-test-13
    spec:
      controlPlaneLoadBalancer:
        healthCheckProtocol: TCP
      imageLookupBaseOS: ubuntu-20.04
      imageLookupFormat: capa-ami-{{.BaseOS}}-1.21.5-*
    ...
    

    I wonder if there is a more ergonomic way to use cluster-api-k3s.

    • Does it even make sense to use capa-ami-..... images to run k3s? These images have been tested and tuned for kubeadm-based k8s, but that doesn't necessarily mean k3s would benefit from that.
    • Should cluster-api-k3s autodiscover the latest k3s revision (and offer the possibility to pin one if the user wants)?
    • Should we just document how to use imageLookupOrg and imageLookupFilter to find a generic Ubuntu image? (a rough example follows this list)
    • Does k3s come with its own blessed AMIs?
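
    As a point of reference for the third option, a generic Ubuntu lookup on the AWSCluster could look roughly like this (imageLookupOrg 099720109477 is Canonical's AWS account; the format string is only illustrative):

    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    metadata:
      name: k3-test
    spec:
      imageLookupOrg: "099720109477"      # Canonical's AWS account
      imageLookupBaseOS: ubuntu-20.04
      imageLookupFormat: ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*
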
  • Releasing via git versioning/tagging and build automation

    I saw in the Makefile that the container images are built, tagged, and pushed to ghcr.io here and here, and I was trying to understand which git commits those container image tags refer to.

    I think it would be good to have some sort of release process based on git tags (or something similar), and then some automation (e.g. GitHub Actions) to trigger the container image builds and pushes, so that source and image artifacts stay cross-referenced. What do you think?

    PS. Thanks for this project.

  • more a question of running on vsphere

    Great job on the initial tryout of creating a framework for k3s using Cluster API. Is this something that you are planning to eventually contribute upstream? Is it OK if I ask for help when I try to integrate it with CAPV, so that I can run k3s in a vSphere environment?
