cluster registration and lifecycle

Cluster Registration

Contains controllers that support the registration of managed clusters to a hub to place them under management.

Community, discussion, contribution, and support

Check the CONTRIBUTING Doc for how to contribute to the repo.


Getting Started

Prerequisites

These instructions assume:

  • You have a running Kubernetes cluster
  • You have the KUBECONFIG environment variable set to a kubeconfig file that gives you the cluster-admin role on that cluster

Note: The clocks of the hub and the managed clusters should be synchronized.

Deploy Hub

  1. Run make deploy-hub
  2. Run make deploy-webhook

Deploy Spoke

  1. Run make bootstrap-secret
  2. Run make deploy-spoke
Comments
  • Add V1beta1CSRAPICompatibility feature gate for registration-controller

    In some cases, the hub cluster runs a Kubernetes version earlier than 1.19 and serves certificates.k8s.io only as v1beta1. In that case the V1beta1CSRAPICompatibility feature gate must be enabled. Otherwise, the registration-controller logs the following errors:

    W0630 07:07:55.361239       1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.CertificateSigningRequest: the server could not find the requested resource
    E0630 07:07:55.361260       1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.CertificateSigningRequest: failed to list *v1.CertificateSigningRequest: the server could not find the requested resource
    

    Signed-off-by: Promacanthus [email protected]
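
The version rule described above can be sketched as a small helper. This is only an illustration, not code from the PR: the function name csrAPIVersion and the idea of deciding from a version string such as "v1.18.3" are assumptions, and a real controller would more likely ask the API server which versions it serves.

```go
package main

import "fmt"

// csrAPIVersion (hypothetical helper) returns the certificates.k8s.io version
// to expect on a hub whose Kubernetes version looks like "v1.18.3".
// certificates.k8s.io/v1 is served from Kubernetes 1.19 onward.
func csrAPIVersion(kubeVersion string) (string, error) {
	var major, minor int
	if _, err := fmt.Sscanf(kubeVersion, "v%d.%d", &major, &minor); err != nil {
		return "", fmt.Errorf("parse %q: %w", kubeVersion, err)
	}
	if major > 1 || (major == 1 && minor >= 19) {
		return "v1", nil
	}
	return "v1beta1", nil
}

func main() {
	for _, v := range []string{"v1.18.3", "v1.19.0", "v1.23.5"} {
		api, err := csrAPIVersion(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The feature gate only matters when the hub falls back to v1beta1.
		fmt.Printf("%s -> certificates.k8s.io/%s (gate needed: %t)\n", v, api, api == "v1beta1")
	}
}
```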

  • Refactor and make client cert controller reusable

    As @qiujian16 suggested, it is split into two PRs to support addon registration. This is the first one.

    The related issue: https://github.com/open-cluster-management/backlog/issues/10714

    Signed-off-by: Yang Le [email protected]

  • add controller to set default clusterset label

    Add a controller that sets the default cluster set label on managed clusters that do not have a cluster set label. The label is cluster.open-cluster-management.io/clusterset:default.

    Signed-off-by: ycyaoxdu [email protected]
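
A minimal sketch of the defaulting rule, assuming labels are handled as a plain map. The function name ensureClusterSetLabel is hypothetical; only the label key and value come from the description above.

```go
package main

import "fmt"

const (
	clusterSetLabel   = "cluster.open-cluster-management.io/clusterset"
	defaultClusterSet = "default"
)

// ensureClusterSetLabel (hypothetical helper) returns the labels to store on a
// managed cluster and reports whether the default cluster set label was added.
// Clusters that already carry a cluster set label are left unchanged.
func ensureClusterSetLabel(labels map[string]string) (map[string]string, bool) {
	if _, ok := labels[clusterSetLabel]; ok {
		return labels, false
	}
	// Copy so the caller's map is never mutated.
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[clusterSetLabel] = defaultClusterSet
	return out, true
}

func main() {
	labels, changed := ensureClusterSetLabel(map[string]string{"env": "dev"})
	fmt.Println(labels[clusterSetLabel], changed)

	labels, changed = ensureClusterSetLabel(map[string]string{clusterSetLabel: "prod"})
	fmt.Println(labels[clusterSetLabel], changed)
}
```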

  • Report all res information to managedcluster status like cpu and memory

    Signed-off-by: suigh [email protected]

    This change reports GPU information to the hub cluster by updating the status of the managedcluster. We plan to use this GPU information to make placement decisions in the future.

    nvidia.com/gpu is used as the resource name for GPU information because:

    1. It is the name already used by NVIDIA and Kubernetes.
    2. If other GPUs (such as AMD GPUs) are supported later, the current NVIDIA-specific name does not need to change, so there is no compatibility issue.

    The output of kc describe managedclusters now looks as follows, with nvidia.com/gpu displayed:

    Name:         cluster2
     ......
    Spec:
      Hub Accepts Client:      true
      Lease Duration Seconds:  60
      ......
    Status:
      Allocatable:
        Cpu:             64
        Memory:          256168Mi
        nvidia.com/gpu:  1
      Capacity:
        Cpu:             64
        Memory:          256468Mi
        nvidia.com/gpu:  1
      Conditions:
    

    BTW: Many thanks to Jian (@zhujian7); without his help, I could not have built the test environment at all.
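
One way to picture how per-node resources could roll up into the cluster totals shown above is a simple sum over nodes. This is a sketch under assumptions, not the PR's code: the nodeResources type and sumResources function are hypothetical, quantities are simplified to integers, and the source does not state that totals are computed by summation.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeResources maps a resource name (for example "cpu" or "nvidia.com/gpu")
// to the amount available on one node. Integers keep the sketch simple.
type nodeResources map[string]int64

// sumResources (hypothetical helper) adds up every resource across all nodes.
// Extended resources such as nvidia.com/gpu are treated like any other name.
func sumResources(nodes []nodeResources) nodeResources {
	total := nodeResources{}
	for _, n := range nodes {
		for name, qty := range n {
			total[name] += qty
		}
	}
	return total
}

func main() {
	// Made-up node data chosen to match the totals in the sample output.
	nodes := []nodeResources{
		{"cpu": 32, "nvidia.com/gpu": 1},
		{"cpu": 32},
	}
	total := sumResources(nodes)

	names := make([]string, 0, len(total))
	for name := range total {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, total[name])
	}
}
```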

  • add taints controller

    Signed-off-by: JiaHaoWei [email protected]

    Implement the taints described in https://github.com/open-cluster-management-io/community/issues/48, and add unit and integration tests.

  • Add details (metric) about Spoke cluster

    This is a WIP PR. Since it could generate a lot of discussion, we wanted to raise it now and capture all of that discussion in this PR.

    Goal: We aim to capture a metric about each Spoke cluster that joins the Hub, in Prometheus exposition format, declared as shown:

    // cluster_id = OCP ID of the cluster (still to be resolved for EKS, etc.)
    // type = Kubernetes distribution, e.g. OCP, EKS
    // version = distribution version
    // cluster_infrastructure_provider = value "Type" from cluster_infrastructure_provider
    // hub_id = cluster_id of the hub server
    // cluster_name = user display name of the cluster (defaults to the ID if not provided)
    var managedClusterMetric = prometheus.NewGaugeVec(prometheus.GaugeOpts{
    	Name: "a_managed_cluster",
    	Help: "Managed Cluster being managed by ACM Hub.",
    }, []string{"cluster_name", "cluster_id", "type", "version", "cluster_infrastructure_provider", "hub_id"})
    

    Highlights:

    1. A metrics endpoint is created so that it can be scraped by Prometheus. For this we have exposed a new API endpoint in https://github.com/bjoydeep/registration/tree/master/pkg/hub/custommetrics. @deads2k and @qiujian16 have told us that we should NOT expose this endpoint and should instead use the default port 8443, which is available for this purpose and already exposes other key metrics about the performance of the controller. The move to 8443 is WIP.
    2. The code is fenced off behind an environment variable: https://github.com/bjoydeep/registration/blob/master/pkg/hub/manager.go#L88-#L102 . Unless the environment variable is explicitly added to the deployment YAML, this code is not called.
    3. We are working on adding e2e tests.
    4. As of now, we can only gather the cluster name of the spoke cluster (which may or may not be the OpenShift ClusterID for OpenShift clusters). We will propose other changes so that we can get the ClusterID for OpenShift and *KS clusters in a consistent manner, but that can be done elsewhere. We have also captured the ClusterID of the Spoke cluster using the OpenShift ClusterVersionOperator.
    5. We are trying to see which metrics are gathered at the default port 8443 of the registration controller pod and are running into security issues, which we will try to work through.
  • remove role/rolebinding finalizer when manifestworks are cleaned

    Remove the finalizer on RBAC resources when all manifestworks are cleaned up (similar to the PR https://github.com/open-cluster-management/work/pull/23), but I think it is better to move this to the registration repo.
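
The rule can be sketched as a pure function over a finalizer list. The finalizer name and the removeFinalizerIfClean helper are hypothetical, and counting remaining manifestworks with a plain integer is a simplification for the example.

```go
package main

import "fmt"

// cleanupFinalizer is a hypothetical finalizer name used only in this sketch.
const cleanupFinalizer = "example.io/manifestwork-cleanup"

// removeFinalizerIfClean (hypothetical helper) drops cleanupFinalizer from a
// role or rolebinding once no manifestworks remain; otherwise it returns the
// finalizers unchanged so deletion keeps waiting.
func removeFinalizerIfClean(finalizers []string, remainingWorks int) []string {
	if remainingWorks > 0 {
		return finalizers
	}
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != cleanupFinalizer {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	finalizers := []string{cleanupFinalizer, "example.io/other"}
	fmt.Println(removeFinalizerIfClean(finalizers, 2)) // works remain: unchanged
	fmt.Println(removeFinalizerIfClean(finalizers, 0)) // all cleaned: finalizer removed
}
```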

  • check e2e mutatingWebhookDeployment replicas

    Fix the error below, which occurs when the e2e cases run before the mutatingWebhook Deployment is ready:

    Taints update check Check the taint to update according to the condition status 
      Should update taints automatically
      /root/go/src/open-cluster-management.io/registration/test/e2e/taint_update_test.go:43
    
    • Failure in Spec Setup (BeforeEach) [0.007 seconds]
    Taints update check
    /root/go/src/open-cluster-management.io/registration/test/e2e/taint_update_test.go:16
      Check the taint to update according to the condition status
      /root/go/src/open-cluster-management.io/registration/test/e2e/taint_update_test.go:17
        Should update taints automatically [BeforeEach]
        /root/go/src/open-cluster-management.io/registration/test/e2e/taint_update_test.go:43
    
        Unexpected error:
            <*errors.StatusError | 0xc0003941e0>: {
                ErrStatus: {
                    TypeMeta: {Kind: "", APIVersion: ""},
                    ListMeta: {
                        SelfLink: "",
                        ResourceVersion: "",
                        Continue: "",
                        RemainingItemCount: nil,
                    },
                    Status: "Failure",
                    Message: "Internal error occurred: failed calling webhook \"managedclustermutators.admission.cluster.open-cluster-management.io\": failed to call webhook: the server is currently unable to handle the request",
                    Reason: "InternalError",
                    Details: {
                        Name: "",
                        Group: "",
                        Kind: "",
                        UID: "",
                        Causes: [
                            {
                                Type: "",
                                Message: "failed calling webhook \"managedclustermutators.admission.cluster.open-cluster-management.io\": failed to call webhook: the server is currently unable to handle the request",
                                Field: "",
                            },
                        ],
                        RetryAfterSeconds: 0,
                    },
                    Code: 500,
                },
            }
            Internal error occurred: failed calling webhook "managedclustermutators.admission.cluster.open-cluster-management.io": failed to call webhook: the server is currently unable to handle the request
        occurred
    
        /root/go/src/open-cluster-management.io/registration/test/e2e/taint_update_test.go:35
    

    Signed-off-by: haoqing0110 [email protected]

  • Support install addons in customized namespaces

    Signed-off-by: xuezhaojun [email protected]

    Epic: https://github.com/stolostron/backlog/issues/23093

    User-story: https://github.com/stolostron/backlog/issues/23929

  • fix cluster labels update conflict.

    Fix the update conflict on cluster labels. The AddOnFeatureDiscoveryController failed to sync with errors such as:

    E1122 13:49:38.188870       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-4/work-manager", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-4": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.423603       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-3/governance-policy-framework", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-3": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.438568       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-1/config-policy-controller", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-1": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.451226       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-3/config-policy-controller", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-3": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.460923       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-1/config-policy-controller", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-1": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.477055       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-1/governance-policy-framework", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-1": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.488734       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-3", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-3": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.503787       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-1/governance-policy-framework", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-1": the object has been modified; please apply your changes to the latest version and try again
    E1122 13:49:45.511774       1 base_controller.go:270] "AddOnFeatureDiscoveryController" controller failed to sync "ocm-controlplane-1-mc-3", err: Operation cannot be fulfilled on managedclusters.cluster.open-cluster-management.io "ocm-controlplane-1-mc-3": the object has been modified; please apply your changes to the latest version and try again
    

    Signed-off-by: morvencao [email protected]

  • [RFE] during import cluster auto detect microshift vendor/version

    MicroShift is a project that is exploring how OpenShift and Kubernetes can be optimized for small form factor and edge computing.

    • https://microshift.io/docs/home/
    • https://microshift.io/docs/getting-started/

    MicroShift is currently recognized as OpenShiftV3 when a cluster is registered. This affects the ability of Observability to install the monitoring stack for MicroShift.

  • [WIP] Refactor cluster deletion process

    1. Monitor addons and works during deletion
    2. Delete the namespace when a special label is not set
    3. Allow adding other resources to monitor before the deletion process starts

    Signed-off-by: Jian Qiu [email protected]

  • Add label to all kube resources that are created/managed by registration

    We need to add a label to all kube resources that are created or managed by registration, so that we can use this label to filter the resources that the informers watch and thereby reduce resource usage.
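
The filtering idea can be illustrated in memory as follows. The label key, the resource type and the filterByLabel helper are all hypothetical; the issue does not name a label. In a real controller the filter would normally be a label selector on the informer's list and watch calls, so that unlabeled objects are never cached.

```go
package main

import "fmt"

// createdByLabel is a hypothetical label key; the issue does not name one.
const createdByLabel = "example.io/created-by"

type resource struct {
	Name   string
	Labels map[string]string
}

// filterByLabel (hypothetical helper) keeps only the resources whose labels
// contain key=value, mimicking what a label selector does on the server side.
func filterByLabel(items []resource, key, value string) []resource {
	var out []resource
	for _, r := range items {
		if r.Labels[key] == value {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	items := []resource{
		{Name: "hub-kubeconfig-secret", Labels: map[string]string{createdByLabel: "registration"}},
		{Name: "unrelated-secret"},
	}
	for _, r := range filterByLabel(items, createdByLabel, "registration") {
		fmt.Println(r.Name)
	}
}
```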

  • Add the ability to auth to managed clusters without using CSR

    For some reason we are not able to enable CSR in our clusters. Could you add another way to authenticate managed clusters without using CSR, such as an sa-token or a secret with a kubeconfig?
