Commands used to deploy a hub and a managed cluster using minikube (the hub cluster was started first with the same kvm2 driver; see the profile list below):

minikube start --driver=kvm2 --profile=hub
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cluster-registry/master/cluster-registry-crd.yaml
git clone https://github.com/open-cluster-management/registration-operator.git
cd ./registration-operator
git checkout master
make deploy-hub
kubectl wait deployments --all --for condition=Available -n olm --timeout 1m
kubectl wait deployments --all --for condition=Available -n open-cluster-management
kubectl wait deployments --all --for condition=Available -n open-cluster-management-hub --timeout 2m
minikube start --driver=kvm2 --profile=cluster1
kubectl config view --flatten --context=hub --minify >/tmp/hub-config
kubectl config view --flatten --context=cluster1 --minify >/tmp/cluster1-config
kubectl config use-context cluster1
export KLUSTERLET_KIND_KUBECONFIG=/tmp/cluster1-config
export KIND_CLUSTER=cluster1
export HUB_KIND_KUBECONFIG=/tmp/hub-config
make deploy-spoke-kind
kubectl wait deployments --all --for condition=Available -n open-cluster-management
kubectl wait deployments --all --for condition=Available -n open-cluster-management-agent
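
For reference, the hub-side acceptance that normally follows spoke registration, per the standard OCM flow (a sketch; the CSR name is generated, so <csr-name> below is a placeholder):

kubectl config use-context hub
kubectl get csr                        # find the pending CSR for cluster1
kubectl certificate approve <csr-name>
kubectl patch managedcluster cluster1 --type=merge -p '{"spec":{"hubAcceptsClient":true}}'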
Command used to gather state (pods in the agent namespace on cluster1):

$ kubectl get pods -n open-cluster-management-agent
NAME READY STATUS RESTARTS AGE
klusterlet-registration-agent-85f68b68c4-hr7ln 1/1 Running 0 32m
klusterlet-registration-agent-85f68b68c4-pkg98 1/1 Running 0 32m
klusterlet-registration-agent-85f68b68c4-t756z 1/1 Running 0 32m
klusterlet-work-agent-786b47fb99-7cp4x 0/1 CrashLoopBackOff 7 32m
klusterlet-work-agent-786b47fb99-8wcdb 0/1 CrashLoopBackOff 7 32m
klusterlet-work-agent-786b47fb99-vx2qk 0/1 CrashLoopBackOff 7 32m
$ minikube profile list
|----------|-----------|---------|----------------|------|---------|---------|-------|
| Profile | VM Driver | Runtime | IP | Port | Version | Status | Nodes |
|----------|-----------|---------|----------------|------|---------|---------|-------|
| cluster1 | kvm2 | docker | 192.168.39.92 | 8443 | v1.20.2 | Running | 1 |
| hub | kvm2 | docker | 192.168.39.156 | 8443 | v1.20.2 | Running | 1 |
|----------|-----------|---------|----------------|------|---------|---------|-------|
$ kubectl describe pod klusterlet-work-agent-786b47fb99-7cp4x -n open-cluster-management-agent
Name: klusterlet-work-agent-786b47fb99-7cp4x
Namespace: open-cluster-management-agent
Priority: 0
Node: cluster1/192.168.39.92
Start Time: Tue, 13 Apr 2021 08:16:03 -0700
Labels: app=klusterlet-manifestwork-agent
pod-template-hash=786b47fb99
Annotations: <none>
Status: Running
IP: 172.17.0.16
IPs:
IP: 172.17.0.16
Controlled By: ReplicaSet/klusterlet-work-agent-786b47fb99
Containers:
klusterlet-manifestwork-agent:
Container ID: docker://0247a6d965fc6040fab94d2cacf7bfbea915ac92b3b7cb54ad1e2c54da05ceb9
Image: quay.io/open-cluster-management/work
Image ID: docker-pullable://quay.io/open-cluster-management/work@sha256:e60ddb4fdcb24932816118f564748126ff032f49354f4d430d2d46a9df22a3f5
Port: <none>
Host Port: <none>
Args:
/work
agent
--spoke-cluster-name=cluster1
--hub-kubeconfig=/spoke/hub-kubeconfig/kubeconfig
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 13 Apr 2021 08:46:26 -0700
Finished: Tue, 13 Apr 2021 08:47:36 -0700
Ready: False
Restart Count: 7
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/spoke/hub-kubeconfig from hub-kubeconfig-secret (ro)
/var/run/secrets/kubernetes.io/serviceaccount from klusterlet-work-sa-token-ls59n (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
hub-kubeconfig-secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub-kubeconfig-secret
Optional: false
klusterlet-work-sa-token-ls59n:
Type: Secret (a volume populated by a Secret)
SecretName: klusterlet-work-sa-token-ls59n
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 40m default-scheduler Successfully assigned open-cluster-management-agent/klusterlet-work-agent-786b47fb99-7cp4x to cluster1
Normal Pulled 39m kubelet Successfully pulled image "quay.io/open-cluster-management/work" in 29.427044917s
Normal Pulled 38m kubelet Successfully pulled image "quay.io/open-cluster-management/work" in 1.508110447s
Normal Pulled 37m kubelet Successfully pulled image "quay.io/open-cluster-management/work" in 1.442129606s
Normal Pulled 33m kubelet Successfully pulled image "quay.io/open-cluster-management/work" in 1.516616046s
Normal Started 33m (x4 over 39m) kubelet Started container klusterlet-manifestwork-agent
Warning Unhealthy 25m kubelet Readiness probe failed: Get "https://172.17.0.16:8443/healthz": dial tcp 172.17.0.16:8443: connect: connection refused
Normal Pulled 25m kubelet Successfully pulled image "quay.io/open-cluster-management/work" in 1.41603949s
Normal Created 25m (x5 over 39m) kubelet Created container klusterlet-manifestwork-agent
Normal Pulling 10m (x8 over 40m) kubelet Pulling image "quay.io/open-cluster-management/work"
Warning BackOff 5m17s (x72 over 37m) kubelet Back-off restarting failed container
$ kubectl logs -p klusterlet-work-agent-786b47fb99-7cp4x -n open-cluster-management-agent
W0413 15:46:26.585511 1 cmd.go:204] Using insecure, self-signed certificates
I0413 15:46:27.236528 1 observer_polling.go:159] Starting file observer
I0413 15:46:27.309689 1 builder.go:240] work-agent version v2.3.0-2021-03-29-10-25-04-2-g4832b07-4832b07
W0413 15:46:27.803860 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256' detected.
W0413 15:46:27.803995 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256' detected.
I0413 15:46:27.807585 1 leaderelection.go:243] attempting to acquire leader lease open-cluster-management-agent/work-agent-lock...
I0413 15:46:27.815226 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0413 15:46:27.816318 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0413 15:46:27.816743 1 secure_serving.go:197] Serving securely on [::]:8443
I0413 15:46:27.817189 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0413 15:46:27.815911 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:46:27.817987 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:46:27.815926 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:46:27.819358 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:46:27.816294 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/serving-cert-328065387/tls.crt::/tmp/serving-cert-328065387/tls.key
I0413 15:46:27.917412 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0413 15:46:27.920012 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:46:27.920228 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:47:36.163633 1 leaderelection.go:253] successfully acquired lease open-cluster-management-agent/work-agent-lock
I0413 15:47:36.169819 1 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"open-cluster-management-agent", Name:"work-agent-lock", UID:"972836a0-5311-4022-8f44-09e7170d5a61", APIVersion:"v1", ResourceVersion:"3585", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' klusterlet-work-agent-786b47fb99-7cp4x_ce3f539c-5804-4d88-997f-7ba502de845d became leader
W0413 15:47:36.188407 1 builder.go:99] graceful termination failed, controllers failed with error: invalid configuration: [unable to read client-cert /spoke/hub-kubeconfig/tls.crt for default-auth due to open /spoke/hub-kubeconfig/tls.crt: no such file or directory, unable to read client-key /spoke/hub-kubeconfig/tls.key for default-auth due to open /spoke/hub-kubeconfig/tls.key: no such file or directory]
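
The error points at missing client certs in the mounted hub kubeconfig secret. A sketch of one way to confirm this (secret name taken from the describe output above):

$ kubectl -n open-cluster-management-agent get secret hub-kubeconfig-secret -o jsonpath='{.data}'
# If registration completed, the data should include kubeconfig, tls.crt and tls.key;
# missing tls.crt/tls.key would suggest the hub never issued the client certificate
# (e.g. the CSR was not approved or the ManagedCluster was not accepted).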
$ go version
go version go1.16.2 linux/amd64
$ kustomize version
{Version:kustomize/v4.0.5 GitCommit:9e8e7a7fe99ec9fbf801463e8607928322fc5245 BuildDate:2021-03-08T20:53:03Z GoOs:linux GoArch:amd64}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ minikube version
minikube version: v1.18.1
commit: 09ee84d530de4a92f00f1c5dbc34cead092b95bc
I wonder whether the last line of the logs explains why graceful termination failed, or why a termination occurred in the first place.
I'm happy to run more commands to help debug; the issue should be easy to reproduce with the steps above.
The master branch of registration-operator was used; HEAD commit: daf5b5cf51a7da18243c6f99cee94f8197f53b90
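
If more data would help, the registration agent's logs and the hub-side ManagedCluster status should show whether the bootstrap handshake completed; a sketch of the commands I would run (deployment name taken from the pod listing above):

$ kubectl -n open-cluster-management-agent logs deploy/klusterlet-registration-agent
$ kubectl --context hub get managedcluster cluster1 -o yaml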