KubeEdge

Kubernetes Native Edge Computing Framework (project under CNCF)

KubeEdge is built upon Kubernetes and extends native containerized application orchestration and device management to hosts at the Edge. It consists of a cloud part and an edge part, and provides core infrastructure support for networking, application deployment, and metadata synchronization between cloud and edge. It also supports MQTT, which allows edge devices to communicate through edge nodes.

With KubeEdge it is easy to deploy existing complicated machine learning, image recognition, event processing, and other high-level applications to the Edge. With business logic running at the Edge, much larger volumes of data can be secured and processed locally, where the data is produced. With data processed at the Edge, responsiveness increases dramatically and data privacy is protected.

KubeEdge is an incubation-level hosted project by the Cloud Native Computing Foundation (CNCF). KubeEdge incubation announcement by CNCF.

Note:

Versions before 1.3 are no longer supported; please upgrade.

Advantages

  • Kubernetes-native support: Manage edge applications and edge devices from the cloud with fully compatible Kubernetes APIs.
  • Cloud-Edge Reliable Collaboration: Ensures reliable message delivery, without loss, over unstable cloud-edge networks.
  • Edge Autonomy: Ensures edge nodes run autonomously and edge applications keep running normally when the cloud-edge network is unstable, or when the edge is offline and restarted.
  • Edge Devices Management: Manage edge devices through Kubernetes-native APIs implemented with CRDs (see the sketch after this list).
  • Extremely Lightweight Edge Agent: An extremely lightweight edge agent (EdgeCore) that runs on resource-constrained edge hardware.
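
For a quick illustration of the CRD-based device management mentioned in the list above, the sketch below lists KubeEdge Device custom resources from the cloud side using the standard Kubernetes dynamic client. The API group/version (devices.kubeedge.io/v1alpha2), the kubeconfig path, and the namespace are assumptions for this sketch and may differ between KubeEdge releases.

    // List KubeEdge Device custom resources with client-go's dynamic client.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Build a client from a kubeconfig on the cloud side; the path is only an example.
        cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
        if err != nil {
            panic(err)
        }
        dyn, err := dynamic.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Device CRs installed by the KubeEdge CRDs (group/version assumed here).
        gvr := schema.GroupVersionResource{
            Group:    "devices.kubeedge.io",
            Version:  "v1alpha2",
            Resource: "devices",
        }

        devices, err := dyn.Resource(gvr).Namespace("default").List(context.Background(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, d := range devices.Items {
            fmt.Println(d.GetName()) // each item is an unstructured Device object
        }
    }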

How It Works

KubeEdge consists of a cloud part and an edge part.

Architecture

In the Cloud

  • CloudHub: a WebSocket server responsible for watching changes on the cloud side, caching messages, and sending them to EdgeHub.
  • EdgeController: an extended Kubernetes controller which manages edge node and pod metadata so that the data can be targeted to a specific edge node.
  • DeviceController: an extended Kubernetes controller which manages devices so that device metadata/status data can be synced between edge and cloud.

On the Edge

  • EdgeHub: a WebSocket client responsible for interacting with the cloud services for edge computing (such as EdgeController in the KubeEdge architecture). This includes syncing cloud-side resource updates to the edge, and reporting edge-side host and device status changes to the cloud.
  • Edged: an agent that runs on edge nodes and manages containerized applications.
  • EventBus: an MQTT client that interacts with MQTT servers (mosquitto), offering publish and subscribe capabilities to other components (see the sketch after this list).
  • ServiceBus: an HTTP client that interacts with HTTP servers (REST), offering HTTP client capabilities so that cloud components can reach HTTP servers running at the edge.
  • DeviceTwin: responsible for storing device status and syncing it to the cloud. It also provides query interfaces for applications.
  • MetaManager: the message processor between edged and edgehub. It is also responsible for storing/retrieving metadata to/from a lightweight database (SQLite).
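
As an example of how applications use EventBus, the sketch below publishes a device twin update to the local MQTT broker on an edge node using the Eclipse Paho Go client. The broker address, device name, topic layout ($hw/events/...), and payload shape are assumptions here and should be checked against the KubeEdge documentation for your release.

    // Publish a (simplified) device twin update to the edge node's local MQTT broker.
    package main

    import (
        "fmt"

        mqtt "github.com/eclipse/paho.mqtt.golang"
    )

    func main() {
        // EventBus is usually wired to a local mosquitto broker on the edge node.
        opts := mqtt.NewClientOptions().AddBroker("tcp://127.0.0.1:1883")
        client := mqtt.NewClient(opts)
        if token := client.Connect(); token.Wait() && token.Error() != nil {
            panic(token.Error())
        }
        defer client.Disconnect(250)

        // "temperature-sensor" is a hypothetical device ID; real topics carry the ID of a
        // device created through the Device CRD.
        topic := "$hw/events/device/temperature-sensor/twin/update"
        payload := `{"twin":{"temperature":{"actual":{"value":"25"}}}}` // simplified twin payload

        if token := client.Publish(topic, 0, false, payload); token.Wait() && token.Error() != nil {
            panic(token.Error())
        }
        fmt.Println("published twin update")
    }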

Kubernetes compatibility

Kubernetes 1.13 Kubernetes 1.14 Kubernetes 1.15 Kubernetes 1.16 Kubernetes 1.17 Kubernetes 1.18 Kubernetes 1.19
KubeEdge 1.3
KubeEdge 1.4
KubeEdge 1.5
KubeEdge HEAD (master)

Key:

  • ✓ KubeEdge and the Kubernetes version are exactly compatible.
  • + KubeEdge has features or API objects that may not be present in the Kubernetes version.
  • - The Kubernetes version has features or API objects that KubeEdge can't use.

Guides

Get started with this doc.

See our documentation on kubeedge.io for more details.

To learn more about KubeEdge, try some examples from the examples repository.

Roadmap

Meeting

Regular Community Meeting:

Resources:

Contact

If you need support, start with the troubleshooting guide, and work your way through the process that we've outlined.

If you have questions, feel free to reach out to us in the following ways:

Contributing

If you're interested in being a contributor and want to get involved in developing the KubeEdge code, please see CONTRIBUTING for details on submitting patches and the contribution workflow.

License

KubeEdge is under the Apache 2.0 license. See the LICENSE file for details.

Comments
  • Metrics-Server on KubeEdge (configuration process and vague document)

    What happened:

    1. I cannot find the certgen.sh in the folder /etc/kubeedge/, and I found it in the original git clone folder in $GOPATH/src/github.com/kubeedge/kubeedge/build/tools/certgen.sh. Are they the same file or not? (Doc section 4.4.3 "third" step certification part for cloud core.)

    2. I have no idea how to activate cloudStream and edgeStream. In the document, it is said that I could modify cloudcore.yaml or edgecore.yaml. However, in the last sentence, it mentioned that we need to set both cloudStream and edgeStream to true!! (Doc section 4.4.3 "fifth" step cloudStream and edgeStream setting)

    3. I only found the edgecore service with the help of @GsssC (Doc section 4.4.3 "sixth" step for restarting cloudcore and edgecore). However, I still cannot find a way to restart cloudcore. I cannot even find any KubeEdge-related (cloudcore) service, active or not, using the command sudo systemctl list-units --all. By the way, I used systemctl restart edgecore to restart edgecore, and I found that the kube-proxy containers may cause the problem. I also tried cloudcore restart to restart cloudcore, though I am not sure whether this is the right command.

    4. Does KubeEdge support CNI plugins or not? (@daixiang0 said that CNI plugins are not supported right now, but @GsssC said they are supported.) I am using the weave-net plugin as my CNI plugin since I heard that it has good support for the ARM CPU architecture. (The edge node is a Raspberry Pi 4, or an NVIDIA Jetson TX2 in the future.) If the answer is yes, could you please help me with the configuration issues? The k8s_weave-npc container says: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined. The k8s_weave container says: [kube-peers] Could not get cluster config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined Failed to get peers

    5. kubectl top nodes cannot get any KubeEdge edge node metrics. image

    What you expected to happen:

    1. Get the certgen.sh for certificates generation.

    2. Know which YAML file I need to modify (cloudcore.yaml, edgecore.yaml, or both?).

    3. There is a command to restart cloudcore.

    4. No terminated error status on weave-net pod anymore.

    5. I can see metrics like other Kubernetes nodes.

    6. I can see metrics on Kubernetes Dashboard, Grafana, and Prometheus.

    How to reproduce it (as minimally and precisely as possible):

    1. construct a KubeEdge cluster (1 master node, 1 worker node(k8s), 1 edge node(KubeEdge))
    2. deploy Kubernetes Dashboard by Dan Wahlin GitHub scripts https://github.com/DanWahlin/DockerAndKubernetesCourseCode/tree/master/samples/dashboard-security
    3. deploy Grafana, Prometheus, kube-state-metrics, metrics-server by Dan Wahlin GitHub scripts https://github.com/DanWahlin/DockerAndKubernetesCourseCode/tree/master/samples/prometheus

    Anything else we need to know?: Document: https://docs.kubeedge.io/_/downloads/en/latest/pdf/ (section 4.4.3 - kubectl logs) (section 4.4.4 - metrics-server) Weave-Net pod --> container logs on the RPI4: image image Grafana, Prometheus, metrics-server, kube-state-metrics scripts: https://github.com/DanWahlin/DockerAndKubernetesCourseCode/tree/master/samples/prometheus

    Environment:

    • KubeEdge version(e.g. cloudcore/edgecore --version):

    cloudcore: image image

    edgecore: edgecore --version: command not found image

    CloudSide Environment:

    • Hardware configuration (e.g. lscpu):

    [email protected]:/home/charlie# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                6
    On-line CPU(s) list:   0-5
    Thread(s) per core:    1
    Core(s) per socket:    6
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 158
    Model name:            Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
    Stepping:              10
    CPU MHz:               800.071
    CPU max MHz:           4100.0000
    CPU min MHz:           800.0000
    BogoMIPS:              6000.00
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              9216K
    NUMA node0 CPU(s):     0-5
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

    • OS (e.g. cat /etc/os-release): image

    • Kernel (e.g. uname -a): image

    • Go version (e.g. go version): image

    • Others:

    EdgeSide Environment:

    • edgecore version (e.g. edgecore --version): command not found image

    • Hardware configuration (e.g. lscpu): image

    • OS (e.g. cat /etc/os-release): image

    • Kernel (e.g. uname -a): image

    • Go version (e.g. go version): image

    • Others:

    @GsssC Hi, I finally submitted an issue! Could you please give me a hand? Thank you very much in advance!

  • How to deploy the edge part into a k8s cluster

    Which jobs are failing: I tried to deploy edgecore according to the documentation, but it didn't work.

    Which test(s) are failing: I tried to deploy edgecore according to the document. When I finished, the edge node was still in the NotReady state. Looking at the pods, I can see that the pod containing edgecore is in the Pending state.

    See the following (my edge node's name is "172.31.23.166"):

    $ kubectl get nodes
    NAME               STATUS     ROLES    AGE   VERSION
    172.31.23.166      NotReady   <none>   15s
    ip-172-31-27-157   Ready      master   17h   v1.14.1
    
    $ kubectl get pods -n kubeedge
    NAME                              READY   STATUS    RESTARTS   AGE
    172.31.23.166-7464f44944-nlbj2    0/2     Pending   0          9s
    edgecontroller-5464c96d6c-tmqfs   1/1     Running   0          42s
    
    $ kubectl describe pods 172.31.23.166-7464f44944-nlbj2 -n kubeedge
    Name:               172.31.23.166-7464f44944-nlbj2
    Namespace:          kubeedge
    Priority:           0
    PriorityClassName:  <none>
    Node:               <none>
    Labels:             k8s-app=kubeedge
                        kubeedge=edgenode
                        pod-template-hash=7464f44944
    Annotations:        <none>
    Status:             Pending
    IP:
    Controlled By:      ReplicaSet/172.31.23.166-7464f44944
    Containers:
      edgenode:
        Image:      kubeedge/edgecore:latest
        Port:       <none>
        Host Port:  <none>
        Limits:
          cpu:     200m
          memory:  1Gi
        Requests:
          cpu:     100m
          memory:  512Mi
        Environment:
          DOCKER_HOST:  tcp://localhost:2375
        Mounts:
          /etc/kubeedge/certs from certs (rw)
          /etc/kubeedge/edge/conf from conf (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-lpftk (ro)
      dind-daemon:
        Image:      docker:dind
        Port:       <none>
        Host Port:  <none>
        Requests:
          cpu:        20m
          memory:     512Mi
        Environment:  <none>
        Mounts:
          /var/lib/docker from docker-graph-storage (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-lpftk (ro)
    Conditions:
      Type           Status
      PodScheduled   False
    Volumes:
      certs:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubeedge/certs
        HostPathType:
      conf:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      edgenodeconf
        Optional:  false
      docker-graph-storage:
        Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
        SizeLimit:  <unset>
      default-token-lpftk:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-lpftk
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason            Age                From               Message
      ----     ------            ----               ----               -------
      Warning  FailedScheduling  46s (x2 over 46s)  default-scheduler  0/2 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 Insufficient pods, 1 node(s) had taints that the pod didn't tolerate.
    

    Since when has it been failing:

    Reason for failure:

    Anything else we need to know:

    1. I set a taint on the master node (node-role.kubernetes.io/master=:NoSchedule), which prevents any pods from being scheduled on the master node. I am not sure if this is correct.
    2. The edge node does not do anything other than pull the edgecore image and copy the edge certs into the /etc/kubeedge/certs folder. Is there anything I missed?
    3. The edge deploy file has no information about node selection. When there are multiple edge nodes, how is a workload deployed to the correct node?
    4. Have you completed the function of connecting to the edge from the cloud? Otherwise, how does the cloud deploy pods to the edge?
  • Bump ginkgo from v1 to v2

    What type of PR is this?

    Add one of the following kinds: /kind feature /kind test

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    Fixes #3829

    Special notes for your reviewer: This PR includes:

    1. Upgrades vendor, go.mod, and go.sum, and replaces "github.com/onsi/ginkgo" with "github.com/onsi/ginkgo/v2".

    2. Replaces the deprecated CurrentGinkgoTestDescription with CurrentSpecReport, as suggested in the Migration Guide.

    3. Refactors the ginkgo.Measure specs and implements them with gmeasure.Experiment, as suggested in the Migration Guide (see the sketch after this list).

    4. Revises the ginkgo version in scripts and GitHub Actions.
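
    For reference, here is a minimal sketch of the v2 replacements mentioned in items 2 and 3. The spec names and the measured operation are illustrative, not taken from the KubeEdge test suites.

        // Ginkgo v2: CurrentSpecReport() replaces CurrentGinkgoTestDescription(),
        // and gmeasure.Experiment replaces the removed ginkgo.Measure.
        package e2e_test

        import (
            "testing"
            "time"

            . "github.com/onsi/ginkgo/v2"
            . "github.com/onsi/gomega"
            "github.com/onsi/gomega/gmeasure"
        )

        func TestExample(t *testing.T) {
            RegisterFailHandler(Fail)
            RunSpecs(t, "example suite")
        }

        var _ = Describe("ginkgo v2 migration examples", func() {
            It("reads spec metadata with CurrentSpecReport", func() {
                report := CurrentSpecReport() // v2 replacement for CurrentGinkgoTestDescription()
                GinkgoWriter.Println(report.FullText())
            })

            It("measures durations with gmeasure instead of ginkgo.Measure", func() {
                experiment := gmeasure.NewExperiment("edge deployment latency")
                AddReportEntry(experiment.Name, experiment)

                experiment.Sample(func(idx int) {
                    experiment.MeasureDuration("deploy", func() {
                        time.Sleep(10 * time.Millisecond) // stand-in for the operation being measured
                    })
                }, gmeasure.SamplingConfig{N: 5})
            })
        })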

    Does this PR introduce a user-facing change?:

    
    
  • Pods are not leaving pending state

    What happened: The pods are not leaving pending state on the cloud.

    What you expected to happen: The pod runs on the edge node.

    How to reproduce it (as minimally and precisely as possible): I have two virtual machines: one acts as the cloud and the other as the edge. On the cloud: follow the instructions in the readme. On the edge: follow the instructions in the readme. Then I executed on the cloud side: kubectl apply -f $GOPATH/src/github.com/kubeedge/kubeedge/build/deployment.yaml

    Anything else we need to know?: I have tested the edge setup with make edge_integration_test (all tests passed). The edge node state is Ready. kubectl describe nginx-deployment output: output-kubectl-describe.txt

    Environment:

    • KubeEdge version: 1233e7643b25a81b670fe1bb85a8a93d58d3a163
    • Hardware configuration: Mem: 7.6G; 2 CPU
    • OS (e.g. from /etc/os-release): os-release.txt
    • Kernel (e.g. uname -a): Linux node 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
    • Others: VM running with libvirt/QEMU

    Do you need any more information?

  • CSI driver fixes

    What type of PR is this? /kind bug

    What this PR does / why we need it: This tries to fix some of the problems in the CSI implementation.

    Which issue(s) this PR fixes:

    Fixes #2088

    Special notes for your reviewer: These are some of the fixes for the problems that I've found, but not all. I will continue working on this to make the CSI driver work better.

    Does this PR introduce a user-facing change?:

    NONE
    
  • Edge node is NotReady

    What happened: My edge node shows status not ready image

    What you expected to happen: My edge node should be ready.

    How to reproduce it (as minimally and precisely as possible):

    Anything else we need to know?: I post my logs below. cloudcore.log: image edgecore.log: image and my edge config file: image

    Environment:

    • KubeEdge version: v1.1.0
    • Hardware configuration: amd64
    • OS (e.g. from /etc/os-release): ubuntu 16
    • Kernel (e.g. uname -a):
    • Others:
  • Fix: clean containers after `keadm reset`

    What type of PR is this?

    /kind bug /kind design

    What this PR does / why we need it:

    We need to clean up stale containers to avoid resource occupation, which may cause memory leaks or other issues.

    You can see the details in the following issue.

    Which issue(s) this PR fixes:

    Fixes #1973

    Special notes for your reviewer:

    Does this PR introduce a user-facing change?:

    NONE
    
  • Add maxPods/CPU/MEM SystemReservedResource configuration item

    What type of PR is this? /kind feature

    What this PR does / why we need it: Add maxPods/CPU/MEM SystemReservedResource configuration items.

    Which issue(s) this PR fixes: Fixes #1832

    Special notes for your reviewer: Changes some API items.

    Does this PR introduce a user-facing change?: NONE

    
    
  • Failed to mount configmap/secret volume because of "no such file or directory"

    What happened: Failed to mount a configmap/secret volume because of "no such file or directory", even though we can confirm that the related resources are present in the SQLite database.

    I0527 10:35:29.789719     660 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-proxy
    I0527 10:35:29.800195     660 process.go:685] get a message {Header:{ID:8b57a409-25c9-454e-a9ae-b23f0b1861a9 ParentID: Timestamp:1590546929789 ResourceVersion: Sync:true} Router:{Source:edged Group:meta Operation:query Resource:kube-system/configmap/kube-proxy} Content:<nil>}
    I0527 10:35:29.800543     660 metaclient.go:121] send sync message kube-system/configmap/kube-proxy successed and response: {{ab5f3aab-11ff-48cf-8c3b-c5ded97678db 8b57a409-25c9-454e-a9ae-b23f0b1861a9 1590546929800  false} {metaManager meta response kube-system/configmap/kube-proxy} [{"data":{"config.conf":"apiVersion: kubeproxy.config.k8s.io/v1alpha1\nbindAddress: 0.0.0.0\nclientConnection:\n  acceptContentTypes: \"\"\n  burst: 0\n  contentType: \"\"\n  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf\n  qps: 0\nclusterCIDR: 192.168.0.0/16\nconfigSyncPeriod: 0s\nconntrack:\n  maxPerCore: null\n  min: null\n  tcpCloseWaitTimeout: null\n  tcpEstablishedTimeout: null\nenableProfiling: false\nhealthzBindAddress: \"\"\nhostnameOverride: \"\"\niptables:\n  masqueradeAll: false\n  masqueradeBit: null\n  minSyncPeriod: 0s\n  syncPeriod: 0s\nipvs:\n  excludeCIDRs: null\n  minSyncPeriod: 0s\n  scheduler: \"\"\n  strictARP: false\n  syncPeriod: 0s\nkind: KubeProxyConfiguration\nmetricsBindAddress: \"\"\nmode: \"\"\nnodePortAddresses: null\noomScoreAdj: null\nportRange: \"\"\nudpIdleTimeout: 0s\nwinkernel:\n  enableDSR: false\n  networkName: \"\"\n  sourceVip: \"\"","kubeconfig.conf":"apiVersion: v1\nkind: Config\nclusters:\n- cluster:\n    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n    server: https://10.10.102.78:6443\n  name: default\ncontexts:\n- context:\n    cluster: default\n    namespace: default\n    user: default\n  name: default\ncurrent-context: default\nusers:\n- name: default\n  user:\n    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token"},"metadata":{"creationTimestamp":"2020-04-21T14:50:46Z","labels":{"app":"kube-proxy"},"name":"kube-proxy","namespace":"kube-system","resourceVersion":"193","selfLink":"/api/v1/namespaces/kube-system/configmaps/kube-proxy","uid":"5651c863-c755-4da4-8039-b251efc82470"}}]}
    E0527 10:35:29.800949     660 configmap.go:249] Error creating atomic writer: stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory
    W0527 10:35:29.801070     660 empty_dir.go:392] Warning: Unmount skipped because path does not exist: /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy
    I0527 10:35:29.801109     660 record.go:24] Warning FailedMount MountVolume.SetUp failed for volume "kube-proxy" : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory
    E0527 10:35:29.801199     660 nestedpendingoperations.go:270] Operation for "\"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\" (\"25e6f0ea-6364-4bcc-9937-9760b6ec956a\")" failed. No retries permitted until 2020-05-27 10:37:31.80112802 +0800 CST m=+2599.727653327 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\") pod \"kube-proxy-gbdgw\" (UID: \"25e6f0ea-6364-4bcc-9937-9760b6ec956a\") : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory"
    

    What you expected to happen: The volume mounts successfully.

    How to reproduce it (as minimally and precisely as possible): Sorry, I cannot provide a way to reproduce it.

    Anything else we need to know?:

    Environment:

    • KubeEdge version(e.g. cloudcore/edgecore --version): v1.3.0
  • Move Docs to website repository and proposals to enhancements repository.

    What would you like to be added: Move docs to website repository and proposals to enhancement repository.

    Why is this needed: The hyperlinks in the docs use .html links, which do not redirect to the correct page when browsing the docs in the GitHub repository. Also, moving proposals to the enhancements repository and docs to the website repository will help separate and simplify management/review of source code, documentation, and enhancement proposals.

    Thoughts ?? @kevin-wangzefeng @rohitsardesai83 @m1093782566 @CindyXing @qizha

  • Add an unusual case of Kind-to-Resource conversion

    What type of PR is this? /kind bug

    What this PR does / why we need it: It adds an unusual Kind-to-Resource conversion case. In #2769, we found that for resources like Gateway, the corresponding resource name should be gateways instead of gatewaies. Another example is a resource composed of multiple words, such as ServiceEntry. These are special cases that need to be handled specially, so I created a crdmap to record the resource-kind relationship of the CRDs (a small sketch of the idea follows).
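
    A minimal sketch of that idea (not the actual KubeEdge code; the map entries and the fallback rule are illustrative):

        package main

        import (
            "fmt"
            "strings"
        )

        // crdMap records Kinds whose resource names cannot be derived by naive pluralization.
        var crdMap = map[string]string{
            "Gateway":      "gateways",       // naive "-y" -> "-ies" would give "gatewaies"
            "ServiceEntry": "serviceentries", // multi-word Kind
        }

        // naivePlural is the kind of rule that produced "gatewaies" in the first place.
        func naivePlural(kind string) string {
            lower := strings.ToLower(kind)
            if strings.HasSuffix(lower, "y") {
                return strings.TrimSuffix(lower, "y") + "ies"
            }
            return lower + "s"
        }

        func kindToResource(kind string) string {
            if resource, ok := crdMap[kind]; ok {
                return resource // special case recorded for the CRD
            }
            return naivePlural(kind)
        }

        func main() {
            fmt.Println(kindToResource("Gateway"))      // gateways
            fmt.Println(kindToResource("ServiceEntry")) // serviceentries
            fmt.Println(kindToResource("Pod"))          // pods
        }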

    Which issue(s) this PR fixes: Fixes #2769

    Special notes for your reviewer: none

    Does this PR introduce a user-facing change?: none

  • optimize document of dtcontext

    Signed-off-by: crui [email protected]

    What type of PR is this? /kind documentation

    What this PR does / why we need it: optimize the document of dtcontext

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    Does this PR introduce a user-facing change?:

    
    
  • fix unhandled transaction error

    Signed-off-by: crui [email protected]

    What type of PR is this? /kind cleanup

    What this PR does / why we need it: Fix some unhandled transaction errors. If the error caused by the rollback is not captured, the transaction will be committed successfully (see the sketch below).
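
    For illustration, a minimal sketch of the general pattern, written against the standard library's database/sql with SQLite rather than the ORM actually used in KubeEdge; the table name, column names, and file path are made up:

        package main

        import (
            "database/sql"
            "log"

            _ "github.com/mattn/go-sqlite3" // SQLite driver; an assumption for this sketch
        )

        func updateMeta(db *sql.DB, key, value string) error {
            tx, err := db.Begin()
            if err != nil {
                return err
            }

            if _, err := tx.Exec(`UPDATE meta SET value = ? WHERE key = ?`, value, key); err != nil {
                // Check the rollback error too instead of silently discarding it.
                if rbErr := tx.Rollback(); rbErr != nil {
                    log.Printf("rollback failed: %v (original error: %v)", rbErr, err)
                }
                return err
            }

            // Commit can also fail; returning its error prevents treating the write as successful.
            return tx.Commit()
        }

        func main() {
            db, err := sql.Open("sqlite3", "/tmp/example.db") // path is illustrative
            if err != nil {
                log.Fatal(err)
            }
            defer db.Close()

            if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)`); err != nil {
                log.Fatal(err)
            }
            if err := updateMeta(db, "node/edge1", "{}"); err != nil {
                log.Fatal(err)
            }
        }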

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    Does this PR introduce a user-facing change?:

    
    
  • Pod with the same name gets stuck in Pending status after recreation

    What happened:

    Sometimes I recreate a pod with the same name, but the pod gets stuck in the Pending phase. After executing docker ps on the edge node, I found that there were not even any containers related to the pod.

    What you expected to happen:

    Pod data should be synchronized to edge nodes after recreation.

    How to reproduce it (as minimally and precisely as possible):

    Create a pod, delete it, and then recreate it. I use a little program to do that. The program creates a pod, quickly deletes it, and then recreates it with the same spec and the same name.

    package main
    
    import (
    	"context"
    	corev1 "k8s.io/api/core/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"log"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/client/config"
    	"time"
    )
    
    var namespace = "default"
    
    func main() {
    	cfg, err := config.GetConfig()
    	if err != nil {
    		panic(err)
    	}
    
    	cli, err := client.New(cfg, client.Options{})
    	if err != nil {
    		panic(err)
    	}
    
    	pod := nginxPod2()
    
    	log.Println("creating pod...")
    	if err = cli.Create(context.Background(), &pod); err != nil {
    		panic(err)
    	}
    
    	for {
    		key := client.ObjectKey{Name: pod.Name, Namespace: pod.Namespace}
    		if err = cli.Get(context.Background(), key, &pod); err != nil {
    			panic(err)
    		}
    		log.Printf("nginx pod phase: %s", pod.Status.Phase)
    
    		if pod.Status.Phase == corev1.PodRunning {
    			break
    		}
    
    		time.Sleep(time.Second)
    	}
    	log.Println("nginx is running")
    
    	log.Println("deleting nginx...")
    	if err = cli.Delete(context.Background(), &pod); err != nil {
    		panic(err)
    	}
    
    	for {
    		key := client.ObjectKey{Name: pod.Name, Namespace: pod.Namespace}
    		err = cli.Get(context.Background(), key, &pod)
    		if err == nil {
    			continue
    		}
    
    		if errors.IsNotFound(err) {
    			break
    		}
    
    		panic(err)
    	}
    
    	log.Println("recreating pod...")
            // increase delay may avoid this problem
    	time.Sleep(20 * time.Millisecond)
    	pod = nginxPod2()
    	if err = cli.Create(context.Background(), &pod); err != nil {
    		panic(err)
    	}
    
    	for {
    		key := client.ObjectKey{Name: pod.Name, Namespace: pod.Namespace}
    		if err = cli.Get(context.Background(), key, &pod); err != nil {
    			panic(err)
    		}
    		log.Printf("nginx pod phase: %s", pod.Status.Phase)
    
    		if pod.Status.Phase == corev1.PodRunning {
    			break
    		}
    		time.Sleep(time.Second)
    	}
    }
    
    func nginxPod2() corev1.Pod {
    	automountServiceAccountToken := false
    	privileged := true
    
    	return corev1.Pod{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      "nginx",
    			Namespace: namespace,
    		},
    		Spec: corev1.PodSpec{
    			NodeName:                     "edge1",
    			AutomountServiceAccountToken: &automountServiceAccountToken,
    			Containers: []corev1.Container{
    				{
    					Name:  "net-tool",
    					Image: "bluven/net-tool:latest",
    					SecurityContext: &corev1.SecurityContext{
    						Privileged: &privileged,
    					},
    					Resources: corev1.ResourceRequirements{},
    				},
    			},
    			Tolerations: []corev1.Toleration{
    				{
    					Key:      "",
    					Operator: corev1.TolerationOpExists,
    				},
    			},
    		},
    	}
    }
    
    

    This bug won't happen all the time and I have to repeat this process several times to trigger it. After some trials, I found that if I delay the recreation, this problem won't happen.

    Anything else we need to know?:

    Sometimes the pod does not get stuck in Pending but in some other phase:

    [[email protected] ~]# kubectl get po -o wide
    NAME             READY   STATUS      RESTARTS         AGE     IP              NODE      NOMINATED NODE   READINESS GATES
    net-tool-edge1   0/1     Pending     0                46m     <none>          edge1     <none>           <none>
    net-tool-edge2   0/1     Pending     0                46m     <none>          edge2     <none>           <none>
    nginx            0/1     Completed   0                4m18s   <none>          edge1     <none>           <none>
    

    Environment:

    • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
    
    • KubeEdge version(e.g. cloudcore --version and edgecore --version): cloudcore: 1.9.0 edgecore: 1.9.4

    • Cloud nodes Environment:
      • Hardware configuration (e.g. lscpu):
    [[email protected] ~]# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             2
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 94
    Model name:            Intel Core Processor (Skylake)
    Stepping:              3
    CPU MHz:               2099.998
    BogoMIPS:              4199.99
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    L3 cache:              16384K
    NUMA node0 CPU(s):     0,1
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
    
    
    • OS (e.g. cat /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    
    
    • Kernel (e.g. uname -a): Linux master1 5.18.12-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 15 07:03:42 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Go version (e.g. go version):
    • Others:
    -
    Edge nodes Environment:
    • edgecore version (e.g. edgecore --version): 1.9.4
    • Hardware configuration (e.g. lscpu):
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             2
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 94
    Model name:            Intel Core Processor (Skylake)
    Stepping:              3
    CPU MHz:               2099.998
    BogoMIPS:              4199.99
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    L3 cache:              16384K
    NUMA node0 CPU(s):     0,1
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
    
    • OS (e.g. cat /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    
    • Kernel (e.g. uname -a): Linux edge1 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    • Go version (e.g. go version):
    • Others:
  • Failed to delete deviceProfile config map

    What happened: When I delete all of my device & deviceModel objects and recreate a new device2 & deviceModel2, I find that the deviceProfile ConfigMap still contains device1 & deviceModel1.

    I added some output of the error: image

    Then I checked the ClusterRole of cloudcore; there is no delete verb for ConfigMaps: image

    What you expected to happen: When I delete all of my device & deviceModel objects, the deviceProfile ConfigMap should be deleted. image

    How to reproduce it (as minimally and precisely as possible): First, create a new deviceModel & device to produce a deviceProfile ConfigMap; second, delete all deviceModel & device objects, run kubectl get cm, and watch the cloudcore log: you will find that the ConfigMap still exists, and there is a log entry containing "failed to delete config map ……"

    Anything else we need to know?:

    Environment:

    • Kubernetes version (use kubectl version): v1.11

    • KubeEdge version(e.g. cloudcore --version and edgecore --version): v1.11

    • Cloud nodes Environment:
      • Hardware configuration (e.g. lscpu):
      • OS (e.g. cat /etc/os-release):
      • Kernel (e.g. uname -a):
      • Go version (e.g. go version):
      • Others:
    • Edge nodes Environment:
      • edgecore version (e.g. edgecore --version):
      • Hardware configuration (e.g. lscpu):
      • OS (e.g. cat /etc/os-release):
      • Kernel (e.g. uname -a):
      • Go version (e.g. go version):
      • Others:
  • WIP CloudHub refactoring proposal

    What type of PR is this?

    /kind design

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    Does this PR introduce a user-facing change?:

    
    
  • kube-proxy couldn't get endpointslice data from metaServer

    What happened and what you expected to happen:

    I was trying to run kube-proxy on edge nodes and configured kube-proxy to access the metaServer, but it reported an error like this:

    E0803 03:08:03.248436       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: no kind "EndpointslicList" is registered for version "discovery.k8s.io/v1" in scheme "k8s.io/client-go/kubernetes/scheme/register.go:72"
    

    How to reproduce it (as minimally and precisely as possible): I created another kube-proxy DaemonSet to run on edge nodes and changed its kube-proxy config as follows:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        app: edge-kube-proxy
      name: edge-kube-proxy
      namespace: kube-system
    data:
      config.conf: |-
        apiVersion: kubeproxy.config.k8s.io/v1alpha1
        bindAddress: 0.0.0.0
        bindAddressHardFail: false
        clientConnection:
          acceptContentTypes: ""
          burst: 10
          contentType: application/vnd.kubernetes.protobuf
          kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          qps: 5
        clusterCIDR: 10.234.64.0/18,fd96:ee88:d8a6:8607::1:0000/112
        configSyncPeriod: 15m0s
        conntrack:
          maxPerCore: 32768
          min: 131072
          tcpCloseWaitTimeout: 1h0m0s
          tcpEstablishedTimeout: 24h0m0s
        detectLocalMode: ""
        enableProfiling: false
        featureGates:
          IPv6DualStack: true
        healthzBindAddress: 0.0.0.0:10256
        hostnameOverride: master1
        iptables:
          masqueradeAll: false
          masqueradeBit: 14
          minSyncPeriod: 0s
          syncPeriod: 30s
        ipvs:
          excludeCIDRs: []
          minSyncPeriod: 0s
          scheduler: rr
          strictARP: false
          syncPeriod: 30s
          tcpFinTimeout: 0s
          tcpTimeout: 0s
          udpTimeout: 0s
        kind: KubeProxyConfiguration
        metricsBindAddress: 127.0.0.1:10249
        mode: ipvs
        nodePortAddresses: []
        oomScoreAdj: -999
        portRange: ""
        showHiddenMetricsForVersion: ""
        udpIdleTimeout: 250ms
        winkernel:
          enableDSR: false
          networkName: ""
          sourceVip: ""
      kubeconfig.conf: |-
        apiVersion: v1
        kind: Config
        clusters:
        - cluster:
            certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            server: http://127.0.0.1:10550
          name: default
        contexts:
        - context:
            cluster: default
            namespace: default
            user: default
          name: default
        current-context: default
        users:
        - name: default
          user:
            tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: edge-kube-proxy
      namespace: kube-system
      annotations:
        deprecated.daemonset.template.generation: "3"
      labels:
        k8s-app: edge-kube-proxy
    spec:
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          k8s-app: edge-kube-proxy
      template:
        metadata:
          creationTimestamp: null
          labels:
            k8s-app: edge-kube-proxy
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node-role.kubernetes.io/edge
                    operator: Exists
          containers:
          - command:
            - /usr/local/bin/kube-proxy
            - --config=/var/lib/kube-proxy/config.conf
            - --hostname-override=$(NODE_NAME)
            env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            image: k8s.gcr.io/kube-proxy:v1.22.5
            imagePullPolicy: IfNotPresent
            name: kube-proxy
            resources: {}
            securityContext:
              privileged: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /var/lib/kube-proxy
              name: kube-proxy
            - mountPath: /run/xtables.lock
              name: xtables-lock
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
          dnsPolicy: ClusterFirst
          hostNetwork: true
          nodeSelector:
            kubernetes.io/os: linux
          priorityClassName: system-node-critical
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: kube-proxy
          serviceAccountName: kube-proxy
          terminationGracePeriodSeconds: 30
          tolerations:
          - operator: Exists
          volumes:
          - configMap:
              defaultMode: 420
              name: edge-kube-proxy
            name: kube-proxy
          - hostPath:
              path: /run/xtables.lock
              type: FileOrCreate
            name: xtables-lock
          - hostPath:
              path: /lib/modules
              type: ""
            name: lib-modules
      updateStrategy:
        rollingUpdate:
          maxSurge: 0
          maxUnavailable: 1
        type: RollingUpdate
    

    Anything else we need to know?:

    Environment:

    • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
    
    • KubeEdge version(e.g. cloudcore --version and edgecore --version): cloudcore: 1.9.0 edgecore: 1.9.4