Preface

SuperEdge is a Kubernetes-native edge computing management framework launched by Tencent. Compared with OpenYurt and KubeEdge, SuperEdge is not only non-intrusive to Kubernetes and provides edge autonomy, but also offers unique advanced features such as distributed health check and edge service access control, which greatly reduce the impact of unstable cloud-edge networks on services and make it much easier to publish and govern services in edge clusters.

Features

  • Kubernetes-native: SuperEdge extends native Kubernetes by adding several edge computing components and is completely non-intrusive to Kubernetes. Simply deploying the SuperEdge core components turns an ordinary Kubernetes cluster into an edge-capable one, and zero intrusion means any native Kubernetes workload (Deployment, StatefulSet, DaemonSet, etc.) can be deployed on edge clusters
  • Edge autonomy: SuperEdge provides L3-level edge autonomy. When an edge node is disconnected from the cloud or the network is unstable, the edge node can still run normally without affecting the services already deployed on it
  • Distributed health check: SuperEdge provides a distributed health check capability on the edge. An edge-health instance is deployed on each edge node; edge nodes in the same edge cluster check each other's health and vote on node status. In this way, even if the cloud-edge network has problems, a node will not be evicted as long as the connections between edge nodes are normal. Distributed health check also supports grouping: cluster nodes are divided into multiple groups (nodes in the same machine room are assigned to the same group) and nodes within a group check each other, which matches the way edge nodes are naturally grouped in the network topology. The whole design avoids the massive pod migration and reconstruction caused by an unstable cloud-edge network and ensures service stability
  • Service access control: SuperEdge developed ServiceGroup to implement service access control for edge computing. With this feature, you can conveniently deploy a set of services in different machine rooms or regions of the same cluster simply by creating DeploymentGrid and ServiceGrid custom resources, and requests between these services are completed within the same machine room or region (closed loop), avoiding cross-region service access. This greatly simplifies publishing and governing services in edge clusters
  • Cloud-edge tunnel: SuperEdge supports self-built tunnels (currently TCP, HTTP, and HTTPS) to solve cloud-to-edge connectivity across different networks, enabling unified operation and maintenance of edge nodes that have no public IP address

Overall architecture

Component functions are summarized as follows:

Cloud components

In addition to the native Kubernetes master components deployed for the edge cluster (cloud-kube-apiserver, cloud-kube-controller, and cloud-kube-scheduler), the main cloud-side control components include:

  • tunnel-cloud: maintains the network tunnel with tunnel-edge; currently supports TCP, HTTP, and HTTPS
  • application-grid controller: the Kubernetes controller behind ServiceGroup. It manages the DeploymentGrid and ServiceGrid CRDs and generates the corresponding Kubernetes Deployments and Services from these two kinds of custom resources; together with self-developed service topology awareness, it enables closed-loop service access
  • edge-admission: determines whether a node is healthy based on the status reports from the distributed health check on edge nodes, and assists cloud-kube-controller in taking the corresponding actions (taints)

Edge components

In addition to kubelet and kube-proxy deployed on a native Kubernetes worker node, the following edge computing components are added:

  • lite-apiserver: the core component of edge autonomy. It is a proxy for cloud-kube-apiserver that caches the requests edge node components send to the apiserver and answers those requests directly from the cache when the network to cloud-kube-apiserver is broken
  • edge-health: the edge-side distributed health check service; it probes other nodes and participates in voting to determine whether a node is healthy
  • tunnel-edge: responsible for establishing the network tunnel to tunnel-cloud in the cloud, receiving API requests and forwarding them to edge node components (such as kubelet)
  • application-grid wrapper: works with application-grid controller to provide closed-loop service access within a ServiceGrid

Functions overview

Application deployment & service access control

SuperEdge supports deploying all native Kubernetes workloads, including:

  • Deployment
  • StatefulSet
  • DaemonSet
  • Job
  • CronJob

Edge computing applications have the following distinctive characteristics:

  • In edge computing scenarios, multiple edge sites are often managed in the same cluster, each with one or more computing nodes
  • Each site is expected to run a set of services with business-logic connections; the services in a site form a complete functional unit that serves users
  • Due to network restrictions, services with business connections between them should not, or cannot, be accessed across sites

To solve these problems, SuperEdge introduced the concept of ServiceGroup, which enables users to conveniently deploy a group of services in different machine rooms or regions belonging to the same cluster, with requests between those services completed within the machine room or region (closed loop), avoiding cross-region service access

There are several key concepts involved in ServiceGroup:

NodeUnit

  • A NodeUnit is usually one or more computing resource instances located in the same edge site; nodes in the same NodeUnit must be able to reach each other over the internal network
  • The services in a ServiceGroup run within a single NodeUnit
  • ServiceGroup allows the user to set the number of pods (of a Deployment) that a service runs in each NodeUnit
  • ServiceGroup can restrict calls between services to within the NodeUnit

NodeGroup

  • A NodeGroup contains one or more NodeUnits
  • It ensures that the services in a ServiceGroup are deployed on every NodeUnit in the group
  • When a NodeUnit is added to the cluster, the services in the ServiceGroup are automatically deployed to the new NodeUnit

ServiceGroup

  • A ServiceGroup contains one or more business services
  • Applicable scenarios:
    • The services need to be deployed together as a package
    • The services need to run in every NodeUnit with a guaranteed number of pods
    • Calls between the services need to stay within the same NodeUnit, and traffic must not be forwarded to other NodeUnits
  • Note: ServiceGroup is an abstract resource concept; multiple ServiceGroups can be created in a cluster

Here is an example of ServiceGroup:

# step1: label edge nodes
$ kubectl  get nodes
NAME    STATUS   ROLES    AGE   VERSION
node0   Ready    <none>   16d   v1.16.7
node1   Ready    <none>   16d   v1.16.7
node2   Ready    <none>   16d   v1.16.7
# nodeunit1(nodegroup and servicegroup zone1)
$ kubectl --kubeconfig config label nodes node0 zone1=nodeunit1  
# nodeunit2(nodegroup and servicegroup zone1)
$ kubectl --kubeconfig config label nodes node1 zone1=nodeunit2
$ kubectl --kubeconfig config label nodes node2 zone1=nodeunit2

# step2: deploy echo DeploymentGrid
$ cat <<EOF | kubectl --kubeconfig config apply -f -
apiVersion: superedge.io/v1
kind: DeploymentGrid
metadata:
  name: deploymentgrid-demo
  namespace: default
spec:
  gridUniqKey: zone1
  template:
    replicas: 2
    selector:
      matchLabels:
        appGrid: echo
    strategy: {}
    template:
      metadata:
        creationTimestamp: null
        labels:
          appGrid: echo
      spec:
        containers:
        - image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2
          name: echo
          ports:
          - containerPort: 8080
            protocol: TCP
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          resources: {}
EOF
deploymentgrid.superedge.io/deploymentgrid-demo created
# note that there are two deployments generated and deployed into both nodeunit1 and nodeunit2
$ kubectl  get deploy
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deploymentgrid-demo-nodeunit1   2/2     2            2           5m50s
deploymentgrid-demo-nodeunit2   2/2     2            2           5m50s
$ kubectl  get pods -o wide
NAME                                             READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
deploymentgrid-demo-nodeunit1-65bbb7c6bb-6lcmt   1/1     Running   0          5m34s   172.16.0.16   node0   <none>           <none>
deploymentgrid-demo-nodeunit1-65bbb7c6bb-hvmlg   1/1     Running   0          6m10s   172.16.0.15   node0   <none>           <none>
deploymentgrid-demo-nodeunit2-56dd647d7-fh2bm    1/1     Running   0          5m34s   172.16.1.12   node1   <none>           <none>
deploymentgrid-demo-nodeunit2-56dd647d7-gb2j8    1/1     Running   0          6m10s   172.16.2.9    node2   <none>           <none>

# step3: deploy echo ServiceGrid
$ cat <<EOF | kubectl --kubeconfig config apply -f -
apiVersion: superedge.io/v1
kind: ServiceGrid
metadata:
  name: servicegrid-demo
  namespace: default
spec:
  gridUniqKey: zone1
  template:
    selector:
      appGrid: echo
    ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
EOF
servicegrid.superedge.io/servicegrid-demo created
# note that there is only one relevant service generated
$ kubectl  get svc
NAME                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
kubernetes             ClusterIP   192.168.0.1       <none>        443/TCP   16d
servicegrid-demo-svc   ClusterIP   192.168.6.139     <none>        80/TCP    10m

# step4: access servicegrid-demo-svc (service topology and closed-loop access)
# execute on node0
$ curl 192.168.6.139|grep "node name"
        node name:      node0
# execute on node1 and node2
$ curl 192.168.6.139|grep "node name"
        node name:      node2
$ curl 192.168.6.139|grep "node name"
        node name:      node1

From the above example, ServiceGroup can be summarized as follows:

  • NodeUnit, NodeGroup, and ServiceGroup are all abstract concepts. Their mapping relationships are as follows:
    • A NodeUnit is a group of edge nodes that share the same label key and value
    • A NodeGroup is a group of NodeUnits (with different values) that share the same label key
    • A ServiceGroup consists of two kinds of CRDs, DeploymentGrid and ServiceGrid, which share the same gridUniqKey
    • The gridUniqKey corresponds to the label key of the NodeGroup; that is, a ServiceGroup corresponds one-to-one with a NodeGroup, and a NodeGroup corresponds to multiple NodeUnits. For each NodeUnit in the NodeGroup, a Deployment belonging to the ServiceGroup is generated. These Deployments (named deploymentGridName-NodeUnit) are pinned to their NodeUnit by a nodeSelector and restricted to in-NodeUnit access by service topology awareness, as sketched below
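
As a hedged illustration of the last point, the Deployment generated by application-grid controller for nodeunit1 in the example above would look roughly like this (a sketch inferred from the example, not the exact object SuperEdge produces):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploymentgrid-demo-nodeunit1   # deploymentGridName-NodeUnit
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      appGrid: echo
  template:
    metadata:
      labels:
        appGrid: echo
    spec:
      nodeSelector:
        zone1: nodeunit1                 # pins the pods to nodes labeled zone1=nodeunit1
      containers:
      - name: echo
        image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2
        ports:
        - containerPort: 8080
          protocol: TCP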

Distributed Health Check

In edge computing scenarios the network between edge nodes and the cloud is complex and the connection is unreliable. In native Kubernetes, an interrupted connection between the apiserver and a node causes the node status to become abnormal, which eventually leads to pod eviction and the loss of endpoints, resulting in service interruption and fluctuation. Specifically, native Kubernetes handles a lost node as follows (illustrated after the list):

  • The lost node is set to ConditionUnknown and the NoSchedule and NoExecute taints are added to it
  • The pods on the lost node are evicted and rebuilt on other nodes
  • The pods on the lost node are removed from the Service endpoint list
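
This native behavior can be observed on any vanilla cluster; the hedged sketch below (the node name is a placeholder) shows where the taints that drive the eviction appear:

# inspect the taints added by the node-lifecycle controller once a node is lost
$ kubectl get node <lost-node> -o jsonpath='{.spec.taints}'
# the list typically contains node.kubernetes.io/unreachable with effects
# NoSchedule and NoExecute, which stop new scheduling and evict existing pods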

Therefore, in edge computing scenarios, relying only on the connection between the edge and the apiserver to judge whether a node is abnormal is not enough: the unreliable network can cause misjudgments that affect normal services. Compared with the cloud-edge connection, the connections between edge nodes are clearly more stable and carry useful information. SuperEdge therefore proposes a distributed health check mechanism for the edge, which considers not only the apiserver's view but also the nodes' evaluation of each other, producing a more comprehensive judgment of node state. This avoids the massive pod migration and reconstruction caused by an unreliable cloud-edge network and ensures service stability

Specifically, the accuracy of node state judgment can be enhanced through the following three aspects:

  • Each node periodically detects the health status of other nodes
  • All nodes in the cluster periodically vote to determine the status of each node
  • The cloud and the edge nodes jointly determine the node state

The final judgment of the distributed health check combines the cloud's view and the node-internal check, as follows:

  • Node-internal check normal, cloud judges normal: the node is normal
  • Node-internal check normal, cloud judges abnormal: no new pods are scheduled to the node (NoSchedule taint)
  • Node-internal check abnormal, cloud judges normal: the node is normal
  • Node-internal check abnormal, cloud judges abnormal: existing pods are evicted, pods are removed from the Endpoint list, and no new pods are scheduled to the node

Edge autonomy

Edge computing users want the convenient management and operations that Kubernetes brings, but also need disaster tolerance in weak-network environments. Specifically:

  • Even if a node loses its connection to the master, the services on the node keep running
  • If a business container exits unexpectedly or crashes, kubelet can still restart it
  • After a node restarts, the services on it can be pulled up again
  • Users deploy microservices across factories; after a node restarts, the microservices within the same factory must still be able to reach each other

However, with standard Kubernetes, if a node is disconnected from the network and then restarts abnormally, the behavior is as follows:

  • The status of the lost node is set to ConditionUnknown
  • If a business process on the lost node exits unexpectedly, its container can still be restarted
  • The pod IPs on the lost node are removed from the Endpoint list
  • After the node restarts, the containers are all gone and are not pulled up again

SuperEdge's self-developed edge autonomy solves these problems. Specifically, edge autonomy achieves the following:

  • The node is set to ConditionUnknown, but services remain available (pods are not evicted and are not removed from endpoint lists)
  • When multiple nodes lose their network connection, pod services and microservices keep working properly
  • When multiple disconnected nodes restart, their pods are pulled up again and run normally
  • After multiple disconnected nodes restart, all microservices remain reachable

The first two points are achieved by the distributed health check mechanism described above; the last two are achieved through the lite-apiserver mechanism, network snapshot, and a local DNS solution, described below:

Lite-apiserver mechanism

SuperEdge adds a mirror layer on the edge in the form of the lite-apiserver component, so that all requests from edge node components to the cloud kube-apiserver go to the local lite-apiserver:

In essence, lite-apiserver is a proxy that caches requests to kube-apiserver and serves the cached data to clients when the apiserver is unreachable:

In summary: for edge node components, lite-apiserver provides the functionality of kube-apiserver, but it only serves the local node and consumes very few resources. When the network is healthy, lite-apiserver is transparent to node components; when a network exception occurs, lite-apiserver returns the data the local node needs to the components on that node, ensuring they are not affected by the network exception
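
A hedged sketch of how this redirection looks on an edge node: the kubeconfig used by kubelet points at the node-local lite-apiserver instead of the cloud kube-apiserver (the listen address 127.0.0.1:51003 is a commonly used default and the file paths follow kubeadm conventions; verify both against your own deployment):

# /etc/kubernetes/kubelet.conf on an edge node (sketch)
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    # kubelet talks to the node-local lite-apiserver, which caches and proxies
    # requests to the cloud kube-apiserver
    server: https://127.0.0.1:51003
    insecure-skip-tls-verify: true       # or a CA bundle that covers the local endpoint
contexts:
- name: kubelet@kubernetes
  context:
    cluster: kubernetes
    user: kubelet
current-context: kubelet@kubernetes
users:
- name: kubelet
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem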

Network snapshot

With lite-apiserver, pods can be pulled up normally after an edge node is disconnected and restarted. However, by default in Kubernetes the pod IP changes after the pod is recreated, which is not acceptable in some scenarios. SuperEdge therefore designed a network snapshot mechanism that keeps pod IPs unchanged after an edge node restarts and its pods are pulled up again. Specifically, the network information of components on the node is snapshotted periodically and restored after the node restarts
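
The effect is roughly equivalent to the following sketch (purely illustrative; the paths and schedule are assumptions, not SuperEdge's actual implementation): periodically archive the node's CNI network state and restore it before kubelet starts after a reboot.

# /etc/crontab sketch: snapshot CNI network state every 10 minutes, restore it at boot
*/10 * * * *  root  tar czf /var/lib/edge/cni-snapshot.tar.gz /var/lib/cni
@reboot       root  tar xzf /var/lib/edge/cni-snapshot.tar.gz -C /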

Local DNS solution

Lite-apiserver and the network snapshot mechanism ensure that pods are pulled up and run properly after an edge node restarts while disconnected, and that microservices keep running. But services accessing each other still involves domain name resolution: normally in-cluster DNS is provided by CoreDNS, usually deployed as a Deployment, but in edge computing the nodes may not be on the same LAN and may span availability zones, so the CoreDNS service may be unreachable. To keep DNS resolution working at all times, SuperEdge designed a dedicated local DNS solution, as follows:

CoreDNS is deployed as a DaemonSet so that it is available locally on every node, and the cluster-dns startup parameter of kubelet on each node is modified to point to a node-local private IP (the same address on every node). This ensures that domain name resolution works even when the network to the cloud is down.
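
A minimal sketch of that idea, assuming a link-local address such as 169.254.20.10 as the shared node-local DNS IP (the actual address, image, and manifests used by SuperEdge may differ):

# run CoreDNS on every node so DNS stays local
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edge-coredns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: edge-coredns
  template:
    metadata:
      labels:
        k8s-app: edge-coredns
    spec:
      hostNetwork: true                  # serve DNS on a node-local address
      containers:
      - name: coredns
        image: coredns/coredns:1.8.0     # image tag chosen for illustration
        args: ["-conf", "/etc/coredns/Corefile"]
        volumeMounts:
        - name: config
          mountPath: /etc/coredns
      volumes:
      - name: config
        configMap:
          name: edge-coredns             # Corefile omitted here for brevity

kubelet on every node would then be started with --cluster-dns set to the same node-local address (for example --cluster-dns=169.254.20.10), so in-cluster name resolution keeps working even when the cloud link is down.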

In general, SuperEdge builds on the lite-apiserver mechanism and combines it with distributed health check, network snapshot, and local CoreDNS to keep an edge container cluster reliable in weak-network environments. Note that the higher the level of edge autonomy, the more components are required

Cloud-edge tunnel

Finally, the cloud-edge tunnel of SuperEdge. Its main purpose is to proxy requests from the cloud to edge node components, solving the problem that the cloud cannot access edge nodes directly (edge nodes are not exposed to the public network).

The architecture diagram is as follows:

The implementation principle is as follows:

  • tunnel-edge on the edge node actively connects to the tunnel-cloud service, and the service forwards the connection to a tunnel-cloud pod according to its load balancing policy
  • After the gRPC connection between tunnel-edge and tunnel-cloud is established, tunnel-cloud writes the mapping between its own pod IP and the nodeName of the node where that tunnel-edge runs into the DNS (tunnel DNS). When the gRPC connection is disconnected, tunnel-cloud deletes the mapping (a hedged sketch of this mapping follows)
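
A hedged sketch of that mapping (the ConfigMap name and IPs below are assumptions for illustration, not necessarily SuperEdge's exact resources): tunnel-cloud maintains hosts-format entries that resolve each edge node name to the tunnel-cloud pod terminating its tunnel, and the tunnel DNS answers queries from this table.

apiVersion: v1
kind: ConfigMap
metadata:
  name: tunnel-nodes            # hypothetical name
  namespace: kube-system
data:
  hosts: |
    10.244.0.15  node0          # node0's gRPC tunnel terminates on this tunnel-cloud pod
    10.244.1.7   node1
    10.244.1.7   node2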

The proxy forwarding process of the whole request is as follows:

  • When the apiserver or another cloud application accesses kubelet or another application on an edge node, tunnel-dns hijacks DNS to forward the request to a tunnel-cloud pod (it resolves the node name in the host field to the pod IP of tunnel-cloud)
  • tunnel-cloud forwards the request, based on the node name, onto the gRPC connection established with the corresponding tunnel-edge
  • tunnel-edge requests the application on the edge node according to the received request

Conclusion

This article introduced the features, overall architecture, main functions, and principles of the open-source edge computing framework SuperEdge. Distributed health check and the ServiceGroup-based edge service access control are unique to SuperEdge: distributed health check largely avoids the massive pod migration and reconstruction caused by an unreliable cloud-edge network and keeps services stable, while ServiceGroup lets users deploy a group of services across different machine rooms or regions in the same cluster, with service requests completed within the machine room or region (closed loop), preventing cross-region access. Other features include edge autonomy and the cloud-edge tunnel.

Overall, SuperEdge builds edge clusters in a non-intrusive way: it keeps the original Kubernetes components unchanged and adds a few new components to provide edge computing capabilities, retaining Kubernetes' powerful orchestration while delivering complete edge computing functionality.