Author: Tang Bingchang | Source: Alibaba Cloud Native official account

With the rapid development of edge computing, more and more data needs to be stored, processed, and analyzed at the edge of the network, and efficiently managing edge resources and applications has become a major challenge for the industry. The cloud-edge-device integration architecture, which uses cloud-native methods to push cloud computing capability down to the edge while keeping unified scheduling and control in the cloud, has been widely recognized.

In May 2020, Alibaba open sourced OpenYurt, the first non-intrusive edge computing cloud-native project based on Kubernetes, and the project entered the CNCF Sandbox in September of the same year. OpenYurt enhances native Kubernetes in a non-intrusive way to solve the problems of unstable networks and difficult cloud-edge operation and maintenance in edge scenarios. It focuses on edge node autonomy, the cloud-edge operation and maintenance channel, and edge unitization.

As shown in Figure 1, this article sets up a cloud-managed-edge scenario by deploying the control plane of a Kubernetes cluster in the cloud and joining a Raspberry Pi to the cluster. Based on this environment, we demonstrate the core capabilities of OpenYurt and help you get started with it quickly.

Figure 1. Native Kubernetes cluster

Environment preparation

1. Basic environment

In the cloud, purchase an ENS node (which has a public IP address so that services can be exposed externally through the public network) to deploy the control components of the native K8s cluster. The system is Ubuntu 18.04, the hostname is master-node, and the Docker version is 19.03.5.

At the edge, as shown in Figure 2, a Raspberry Pi 4 is connected to the local router to form the edge private network environment, and the router accesses the Internet through a 4G card. The Raspberry Pi 4 runs Ubuntu 18.04, the hostname is edge-node, and the Docker version is 19.03.5.

Figure 2. Entity diagram of edge environment
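
As a quick sanity check (a minimal sketch; the hostnames and versions below are simply the ones used in this article), the following commands can be run on both the cloud node and the Raspberry Pi to confirm the hostname, OS, and Docker version match the demo environment.

hostnamectl set-hostname master-node            # on the cloud node; use edge-node on the Raspberry Pi
lsb_release -ds                                 # expect Ubuntu 18.04.x
docker version --format '{{.Server.Version}}'   # expect 19.03.5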

2. Build a native K8s cluster

The demonstration environment in this article is based on a community K8s cluster of version 1.16.6, built with the kubeadm tool provided by the community. The specific operations are as follows:

  • Install the Kubernetes components by executing the following commands on the cloud node and the Raspberry Pi respectively.
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt install -y kubelet=1.16.6-00 kubeadm=1.16.6-00 kubectl=1.16.6-00
  • Use kubeadm to initialize the cloud node (run the following command on the cloud node), using Alibaba Cloud's image repository during deployment. The images in this repository are built as manifest lists, so they support both the amd64 and arm64 CPU architectures, which is required for the Raspberry Pi.
# Run on master-node
kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/edge-kubernetes \
  --kubernetes-version=v1.16.6 \
  --pod-network-cidr=10.244.0.0/16

Copy the config file to $HOME/.kube as prompted after initialization:

mkdir -p $HOME/.kube
 sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  • Join the Raspberry Pi to the cloud cluster. Run the join command on the Raspberry Pi, using the node join information output after the initialization in step 2.
kubeadm join 183.195.233.42:6443 --token XXXX \
    --discovery-token-ca-cert-hash XXXX
  • Add the CNI configuration (required on both the cloud control node and the Raspberry Pi); the cluster set up in this article uses the host network. Create the CNI configuration file /etc/cni/net.d/0-loopback.conf and copy the following content into it.
{" cniVersion ":" 0.3.0 ", "name" : "lo", "type" : "loopback}"Copy the code
  • Check the deployment result on the master node.
NAME          STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
edge-node     Ready    <none>   74s    v1.16.6   192.168.0.100    <none>        Ubuntu 18.04.4 LTS   4.19.105-v8-28      docker://19.3.5
master-node   Ready    master   2m5s   v1.16.6   183.195.233.42   <none>        Ubuntu 18.04.2 LTS   4.15.0-52-generic   docker://19.3.5
  • Delete CoreDNS (not required in this demo) and remove the taint on the master node (to make deploying the OpenYurt components easier).
kubectl delete deployment coredns -n kube-system
kubectl taint node master-node node-role.kubernetes.io/master-

Problems of native K8s clusters in edge scenarios

Based on the above environment, we test how well native K8s supports cloud-edge operation and maintenance in the cloud-managed-edge architecture, and how it responds to a cloud-edge network disconnection. First, we deploy a test application nginx from the cloud by executing kubectl apply -f nginx.yaml on the master node.

Note: the nodeSelector selects the edge-node node, hostNetwork is set to true, and the pod toleration time is set to 5s (the default is 5min; the shorter value makes it easier to demonstrate pod eviction).

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 5
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 5
  nodeSelector:
    kubernetes.io/hostname: edge-node
  containers:
  - name: nginx
    image: nginx
  hostNetwork: true

View the deployment result:

root@master-node:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 11s 192.168.0.100 edge-node <none>           <none>

1. Test common cluster operation and maintenance commands, including logs, exec, and port-forward.

Operate and maintain the application on the edge node from the master node by executing the logs/exec/port-forward commands and checking the results.

root@master-node:~# kubectl logs nginx
Error from server: Get https://192.168.0.100:10250/containerLogs/default/nginx/nginx: dial tcp 192.168.0.100:10250: connect: connection refused

root@master-node:~# kubectl exec -it nginx sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Error from server: error dialing backend: dial tcp 192.168.0.100:10250: connect: connection refused

root@master-node:~# kubectl port-forward pod/nginx 8888:80
error: error upgrading connection: error dialing backend: dial tcp 192.168.0.100:10250: connect: connection refused

The results show that native K8s cannot provide the ability to operate and maintain edge applications from the cloud in the cloud-managed-edge scenario, because the edge node is deployed in the user's private network and cannot be accessed directly from the cloud through its IP address.
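
To see this directly from the master node, a quick check like the one below (an illustration only; 192.168.0.100 is the edge node's internal IP from the earlier output, and 10250 is the kubelet port) confirms that the edge kubelet cannot be reached from the cloud.

curl -sk -m 5 https://192.168.0.100:10250/healthz || echo "edge kubelet is not reachable from master-node"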

2. Test the impact of edge disconnection on services

The edge node connects to the cloud control plane over the public network, so the network is unstable and the cloud-edge connection may be cut off. We will run two disconnection-related tests:

  • Disconnect the network for 1 minute -> Restore the network

  • Disconnect the network for 1 minute -> Restart edge nodes -> Restore the network

We observe the state changes of the node and the pods during the two tests. In this demo, the disconnection is simulated by cutting the router's public network connection.
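
A convenient way to follow these changes (a suggested helper, not part of the original steps) is to keep watch commands running on the master node, one per terminal, while the network is cut and restored.

kubectl get nodes -w
kubectl get pods -w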

1) Disconnect the network for 1 minute -> Restore the network

After the network is disconnected, the node becomes NotReady about 40 seconds later: a node reports a heartbeat every 10 seconds, and after four missed heartbeats the control plane considers the node abnormal.
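
These timings come from default Kubernetes parameters rather than anything configured in this demo. As a sketch (assuming a standard kubeadm layout, with the controller-manager running as a static pod), you can check whether the defaults are in effect on the master node:

# kubelet default:                 --node-status-update-frequency=10s  (heartbeat interval)
# kube-controller-manager default: --node-monitor-grace-period=40s     (node marked NotReady after ~4 missed heartbeats)
# kube-controller-manager default: --pod-eviction-timeout=5m           (pod eviction delay, shortened here via tolerationSeconds)
grep -E "node-monitor-grace-period|pod-eviction-timeout" /etc/kubernetes/manifests/kube-controller-manager.yaml \
  || echo "flags not set explicitly, defaults (40s / 5m) apply"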

root@master-node:~# kubectl get nodes
NAME          STATUS     ROLES    AGE     VERSION
edge-node     NotReady   <none>   5m13s   v1.16.6
master-node   Ready      master   6m4s    v1.16.6

Normally, pods on a NotReady node are evicted after 5 minutes. To demonstrate the effect more quickly, the pod toleration time was set to 5s in the manifest, so the application pod is evicted after 5s and its status changes to Terminating.

root@master-node:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Terminating 0 3m45s

Restore the network and observe the changes of the node and pods.

root@master-node:~# kubectl get pods
No resources found in default namespace.

After the network is restored, the node becomes Ready again, but the business pod is gone: the kubelet on the edge node received the Terminating state of the business pod, deleted it, and reported the deletion as successful, and the cloud then performed the corresponding cleanup. In other words, the business pod was evicted because of cloud-edge network instability, even though the edge node itself kept working properly throughout the outage.

Recreate the application nginx for the following tests.

root@master-node:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 4s 192.168.0.100 edge-node <none>           <none>

2) Disconnect the network for 1 minute -> Restart edge nodes -> Restore the network

Next, we test the impact of restarting the edge node while the network is down. After 1 minute of disconnection, the node and pod states are the same as in the previous test: the node becomes NotReady and the pod becomes Terminating. We then switch to the private network, log in to the Raspberry Pi, and restart it. After the restart, we wait about 1 minute and compare the container lists on the node before and after the restart.

Container list on the edge node before the restart (at this point the cloud-edge network is disconnected; although the pod is shown as Terminating in the cloud, the edge has not watched the Terminating event, so the application on the edge is still running normally).

root@edge-node:~# docker ps
CONTAINER ID   IMAGE                                                         COMMAND                  CREATED              STATUS              PORTS   NAMES
9671cbf28ca6   e86f991e5d10                                                  "/docker-entrypoint.…"   About a minute ago   Up About a minute           k8s_nginx_nginx_default_efdf11c6-a41c-4b95-8ac8-45e02c9e1f4d_0
6272a46f93ef   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 2 minutes ago        Up About a minute           k8s_POD_nginx_default_efdf11c6-a41c-4b95-8ac8-45e02c9e1f4d_0
698bb024c3db   f9ea384ddb34                                                  "/usr/local/bin/kube…"   8 minutes ago        Up 8 minutes                k8s_kube-proxy_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_0
31952700c95b   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 8 minutes ago        Up 8 minutes                k8s_POD_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_0

Container list on the node after the restart: having been disconnected and restarted, the kubelet cannot obtain pod information from the cloud and therefore does not rebuild the pods.

root@edge-node:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@edge-node:~#

The comparison shows that none of the pods on the edge node can be recovered after the node is disconnected and restarted. In other words, with native K8s, once a node restarts during a cloud-edge outage, the applications on it stop running.

Restore the network and observe the node and pod changes. The result is the same as in the previous test: after the network is restored, the node becomes Ready and the business pods are cleared.

root@master-node:~# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
edge-node     Ready    <none>   11m   v1.16.6
master-node   Ready    master   12m   v1.16.6
root@master-node:~# kubectl get pods
No resources found in default namespace.

Next, nginx is deployed again so that we can test how an OpenYurt cluster supports cloud-edge operation and maintenance and how it responds to a cloud-edge network outage.

root@master-node:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 12s 192.168.0.100 edge-node <none>           <none>

One-click conversion of native K8s clusters to OpenYurt clusters

Having explored the shortcomings of native Kubernetes in the cloud-edge integration architecture, let's see whether an OpenYurt cluster can meet the requirements of this scenario. We use yurtctl, the cluster conversion tool provided by the OpenYurt community, to convert the native K8s cluster into an OpenYurt cluster. Run the following command on the master node; it specifies the component images and the cloud node, and installs yurt-tunnel.

yurtctl convert --yurt-controller-manager-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurt-controller-manager:v0.2.1 \
  --yurt-tunnel-agent-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurt-tunnel-agent:v0.2.1 \
  --yurt-tunnel-server-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurt-tunnel-server:v0.2.1 \
  --yurtctl-servant-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurtctl-servant:v0.2.1 \
  --yurthub-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurthub:v0.2.1 \
  --cloud-nodes=master-node \
  --deploy-yurttunnel

The conversion takes about 2 minutes. After it completes, observe the pod status: the conversion has no impact on the pod status (you can also run kubectl get pod -w in another terminal during the conversion to watch the pod status).

root@master-node:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 2m4s 192.168.0.100 edge-node <none>           <none>

The component distribution after the conversion is shown in Figure 3, where the orange parts are OpenYurt components and the blue parts are native K8s components. Accordingly, let's look at the pods on the cloud node and the edge node.

Figure 3 OpenYurt cluster component distribution

The OpenYurt-related pods added on the cloud node are yurt-controller-manager and yurt-tunnel-server.

root@master-node:~# kubectl get pods --all-namespaces -owide | grep master | grep yurt
kube-system   yurt-controller-manager-7d9db5bf85-6542h   1/1   Running   0   103s   183.195.233.42   master-node   <none>   <none>
kube-system   yurt-tunnel-server-65784df-pl5bn           1/1   Running   0   103s   183.195.233.42   master-node   <none>   <none>

The OpenYurt-related pods added on the edge node are yurt-hub (a static pod) and yurt-tunnel-agent.

root@master-node:~# kubectl get pods --all-namespaces -owide | grep edge | grep yurt
kube-system yurt-hub-edge-node 1/1 Running 0 117s 192.168.0.100 edge-node <none>           <none>
kube-system   yurt-tunnel-agent-7l8nv                    1/1     Running   0          2m      192.168.0.100    edge-node     <none>           <none>

Test the OpenYurt cluster capability in edge scenarios

1. Test the logs/exec/port-forward commands and view the results

root@master-node:~# kubectl logs nginx
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Configuration complete; ready for start up


root@master-node:~# kubectl exec -it nginx sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# ls
bin dev docker-entrypoint.sh home media opt root sbin sys usr
boot docker-entrypoint.d etc lib mnt proc run srv tmp var
# exit


root@master-node:~# kubectl port-forward pod/nginx 8888:80
Forwarding from 127.0.0.1:8888 -> 80
Handling connection for 8888

To test port-forward, run curl 127.0.0.1:8888 on the master node to access the nginx service.
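
For example, while the port-forward above is still running, a second terminal on the master node should return the default nginx welcome page (this check is only an illustration and assumes the stock nginx image).

curl -s 127.0.0.1:8888 | grep -i "<title>"   # expect: <title>Welcome to nginx!</title>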

The results show that OpenYurt supports the common cloud-edge operation and maintenance commands very well.

2. Test the impact of edge disconnection on services

We repeat the same two disconnection tests as with native K8s. Before testing, we enable autonomy for edge-node; in an OpenYurt cluster, the autonomy of an edge node is marked by an annotation.

root@master-node:~# kubectl annotate node edge-node node.beta.alibabacloud.com/autonomy=true
node/edge-node annotated
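To double-check that the annotation took effect (an optional verification step, not required by OpenYurt), inspect the node object on the master node.

kubectl get node edge-node -o yaml | grep autonomy   # expect: node.beta.alibabacloud.com/autonomy: "true"
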

1) Disconnect the network for 1 minute -> Restore the network

Similarly, disconnect the router from the public network and observe the node and pod status. After about 40 seconds the node becomes NotReady; after 1 minute the pod is still Running and has not been evicted.

root@master-node:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
edge-node NotReady <none>   24m   v1.16.6
master-node   Ready      master   25m   v1.16.6
root@master-node:~# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          5m7s

Restore the network and observe the node and pod status: the node becomes Ready and the pod stays Running. With OpenYurt, an unstable cloud-edge network does not affect the business pods on edge nodes.

root@master-node:~# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
edge-node     Ready    <none>   25m   v1.16.6
master-node   Ready    master   26m   v1.16.6
root@master-node:~# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          6m30s

2) Disconnect the network for 1 minute -> Restart edge nodes -> Restore the network

Next, we test the impact of restarting the edge node while the network is down. After 1 minute of disconnection, the node and pod status are the same as in the previous test: the node becomes NotReady and the pod stays Running. Again, we log in to the Raspberry Pi, restart it, and compare the container lists on the node before and after the restart.

Edge node container list before restart:

root@edge-node:~# docker ps
CONTAINER ID   IMAGE                                                         COMMAND                  CREATED          STATUS          PORTS   NAMES
38727ec9270c   70bf6668c7eb                                                  "yurthub --v=2 --ser…"   7 minutes ago    Up 7 minutes            k8s_yurt-hub_yurt-hub-edge-node_kube-system_d75d122e752b90d436a71af44c0a53be_0
c403ace1d4ff   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 7 minutes ago    Up 7 minutes            k8s_POD_yurt-hub-edge-node_kube-system_d75d122e752b90d436a71af44c0a53be_0
de0d693e9e74   473ae979be68                                                  "yurt-tunnel-agent -…"   7 minutes ago    Up 7 minutes            k8s_yurt-tunnel-agent_yurt-tunnel-agent-7l8nv_kube-system_75d28494-f577-43fa-9cac-6681a1215498_0
a0763f143f74   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 7 minutes ago    Up 7 minutes            k8s_POD_yurt-tunnel-agent-7l8nv_kube-system_75d28494-f577-43fa-9cac-6681a1215498_0
80c247714402   e86f991e5d10                                                  "/docker-entrypoint.…"   7 minutes ago    Up 7 minutes            k8s_nginx_nginx_default_b45baaac-eebc-466b-9199-2ca5c1ede9fd_0
01f7770cb0f7   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 7 minutes ago    Up 7 minutes            k8s_POD_nginx_default_b45baaac-eebc-466b-9199-2ca5c1ede9fd_0
7e65f83090f6   f9ea384ddb34                                                  "/usr/local/bin/kube…"   17 minutes ago   Up 17 minutes           k8s_kube-proxy_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_1
c1ed142fc75b   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 17 minutes ago   Up 17 minutes           k8s_POD_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_1

List of edge node containers after restart:

root@edge-node:~# docker ps
CONTAINER ID   IMAGE                                                         COMMAND                  CREATED              STATUS              PORTS   NAMES
0c66b87066a0   473ae979be68                                                  "yurt-tunnel-agent -…"   12 seconds ago       Up 11 seconds               k8s_yurt-tunnel-agent_yurt-tunnel-agent-7l8nv_kube-system_75d28494-f577-43fa-9cac-6681a1215498_2
a4fb3e4e8c8f   e86f991e5d10                                                  "/docker-entrypoint.…"   58 seconds ago       Up 56 seconds               k8s_nginx_nginx_default_b45baaac-eebc-466b-9199-2ca5c1ede9fd_1
fce730d64b32   f9ea384ddb34                                                  "/usr/local/bin/kube…"   58 seconds ago       Up 57 seconds               k8s_kube-proxy_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_2
c78166ea563f   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 59 seconds ago       Up 57 seconds               k8s_POD_yurt-tunnel-agent-7l8nv_kube-system_75d28494-f577-43fa-9cac-6681a1215498_1
799ad14bcd3b   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 59 seconds ago       Up 57 seconds               k8s_POD_nginx_default_b45baaac-eebc-466b-9199-2ca5c1ede9fd_1
627673da6a85   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 59 seconds ago       Up 58 seconds               k8s_POD_kube-proxy-rjws7_kube-system_51576be4-2b6d-434d-b50b-b88e2d436fef_2
04da705e4120   70bf6668c7eb                                                  "yurthub --v=2 --ser…"   About a minute ago   Up About a minute           k8s_yurt-hub_yurt-hub-edge-node_kube-system_d75d122e752b90d436a71af44c0a53be_1
260057d935ee   registry.cn-hangzhou.aliyuncs.com/edge-kubernetes/pause:3.1   "/pause"                 About a minute ago   Up About a minute           k8s_POD_yurt-hub-edge-node_kube-system_d75d122e752b90d436a71af44c0a53be_1

The comparison shows that after the network is disconnected and the node is restarted, the pods on the edge node are pulled up again normally: the node autonomy capability of OpenYurt keeps services running stably during a network outage.

After the network is restored, the node becomes Ready. Observe the pod status.

root@master-node:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 1 11m 192.168.0.100 edge-node <none>           <none>

Finally, we use yurtctl's revert capability to convert the OpenYurt cluster back into a native K8s cluster. Again, the conversion process has no impact on the existing business.

yurtctl revert --yurtctl-servant-image=registry.cn-hangzhou.aliyuncs.com/openyurt/yurtctl-servant:v0.2.1
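
As an optional sanity check after the revert (not shown in the original steps), the OpenYurt components should be gone while both nodes remain Ready.

kubectl get pods --all-namespaces | grep yurt   # expect no output after the revert completes
kubectl get nodes                               # both nodes should still be Ready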

As Alibaba's first edge cloud-native open source project, OpenYurt is based on the commercial product ACK@Edge and has been polished inside the group for a long time. It has been applied in CDN, IoT, Hema, ENS, Cainiao logistics, and many other scenarios. For edge scenarios, the project insists on keeping the characteristics of native K8s and provides edge node autonomy and an integrated cloud-edge-device operation and maintenance channel in the form of add-ons. Recently, with the joint efforts of community contributors, the edge unitization management capability has also been open sourced, and more edge management capabilities will continue to be open sourced in the future. Everyone is welcome to participate and contribute. DingTalk group number: 31993519; join the group to discuss.