Introduction

In edge scenarios, the network is often unreliable and can mistakenly trigger the Kubernetes eviction mechanism, causing unintended Pod evictions. TKE Edge pioneered a distributed node state determination mechanism that identifies the right moment to evict, keeps the system running normally over weak networks, and avoids service interruption and fluctuation.

In edge computing, the network environment between edge nodes and the cloud is complex, its quality cannot be guaranteed, and the connection between the APIServer and a node may be interrupted at any time. If unmodified upstream Kubernetes is used directly, node status will frequently turn abnormal, the Kubernetes eviction mechanism will kick in, Pods will be evicted and Endpoints lost, and services will ultimately suffer interruption and fluctuation. To solve this problem, the TKE edge container team designed a distributed node state determination mechanism for edge clusters on weak networks, which identifies the correct eviction timing much more accurately.

Background

Unlike the central cloud, edge scenarios must first cope with a weak cloud-edge network. Edge devices are often located in edge machine rooms or mobile edge sites, and their connections to the cloud are complex and nowhere near as reliable as networks inside the central cloud. The unreliability covers not only the network between the cloud (control plane) and the edge, but also the network among the edge nodes themselves: even between machine rooms in the same region, the network quality between nodes cannot be assumed to be good.

Take a smart factory as an example: the edge nodes sit in the factory's warehouses and workshops, while the Master nodes on the control plane run in a central Tencent Cloud machine room.

The network between the edge devices in the warehouses and workshops and the cloud cluster is complex; it may run over the Internet, 5G, Wi-Fi, and other links, and its quality is uneven and unguaranteed. The devices inside a warehouse or workshop, however, communicate over a local network, so the network among them is bound to be better than their connection to the cloud cluster, and therefore relatively more reliable.

Challenges posed

Native Kubernetes behavior

The weak cloud-edge network disrupts communication between the kubelet running on the edge node and the APIServer in the cloud: the APIServer stops receiving the kubelet's heartbeats or lease renewals and can no longer track the state of the node and the Pods on it. Once this lasts longer than the configured threshold, the APIServer considers the node unavailable and takes the following actions (see the sketch after this list):

  • The unreachable node's status is set to NotReady or Unknown, and the NoSchedule and NoExecute taints are added
  • Pods on the unreachable node are evicted and rebuilt on other nodes
  • Pods on the unreachable node are removed from the Service's Endpoint list
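With upstream defaults, the node is marked unhealthy after roughly 40s without heartbeats (node-monitor-grace-period), and taint-based eviction removes Pods after their 300s toleration for node.kubernetes.io/unreachable expires. Here is a minimal client-go sketch of where these effects surface on the Node object; the kubeconfig path is an assumption, not part of the original article:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig path; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		// The Ready condition turns Unknown once kubelet heartbeats stop arriving.
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady {
				fmt.Printf("node %s Ready=%s\n", node.Name, cond.Status)
			}
		}
		// node.kubernetes.io/unreachable with effect NoExecute is the taint
		// that drives eviction of Pods from the lost node.
		for _, taint := range node.Spec.Taints {
			if taint.Key == corev1.TaintNodeUnreachable {
				fmt.Printf("node %s tainted: %s:%s\n", node.Name, taint.Key, taint.Effect)
			}
		}
	}
}
```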

Demand scenarios

Another example is audio and video stream pulling, an important application scenario of edge computing.

For the sake of user experience and cost, audio and video stream pulling needs to improve the edge cache hit ratio and reduce back-to-source traffic. The common practice is to schedule requests for the same file to the same service instance and have that instance cache the file.
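To make the affinity idea concrete, here is a minimal Go sketch that pins each requested file to one instance by hashing. This only illustrates the practice described above, not TKE's actual scheduler, and the instance names are hypothetical:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// pickInstance maps a requested file to a fixed service instance so that
// repeated requests for the same file hit the same cache. A production CDN
// would use consistent hashing to limit remapping when instances change.
func pickInstance(file string, instances []string) string {
	h := crc32.ChecksumIEEE([]byte(file))
	return instances[int(h)%len(instances)]
}

func main() {
	instances := []string{"edge-0", "edge-1", "edge-2"} // hypothetical instances
	// The same file always lands on the same instance.
	fmt.Println(pickInstance("/video/stream-42.flv", instances))
	fmt.Println(pickInstance("/video/stream-42.flv", instances))
}
```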

Under native Kubernetes, however, Pods rebuilt frequently because of network fluctuations hurt on two fronts: the service instances lose their caches, and the scheduling system redirects user requests to other instances. Both points badly degrade the CDN's effectiveness, potentially to an unacceptable degree.

In fact, the edge nodes themselves are working perfectly, so evicting or rebuilding Pods is completely unnecessary. To overcome this problem and keep services continuously available, the TKE edge container team proposed the distributed node state determination mechanism.

The solution

Design principles

Clearly, in edge computing scenarios it is not reasonable to judge whether a node is healthy based only on the connection between the edge and the APIServer. To make the system more robust, an additional judgment mechanism needs to be introduced.

Compared with the cloud-edge link, the network between edge nodes is more stable. How can this more stable infrastructure be used to improve judgment accuracy? We pioneered edge-health, a distributed node state judgment mechanism: besides the connection between a node and the APIServer, the edge nodes themselves are introduced as evaluation factors, giving a more comprehensive judgment of node state. Testing and extensive practice have shown that this mechanism greatly improves the accuracy of node state judgment over a weak cloud-edge network and safeguards the stable operation of services.

The main principles of this mechanism are as follows (a minimal sketch follows the list):

  • Each node periodically detects the health status of other nodes
  • All nodes in the cluster periodically vote to determine the status of each node
  • Cloud and edge nodes jointly determine node state
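Here is a minimal Go sketch of the detect-and-vote idea. It assumes a plain TCP probe against peer kubelet ports as the health check and a simple in-process majority count; SuperEdge's actual probes, vote exchange, and thresholds may differ, and the peer addresses are hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// peerHealthy is one possible probe: a peer counts as reachable if its
// kubelet port accepts a TCP connection within the timeout.
func peerHealthy(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// majorityHealthy applies the voting rule from the list above: a node is
// judged healthy unless most voters agree it is unreachable.
func majorityHealthy(votes []bool) bool {
	up := 0
	for _, v := range votes {
		if v {
			up++
		}
	}
	return up*2 > len(votes)
}

func main() {
	// This process plays one checker node: it probes its peers and records a
	// vote for each. In the real mechanism every node does this, and the
	// votes about a given node are combined across all checkers.
	peers := []string{"10.0.0.11:10250", "10.0.0.12:10250"} // hypothetical peers
	for _, p := range peers {
		fmt.Printf("vote for %s: healthy=%v\n", p, peerHealthy(p))
	}
	// Combining three checkers' votes about one node:
	fmt.Println("majority healthy:", majorityHealthy([]bool{true, true, false}))
}
```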

First, the nodes probe and vote on one another so that the state of any given node is determined jointly, and a node is marked with a given state only when a majority of nodes agree on that judgment. Second, although the inter-node network is generally better than the cloud-edge network, note that it is itself complicated and not 100% reliable either.

Therefore, the inter-node network cannot be completely trusted either, and a node's state should not be determined by the edge nodes alone; a joint determination by the cloud and the edge is more reliable. Based on this consideration, we made the following design, sketched below:
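Only the case "cloud judges abnormal, edge majority judges normal" is spelled out explicitly in this article (see the next section), so the remaining branches in this sketch are a hedged reconstruction that falls back to native behavior:

```go
package main

import "fmt"

// decide combines the cloud's view of a node (APIServer heartbeats) with the
// edge majority vote. Only the cloudSeesReady==false && edgeMajorityHealthy
// branch is stated explicitly in the article; the others assume native behavior.
func decide(cloudSeesReady, edgeMajorityHealthy bool) string {
	switch {
	case cloudSeesReady:
		return "Ready: schedule and run Pods as usual"
	case edgeMajorityHealthy:
		return "NotReady: keep existing Pods (no eviction), but schedule no new Pods here"
	default:
		return "NotReady: fall back to native eviction"
	}
}

func main() {
	fmt.Println(decide(false, true)) // weak cloud-edge link, node actually fine
}
```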

Features of the solution

Note that when the cloud judges a node abnormal but the other nodes consider it normal, existing Pods are not evicted, yet no new Pods are scheduled to the node either, which protects the stability of newly created workloads. That the existing workloads keep running normally is also owed to the edge autonomy capability of the edge cluster.

In addition, because of the peculiarities of edge networks and topology, the link between groups of nodes often has a single point of failure. In the factory case, for example, the warehouse and the workshop belong to the same region, yet the network between them may depend on a single critical link; once that link is interrupted, the nodes split into two groups. Our scheme ensures that when two split groups lose contact and judge each other abnormal, the group holding the majority of nodes is not marked abnormal, avoiding the situation where Pods can only be scheduled onto the few remaining nodes and overload them.

Furthermore, edge devices may well be located in different regions with no connectivity between them, in which case it is inappropriate for disconnected nodes to check one another. To handle this, our scheme also supports grouping nodes, with the nodes in each group checking only each other's state. And since nodes may need to be regrouped, the mechanism allows regrouping in real time, without redeploying the detection component or reinitializing anything.

The detection mechanism is disabled by default. To enable it, go to Basic Information and turn on Edge Health (disabled by default). To group nodes, additionally turn on Enable Multi-Region and group the nodes by editing and adding the corresponding node labels. If Multi-Region is enabled but the nodes are not grouped, each node forms a group of its own by default and does not check other nodes. A sketch of label-based grouping follows.
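As an illustration of label-based grouping (the label key edge-health-zone below is hypothetical; use whatever key and values you actually assign to your nodes in the console):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// groupNodes buckets nodes by a grouping label so that health checks run only
// among nodes within the same bucket.
func groupNodes(nodes []corev1.Node, labelKey string) map[string][]string {
	groups := map[string][]string{}
	for _, n := range nodes {
		zone := n.Labels[labelKey]
		groups[zone] = append(groups[zone], n.Name)
	}
	return groups
}

func main() {
	nodes := []corev1.Node{
		{ObjectMeta: metav1.ObjectMeta{Name: "workshop-1", Labels: map[string]string{"edge-health-zone": "workshop"}}},
		{ObjectMeta: metav1.ObjectMeta{Name: "warehouse-1", Labels: map[string]string{"edge-health-zone": "warehouse"}}},
	}
	fmt.Println(groupNodes(nodes, "edge-health-zone")) // map[warehouse:[warehouse-1] workshop:[workshop-1]]
}
```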

During the development of this feature, we also found a node-taint-related bug in the Kubernetes community and proposed a fix for it.

Future

In the future, we will support more checking methods to enhance stability in various scenarios. In addition, some of the existing open source decentralized cluster state detection and management projects cannot fully satisfy edge scenarios such as cluster splits; going forward, we will try to integrate with and learn from them to meet our needs.

Open source project SuperEdge

This component is currently open sourced as part of the edge container project SuperEdge (github.com/superedge/s…); Stars are welcome. A WeChat group QR code follows for anyone who wants to join the discussion.

Public cloud product: TKE Edge

The product is currently fully open; you are welcome to try it out in the edge container service console.

Previous highlights from the edge container series

  • Edge computing and the origin of edge containers
  • Learning edge containers from 0 to 1, part 2: edge application management
  • Learning edge containers from 0 to 1, part 3: using edge autonomy for disaster recovery
  • With edge containers, a week's workload for a team of seven or eight can be finished in seconds
  • Tencent Cloud, together with multiple ecosystem partners, open sources the SuperEdge edge container project
  • Cloud video service practice based on edge container technology
  • Understanding the SuperEdge edge container architecture and principles