I. Definition of SuperEdge

To quote from the SuperEdge open source website:

SuperEdge is an open source container management system for edge computing to manage compute resources and container applications in multiple edge regions. These resources and applications, in the current approach, are managed as one single Kubernetes cluster.

In plainer terms:

SuperEdge is an open source edge container solution that manages resources and container applications across regions in a single Kubernetes cluster.

Let’s unpack this sentence by its key words:

  • Open source

Although this edge container solution was open-sourced by the Tencent Cloud container team, it is a completely vendor-neutral, third-party open source project. On the day of the official announcement, Intel, VMware, Meituan, Cambricon, Capital Online, Huya and Tencent jointly announced the open-sourcing; it was not Tencent alone. Individuals and enterprises with ideas about the edge are welcome to participate, build the edge together, and jointly promote the adoption and development of edge computing in real scenarios.

  • Edge container solution

This needs little explanation: the solution mainly does container orchestration, scheduling and management. Kubernetes is the most popular way to do container scheduling, so why not use Kubernetes for edge container scheduling and management? More on that later.

  • Single Kubernetes cluster

SuperEdge still uses Kubernetes for container scheduling. After digging deeper, I found that the SuperEdge team did not change a single line of Kubernetes code; it is completely Kubernetes-native, with all the features of the corresponding Kubernetes version.

  • Manage resources and container applications across regions in a single Kubernetes cluster

The key words are single and across regions. Why manage resources and container applications across regions in a single Kubernetes cluster? It all comes down to the scenario.

As a simple example, consider a supermarket chain with 10 outlets, each of which needs to run the day’s special AD promotion program. The networks between outlets, and between each outlet and the central control plane, are completely disconnected and independent, forming isolated regions. The AD program is exactly the same for every outlet, with identical content, and the goal is to manage all 10 outlets in a single Kubernetes cluster as easily as one.

II. Possible problems

After reading the feature list on the SuperEdge website, I didn’t fully understand it either. What is Network Tunneling? Why a built-in edge orchestration capability? What problem does each feature actually solve?

Take the supermarket chain with 10 outlets, where the network is disconnected between the center and outlets, and between outlets. If we try to design a solution that deploys and maintains the same application everywhere, and look at the problems it must solve, we may get a better grasp of SuperEdge’s features and design intent.

Let’s look at the problems this example would face in a real-world scenario:

1. The same program at 10 outlets

Although there are only 10 outlets in the example, there could be hundreds in real life. We also have to think about future scalability: keeping the same program in sync across hundreds or thousands of nodes is hard. It would be much easier if we could manage hundreds of outlets as easily as one.

2. The network between the center and nodes, or between nodes, is unavailable

This scenario exists in real life; after all, dedicated lines are time-consuming and costly. Real scenarios are complex: each outlet may be a small machine room, or a small box without a public IP address. Some sites have hundreds of boxes of which only one can reach the external network, and at some sites none of the boxes can, which makes deployment quite difficult.

3. Weak network, geographically far apart

Edge scenarios are not like the data center. Many edge boxes, such as cameras, may reach the public network or a Wi-Fi connection in their area, and the network may drop from time to time, for a few minutes, a few hours, or even days; this is normal. Worse still, there are power losses and restarts. The challenge in this scenario is to ensure that edge workloads keep serving normally and to maximize service availability.

4. How do I know the health of edge nodes?

If an edge node is unhealthy, services on the abnormal node should be rescheduled to available nodes to keep them healthy as far as possible. But how do we know a node’s health when the central and edge networks are disconnected, and the nodes are disconnected from each other?

5. Limited resources

Embedded edge nodes often have limited resources; 1 CPU core and 1 GB of memory (1C1G) is very common. How do we keep edge services running under such limited edge resources?

6. Mixed resources

We want Kubernetes to manage both central cloud applications and edge applications. How?

…

Without expanding further, there are many more details to consider. We cannot wait to hit problems in real scenarios before solving them; the design must address, as far as possible, the problems we will face. That lets us invest our limited time in the actual business rather than fighting the underlying architecture.

III. Think of your own solutions

If we ran into the problems above, how would we solve them?

1. The same program at 10 outlets

What’s the best way to get a single program running identically everywhere, in a constant environment? Containers: a well-tested image runs fine as long as the underlying system is consistent.

2. What is used for container scheduling?

Kubernetes, of course. But can we use native Kubernetes directly? The answer is no. Why? Because the Kubernetes network model documentation clearly states:

Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
- pods on a node can communicate with all pods on all nodes without NAT
- agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node

In other words, Kubernetes requires that a node can communicate with the master node where kube-apiserver resides: node components can reach master components, and master components can reach node components. But our second problem is precisely that there is no network connectivity between the center and the outlets, so deploying native Kubernetes is clearly not feasible. How, then, do we manage edge services?

3. Weak network

In a weak network, even if we establish a connection between the center and the edge, the network will drop from time to time. How do we keep edge containers serving normally while the network is down? There is also the case where an edge node restarts and its containers must come back up and serve normally.

4. How do we deploy edge nodes in far-apart regions?

Geographical distance makes deployment itself a problem: we cannot travel to every outlet to deploy on site. Nor can we ask users to open a back door and remotely connect to their nodes whenever something goes wrong.

…

Thinking through the real scenario, we face many problems. I hope these questions provoke some thought. There is no single standard edge solution; whatever smoothly solves the customer’s problems is the right solution.

Let’s take a look at SuperEdge’s solution.

IV. SuperEdge features

Before introducing the features, let’s look at the SuperEdge architecture diagram and explore the implementation alongside it:

1. Network Tunneling

After reading the introduction of this feature and SuperEdge’s architecture diagram, it turns out Network Tunneling is literally a tunnel. A tunnel-edge is deployed on each edge node and a tunnel-cloud in the center; tunnel-edge initiates a request to tunnel-cloud to establish a long-lived connection. This way, even an edge node without a public IP address maintains a connection to the cloud, so the center and edge nodes can communicate with each other. In essence, this is classic tunneling technology.
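The flow above can be sketched as a toy in-process model (illustrative Python only, not the real tunnel-cloud/tunnel-edge code, which multiplexes real network connections; all names here are made up): the edge dials out, the cloud remembers the connection, and later cloud-to-edge requests reuse it.

```python
# Toy model of the reverse-tunnel idea behind tunnel-cloud / tunnel-edge.

class TunnelCloud:
    """Runs in the center; accepts connections initiated by edges."""
    def __init__(self):
        self.conns = {}  # node name -> handler standing in for a long connection

    def register(self, node_name, handler):
        # The edge dialed out to us, so NAT and firewalls are not a problem:
        # we simply remember its connection and reuse it for cloud->edge calls.
        self.conns[node_name] = handler

    def call_edge(self, node_name, request):
        # Cloud-initiated traffic rides the edge-initiated long connection.
        if node_name not in self.conns:
            raise ConnectionError(f"no tunnel from {node_name}")
        return self.conns[node_name](request)

class TunnelEdge:
    """Runs on an edge node without a public IP; dials out to the cloud."""
    def __init__(self, node_name, cloud):
        self.node_name = node_name
        cloud.register(node_name, self.handle)  # the outbound "long connection"

    def handle(self, request):
        return f"{self.node_name} served {request}"

cloud = TunnelCloud()
TunnelEdge("outlet-3", cloud)
print(cloud.call_edge("outlet-3", "GET /healthz"))  # outlet-3 served GET /healthz
```

The key design point is direction: only the edge ever initiates a connection, which is why a node behind NAT with no public IP can still be reached from the center.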

2. Edge autonomy

This mainly solves weak networks and restarts. Even with Network Tunneling, the link between edge nodes and the cloud remains unstable, and outages will still occur. The edge autonomy function covers two edge scenarios. First, when the network between center and edge is disconnected, services on edge nodes are unaffected. Second, when an edge node restarts, the services on it can still recover afterwards.

How does this function work?

Just look at the introduction of lite-apiserver. This component caches the management data that the edge node requests from the center and uses that data to keep edge services running; even after a restart, services come back normally. But a question remains: autonomous edge applications also generate data. What happens to that data? Will it create dirty data that affects edge services? I’ll leave that for you to think about.
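The caching behavior just described can be sketched as a tiny proxy (an illustrative model with made-up names, not the real lite-apiserver, which also persists its cache to disk): proxy requests while the link is up, remember the responses, and fall back to them during an outage.

```python
# Toy sketch of the lite-apiserver idea: answer from the central apiserver
# when reachable, and from the local cache when it is not.

class LiteApiserverSketch:
    def __init__(self, upstream):
        self.upstream = upstream   # callable: request -> response; raises ConnectionError when down
        self.cache = {}            # request -> last known good response

    def get(self, request):
        try:
            resp = self.upstream(request)
            self.cache[request] = resp   # refresh the cache while connected
            return resp
        except ConnectionError:
            if request in self.cache:
                return self.cache[request]   # serve last known state offline
            raise                            # nothing cached: surface the error

# Simulate: first the center is reachable, then the network drops.
state = {"up": True}
def center(request):
    if not state["up"]:
        raise ConnectionError("center unreachable")
    return f"pods-for-{request}"

lite = LiteApiserverSketch(center)
lite.get("node-1")         # populates the cache while the link is up
state["up"] = False
print(lite.get("node-1"))  # pods-for-node-1, answered from cache
```

This is also why a restarted node can recover its services without the center: the last known cluster state is still available locally.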

3. Distributed node health monitoring

But I was confused: when the center and an edge node are disconnected, why aren’t the node’s services evicted? By Kubernetes logic, Pods on a NotReady node should be evicted to other Ready nodes. Did SuperEdge simply turn off eviction for edge nodes? I can confirm it did not, and the official open source documentation emphasizes that Pod eviction happens only when an edge node abnormality is confirmed.

So how does SuperEdge identify an edge node abnormality? I found the answer later in the component introduction for edge-health.

edge-health runs on every edge node and detects the health of the edge nodes within a region. The principle is this: within a region, edge nodes can reach each other, so the edge-health instance on each node periodically probes the others (and itself) and computes each node’s health from the results. If at least XX% of the nodes consider a node abnormal (XX% is configured by the healthCheckScoreline parameter, 100% by default), the result is reported to the central component edge-health-admission.

edge-health-admission is deployed in the center and decides, based on edge node status and the edge-health voting results, whether to evict edge services to other edge nodes. By using edge-health with a suitable healthCheckScoreline, services are not evicted as long as the edge node is not actually down. This both improves the availability of edge services and extends Kubernetes eviction to the edge.
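The voting rule above reduces to a simple threshold check, sketched here (illustrative only; the parameter name follows the text above, expressed as a fraction, and the function name is made up):

```python
# Sketch of the distributed health vote: a node is declared abnormal only
# when enough of its peers agree, so one flaky probe cannot trigger eviction.

def node_is_abnormal(votes, scoreline=1.0):
    """votes: peer judgments about one node, True = peer saw it as abnormal.
    scoreline: fraction of peers that must agree (1.0 = unanimous, the default)."""
    if not votes:
        return False
    return sum(votes) / len(votes) >= scoreline

# With the default scoreline of 100%, one dissenting peer blocks eviction:
print(node_is_abnormal([True, True, False]))  # False
print(node_is_abnormal([True, True, True]))   # True
```

The unanimity default is deliberately conservative: in a weak network, a node unreachable from the center may still be perfectly healthy as seen by its neighbors.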

However, I still have a question: if an edge node does go down and its services are evicted to other edge nodes, what happens to the services on that node once its health is restored?

4. Built-in edge orchestration capability

Personally, I find this the most practical feature. It is the ServiceGroup capability provided jointly by the application-grid controller and application-grid wrapper components in the architecture diagram.

What is ServiceGroup for? Taking the supermarket chain with 10 outlets, it provides roughly these capabilities:

  • Multiple outlets can simultaneously deploy a special AD promotion solution;

What’s special about this? Note at the same time: you write one Deployment, and once it is delivered to the central cluster, it is deployed to each of the 10 outlets. Managing the AD promotion program at 10 outlets becomes as easy as managing it at one. Newly added outlets automatically get the special AD promotion program deployed, with no version or naming issues to worry about.

  • The services deployed at different outlets can differ, i.e. grayscale capability

This extension of ServiceGroup lets you define templates for different regions or outlets using the DeploymentGrid’s templatePool. Using Kubernetes patch semantics, different running versions are produced by patching the base version. With simple management you can gray a region before a service goes fully live, or define services that run differently from outlet to outlet.
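The base-plus-patch idea can be sketched in a few lines (a toy shallow merge with made-up names; the real DeploymentGrid applies proper Kubernetes patch semantics to the objects in templatePool):

```python
# Toy rendering of a per-outlet spec: base template plus a per-region patch.

def render_for_region(base, patches, region):
    spec = dict(base)                     # start from the base version
    spec.update(patches.get(region, {}))  # overlay the region's patch, if any
    return spec

base = {"image": "ad-promo:v1", "replicas": 1}
patches = {"outlet-a": {"image": "ad-promo:v2-gray"}}  # gray outlet-a first

print(render_for_region(base, patches, "outlet-a"))  # runs the gray version
print(render_for_region(base, patches, "outlet-b"))  # still on v1
```

Outlets without a patch simply run the base version, which is what makes graying a single region cheap: you only describe the difference.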

  • Services of the same solution do not access each other across outlets;

What does that mean? Even if the outlets’ networks are interconnected and each outlet has the special AD promotion program deployed, outlet A’s program will not access outlet B’s AD promotion services; it only accesses services within its own outlet, achieving a closed traffic loop.

In my opinion, there are two benefits:

  • Network traffic forms a closed loop: an outlet’s services access only that outlet.
  • Edge autonomy is strengthened: if an outlet loses its connection to the center, it stays autonomous with its own services and does not access other outlets.
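The closed loop boils down to filtering a Service’s endpoints to the caller’s own outlet, sketched here (an illustrative model with hypothetical field names, not the real application-grid wrapper, which filters the endpoint lists that kube-proxy watches):

```python
# Sketch of closed-loop service access: a caller only ever sees backends
# that belong to its own outlet (node unit).

def closed_loop_endpoints(endpoints, local_unit):
    """Keep only endpoints in the caller's outlet so traffic never crosses outlets."""
    return [ep for ep in endpoints if ep["unit"] == local_unit]

endpoints = [
    {"ip": "10.0.1.5", "unit": "outlet-a"},
    {"ip": "10.0.2.7", "unit": "outlet-b"},
]
# A pod in outlet-a only ever resolves outlet-a's backends:
print(closed_loop_endpoints(endpoints, "outlet-a"))
```

Because the filtering happens on the consuming side, the same Service name works at every outlet while each outlet’s traffic stays local.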

This feature is really useful in edge scenarios. The application-grid controller and application-grid wrapper can even be deployed independently in ordinary Kubernetes clusters to solve the problem of managing one solution across multiple outlets. You can also follow the ServiceGroup documentation to try it yourself.

That’s it for today; this covers SuperEdge’s basic features. When there’s a chance, I’ll do an in-depth analysis of SuperEdge’s internals!

V. Collaboration and open source

The core edge-computing components of the TKE Edge container management service have been open-sourced to the SuperEdge project. Everyone is welcome to build edge computing together and participate in the SuperEdge open source project, so that the edge capabilities you develop benefit more people. Below is the WeChat group of the SuperEdge open source project; feel free to join the discussion.

SuperEdge version:

  • SuperEdge – V0.4.0
  • SuperEdge – V0.3.0
  • SuperEdge – V0.2.0
  • Dorcas co-operates with open source SuperEdge