With the emergence of more and more multimedia services, each with its own characteristics, the biggest challenge for video cloud vendors is how to build a multimedia distribution network that delivers the best online experience for multiple services at the lowest cost. Yang Changpeng, an algorithm expert at Huawei Cloud, was invited to introduce the exploration and practice of cloud-edge collaboration in video scenarios.

By Yang Changpeng

Edited by LiveVideoStack

Hello, everyone. Today I will share with you how to do global scheduling from the perspective of a cloud vendor. The topic of this talk is Global Scheduling: Exploration and Practice of Cloud-Edge Collaboration in Video Scenarios.

First of all, let me introduce myself. I am a newcomer to the video field; my previous research direction was scheduling optimization for cloud resources, including VM scheduling and container scheduling. Recently I have been working on video scheduling, including algorithm research related to bandwidth resource planning and scheduling.

This talk covers the four aspects above, of which the core modules are the last three. I will introduce them one by one.

01 Introduction to Global Scheduling

The first part is an introduction to global scheduling, which may be relatively unfamiliar to everyone. It relates to the concept of distributed cloud released at Huawei TechWave last week. A distributed cloud is in fact a network that handles all the requests of different types of customers. Customers see a unified architecture and management interface, but behind it a great deal of scheduling technology is required, and global scheduling is a core module.

When it comes to scheduling, a business friend of mine once said that the essence of scheduling is building roads, but I don’t quite agree. Scheduling always involves two objects: resources on one side and business on the other, and separating the two is not appropriate; if the essence of scheduling were building roads, that would presuppose having the budget to build them. Let us first look at scheduling from the perspective of a cloud vendor. Businesses are diverse, typically including VOD, live streaming and RTC. The latency requirements, quality and scale of each service differ, as does the infrastructure each uses. For example, VOD basically trades storage (cache) for bandwidth, while live streaming and RTC are both long connections whose end-to-end delay must be watched during scheduling, so they are very different.

Now let’s look at resources. The resource layer is also divided into two parts: computing resources and bandwidth resources. The video cloud involves resources along the entire link, including the device side, the edge, central nodes and the public cloud. Their computing power rises from low to high along this chain, and each also carries bandwidth resources that differ in latency and cost. So we have many business types and many kinds of resources.

A natural idea is to achieve coordinated matching of multiple services with multiple resources, and there is a real-world analogue. In an express delivery network, a company may offer both fast and slow shipments, and these too require resource coordination: on some routes, fast and slow parcels can share one vehicle, while on others they travel on separate dedicated vehicles. These choices are made from the perspective of resource utilization. In the cloud we face an even greater challenge, because the decision granularity and timing requirements are higher, possibly at the millisecond level, and there are many kinds of services, which makes the whole scheduling problem very complicated.

This complexity can be seen from two perspectives. The first is that of our cloud users, whose demands involve both “change” and “constancy”. The first “change” is that bandwidth demand varies greatly: in a scenario like live streaming, demand surges at night, yet users still expect resources to be available promptly under such drastic swings. During the Spring Festival there can be demand for bandwidth on the order of hundreds of terabits per second, so how do we deal with such sudden resource requests?

In addition, business types differ, and each business has its own SLA requirements. Even within the same product, such as live streaming with co-streaming (lianmai), new business forms keep emerging, with different latency requirements in different scenarios. How do we deal with this variety of business types?

At the same time, the geographical distribution of the business itself keeps changing. For example, the distribution of streamers and viewers is very uneven: streamers may cluster in South China and East China, while viewers are spread much more widely. How do we handle these differing customer requests?

What stays “constant” is the customers’ demand for an extreme experience, low cost and high reliability: they always expect the cloud vendor to provide services with extremely low cost and the best possible experience.

From the perspective of the cloud vendor, there are also two requirements. The first is high quality: we must promise customers the best possible service quality. On that premise, we should improve the reuse efficiency of resources, because only with higher resource efficiency can we lower prices for customers and stay competitive in the market at a lower cost.

The figure above is drawn from the distributed cloud concept. A distributed cloud involves many kinds of resources, including Huawei Cloud core regions, central Region resources, intelligent edge (IEC) resources in hot areas, and IES resources deployed in customer equipment rooms. The essence of the scheduling problem becomes how to achieve the best possible balance between the supply side and the consumption side. This requires good user profiling, together with advance planning and prediction of the customer’s computing and bandwidth resources.

This is a very complicated problem, so let me borrow an example from the power industry to make it easier to grasp. The power industry is now highly intelligent: everyone consumes electricity, generation and consumption iterate in rapid cycles, and a series of optimizations at every link keeps the whole system efficient and robust. The cloud faces the same kind of challenge.

To realize this design, we built a global scheduling system that currently has three modules. The first is the resource scheduling module, mainly responsible for scheduling computing resources, including CPU- and memory-related scheduling. Our open-source version is called Arktos, and it provides a series of basic scheduling algorithms, such as latency-based and geography-based policies. The version used internally is more capable, supporting multi-objective optimization and device-edge-cloud collaborative resource scheduling.

We also built a traffic scheduling module, which mainly covers bandwidth resource planning and bandwidth resource scheduling. Bandwidth resource scheduling supports scheduling by geographic location, latency, QoS and cost.

At the same time, for AI tasks in video edge scenarios, we designed Sedna, a cloud-edge collaborative inference and training framework, which will be introduced in detail later.

02 Traffic Scheduling

Next we come to traffic scheduling, which is closely tied to the characteristics of each business and is a relatively difficult module.

In traffic scheduling, our ultimate goal is to design a scheduling engine that handles multiple services, multi-objective SLAs and heterogeneous data. That the cloud carries a variety of businesses needs no elaboration. “Multi-objective” means the user cares about experience while the cloud vendor watches cost; these are level-1 indicators that are hard to write down mathematically, so they are decomposed into a series of level-2 indicators. For example, user experience can be broken down into first-frame time, stall count and so on, and cost into back-to-source ratio, bandwidth trend and so on. These secondary indicators can be optimized mathematically.
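To make this concrete, here is a generic scalarization of the kind of objective this implies; the symbols and weights are my own illustration, not the production formulation:

$$
\min_{x}\;\; \lambda_1\, T_{\text{ff}}(x) \;+\; \lambda_2\, N_{\text{stall}}(x) \;+\; \lambda_3\, C_{\text{bw}}(x)
\qquad \text{s.t.}\;\; \mathrm{SLA}_k(x) \le \epsilon_k \;\; \forall k
$$

where $x$ is a traffic allocation, $T_{\text{ff}}$ is mean first-frame time, $N_{\text{stall}}$ the stall count, $C_{\text{bw}}$ the bandwidth cost, and each $\mathrm{SLA}_k$ constraint bounds one per-service level-2 indicator.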

In addition, the whole system is heterogeneous: user spatial characteristics differ from place to place, traffic has peaks and troughs across time periods, and the distribution of sites, streamers, users, computing resources, bandwidth resources and network characteristics all vary. How to schedule in such a complex system is a genuinely hard problem.

Huawei Cloud traffic scheduling currently contains several core modules: the first is bandwidth planning, the second is bandwidth scheduling. These two modules are rather abstract, so let me explain with a simple example. In a room there are two elderly people and three children, and we have bought a cake: how should it be divided? Usually the elderly don’t have a sweet tooth and the kids do, so I cut it into quarters, give one quarter to the two elderly people and three quarters to the children. That is bandwidth planning. How much cake each individual ends up with is bandwidth scheduling.

First, we plan at coarse granularity according to the characteristics of the different services. For live streaming, bandwidth peaks at night, while RTC peaks in the daytime because it carries a lot of online education, so there is a real opportunity for peak shaving and valley filling through bandwidth reuse. It is like dividing the cake into chunks between the old and the young in advance, while deciding in real time how much each child actually gets.
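A toy calculation shows why multiplexing services with staggered peaks pays off under peak-based (e.g. 95th-percentile) billing; the traffic profiles below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 30)  # hourly samples over a 30-day billing month

# Synthetic daily profiles: live streaming peaks at night, RTC in the daytime.
live = 40 + 60 * np.clip(np.sin(2 * np.pi * (hours % 24 - 14) / 24), 0, None) \
          + rng.normal(0, 3, hours.size)
rtc = 30 + 50 * np.clip(np.sin(2 * np.pi * (hours % 24 - 2) / 24), 0, None) \
         + rng.normal(0, 3, hours.size)

p95 = lambda x: np.percentile(x, 95)

# Billing each service on its own site vs. multiplexing both onto one site.
separate = p95(live) + p95(rtc)
shared = p95(live + rtc)
print(f"separate 95th-percentile bill: {separate:.1f} (bandwidth units)")
print(f"shared   95th-percentile bill: {shared:.1f} (bandwidth units)")
# Because the peaks are staggered, the shared 95th percentile is well below the sum.
```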

In this part we made a mathematical abstraction. We studied the problem characteristics of the different services carefully, and whether it is live streaming, RTC or VOD, the mathematical problems are similar from an abstract point of view. Colleagues on the business side may feel the problems are different, but the mathematical description is the same; only the constraints differ. For example, what is the difference between RTC and live streaming? RTC needs roughly 200 milliseconds of latency while live streaming tolerates about 400 milliseconds, but the math is essentially the same. We have seen peers operate three separate networks without coordination, which is a huge waste of resources. What Huawei Cloud does is achieve coordination across the whole resource pool: VOD, live streaming and RTC share the same sites and the same bandwidth along the resource link. Only in this way can we guarantee customers the ultimate experience at a lower cost.

Bandwidth scheduling is a real-time link, and real-time decision-making is genuinely tricky: it demands fast decisions, but fast decisions tend to degenerate into brute-force rules, and the drawback of rules is that they are not optimal for the efficiency of the whole system. So even within the real-time scheduling module, we designed a two-tier architecture.

The system architecture has a Global module and a Local module. The Global module collects as much information as possible and is allowed to solve more slowly; because it holds the global view, the solution it produces is better from a global perspective. The most robust global decisions are then handed to the Local module for fine-grained scheduling. For example, the Global tier may choose five sites, say the five CDN sites in Shanghai that are best in terms of cost and experience, and the Local tier adjusts their weights in real time, compensating for the inaccuracy of global-tier prediction.
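A minimal sketch of the two-tier idea (the function names, scoring and weighting rules are my own assumptions, not the production logic):

```python
import random

# --- Global tier: runs periodically with the full (but slightly stale) view ---
def global_plan(sites, k=5):
    """Pick the k sites with the best combined cost/quality score."""
    scored = sorted(sites, key=lambda s: s["cost"] + s["pred_latency_ms"] / 10)
    return scored[:k]

# --- Local tier: runs per request, re-weighting the shortlist with fresh QoS ---
def local_pick(shortlist, live_latency_ms):
    """Weight each shortlisted site inversely to its *observed* latency."""
    weights = [1.0 / max(live_latency_ms[s["name"]], 1.0) for s in shortlist]
    return random.choices(shortlist, weights=weights, k=1)[0]

sites = [{"name": f"sh-cdn-{i}", "cost": random.uniform(1, 3),
          "pred_latency_ms": random.uniform(20, 80)} for i in range(20)]
shortlist = global_plan(sites)                                   # slow, global view
live = {s["name"]: random.uniform(15, 120) for s in shortlist}   # fresh probes
print("request routed to:", local_pick(shortlist, live)["name"])
```

The Local tier’s live probes are exactly what compensates for the Global tier’s stale predictions.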

The following sections walk through traffic scheduling, starting with bandwidth planning.

Bandwidth planning solves the problem of how to divide bandwidth among the services. As we all know, when VOD, live streaming and RTC share a site and the VOD volume is large, problems appear at peak times when traffic suddenly rises: on an ordinary shared site, RTC gets squeezed out, even though RTC is the most demanding service and must be guaranteed first. It is like a highway: high-priority traffic such as RTC needs a dedicated lane so that it is always served first. If this problem is not solved, the experience degrades, so we use bandwidth planning to guarantee the RTC and live-streaming experience. But it is hard: a recent Microsoft paper at NSDI shows that their version of this technology is already in place, that the problem scale is quite large, and that it is mathematically difficult to solve, especially because 95th-percentile billing makes the whole problem nonlinear. We tried it in a single region, and after solving Difficulty 1 (figure above), both experience and cost improved markedly.
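To see where the nonlinearity comes from, here is a simplified formulation of my own (an illustrative abstraction, not Huawei’s production model). Let $b_{s,v}(t)$ be the bandwidth planned for service $v$ on site $s$ in time slot $t$:

$$
\begin{aligned}
\min_{b}\quad & \sum_{s} c_s \cdot \mathrm{P95}_t\Big(\sum_{v} b_{s,v}(t)\Big) \\
\text{s.t.}\quad & \sum_{s} b_{s,v}(t) \ge \hat{D}_v(t) \quad \forall v,t \qquad \text{(forecast demand is met)} \\
& \sum_{v} b_{s,v}(t) \le B_s \quad \forall s,t \qquad \text{(site capacity)}
\end{aligned}
$$

where $c_s$ is the unit bandwidth price of site $s$, $\hat{D}_v(t)$ the forecast demand of service $v$, and $\mathrm{P95}_t$ the 95th percentile over the billing period. The $\mathrm{P95}$ operator is exactly what breaks linearity: it is neither linear nor convex in general, so the problem cannot simply be handed to a standard LP solver.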

The difficulty also lies in the fact that bandwidth demand itself changes dynamically, so planning in advance requires forecasting, and the forecast can be inaccurate in timing and magnitude, for example when the peak arrives and how high it climbs. At present, forecast accuracy is only about 90%.

The second is the access link.

The access link is essentially a variant of the assignment problem. The simplest engineering approach is nearest access; its drawback is that latency in one area may be guaranteed while latency in another is not, which makes this a combinatorial optimization problem. Together with the Department of Mathematics of the University of Hong Kong, Huawei studied this part systematically and achieved good results: the final solution can be computed quickly at whole-network scale. The difficulty comes from several factors. For example, node 1 in area 1 may want to access nodes 1, 2, …, N (figure above), but the historical data may only cover access to node 1 and node 2, with nothing for node N. The first step is therefore to make accurate QoS predictions from such sparse data. Overall, Huawei has good ideas on this problem and has basically solved it, though still at relatively small scale. We regard this as a root technology, or core problem, of video.
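For intuition, the capacity-aware version of access assignment can be written as a small transportation-style LP. The sketch below uses synthetic demands, capacities and predicted latencies and contrasts it with naive nearest access; it is a toy model, not the joint Huawei/HKU solution:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
R, N = 4, 3                        # user regions, access nodes
demand = rng.uniform(5, 15, R)     # Gbps per region
cap = np.full(N, 20.0)             # Gbps per node
lat = rng.uniform(10, 80, (R, N))  # predicted latency (ms), region -> node

# Variables x[i, j]: fraction of region i's demand served by node j.
c = (lat * demand[:, None]).ravel()          # minimize demand-weighted latency
A_eq = np.zeros((R, R * N)); b_eq = np.ones(R)
for i in range(R):
    A_eq[i, i * N:(i + 1) * N] = 1.0         # each region is fully served
A_ub = np.zeros((N, R * N)); b_ub = cap
for j in range(N):
    A_ub[j, j::N] = demand                   # node capacity constraint
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print("global assignment (fractions):\n", res.x.reshape(R, N).round(2))

# Naive nearest access ignores capacity and can overload a popular node.
nearest = lat.argmin(axis=1)
print("nearest-node load:", [demand[nearest == j].sum().round(1) for j in range(N)])
```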

The third link is the back-to-source problem, which is essentially a path optimization problem. Solving it systematically requires decomposing the problem: we must find not only a path but also the forwarding nodes. To explain the term: back-to-source means that when a user requests a video and the local node finds it does not have the stream, the stream must be pulled back from somewhere else. The same problem exists in the logistics and aviation industries, though each has its own characteristics.
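To illustrate just the path-finding core, here is a toy Dijkstra over a small overlay graph (the topology and costs are synthetic; the real problem must also choose forwarding nodes that many streams can share, which this sketch ignores):

```python
import heapq

def cheapest_pull_path(graph, origin, edge_node):
    """Dijkstra over an overlay graph; edge weights = per-link pull cost."""
    dist, prev = {origin: 0.0}, {}
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == edge_node:
            break
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], edge_node
    while node != origin:                     # walk back to reconstruct the path
        path.append(node)
        node = prev[node]
    return [origin] + path[::-1], dist[edge_node]

# Overlay: origin -> relays -> edge node, with per-link costs.
overlay = {
    "origin":   [("relay-bj", 2.0), ("relay-gz", 3.5)],
    "relay-bj": [("edge-sh", 1.5)],
    "relay-gz": [("edge-sh", 0.8)],
}
print(cheapest_pull_path(overlay, "origin", "edge-sh"))
```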

The three problems above are the basic versions, but video scheduling in particular must be highly service-aware. Bandwidth planning spans all services, while real-time traffic scheduling is done per service, much like a chimney structure: resources shared at the bottom, awareness of each service’s characteristics at the top.

The cost of the whole video business comes from two parts: access cost and back-to-source cost. The core difficulty in reducing back-to-source cost is scheduling streams with different characteristics while guaranteeing QoS. Huawei and Tsinghua University developed the AGGCAST architecture to solve this problem. The image on the right shows the effect after going online: back-to-source bandwidth cost fell by 30%, and the experience improved.

03 Resource Scheduling

Resource scheduling mainly refers to the scheduling of computing resources, and the video cloud offers many examples. Offline transcoding, for instance, wants to run at large scale on cheaper resources, which can be achieved through cross-AZ scheduling and preemptible (spot) shared machines. Video AI model training and inference involves resource collaboration across the cloud and the edge, which requires the scheduler to work across both.

We built a global resource scheduler. The current open-source version is Arktos; the open-source link is https://github.com/CentaurusI… .

There are several core modules. At the bottom is the Flow Monitor, which monitors the status of the entire resource pool in each DC in real time; when resource utilization runs high, VMs can be dynamically scaled out or migrated. The resource collector gathers the status of each resource pool, including cost and geographic location, and all topology information is fed back to the Global Scheduler.

The business disassembly and deployment module makes quick decisions about where to place workloads when multiple requests arrive.

The Global Scheduler consists of two modules (not available in the open-source version). The first is the resource clustering decision: computing resources in the distributed cloud are clustered by geographic location. For example, if a customer wants VMs that serve users in Shanghai, decisions must respect geography. The second is the distributed cloud selection decision: based on the clustering results, each resource group is scored, and the location and number of VMs are then decided quickly according to the scores.
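A minimal sketch of the cluster-then-score placement flow described above (the field names, weights and scoring rule are invented for illustration; the internal module is not open-sourced):

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    region: str
    free_vcpus: int
    cost_per_vcpu: float
    latency_to_user_ms: float

def place(pools, target_region, vcpus_needed):
    # Step 1: cluster/filter resource pools by geography.
    candidates = [p for p in pools
                  if p.region == target_region and p.free_vcpus >= vcpus_needed]
    # Step 2: score each group (lower is better); weights are illustrative.
    score = lambda p: 0.6 * p.cost_per_vcpu + 0.4 * p.latency_to_user_ms / 10
    return min(candidates, key=score, default=None)

pools = [
    Pool("sh-az1",    "shanghai", 128, 0.10, 8),
    Pool("sh-edge-3", "shanghai", 16,  0.14, 3),
    Pool("bj-az2",    "beijing",  256, 0.08, 28),
]
print(place(pools, "shanghai", 8))
```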

Once placement is done, real-time management and scheduling take over, divided into three processes.

First, the resource and performance detector monitors the resource occupancy of the whole pool in real time. Second, when QoS monitoring shows that business requirements are not being met, the single-region resource manager makes dynamic resource scaling decisions within an AZ or on edge nodes. Third, when local resources cannot satisfy demand, the multi-region elastic scaler deploys across the cloud and the edge. In the offline transcoding service, this reduced latency by 17% and cost by 33%.
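The escalation order can be pictured as a simple policy (a sketch under assumed thresholds, not the production controller):

```python
def scaling_decision(qos_ok: bool, local_free_vcpus: int, needed_vcpus: int) -> str:
    """Escalate from local scaling to cross-region/cloud-edge deployment."""
    if qos_ok:
        return "no-op"                       # detector sees healthy QoS
    if local_free_vcpus >= needed_vcpus:
        return "scale-within-AZ-or-edge"     # single-region resource manager
    return "deploy-cross-region/cloud-edge"  # multi-region elastic scaler

print(scaling_decision(qos_ok=False, local_free_vcpus=4, needed_vcpus=16))
```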

04 Cloud-Edge Collaborative Training/Inference

More and more AI tasks run on video. After video is encoded on the device side, it is immediately transferred to the edge, where many characteristic tasks are performed, including a series of AI tasks such as video moderation, smart covers and intelligent vision. However, edge computing resources are very limited, and as AI tasks multiply we face the problem of completing more AI tasks with limited computing resources. A very simple idea: since edge computing resources are limited, can we do inference on the edge and training on the cloud? This significantly reduces resource costs, meets customers’ bandwidth and latency requirements, and protects data privacy. The techniques involved are collaborative inference, incremental learning and federated learning.

Huawei’s cloud-edge collaborative inference framework, Sedna, aims to remove the pain points of developing and deploying cloud-edge collaborative inference and training. The overall Sedna framework is open-sourced in the KubeEdge SIG AI community; the link is https://github.com/kubeedge/s… .

Sedna has several core parts. First, the Lib library, which exposes cloud-edge collaborative AI functionality to AI developers and application developers. Second, Workers, which perform training or inference tasks on the cloud side and the edge side. Third, GlobalMGMT, which is responsible for management and collaboration across the cloud and the edge, with KubeEdge handling message passing. Fourth, the LocalController, which is responsible for local management, including models and data collection.

Sedna is built on KubeEdge. KubeEdge is an official open-source project of the Cloud Native Computing Foundation (CNCF), has been promoted to the incubation stage, and is used by more than 20 companies. It is a very active open-source edge computing platform.

KubeEdge is based on K8s, which means it is highly compatible with cloud native, and it is very extensible: it adopts declarative APIs, CRDs (Custom Resource Definitions) and custom controllers, giving it incremental advantages over native K8s. We implemented edge-cloud collaboration, extending cloud capabilities to the edge, including AI collaboration, data collaboration, application collaboration and management collaboration. KubeEdge is easy to maintain: a lightweight, pluggable edge framework with offline autonomy and automatic disaster recovery, supporting heterogeneous hardware while remaining decoupled from it.

The Sedna framework is positioned as a “back-end capability” to be “integrated” into different products.

It serves two categories of clients:

1. AI developers who want to use edge-cloud collaboration services and functions can build on the Sedna framework;

2. Application developers can directly use the edge-cloud collaborative AI capabilities.

Sedna’s positioning does not include:

1. Replacing existing AI frameworks such as TensorFlow, PyTorch, MindSpore, etc.; Sedna is a framework and stays compatible with them;

2. Replacing existing edge platforms such as KubeEdge;

3. Researching algorithms for specific domains, such as face recognition, text recognition, etc.

Sedna currently consists of three modules.

First, edge-cloud collaborative inference: how to improve overall inference performance when edge resources are limited. An AI model may exist in multiple versions, some low-precision and some high-precision. With Sedna, the low-precision version can be placed on the edge and the high-precision version on the central cloud. When an image recognition task arrives, the lightweight edge model performs inference first; if confidence is low, the task is escalated to the central cloud model, achieving better overall inference performance.
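The escalation pattern looks roughly like this (a minimal sketch of the idea with placeholder models and an illustrative threshold; this is not the Sedna Lib API):

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.85  # illustrative cut-off for "hard" examples

def edge_infer(image: np.ndarray):
    """Placeholder for the lightweight, low-precision edge model."""
    probs = np.array([0.5, 0.3, 0.2])        # pretend softmax output
    return int(probs.argmax()), float(probs.max())

def cloud_infer(image: np.ndarray):
    """Placeholder for the high-precision cloud model (an RPC in practice)."""
    return 1, 0.97

def joint_infer(image: np.ndarray):
    label, conf = edge_infer(image)
    if conf >= CONFIDENCE_THRESHOLD:
        return label, conf, "edge"           # cheap path: answer locally
    return (*cloud_infer(image), "cloud")    # hard example: escalate to the cloud

print(joint_infer(np.zeros((224, 224, 3))))
```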

Second, edge-cloud collaborative incremental learning. Incremental learning is similar to transfer learning; it targets models facing small samples and non-uniform data distributions, making the model smarter the more it is used. For example, an AI task performs inference through the edge model and the results are poor; the task is then pushed to the center, where it is completed by other means, and afterwards the model continues to be trained on the center side and is pushed back to the edge. Sedna supports this part well and is very easy to use.
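Conceptually, one round of the loop collects hard examples at the edge, retrains centrally, and redeploys. A sketch with hypothetical infrastructure hooks (`train_on_cloud`, `deploy_to_edge`), not the Sedna incremental-learning API:

```python
def incremental_round(edge_model, hard_examples, train_on_cloud, deploy_to_edge,
                      min_batch=100):
    """One round of edge-cloud incremental learning.

    hard_examples: samples the edge model was not confident about, labeled on
    the cloud side (e.g. by the big model or manual review).
    train_on_cloud / deploy_to_edge: hypothetical infrastructure hooks.
    """
    if len(hard_examples) < min_batch:
        return edge_model                 # not enough new signal yet
    new_model = train_on_cloud(edge_model, hard_examples)  # fine-tune centrally
    deploy_to_edge(new_model)             # push the updated model back out
    return new_model

# Tiny demo with stand-in hooks.
m = incremental_round("model-v1", ["sample"] * 120,
                      train_on_cloud=lambda m, d: "model-v2",
                      deploy_to_edge=lambda m: print("deployed", m))
print(m)  # model-v2
```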

Third, edge-cloud collaborative federated learning. For example, a bank may want its data to stay within the edge while still training AI models, in which case a federated learning framework can be adopted: each edge holds its own model, gradients are synchronized to the center after local training, aggregated there, and the result is pushed back to the edges.
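The core aggregation step is federated averaging. A minimal NumPy sketch of FedAvg-style weight aggregation (illustrative, not Sedna’s federated-learning implementation):

```python
import numpy as np

def fed_avg(edge_weights, sample_counts):
    """Weighted average of per-edge model weights (FedAvg aggregation)."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(edge_weights, sample_counts))

# Three edges train locally on private data and upload only their weights.
edges = [np.array([0.9, 1.1]), np.array([1.2, 0.8]), np.array([1.0, 1.0])]
counts = [1000, 400, 600]                      # local dataset sizes
global_w = fed_avg(edges, counts)
print("aggregated global weights:", global_w)  # pushed back to every edge
```

Only weights (or gradients) leave the edge; the raw data never does, which is what satisfies the data-residency requirement.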

05 Summary

This talk mainly covered three areas.

1. Traffic scheduling: the traffic scheduling system supports scheduling for multiple services, multiple objectives and heterogeneous data characteristics;

2. Resource scheduling: supports cross-AZ resource scheduling and cloud-edge collaborative scheduling;

3. KubeEdge Sedna framework: oriented to AI and business R&D personnel, it provides edge-cloud collaborative inference, edge-cloud collaborative incremental learning and edge-cloud collaborative federated learning, laying a foundation for tackling the technical challenges of edge-cloud collaborative AI.

That is all I have to share. Thank you!