With the continuous development of the edge computing industry, its business has become increasingly broad and mature. Yet the development of edge computing has not been smooth sailing, and its applications differ greatly from traditional cloud computing. So what are the challenges facing the edge computing industry, and what does its future hold? Li Hao from Netheart Technology shares what he has contributed to the video industry in recent years.

By Li Hao/LiveVideoStack

Hello, everyone. It’s a great honor to be invited to this conference. This is a good opportunity for me to communicate with you and share the path I have traveled over the past few years.

01 Simple Product Decisions

First of all, I would like to give a brief self-introduction. From 2011 to 2015, I was engaged in general cloud computing, which was still some distance from the video industry. The core of standard cloud computing is virtualization and complex network scheduling in a highly centralized mode, while video is naturally a distributed architecture. By 2015, the trend across the cloud computing industry was that people were gradually accepting the cloud model. Every vendor was saying that computing would one day be delivered like water and electricity from utility networks, and that it would not be cost-effective for customers to build their own power plants. But I was thinking at the time that a water network or a power grid certainly cannot be composed of just a few large plants; it must also include a network much closer to the terminal, spanning all three pillars: computing, storage, and network. Ever since I first encountered the concept of edge computing at the end of 2014, I have believed that this field would have the opportunity to complete the cloud computing network over the next decade.

1.1 Delivery in the virtual world

Among the three major components of computing, storage, and network, the network is the easiest entry point. In 2015, the most suitable scenario was CDN: it already had a market of real scale, and comparison showed there was room for a great deal of innovation in both technology and business model. A CDN is, by analogy, express delivery in the virtual world. The cost structure and core competition of express delivery in the real world are the same as in the virtual one: both pursue more volume, faster speed, better quality, and lower cost. Express delivery companies in the real world take various business forms, including direct operation, loose franchising, direct operation mixed with franchising, and so on. In CDN, however, there was only one model, so we believed there were innovation points here in both technology and business model. Interestingly, logistics companies are also learning from the way virtual networks are organized. The IPIC Association is studying how to map the physical logistics network onto a virtual network, turning experience-based express delivery into algorithms and more scientific, data-driven operations. There are several differences from existing digital networks. First, the travel speed is not the speed of light. Second, the goods being transmitted cannot be copied. Third, the retransmission cost is very high and the tolerance for packet loss is very low. IPIC has done a lot of research and innovation based on these assumptions.

On the left of the slide is the real-time dashboard of a logistics company’s front end, which is almost identical to that of a CDN company. On the right are the logistics company’s networking and business models, which were richer than those of CDN companies in 2015. Point-to-area, block-to-area, tree, and mesh network mappings were all more complex than the domestic CDN architectures of the time.

1.2 Different path choices

Looking back at the whole CDN market, I think CDNs split into several path choices. The first pursues balance: calculate based on your own coverage density and node capacity, and ultimately strike a balance between node coverage and ROI. The slide shows a highly simplified CDN schematic; the reality is more complex. Internally it is a tree-like structure, a planned network, where planning follows the operator’s own economic benefit and needs. The density generally matches the provincial networks and the networks of the major first-tier cities. Such CDNs are unlikely to cover prefecture-level cities, let alone communities further down, in places like Xizang and Xinjiang, because the economics do not balance out.

The second path choice centers on the pursuit of capacity and quality. One innovative edge computing company, Fastly, is much smaller than Akamai, but what makes them special is a model similar to the Asia No. 1 warehouses in logistics. Each node is made extremely large and very highly configured, equivalent to the Asia No. 1 Warehouse model combined with full air freight. For high-value traffic distribution with demanding requirements, such as e-commerce pages or publishers like the Times, this differentiation creates value. In their model, a single node is very large, perhaps 1 to 2 Tbps, with global coverage across roughly 100 POPs; routing inside a node is done in-house, and global access is handled through Anycast IP. This model is not only very expensive but also hard to replicate in China, because that many POPs are simply not available here. Those problems aside, Fastly has some innovations well worth referencing. Fastly abstracts the traffic-distribution scenario by dividing a request into several lifecycle phases: receiving the request, packet processing, logging, and so on. Users can write code directly on the Fastly platform to apply their own logic to these requests, backed by some very lightweight container solutions. Shopify, for example, runs its shopping-cart logic as code on Fastly edge nodes, using the lightweight virtualization of WASM.
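To make the lifecycle abstraction concrete, here is a minimal sketch in Python. This is not Fastly’s actual API (their platform runs WASM compiled from languages like Rust); the class and hook names here are purely illustrative.

```python
# Illustrative only: a toy request lifecycle with user-pluggable phases,
# mimicking the receive / process / log split described above.
from dataclasses import dataclass, field

@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)

@dataclass
class Response:
    status: int = 200
    body: bytes = b""

class EdgeHandler:
    def on_receive(self, req: Request) -> Request:
        # e.g. normalize the cache key by stripping query parameters
        req.path = req.path.split("?")[0]
        return req

    def on_process(self, req: Request) -> Response:
        # e.g. shopping-cart style logic evaluated at the edge
        if req.path.startswith("/cart"):
            return Response(body=b"cart rendered at the edge")
        return Response(status=404)

    def on_log(self, req: Request, resp: Response) -> None:
        print(f"{req.path} -> {resp.status}")

def run(handler: EdgeHandler, req: Request) -> Response:
    req = handler.on_receive(req)
    resp = handler.on_process(req)
    handler.on_log(req, resp)
    return resp

run(EdgeHandler(), Request(path="/cart/123?utm=abc"))
```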

The core of the third path is to emphasize node count and coverage density: coverage is achieved by deploying ultra-dense edge nodes at every level of the network. These nodes differ enormously; a node may sit in the access network, the metropolitan area network, or the backbone. Single-node capacity is small, but total capacity is large, so the pursuit of economic efficiency need not be as strict as in the first and second paths. And because overall redundancy is so large, the network can absorb higher peaks at lower cost. On the other hand, complexity is very high: nodes fluctuate heavily, and computing and storage capacity vary widely, so transmission, scheduling, and placement all require careful design. Netheart adopts this model. Across the entire network we estimate computing resources equivalent to over ten million online servers, storage in the hundreds of thousands of petabytes, and more than 500 Tbps of bandwidth. The core task is how to actually use the resources in these edge networks. With the thinking of the previous models, plus some observations we made about real-world networks, we launched the Astral Cloud product. The idea was decided quickly, but implementation proved very difficult.

Referring to the industry’s layering of the edge into Home Edge, Network Edge, and Cloud Edge, CDN nodes belong to the Cloud Edge. As resources sink further toward the edge, latency gets lower, resources become more dispersed and harder to use, but total capacity grows larger.

We thought we had found a vast blue ocean. Not only was the market good, but we had found a difference in the model that matched the distributed technology we already had, all against the backdrop of the 2015 wave of mass entrepreneurship and innovation. So we threw ourselves fully into building Netheart.

The truth is that while ideas are easy to come up with, they are very hard to put into practice. The initial plan was to move the network layer to the edge within two years, then start on storage, and then computing. It actually took six years to get the network layer right; storage and computing have only been running in a few scenarios since last year.

02 Product Realization

2.1 Real Path

In 2014, we began by transforming Thunder (Xunlei): download traffic is the easiest to move to the edge. In 2015, we entered the live streaming industry, where the requirements for latency and stalling are very demanding. In 2016, we moved into long video, and then medium and short video. The challenges kept growing, and the external environment kept changing. CDN industry pricing changed dramatically: within three years, prices fell to a third of what they were, so the challenges in cost, efficiency, and other dimensions grew. For example, the ROI balance could initially be struck at 20% utilization, but a few years later it needed to reach 60%, demanding a high level of refinement in scheduling and transmission, along with rising quality requirements. The upside is that domestic bandwidth keeps getting faster, enlarging the overall edge resource pool.
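To make that arithmetic explicit (a simplified illustration, assuming revenue scales linearly with both unit price and utilization):

$$
\text{revenue} \;\propto\; p \cdot u, \qquad p \to \tfrac{p}{3} \;\Longrightarrow\; u \to 3u \quad (20\% \to 60\%)
$$

That is, to hold revenue per unit of deployed capacity constant when the unit price drops to a third, utilization must triple.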

2.2 Overall Architecture

This is a general architecture diagram. We embed our code into the user’s app, so we can optimize both ends; this is the difference between an edge transmission scheme and a traditional CDN. Since you control both ends, you can do more at the transport protocol and coding levels. Another difference is that Netheart places heavy emphasis on placement. The traditional CDN pattern is to trigger back-to-origin fetches per request and let the optimal placement emerge on its own. Because a traditional CDN has larger nodes and fewer of them, request scheduling follows fixed logic, and data driven by user requests naturally settles into the network to form the optimal placement. This pattern does not work in an architecture of massive numbers of edge nodes: first, the scheduling path is not pre-planned and is far more variable; second, the nodes vary enormously. Think of an express company’s franchise outlets: some are 10-square-meter storefronts, others are warehouses of thousands of square meters. To route goods across the small outlets, you must run a good placement algorithm in advance to ensure optimal packing efficiency; if everything were a huge warehouse, you could simply deliver on request.
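The contrast can be sketched in a few lines of Python. This is a toy illustration, not Netheart’s actual scheduler; the capacity units and heat model are invented for the example.

```python
# Pull-through caching (traditional CDN) vs. proactive placement (dense edge).
from collections import Counter

class PullThroughCache:
    """Fetch from origin on a miss; placement emerges from request patterns."""
    def __init__(self):
        self.store = {}

    def get(self, key, origin):
        if key not in self.store:
            self.store[key] = origin[key]   # back-to-origin on demand
        return self.store[key]

class ProactivePlacer:
    """Predict heat and push content to edge nodes before requests arrive."""
    def __init__(self, node_capacity):
        self.capacity = dict(node_capacity)  # node -> free space (toy units)
        self.heat = Counter()

    def observe(self, key):
        self.heat[key] += 1                  # feed the heat estimate

    def plan(self, sizes):
        plan = {}
        for key, _ in self.heat.most_common():            # hottest first
            node = max(self.capacity, key=self.capacity.get)
            if self.capacity[node] >= sizes[key]:
                self.capacity[node] -= sizes[key]
                plan.setdefault(node, []).append(key)     # pushed ahead of demand
        return plan

placer = ProactivePlacer({"small-outlet": 10, "warehouse": 1000})
for k in ["hot.mp4"] * 5 + ["cold.mp4"]:
    placer.observe(k)
print(placer.plan({"hot.mp4": 8, "cold.mp4": 8}))
```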

2.3 Supply Problem

This work needs to address several core issues. The first is supply, which is the hardest and most central problem. The model is essentially part self-built and part franchised sharing. This is a platform economy, and balancing demand and supply is a perpetual seesaw game. Think of Didi: in the early stage, do you subsidize drivers or passengers? Especially in a B-side business, where the passengers are major companies, continually subsidizing passengers is unlikely to build stickiness; and if the network does not reach the required scale and quality, the passengers will never get on the bus at all. Therefore, bootstrapping the network in the early stage requires not only large amounts of money but also time windows, such as network infrastructure upgrades or the outbreak of new businesses.

For the supply at the network’s core, the first task is to balance capacity while achieving sufficient economic return at a given utilization. The second is incentives: distributing income effectively to the network’s participants through the settlement strategy. We have a settlement network that evaluates which participants contribute the most computing power, bandwidth, and storage to our network, and which have the greatest potential value in the future. Ours is an open network that any node can join, so anti-cheating is also a core concern.

Netheart builds a resource pool that includes routers, set-top boxes, MoneyMaker (Zhuanqianbao) boxes, OneThing Cloud devices, PCs, servers, and so on. There are also nodes and machine rooms built by Netheart itself; we are participants in our own platform. On top of the supplied resource pool we construct layered services: at the bottom is a virtualized container platform, and above it sit PaaS services for storage, transmission, networking, and computing. Among these, the network comes first, because the network is the easiest to use at scale in real application scenarios. At the same time, a complete settlement system is needed to measure the value these services generate, and then to reward participants according to their actual contributions and pre-agreed value judgments.
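As a sketch of the settlement idea, here is a hypothetical contribution-weighted payout (the weights and field names are assumptions; the real settlement network is far more involved and includes the anti-cheating checks not modeled here).

```python
# Hypothetical settlement sketch: split revenue by each node's weighted share
# of bandwidth / storage / compute contributions. Weights are invented.

def settle(period_revenue, nodes, weights=(0.5, 0.3, 0.2)):
    wb, ws, wc = weights
    totals = [sum(n[k] for n in nodes.values()) for k in ("bw", "storage", "cpu")]

    payouts = {}
    for name, n in nodes.items():
        share = (wb * n["bw"] / totals[0]
                 + ws * n["storage"] / totals[1]
                 + wc * n["cpu"] / totals[2])
        payouts[name] = round(period_revenue * share, 2)
    return payouts

nodes = {
    "home-router": {"bw": 50, "storage": 1, "cpu": 1},      # Mbps, TB, cores
    "idc-server":  {"bw": 1000, "storage": 40, "cpu": 32},
}
print(settle(10_000.0, nodes))
```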

2.4 Transmission Problem

The second problem that needs to be solved is a technical one: transmission. We analyzed the operating conditions:

1. Single-node capacity is not large, so a many-to-one transmission mode is required to effectively use all the resources on the edge side.
2. Redundant error correction is needed. With so many interacting parties, the traditional ARQ approach is inefficient and can block critical data.
3. Complex coding is feasible. This matters because an edge node has more computing power per unit of bandwidth than a server: a server pushes so much bandwidth that its CPU budget per unit of bandwidth is small, and the gap versus the edge side is dozens of times. So the edge can support more complex channel and source coding.
4. No rate constraint. Node networks fluctuate heavily and the channel erasure probability cannot be inferred, so the coding algorithm should pursue statelessness. If I slice the data into an ordered sequence, then when the first slice is missing, having all of the second slice does not help; that is typical of simple stateful coding, where the cost of coordination and communication balloons. We therefore pursue rateless, stateless, variable-length coding.
5. Adaptation to channel capacity. This is essentially a heterogeneous network with nodes of wildly different sizes, so the channel capacity must be probed and adjusted continuously.

The coding algorithm is essentially built on XOR, A ⊕ X ⊕ X = A, which amounts to matrix operations over GF(2); see the sketch below. The core point is trading a little decoding success rate for a large reduction in computational complexity. In communications, dedicated hardware circuits push the decoding success rate to four nines; on a CPU, the algorithm itself must be optimized, for example cutting its complexity by an order of magnitude while accepting a success rate of three nines.
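Here is a minimal Python sketch of the rateless, XOR-based idea (a fountain-style code over an idealized erasure channel). It is an illustration only: production codes such as LT or Raptor use carefully tuned degree distributions and far more efficient decoders.

```python
# Toy rateless code: each packet XORs a random subset of source blocks, so
# packets are unordered and interchangeable; any "enough" subset can decode.
import random

def encode(blocks, n_packets):
    packets = []
    for i in range(n_packets):
        rng = random.Random(i)                 # seed stands in for a packet header
        degree = rng.randint(1, len(blocks))
        idxs = rng.sample(range(len(blocks)), degree)
        payload = 0
        for j in idxs:
            payload ^= blocks[j]               # A ^ X ^ X == A
        packets.append((set(idxs), payload))
    return packets

def decode(packets, n_blocks):
    """Peeling decoder: repeatedly resolve packets reduced to one unknown."""
    known, pkts = {}, [[set(s), p] for s, p in packets]
    changed = True
    while changed and len(known) < n_blocks:
        changed = False
        for pkt in pkts:
            s, p = pkt
            for j in [j for j in s if j in known]:
                p ^= known[j]                  # subtract recovered blocks
                s.discard(j)
            pkt[1] = p
            if len(s) == 1:
                known[s.pop()] = p
                changed = True
    return [known.get(i) for i in range(n_blocks)]

blocks = [0x12, 0x34, 0x56, 0x78]
received = encode(blocks, 10)[3:]              # lose some packets; order is irrelevant
print(decode(received, len(blocks)))           # None entries mean "send more packets"
```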

2.5 Cost Problem

The third issue is cost, which has become increasingly important in recent years. A few years ago, when the industry was overpriced, cost was not an issue: you did not have to chase utilization or extreme packing algorithms. In the current situation, however, if resources are not fully used and the network’s economic value is not maximized, it loses its competitive advantage. For Netheart the core is heat estimation, proactive placement, node cost division, and packing optimization. Nodes come in diverse combinations of storage, bandwidth, computing, and volatility, and customers’ demand scenarios differ just as much, so an optimal matching must be maintained at all times. Network-disk customers need large storage and small bandwidth, while live-streaming customers need small storage and large bandwidth; packing them together effectively is the key to raising economic value. And this requires continuous adjustment with minimal churn: the network cannot be scheduled once and left alone; placements must keep moving and adjusting.
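A toy greedy heuristic illustrates the matching idea (the numbers and scoring are invented; the real problem is a continuously re-solved multi-dimensional bin-packing problem):

```python
# Match customer demand profiles to node resource profiles so that both
# storage and bandwidth get used, rather than stranding one dimension.

nodes = [  # name, free storage (TB), free bandwidth (Gbps) -- invented figures
    {"name": "n1", "storage": 40.0, "bw": 1.0},
    {"name": "n2", "storage": 2.0,  "bw": 10.0},
]
demands = [  # network-disk: storage-heavy; live streaming: bandwidth-heavy
    {"name": "netdisk", "storage": 35.0, "bw": 0.5},
    {"name": "live",    "storage": 1.0,  "bw": 8.0},
]

def fits(node, d):
    return node["storage"] >= d["storage"] and node["bw"] >= d["bw"]

def imbalance(node, d):
    # prefer the node whose leftover resources stay balanced (least stranded)
    s = (node["storage"] - d["storage"]) / node["storage"]
    b = (node["bw"] - d["bw"]) / node["bw"]
    return abs(s - b)

for d in demands:
    best = min((n for n in nodes if fits(n, d)), key=lambda n: imbalance(n, d))
    best["storage"] -= d["storage"]
    best["bw"] -= d["bw"]
    print(d["name"], "->", best["name"])   # netdisk -> n1, live -> n2
```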

2.6 User Problems

Another problem that ordinary transport does not face is that our system is very tightly coupled to the client’s playback logic, which can change at any time. We have to infer user behavior from very fragmented information, logs, and transmission feedback: What does the preload logic do? How does error handling work? All of this has to be deduced from combined signals, and of course it also takes a lot of communication with partners and customers. For example, when you open an app there might be four or five short-video previews preloading at once; we have run into many cases like this. Does the client request from byte 0 to the end, or only the first 0-500 KB? Does it load them all at once or in sequence? The difference is significant, especially given how hard client upgrades are. Meanwhile, the business side’s metric requirements have shifted from clear QoS indicators to comprehensive QoE indicators, which makes causal analysis much harder as well.
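As a flavor of the inference involved, here is a hypothetical sketch (the log fields and thresholds are invented): several small head-of-file fetches arriving close together suggest the player preloads previews in parallel.

```python
# Hypothetical: infer a player's preload pattern from fragmented request logs.
from collections import defaultdict

logs = [  # (session, video_id, range_start, range_end, t_ms) -- invented data
    ("s1", "v1", 0, 512_000, 0), ("s1", "v2", 0, 512_000, 5),
    ("s1", "v3", 0, 512_000, 9), ("s1", "v1", 512_000, 4_000_000, 1200),
]

def infer_preload(logs, head_bytes=600_000, window_ms=50):
    """Guess: several small head-range fetches close in time => parallel preload."""
    by_session = defaultdict(list)
    for s, vid, lo, hi, t in logs:
        if lo == 0 and hi <= head_bytes:
            by_session[s].append((t, vid))
    for s, fetches in by_session.items():
        fetches.sort()
        burst = [v for t, v in fetches if t - fetches[0][0] <= window_ms]
        if len(burst) >= 2:
            print(f"{s}: preloads {len(burst)} items in parallel: {burst}")

infer_preload(logs)
```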

These are some of the problems Netheart has been solving over the years. At present our coverage at the network layer is fairly comprehensive, but edge computing itself has only just begun: computing, storage, and the scenarios that must move down the stack in step with the basic service providers are all in their infancy.

03 Continuous Upgrade

We believe audio and video are bound to be the most central medium for transmitting and expressing information, and that content will become video-based. And it will not only be people consuming video: as video understanding improves, there will be audio and video consumption between machines in the future. Building audio and video transmission networks between objects, and large, fully connected networks, is something we are already doing. The second direction is the sinking of computing power. From a cost perspective, the bandwidth and computing costs of running workloads in a central machine room are relatively high. In 3D design, for example, today’s edge computing power is already fully sufficient, and sinking the workload can cut costs dramatically. There are also VR and AR scenarios that are latency-sensitive and demand enormous bandwidth; putting that computing in the cloud is neither economical nor realistic, and it would consume huge bandwidth. Instead you can split the computation apart and render the background through the edge. There are also some on-device virtualization solutions, but generic container management is still too heavy there; a lightweight on-device solution along the lines of WASM is definitely needed.

04 Future Prospects

The future is a fully digital society. A fully connected Internet, or the Digital Twin, means the real and virtual networks will become ever more inseparable. Under this premise, edge computing as such is just a concept; the more fundamental logic is that every future device will be able to sense and to compute. Audio and video will be the most central medium of information transmission because of their high information density, and future machine-to-machine transmission will also require solving the understanding of video content. The edge computing industry is still in its infancy; it demands patience and has a long way to go. In the future, central computing and edge computing will form a three-dimensional network, truly becoming the water network and power grid of computing, with computing resources everywhere at your fingertips.

Thank you!