With the development of 5G and AI, video has become the mainstream form of content expression, and many industries have a strong demand for video distribution. We are honored to have Huang Ting, Senior Video Architect of Huawei Cloud, present the challenges of Internet-based real-time audio and video services and share the practices by which Huawei Cloud's native media network guarantees the real-time audio and video service experience.

Text / Huang Ting

Organized by / LiveVideoStack

Hello, everyone. I'm Huang Ting from Huawei Cloud, currently in charge of Huawei Cloud's video architecture design. Today I will share how Huawei Cloud's native media network guarantees the real-time audio and video service experience.

My talk has three parts. First, I will explain why we need a media network. Second, I will introduce the overall architecture design of the Huawei Cloud native media network. Finally, I will share our practices for improving the real-time audio and video experience.

01 Why we need a media network

1.1 Content expression is becoming video-oriented, and every industry has a demand for video distribution

Why do we need a media network? I have summarized three main reasons. The first is a clear trend toward video as the dominant form of content expression: many industries now have a strong demand for video distribution. Let me give a small example from personal experience. During the Spring Festival this year, a family member wanted to take off a ring worn for many years. The finger had thickened over time and the ring would not come off. Our first reaction was to go to a mall and ask a salesperson for help. Later, just to try, I searched "take off a ring" on Douyin and found a very simple method in the results. The video was short, and by following it the ring came off quickly, with no damage to the ring and no pain to the finger. If you are interested, you can search for it yourself. This is one manifestation of knowledge being expressed as video, and it has appeared in many fields. Beyond short video, the same trend toward video-oriented content expression has emerged in e-commerce live streaming, online education, cloud gaming, and other industries.

1.2 New forms of media expression place ever higher demands on audio and video technology

The second reason is that we see many new forms of media expression emerging, such as VR and the recently popular free-viewpoint video. These new forms bring users a more immersive experience, but they place demands on every aspect of audio and video technology, including bandwidth, latency, and rendering complexity. As shown in the figure on the left, take VR as an example: to reach a true "retinal" experience in a VR headset, a simple calculation shows the required bit rate is around 2 Gbps. And far more factors affect the VR experience than flat video: refresh rate, FOV, resolution, low MTP (motion-to-photon) latency, posture tracking, eye tracking, and so on.
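The "simple calculation" behind the 2 Gbps figure can be sketched as a back-of-envelope estimate. All parameters below (pixels per degree, frame rate, compression ratio) are illustrative assumptions for the sketch, not figures from the talk:

```python
def retinal_vr_bitrate_gbps(
    pixels_per_degree=60,    # approximate limit of human visual acuity
    h_fov_deg=360,           # full spherical view, horizontal
    v_fov_deg=180,           # full spherical view, vertical
    fps=120,                 # high refresh rate for VR comfort
    bits_per_pixel=12,       # YUV 4:2:0 raw format
    compression_ratio=165,   # assumed HEVC-class codec efficiency
):
    """Rough estimate of the compressed bitrate for 'retinal' VR video."""
    width = h_fov_deg * pixels_per_degree
    height = v_fov_deg * pixels_per_degree
    raw_bps = width * height * bits_per_pixel * fps
    return raw_bps / compression_ratio / 1e9

print(round(retinal_vr_bitrate_gbps(), 2))  # about 2 Gbps
```

With these assumptions the raw uncompressed stream is over 300 Gbps, and even after aggressive compression the result lands near the 2 Gbps order of magnitude mentioned above.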

1.3 The Internet makes no service-quality promises to users

We generally analyze a product from two dimensions: the demand side and the supply side. The first two reasons were demand-side analysis; now let's look at the supply side. The most important supply side of real-time audio and video services is the Internet infrastructure, and the Internet, as we all know, makes essentially no quality-of-service promises to users. How should we understand that? First, building the Internet is very expensive: laying submarine optical cables, for example, costs enormous manpower and materials. So the Internet had to be built around sharing, and sharing requires multiplexing and switching technology. How should we understand switching? Look at the simple diagram below. Suppose we want to connect four network nodes A, B, C, and D. Without switching, six wires are needed to connect every pair of nodes; with switching, only four lines are needed. So for cost reasons, switching is required. There are two kinds of switching technology: circuit switching and packet switching. Circuit switching reserves capacity, but wastes resources: once reserved, the bandwidth is occupied even when no data is being transmitted. Packet switching shares link resources, so it achieves lower cost. Given the cost considerations in the Internet's design, packet switching was chosen as the technology to evolve. Packet switching, combined with best-effort forwarding, brings a series of problems: packet loss, duplicate packets, delay, and out-of-order delivery. We therefore conclude that loss, duplication, delay, and reordering are inherent properties of this generation of the Internet.
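The wire-count comparison above is just the full-mesh formula n(n-1)/2 versus one line per node into a switch; a trivial sketch makes it concrete:

```python
def direct_links(n):
    # full mesh: every pair of nodes needs its own dedicated wire
    return n * (n - 1) // 2

def switched_links(n):
    # with a switch, each node needs only one line into it
    return n

# the 4-node example from the diagram: 6 wires without switching, 4 with
print(direct_links(4), switched_links(4))  # 6 4
```

The gap widens quickly: for 100 nodes a full mesh needs 4,950 wires, while switching still needs only 100.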

Two things are worth thinking about here. First, why didn't the original design of the Internet solve these problems at the network level? Or the bigger question: if we were to redesign the Internet today, what would we do? Would we try to make the network itself solve these problems? Second, how do we deal with packet loss, duplication, delay, and reordering in our daily application development?

1.4 Inspiration for us

The above analysis gives us several inspirations. First, we believe we need to build a media network to bridge the gap between the supply side (the Internet infrastructure) and the demand side (the rapidly developing audio and video business). Second, this network must meet the strong demand for audio and video distribution across different industries. Third, this network must be ready for the challenges of new technologies that will emerge in the future.

02 Introduction to Huawei Cloud Native Media Network Architecture

That explains why we need a media network. Next, I will introduce the architecture of the Huawei Cloud native media network.

2.1 Huawei Cloud Native Media Network

You can think of the Huawei Cloud native media network as the technology base of Huawei Cloud's video services. On top of this network we build a series of cloud native video services spanning production, processing, distribution, and playback, such as CDN, live streaming, and RTC, and through these services we support customers across thousands of industries. Our cloud native media network has seven main characteristics: flat architecture, mesh, intelligence, low latency, flexibility, diversity, and device-edge-cloud collaboration.

2.2 Wide coverage: support a variety of access modes to achieve global interconnection

Next, I will introduce three important architectural design goals of the Huawei Cloud native media network. Because we serve users all over the world, the first thing we need is a globally deployed network. This network mainly solves three problems: first, supporting a variety of access methods; second, node interconnection; and third, redundant coverage for high availability.

First, since we provide a PaaS service, we have many customers from different industries. Take cloud conferencing as an example: many customers have very high requirements for the security and quality of their conferences, so they want to access the network through a dedicated line from their enterprise campus. Other customers, such as Internet companies, want their users to access the network anytime and anywhere to distribute their business, which requires Internet access. In addition, because a large amount of our traffic terminates at the edge, in China we mainly access through single-carrier lines from China Telecom, China Unicom, and China Mobile to save bandwidth cost, and we solve cross-carrier interconnection through multi-line data centers or BGP resources. Overseas, we give priority to IXP nodes with rich network resources, and transnational interconnection is realized through Huawei Cloud's infrastructure network or high-quality Internet resources. We also consider high availability in deployment planning. A common approach to high availability is redundancy, and we plan for both site redundancy and bandwidth redundancy: we ensure that users in a covered area have at least 3 sites that can provide service at the required quality, and we plan for more than twice the bandwidth the business requires, to handle emergencies.

2.3 All industries: meet different business requirements such as entertainment, communication, and industry video

Because we provide a PaaS service, we cannot let meeting the needs of one type of customer affect the features other customers depend on, and we must meet different customers' needs as quickly as possible. This places three technical requirements on us. First, because we need to meet the business needs of different industries, agility in application development is very important: we need to be able to launch new functionality quickly on any edge node worldwide, and to reduce the risk of bringing new features online, we need to support gray (canary) release of new features on different edges. We call this development approach Living on the Edge.

The second technical requirement, and a very important design principle for us, is that Edge Services are autonomous. Edge Services are the set of microservices we deploy around the network nodes of the media network. Each Edge Service must be independent and autonomous: since this is a distributed media network, we certainly do not want the failure of one node (such as a network failure) to affect the business of the whole network. So each Edge Service must be independent. What does autonomous mean? When there is a temporary outage in the network between the edge and the control center, the Edge Service must remain internally autonomous, meaning its local services are still available. The figure on the left lists four of these microservices. Local scheduling, for example, reduces the dependence on global scheduling: when the network between the edge and the control center has a temporary failure, the edge can still provide service. In addition, the architecture inside an Edge Service is divided into microservices, whose core purpose is to help us launch features quickly and flexibly. For example, there is a protocol adaptation microservice inside the Edge Service, so when we need to support new terminals and adapt new protocols, we can quickly launch a new protocol adaptation microservice without affecting the terminals already online.
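The local-scheduling fallback described above can be sketched as follows. This is a hedged illustration of the autonomy idea, not Huawei Cloud's actual API; the class and field names are invented:

```python
class EdgeService:
    """Toy model: fall back to local scheduling when the control center is down."""

    def __init__(self, local_nodes):
        self.local_nodes = local_nodes      # nodes inside this edge
        self.center_reachable = True

    def global_schedule(self, session):
        # stand-in for a call to the central scheduler; here we
        # simulate a control-center outage
        raise ConnectionError("control center unreachable")

    def schedule(self, session):
        if self.center_reachable:
            try:
                return self.global_schedule(session)
            except ConnectionError:
                self.center_reachable = False  # degrade to autonomy
        # local autonomy: pick the least-loaded node in this edge
        return min(self.local_nodes, key=lambda n: n["load"])

edge = EdgeService([{"id": "n1", "load": 0.7}, {"id": "n2", "load": 0.3}])
print(edge.schedule("session-1")["id"])  # n2, chosen by the local fallback
```

The point of the sketch is that a session request never fails outright just because the control plane is temporarily unreachable.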

The third technical requirement is that the overlay network must be able to define its routes flexibly. For example, Huawei Cloud Meeting needs to support a large number of high-level government meetings, which have very high requirements for security and quality. We need all packets of such a conference entering our media network to travel over Huawei Cloud's backbone network, avoiding Internet transmission. Other customers are more price-sensitive; for them, we try to forward packets over cost-effective network resources. This requires a programmable overlay network that can achieve flexible routing and forwarding.
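The two routing policies above (security-pinned vs. cost-sensitive) can be sketched as a simple policy function on a programmable overlay. The path table and field names here are illustrative assumptions:

```python
# candidate overlay paths with assumed cost and quality attributes
PATHS = [
    {"links": "backbone", "cost": 10, "loss": 0.001},  # Huawei Cloud backbone
    {"links": "internet", "cost": 2,  "loss": 0.02},   # public Internet path
]

def pick_path(policy):
    """Select an overlay path according to the customer's QoS policy."""
    if policy == "high_security":
        # pin high-level conference traffic to backbone links only
        candidates = [p for p in PATHS if p["links"] == "backbone"]
    else:
        # price-sensitive traffic may use any adequate path
        candidates = PATHS
    return min(candidates, key=lambda p: p["cost"])

print(pick_path("high_security")["links"])   # backbone
print(pick_path("cost_sensitive")["links"])  # internet
```

In a real system the policy would also weigh measured loss and delay, but the shape of the decision is the same: the control plane programs the route, the data plane just forwards.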

2.4 Whole process: provide end-to-end services for media production, processing, distribution, and playback

The third important design goal is that our architecture must provide end-to-end services, from production to processing to distribution to playback. We divide our customers into two broad categories. The first is cloud native: many Internet customers are born on the cloud, so they can easily use our cloud services. The second category needs to transition from traditional offline systems to online. To serve such customers, our production and processing system is based on the unified Huawei Cloud Stack, which supports flexible and rapid deployment both online and offline. We also provide a convenient SDK that is cross-terminal and low power, helping customers cover more devices. The final technical requirement is that the entire real-time media processing pipeline can be flexibly orchestrated and dynamically managed. For example, in a joint innovation project with Douyu last year, we helped Douyu move its on-device special-effects algorithms to Edge Services. This brought Douyu three direct benefits. First, less development work: the original effects algorithms had to be adapted to different terminals and chips. Second, faster iteration of the effects algorithms: customers can experience a new effect as soon as it is deployed to Edge Services. Third, more terminal models covered: many low-end phones cannot run traditional on-device effects, but by running them on our Edge Services we can quickly serve those low-end models as well.

2.5 Architecture Layered Design: Adapt to the Characteristics of the Internet

Finally, I want to share our most important architectural idea: layered design. We borrowed this from the design of computer network systems. Imagine what application development would be like without the layered network stack: I might need to enumerate the nodes of the entire network topology, find the optimal path to send my packets from A to destination B, and along the way handle all kinds of network exceptions such as packet loss, retransmission, and reordering. That is obviously very unfriendly to application development.

The layered design of computer networks solves these problems. At the bottom is the link layer, which hides the differences between link transmission technologies; for example, after 5G is supported, upper-layer applications need no modification. Above it is the network layer, whose two main functions are forwarding and routing, so applications do not have to define their own forwarding paths. On top of that is the End to End Layer, an umbrella term for the transport, presentation, and application layers. The purpose of layering is modularization: reduce coupling and let each layer focus on solving its own problems.

The layering of our cloud native media network architecture draws on this idea. We enhance the network layer to improve the latency and arrival rate of packet forwarding. At the End to End Layer we developed our own real-time transport protocol, making it easier to build real-time audio and video applications on top, so that application development can focus on business logic. At the same time, we abstract the media processing module, so that codec technology and pre/post-processing technology can evolve independently and innovate rapidly.

2.6 Architecture Layered Design - Network Layer

Before introducing our key designs at the network layer and the End to End Layer, let's look at what goes wrong at the network layer. From the very beginning, the Internet had one very important quality attribute: highly available interconnection. The Internet is composed of tens of thousands of ISPs, and if any one ISP fails, the network can still communicate. BGP is a very important design here: it mainly considers connectivity, but it is not aware of service quality. As shown on the left of the figure, when user A wants to send a packet to user B across carriers, the packet is likely to pass through a number of different ISPs, which brings many problems, such as aggravated packet loss and retransmission. A key issue is that many carriers' routing policies for a given network are not necessarily quality-optimal; they may instead be cost-optimal, such as "cold potato" or "hot potato" routing strategies.

Another cause of problems: a carrier may upgrade equipment at night, requiring operations staff to make configuration changes, and human error during those changes can cause link failures. In addition, a hot event in some region may cause congestion.

To solve these problems, we decided to enhance the network layer, mainly through two technical means: underlay and overlay.

1) First, the underlay. We use Huawei Cloud's global network infrastructure to improve access and interconnection quality. Once traffic enters our underlay network, it no longer competes with other Internet traffic for bandwidth, which improves both quality and security.

2) Next, the overlay. Besides building our own backbone network, we also deploy overlay nodes to optimize transmission paths and forward packets efficiently according to different QoS objectives, instead of letting packets be forwarded arbitrarily. Our design at the network layer follows the classic principle of separating the control plane from the data plane: simply put, the control plane is responsible for routing and controlling the operation of the whole network, while the data plane is responsible for forwarding.

To make data forwarding simpler, we adopted another classic networking idea: source routing. Its core purpose is to reduce the complexity of the forwarding devices. Specifically, when a packet enters the first forwarding node of our network, the system encapsulates the information of all the nodes the packet must traverse, including the destination node, in the packet header. Each forwarding node that receives the packet then only needs to parse the header to know where to send it next, which greatly reduces the complexity of the forwarding device.
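The source-routing mechanism above can be sketched in a few lines: the ingress writes the full hop list into the header, and each forwarding node only pops the next hop. Node names here are illustrative:

```python
def encapsulate(payload, path):
    """Ingress node: write the full forwarding path into the header."""
    return {"route": list(path), "payload": payload}

def forward(packet):
    """Forwarding node: no routing table lookup, just pop the next hop."""
    next_hop = packet["route"].pop(0)
    return next_hop, packet

pkt = encapsulate(b"media-frame", ["edge-A", "relay-1", "edge-B"])
hops_taken = []
while pkt["route"]:
    hop, pkt = forward(pkt)
    hops_taken.append(hop)
print(hops_taken)  # ['edge-A', 'relay-1', 'edge-B']
```

All path intelligence lives in the control plane that computed the list; the forwarding nodes stay stateless and simple, which is exactly the point of source routing.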

Another very important design principle is that we do not require the network layer to commit to reliability. Although we do not guarantee reliability, we still improve the latency and arrival rate of packet forwarding with techniques such as redundant error correction and multipath transmission. That is why we still call this the network layer: it remains focused on routing and forwarding, just with some enhancements.

2.7 Architecture Layered Design - End to End Layer

The network-layer enhancements give us lower-latency forwarding and a higher arrival rate. Now for our End to End Layer. First, think about a question: as mentioned above, the Internet has many inherent properties that seem very unfriendly to developers, such as packet loss, reordering, and retransmission. Yet the Internet has flourished, producing generation after generation of applications: email, the Web, IM, audio, and video of all kinds. Why is that?

A very important reason is protocols. Many important protocols emerged at the End to End Layer and greatly lowered the technical threshold for application developers. From TCP to HTTP to QUIC, there is a protocol behind each generation of Internet applications. So the core design goal of our End to End Layer is to define good protocols and development frameworks that make application development easier.

How do we do that? Look at the figure on the left. The middle part is the general functional diagram of our self-developed real-time transport protocol. To its north we provide a unified interface; through this set of northbound interfaces we can develop not only real-time audio and video services but also reliable messaging services. Its southbound interface, meanwhile, shields the differences of the underlying UDP or ADNP protocols through the protocol stack, which also makes application development simpler.

The purpose of the protocol stack design is to make application development easy. We therefore abstract two modules, NQE and QoS, which provide callback methods to quickly feed network information back to upper-layer applications such as the coding module, so that the coding module can quickly adapt its coding parameters to network conditions.
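The callback idea can be sketched as below. The module names NQE/QoS follow the talk, but this interface and the 0.85 headroom factor are invented for illustration, not the real protocol stack API:

```python
class Encoder:
    """Toy encoder that adapts its bitrate from QoS callbacks."""

    def __init__(self, bitrate_kbps=2000):
        self.bitrate_kbps = bitrate_kbps

    def on_network_update(self, est_bandwidth_kbps, loss_rate):
        # callback invoked by the QoS module: keep some headroom
        # below the estimated bandwidth, and back off further on loss
        target = est_bandwidth_kbps * 0.85 * (1 - loss_rate)
        self.bitrate_kbps = int(target)

enc = Encoder()
# the QoS module reports a degraded link: ~1.2 Mbps with 5% loss
enc.on_network_update(est_bandwidth_kbps=1200, loss_rate=0.05)
print(enc.bitrate_kbps)  # 969
```

The design point is the direction of the flow: the transport layer pushes network state up via callbacks instead of the encoder polling, so the source can react within one feedback interval.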

Another very important design principle is efficiency. As mentioned above, there will be many IoT terminals in the future, and a major characteristic of IoT terminals is that they are highly sensitive to power consumption. We wanted to consider this from the very beginning of the protocol stack design, so we avoid adding unnecessary copies at this layer. The stack follows the classic ALF (Application Level Framing) design principle; RTP was designed with the same principle in mind.

In addition, our protocol stack also draws on the design ideas of QUIC, supporting multiplexing, network multipath, Huawei LinkTurbo, priority management, and other functions. One small experience to share: when developing free-viewpoint and VR services, the demand for bandwidth is very high, and enabling the multipath function at that point yields a relatively large improvement in experience.

2.8 Target architecture of Huawei cloud native media network

Finally, I will make a brief summary of the target architecture of the whole media network.

1) Simply put, layered design simplifies complex problems and divides and conquers, so that each layer is decoupled and can evolve rapidly;

2) Each Edge Service is autonomous, improving the availability of the service as a whole;

3) Dividing Edge Services into microservices lets us adapt to customer needs more flexibly and launch quickly at the granularity of individual microservices.

03 Real-time audio and video service quality assurance practice

In this third part, I will share some of our practice in real-time audio and video service quality assurance. The previous parts were mainly about architecture; this part contains some thoughts on algorithm design.

3.1 Video, audio and network are the key system factors affecting the experience

As shown in the figure above, we analyzed the dimensions that affect the experience and built a simple mapping from objective indicators to subjective indicators to QoE. Through this analysis we found that the three core systematic factors affecting the real-time audio and video experience are video, audio, and the network. Next, I will introduce our algorithm practice in each of these three areas.

3.2 Video coding technology

First, let's look at video coding. We classified video coding technologies by design objective. The first category aims to scientifically reduce coding redundancy and lessen the impact of coding distortion on the subjective perception of the human eye. Since our real-time audio and video business is mainly people-oriented, there are some classic optimization ideas, such as starting from the person: analyzing the visual characteristics of the human eye and optimizing the coding algorithm based on those characteristics. The figure lists several categories of human visual characteristics that are highly correlated with coding.

Another optimization idea starts from the source, that is, from the content: we analyze the characteristics of content in different scenes to optimize the coding algorithm. For example, computer-generated images are characterized by low noise and large flat areas.

The second design goal is how to scientifically add redundancy to resist the impact of weak-network transmission on subjective perception. The figure briefly lists the types of coding that add redundancy, such as all-I-frame coding, intra-refresh mode, long-term reference frames, and SVC. In some spatial video services, to reduce the latency of spatial positioning, we combine all-I-frame coding with ordinary coding. To avoid bursts of large I frames in cloud gaming, we use intra-refresh coding. Long-term reference frames and SVC are common coding methods in real-time audio and video services.

3.3 PVC perceptual coding

Here are some of our specific coding techniques. Our cloud video team, together with the Huawei 2012 Central Media Technology Institute, improved the PVC perceptual coding algorithm by analyzing the human visual system. The algorithm has gone through several iterations; the latest perceptual coding 2.0 delivers a 1080p 30 fps HD experience at a 1 Mbps bit rate. The main improvements are as follows. First, scenes and regions are distinguished through pre-analysis and coding feedback. The highly sensitive regions in real-time call scenes are mainly the face region and static regions, and different coding parameters and rate allocation strategies are adopted for different scenes and regions, for example allocating a lower rate to non-sensitive regions. Second, on the basis of 1.0, we added AI to rate control: compared with the previous fixed combinations of bit rate and resolution, the new method uses AI perceptual rate control to find the optimal combination of bit rate and resolution for each scene, achieving better subjective quality at low bandwidth.

3.4 SCC coding

The second coding technology is SCC (screen content coding), mainly applied to computer-generated images, such as the screen sharing scene in education or conferencing. Compared with x265's ultrafast preset, our algorithm improves compression performance by 65% and coding speed by 50% under the same computing resources. We also addressed some issues specific to the screen sharing scenario. Shared content is often relatively static text and graphics, such as Word or PPT; for this type, the encoder generally uses a low frame rate to preserve picture quality as much as possible. But users often switch from sharing documents to sharing video, and if we do not perceive this switch well, the viewing experience becomes a discontinuous picture, similar to a GIF.

To solve this problem, we adapt the video coding frame rate based on spatio-temporal complexity analysis of the video. This yields high-quality images for static graphics while ensuring fluency when the user switches to video sharing.
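The frame-rate switch can be sketched as a threshold policy on a temporal-complexity measure. The thresholds and frame rates below are illustrative assumptions, not the production tuning:

```python
def choose_fps(temporal_complexity):
    """Pick an encoding frame rate from a normalized temporal
    complexity score (0 = fully static, 1 = full-motion video)."""
    if temporal_complexity < 0.05:
        return 5    # nearly static slides: spend bits on quality
    if temporal_complexity < 0.3:
        return 15   # light animation (cursor, scrolling)
    return 30       # full-motion video playback: spend bits on fluency

# static PPT page vs. a video playing inside the shared screen
print(choose_fps(0.01), choose_fps(0.5))  # 5 30
```

A real encoder would also hysterically damp the switch to avoid oscillating between modes, but the core trade-off is exactly the one described above: quality for still graphics, fluency for motion.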

The second problem we solved is the color distortion caused by subsampling from YUV444 to YUV420. Screen sharing often carries static text and graphics, for which color fidelity requirements are relatively high, but subsampling from YUV444 down to YUV420 strongly attenuates the UV (chroma) signal. On the left is the result without the new algorithm and on the right the result with it: as you can see, the fonts on the right are clearly sharper and the color distortion is much smaller. The core is a low-complexity color correction algorithm.

3.5 Adaptive long term reference frame coding

The first two coding technologies reduce redundancy; adaptive long-term reference frame coding scientifically adds it. To build intuition, let's first simplify and look at fixed long-term reference frames. In the figure at the top left, the red frame is the I frame, the green frames are long-term reference frames, and the blue frames are ordinary P frames. This reference structure breaks the normal forward reference chain of IPPP: when P2 or P3 is lost, decoding of the subsequent P5 is not affected and can continue, which improves fluency. But disadvantages remain. If a green long-term reference frame such as P5 is lost, the subsequent P frames that depend on it cannot be decoded. And because the long-term reference interval is fixed, it introduces a certain amount of redundancy, which reduces quality at the same bandwidth. We would therefore like to reduce the redundancy when the network is good to improve picture quality, which is why we proposed the adaptive long-term reference frame method.
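The decodability argument above can be checked with a toy dependency walk: a frame is decodable if it arrived and all the frames it references are decodable. The frame names and reference graph below are illustrative, loosely following the figure:

```python
def decodable(frame, refs, received, cache=None):
    """A frame decodes iff it was received and all its references decode."""
    cache = {} if cache is None else cache
    if frame in cache:
        return cache[frame]
    ok = frame in received and all(
        decodable(r, refs, received, cache) for r in refs.get(frame, [])
    )
    cache[frame] = ok
    return ok

# P2-P4 form the normal forward chain; P5 skips back to the LTR instead
refs = {"P2": ["I1"], "P3": ["P2"], "P4": ["P3"],
        "LTR1": ["I1"], "P5": ["LTR1"]}
received = {"I1", "LTR1", "P4", "P5"}   # P2 and P3 were lost

print(decodable("P5", refs, received))  # True: the LTR reference survives the loss
print(decodable("P4", refs, received))  # False: its forward chain is broken
```

This is exactly the fluency gain: the LTR-referencing frame keeps decoding through the loss, while a purely forward-referencing frame stalls.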

The core idea of adaptive long-term reference frames has two points. The first is a feedback mechanism at the decoder: the decoder tells the encoder that it has received a given long-term reference frame, and only after the encoder knows the frame was received does it encode with reference to that frame. The second is a mechanism for dynamically marking long-term reference frames: the encoder dynamically optimizes the long-term reference frame interval according to the network's QoS, shortening the interval when the network is good and lengthening it when the network is bad.

However, the feedback mechanism introduces a problem. Under network conditions with a relatively long RTT, the feedback cycle becomes long, and feedback packets may themselves be lost and need to be re-sent, which can make the long-term reference interval very long. Once the interval grows, coding quality drops, possibly to a level the business cannot accept. Our algorithm therefore also specifies that when the reference interval has been too long for too long, instead of relying entirely on feedback we force the P frame to reference its nearest long-term reference frame. This brings two improvements: picture fluency becomes better under sudden packet loss, and network adaptation becomes stronger, balancing fluency and picture quality.

3.6 Network transmission technology: seeking the optimal trade-off between interactivity and quality

Those were some of our video coding practices; next let's look at network transmission. The core goal we define for network transmission is the optimal trade-off between interactivity and quality. Network transmission technology mainly resists packet loss, delay, and jitter, with common techniques such as ARQ, FEC, unequal error protection, jitter estimation, and buffering. Besides resisting loss and jitter, congestion control is also needed. The core purpose of congestion control is to keep the sending rate as close as possible to the available rate while maintaining low latency: if the sending rate and the available network bandwidth do not match, the result is packet loss, jitter, or low bandwidth utilization. There is also a very important interaction between source and channel: the adaptive long-term reference frame we saw earlier is one way to dynamically adjust coding parameters using channel information. Only with this interaction can we truly improve the experience.

3.7 Enhance the accuracy of bandwidth prediction and improve the quality of QoE experience based on reinforcement learning

The bandwidth prediction algorithm is critical to both congestion control and source-channel coordination. The traditional approach relies on human experience, building decision-tree-style algorithms to predict bandwidth under different network models. However, this approach does not perform particularly well in complex network scenarios, so we turned to reinforcement learning to improve it.

The main idea is to use network QoS feedback from the receiver, which reports four signals: receiving rate, sending rate, packet loss rate, and delay jitter. Based on these signals, a reinforcement learning method improves the accuracy of bandwidth prediction. After this algorithm optimization, our high-definition ratio improved by 30% and the stall rate dropped by 20%.
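As a rough sketch, the four receiver-side signals above form the observation that such a learner consumes. The normalisation constants and the linear "policy" below stand in for the trained model, which is not public; everything here is an assumption for illustration.

```python
# Sketch of packing the four receiver-feedback QoS signals into an
# observation vector for an RL bandwidth estimator. Constants and the
# placeholder policy are assumed, not the production model.

def build_observation(recv_kbps, send_kbps, loss_ratio, jitter_ms):
    # Normalise each signal to roughly [0, 1] so the learner sees
    # comparable scales regardless of the absolute bitrate.
    return [
        recv_kbps / 10_000.0,
        send_kbps / 10_000.0,
        loss_ratio,
        min(jitter_ms / 200.0, 1.0),
    ]

def predict_bandwidth_kbps(obs, prev_estimate_kbps):
    # Placeholder policy: scale the previous estimate up when the network
    # looks clean and down when loss/jitter rise. A trained RL agent
    # would output this scaling action instead of this hand-written rule.
    recv, _send, loss, jitter = obs
    action = 1.0 + 0.1 * (1.0 - loss - jitter) - 0.5 * loss
    return prev_estimate_kbps * action
```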

3.8 Audio 3A technology: Improve audio clarity

Finally, I'll share some of our technical practices in audio. A good 3A algorithm is critical to the speech clarity experience, so we applied AI technology to the 3A algorithms to improve it.

First, we applied AI to echo cancellation, a very important step in the entire 3A pipeline. Traditional echo cancellation algorithms are relatively mature under steady-state conditions and generally perform well, but they face many challenges when the environment changes, for example, when I make a hands-free call at home and walk from the bedroom to the balcony. AI handles these situations better. In particular, for the double-talk scenario, our new algorithm solves the problems of echo leakage and dropped words very well.

Next is noise suppression. Traditional algorithms handle steady-state noise, such as fans and air conditioners, relatively well. Our AI-based noise reduction not only smooths steady-state noise better, but can also quickly suppress sudden noises such as keyboard or mouse clicks, running water, or coughing.

Another important link in 3A is automatic gain control. In call scenarios, automatic gain is driven mainly by recognizing the human voice, so voice activity detection (VAD) is critical. Here, too, we use AI to improve the accuracy of voice detection and thus the effect of automatic gain control.
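The VAD-gated gain adaptation described above can be sketched as follows. The VAD itself is treated as a black box (in the talk it is an AI model) and passed in as a flag; the target level and smoothing factor are assumed values, not Huawei's parameters.

```python
# Sketch of VAD-gated automatic gain control: gain adapts only on frames
# classified as speech, so background noise cannot drive the level up.
# TARGET_RMS and ATTACK are assumed constants for this illustration.

TARGET_RMS = 0.1   # desired speech level (full scale = 1.0)
ATTACK = 0.2       # smoothing factor for gain updates


def agc_step(frame, gain, is_speech):
    if is_speech:
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms > 1e-6:
            desired = TARGET_RMS / rms
            # Move the gain a fraction of the way toward the target so
            # the level changes smoothly rather than pumping.
            gain += ATTACK * (desired - gain)
    # Non-speech frames leave the gain untouched.
    return [x * gain for x in frame], gain
```

This is why VAD accuracy matters so much: a false positive on a noise frame would pull the gain toward amplifying that noise.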

3.9 Audio packet loss recovery technology: reduce the impact of packet loss on audio experience

Another difference between audio and video technology lies in packet loss recovery. The figure on the left is a classic map of packet loss recovery techniques, which fall into two categories: active recovery and passive recovery.

Active packet loss recovery mainly includes the common FEC, ARQ, and so on. Passive recovery has three main methods: insertion, interpolation, and regeneration. The optimization idea is the same as for video: start from the study of people. For video we study the characteristics of the human eye and vision; for audio we study how people produce sound. The fundamental frequency reflects, to some extent, the vibration frequency of the vocal cords, and the spectral envelope reflects, to some extent, the shape of the mouth. A vocoder that combines these two kinds of information with AI can recover roughly 100 milliseconds of lost audio. As we know, the sound of a Chinese character usually lasts 150 to 200 milliseconds. Traditional signal-based PLC can generally recover about 50 ms of audio, whereas our AI-based approach can recover about 100 ms.
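A toy version of the pitch-based idea behind passive concealment looks like this. Real vocoder-based PLC also models the spectral envelope and uses a learned model; this sketch only shows the pitch-cycle repetition with a fade-out, and all parameters are illustrative.

```python
# Toy packet loss concealment: when a frame is lost, repeat the last
# pitch period (one fundamental-frequency cycle) from the received
# samples, fading out so long losses decay to silence instead of
# sounding like a stuck buzzer. Parameters are illustrative.

def conceal_lost_frame(history, pitch_period, frame_len):
    # Take the most recent full pitch cycle from the received samples...
    cycle = history[-pitch_period:]
    out = []
    for i in range(frame_len):
        # ...and loop it with gradually decreasing amplitude.
        fade = max(0.0, 1.0 - i / frame_len)
        out.append(cycle[i % pitch_period] * fade)
    return out
```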

3.10 Case 1: Huawei Changlian, the world’s first full-scene audio-video calling product

Finally, let me share two cases. Our products serve not only external customers but also many of Huawei's own products and services. I always joke that supporting internal customers is harder, and supporting Huawei's internal customers is hardest of all: their requirements are very high. We now support Huawei's Changlian service, the world's first full-scenario real-time audio and video calling product: in addition to mobile phones, it runs on Huawei's large screens, tablets, notebooks, watches, and bands. We helped Changlian achieve high-quality 1080p 30fps calls at a bit rate of 1 Mbps.

3.11 Case 2: Webinar: integrated experience of conference and live broadcast, making it easier to hold conferences

Supporting two Huawei internal customers is even harder than supporting one. The second internal customer we support is Huawei Cloud Conference, whose webinar scenario is also built on our real-time audio and video service. A single webinar can now support 3,000 attendees, 100 of whom can interact. In the second half of this year, our cloud conference product will support a single webinar with 10,000 attendees and 500 interactive participants.

04 Summary

Finally, I'd like to conclude what I've shared today. First, we can clearly see that the video business is driving the development of the entire Internet technology stack, including audio and video coding and transmission, edge computing, and edge networking. We therefore need a service or system to bridge the gap between the Internet infrastructure (the supply side) and the rapidly growing video business (the demand side).

Second, today's sharing is just a beginning. As the application scenarios of real-time audio and video technology grow, our cloud native media network architecture and its algorithms will continue to be optimized, driven by data.

Finally, I hope that Huawei Cloud native video service can join hands with you to enter the “new era” of video.

Thank you very much.