Brief introduction: At the 2021 Cloud Conference "Industry Video Innovation and Best Practice" video cloud forum, a senior technical expert from Alibaba Cloud Intelligence delivered a keynote titled "AliRTC Opens the 'Zero Processing' Era of Video Interaction", releasing Alibaba Cloud Video Cloud's next-generation real-time interaction solution, RTC "zero processing", and sharing Alibaba Cloud Video Cloud's exploration and practice in RTC products. The following is the content of the speech.

I. Interactive evolution and challenges

How have video interaction products changed in the past few years?

We think RTC products have changed the industry in two very important ways.

The first change, starting in 2014, was the upgrade of interaction from text and images to audio and video.

In 2014, Internet entrepreneurs and RTC product suppliers explored the commercialization of video interaction together, and education and entertainment became the main breakthrough directions. Built around interactive teaching, show-style live video, and multi-person social interaction, most of these businesses achieved a successful combination of business and technology at this point.

2017 was a landmark year: RTC products helped leading Internet customers achieve disruptive growth, marking the maturity of both interactive video technology and the online interactive business model.

The following years saw replication at different scales and in different scenes. In 2018 and afterwards, the market did not innovate with new scenes or new forms of interaction; instead, businesses replicated the model with different content and different customer groups, and video interaction moved from the head players into more market segments.

The second major change occurred in 2020, when the impact of the pandemic and the full penetration of cloud video conferencing brought the market forward by at least five years.

This change in the market cannot really be called a technology revolution: it placed no new demands on RTC products and produced no new interactive scenes or technologies. But mass penetration redefined the supplier landscape. Cloud vendors became an extremely important part of the market for the first time, splitting the market from single conferencing suppliers into cloud platforms plus conference terminal suppliers, giving customers more choices.

From 2018 to now there has been no fundamental breakthrough in scenes. Is it because we have hit a technical bottleneck?

With this question in mind, Alibaba Cloud conducted an in-depth technical evaluation of RTC scene technology, trying to find out where the whole industry stands. Unlike evaluating a single video technology, evaluating RTC is more complex.

For example, video coding can be analyzed with metrics such as PSNR, SSIM, and VMAF, and visual algorithms such as video classification can be analyzed with ROC curves. RTC, however, involves a great deal of subjective perception, which makes evaluation quite complicated; there is currently no unified evaluation standard in the industry.
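To make the objective side of this concrete, here is a minimal sketch of PSNR, the simplest of the metrics mentioned above: it measures reconstruction quality as the log-ratio of the maximum pixel value to the mean squared error between a reference frame and a degraded frame.

```python
import numpy as np

def psnr(reference, degraded, max_value=255.0):
    """Peak signal-to-noise ratio between two frames, in dB."""
    ref = np.asarray(reference, dtype=np.float64)
    deg = np.asarray(degraded, dtype=np.float64)
    mse = np.mean((ref - deg) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: infinite PSNR
    return 10.0 * np.log10((max_value ** 2) / mse)

# Toy 2x2 grayscale "frames": the degraded copy differs by 1 everywhere,
# so MSE = 1 and PSNR = 10 * log10(255^2) ≈ 48.13 dB.
ref = np.array([[100, 110], [120, 130]])
deg = ref + 1
print(round(psnr(ref, deg), 2))  # → 48.13
```

SSIM and VMAF follow the same frame-comparison pattern but model perception more closely; the point in the text stands, though: none of these per-frame metrics captures the stutter, delay, and interaction quality that dominate subjective RTC experience.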

From these indicators affecting user experience, we selected six dimensions to represent the performance quality of RTC.

If you are interested in the evaluation, you can follow our official account "Video Cloud Technology", which explains in detail how we automate it. During the evaluation we create different network environments to test the performance of RTC in various respects.

We tested a number of RTC products in the industry and found two characteristics.

First, RTC has an obvious technical threshold. The green box, for example, represents a typical class of RTC capability developed by a small team with a small investment, and the gap is obvious.

Second, the few larger suppliers, including Alibaba Cloud (the red, blue, and yellow lines outside the circle), are at a roughly consistent level, but none stands out. Technical homogeneity is extremely serious; everyone is basically at the same level.

Our current real-time video interaction focuses mainly on online and offline scenes; in the future there may be broader application scenes, such as new interactive scenes, VR control, and virtual reality.

At this point we have to ask: has our technology reached a plateau where it cannot meet the broader needs of the future, and what lies behind that? Could our technology have hit a bottleneck? Technology usually develops in steps, and it cannot break through without first getting stuck at one level.

II. "Zero processing" to accelerate interactive upgrade

We want to analyze: what is the user experience now, and what is the problem with our current technology?

By comparing various RTC suppliers, we found an interesting point: all of them have a residual rate of about two per thousand that is difficult to eliminate. Packet loss of 50% or 60% can be handled well, but when network bandwidth is limited, that last two per thousand is hard to remove.

We have tools to address these problems, such as narrowband HD technology: we can solve them with complex computation, or with non-standard screen coding, but it is very difficult to make these technologies widely available.

The most fundamental reason is that device-side capability is limited. Everyone's phone is different: some phones are powerful enough to run complex algorithms, others are not. At the same time, device fragmentation is serious, and it is difficult to adapt to every device.

On the application side, we want to provide more interesting interactions, such as generating cartoon characters in real time. This works on the server, but on the device it works only on a few very powerful models.

A natural thought is, can we go beyond the current application architecture?

We are gradually moving from an architecture that relies entirely on device capability to one that relies on cloud-side video processing plus transmission. Based on this idea, we proposed the cloud processing + device rendering architecture: the cloud provides powerful processing capability, while the device is responsible only for rendering. The device needs only minimal processing capability to achieve a good processing effect, so that everyone gets the same experience on different phones.

This is the Video Cloud "zero processing" solution. In the basic architecture diagram, the device only needs to do relatively simple video capture and transmission; the stream then reaches the cloud through our globally covering GRTN network, where the GRTP real-time cloud processing engine processes the video and sends the processed video back to the device, which only needs to do simple rendering. In this way, the problems of insufficient computing power and device fragmentation mentioned above are solved.
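The division of labor described above can be pictured with a purely conceptual sketch. All function names here (`capture`, `cloud_process`, `render`) are hypothetical illustrations, not a real Aliyun API: the device only captures and renders, while every heavy step runs in the cloud.

```python
# Conceptual sketch of the "zero processing" split: the device does
# lightweight capture and rendering; the cloud does all heavy processing.
# Names are illustrative only, not an actual API.

def capture():
    """Device side: lightweight capture, then upload via GRTN."""
    return {"frame": "raw", "effects": []}

def cloud_process(packet):
    """Cloud side (GRTP engine): complex AI processing such as matting."""
    return {**packet, "effects": packet["effects"] + ["matting", "cartoonize"]}

def render(packet):
    """Device side: simple rendering of the already-processed frame."""
    return f"display frame with effects {packet['effects']}"

# End-to-end flow: device -> GRTN -> cloud (GRTP) -> GRTN -> device.
print(render(cloud_process(capture())))
```

The key property of the split is that `capture` and `render` stay trivial no matter how expensive `cloud_process` becomes, which is why low-end phones get the same effects as high-end ones.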

However, there is no such thing as a free lunch. With the above framework, several problems quickly surface.

First, can our cloud handle processing at this scale?

Second, can the cloud afford the cost at such a scale?

Third, can so many types of processing services be sustained on the cloud?

Our own confidence comes from several sources.

First, through years of accumulation, we have built the largest cloud video processing cluster in the industry, so we are technically capable of undertaking super-large-scale processing.

Second, about cost.

Below is a plot of the businesses we handle: the horizontal axis is time and the vertical axis is resource usage; the black line is one kind of business and the red line another. As you can see, each business has long idle periods, which free up a large amount of resources for reuse. By mixing multiple businesses and running them together, we can raise resource utilization and significantly reduce costs.
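A tiny back-of-the-envelope sketch shows why time mixing cuts cost. The hourly demand numbers below are invented for illustration: provisioning each workload separately requires capacity for each individual peak, while a shared pool only needs capacity for the peak of their sum, which is lower when the peaks do not coincide.

```python
# Hypothetical hourly resource demand (arbitrary units) for two workloads:
# workload A peaks in the evening, workload B during working hours.
workload_a = [2, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 6,
              6, 6, 6, 5, 4, 6, 8, 9, 9, 8, 5, 3]
workload_b = [1, 1, 1, 1, 1, 2, 3, 6, 9, 9, 9, 8,
              7, 8, 9, 9, 8, 6, 4, 3, 2, 2, 1, 1]

# Separate pools must cover each workload's own peak.
separate_capacity = max(workload_a) + max(workload_b)   # 9 + 9 = 18

# A mixed pool only needs to cover the peak of the combined demand.
mixed_capacity = max(a + b for a, b in zip(workload_a, workload_b))  # 15

print(separate_capacity, mixed_capacity)  # → 18 15
```

With these made-up numbers mixing saves about one sixth of the capacity; the more anti-correlated the idle periods of the mixed businesses are, the larger the saving.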

In addition to time mixing, we can also reduce the overall cost through space mixing and heterogeneous mixing.

Third, backed by Alibaba Group as well as our own work, we have accumulated a large number of video processing algorithms, so we have the opportunity to continuously provide rich algorithms and processing capabilities.

III. "Zero processing" practice sharing

Next is Alibaba Cloud Video Cloud's practice of zero processing.

The first scenario is using the MCU to liberate device-side computing power.

Normally, when we do RTC live broadcasting, the live stream is delivered over the RTMP protocol, and the delay prevents the audience from participating in the interaction. To make the audience interactive, everyone would have to join the RTC network, with each endpoint subscribing to multiple streams, which places a very heavy burden on each endpoint's computing power and network traffic.

We merge the streams through a cloud MCU and feed the result back into the RTC session. The audience can then watch the live stream over RTC, which makes interaction very convenient without consuming too many on-device resources. We call this the interactive low-latency mode, and it is already a mature product capability for us.
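A rough comparison (with assumed numbers) shows what the MCU merge saves each viewer. In a subscribe-per-speaker model, a viewer's downlink grows linearly with the number of active speakers; with a cloud MCU mixing the streams, each viewer pulls and decodes a single stream regardless of speaker count.

```python
# Hypothetical numbers, purely for illustration.

def viewer_downlink_kbps(speakers, per_stream_kbps, use_mcu):
    """Downlink bandwidth one viewer needs for the interactive session."""
    # MCU: one mixed stream; otherwise: one stream per active speaker.
    return per_stream_kbps if use_mcu else speakers * per_stream_kbps

speakers = 6          # assumed number of active on-stage participants
per_stream = 800      # kbps, assumed bitrate per video stream

print(viewer_downlink_kbps(speakers, per_stream, use_mcu=False))  # → 4800
print(viewer_downlink_kbps(speakers, per_stream, use_mcu=True))   # → 800
```

Decoding cost scales the same way: one decoder instead of six, which is what frees the weaker devices mentioned earlier to join the interaction at all.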

The second scenario: cloud push.

This is an example of leveraging Alibaba's internal service capabilities. Through cooperation with the Security Department of Alibaba Group, we route RTC streams over the internal network to the Security Department's products, reducing intermediate links and realizing content review with low cost and low delay.

The third scenario: cloud effects.

I believe you have already seen this scene. We realized a virtual conference room with the help of cloud processing: through the cloud MCU, all participants are matted out and composited onto a background, improving the experience of attending a video conference.

The real-time virtual image shown above relies on the GRTN real-time transmission network to carry the video stream to the cloud, where complex AI processing such as matting, voice changing, and cartoonization is performed. The terminal is only responsible for display, realizing end-to-end "zero processing".

"Zero processing", as the next-generation real-time interaction solution, is the first of its kind among cloud vendors. It solves the problem of constrained device-side computing power in the new interactive era, makes full use of the cloud's massive computing power to build real-time virtual scenes with cloud effects, and is an important evolution toward a fully open, immersive interactive new world.


This article is the original content of Aliyun and shall not be reproduced without permission.