The past year has seen a once-in-a-century event as well as the proliferation of video applications. The resulting explosion in video data volume has made efficient encoding and decoding, the core technologies underlying video, more urgent than ever.

Soon after the new video codec standard VVC was finalized, Alibaba's video team devoted itself to the development of a VVC software codec.

At the LiveVideoStackCon 2021 Beijing Summit, Ye Yan, Alibaba researcher and head of intelligent video standards and implementation at Alibaba Cloud, shared the current state of the video industry, the technical evolution and business outlook of Ali266, Alibaba's self-developed VVC codec, and the future opportunities and challenges facing the video industry.

Speaker | Ye Yan    Transcription | LiveVideoStack

Hello everyone, I am Ye Yan, head of the intelligent video standards and implementation team at Alibaba Cloud. My topic today is "Codec Re-evolution: Ali266 and the Next Generation of Video Technology."

This talk is divided into four parts: first, the current state of the video industry; then the technical evolution and business outlook of Ali266, our self-developed VVC codec; and finally, the future opportunities and challenges facing the video industry.

1. Current situation of the video industry

It is no exaggeration to say that the COVID-19 pandemic we have been living through is a once-in-a-century event for mankind. The epidemic disrupted the normal pace of life and the customary face-to-face communication between people, changed the rules of many games, and accelerated the rollout of advanced video technology products.

The epidemic situation varies from country to country. China has kept the epidemic well under control, so people's daily life is basically normal. In countries and regions hit hard by the epidemic, however, people's life and work have been completely changed.

These changes have several aspects. First, workplace interaction has moved from offline to online, with heavy use of cloud meetings and video conferencing; cumulative daily user time on such services now exceeds 100 million minutes. In addition, more than half of the employees in hard-hit countries and regions work from home, a big change from the face-to-face communication they were used to.

In the United States, for example, movie theaters were closed for more than a year. Although they reopened this summer, few people went to the movies. Entertainment now centers on the home theater, and performers have moved from offline to online, interacting with their fans over the internet.

From the perspective of the video industry, the past year brought a very important milestone: the finalization of the new-generation H.266/VVC international video standard. Standardization of VVC officially began in April 2018, and after more than two years the standard reached Final Draft International Standard status, the final version of the first edition, in the summer of 2020.

Throughout those two-plus years, and especially in the final six months affected by the epidemic, nearly 300 video experts from around the world debated the technology day and night in online meetings, and completed the new-generation H.266/VVC standard on schedule.

Like each previous generation of international video standards, VVC halves the bandwidth cost relative to the previous-generation HEVC standard.

The figure above shows the VVC subjective performance test results: the bandwidth savings the VVC reference platform achieves over the HEVC reference platform at the same subjective quality.

The video content is divided into five categories. The first two are UHD and HD, that is, ultra-high-definition and high-definition video. We can see that VVC's VTM reference software achieves bandwidth savings of 43% to 49% over HEVC's HM reference software.

For HDR and 360-degree panoramic video, two newer video formats, VVC achieves even higher bandwidth savings of 51% and 53%, respectively.

The last column is for low-latency applications, using the temporal prediction structure of video conferencing. Since that prediction structure is more constrained, VVC's bandwidth savings are slightly lower, but still a significant 37%.

Space is limited, so this is a high-level summary of the figures. Readers interested in the details can consult the subjective verification test reports from three JVET meetings (documents JVET-T2020, JVET-V2020, and JVET-W2020), which contain a great deal of detail.

Against this backdrop of exploding video and the newly finalized VVC standard, Alibaba began developing Ali266. Let's take a look at the history of the Ali266 technology.

2. Development history of Ali266 technology

What is Ali266? What do we want it to do?

Ali266 is our codec implementation of the latest standard, VVC. First, we want high compression performance, to reap the bandwidth-saving dividend VVC brings. Second, real-time HD encoding speed: VVC has far more coding tools than HEVC, so maintaining real-time encoding speed matters greatly for real commercial use. Third, Ali266 should have complete, self-contained encoding and decoding capability, to better build an end-to-end ecosystem.

The purpose of Ali266 is to deliver on these three challenging technical points, achieve technical leadership, turn that into product competitiveness, and help expand our business.

The figure above shows a number of VVC coding tools. Here I group the main functional modules of the traditional video codec framework into several categories: block partitioning, intra prediction, inter prediction, residual coding, transform and quantization, loop filtering, and other coding tools.

The blue circles above are HEVC's coding tools, and the purple circles below are VVC's. We can see that HEVC has only three or four coding tools in each functional module, while VVC supports a much richer tool set; this is the main reason it compresses so strongly and yields the bandwidth-saving dividend.

Every coding tool carries some complexity, so each additional tool brings a corresponding increase in both complexity and performance.

Above is an overview of the complexity and coding performance of each coding tool, as tracked by the JVET standards committee during the development of the VVC standard.

The horizontal axis reflects runtime and the vertical axis the coding-performance gain. Each colored point corresponds to a different VVC coding tool: the further right a point sits, the lower that tool's complexity, and the higher it sits, the greater its performance gain.

Ideally a coding tool would sit in the upper right corner, but as the chart shows, that corner is essentially empty: most VVC coding tools each provide a 1% to 1.5% performance gain, at a small increase in complexity.

This poses a challenge for encoder optimization: it is not enough to optimize a few major coding tools. The encoder must quickly and accurately select, from a rich tool set, the tools to use for the current input video. That is the main difficulty in optimizing an H.266 encoder.

The chart on the right shows the profiled share of encoding time for different coding tools in our software encoder. Matching the figure on the left, some 40% of the coding tools are individually cheap, each taking only about 2% of the time, yet all of them contribute performance, so we must decide which to choose. Moreover, 92% of the coding tools each take less than 10% of the time, which challenges the engineering and algorithmic optimization of the whole system.

This illustrates not only the challenge of H.266 encoder optimization but the challenge facing any real-time encoder.

Video encoding is a constant tug of war between gaining compression performance and losing encoding speed, and what we need to do is win that tug of war.

Comparing the VVC reference platform VTM with the HEVC reference platform HM: although bandwidth is halved, VTM encodes at only one eighth the speed of HM, which is unacceptable for real-time encoding. So next I will focus on how Ali266 is optimized.

We will cover optimization along two dimensions, the first being **coding quality (coding performance)** optimization.

Coding quality (coding performance) optimization

We do a lot of work to maintain coding quality and performance. For lack of space, I will present only one example: a combined optimization of pre-analysis, pre-processing, and a core coding tool.

In the pre-analysis stage, we use scene-change detection. Anyone who works on encoders knows that accurate scene-change detection is a must for every commercial encoder. The pre-processing is MCTF, which I will briefly introduce below. The core coding tool is LMCS, one of VVC's new tools.
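As a rough illustration of what the pre-analysis does, here is a minimal scene-change detector based on luma-histogram differences. This is a generic sketch, not Ali266's actual method; the bin count and threshold are illustrative assumptions, and a production encoder would combine several signals (SAD, motion cost, and so on).

```python
import numpy as np

def luma_histogram(frame, bins=64):
    """Normalized histogram of an 8-bit luma (Y) plane."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / hist.sum()

def is_scene_change(prev_frame, curr_frame, threshold=0.4):
    """Flag a scene change when the histogram difference is large.

    `threshold` is illustrative; real detectors tune it and fuse
    multiple features rather than relying on histograms alone.
    """
    d = np.abs(luma_histogram(prev_frame) - luma_histogram(curr_frame)).sum()
    return d > threshold
```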

Here is an introduction to the MCTF pre-processing step.

MCTF stands for Motion Compensated Temporal Filtering. Through hierarchical motion search and motion compensation, it filters the input video signal in the temporal domain with a bilateral filter, effectively reducing video noise, with a denoising effect in both the temporal and spatial domains.

MCTF can greatly improve coding efficiency, which is why it is supported on both the VTM and VVenC (VVC's open-source encoder) platforms.
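To make the idea concrete, here is a simplified sketch of temporal bilateral filtering in the spirit of MCTF. It is not the VTM implementation: the weight formula is a generic bilateral weight, and the `motion_compensate` helper (a hierarchical motion search in real MCTF) is a placeholder assumption.

```python
import numpy as np

def mctf_filter(frames, center_idx, ref_offsets=(-2, -1, 1, 2),
                sigma=8.0, motion_compensate=None):
    """Simplified temporal bilateral filter in the spirit of MCTF.

    frames:            list of luma planes as float arrays
    center_idx:        index of the frame being filtered
    ref_offsets:       temporal neighbors used as references
    motion_compensate: aligns a reference frame to the center frame;
                       left as None (identity) in this sketch
    """
    center = frames[center_idx].astype(np.float64)
    acc, wsum = center.copy(), np.ones_like(center)
    for off in ref_offsets:
        idx = center_idx + off
        if not (0 <= idx < len(frames)):
            continue
        ref = frames[idx].astype(np.float64)
        if motion_compensate is not None:
            ref = motion_compensate(ref, center)
        # Bilateral weight: references similar to the center pixel count more.
        w = np.exp(-((ref - center) ** 2) / (2 * sigma ** 2))
        acc += w * ref
        wsum += w
    return acc / wsum
```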

So let’s look at how scene switching and MCTF work together.

The figure above shows the encoder applying MCTF to the light-yellow, low-temporal-layer frames. Because MCTF performs motion search and compensation in the temporal domain, each light-yellow frame has corresponding light-gray frames serving as its MCTF references, while the light-blue frames are unrelated to MCTF. Because of this temporal referencing, MCTF must be modified when a scene change occurs.

Normally, frame 8 is an MCTF-filtered frame and the two frames on either side of it are its MCTF references. Now suppose a scene change occurs at frame 10. Frame 10 was originally an MCTF reference frame, but because of the scene change it becomes a new I-frame and its temporal layer drops accordingly. The MCTF-filtered frames and MCTF reference frames must then be adjusted, that is, the light-yellow and light-gray frames change. Comparing the top and bottom of the figure: after the scene change, frame 8's MCTF references become the three frames before it and the one frame after it, while frame 10 becomes an MCTF-filtered frame whose references are the four frames after it.
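The reference-adjustment logic just described can be sketched as a toy function. It assumes the scene-cut position is known from pre-analysis and reproduces the example above: frame 8 keeps four references by extending backwards, while frame 10, starting the new scene, takes its four references from the frames after it. The window size is illustrative.

```python
def mctf_references(filter_idx, num_frames, scene_cut_idx=None, radius=2):
    """Pick MCTF reference frames for filter_idx without crossing a scene cut.

    Normally the window is `radius` frames on each side. If a scene cut
    falls inside the window, references are clipped at the cut and the
    window is extended on the other side to keep the same reference count.
    """
    lo, hi = filter_idx - radius, filter_idx + radius
    if scene_cut_idx is not None:
        if filter_idx < scene_cut_idx <= hi:
            # Cut lies ahead: stop before it, extend backwards.
            deficit = hi - (scene_cut_idx - 1)
            hi = scene_cut_idx - 1
            lo -= deficit
        elif lo <= scene_cut_idx <= filter_idx:
            # Filter frame starts the new scene: stop at the cut, extend forwards.
            deficit = scene_cut_idx - lo
            lo = scene_cut_idx
            hi += deficit
    lo, hi = max(0, lo), min(num_frames - 1, hi)
    return [i for i in range(lo, hi + 1) if i != filter_idx]

# mctf_references(8, 100, scene_cut_idx=10)  -> [5, 6, 7, 9]
# mctf_references(10, 100, scene_cut_idx=10) -> [11, 12, 13, 14]
```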

Take a look at how scene switching and LMCS work together.

LMCS is a new coding tool in VVC that requires the encoder to compute parameters and transmit them in an APS. The "LM" part is Luma Mapping, which remaps the luma signal to make better use of its dynamic range; for example, 8-bit video has a 0-255 dynamic range and 10-bit video a 0-1023 range.

Since the luma signal is adjusted by LM, a CS step, Chroma Scaling, is needed to adjust the chroma signal in the same block and compensate for the effect the luma adjustment has on chroma.
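For intuition, here is a sketch of the luma-mapping idea: a piecewise-linear lookup table that gives more output codewords to heavily used luma ranges. Real LMCS signals 16 segments in an APS and derives codeword counts differently; the histogram-proportional allocation below is only an illustrative assumption.

```python
import numpy as np

def build_luma_mapping(hist, bit_depth=10, segments=16):
    """Build a piecewise-linear forward luma-mapping LUT.

    hist: luma histogram of the frame, one bin per segment.
    Segments where luma values are frequent receive more output
    codewords, stretching the heavily used part of the dynamic range.
    """
    max_val = (1 << bit_depth) - 1
    total = max_val + 1
    seg_in = total // segments                 # input width per segment
    weights = hist / hist.sum()
    cw = np.maximum(1, np.round(weights * total)).astype(int)
    lut = np.zeros(total, dtype=int)
    out = 0
    for s in range(segments):
        in_lo = s * seg_in
        slope = cw[s] / seg_in                 # output codewords per input step
        for v in range(in_lo, min(in_lo + seg_in, total)):
            lut[v] = min(max_val, int(out + slope * (v - in_lo)))
        out += cw[s]
    return lut
```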

How does this tool work with scene switching?

Using the earlier example: a scene change is detected at frame 10, which becomes a new I-frame. The dynamic range of the new scene may be completely different, so the encoder decides whether the LMCS parameters need updating at the new I-frame. After the corresponding GOP prediction structure changes, new frames become new low-temporal-layer frames; with GOP16, for example, frame 26 becomes a low-temporal-layer frame, and there we judge whether the motion is intense. If it is, the LMCS parameters are updated at that low-temporal-layer frame as well.
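The update decision just described boils down to a small predicate, sketched below. The motion measure and threshold are illustrative assumptions, not Ali266's actual heuristics.

```python
def should_update_lmcs(is_new_iframe, temporal_layer,
                       motion_intensity, motion_threshold=0.5):
    """Decide whether to recompute and signal LMCS parameters.

    is_new_iframe:    frame starts a new scene (scene change detected)
    temporal_layer:   frame's temporal layer; 0 means lowest
    motion_intensity: normalized motion measure from pre-analysis,
                      e.g. mean motion-vector magnitude (illustrative)
    """
    if is_new_iframe:
        # New scene: dynamic range may be completely different.
        return True
    if temporal_layer == 0 and motion_intensity > motion_threshold:
        # Intense motion at a low-temporal-layer frame also triggers an update.
        return True
    return False
```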

With this in place, what performance does the combined scene change + MCTF pre-processing + LMCS optimization achieve?

If a video is fairly long and contains at least one scene change, optimizing with LMCS alone saves 2% bandwidth; optimizing with MCTF alone saves 2.1%; optimizing all three together stacks the gains perfectly, for a 4.1% performance gain.

If scene changes are more frequent, at least two per video, the table shows further improvement: from 2.1% and 2.9% for the separate optimizations to a 5% gain when all three are optimized together.

With even more frequent scene changes, the dividend of the joint optimization rises further: 3.6% combined with LMCS, 3.2% combined with MCTF, and a 6.8% performance gain when all three are combined.

As anyone who builds encoders knows, 6.8% is quite impressive, and we obtain it through the joint optimization of pre-analysis, pre-processing, and a core coding tool.

So far I have focused on coding-quality (performance) optimization; next let's look at the second, very important dimension: how to optimize encoding speed.

Coding speed optimization

Let’s start with an example:

A very representative new tool in VVC is its flexible block-partitioning structure. The figure above shows VVC and HEVC partitioning the same scene, VVC on the left and HEVC on the right. With its more flexible block partitioning, VVC describes object contours much better.

Looking at the full picture first: in HEVC, every block is square, because HEVC supports only quad-tree partitioning.

VVC additionally allows more flexible binary-tree (BT) and ternary-tree (TT) splits in the horizontal and vertical directions; together BT and TT are called MTT (multi-type tree). Comparing the enlarged image on the left with the one on the right, VVC describes the finger more accurately by splitting it into rectangles.

While VVC's richer block partitioning describes object contours better, it also means the encoder must evaluate many more choices, so deciding how to accelerate the MTT partitioning decision is very important for encoding speed.

Here we use a gradient-based MTT acceleration idea: if a block's texture changes dramatically in the horizontal direction, it is unlikely to be split horizontally, and likewise for vertical splits.

Based on this observation, before making the partitioning decision for each block we compute four gradients: horizontal, vertical, and the two diagonals.

Taking the horizontal direction as an example: if the horizontal gradient is larger than the other three and exceeds a certain threshold, the current block's texture changes drastically in the horizontal direction, so the encoder skips the horizontal BT and TT decisions and saves encoding time.
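Here is a minimal sketch of this gradient-based pruning. The gradient definitions are standard finite differences, the dominance ratio is an illustrative threshold, and the direction naming follows the talk (a dominant horizontal gradient skips horizontal BT/TT).

```python
import numpy as np

def directional_gradients(block):
    """Mean absolute gradients of a luma block in four directions."""
    b = block.astype(np.float64)
    g_hor  = np.abs(b[:, 1:] - b[:, :-1]).mean()    # change along x
    g_ver  = np.abs(b[1:, :] - b[:-1, :]).mean()    # change along y
    g_d45  = np.abs(b[1:, 1:] - b[:-1, :-1]).mean() # diagonal
    g_d135 = np.abs(b[1:, :-1] - b[:-1, 1:]).mean() # anti-diagonal
    return g_hor, g_ver, g_d45, g_d135

def prune_mtt_splits(block, ratio=1.5):
    """Return the set of MTT split modes to skip for this block.

    If one direction's gradient dominates the other three by `ratio`
    (an illustrative threshold), skip BT/TT splits in that direction.
    """
    g_hor, g_ver, g_d45, g_d135 = directional_gradients(block)
    skip = set()
    if g_hor > ratio * max(g_ver, g_d45, g_d135):
        skip.update({"BT_HOR", "TT_HOR"})
    if g_ver > ratio * max(g_hor, g_d45, g_d135):
        skip.update({"BT_VER", "TT_VER"})
    return skip
```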

We can see that this acceleration technique improves absolute encoder speed (frame rate) by 14.8%, quite a significant percentage.

Of course, skipping some block-partitioning decisions costs some compression performance, but the loss is only 0.4%, so in terms of the overall speed-performance trade-off this is an excellent fast algorithm.

We have done much other optimization work that space does not permit detailing. Let me summarize the Ali266 encoder.

Ali266 currently supports two presets:

The Slow preset, mainly for offline applications, is benchmarked against x265's veryslow preset: at the same encoding speed as x265 veryslow, Ali266 Slow achieves 50% bit-rate savings over it. That is half the bandwidth.

Ali266 also supports a Fast preset, which is very important for commercialization. For commercial applications with strict real-time encoding requirements, it encodes 720p at 30 frames per second in real time, an industry-leading speed for VVC encoders, while still achieving 40% bit-rate savings, a very large bandwidth dividend.

On encoding speed we did not stop at 720p30; we continue to develop real-time encoding for 2K, 4K, and 8K ultra-HD video.

In fact, while preparing this talk, Ali266 reached real-time 2K encoding, that is, 1080p at 30 frames per second, boosting our confidence in taking on real-time UHD encoding.

Our main goal for the continued development of Ali266 is to maintain VVC’s performance advantage and accelerate the commercial launch of VVC.

After talking about encoders, I will talk about decoders, because as we mentioned earlier, one of the main goals of the development of Ali266 was to provide complete VVC codec capability.

From a commercial standpoint, the decoder has several design objectives. First, real-time decoding speed, or even faster than real time; second, the decoder must be very stable and robust; and then there is the idea of thin decoding, keeping the decoder relatively lightweight.

To achieve these design goals we optimized along four dimensions, the most important of which was to start from scratch.

Starting from scratch means we abandoned the architecture and data-structure designs of all existing open-source and reference platforms and built entirely new data structures and a new framework strictly from the VVC standard text. In the process we applied the familiar acceleration methods, including multithreading, assembly optimization, and memory and cache-efficiency optimization. Through these four dimensions we raised the performance of the Ali266 decoder.

The figure above lists the decoding performance of Ali266 from four dimensions.

On speed, we pay particular attention to low-end devices (VVC is meant to be universal). In tests on low-end phones we found that Ali266 needs only three threads to decode 720p in real time. The low thread count effectively reduces CPU usage and phone power consumption, a very favorable indicator for actual commercial use.

For stability, we tested on a range of iPhone and Android phones, covering the two major mobile operating systems and the full spread of high-, mid-, and low-end devices, to ensure stability.

For robustness, we attacked the Ali266 decoder with tens of thousands of corrupted streams, ensuring a complete and fast error-recovery mechanism for errors both above and below the slice level.

Finally, it was precisely because we started from scratch that we could deliver a satisfying answer on the thin decoder: the Ali266 decoder binary is under 1 MB and uses only 33 MB of memory to decode 720p HD.

Let me summarize the Ali266 decoder.

In current performance, Ali266's decoding speed, stability, robustness, and decoder footprint all meet the design targets and commercial requirements. Next, we plan to provide full support for VVC's Main profile, which mainly means full support for 10-bit decoding.

In addition, we will do our best to improve the player ecosystem and, together with the Ali266 encoder, accelerate VVC's commercial landing.

Since we have mentioned commercial landing many times before, let’s look at the business outlook for Ali266.

3. Business outlook of Ali266

First, let's look at the deployment prospects for the VVC standard over the next two to three years.

Like HEVC and H.264 before it, VVC is a universal standard, so it can cover a wide range of video applications, including video-on-demand, video conferencing, live streaming, IoT video surveillance, and other existing applications.

Many emerging video applications are also appearing, including panoramic video, AR, VR, and the recently popular metaverse; these too need video codecs as a technical base, so the VVC standard is universal for these emerging applications as well.

So let’s take a look at the application prospects of Ali266.

Let's start within Alibaba Group, where four areas are listed: Youku, DingTalk video conferencing, Alibaba Cloud's video cloud, and Taobao.

My personal view on advancing Ali266 adoption is to move from closed-loop applications to open applications. Why that logic?

Closed-loop businesses offer stronger end-to-end control, which lets us break ground while the ecosystem around a new standard is still immature. Youku and DingTalk video conferencing are perfect examples of closed-loop businesses.

After polishing Ali266 in a closed loop, exercising the whole pipeline from content production to playback, we will be more ready and mature for open applications. By the time we push large-scale open applications, VVC will have more comprehensive hardware support on mobile and edge devices, and that is when the compressive power of the VVC standard will really show at scale.

I just mentioned Youku. Here I would like to introduce Youku Frame Enjoy, an ultra-HD audiovisual experience created jointly by artists and scientists.

It relies on several key UHD technical metrics: high frame rates of 60 to 120 frames per second; spatial resolutions from 4K to 8K; and, for dynamic range, full support for HDR's high dynamic contrast and wide color gamut. And since pictures must have sound, Youku Frame Enjoy also includes 3D surround sound support.

Another very novel Youku application is Youku Free-Viewpoint Video (FVV). FVV gives users a compelling feature: because the transmitted video format is panoramic, users can swipe on the screen to choose their own viewing angle, freely picking what to watch from different angles. Youku's free viewpoint is supported in major CBA games and large variety shows such as "This is Hip-hop."

Now let's see what value Ali266 can bring to Youku, and how it can help improve Frame Enjoy's resolution, frame rate, and dynamic range.

The bandwidth dividend of the VVC standard exceeds 50% for HDR video, giving excellent technical support for Frame Enjoy's 8K 120fps HDR ultra-HD experience.

For panoramic video and free viewpoint, VVC natively supports 360-degree panoramic video, which improves subjective quality and helps Youku incubate new business in this area.

In addition, although not mentioned earlier, VVC, like HEVC, has a still-picture profile, so it can also save bandwidth and storage for static images. Youku's thumbnails and cover images can therefore take full advantage of Ali266's compression power as well. Our team is already cooperating deeply with Youku, and we hope to report results of Ali266's deployment in Youku in the near future.

Now that we’ve talked about what’s happened in the past year, let’s take a look at the opportunities and challenges we see in the video industry in the post-VVC era.

4. Opportunities and challenges in the post-VVC era

This is divided into two parts, technology and application. From a technical point of view, each generation of standards primarily seeks higher compression rates, so VVC is not the end of the road.

The exploration of higher compression includes both the traditional codec framework and AI-based video coding frameworks and tool sets. On the application side, we will briefly look at how emerging applications, AR, VR, MR, cloud gaming, and the metaverse, present opportunities and challenges in the post-VVC era.

Higher compression: the framework debate

In the quest for higher compression, the question is whether the framework used by video coding standards for generations will carry into the next generation.

On the left is the hand-designed video codec framework used across several generations of video standards, with its familiar functional modules: block partitioning, inter-frame coding, loop filtering, and so on.

On the right is the new learning-based framework, learned entirely by AI methods: both encoder and decoder are realized as full neural networks.

Within the traditional framework, the JVET standards committee recently established the ECM (Enhanced Compression Model) reference platform to explore next-generation coding technologies.

The current ECM version is 2.0. The table compares the compression performance of ECM2.0 against VTM-11.0: ECM2.0 achieves a 14.8% gain on the luma signal and even higher gains on chroma. Encoder and decoder complexity have also risen, but at this stage the focus is on pushing compression; complexity is not yet the most important dimension.

ECM stays within the traditional framework, and most of its tools were already seen during VVC's development; after further algorithm iteration and polishing, they deliver the 14.8% performance gain.

The state of AI coding is divided into two parts: end-to-end AI, and toolset AI.

As the example diagram shows, end-to-end AI abandons the traditional framework entirely in favor of a new one.

In terms of today's end-to-end AI capability, single-image coding performance can slightly surpass VVC. But for real video coding, once the temporal dimension is included, end-to-end AI performance is still only close to HEVC, leaving room for improvement.

AI can also be applied as toolset AI: without changing the traditional framework, AI coding tools are developed for certain functional modules, replacing or stacking on top of existing traditional tools to improve performance.

Most examples here are intra-frame coding and in-loop filtering tools. As far as we know today, NNLF, an in-loop filtering technique based on multiple neural-network models, can achieve about a 10% performance gain over VVC.

AI video coding has its own challenges, along three dimensions.

The first challenge is computational complexity, because today performance gains mostly come from sheer parameter count. A recent Google paper gives quantitative guidance: if an AI tool provides a single-digit performance gain, its parameter count should be on the order of 50K. Many of today's AI tools have 500K to 1M parameters, an order of magnitude above that target, and need to be slimmed down. Computational complexity also includes the need for fixed-point parameters and computation, especially the multiplication count.

The second challenge is data-interaction volume. Tool-level AI in particular may exchange many pixel-level results with the other functional modules of a traditional encoder; whether that happens at frame level or block level, it strains codec throughput. The better-performing tools seen today all rely on multiple neural-network models and therefore on model switching; when model parameter counts are large, the data traffic generated by switching models also challenges throughput.

The third challenge is decoding on mobile devices, where most people watch video. In my view, pairing a decoder with an external NPU on a phone is not feasible, because of the data-interaction volume just mentioned.

The same Google paper states that the cost of a traditional decoder is roughly that of implementing a 2M-parameter MobileNet model. MobileNet is a relatively lightweight network, so if an NNLF filter alone requires 1M parameters, it already costs half a decoder. Much more work is needed to bring the cost down. In short, the main challenge of AI coding is reaching a reasonable cost-performance ratio, which will take large R&D investment across the industry; whether AI video coding's potential can be fully realized at reasonable cost remains to be seen.

I’d like to end with a personal observation.

One reason AI coding faces this cost-performance challenge is that AI is inherently data-driven: a data-driven design is easier in a specific scenario, while a major general-purpose technology must work in general scenarios, which is far more demanding.

So I think it is worth watching AI coding in specific scenarios, where technical and business breakthroughs may come sooner. You may have noticed that Facebook and Nvidia have recently demonstrated end-to-end AI coding for face video. In that particular scenario, at ultra-low bit rates, AI coding restores face sharpness far better than traditional methods, which shows AI coding's potential.

Emerging applications

Finally, three examples of emerging applications: AR/VR/MR, cloud gaming, and the metaverse. The first two are arguably part of the metaverse, so let's look at the metaverse.

Let's first look at what the metaverse is.

When the term "metaverse" came up recently, I wasn't sure what it meant myself, so I looked it up. This is from a New York Times article that defines the metaverse as a hybrid of virtual experiences, environments, and assets.

Here are five examples of what embodies the metaverse, read counterclockwise from the top. If your favorite game lets you build your own world and interact with others, that is the metaverse. If you recently attended a meeting or party as a digital avatar instead of in person, that too is the metaverse. Wearing a helmet or glasses to experience a virtual AR/VR environment is the metaverse. Holding virtual assets such as NFTs or cryptocurrency is the metaverse. Finally, and I find this interesting, the New York Times counts most social networks as the metaverse too: the online you and the offline you are not exactly the same, and the online you may carry some virtual elements.

From a video-technology perspective, supporting the metaverse and the various AR/VR experiences has several common requirements: low latency, high concurrency, and personalization.

The first two resemble the requirements of existing applications; live streaming, for example, already demands low latency and high concurrency. But the third, personalization, demands completely new technical support.

In these virtual scenarios, every user pursues their own experience and personalized choices. From the perspective of Alibaba Cloud intelligence, personalization places further, higher demands on cloud computing. Today a single live broadcast we support reaches tens of thousands or even millions of concurrent viewers, serving many customers at once.

However, if every customer has their own personalized requirements, each delivery may serve only a dozen or a few dozen customers with similar needs. This places much higher demands on the quality and throughput of cloud video processing, requiring an order-of-magnitude improvement in processing capability.

Therefore, I believe hardware customization in the cloud is an inevitable technical trend for video processing and delivery.

5. Summary

Finally, let me conclude today's talk.

First, we introduced Ali266, the VVC codec developed by Alibaba Cloud. Ali266 provides complete encoding and decoding capability for VVC, the latest video standard, at real-time HD speed; our fastest encoding currently reaches 1080p at 30 frames per second.

Ali266's compression performance is excellent, achieving 50% bandwidth savings in the Slow preset and 40% in the real-time Fast preset, so it can cover business needs from quality-first to speed-first. We are also very happy to report our deep cooperation with Youku: we hope to land Ali266 in Youku, helping it cut costs, improve quality, and gain technical support for new business.

Looking ahead, the next generation of codec standards must still deliver a better compression rate, but the choice of framework is still being explored and is undecided today. Within the traditional framework, ECM already gains about 15% over VVC, but that is still far from the 40% to 50% a new generation requires. AI coding shows good performance potential, but its cost-performance ratio still falls short and needs major progress.

From the application perspective, the metaverse will bring richer virtual experiences and support the growth of many new applications. For the metaverse to become reality, cloud computing must deliver high-quality, high-throughput personalized computing capabilities as soon as possible to meet the challenges of these emerging applications.

Finally, although not covered earlier, virtual-world experiences need to become friendlier, meaning lighter and more inclusive AR/VR devices.

That is the end of this talk. Thank you very much, and thanks to the organizer LVS for the opportunity to share. Because of the epidemic, I regret that I cannot exchange ideas with you face to face. If you have questions or want to discuss anything I shared further, please leave a message on the official account.
