Understanding the characteristics of different codec standards helps us build audio and video applications that take advantage of their features, which can effectively improve application performance and user experience. At the RTC 2019 Real-time Internet Conference, Ma Siwei, a professor at Peking University, shared technical details of the new-generation AVS3 video coding standard, as well as the progress of its development. This article briefly summarizes the lecture; a video replay and the presentation slides are available at the end of the article.

For H.266, the new generation of video codec standards, both governments and enterprises have a high level of participation, including Huawei, HiSilicon, and DJI, as well as Internet companies such as Alibaba, Tencent, Kuaishou, and ByteDance. Chinese companies already account for more than half of the seats in the international standards-setting organizations. At the same time, these companies are also participating in the formulation of the domestic codec standard, AVS. A technical standard must serve different industries, take a variety of technical factors into account, and play different roles in different application scenarios. Let's look at the progress of the AVS3 standard in detail.

AVS3 mainly targets 8K ultra-HD and VR applications, with coding efficiency twice that of AVS2. The technical advances fall into two areas: on one hand, traditional signal-processing techniques such as block partitioning and motion prediction; on the other, optimization and exploration with intelligent algorithms, such as using neural networks to optimize predictive coding of the signal, an area that has already shown performance gains.

Around the end of 2017, the formal development plan for AVS3 started. The first call for proposals was issued in March 2018; companies including Huawei and Samsung, as well as several research institutes, were actively involved. AVS3 is not just for Chinese companies: it is open, and anyone can join AVS as long as they follow its intellectual-property management policy. By January 2019, AVS3 had basically completed the technology-development stage; in March, several rounds of conformance tests were conducted; and in September, Huawei HiSilicon released the first AVS3 8K video decoding chip. From technology to standard to chip, AVS3 took only a year and a half.

The AVS3 plan is divided into two phases: the first phase selects tools that strike a compromise between coding complexity and performance, and the second phase further improves coding efficiency. The next milestone is the 2022 Winter Olympics in Beijing. That covers the background and planning of AVS3.

AVS3 key technologies

Before introducing these key technologies, we need to cover a basic idea: why are there so many of them? The mechanisms behind them can be summarized in two aspects.

The first is the efficient representation of complex motion in video content. Video content varies widely, and how well coding works depends on how the relationship between the codec and the content is modeled. Block translation is a simple example: once rotation and scaling appear, a pure translation model can no longer handle them, so more efficient and accurate motion prediction based on an affine motion model is needed. The size of the block also affects prediction efficiency when representing motion. So the first aspect is how to represent complex motion in order to make more accurate predictions.
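To make the affine idea concrete, here is a minimal floating-point sketch of the common 4-parameter affine motion model, where the motion vector at any position inside a block is derived from two control-point motion vectors at the top corners. The function name and the use of floats are illustrative assumptions; real codecs such as AVS3 work in fixed-point precision and derive MVs per sub-block.

```python
def affine_mv(mv0, mv1, w, x, y):
    """Derive the motion vector at position (x, y) inside a block of
    width w from two control-point MVs (4-parameter affine model).
    mv0 is the MV at the top-left corner, mv1 at the top-right.
    Floating-point sketch; real codecs use fixed-point arithmetic."""
    a = (mv1[0] - mv0[0]) / w   # shared scale/rotation parameters
    b = (mv1[1] - mv0[1]) / w
    mvx = a * x - b * y + mv0[0]
    mvy = b * x + a * y + mv0[1]
    return (mvx, mvy)

# Pure translation: both control points equal, so every position
# inside the block gets the same motion vector.
print(affine_mv((2.0, 1.0), (2.0, 1.0), 16, 8, 8))   # -> (2.0, 1.0)

# Rotation/zoom: control points differ, so the derived MV varies
# across positions in the block.
print(affine_mv((0.0, 0.0), (4.0, 0.0), 16, 8, 8))   # -> (2.0, 2.0)
```

With identical control points the model degenerates to plain block translation, which is exactly why it generalizes the older scheme.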

The second is what happens after prediction: the coding steps, including transformation, quantization, and so on. These steps are also closely tied to the characteristics of the video content. Some smooth blocks are easy to encode, while other blocks contain edges or complex textures, so the transform, and the processing that follows it, must be adaptive to improve coding efficiency. In the same way, the subsequent filtering should be more adaptive to achieve better coding efficiency. To sum up, we need more efficient prediction on the one hand, and more efficient transform processing on the other.
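The smooth-block vs. edge-block point can be demonstrated with a small experiment: apply a textbook DCT-II to a smooth ramp and to a step edge and count how many coefficients carry significant energy. The threshold and helper names here are invented for illustration, not part of any codec.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (textbook definition)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def significant(coeffs, thresh=0.5):
    """Count coefficients with non-negligible magnitude."""
    return sum(1 for c in coeffs if abs(c) > thresh)

smooth = [float(i) for i in range(8)]   # ramp: highly correlated samples
edge   = [0.0] * 4 + [10.0] * 4         # step edge in the middle

print(significant(dct2(smooth)), significant(dct2(edge)))   # -> 3 5
```

The ramp's energy compacts into a few low-frequency coefficients, while the edge spreads energy across many frequencies, which is exactly why a fixed DCT loses efficiency on edge-heavy blocks and why adaptive transform selection helps.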

The above is a simple comparison of block partitioning, motion vector prediction, motion vector accuracy, and transforms, from the earlier MPEG-2 to today's AVS3. Motion vector prediction was first extended into the temporal domain, and then further, for example to history-based motion vector prediction. Of course, many of these models and processes depend on computing resources, and advances in computing power now allow more complex coding techniques. Early codecs used the DCT: when signal correlation is high, coding efficiency is high, but when an image contains edges, transform efficiency drops. Professor Ma Siwei then shared the technical details of AVS3 in greater depth, covering:

  • Block partition structure (QTBT+EQT)

  • Block partition constraint

  • Coded data organization

  • Intra prediction filtering (IPF)

  • Two-step chroma intra prediction

  • Affine motion prediction

  • Motion vector prediction

  • Adaptive motion vector accuracy

  • Motion vector extension

  • Motion vector refinement

  • Adaptive transform kernel selection
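As a rough illustration of the QTBT+EQT partition families mentioned above, the sketch below generates the sub-block geometries for a quad split, the two binary splits, and one horizontal extended-quad shape. This is a geometry helper only, with invented names; it does not model how an encoder chooses among splits (which is done by rate-distortion optimization), and the exact EQT shapes should be checked against the AVS3 specification.

```python
def split(x, y, w, h, mode):
    """Return the (x, y, w, h) sub-blocks produced by one split of a
    block, for split types in the QT/BT/EQT families (illustrative)."""
    if mode == "QT":        # quad-tree: four equal quadrants
        return [(x, y, w // 2, h // 2), (x + w // 2, y, w // 2, h // 2),
                (x, y + h // 2, w // 2, h // 2),
                (x + w // 2, y + h // 2, w // 2, h // 2)]
    if mode == "BT_H":      # binary, horizontal: top / bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":      # binary, vertical: left / right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "EQT_H":     # extended quad, horizontal: a quarter-height
        # strip on top and bottom, two half-size blocks in the middle
        return [(x, y, w, h // 4),
                (x, y + h // 4, w // 2, h // 2),
                (x + w // 2, y + h // 4, w // 2, h // 2),
                (x, y + 3 * h // 4, w, h // 4)]
    raise ValueError(mode)

for sub in split(0, 0, 64, 64, "EQT_H"):
    print(sub)
```

Every split partitions the parent block exactly, so the sub-block areas always sum to the parent's area; the non-square EQT shapes let the tree follow elongated textures that a plain quad-tree would over-split.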

For more details, click here to jump to a replay of the video. You can also download the PowerPoint slides and find more reviews of the conference talks here.