AV1 is the first-generation video coding standard developed by the Alliance for Open Media (AOM), and it has received great attention and support from the industry since its launch. Tencent Multimedia Lab has joined the alliance and works with other member companies to actively promote the optimization and deployment of AV1 encoders, so as to provide customers with high-performance, efficient cloud encoding services. This article is organized from a talk given by Zhao Xin, an expert researcher at Tencent Multimedia Lab, at “Yunjia Community Salon Online”, and is shared here in the hope of exchanging ideas with readers.

First, the development of video coding

There are four mainstream standards organizations in the field of video coding:

1. Moving Picture Experts Group (MPEG)

MPEG is a working group under ISO and IEC. It was founded in 1988 by Hiroshi Yasuda (NTT) and Leonardo Chiariglione. Its members come mainly from industry, universities and research institutions.

2. Video Coding Experts Group (VCEG)

VCEG is affiliated with the International Telecommunication Union (ITU), which is headquartered in Geneva, Switzerland. It was founded in 1984 and held its first meeting in Tokyo, Japan. Its members come mainly from industry.

3. Audio Video Coding Standard Workgroup of China (AVS)

AVS is China's own standards organization. It was founded in 2002 with the approval of the Science and Technology Department of the former Ministry of Information Industry of the People’s Republic of China. Its membership consists of 92 universities and companies, mainly in China. Its first meeting was held in Beijing in 2002. In recent years it has gradually gained international attention, and foreign enterprises have joined.

4. Alliance for Open Media (AOMedia)

AOMedia was established in 2015. It has 44 member companies, 14 of which sit on the board of directors. A large number of its members are Internet companies in the Bay Area of the United States. Driven by the Multimedia Lab, Tencent joined AOMedia as a board member in 2019. It is the only Chinese company on the board so far. This is also one of Tencent’s milestones on the international video standards stage.

The standards released by these organizations fall into three groups. The first is a special case: MPEG under ISO/IEC and VCEG under ITU are closely intertwined.

Both standards bodies were founded in the 1980s and initially launched their own set of standards. As the call for a unified standard grew, they jointly launched standards such as MPEG-2 and H.264/AVC, which also promoted the development of the streaming media industry.

The second group is AVS, which has so far released three standards: AVS1, AVS2 and AVS3. They are standards with China’s independent intellectual property rights and a source of pride for China in the field of video standards.

The third group is AOMedia. Because it is relatively young, AV1 is the only standard it has released so far. The predecessors of AV1 are VP8 and VP9, Google’s own formats, which are used mainly in streaming media services.

As for the VVC standard, Tencent began investing in its development at the end of 2017. Over more than two years of effort, several members of the Multimedia Lab served as co-editor of the VVC standard, co-chair of the VVC reference software, coordinators of a number of core experiments, and chairs of a number of expert groups. Tencent has played an important role in the formulation of the VVC standard.

Tencent Multimedia Lab began pushing for participation in AOMedia in 2019, and Tencent joined as a board member in October of the same year.

The following figure shows the distribution of patent pools for the international mainstream video standards. HEVC, for example, has more than 17,000 patents. There are three main patent pools, namely HEVC Advance, MPEG LA and Velos Media.

In the HEVC era, the patent pool situation became large and complicated. Some patent holders have invested in the technology but take a more ambiguous position, staying outside the three patent pools. Because paying patent fees to all three pools is very expensive, streaming media products face some risk when they go overseas.

It was this complex situation that led to the birth of AOMedia. AOMedia’s main goal is to develop a royalty-free video coding standard that can be used free of charge by all enterprises that join AOMedia.

Second, AV1, the new generation video standard

1. AV1 coding technology

First, let’s introduce the coding technology of AV1. AV1, a next-generation video compression standard finalized in 2018, uses the so-called hybrid coding framework.

The AV1 coding system is composed of many modules working together. Each module removes a different kind of data redundancy in the image, using a different approach. The modules are combined and complement each other to achieve relatively high compression performance. This is what the hybrid coding framework means.

The basic process of the hybrid coding framework is as follows. Given an input image, the encoder divides it into blocks, predicts each block, transforms the prediction residual, and then quantizes and entropy codes the transform coefficients to form the compressed data. Over the past few decades, codecs have consistently been built on this hybrid coding framework.
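
The predict, transform and quantize steps can be sketched on a single 1-D block of samples. This is only a toy illustration under stated assumptions: the function names, the floating-point DCT and the uniform quantizer are hypothetical choices, not AV1's actual integer transforms or quantizer, and entropy coding is left out.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a short 1-D list (floating point, for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def encode_block(block, prediction, qstep):
    """Predict -> residual -> transform -> quantize. Returns integer levels."""
    residual = [b - p for b, p in zip(block, prediction)]
    coeffs = dct(residual)
    # These integer levels are what an entropy coder would then compress.
    return [round(c / qstep) for c in coeffs]

def decode_block(levels, prediction, qstep):
    """Dequantize -> inverse transform -> add the prediction back."""
    residual = idct([level * qstep for level in levels])
    return [round(p + r) for p, r in zip(prediction, residual)]

block = [100, 102, 104, 106]
prediction = [101, 101, 101, 101]   # e.g. a flat prediction from neighbors
levels = encode_block(block, prediction, qstep=4)
reconstructed = decode_block(levels, prediction, qstep=4)
```

A larger `qstep` gives smaller levels (fewer bits) at the cost of a larger reconstruction error, which is the basic rate and quality trade-off of the framework.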

(1) Block division

AV1 block partitioning divides the image into multiple rectangular blocks and then codes the image block by block. In AV1, images are divided into 128×128 units, also known as largest coding units, or LCU for short (AV1 itself calls them superblocks). An LCU can be divided into four equal parts (SPLIT) or two equal parts (HORZ, VERT). The four equal sub-blocks can be divided again recursively, and each sub-block can be divided into smaller units in up to nine ways.

So many partition modes are needed because image content is complex and diverse, and the image has to be partitioned accordingly in order to code that content as efficiently as possible.
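
The recursive four-way split can be sketched as follows. This is a minimal sketch, not AV1's partition search: it models only "no split" and SPLIT, and the variance threshold used to decide whether to split is a hypothetical stand-in for the rate-distortion decision a real encoder makes. The function name and parameters are assumptions for illustration.

```python
def partition(image, x, y, size, min_size, threshold):
    """Return the leaf blocks of a quadtree as (x, y, size) tuples.

    A block is split into four equal sub-blocks while its pixel variance
    exceeds `threshold` and it is larger than `min_size`.
    """
    pixels = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if size <= min_size or variance <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(image, x + dx, y + dy, half, min_size, threshold)
    return leaves

# Toy 8x8 image (AV1 would start from 128x128): flat except for one bright pixel.
image = [[0] * 8 for _ in range(8)]
image[0][0] = 255
leaves = partition(image, 0, 0, 8, min_size=2, threshold=1.0)
```

Only the quadrant containing detail is split further; the flat quadrants stay as large blocks, which is the point of adaptive partitioning.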

Generally, an object has multiple components, and it is usually necessary to divide it into multiple parts, and each part adopts different prediction modes, so as to carry out targeted prediction.

(2) Intra-frame prediction

Now let’s talk about the prediction part. Intra-frame prediction removes spatial redundancy within an image, where a pixel is strongly correlated with the pixels around it. For example, a white wall has a single color, and each pixel is very close in value to its neighbors, which leads to strong data redundancy. Intra-frame prediction removes this spatial redundancy by technical means.

The main methods fall into the following five groups:

Directional prediction modes
Recursive filtering mode
Paeth predictor
Cross-component prediction mode
DC prediction mode and smooth prediction modes

Directional prediction assumes that the image has a directional texture, so the block can be predicted well along that direction.

Recursive filtering mode divides the block into smaller units. Each unit is predicted by applying a filter to its surrounding pixels, that is, as a linear weighted combination of them. In this mode, the filtering has to be carried out sequentially, unit by unit.

The Paeth predictor assumes that the image is locally planar. In addition, there is a cross-component prediction mode, which is mainly for color images. A color image has three components, which are strongly correlated with each other.

Finally, there are the DC prediction mode and the smooth prediction modes. These are mainly used to predict smooth textures.
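
Two of these predictors are simple enough to sketch directly. The code below uses the common Paeth formulation (the one also used by the PNG format): estimate the pixel from a locally planar model, then pick whichever neighbor is closest to that estimate. The function names, the tie-breaking order and the rounding in the DC average are assumptions for illustration rather than a transcription of the AV1 specification.

```python
def paeth(left, top, top_left):
    """Pick the neighbor closest to the planar estimate left + top - top_left."""
    base = left + top - top_left
    d_left = abs(base - left)
    d_top = abs(base - top)
    d_top_left = abs(base - top_left)
    if d_left <= d_top and d_left <= d_top_left:
        return left
    if d_top <= d_top_left:
        return top
    return top_left

def dc_predict(left_col, top_row):
    """Predict a whole block as the rounded average of its neighboring pixels."""
    neighbors = list(left_col) + list(top_row)
    return (sum(neighbors) + len(neighbors) // 2) // len(neighbors)

# A vertical edge: the pixel above is the better predictor here.
edge_prediction = paeth(left=10, top=20, top_left=10)
flat_prediction = dc_predict([10, 10], [20, 20])
```

The encoder then codes only the difference between the actual pixels and the prediction, which is small when the assumed model fits the local texture.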

(3) Inter-frame prediction

Inter-frame prediction removes temporal redundancy between images. A video is a series of images played in sequence. Why does sequential playback form a video? Because consecutive images show the same objects and content in space, with only some differences caused by motion, so their data are very strongly correlated.

To exploit this correlation, AV1 introduces an affine motion model to describe more complex motion such as rotation and scaling. Other tools include overlapped block motion compensation and compound (blended) prediction modes.
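
The difference between a plain translational motion vector and an affine model can be shown with a generic six-parameter affine mapping. This is a sketch of the mathematical model only; the function names and the floating-point parameterization are assumptions, and AV1's actual warped-motion parameters use fixed-point arithmetic and its own constraints.

```python
import math

def affine_params(angle_deg, scale, tx, ty):
    """Build (a, b, c, d, e, f) for x' = a*x + b*y + c, y' = d*x + e*y + f."""
    t = math.radians(angle_deg)
    return (scale * math.cos(t), -scale * math.sin(t), tx,
            scale * math.sin(t), scale * math.cos(t), ty)

def warp(params, x, y):
    """Map a pixel position in the current block to the reference frame."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)

# Pure translation: every pixel moves by the same vector (3, -2).
translated = warp(affine_params(0, 1.0, 3, -2), 5, 7)
# Rotation by 90 degrees with 2x zoom: the displacement depends on the position.
rotated = warp(affine_params(90, 2.0, 0, 0), 1, 0)
```

With translation only, one motion vector serves the whole block; with an affine model, each pixel gets its own displacement from the same six parameters.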

(4) Transformation

The extended transform types include DCT, ADST, IDT (identity) and Flip-ADST. AV1 supports up to 16 combinations of row and column transforms.
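
The figure of 16 follows from choosing one of the four 1-D transform types independently for the rows and for the columns. A small enumeration makes this explicit; the string labels are illustrative and not the identifiers used in the AV1 specification.

```python
from itertools import product

# The four 1-D transform types named in the text (labels are illustrative).
KERNELS_1D = ("DCT", "ADST", "FLIPADST", "IDT")

# A separable 2-D transform picks one kernel for columns and one for rows.
combinations = list(product(KERNELS_1D, repeat=2))

for column_kernel, row_kernel in combinations:
    print(f"columns: {column_kernel:8s} rows: {row_kernel}")
```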

(5) Entropy coding

The main new technology in entropy coding is a multi-symbol, context-adaptive arithmetic coding engine. Compared with a binary arithmetic coding engine, it can improve the entropy coding throughput per cycle.
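
The idea of an adaptive multi-symbol model can be sketched with simple frequency counts. This is not AV1's actual probability update rule, only a hypothetical counting model: it shows that one multi-symbol decision replaces several binary ones (a symbol from a 4-symbol alphabet would need 2 binary decisions) and that the estimated cost of a symbol drops as the model adapts to it.

```python
import math

class AdaptiveModel:
    """Toy adaptive probability model over a small alphabet (illustrative only)."""

    def __init__(self, num_symbols):
        # Start from a uniform distribution: every symbol seen once.
        self.counts = [1] * num_symbols

    def cost_bits(self, symbol):
        """Ideal arithmetic-coding cost of `symbol` under the current model."""
        return -math.log2(self.counts[symbol] / sum(self.counts))

    def update(self, symbol):
        """Adapt the model after coding `symbol`."""
        self.counts[symbol] += 1

model = AdaptiveModel(4)
cost_before = model.cost_bits(0)      # uniform over 4 symbols
for _ in range(4):
    model.update(0)                   # symbol 0 keeps occurring
cost_after = model.cost_bits(0)       # now cheaper to code
```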

(6) In-loop filtering

In-loop filtering includes the deblocking filter, the constrained directional enhancement filter (CDEF) and the loop restoration filter. The loop restoration filter includes a Wiener filter and a self-guided projection filter.

(7) Palette mode

Palette mode targets screen content video. In such content the luma/chroma values are sparse, so a block can be coded as a small palette of colors plus an index for each pixel.
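
A minimal sketch of the idea, with hypothetical function names: a block with only a few distinct values is represented as a short palette and a map of small indices. Real palette coding also limits the palette size and entropy codes the indices, which is omitted here.

```python
def palette_encode(block):
    """Represent a block as (palette, index map)."""
    palette = sorted({pixel for row in block for pixel in row})
    index_of = {color: i for i, color in enumerate(palette)}
    indices = [[index_of[pixel] for pixel in row] for row in block]
    return palette, indices

def palette_decode(palette, indices):
    """Rebuild the block by looking each index up in the palette."""
    return [[palette[i] for i in row] for row in indices]

# Screen-content-like block: text (value 16) on a background (value 235).
block = [
    [235, 235, 16, 235],
    [235, 16, 16, 235],
    [235, 235, 16, 235],
    [235, 16, 16, 16],
]
palette, indices = palette_encode(block)
```

Here each pixel needs only a one-bit index instead of a full sample value, and the reconstruction is exact.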

(8) Intra-frame block copy (block matching)

The Tencent logo contains two N’s and two E’s, and the image itself is complex. Once one N has been encoded, the other N can be predicted from it simply by adding a vector pointing back to it, which improves compression.
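
The "add a vector" idea can be sketched as copying an already coded region of the same frame. The function name and the list-of-rows frame layout are assumptions for illustration; a real codec also restricts which areas the vector may point to.

```python
def ibc_predict(frame, x, y, width, height, block_vector):
    """Predict the block at (x, y) by copying the block displaced by block_vector."""
    bx, by = block_vector
    return [row[x + bx : x + bx + width]
            for row in frame[y + by : y + by + height]]

# The same 4x2 pattern appears twice in the frame, like a repeated letter.
frame = [
    [1, 2, 3, 4, 1, 2, 3, 4],
    [5, 6, 7, 8, 5, 6, 7, 8],
]
# Predict the right-hand copy from the left-hand one: vector (-4, 0).
prediction = ibc_predict(frame, x=4, y=0, width=4, height=2, block_vector=(-4, 0))
actual = [row[4:8] for row in frame]
residual = [[a - p for a, p in zip(ar, pr)] for ar, pr in zip(actual, prediction)]
```

When the match is exact, the residual is all zeros and only the vector needs to be transmitted.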

2. Application scenarios of AV1 coding

An important application scenario of AV1 is streaming media. The Alliance for Open Media includes many streaming companies, such as Google, YouTube, Netflix, Hulu and iQIYI. YouTube currently uses a combination of AV1 and VP9 encoding for HD videos, and 8K videos encoded with AV1 have been released this year. Netflix has also supported AV1 streaming on Android since February 2020.

Tencent Multimedia Lab is actively developing commercial AV1 codec products. Last year, Tencent Video Cloud cooperated with the Multimedia Lab to launch AV1 live broadcast and VOD services, making Tencent Video Cloud the first public cloud vendor in China to support AV1 in both live broadcast and VOD video processing. In addition, the Multimedia Lab collaborates with other codec teams within Tencent to promote the commercialization of the AV1 codec in different businesses, and is working with Tencent Video to promote the use of AV1 in its products.

In terms of cloud transcoding, AWS Elemental MediaConvert announced support for AV1 encoding in March 2020.

Third, the AV1 standard and cloud coding

In cloud coding, AV1 has the following advantages:

The AV1 open-source community provides rich encoder configurations to address different business needs, such as real-time and non-real-time encoding
Support for temporal scalability
Support for frame-level super-resolution coding
Royalty-free, which supports taking products to overseas markets

Within Tencent, the Multimedia Lab cooperates with Tencent Cloud and Tencent Video, and together with Tencent Cloud it is actively promoting commercial applications of the AV1 video standard. Promoted by the Multimedia Lab, Tencent has joined the upcoming SVT Foundation as a board member to support AV1 software encoding in the open-source community.

Here is a brief introduction to the next generation of video coding standards.

The first is the Versatile Video Coding (VVC) standard, developed by JVET, the joint working group of ITU-T SG 16 WP 3 (VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (MPEG). Work officially started at the San Diego meeting in April 2018, and the standard document was finalized in July 2020.

Compared with the previous-generation HEVC standard, it can save 35% of the bit rate at the same PSNR quality, while the reference software takes 10 times as long to encode and 2 times as long to decode.

During the VVC standardization process, which lasted more than two years, Tencent Multimedia Lab had nearly 100 technical proposals adopted, filling a gap for Tencent in the field of international video standardization. Several members of Tencent Multimedia Lab held important positions in the VVC standardization process, including standard co-editor, reference software co-chair, several core experiment leaders, and several expert group chairs.

In addition to VVC, there is the AOMedia Video 2 (AV2) standard.

AOMedia began preparing the next-generation standard, AV2, in 2019, and a reference software platform for it is expected to be launched in the near future. Tencent Multimedia Lab and Google jointly organized technical discussions and established the Coding Technology Incubator Group. So far, Tencent Multimedia Lab has put forward an initial three coding technologies, and the related work has been published at ICIP 2020.

Fourth, Q&A

Q: Why are there so many coding standards?

A: It has to do with the history of video coding. In the beginning there were two standards organizations, MPEG and VCEG. They each developed their own standards, which caused some confusion in the industry, so the two bodies joined forces and pooled their resources to develop a common set of standards.

Those standards were very successful and had a huge impact on the industry. The pie grew bigger and bigger, more and more companies paid attention, and the patent pools grew rapidly. Later, in order to deal with this complex situation and promote the iteration of technology, other standards organizations emerged, including AVS and AOMedia, mainly in response to the high patent royalties.

Q: Will AV1 lead the next generation of video coding?

A: Personally, I think the next generation of video coding will be a contest among several standards. As we know, the international video coding standards have been developed over decades, with a very mature standardization process, a large number of participants and strong technical capability, so they have accumulated deep experience in refining standards.

AOM is a relatively young standards organization, and its technical input is currently rather concentrated. So far, Tencent and Google have invested more in technology for the next-generation AV2 standard than other AOM members. Although AOM is relatively young, with the input of Tencent Multimedia Lab a number of companies are now working together on the technology, hoping to achieve a greater breakthrough in the research and development of the next-generation AV2 standard.

I think the biggest advantage of AOM is that it is royalty-free. Those who need something more sophisticated can consider the standards of the international organizations; each user can take what they need. I also hope that in the future the standards bodies will converge to some extent, seeking common ground while reserving differences, and concentrating on what benefits the industry as a whole.

Q: Is royalty free permanent?

A: When the Alliance for Open Media was founded, the goal was to be royalty-free, but this is not easy and there will be some challenges. We are aware that other companies that are not AOM members also claim to hold patents essential to these standards.

The companies in the Alliance for Open Media have a huge influence on the industry, and keeping the standards free of patent royalties is one of our objectives and fundamental principles.

Q: What are the advantages of AV1 compared with others?

A: Mainly in terms of performance. AV1 belongs to a newer generation of standards than HEVC and VP9. As far as I know, AV1 improves compression efficiency by more than 20% compared with the previous generation of standards, which saves a lot of bandwidth. In addition, for emerging video services such as 8K video, AV1’s performance advantages will be even more prominent.

Q: Will the patent pool problem persist for a long time?

A: This is a problem that has troubled the industry for a long time and is not easy to solve, which is why the Alliance for Open Media was born. Currently there are also organizations around the international standards bodies that are trying to solve the patent pool problem. I don’t see a complete solution yet, but people are aware of the seriousness of the problem and are actively pushing for one. It is hoped that in the future the patent pool problem will be resolved so that the industry can make effective use of new technologies.

Q: AV1 software encoding is inefficient. How much room is there for further optimization? Can we only count on hardware encoding to solve this problem?

A: By the efficiency of software encoding, I understand you to mean speed and power consumption. The compression performance itself is actually quite good. As for coding performance, the room for optimization is essentially open-ended. In the first couple of years encoder optimization iterates faster, but it is an ongoing process. Tencent Multimedia Lab has also invested heavily in this area, hoping to promote the development and widespread deployment of AV1 software encoders. In addition, there has recently been a steady stream of good news within the Alliance for Open Media, with very significant performance improvements.

Q: Will more hardware vendors support it?

A: As I understand it, yes, and the support will be strong. As far as I know, on the hardware decoding side, MediaTek, Samsung and LG have already launched products with hardware decoding. Encoding is a bigger challenge; for hardware encoders, we expect to see more support emerge towards the end of this year or over the next two to four years.

Q: What kind of organization is Tencent Multimedia Lab?

A: Tencent Multimedia Lab focuses on the exploration, research and development, application and deployment of cutting-edge technologies in multimedia and related fields. Its research areas and product directions include audio and video codecs; network transmission and real-time communication; multimedia content processing, analysis, understanding and quality assessment based on signal processing and deep learning; and system design and end-to-end solutions for immersive media (VR, AR, point cloud, etc.). The lab continuously delivers core technologies and engineering implementations to a number of businesses, covering office, education, cultural tourism, e-sports, pan-entertainment and other fields. It serves tens of millions of daily active users (DAU) and delivers general solutions and products through Tencent Cloud. The lab is also responsible for work on international and domestic industry standards, covering multimedia data compression, network transport protocols, multimedia systems, 5G and AI.

Tencent Multimedia Lab holds more than 500 global patents (including patent applications) in multimedia and related fields, nearly 200 of which have been adopted by international standards covering multimedia data compression, systems and network transmission protocols. On behalf of the company, members of the Multimedia Lab hold several positions in international organizations, including board seats at the International 8K Association, the Alliance for Open Media (AOM) and the DASH Forum.

Q: Is AV1 decoded only in software so far?

A: No. Hardware decoders have already been launched, including the MediaTek Dimensity 1000 and Samsung and LG 8K TVs, all of which embed AV1 hardware decoders.

Q: Besides live broadcast and on-demand, what other application scenarios are available for AV1? Can medical imaging be used?

A: Medical imaging has its own specific requirements, such as very low image distortion and real-time operation. A defect in the image would interfere with medical diagnosis, which must be avoided.

In addition, the demand for video applications is reflected in the membership of the Alliance for Open Media, and so far no medical imaging company has joined. In terms of the technology itself, AV1 has no problem handling medical imaging, since it supports very high quality image and video encoding. Beyond medical imaging, live broadcast and on-demand, AV1 has its place in scenarios involving video communication, such as cultural tourism and education, and in emerging applications such as VR panoramic video and 8K video. We also hope that AV1 will be widely used in every video communication scenario and boost the development of the multimedia industry.

This is also the original intention behind Tencent joining the Alliance for Open Media. A company that wants a standards organization to consider the specific needs of its business can join the organization and feed back those needs, so that the resulting standard better serves specific businesses.

About the author

Zhao Xin is an expert researcher at Tencent Multimedia Lab, responsible for research and development of new-generation video compression algorithms and for standards work. Since joining Tencent in 2017, he has been mainly involved in the formulation of the new-generation international video compression standard H.266/VVC, leading the research and development of a number of Tencent’s patented technologies and promoting their adoption into the H.266/VVC standard, filling a gap for Tencent in the field of international video standards. At present, he is mainly involved in Tencent’s open-source collaborative AV1 encoder project and the optimization of the SVT encoder in the open-source community, and is responsible for standards work under the Alliance for Open Media (AOM), including technical pre-research and standard preparation for the next-generation AV2 standard.
