This article is based on a talk given by Professor Ma Siwei at the Next-Generation Cloud Audio and Video Technology session of the RTC 2017 Real-Time Internet Conference.

Welcome to visit the RTC developer community to exchange ideas with more developers of real-time communication (RTC) technology and to take part in more developer activities.

Ma Siwei is a Professor at the School of Information Science and Technology, Peking University. He received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2005, and from August 2005 to August 2007 did postdoctoral research at the University of Southern California, USA. He has worked at Peking University ever since. In 2013 he was supported by the National Natural Science Foundation of China (NSFC) Outstanding Young Scholars program, and in 2015 he was selected for the second group of the Top-Notch Young Talents Program of the Organization Department of the CPC Central Committee. His research interests are video coding and processing. He has published more than 200 papers and holds more than 40 granted invention patents. He serves as an Associate Editor (AE) of IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) and the Journal of Visual Communication and Image Representation (JVCIR), is a council member of the Chinese Society of Image and Graphics, and is co-leader of the AVS Video Group. Since 2002 he has participated in the formulation of the AVS1, AVS+, and AVS2 series of national standards, and he has won the second prize of the State Technological Invention Award and the second prize of the State Scientific and Technological Progress Award.

First, let us review the history of video coding. Video coding originated in broadcast television, and for a long time the main driving force behind video codecs came from broadcasting. Today, of course, Internet video is pushing coding forward faster and faster. Yesterday at the ICET 2017 World Congress, the ICET president noted that in the past a new codec generation arrived roughly every 10 years, but judging from the latest progress beyond H.265, the next cycle may take less than 10 years.

In the computer industry, the computer was born in 1946, but it was not until 1957 that images appeared on computers. Kirsch created the first digital image by scanning a photograph of his infant son. 2007 marked the 50th anniversary of that image; it is now 60 years old, and the baby in it is in his sixties. Today's digital images have reached 4K and even 8K.

When it comes to coding, the principle is that video contains many kinds of redundancy: temporal redundancy between consecutive frames, spatial redundancy within a frame, and perceptual redundancy, since the human eye is insensitive to high-frequency information. Based on this principle, the mainstream video codec framework has changed little from the early days of H.261 up to today, as H.266 approaches completion.
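The temporal-redundancy idea above can be sketched in a few lines: when two consecutive frames differ only where motion occurred, the frame difference (the prediction residual) is mostly zeros and is far easier to compress. The 1-D "frames" below are made-up toy data, not from any real codec.

```python
# Toy illustration of temporal redundancy: consecutive frames differ
# only where motion occurred, so the residual is mostly zeros.

def frame_difference(prev, curr):
    """Per-sample residual between two frames of equal size."""
    return [c - p for p, c in zip(prev, curr)]

frame1 = [10, 10, 10, 50, 50, 10, 10, 10]   # 1-D toy "frame"
frame2 = [10, 10, 10, 10, 50, 50, 10, 10]   # same content shifted right

residual = frame_difference(frame1, frame2)
nonzero = sum(1 for r in residual if r != 0)
print(residual)                              # [0, 0, 0, -40, 0, 40, 0, 0]
print(nonzero, "of", len(residual), "samples changed")   # 2 of 8
```

A real encoder goes further: motion estimation finds the shift between blocks, so even these few nonzero residuals largely disappear.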

Within this framework, coding techniques can be divided into three main blocks. One is transform coding, which, as just mentioned, removes redundant high-frequency information. Predictive coding removes redundancy in the spatial and temporal domains. And there is entropy coding. These are the three blocks of coding technology.
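As a concrete sketch of the entropy-coding block, here is a minimal Huffman coder, the classic statistical method mentioned later in this article: frequent symbols receive short codes, rare symbols long ones. This is an illustrative implementation, not code from any standard codec.

```python
# Minimal Huffman code construction: repeatedly merge the two least
# frequent subtrees, prepending "0"/"1" to the codes inside each.
import heapq
from collections import Counter

def huffman_codes(data):
    """Map each symbol in `data` to its Huffman bit string."""
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)           # least frequent subtree
        hi = heapq.heappop(heap)           # second least frequent
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], i, merged])
        i += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' is most frequent, so its code is the shortest
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```

The resulting code is prefix-free, so a bitstream of concatenated codes can be decoded unambiguously, which is exactly what the entropy-coding stage of a codec needs.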

Looking at the history of these three blocks: the computer was born in 1946, Shannon's information theory appeared in 1948, and the digitization that began in the early 1950s marked the start of digital video coding. In the early stage, limited computing capacity meant processing was basically pixel-based, using statistical methods such as the Huffman coding we still see today. With increased computing power came block-based processing: where coding could once only operate on whole images, it could later work block by block, with block-based motion estimation and compensation and even variable block sizes, as in today's H.264 and H.265. With transform coding added by the end of the 1970s, the video coding framework we use today was basically established; it is now about 40 years old. Progress in the last few years, including H.265 and newer transform techniques, builds on that same framework. This is the technical development of video coding.

Now look at the dimensions along which video coding has progressed. The first is spatial resolution: from small images to standard definition, to high definition, and on to ultra high definition. The second is temporal resolution, up from 15 frames per second to as high as 120. Third is sampling precision: current HDR (high dynamic range) TV requires at least 10 bits, and 10 bits is not enough; in the future it may reach 20 bits. Fourth, the number of viewpoints and the field of view are closely related: instead of sending one video stream, there can be two or many (the number of views), and the angular range of the video keeps widening (the field of view). Fifth is model data, which includes contours of objects for rendering, depth data, and features, that is, knowledge of the image content and objects. There are also point clouds, which can reconstruct objects completely and support visual reproduction.

Take a look at the trends in video coding. 4K is becoming more and more popular, and we have already seen Internet broadcasts in 4K. Recently, Guangdong province held a meeting proposing 4K TV broadcasting as the next step, and Beijing proposed trial 8K broadcasts for the 2022 Winter Olympics. The image above shows how small an HD frame in the upper-left corner looks next to an 8K test image. Of course, 8K video is not just a matter of resolution; it also brings accompanying requirements in sampling precision, frame rate, and sound.

I also have a quick review of the latest progress. The most watched is JEM, the exploration model toward H.266 promoted by Qualcomm and others. It was first proposed at an MPEG meeting as HM-KTA-1.0, with the goal of completing a new standard by 2020.

JEM's performance has improved significantly: in objective tests, coding efficiency is up about 30%, but complexity has increased roughly 12-fold, which makes implementation quite stressful. This is an early prototype; there will certainly be further optimization and new techniques to balance complexity against performance.

That was JEM's technology. There are also new areas such as the light field: where a camera used to capture a single focused image, light-field capture records rays arriving from different directions, using either a camera array or a microlens array.

This is the sequence coding framework, which packs the coded data into boxes. Point cloud data, of course, is more complicated.

The most recent development is feature-based coding, CDVA (Compact Descriptors for Video Analysis), to be standardized in 2018. Video surveillance is a typical application: for millions of video channels, traditional encoding produces a data volume of about 1 Tbps, while CDVA brings it down to about 10 Gbps, a hundredfold reduction; relative to the raw video, the compression ratio reaches tens of thousands. Very little data is transmitted, just enough for analysis and recognition.
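The surveillance arithmetic above, spelled out. The one-million-channel count is an assumption for illustration; the aggregate rates are the ones cited in the talk.

```python
# Aggregate rates from the talk; channel count is an assumed figure.
channels = 1_000_000
traditional_bps = 1e12   # ~1 Tbps aggregate, conventionally encoded
cdva_bps = 10e9          # ~10 Gbps aggregate with CDVA descriptors

per_channel_traditional = traditional_bps / channels   # 1 Mbps per channel
per_channel_cdva = cdva_bps / channels                 # 10 kbps per channel
reduction = traditional_bps / cdva_bps                 # 100x vs. encoded video

print(f"per channel, traditional: {per_channel_traditional / 1e6:.0f} Mbps")
print(f"per channel, CDVA:        {per_channel_cdva / 1e3:.0f} kbps")
print(f"reduction vs. encoded:    {reduction:.0f}x")
```

At roughly 10 kbps per channel, a CDVA stream is tens of thousands of times smaller than the raw camera feed, which is what makes city-scale analysis and recognition over the network feasible.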

Conclusion: 4K is already popular and 8K is the coming trend, so the next-generation standard is worth watching; the acquisition revolution further expands the dimensions of visual data and enriches the diversity of visual data coding; and the convergence of acquisition, computing, and cognitive technologies makes intelligent coding possible.