To understand what a video codec is, we first need to understand what video is.

Video is ultimately a sequence of still image frames. When these frames are played back at a sufficient rate, the human eye perceives continuous motion, and the result is video.

So why do we need video encoding and decoding at all? Because a raw digital video signal carries an enormous amount of data, which would waste huge bandwidth and storage space if transmitted or stored as-is. Take the currently mainstream case of 1080p video at 30 frames per second: each frame is 1920 pixels wide and 1080 pixels high, and each pixel is represented by the three RGB primaries (three bytes per pixel). The data in one frame is therefore 1080 * 1920 * 3 * 8 = 49,766,400 bits, and at 30 frames per second the raw stream is 49,766,400 * 30 = 1,492,992,000 bps, roughly 1.5 Gbps. Hence the birth of video encoding and decoding technology.
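The arithmetic above can be reproduced in a few lines as a quick sanity check:

```python
# Raw (uncompressed) bitrate of 1080p video at 30 fps,
# with 3 bytes (24 bits) of RGB data per pixel.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3          # one byte each for R, G, B
FPS = 30

bits_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL * 8
bits_per_second = bits_per_frame * FPS

print(bits_per_frame)    # 49766400 bits per frame
print(bits_per_second)   # 1492992000 bps, i.e. about 1.5 Gbps
```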

Why can video be compressed at all? There are two main sources of redundancy:

1. Within a single image there are usually regions of similar color, which contain a lot of redundant information. This spatial redundancy can be reduced with techniques such as variable-length coding and quantization, making compression possible.

2. Two consecutive images usually share large identical or near-identical regions, so motion estimation and motion compensation can describe the changes between frames as motion vectors and compress this temporal redundancy.

A large number of video codecs have been built on this theory of intra-frame (within-image) and inter-frame (between-image) predictive coding.
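The inter-frame idea can be illustrated with a toy example: encode the second frame as a residual against the first. Real codecs add motion estimation, transforms, and quantization on top of this; the sketch below only shows why the residual compresses so well.

```python
# Toy inter-frame prediction: frame2 is stored as (frame1 + residual).
# The residual is mostly zeros, which entropy coding shrinks dramatically.
frame1 = [[10, 10, 10, 10],
          [10, 50, 50, 10],
          [10, 50, 50, 10],
          [10, 10, 10, 10]]

# frame2 is almost identical to frame1: only one pixel changed.
frame2 = [row[:] for row in frame1]
frame2[1][1] = 55

# Residual = frame2 - frame1, computed pixel by pixel.
residual = [[b - a for a, b in zip(r1, r2)]
            for r1, r2 in zip(frame1, frame2)]

nonzero = sum(v != 0 for row in residual for v in row)
print(nonzero, "of 16 residual values are nonzero")  # 1 of 16

# The decoder reconstructs frame2 exactly from frame1 plus the residual.
decoded = [[a + d for a, d in zip(r1, rr)]
           for r1, rr in zip(frame1, residual)]
assert decoded == frame2
```

Only one of sixteen residual values is nonzero here, which is exactly the temporal redundancy that motion-compensated prediction exploits.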

The H.26x series runs from H.261 and H.263 to the current mainstream H.264 and H.265, and on to the most recently standardized H.266. The series has evolved with one goal: squeeze more video quality out of fewer compressed bits. H.265, for example, can deliver video of the same quality as H.264 at roughly half the bandwidth; it keeps a similar algorithmic architecture to H.264 while improving the underlying techniques to dramatically raise compression efficiency.

The MPEG series: MPEG-1, MPEG-2, and MPEG-4 (whose later work merged with H.264, the two being developed jointly as one standard).

The VP series: VP8 and VP9. The VP codecs are Google's own open-source series. Google created them because H.264 requires patent royalties; if WebRTC used H.264, browsers would have to pay license fees (although, given how widely H.264 is supported, Cisco's open-sourcing of OpenH264 largely removed that obstacle). VP8 is benchmarked against H.264, but outside the WebRTC field its popularity and support are relatively limited. VP9 is benchmarked against H.265: one of its goals is to cut the bit rate by about 50% relative to VP8 at the same quality, or equivalently, at the same bit rate VP9 can improve picture quality significantly over VP8.

The domestic (Chinese) series: the AVS standard, AVS1.0 and AVS2.0. AVS is China's second-generation source-coding standard with independent intellectual property rights, and AVS2.0 is a new-generation standard in the same class as H.265 and VP9. Although AVS is not yet widely used, it shows that China has entered the competition in this field.

The SVAC standard is a codec used in China's video-surveillance field, characterized by strong security through encryption and authentication. Its main features:

Encryption and authentication: it defines encryption and authentication interfaces and data formats to guarantee data security, integrity, and non-repudiation.

Region-of-interest (ROI) coding: the image is divided into several regions of interest and a background region. The key monitored areas get real-time video at a high frame rate, while the non-ROI areas are coded cheaply to save cost.

Video information embedding: sound-recognition feature parameters, special events, timestamps, and other information can be embedded in the bitstream, so they can be extracted, retrieved, and queried quickly without first decoding the video.

Scalable video coding (SVC): the video data is layered to suit different transmission networks and storage environments. Conventional schemes transmit both a main stream and a sub-stream, which together occupy a lot of bandwidth; SVAC transmits only a single stream, from whose layers images of different resolutions can be obtained.
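The layered SVC idea can be sketched as follows; the layer names and bitrates here are made-up illustrative numbers, not values from any standard:

```python
# Hypothetical SVC-style layered stream: a base layer plus enhancement
# layers. A receiver takes as many layers as its bandwidth allows, so a
# single encoded stream serves many different network conditions.
LAYERS = [
    ("base 360p", 400),     # kbps, illustrative numbers only
    ("enh-1 720p", 800),
    ("enh-2 1080p", 1500),
]

def select_layers(available_kbps):
    """Return the layers whose cumulative bitrate fits the link."""
    chosen, total = [], 0
    for name, kbps in LAYERS:            # layers must be taken in order
        if total + kbps > available_kbps:
            break
        chosen.append(name)
        total += kbps
    return chosen

print(select_layers(500))    # a slow link gets only the base layer
print(select_layers(3000))   # a fast link gets all three layers
```

Enhancement layers only make sense on top of the layers below them, which is why the loop stops at the first layer that does not fit.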

WebRTC, originally proposed by Google, is mainly browser-based RTC communication, hence the name. In the early days the major browsers, such as Chrome, Firefox, and Opera, differed in their support for WebRTC and its video codecs. Chrome initially supported only the VP series, for the reasons described above, and later gradually added H.264. Since the traditional RTC communication field mostly uses the H.264 codec, H.264 support made cross-field RTC communication much easier and, I think, accelerated the adoption of WebRTC to some extent: a browser and a mobile phone can join the same video conference, for example, or a browser can make a point-to-point call with an existing SIP terminal. Shared H.264 support greatly reduces the need for transcoding, which consumes a great deal of performance in software or else requires dedicated hardware.

Of course, more and more vendors have entered the WebRTC field. Agora's RTC system, for example, goes beyond stock WebRTC: it offers SDKs adapted to various hardware and chip platforms; the SD-RTN network, which selects optimal transmission paths to guarantee quality (after all, communication is never a purely terminal-side problem); and excellent anti-weak-network algorithms that can withstand up to 70% video packet loss while keeping calls smooth.

At present, with the development of the Internet of Things, RTC is used ever more widely beyond telephone calls and audio/video conferencing, for example in security surveillance and intelligent hardware terminals. Video-processing hardware keeps getting smaller, and purely software-based encoding and decoding shows serious disadvantages in memory, CPU, and overall resource usage. Many manufacturers have recognized this, so the trend of dedicated chips doing dedicated work is increasingly obvious. In the surveillance field, Huawei HiSilicon's combination of ARM cores with a dedicated video-processing unit holds more than 70% of the domestic video market, and NVIDIA has launched the Jetson series of chips for edge-computing scenarios, whose ARM + GPU processing model is more general-purpose. Meanwhile, thanks to ARM's low power consumption, edge devices can be equipped with video processing, machine vision, and AI analysis capabilities, greatly enriching intelligent IoT applications.

Over the past two years, the epidemic-driven growth of online education and live streaming has brought great opportunities to Web real-time communication, and its commercial success has in turn injected vitality into the technology's development. With the momentum of 5G, new application scenarios such as VR/AR and autonomous driving will surely bring fresh impetus to WebRTC and drive the development of Internet-based real-time audio and video communication.