Introduction:

Live streaming and video on demand have become part of everyday life. What matters most in delivering them: lower distribution cost, or better picture quality? Narrowband HD technology addresses both, and intelligent video coding is one of its most basic and important components.

Cheng Ling | Senior Audio Engine Development Engineer, NetEase Yunxin

01 Overview of narrowband HD technology

Narrowband HD is a set of video coding techniques built around the subjective perception of the human eye. It represents a video service concept that balances cost against experience to get the best value for money. "Narrowband" means saving bits that are not needed, and "HD" means spending bits where they produce the most value, so that the picture is sharper and better at the same bandwidth.

Driven by the pandemic, live streaming has spread from traditional entertainment shows into many other fields. As live streaming becomes ubiquitous, demand for narrowband HD technology keeps growing. This article first introduces some mature narrowband HD solutions in the industry, then shares NetEase Yunxin's exploration and practice in narrowband HD, and finally covers its key technique, JND perceptual coding.

02 Introduction to narrowband HD solutions in the industry

Narrowband HD technology is already fairly mature in the industry. Some typical solutions are introduced below.

Taobao Live

Taobao Live uses HEVC coding to compress 720p/25 fps video to 800 kbps with PSNR > 43 dB and VMAF > 90. Its narrowband HD technology is applied mainly in three areas:

  • Audio and video enhancement: AI-based image enhancement, beautification and voice enhancement improve production quality
  • Perceptual processing: joint source-channel adaptive coding, including ROI detection, coding parameters chosen by scene classification, intelligent rate control, etc.
  • S265 encoder: an industry-leading HEVC encoder
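For reference, the PSNR figure quoted above is a fidelity measure derived from the mean squared error between the source frame and the decoded frame. A minimal sketch of the standard definition for 8-bit samples follows; the function name and the flat-list frame representation are illustrative assumptions, not part of Taobao Live's tooling.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are given as flat sequences of sample values (an assumption made
    to keep the sketch dependency-free). `peak` is the maximum sample value,
    255 for 8-bit video.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

PSNR only measures sample-wise fidelity, which is why it is quoted alongside a perceptual metric such as VMAF.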

Alibaba narrowband HD

Alibaba's narrowband HD solution is based on a model of human vision: the encoder's optimization goal is changed from the classic "highest fidelity" to "best subjective experience". Its algorithms de-emphasize areas the eye tends to ignore, strengthen the details the eye pays attention to, and repair artifacts that viewers find objectionable. The stated aim is to go beyond what a conventional encoder can deliver on its own, giving a clearer picture while saving bit rate.

Tencent Speed HD

Tencent Speed HD combines several techniques: intelligent scene recognition (video is classified into major categories such as games, shows, sports, outdoor, animation, food, and film and TV series, and further into dozens of finer-grained scenes), intelligent coding parameters (each scene is configured with its own optimal coding parameters), and pre-processing (sharpening, soft blur, deblocking, noise reduction). Together these aim to address problems that arise in transcoding, such as transcoding distortion, low-resolution blur, camera shake, noise, and jagged edges and blocking at low bit rates. It has been used by Douyu, Penguin Esports, CCTV, Xinying Sports, and others.

03 NE264 narrowband HD technology

NE264 is an H.264-compliant video encoder developed by NetEase Yunxin. It has been deployed in RTC, live streaming and video on demand. For live streaming and video on demand, NE264 aims to deliver lower bandwidth and higher picture quality within the existing architecture; this is NE264 narrowband HD. Below we briefly introduce video coding, then the visual perception coding techniques that build on the characteristics of human vision, and finally the NE264 narrowband HD technology designed and implemented on that basis.

Video coding

Video coding compresses data by exploiting the redundancy in it. Early video coding improved compression efficiency by removing spatial, temporal and frequency-domain redundancy. From MPEG-1 to MPEG-2, the bit rate was reduced by about 50%, that is, coding efficiency doubled, while complexity rose by about 5%.

H.264, a classic video compression standard, was introduced in 2003. Since then, the returns from optimizing traditional coding methods have kept shrinking. From H.264 (AVC) to H.265 (HEVC), coding efficiency improved by 40% but complexity rose by a factor of 5; from H.265 to the latest standard, H.266 (VVC), coding efficiency improved by less than 40% while complexity rose by more than a factor of 10.

As coding standards evolve, the gains diminish. Technical breakthroughs are becoming harder and harder to achieve, so a new approach to compression is urgently needed.
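The diminishing returns can be made concrete with simple arithmetic on the figures quoted above. The sketch below divides each efficiency gain by the corresponding complexity factor. This ratio is a crude measure chosen here only for illustration, and the H.266 numbers are the bounds given in the text ("less than 40%", "more than 10 times"), so the second result is an upper bound.

```python
# (efficiency gain in percent, complexity factor), as quoted in the text.
# The H.265 -> H.266 entry uses the stated bounds, so its ratio is an upper bound.
transitions = {
    "H.264 -> H.265": (40.0, 5.0),
    "H.265 -> H.266": (40.0, 10.0),
}


def gain_per_complexity(gain_percent, complexity_factor):
    """Percentage points of efficiency gained per 1x of added complexity."""
    return gain_percent / complexity_factor


ratios = {name: gain_per_complexity(g, c) for name, (g, c) in transitions.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} points of gain per 1x complexity")
```

By this measure the return on complexity at least halves from one generation to the next.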

Human visual system (HVS)

As physiological and psychological research on the human visual system (HVS) has advanced, it has become clear that the brain handles a great deal of redundant information when processing vision. Exploiting the characteristics of human vision can significantly improve compression efficiency; this is the principle behind perceptual compression.

The human visual system consists of the eyeball, the nervous system and the visual center of the brain. When a person watches a video scene, the incoming light is first regulated and focused by the pupil and lens so that the scene is imaged on the retina. Neurons in the retina then convert the light signal into nerve signals and send them to the visual cortex. The visual cortex and other areas of the brain process these signals further to form the perception of the scene.

In recent years, guided by visual psychology and physiology, the observation and study of visual phenomena have revealed many characteristics of the HVS. Those commonly used in visual perception coding include visual attention, visual masking, visual sensitivity and visual statistical learning mechanisms. Some of them are described below:

Visual masking: the human eye easily perceives a single visual signal. When several visual signals are present at the same time, the HVS's ability to perceive one or more of them weakens or even disappears, and the perception threshold changes. Types of masking include:

  • Brightness masking: the human eye is less sensitive to distortion in very bright or very dark areas
  • Texture masking: the visibility threshold in non-uniform (textured) areas is significantly higher than in uniform areas
  • Pattern masking: the human eye resolves regular objects much better than irregular ones
  • Motion masking: the human eye's ability to resolve detail drops significantly in scenes with intense motion
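Brightness masking is often expressed as a visibility threshold that depends on the local background luminance. The sketch below uses the piecewise luminance adaptation curve from the widely cited pixel-domain JND model of Chou and Li. It is shown only to illustrate the shape of the effect and is an assumption here, not the model used in NE264; the function name is hypothetical.

```python
import math


def luminance_masking_threshold(background):
    """Visibility threshold (in gray levels) for a given background luminance.

    `background` is the average local luminance in 0..255. The threshold is
    highest in dark regions, lowest around mid-gray, and rises again slowly
    in bright regions.
    """
    if not 0 <= background <= 255:
        raise ValueError("background luminance must be in 0..255")
    if background <= 127:
        # Dark side: threshold falls from 20 at black to 3 at mid-gray.
        return 17.0 * (1.0 - math.sqrt(background / 127.0)) + 3.0
    # Bright side: threshold rises linearly from 3 at mid-gray to 6 at white.
    return 3.0 / 128.0 * (background - 127.0) + 3.0
```

With this curve, black (0) gives a threshold of 20, mid-gray (127) gives 3, and white (255) gives 6, so more distortion can be hidden in very dark or very bright areas.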

Visual attention: when watching a video scene, the eyes quickly focus on the content or objects of interest. There are two modes:

  • Bottom-up processing, driven by external stimuli. It depends mainly on the saliency of the image content: a target that differs strongly from its surroundings easily attracts the observer's visual attention.
  • Top-down processing, driven by the task. It is governed by consciousness, depends on the specific task, and is determined by the viewer's "cognitive factors" such as knowledge, expectations and current goals. For example, in a surveillance scene, people are more likely to attract attention.

Visual perception coding

The purpose of visual perception coding is to use known HVS characteristics to remove as much as possible of the information the eye cannot perceive, and so deliver better perceived quality with fewer bits. Researchers have proposed many visual perception coding methods to this end. Grouped by the HVS characteristic they exploit, the most widely studied and applied are methods based on visual masking and methods based on visual attention.

Coding methods based on visual masking rely on a property of the multi-channel model of human vision: the presence of one stimulus changes the detection threshold of another, so that the eye's perception of one or more stimuli weakens or disappears. This makes it possible to remove visual redundancy. Current methods of this kind mainly build on the JND model and on perceptual quality metrics such as SSIM and VMAF. JND-based coding is now widely used in perceptual video coding, and it is the technique we focus on.

Coding methods based on visual attention fall into two categories, depending on whether the foveal characteristics of the HVS are taken into account: methods based on regions of interest and methods based on visual saliency detection.

  • Region of interest (ROI) based coding analyzes the input video scene before encoding to determine the regions of interest. During encoding, parameters such as QP are adjusted to control the distortion of ROI and non-ROI areas separately, improving the coding quality of the ROI. This technique was proposed many years ago, but the improvement it delivers in practice is limited.
  • Visual saliency based coding extracts the salient regions of an image (the regions people are interested in) according to human visual characteristics. Faced with a scene, people automatically process the regions that interest them and selectively ignore the rest; the regions of interest are called salient regions. This technique is common in perceptual coding and is usually combined with JND and other techniques for better compression. It is also the technique we chose to study first.
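The QP adjustment described above can be sketched as a per-block offset driven by a saliency (or ROI) value. The linear mapping, the `max_delta` parameter and the function name are illustrative assumptions; the only fixed fact used is that H.264 QP values for 8-bit video lie in the range 0 to 51.

```python
def qp_map_from_saliency(saliency, base_qp, max_delta=4):
    """Build a per-block QP map from per-block saliency values in 0..255.

    Salient blocks get a lower QP (finer quantization, more bits) and
    non-salient blocks get a higher QP (coarser quantization, fewer bits).
    """
    qp_rows = []
    for row in saliency:
        qp_row = []
        for s in row:
            # Map saliency 0..255 linearly onto an offset from +max_delta to -max_delta.
            delta = round(max_delta * (1.0 - 2.0 * s / 255.0))
            # Clamp to the valid H.264 QP range for 8-bit video.
            qp_row.append(min(51, max(0, base_qp + delta)))
        qp_rows.append(qp_row)
    return qp_rows
```

For example, with a base QP of 30 and the default `max_delta`, a block with saliency 0 is encoded at QP 34 and a block with saliency 255 at QP 26.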

NE264 technology

Narrowband HD technology is already fairly mature in the industry. Taking into account the coding characteristics of NE264 and the goals we want to achieve, our narrowband HD technology is divided into three main parts:

  • Video pre-enhancement: texture enhancement to improve the subjective experience
  • Saliency detection: based on human visual attention, distinguishes salient from non-salient regions for encoding, improving the compression ratio
  • JND perceptual coding: based on human visual masking, applied inside the encoder to improve the compression ratio

The overall process is shown in the following figure. For the input video, machine learning is used to analyze its content characteristics, and enhancement pre-processing is then applied to improve image quality. Next, saliency detection separates salient from non-salient regions, and the result is passed to the NE264 encoder. During encoding, NE264 computes the JND coefficients. Finally, the encoded video is output for display.

The following figure compares the picture before and after enhancement: the left image is the original and the right image is the enhanced result. The subjective quality is noticeably better after enhancement.

The following figure shows the result of saliency detection: the color image on top is the original, and the grayscale image below is the saliency map, with values from 0 to 255. The brighter a region, the more salient it is.
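To show how a 0 to 255 saliency map of this kind can be produced, here is a deliberately simple bottom-up sketch: a pixel is salient when its locally smoothed value differs strongly from the global mean, in the spirit of frequency-tuned saliency. The grayscale input, the 3x3 box blur and the function name are simplifying assumptions; this is not the detector used with NE264.

```python
def saliency_map(gray):
    """Return a saliency map (values 0..255) for a grayscale image.

    `gray` is a list of rows of luminance values. Saliency is the absolute
    difference between a 3x3 box-blurred pixel and the global mean,
    normalized so that the most salient pixel maps to 255.
    """
    h, w = len(gray), len(gray[0])
    mean = sum(map(sum, gray)) / (h * w)

    def blurred(y, x):
        # 3x3 box blur; coordinates are clamped at the image border.
        total = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy = min(h - 1, max(0, y + dy))
                xx = min(w - 1, max(0, x + dx))
                total += gray[yy][xx]
        return total / 9.0

    raw = [[abs(blurred(y, x) - mean) for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return [[0] * w for _ in range(h)]  # flat image: nothing stands out
    return [[round(255 * v / peak) for v in row] for row in raw]
```

A map like this can be averaged per macroblock and passed to the encoder to steer bit allocation.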

04 JND perceptual coding technology

Let's take a closer look at the key technique mentioned above: JND perceptual coding.

JND (Just Noticeable Distortion) is the smallest distortion that can be perceived. It measures how sensitive the human eye is to distortion in different regions of an image, and it is widely used in image and video coding, digital watermarking, and image quality assessment based on visual characteristics. Several JND models have been proposed; they fall into two categories: pixel-domain models and DCT-domain models.

  • Pixel-domain JND models give the JND threshold of each pixel directly and intuitively. They do not consider frequency-domain characteristics, so they are simple to compute but not very accurate.
  • DCT-domain JND models take frequency-domain characteristics into account and are more widely used. They usually consist of three parts: Luminance Adaptation (LA), Contrast Masking (CM) and the Contrast Sensitivity Function (CSF). We mainly use DCT-domain JND perceptual coding. The JND is calculated as follows:
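A form commonly seen in the DCT-domain JND literature combines the three parts named above as a product. It is given here as an assumption about the general shape of such models, not as the exact formula used in NE264. The symbols are illustrative: n is the block index, (i, j) is the position of a coefficient within the block, T_CSF is the base threshold from the contrast sensitivity function, and F_LA and F_CM are the luminance adaptation and contrast masking factors.

```latex
\mathrm{JND}(n, i, j) = T_{\mathrm{CSF}}(n, i, j) \cdot F_{\mathrm{LA}}(n) \cdot F_{\mathrm{CM}}(n, i, j)
```

In this form, the CSF term sets a frequency-dependent base threshold, and the two factors raise it in blocks where brightness or texture hides distortion.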

JND perceptual coding in NE264 works as follows: for the input YUV image, we first compute the brightness sensitivity, texture sensitivity and contrast sensitivity to obtain the JND coefficients, then apply them in the DCT domain to modify the original DCT coefficients, and finally encode and output the bitstream.
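The text does not specify how the JND coefficients modify the DCT coefficients. One common choice in perceptual coding, shown here purely as an assumption, is to zero any coefficient whose magnitude is below its JND threshold and to shrink the remaining ones toward zero by the threshold. The function name and the list-of-rows block layout are hypothetical.

```python
def apply_jnd(coeffs, jnd):
    """Suppress the perceptually invisible part of a block of DCT coefficients.

    `coeffs` and `jnd` are same-shaped lists of rows: the DCT coefficients of
    a block and the JND threshold for each coefficient position.
    """
    out = []
    for c_row, t_row in zip(coeffs, jnd):
        out_row = []
        for c, t in zip(c_row, t_row):
            if abs(c) <= t:
                # Entirely below the visibility threshold: drop it.
                out_row.append(0.0)
            else:
                # Shrink toward zero by the amount the eye cannot notice.
                out_row.append(c - t if c > 0 else c + t)
        out.append(out_row)
    return out
```

Smaller and more frequent zero coefficients cost fewer bits after quantization and entropy coding, which is where the bit rate saving comes from.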

05 Summary

This article has introduced NE264 narrowband HD technology and JND perceptual coding. For live streaming and video on demand, reducing bandwidth as far as possible while preserving high definition has always been the goal, and video coding is one of the crucial links in achieving it. Whether through traditional coding techniques or in combination with intelligent coding, we will keep working to bring a lower-latency, higher-quality video experience.

That is all for this talk. A video replay of the talk is also available.

About the author

Cheng Ling is a senior audio and video algorithm engineer at NetEase Yunxin. She currently works mainly on video coding algorithm research and has extensive experience in video quality optimization and rate control algorithms.
