At present, the main video coding techniques are prediction, transform, quantization, entropy coding, and loop filtering. This article gives an overview of these techniques; the detailed steps of each will be described separately in subsequent articles.

Predictive coding

The core idea of predictive coding is not to encode every signal directly, but to predict the current signal from a previous signal and encode only the difference between the current signal and the predicted value.

Why encode the difference?

For example, suppose the image is 4×4 and every pixel is 255. Encoding the value 255 directly in binary requires at least 8 bits per pixel. If the predicted value is also 255, the difference is 255 − 255 = 0, which can be encoded with a single bit, greatly reducing the number of bits required.
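
The 4×4 example can be sketched in a few lines of Python. This is only an illustration of the idea: the function names `residuals` and `reconstruct` are made up for this sketch, and a single fixed predicted value is assumed for the whole block.

```python
# Hypothetical 4x4 image in which every pixel is 255 (the example from the text).
image = [[255] * 4 for _ in range(4)]

def residuals(block, predicted):
    """Encoder side: difference between each pixel and one predicted value."""
    return [[pixel - predicted for pixel in row] for row in block]

def reconstruct(residual_block, predicted):
    """Decoder side: add the prediction back to recover the original pixels."""
    return [[r + predicted for r in row] for row in residual_block]

res = residuals(image, predicted=255)
print(res[0])                          # [0, 0, 0, 0]
print(reconstruct(res, 255) == image)  # True
```

Every residual is 0, so the encoder only has to describe a block of zeros, and the decoder still recovers the original pixels exactly.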

Predictive coding can be divided into intra-frame prediction and inter-frame prediction. A picture encoded using only intra-frame prediction is called an I frame, a picture whose inter-frame prediction uses only the forward direction is called a P frame, and a picture that uses prediction in both directions is called a B frame.

Intra-frame prediction

Put simply, intra-frame prediction encodes the first row and first column of a frame from their original values and predicts all the remaining data in the frame from them. Of course, this is not the only possible choice of reference data.

The purpose of intra-frame prediction is to represent the current frame with much less data than the full frame.

Why is intra-frame prediction possible? In general, the luminance and chrominance values of adjacent pixels are very close. Suppose A and B are two adjacent pixels. A is encoded with its original value, and B is then represented by the difference C = B − A. Because the two values are so close, C is very small and needs fewer bits than B itself, which saves space. The decoder recovers B as A + C.
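
As a rough sketch of this neighbour-based idea, the snippet below predicts each pixel in a row from its left neighbour. It uses the signed difference (current pixel minus left neighbour) so that the decoder can recover the values exactly. The names `encode_row` and `decode_row` and the sample values are assumptions for illustration; real codecs predict blocks using several directional modes.

```python
def encode_row(pixels):
    """Keep the first pixel as-is; replace each later pixel by (pixel - left neighbour)."""
    codes = [pixels[0]]
    for i in range(1, len(pixels)):
        codes.append(pixels[i] - pixels[i - 1])
    return codes

def decode_row(codes):
    """Rebuild the row by adding each difference to the previously decoded pixel."""
    pixels = [codes[0]]
    for diff in codes[1:]:
        pixels.append(pixels[-1] + diff)
    return pixels

row = [120, 121, 123, 122, 122, 125]   # made-up, slowly varying luminance values
codes = encode_row(row)
print(codes)                     # [120, 1, 2, -1, 0, 3]
print(decode_row(codes) == row)  # True
```

After the first pixel, every value to be encoded is a small number close to zero rather than a value near 120.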

Inter-frame prediction

As the name implies, inter-frame prediction applies predictive coding to the current frame using data from adjacent frames.
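
In its simplest form, the previous frame itself serves as the prediction of the current frame, and only the per-pixel difference is encoded. The sketch below assumes two tiny made-up 3×3 frames and a hypothetical helper `frame_residual`; it ignores motion entirely.

```python
def frame_residual(current, reference):
    """Per-pixel difference between the current frame and a reference frame."""
    return [
        [c - r for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]

previous = [[10, 10, 10],
            [10, 50, 10],
            [10, 10, 10]]
current = [[10, 10, 10],
           [10, 52, 10],
           [10, 10, 11]]

residual = frame_residual(current, previous)
nonzero = sum(1 for row in residual for value in row if value != 0)
print(residual)  # [[0, 0, 0], [0, 2, 0], [0, 0, 1]]
print(nonzero)   # 2
```

Because consecutive frames are usually similar, most residual values are zero and the rest are small.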

Transform coding

Transform coding applies an orthogonal transform to the image to remove the correlation between spatial pixels. What is its purpose? The image signal, described in the spatial domain, is transformed to the frequency domain, and the resulting frequency-domain coefficients are then encoded. What is the benefit? Generally speaking, an image is strongly correlated in the spatial domain; transforming it to the frequency domain decorrelates the data and concentrates the energy in a small number of coefficients.
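
Energy concentration can be illustrated with a small sketch. The text does not name a specific transform, so the snippet assumes the discrete cosine transform (DCT-II, orthonormal form) as one common example of an orthogonal transform, applied to a made-up row of eight smoothly varying pixels. The function name `dct_1d` is hypothetical.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence (assumed example of an orthogonal transform)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs

row = [100, 102, 104, 106, 108, 110, 112, 114]  # strongly correlated neighbours
coeffs = dct_1d(row)
energy = [c * c for c in coeffs]

# Fraction of the total energy carried by the two lowest-frequency coefficients.
share = sum(energy[:2]) / sum(energy)
print([round(c, 2) for c in coeffs])
print(share)
```

For this input, more than 99% of the energy ends up in the first two coefficients, while an orthonormal transform leaves the total energy unchanged. The remaining coefficients are small, which is what makes the later coding stages effective.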

Quantization coding

Quantization is the process of reducing the precision with which data is represented. Quantization reduces the amount of data that needs to be encoded and thereby achieves compression. It is a lossy technique: a quantized video image cannot be restored losslessly, so there is an error between the original image and the reconstructed image, which is called distortion.
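
A minimal sketch of uniform scalar quantization shows where the distortion comes from. The coefficient values and the step size of 8 are made up, and the helpers `quantize` and `dequantize` are hypothetical; real codecs use more elaborate step-size and rounding rules.

```python
def quantize(values, step):
    """Map each value to an integer level index (the lossy step)."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    """Decoder side: scale the level indices back to approximate values."""
    return [level * step for level in levels]

coeffs = [303, 18, -7, 3, -2, 1, 0, 1]   # made-up transform coefficients
step = 8

levels = quantize(coeffs, step)
restored = dequantize(levels, step)
errors = [c - r for c, r in zip(coeffs, restored)]

print(levels)    # [38, 2, -1, 0, 0, 0, 0, 0]
print(restored)  # [304, 16, -8, 0, 0, 0, 0, 0]
print(errors)    # [-1, 2, 1, 3, -2, 1, 0, 1]
```

The level indices are smaller numbers and many of them are zero, so they are cheaper to encode, but the restored values differ from the originals by up to half a step. That difference is the distortion.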

Entropy coding

Entropy coding rests on the definition of information entropy from Shannon's information theory. See the previous article: [Video Codec -02] purpose, conditions and objectives of video coding

For a video sequence, the original signal always contains redundancy, which is called representation redundancy. If the shortest possible data that still represents all the information in the video can be found, the size of that shortest representation is the entropy. The basic idea of entropy coding is to assign short code words to high-probability symbols and long code words to low-probability symbols, so that the average code length is as short as possible.
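
To make the idea concrete, the sketch below computes the Shannon entropy of a short symbol sequence and the code lengths produced by Huffman coding. Huffman coding is used here only as one well-known entropy coder; the text does not say which coder is meant. The function names and the sample string are assumptions for illustration.

```python
import heapq
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code.

    Note: with a single distinct symbol this sketch returns length 0.
    """
    counts = Counter(symbols)
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

data = "aaaaaaaabbbbccdd"   # probabilities 1/2, 1/4, 1/8, 1/8
lengths = huffman_code_lengths(data)
average = sum(lengths[s] for s in data) / len(data)

print(lengths)             # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(entropy_bits(data))  # 1.75
print(average)             # 1.75
```

The most frequent symbol gets a 1-bit code and the rarest get 3-bit codes. For this particular distribution the average code length equals the entropy of 1.75 bits per symbol, compared with 2 bits per symbol for a fixed-length code.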

As noted in the previous article, video data contains a great deal of redundancy, which is why it needs to be compressed and encoded. The common coding methods for spatial and temporal redundancy are summarized as follows:

Spatial redundancy

  • Prediction: the surrounding pixels are used to predict the value of the current pixel; predictors are usually designed on the basis of adaptive filter theory.
  • Orthogonal transform (transform coding): the pixel matrix of the spatial-domain image signal is transformed to the frequency domain for processing, and data representation and bit allocation are carried out according to how much each frequency component contributes to visual quality.

Temporal redundancy

  • Prediction: time-domain predictive coding
  • Motion-compensated prediction
  • Bidirectional prediction
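
Motion-compensated prediction can be sketched as a search for the displacement that best matches a block of the current frame against the reference frame. The example below is deliberately reduced to one dimension and uses the sum of absolute differences (SAD) as the matching cost. The function names, the sample rows, and the search range are all assumptions for illustration, not a description of any particular codec.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(reference, current, start, size, search_range):
    """Find the displacement of current[start:start+size] in reference with minimal SAD."""
    block = current[start:start + size]
    best = None
    for mv in range(-search_range, search_range + 1):
        pos = start + mv
        if pos < 0 or pos + size > len(reference):
            continue  # candidate block would fall outside the reference row
        cost = sad(block, reference[pos:pos + size])
        if best is None or cost < best[1]:
            best = (mv, cost)
    return best

reference = [10, 10, 80, 90, 80, 10, 10, 10, 10, 10]
current = [10, 10, 10, 10, 80, 90, 80, 10, 10, 10]  # object moved 2 samples right

mv, cost = best_motion_vector(reference, current, start=4, size=3, search_range=3)
print(mv, cost)                               # -2 0
print(sad(current[4:7], reference[4:7]))      # 150
```

Without motion compensation, the block differs from the co-located reference block by a SAD of 150. With the motion vector of −2 the match is exact, so only the vector and a zero residual need to be encoded.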