Definition

The basic idea of predictive coding is not to encode the signal directly, but to use the previous signal to predict the current signal, and to encode and transmit the difference between the current signal and the predicted value.

Why would you code with a difference?

For example, if the image is 4×4 and every pixel is 255, then encoding 255 in binary takes at least 8 bits per pixel. If we assume the predicted value is also 255, the difference is 255 − 255 = 0, so we only need to encode a 0, which greatly reduces the number of bits.
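
As a rough sketch of this idea (toy Python, not codec code), predicting every pixel of that 4×4 block as 255 leaves an all-zero residual block:

```python
# A minimal sketch of difference (residual) coding, assuming a flat 4x4
# block where every pixel is 255 and the predictor also guesses 255.
block = [[255] * 4 for _ in range(4)]
predicted = 255

# Residual = actual value minus predicted value.
residuals = [[pixel - predicted for pixel in row] for row in block]

print(residuals)          # all zeros
# Coding the raw pixels needs 8 bits each; an all-zero residual block
# can be signalled with almost no bits at all.
```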

What is intra prediction?

The purpose of intra-frame prediction is to represent the current frame with much less data than the full frame.

Why is intra-frame prediction possible?

In general, the luma and chroma values of adjacent pixels are very close. Suppose two adjacent pixels are A and B: encode A with its original value, then represent B with the difference C (C = |A − B|). Because the two values are so close, this difference is very small and can be coded with a much shorter code, which saves space.

For a frame of data, how is prediction coding actually carried out?

Intra prediction in H.265/HEVC can be divided into the following three steps:

  1. Determine whether the neighboring reference pixels of the current TU are available, and handle them accordingly;
  2. Filter the reference pixels;
  3. Calculate the predicted pixel values of the current TU from the filtered reference pixels.

TU: transform unit

As shown in the figure above, the current region to be encoded is a TU (N×N). Its reference pixels are divided into five parts: lower-left (A), left (B), upper-left (C), upper (D) and upper-right (E), a total of 4N + 1 samples.

If the current TU is at the edge of the image, or at the edge of a Slice or Tile (H.265/HEVC specifies that adjacent Slices or Tiles cannot reference each other in intra-frame coding), the neighboring reference pixels may not exist or may be unavailable. In addition, in some cases the block containing A or E may not have been encoded yet, so those reference pixels are also unavailable. When a reference pixel does not exist or is unavailable, the H.265/HEVC standard fills it with the nearest available pixel. For example, if the reference pixels of region A do not exist, all reference pixels of region A are filled with the lowermost pixel of region B; if the reference pixels of region E do not exist, all reference pixels of region E are filled with the rightmost pixel of region D. Note that if no reference pixel is available at all, the reference pixels are filled with the fixed value R = 1 << (BitDepth − 1).
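
A minimal sketch of that substitution rule, assuming the reference samples are stored in a single list running from the bottom of region A up to the right end of region E (the helper name and layout are illustrative, not the normative HEVC reference code):

```python
def fill_reference_samples(refs, bit_depth=8):
    """Sketch of reference-sample substitution for one TU.

    `refs` holds the 4N+1 samples ordered from the bottom of region A,
    up the left edge (B), through the corner (C), then along the top (D)
    and top-right (E). Unavailable samples are None.
    """
    if all(s is None for s in refs):
        # Nothing usable: fill everything with the mid value 1 << (BitDepth - 1).
        return [1 << (bit_depth - 1)] * len(refs)

    out = list(refs)
    # If the very first sample is missing, copy the nearest available one.
    if out[0] is None:
        out[0] = next(s for s in out if s is not None)
    # Every remaining gap is filled from its predecessor, i.e. the nearest
    # available sample below / to the left of it in scan order.
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    return out

# Example: region A (the first 4 samples) is missing for a 4x4 TU,
# so it gets filled with the lowermost pixel of region B (80).
refs = [None, None, None, None,          # A (missing)
        80, 82, 84, 86,                  # B
        90,                              # C
        100, 102, 104, 106,              # D
        110, 112, 114, 116]              # E
print(fill_reference_samples(refs))
```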

Flow chart of intra-frame prediction coding

  1. First (the blue part of the figure above): suppose I need to encode a pixel X. Before encoding it, I already have a reference pixel X′, which comes from an already-coded neighboring pixel of the same frame. From the value of X′ I derive a predicted value Xp.
  2. Then (the red part of the figure above): I subtract the predicted value Xp from the pixel X being encoded to obtain the residual d, and it is d, rather than the original value X, that is encoded into the bitstream, saving bit rate.
  3. Finally (the black part of the figure above): the residual d is added back to the predicted value Xp to obtain X′, which is used to predict the next pixel (a short sketch of this loop follows the list).
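
A toy sketch of that predict / subtract / reconstruct loop, assuming a 1-D row of pixels and a trivial "previous reconstructed value" predictor (real codecs use the block-based modes described below):

```python
# Toy sketch of the predict / subtract / reconstruct loop for a 1-D row.
def encode(pixels, first_prediction=128):
    reconstructed = first_prediction
    residuals = []
    for x in pixels:
        xp = reconstructed          # prediction from the reference X'
        d = x - xp                  # residual actually transmitted
        residuals.append(d)
        reconstructed = xp + d      # X' used for the next pixel
    return residuals

def decode(residuals, first_prediction=128):
    reconstructed = first_prediction
    pixels = []
    for d in residuals:
        reconstructed = reconstructed + d
        pixels.append(reconstructed)
    return pixels

row = [100, 102, 101, 105, 110]
res = encode(row)
print(res)                  # small numbers: [-28, 2, -1, 4, 5]
print(decode(res) == row)   # True
```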

The steps are simple, but a few points need to be cleared up. In principle we could predict pixel by pixel, but that is far too slow: because of the nature of intra prediction, the only pixels the current pixel can reference are its already-coded neighbors, while the pixels after it have not been coded yet and therefore cannot be used, so we would have to encode one pixel at a time, one after another. Instead, prediction is done on blocks. A macroblock is 16×16 pixels and can be divided into sub-blocks as small as 4×4 (these are the luma sizes; for chroma in 4:2:0, the chroma macroblock is half as wide and half as tall as the luma macroblock), which greatly speeds up the computation. In what follows, "value" can therefore mean either "pixel" or "block"; the principle is the same, but in practice blocks are used, so I will stick to the word "block".

The question then is how X′ and Xp are derived. Xp is the predicted value; to make the difference between the predicted value and the original value as small as possible, it is calculated with a prediction formula from the five reference parts A, B, C, D and E. Once we have Xp, we take its difference from the original value.

What is this formula? According to the white paper, intra prediction of luma comes in two block sizes: 4×4 and 16×16.

For a 4×4 luma block there are nine prediction modes, as shown in the figure below.

In prediction mode 0 (Vertical), the 16 pixel values of the current block are determined entirely by the four pixels A, B, C and D in the bottom row of the block above it: every Xp in the first column equals A, every Xp in the second column equals B, and so on.

The same is true for prediction mode 1 (Horizontal), in which the block is determined entirely by the four pixels I, J, K and L on the left in the figure.

In prediction mode 2 (DC), all sixteen predicted values are equal, namely the average of the eight pixels A, B, C, D, I, J, K and L.
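
A minimal sketch of how modes 0–2 build the 4×4 prediction, assuming the reference pixels A–D (above) and I–L (left) are already available; the function name is illustrative:

```python
# Sketch of 4x4 luma prediction modes 0-2.
# above = [A, B, C, D], left = [I, J, K, L]; returns a 4x4 list of rows.
def predict_4x4(mode, above, left):
    if mode == 0:                      # Vertical: each column copies A..D
        return [above[:] for _ in range(4)]
    if mode == 1:                      # Horizontal: each row copies I..L
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:                      # DC: average of the 8 neighbors (+4 rounds)
        dc = (sum(above) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

above = [100, 110, 120, 130]   # A, B, C, D
left = [90, 92, 94, 96]        # I, J, K, L
for row in predict_4x4(2, above, left):
    print(row)
```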

In prediction modes 3–8, each predicted pixel is a weighted combination of the 13 reference pixels (A–L plus the corner pixel M), with unequal weights given by a formula.

For mode 3 (Diagonal Down-Left): a = (A + 2B + C + 2) / 4 (the + 2 here rounds the result); b and e = (B + 2C + D + 2) / 4; c, f and i = (C + 2D + E + 2) / 4; d, g, j and m = (D + 2E + F + 2) / 4; h, k and n = (E + 2F + G + 2) / 4; l and o = (F + 2G + H + 2) / 4; p = (G + 2H + H + 2) / 4 = (G + 3H + 2) / 4.
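
A sketch of mode 3 under these formulas, assuming the eight upper reference pixels A–H are held in a list (illustrative code, not the reference implementation):

```python
# Sketch of mode 3 (Diagonal Down-Left) for a 4x4 block.
# refs = [A, B, C, D, E, F, G, H] are the pixels above and above-right.
def predict_diag_down_left(refs):
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + y                  # position along the 45-degree diagonal
            if i < 6:
                # (1, 2, 1)/4 weighted average; the +2 rounds the result
                pred[y][x] = (refs[i] + 2 * refs[i + 1] + refs[i + 2] + 2) // 4
            else:                      # bottom-right pixel p = (G + 3H + 2) / 4
                pred[y][x] = (refs[6] + 3 * refs[7] + 2) // 4
    return pred

A, B, C, D, E, F, G, H = 100, 104, 108, 112, 116, 120, 124, 128
for row in predict_diag_down_left([A, B, C, D, E, F, G, H]):
    print(row)
```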

In mode 4 (Diagonal Down-Right), the weighting coefficients are also (1, 2, 1)/4: a, f, k and p are calculated from the three pixels I, M and A, i.e. (I + 2M + A + 2) / 4, and so on.
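
Mode 4 can be sketched the same way, assuming the above row A–D, the left column I–L and the corner pixel M are available (again illustrative, not the reference implementation):

```python
# Sketch of mode 4 (Diagonal Down-Right) for a 4x4 block.
def predict_diag_down_right(above, left, corner):
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x == y:                 # main diagonal a, f, k, p
                pred[y][x] = (above[0] + 2 * corner + left[0] + 2) // 4
            elif x > y:                # above the diagonal: filter M, A..D
                i = x - y
                refs = [corner] + above
                pred[y][x] = (refs[i - 1] + 2 * refs[i] + refs[i + 1] + 2) // 4
            else:                      # below the diagonal: filter M, I..L
                i = y - x
                refs = [corner] + left
                pred[y][x] = (refs[i - 1] + 2 * refs[i] + refs[i + 1] + 2) // 4
    return pred

above = [100, 110, 120, 130]   # A, B, C, D
left = [90, 92, 94, 96]        # I, J, K, L
corner = 105                   # M
for row in predict_diag_down_right(above, left, corner):
    print(row)
```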

In mode 5 (Vertical Right), a and j = (M + A + 1) / 2, b and k are the mean of A and B, c and l are the mean of B and C, and d is the mean of C and D (the rest of that extension line falls outside the 13 reference pixels, so d has no partner). e and n lie on the extension line through M, so they equal (I + 2M + A + 2) / 4, and f/o, g/p, h, i and m are calculated in the same way.

Note that for mode 8 (Horizontal Up), the six pixels k, l, m, n, o and p are all equal to L, because their extension lines pass below L, where there are no reference pixels left to average, so L itself is used instead.

For example,

P is the predicted value and Q is the residual

Conclusion:

  1. The first step of prediction is to take part of the frame's original data as the reference for encoding. Next, the predicted value of the corresponding TU is calculated by formula and the difference (residual) is obtained. Finally, during decoding, the residual is added to the predicted value to recover the original value.
  2. There is still a gap between the "original value" obtained after decoding and the true original value; the closer the two are, the better.
  3. Methods to evaluate the quality of a prediction: SAD and SATD.
  4. When coding a block we must include the residual and also signal which prediction mode was used. Nine modes are slightly awkward to signal: 3 bits cover exactly eight modes and 4 bits cover sixteen, so a ninth mode would waste most of a 4-bit code space. The trick is to spend 1 bit saying "the current block uses the same mode as predicted from its neighbors", because this happens very often (for example, the blocks along a straight edge usually share the same prediction direction). If it is the same, only that 1 bit is stored; if not, a few more bits (in H.264, 3 more bits selecting one of the remaining eight modes) are stored. On average this saves bits. A toy sketch of this signalling follows the list.
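
A toy sketch of that signalling trick (the bit-writing is simplified to a string of '0'/'1' characters; in H.264 the corresponding syntax elements are prev_intra4x4_pred_mode_flag and rem_intra4x4_pred_mode):

```python
# Toy sketch of most-probable-mode signalling for 9 intra modes.
def encode_mode(mode, predicted_mode):
    if mode == predicted_mode:
        return "1"                      # 1 bit: "same as the predicted mode"
    # Otherwise: flag bit 0, then 3 bits choosing among the 8 remaining modes.
    rem = mode if mode < predicted_mode else mode - 1
    return "0" + format(rem, "03b")

print(encode_mode(2, predicted_mode=2))   # '1'    -> 1 bit
print(encode_mode(7, predicted_mode=2))   # '0110' -> 4 bits
```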
