The concept of I-frames, P-frames, and B-frames is fundamental to the field of video compression. These three frame types are used in specific situations to improve the codec's compression efficiency, the video quality of the compressed stream, and the stream's resilience to transmission and storage errors and failures.

Translation | Alex

Technology review | zhao

This article was written by Krishna Rao Vijayanagar of OTTVerse.

I-frames, P-frames and B-frames | Easy-Tech #002

In this article, we will learn how I, P, and B frames work and what they are used for.

Well, let’s start with the most basic concepts in modern video compression: intra-frame and inter-frame prediction.

Intra-frame prediction and inter-frame prediction

In this article, I will not go into the technical details of intra- and inter-frame prediction, but I will explain why they exist and why they are useful.

Take the picture below for example. The image shows two video frames (adjacent to each other) with a rectangular block of black pixels moving through them. In the first frame, the block is on the left side of the image, and in the second frame, it has moved to the right.

If I wanted to compress frame 2 with a modern video encoder (such as H.264 or HEVC), I would do this:

1. Divide each frame into blocks of pixels (macroblocks) and compress them one by one.

2. To compress each macroblock, first search the current frame and the surrounding frames for a macroblock similar to the one being compressed.

3. Record the location of the best-matching macroblock (which frame it is in, and where it lies within that frame). The difference between the two macroblocks is then compressed and sent to the decoder along with this location information.
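The three steps above can be sketched as an exhaustive block-matching search. This is a minimal illustration, assuming frames are 2-D lists of luma values and using the sum of absolute differences (SAD) as the matching cost; the function names are hypothetical, and real H.264 or HEVC encoders use much faster search patterns and sub-pixel precision. The motion vector here is the offset from the current block's position to the best-matching block in the reference frame.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the n x n block at (rx, ry) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement within +/- search_range pixels and return the
    motion vector (dx, dy) of the lowest-SAD match plus the residual block."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if not (0 <= rx <= w - n and 0 <= ry <= h - n):
                continue  # candidate would fall outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    # The residual (current block minus matched block) is what gets transformed and coded.
    residual = [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
    return best_mv, residual


def make_frame(w, h, block_x, block_y, size=2, bg=255, fg=0):
    """White frame with a black size x size block at (block_x, block_y)."""
    frame = [[bg] * w for _ in range(h)]
    for y in range(block_y, block_y + size):
        for x in range(block_x, block_x + size):
            frame[y][x] = fg
    return frame


# The black block sits at x=1 in frame 1 and has moved right to x=5 in frame 2.
frame1 = make_frame(8, 4, block_x=1, block_y=1)
frame2 = make_frame(8, 4, block_x=5, block_y=1)
mv, residual = full_search(frame2, frame1, bx=5, by=1, n=2, search_range=4)
print(mv)        # (-4, 0): the match lies 4 pixels to the left in frame 1
print(residual)  # [[0, 0], [0, 0]]: nothing left to code
```

Only the motion vector and the (here all-zero) residual would need to be sent, instead of the block's raw pixels.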

Take a look at the image below. What is the best way to compress the macroblock in frame 2 (marked by the red box)?

1. First, I can look at frame 1 and find the matching macroblock. It seems to have moved by about a frame’s width (a bit less, I know) and is at roughly the same height as the block in frame 2. That gives us a motion vector.

2. Next, I search within the same frame and quickly discover that the macroblock marked by the red box is identical to the macroblock above it. So I can tell the decoder to copy that macroblock without searching any other frame. In this case the motion vector is minimal (if there is one at all).

Now let’s look at the next example. If we want to compress the macroblock containing the blue sphere in frame 2, what should we do?

Should we search within the same frame, or in a previously encoded frame?

1. First, I look at frame 1 and find the matching sphere, which seems to have moved by about a frame’s width (a bit less, I know) and up a little. That gives us our motion vector. The difference between the two macroblocks containing the sphere also appears to be very small (presumably).

2. Then I search within the same frame and immediately realize that no other macroblock in it contains a sphere. No luck this time: there is no matching macroblock in the same frame.

What have we learned from the examples above?

1. The encoder searches for matching macroblocks to reduce the amount of data that needs to be transmitted. This is done through motion estimation and compensation, which let the encoder determine the horizontal and vertical displacement of the macroblock in another frame.

2. The encoder can search for matching macroblocks in the same frame (intra-frame prediction) and in adjacent frames (inter-frame prediction).

3. The encoder compares the inter-frame and intra-frame prediction results for each macroblock and selects the best one. This process is called “mode selection,” and I think it’s the most central part of a video encoder.
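Mode selection can be sketched as picking the candidate prediction with the lowest cost. The sketch below assumes a simple rate-distortion style cost J = D + lam * R, where D is the SAD between the block and a candidate prediction and R is an estimated bit count; the prediction blocks, bit counts, and function names are made-up inputs for illustration, not values produced by any real encoder.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def choose_mode(block, candidates, lam):
    """Return (mode, cost) for the candidate minimising J = D + lam * R.

    candidates maps a mode name to (predicted_block, estimated_bits):
    D is the SAD between block and its prediction, R the estimated bit cost.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (pred, bits) in candidates.items():
        cost = sad(block, pred) + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


block = [[10, 12], [11, 13]]
candidates = {
    # Flat prediction from neighbouring pixels: imperfect, but cheap to signal (hypothetical bits).
    "intra": ([[11, 11], [11, 11]], 4),
    # Exact match found in another frame, but the motion vector costs more bits (hypothetical).
    "inter": ([[10, 12], [11, 13]], 10),
}
print(choose_mode(block, candidates, lam=1.0))  # ('intra', 8.0)
print(choose_mode(block, candidates, lam=0.2))  # ('inter', 2.0)
```

With lam=1.0 the cheap-to-signal intra prediction wins even though the inter prediction matches exactly; lowering lam to 0.2 makes distortion matter more and flips the decision to inter.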

Now that we have had a quick introduction to intra- and inter-frame prediction, let’s learn about I-, P-, and B-frames!

What is an I-frame?

An I-frame (also called a keyframe or intra-frame) consists only of macroblocks that use intra-frame prediction.

Each macroblock in an I-frame can only be matched against other macroblocks within the same frame, which means it can only be compressed by exploiting “spatial redundancy.” Spatial redundancy refers to the similarity between pixels within a single frame.

I-frames come in different types depending on the codec, such as IDR, CRA, or BLA. What these types have in common is that none of them use temporal prediction.

There are many uses for I-frames, and we’ll look at them after we learn about P-frames and B-frames.
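As one concrete example of exploiting spatial redundancy, H.264 includes a DC intra prediction mode that predicts a block from the mean of its already-decoded neighbours. The sketch below is a simplified version of that idea, assuming a frame stored as a 2-D list and a block whose top and left neighbours exist; the function names are hypothetical, and real codecs also offer directional modes (vertical, horizontal, diagonal) and choose among them.

```python
def dc_predict(frame, bx, by, n):
    """Simplified DC intra prediction for the n x n block at (bx, by).

    Every pixel is predicted as the rounded mean of the n pixels directly above
    the block and the n pixels directly to its left, which the decoder has
    already reconstructed. Assumes n is a power of two and both neighbours exist.
    """
    top = frame[by - 1][bx:bx + n]
    left = [frame[by + y][bx - 1] for y in range(n)]
    dc = (sum(top) + sum(left) + n) // (2 * n)  # integer mean with rounding
    return [[dc] * n for _ in range(n)]


def intra_residual(frame, bx, by, n):
    """Residual that would be coded: actual pixels minus the DC prediction."""
    pred = dc_predict(frame, bx, by, n)
    return [[frame[by + y][bx + x] - pred[y][x] for x in range(n)] for y in range(n)]


# A flat grey area with a faint checkerboard texture inside the 4 x 4 block at (1, 1).
frame = [[100] * 5 for _ in range(5)]
for y in range(1, 5):
    for x in range(1, 5):
        frame[y][x] = 101 if (x + y) % 2 else 99

print(dc_predict(frame, 1, 1, 4)[0])      # [100, 100, 100, 100]
print(intra_residual(frame, 1, 1, 4)[0])  # [-1, 1, -1, 1]
```

Because neighbouring pixels are similar, the residual values stay tiny (here +/-1), which is what makes them cheap to code.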

What is a P-frame?

A P-frame (predicted frame) can be compressed using temporal prediction in addition to spatial prediction. P-frames perform motion estimation by referring to previously encoded frames. Each macroblock in a P-frame can be:

  • Temporally predicted

  • Spatially predicted

  • Skipped (the decoder copies the co-located macroblock from the previous frame, i.e., it uses a zero motion vector)

I made an illustration to show the main points. The figure above shows I-frames and P-frames. As discussed earlier, P-frames refer to previous I-frames or P-frames. In the figure, frames are encoded and decoded in the same order in which they are presented to the user, because each P-frame references only earlier pictures.
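From the decoder’s point of view, the three P-frame macroblock options listed above can be sketched as follows. The dictionary layout and function names are hypothetical, chosen only to make the three cases explicit; the intra case simply takes a ready-made spatial prediction rather than computing one.

```python
def copy_block(ref, x, y, n):
    """n x n block of ref whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in ref[y:y + n]]


def add_residual(pred, residual):
    """Element-wise sum of a prediction block and a residual block."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, residual)]


def reconstruct_p_block(mb, ref, bx, by, n):
    """Rebuild the n x n block at (bx, by) of a P-frame from the previous frame ref.

    mb describes how the block was coded (hypothetical layout, for illustration):
      {"mode": "skip"}                                   copy co-located block, zero motion vector
      {"mode": "inter", "mv": (dx, dy), "residual": R}   temporal prediction + residual
      {"mode": "intra", "pred": P, "residual": R}        spatial prediction + residual
    """
    mode = mb["mode"]
    if mode == "skip":
        return copy_block(ref, bx, by, n)
    if mode == "inter":
        dx, dy = mb["mv"]
        pred = copy_block(ref, bx + dx, by + dy, n)
    elif mode == "intra":
        pred = mb["pred"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return add_residual(pred, mb["residual"])


ref = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(reconstruct_p_block({"mode": "skip"}, ref, 0, 0, 2))  # [[0, 1], [4, 5]]
inter_mb = {"mode": "inter", "mv": (2, 2), "residual": [[1, 0], [0, 1]]}
print(reconstruct_p_block(inter_mb, ref, 0, 0, 2))  # [[11, 11], [14, 16]]
```

A skipped block costs almost nothing to signal, which is why static backgrounds compress so well in P-frames.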

What is a B-frame?

B-frames can refer to frames that appear both before and after them; the B stands for bi-directional. If your video codec uses macroblock-based compression (as H.264/AVC does), each macroblock in a B-frame can be:

  • Predicted backward (from future frames)

  • Predicted forward (from past frames)

  • Intra-coded only, with no inter-frame prediction

  • Skipped completely

Because a B-frame can reference, and interpolate between, two (or more) frames that occur before and after it in time, it can significantly reduce the frame size while maintaining video quality. B-frames can exploit both spatial redundancy and temporal redundancy (in past and future frames), which makes them very useful in video compression.

However, B-frames are resource-intensive on both the encoding side and the decoding side. Let’s see why.

To understand what B-frames do, we need to understand the difference between render (display) order and decode order.

Take I- and P-frames, for example. If you use only these two frame types, each frame references either only itself (I-frame) or a previous frame (P-frame). Frames can therefore enter and leave the encoder in the same order, and the render (display) order is the same as the encoding and decoding order.
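A simple way to see why bi-directional prediction helps is to average a forward (past) and a backward (future) prediction. The sketch assumes H.264-style default bi-prediction, a rounded average (a + b + 1) >> 1 without explicit weights, applied to an object whose brightness changes steadily between the two reference frames; the pixel values and function names are invented for illustration.

```python
def bipredict(fwd, bwd):
    """Rounded average of a forward (past) and a backward (future) prediction."""
    return [[(a + b + 1) >> 1 for a, b in zip(ra, rb)] for ra, rb in zip(fwd, bwd)]


def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# An object brightening steadily: 100 in the past frame, 120 in the future one.
past = [[100, 100], [100, 100]]
future = [[120, 120], [120, 120]]
current = [[110, 110], [110, 110]]

print(sad(current, past))                     # 40
print(sad(current, future))                   # 40
print(sad(current, bipredict(past, future)))  # 0
```

Either reference alone leaves a residual of 10 per pixel, while their average predicts the current block exactly, so there is nothing left to code.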

 

But what happens when a frame references a frame that will be displayed in the future? This is exactly the situation with B-frame compression. The figure below shows the structure of a GOP (Group of Pictures), which is a group of consecutive pictures. Each mini-GOP uses two B-frames and one P-frame, giving the pattern IBBPBBP.

Because those B-frames reference the P-frame that follows them, the encoder has to encode that P-frame before them, so the encoding order differs from the display order. The decoder works the same way.

In decoding order, the decoder decodes frame 1 (the I-frame) and then frame 2 (the P-frame). But it cannot display that P-frame yet, because it is actually frame 4 in display order! So the decoder has to hold it in a buffer until its display time comes.

So encoders and decoders need to maintain two queues (or lists) in memory: one that holds frames in the correct display order, and one that holds them in the order required for encoding and decoding.

Because of this reordering, B-frames increase the required decoder buffer size and add latency.

This is why many systems strictly limit the number of frames a B-frame may use as references. Along the same lines, the H.264/AVC Baseline Profile does not allow B-frames or B-slices, because it targets low-end devices.
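For the IBBPBBP structure described above, the decoding order can be derived with a few lines of code. This sketch assumes each B-frame references only the nearest I- or P-frame before and after it (no reference B-frames) and numbers frames from 1 in display order; the function name is hypothetical.

```python
def decode_order(display_types):
    """Frame numbers (1-based, in display order) listed in the order they must be decoded.

    Assumes every B-frame references only the nearest I/P-frame on each side, so
    the following I/P-frame has to be coded before the B-frames shown ahead of it.
    """
    order, waiting_b = [], []
    for num, ftype in enumerate(display_types, start=1):
        if ftype == "B":
            waiting_b.append(num)    # needs a future reference that is not coded yet
        else:
            order.append(num)        # I- or P-frame: code it now...
            order.extend(waiting_b)  # ...then the B-frames that were waiting for it
            waiting_b = []
    if waiting_b:
        raise ValueError("B-frames at the end have no future reference frame")
    return order


gop = "IBBPBBP"
order = decode_order(gop)
print(order)                                        # [1, 4, 2, 3, 7, 5, 6]
print(" ".join(f"{gop[n - 1]}{n}" for n in order))  # I1 P4 B2 B3 P7 B5 B6
```

Frame 4 (a P-frame) is decoded second, right after the I-frame, even though it is displayed fourth.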
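On the decoder side, the display-order queue can be sketched as a small min-heap keyed by display number. This is a simplified model, assuming consecutive display numbers starting at 1; the function name is hypothetical, and real decoders track picture order counts and follow the buffering limits signalled in the bitstream.

```python
import heapq


def display_from_decode(decoded):
    """Reorder (display_number, frame_type) pairs from decode order into display order.

    A frame that arrives early (e.g. a P-frame decoded before the B-frames shown
    ahead of it) waits in a min-heap until every earlier frame has been output.
    Returns the output sequence and the largest number of frames held at once.
    """
    heap, shown, next_to_show, max_held = [], [], 1, 0
    for num, ftype in decoded:
        heapq.heappush(heap, (num, ftype))
        max_held = max(max_held, len(heap))
        while heap and heap[0][0] == next_to_show:
            shown.append(heapq.heappop(heap))
            next_to_show += 1
    return shown, max_held


decoded = [(1, "I"), (4, "P"), (2, "B"), (3, "B"), (7, "P"), (5, "B"), (6, "B")]
shown, max_held = display_from_decode(decoded)
print("".join(t for _, t in shown))  # IBBPBBP
print(max_held)                      # 2
```

For IBBPBBP, at most two frames wait in the buffer at once; that extra memory and the wait it implies are the cost of B-frame reordering.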

Reference and non-reference B-frames

As we learned above, a B-frame can refer to two or more frames, usually one before it and one after it. We also know that an I-frame refers to no other frame and a P-frame refers only to previous frames. So here is the question: can any frame use a B-frame as its reference?

The answer is yes.

  • A B-frame that is used as a reference frame is called a reference B-frame.

  • A B-frame that is not used as a reference frame is called a non-reference B-frame.

It is important to mark reference and non-reference B-frames in the bitstream, because the decoder needs to keep reference frames in the DPB (Decoded Picture Buffer).

If a frame is marked as a non-reference B-frame but is then used as a reference, the decoder is likely to fail or even crash, because it will most likely have discarded that frame after decoding and displaying it.

Most encoders quantize reference B-frames more finely (giving them higher quality) than non-reference B-frames, which reduces the propagation of quality loss to the frames that depend on them.
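The failure described above can be sketched with a toy decoded picture buffer that keeps only frames marked as references. The class and method names are hypothetical, and a real DPB also handles output ordering, capacity limits, and reference-marking commands.

```python
class DecodedPictureBuffer:
    """Toy DPB that stores only frames marked as references."""

    def __init__(self):
        self._refs = {}

    def add(self, frame_id, pixels, is_reference):
        if is_reference:
            self._refs[frame_id] = pixels
        # Non-reference frames are output and then dropped; nothing is kept.

    def get_reference(self, frame_id):
        if frame_id not in self._refs:
            raise LookupError(f"{frame_id} is not in the DPB (it was not marked as a reference)")
        return self._refs[frame_id]


dpb = DecodedPictureBuffer()
dpb.add("P4", "pixels of P4", is_reference=True)
dpb.add("B2", "pixels of B2", is_reference=False)

print(dpb.get_reference("P4"))  # pixels of P4
try:
    dpb.get_reference("B2")     # a later frame wrongly uses B2 as a reference
except LookupError as err:
    print("decode error:", err)
```

Marking frames correctly lets the decoder free memory for non-reference frames immediately while keeping every frame that will be needed later.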

Using I-, P-, and B-frames in video compression and streaming

Now that we understand how I, P, and B frames work, let’s address an important question: Why use them?

In the following sections, we’ll look at the most important use cases for I, P, and B frames in video compression.

Where are I-frames used?

As we learned in the previous section, I-frames can be encoded and decoded independently, which makes them widely used in video compression.

Refreshing video quality

An I-frame is usually inserted to mark the end of a GOP (or video segment). Because I-frame compression does not depend on previously encoded frames, an I-frame refreshes the video quality. Since I-frames play such an important role in maintaining quality, encoders generally spend more bits on them and encode them at higher quality. After encoding a high-quality I-frame, the encoder can use it as a reference picture when compressing the P-frames and B-frames that follow.

So are I-frames used only to refresh video quality? Not at all.

Recovering from bitstream errors

As we said earlier, I-frames can be encoded and decoded independently. This means that I-frames can be used to recover from catastrophic failures in video files or video streams.

Let’s see how it does it.

If a P-frame or reference B-frame is corrupted, none of the frames that depend on it can be decoded correctly, which leads directly to visible errors in the video. The video usually cannot recover from such problems on its own. However, once the corrupted stream reaches an I-frame, decoding can recover, because the I-frame is encoded and decoded independently.

Such an I-frame is usually an IDR (Instantaneous Decoder Refresh) frame, and not allowing frames to refer to pictures that precede the I-frame is called a closed GOP.

In an ABR stream, each new video segment typically starts with an IDR frame. This ensures that each segment can be decoded independently of the others, so the video keeps playing even if some segments are corrupted or lost because of transmission problems.
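How far the damage spreads can be sketched for the simplest case, assuming an I/P-only stream in display order where each P-frame references the frame immediately before it; with B-frames or multiple reference frames the dependency graph is more complex. The function name is hypothetical.

```python
def undecodable_frames(types, corrupted):
    """1-based frame numbers that cannot be decoded correctly once frame `corrupted` is damaged.

    Assumes an I/P-only stream where every P-frame references the frame
    immediately before it, so the damage propagates until the next I-frame.
    """
    bad = []
    for num in range(corrupted, len(types) + 1):
        if num > corrupted and types[num - 1] == "I":
            break  # an I-frame needs no earlier frame: decoding recovers here
        bad.append(num)
    return bad


stream = "IPPPPIPPP"
print(undecodable_frames(stream, corrupted=3))  # [3, 4, 5]
print(undecodable_frames(stream, corrupted=7))  # [7, 8, 9]
```

The I-frame at position 6 is where the first error stops spreading; the second error has no later I-frame, so it lasts until the end of the stream.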

Trick modes (fast-forward and rewind)

Finally, keyframes are crucial for Trick Modes!

If you want to fast-forward or rewind in a video, you need an I-frame at the point where playback resumes, right?

Suppose you seek to a P-frame or B-frame, but the decoder has already removed its reference frames from memory. How would you reconstruct it? Instead, the video player looks for a starting point (an I-frame), decodes it successfully, and starts playing from there.
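Seeking to the nearest preceding keyframe can be sketched with a binary search over keyframe timestamps. This assumes keyframes are placed at a fixed interval starting at time 0; the function names are hypothetical. With 20-second spacing, a request for 35 s lands on the keyframe at 20 s, while 2-second spacing lands on 34 s.

```python
import bisect


def keyframe_times(duration_s, interval_s):
    """Timestamps (in seconds) of I-frames placed every interval_s seconds from 0."""
    return [i * interval_s for i in range(int(duration_s // interval_s) + 1)]


def seek(keyframes, target_s):
    """Timestamp of the last keyframe at or before target_s (keyframes must be sorted)."""
    i = bisect.bisect_right(keyframes, target_s) - 1
    return keyframes[max(i, 0)]


sparse = keyframe_times(60, 20)  # [0, 20, 40, 60]
dense = keyframe_times(60, 2)    # [0, 2, 4, ..., 60]
print(seek(sparse, 35))  # 20
print(seek(dense, 35))   # 34
```

A player that needs frame-accurate seeking would then decode forward from that keyframe up to the requested frame without displaying the intermediate frames.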

And that brings up another interesting thing.

If your keyframes are spaced long apart in the video, say 20 seconds apart, then your users will only be able to fast-forward and rewind in 20-second increments, which is a terrible experience!

If you insert too many keyframes, fast-forwarding and rewinding work great, but the video becomes too large, which can cause network buffering problems.

So designing the best GOP and mini-GOP structures is really a balancing act.

Where are P-frames and B-frames used?

People often ask: Where, when, and how to use P frames and B frames?

If you understand how P-frames and B-frames work, as described above, you know that they can reduce video size while maintaining video quality. That’s their main purpose! Inserting P-frames and B-frames in the right places reduces the file size or bitrate while still maintaining a given level of video quality.

Depending on the GOP and mini-GOP structure you use, compress P-frames and B-frames (reference or non-reference) with appropriate QP values to reach your target bitrate or video quality.
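Per-frame-type QP assignment can be sketched as a base QP plus an offset. The offsets below are hypothetical example values, not defaults of any particular encoder: frames that other frames depend on (I-frames, then P-frames, then reference B-frames) get lower QPs and therefore higher quality, and results are clamped to H.264’s 0 to 51 QP range for 8-bit video. The function name and the "Bref" label are also hypothetical.

```python
# Hypothetical QP offsets relative to the P-frame QP, for illustration only.
QP_OFFSETS = {"I": -3, "P": 0, "Bref": 1, "B": 2}


def assign_qps(frame_types, base_qp, offsets=QP_OFFSETS, qp_min=0, qp_max=51):
    """Base QP plus a per-type offset, clamped to H.264's 8-bit range [0, 51].

    Lower QP means finer quantisation and higher quality, so frames that other
    frames depend on (I, P, reference B) get lower values than non-reference B-frames.
    """
    return [min(qp_max, max(qp_min, base_qp + offsets[t])) for t in frame_types]


mini_gop = ["I", "B", "Bref", "B", "P"]  # display order, hypothetical hierarchical structure
print(assign_qps(mini_gop, base_qp=30))  # [27, 32, 31, 32, 30]
print(assign_qps(mini_gop, base_qp=50))  # [47, 51, 51, 51, 50]
```

Raising or lowering base_qp moves the whole mini-GOP toward a lower bitrate or higher quality while keeping the relative quality ordering between frame types.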

Summary

I hope this article on I-, P-, and B-frames has deepened your understanding of video compression. To explore them further, download a statically compiled build of FFmpeg, change its GOP and B-frame settings (for example, disabling B-frames), and see how the size and quality of the video change.
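The FFmpeg experiment suggested above could be scripted as follows. This sketch assumes ffmpeg and ffprobe are on your PATH and that libx264 is available; it uses ffmpeg’s -g (GOP length) and -bf (maximum consecutive B-frames) options and ffprobe’s per-frame pict_type field. The input file name, output names, and helper function names are placeholders.

```python
import os
import subprocess
from collections import Counter


def encode_cmd(src, dst, gop, bframes):
    """ffmpeg command: re-encode src with libx264, a GOP of `gop` frames (-g)
    and at most `bframes` consecutive B-frames (-bf; 0 disables them)."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
            "-g", str(gop), "-bf", str(bframes), "-an", dst]


def probe_cmd(path):
    """ffprobe command that prints one picture type (I, P or B) per video frame."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "frame=pict_type", "-of", "csv=p=0", path]


def count_frame_types(probe_output):
    """Count frame types in ffprobe output, ignoring blank lines and stray commas."""
    types = (line.strip().strip(",") for line in probe_output.splitlines())
    return Counter(t for t in types if t)


def compare_bframes(src, gop=50):
    """Encode src with and without B-frames and report size and frame-type counts."""
    for bf in (0, 2):
        dst = f"out_bf{bf}.mp4"  # placeholder output names
        subprocess.run(encode_cmd(src, dst, gop, bf), check=True)
        probe = subprocess.run(probe_cmd(dst), check=True,
                               capture_output=True, text=True)
        print(dst, os.path.getsize(dst), "bytes", dict(count_frame_types(probe.stdout)))
```

Calling compare_bframes("input.mp4") encodes the clip twice, with and without B-frames, and prints the file size and frame-type counts of each output so you can compare them.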

This article is published with the permission of its author, Krishna Rao Vijayanagar. Thank you.

Original link: ottverse.com/i-p-b-frame…

