If you have worked on audio and video projects, you have almost certainly run into the concepts behind H.264. I am writing this article for two reasons: to consolidate my own knowledge, and to give a reference to those who are just starting out in audio and video development.

Basic concepts

H.264, also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is a block-oriented video coding standard based on motion compensation. It is currently one of the most widely used video coding formats. On Android, you can create an encoder with MediaCodec.createEncoderByType("video/avc"). With FFmpeg, you can create a software encoder with avcodec_find_encoder(AV_CODEC_ID_H264) or avcodec_find_encoder_by_name("libx264"). Since the topic of this article is analyzing the H.264 stream, I won't go further into encoding. Let's get straight to some common concepts.

GOP

A GOP (Group of Pictures) is the group of frames between two I frames: it starts at one I frame and runs up to, but does not include, the next I frame.
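To make the definition concrete, here is a minimal sketch that splits a sequence of frame types into GOPs. The function name split_into_gops and the sample sequence are made up for illustration; a real parser would read the frame types from the stream.

```python
def split_into_gops(frame_types):
    """Split a sequence of frame types ('I', 'P', 'B') into GOPs.

    Each GOP starts at an I frame and runs up to, but does not include,
    the next I frame.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append([])  # an I frame opens a new group
        gops[-1].append(frame_type)
    return gops


# Hypothetical frame sequence, in display order.
frames = list("IBBPBBPIBBPBBP")
for index, gop in enumerate(split_into_gops(frames)):
    print(index, "".join(gop), "length:", len(gop))
```

With this sample, each group begins with its single I frame, which matches feature 6 of the I frame listed in the next section.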

I frame

An I frame is also called a keyframe. You can think of it as the complete image of one frame: an I frame can be decoded directly, on its own.

Features:

1. It is an intra-coded frame: the whole picture is compressed and transmitted on its own, in a way similar to JPEG compression of a still image.

2. When decoding, the complete image can be reconstructed from the I frame's data alone.

3. An I frame describes the details of both the image background and the moving subject.

4. I frames are generated without reference to other frames.

5. An I frame is the reference frame for P and B frames, so its quality directly affects the quality of the subsequent frames in the same group.

6. An I frame is the first frame of a GOP, and there is only one I frame in a group.

7. An I frame does not need motion vectors.

8. An I frame carries a relatively large amount of data.

B frame

A B frame is also known as a bidirectionally predicted frame: it records the differences between the current frame and both the preceding and the following frames. In plain terms, decoding a B frame requires not only the previously cached picture but also the decoded picture that follows it. The final picture is restored by combining the preceding and following pictures with the data in this frame.

Features:

1. A B frame is predicted from the preceding I or P frame and the following P frame.

2. A B frame transmits the prediction errors and motion vectors relative to the preceding I or P frame and the following P frame.

3. A B frame is a bidirectionally predicted coded frame.

4. B frames have the highest compression ratio, because they only capture the changes of the moving subject between the two reference frames, which makes the prediction more accurate.

5. A B frame is typically not used as a reference frame, so an error in it does not propagate to other frames.
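Because a B frame needs the reference frame that follows it, frames cannot be decoded in the order they are displayed. The sketch below reorders a display sequence into one possible decode order. It assumes the simple model described above, in which each B frame refers to the nearest I or P frame on either side and is not itself a reference. The frame labels and the function name are illustrative only.

```python
def display_to_decode_order(display_order):
    """Reorder frames so every B frame comes after both of its references.

    `display_order` is a list of labels such as "I0", "B1", "P3", where the
    first character is the frame type. B frames are held back until the next
    I or P frame (their backward reference) has been emitted.
    """
    decode_order = []
    pending_b = []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)      # wait for the following reference
        else:
            decode_order.append(frame)   # I or P: emit the reference first
            decode_order.extend(pending_b)
            pending_b.clear()
    decode_order.extend(pending_b)       # trailing B frames, if any
    return decode_order


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_decode_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In the result, P3 is decoded before B1 and B2 even though it is displayed after them, which is why a decoder has to buffer and reorder pictures when B frames are present.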

P frame

A P frame is also called a forward-predicted coded frame. It represents the difference between the current frame and the previous I or P frame. When decoding, the difference carried by this frame is superimposed on the previously cached picture to generate the final picture.

Features:

1. A P frame is a coded frame that follows the I frame at an interval of 1 to 2 frames.

2. A P frame uses motion compensation to transmit the difference between itself and the preceding I or P frame, together with the motion vectors.

3. When decoding, the complete P frame image is reconstructed by adding the prediction error to the predicted value taken from the reference frame.

4. A P frame uses forward-predicted inter-frame coding. It refers only to the nearest preceding I or P frame.

5. A P frame can serve as the reference frame for the following P frame, or for the B frames before and after it.

6. Since a P frame is a reference frame, it may cause decoding errors to propagate.

7. Because a P frame transmits only differences, its compression ratio is relatively high.
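The "predicted value plus prediction error" idea in features 2 and 3 can be shown with a toy example. This is only a sketch: it works on a single row of pixels, whereas real H.264 works on 2-D blocks and applies a transform and quantization to the residual. All names and numbers here are invented for illustration.

```python
def reconstruct_block(reference, motion_vector, residual, position):
    """Rebuild one 1-D block of a P frame.

    predicted     = the block of the reference frame at position + motion_vector
    reconstructed = predicted + residual (the prediction error)
    """
    start = position + motion_vector
    predicted = reference[start:start + len(residual)]
    return [p + r for p, r in zip(predicted, residual)]


# Reference frame (one row of pixels) held in the decoder's cache.
reference_row = [10, 10, 50, 60, 70, 10, 10, 10]

# The object at indices 2..4 moved two pixels to the right and changed
# slightly, so the block at position 4 points back by -2 and carries a
# small residual.
block = reconstruct_block(reference_row, motion_vector=-2,
                          residual=[1, 0, -1], position=4)
print(block)
# [51, 60, 69]
```

Only the motion vector and the three small residual values need to be transmitted for this block, which is why the compression ratio of a P frame is high compared with an I frame.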

Now that the basic frame concepts are clear, let's analyze the bitstream with code.

Bitstream analysis

A raw H.264 stream, also known as a bare stream, is composed of multiple NALUs (Network Abstraction Layer Units), each preceded by a start code. If the slice carried by a NALU is the start of a frame, the start code is 4 bytes, 0x00 00 00 01; otherwise it is 3 bytes, 0x00 00 01. To analyze an H.264 bitstream, first search the stream for the start codes, i.e. 0x00 00 00 01 or 0x00 00 01, then separate the NALUs, and finally parse their fields. The first byte of each NALU is the NALU header, whose low 5 bits hold the NALU type (nal_unit_type):
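The start code search can be sketched as follows. The function names and the sample bytes are assumptions made for illustration, and the sketch treats any zero byte directly before 0x00 00 01 as part of a 4-byte start code.

```python
def find_start_codes(data):
    """Return a list of (offset, length) for every start code in `data`."""
    codes = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        if i > 0 and data[i - 1] == 0:
            codes.append((i - 1, 4))   # 0x00 00 00 01
        else:
            codes.append((i, 3))       # 0x00 00 01
        i = data.find(b"\x00\x00\x01", i + 3)
    return codes


def split_nalus(data):
    """Return a list of (start_code_length, nalu_bytes), one per NALU."""
    codes = find_start_codes(data)
    nalus = []
    for n, (offset, length) in enumerate(codes):
        begin = offset + length
        # A NALU ends where the next start code begins, or at end of data.
        end = codes[n + 1][0] if n + 1 < len(codes) else len(data)
        nalus.append((length, data[begin:end]))
    return nalus


# Hand-made sample: two NALUs with 4-byte start codes, one with 3 bytes.
stream = bytes.fromhex(
    "00000001" "6742001e"
    "00000001" "68ce3880"
    "000001" "65888440"
)
for length, nalu in split_nalus(stream):
    print(length, nalu.hex())
```

Each NALU returned here starts with its header byte (0x67, 0x68 and 0x65 in the sample), which is the byte to parse next.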

Type     Description
0        Unspecified
1        Coded slice of a non-IDR picture, without data partitioning
2        Coded slice data partition A of a non-IDR picture
3        Coded slice data partition B of a non-IDR picture
4        Coded slice data partition C of a non-IDR picture
5        Coded slice of an IDR picture
6        Supplemental enhancement information (SEI)
7        Sequence parameter set (SPS)
8        Picture parameter set (PPS)
9        Access unit delimiter
10       End of sequence
11       End of stream
12       Filler data
13       Sequence parameter set extension
14       Prefix NAL unit
15       Subset sequence parameter set
16-18    Reserved
19       Coded slice of an auxiliary coded picture, without data partitioning
20       Coded slice extension
21-23    Reserved
24-31    Unspecified
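The type in the table is read from the one-byte NALU header, which consists of forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits). A minimal sketch of unpacking that byte is shown below. The function name, the short type names and the sample header bytes are chosen for illustration only.

```python
# Short names for the types used most often; other values fall back to "other".
NAL_UNIT_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def parse_nalu_header(first_byte):
    """Split the NALU header byte into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01   # must be 0 in a valid stream
    nal_ref_idc = (first_byte >> 5) & 0x03          # 0 means not used for reference
    nal_unit_type = first_byte & 0x1F               # value from the table above
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


for header in (0x67, 0x68, 0x65, 0x41, 0x06):
    _, ref_idc, unit_type = parse_nalu_header(header)
    name = NAL_UNIT_NAMES.get(unit_type, "other")
    print(f"0x{header:02x}: nal_ref_idc={ref_idc} nal_unit_type={unit_type} ({name})")
```

For example, 0x67 is 0110 0111 in binary, so nal_ref_idc is 3 and nal_unit_type is 7, an SPS.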

In actual development, the types used most often are 1, 5, 7, and 8. Let's analyze them with code:

Code analysis

The code is very simple: read the file, search for the start codes, and then read byte by byte. Only the results are posted here; for the detailed code, follow the link.
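For readers without the full source at hand, here is a minimal sketch of those steps. It is not the code behind the link: the function name, the row layout and the sample bytes are assumptions, and the sample is written to a temporary file only so that the example runs on its own.

```python
import os
import re
import tempfile

START_CODE = re.compile(rb"\x00\x00\x01")


def analyze_h264_file(path):
    """Return one (index, nal_unit_type, nal_ref_idc, size) row per NALU."""
    with open(path, "rb") as stream_file:
        data = stream_file.read()
    starts = [match.end() for match in START_CODE.finditer(data)]
    rows = []
    for index, begin in enumerate(starts):
        # Stop just before the next 0x00 00 01, or at the end of the file.
        end = starts[index + 1] - 3 if index + 1 < len(starts) else len(data)
        # A NALU never ends in 0x00, so trailing zeros belong to the
        # next (4-byte) start code or to padding.
        nalu = data[begin:end].rstrip(b"\x00")
        if not nalu:
            continue
        header = nalu[0]
        rows.append((index, header & 0x1F, (header >> 5) & 0x03, len(nalu)))
    return rows


# Hand-made sample stream: SPS, PPS, IDR slice, non-IDR slice.
sample = bytes.fromhex(
    "00000001" "6742001e" "00000001" "68ce3880"
    "00000001" "65888440" "000001" "419a0210"
)
with tempfile.NamedTemporaryFile(suffix=".h264", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name
try:
    rows = analyze_h264_file(sample_path)
finally:
    os.remove(sample_path)

print("index type ref_idc size")
for index, unit_type, ref_idc, size in rows:
    print(f"{index:5d} {unit_type:4d} {ref_idc:7d} {size:4d}")
```

To analyze a real recording, pass the path of a raw .h264 file to analyze_h264_file instead of the temporary sample. Reading the whole file into memory keeps the sketch short; a large stream would be better processed in chunks.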

Reference

  • Lei Xiaohua – Introduction to Video and Audio Data Processing: H.264 Video Stream Parsing