H.264 Introduction

If you have worked on audio and video projects, you are probably already familiar with H.264. So why write this article? First, to consolidate what I have learned; second, to give newcomers to audio and video development something to refer to.

Basic concepts

H.264, also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is a block-oriented video coding standard based on motion compensation. It is currently the most widely used video coding format. On Android, you can create an encoder with MediaCodec.createEncoderByType("video/avc"); alternatively, you can create a software encoder through FFmpeg with avcodec_find_encoder(AV_CODEC_ID_H264) or avcodec_find_encoder_by_name("libx264"). Since the topic of this article is analyzing the H.264 bitstream, I won't go further into encoding here. Let's go straight to some common concepts.

GOP

The group of pictures running from one I frame up to the next I frame is called a GOP (Group of Pictures).
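To make the GOP boundary concrete, here is a small toy sketch in C. It is not part of any codec API: the function name split_gops is hypothetical, frame types are assumed to be given as a string of 'I', 'P' and 'B' characters in display order, and, following the definition above, every I frame is assumed to start a new GOP.

```c
#include <stddef.h>

/* Split a sequence of frame types (display order, e.g. "IBBPBBPIBBP")
 * into GOPs, using the rule "every I frame starts a new GOP".
 * gop_len receives the number of frames in each GOP (at most max_gops).
 * Returns the number of GOPs recorded. Frames before the first I frame,
 * and frames beyond max_gops GOPs, are ignored. */
size_t split_gops(const char *types, size_t *gop_len, size_t max_gops)
{
    size_t count = 0;
    for (size_t i = 0; types[i] != '\0'; i++) {
        if (types[i] == 'I') {
            if (count == max_gops)
                break;                 /* no room for another GOP */
            gop_len[count++] = 1;      /* this I frame opens a new GOP */
        } else if (count > 0) {
            gop_len[count - 1]++;      /* P or B frame joins the current GOP */
        }
    }
    return count;
}
```

For "IBBPBBPIBBP" it returns 2, with GOP lengths 7 and 4.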

I frame

An I frame, also called a keyframe, can be thought of as a complete picture in its own right: it can be decoded directly, without any other frame.

Features:

1. It is an intra-coded frame: the information of the whole picture is compressed and transmitted, in a way similar to JPEG compression.

2. When decoding, the complete image can be reconstructed from the I frame's data alone.

3. The I frame describes the details of both the image background and the moving subjects.

4. I frames are generated without reference to other frames.

5. The I frame is the reference frame for P and B frames (its quality directly affects the quality of the subsequent frames in the same GOP).

6. The I frame is the first frame of a GOP, and each GOP contains only one I frame.

7. I frames do not need motion vectors.

8. I frames carry a large amount of data.

B frame

A B frame, also known as a bidirectionally predicted frame, records the difference between this frame and both the preceding and the following frames. In plain terms, to decode a B frame the decoder needs not only the previously decoded picture but also the picture that comes after it; the final picture is restored by combining the predictions from both sides with the difference stored in the B frame.

Features:

1. A B frame is predicted from the preceding I or P frame and the following P frame.

2. A B frame transmits the prediction error and the motion vectors relative to the preceding I or P frame and the following P frame.

3. A B frame is a bidirectionally predictive-coded frame.

4. B frames have the highest compression ratio: because they only capture how the moving subjects change between the reference frames, the prediction is more accurate.

5. In the classic IBBP structure, a B frame is not used as a reference frame, so decoding errors in it do not propagate (H.264 does allow B frames to serve as references, for example in hierarchical GOP structures).
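Because a B frame needs the following P frame, that P frame must be decoded first, so the decode order differs from the display order (which is why containers carry both a decode timestamp and a presentation timestamp). The C sketch below, with a hypothetical display_to_decode_order function, assumes the classic IBBP pattern in which B frames are never references; encoders that use hierarchical B frames reorder differently.

```c
#include <stddef.h>

/* Convert display order to decode order for a simple IBBP-style sequence.
 * types[i] is 'I', 'P' or 'B' for the frame displayed at position i.
 * Assumption: each B frame references the nearest I/P frame on either side
 * and is never itself a reference, so B frames are held back until the
 * following I/P frame (their backward reference) has been emitted.
 * decode_order receives display indices; returns the number written (n). */
size_t display_to_decode_order(const char *types, size_t n, size_t *decode_order)
{
    size_t out = 0;
    size_t run_start = 0;  /* first display index of the pending B-frame run */
    int have_run = 0;

    for (size_t i = 0; i < n; i++) {
        if (types[i] == 'B') {
            if (!have_run) { run_start = i; have_run = 1; }
            continue;                          /* wait for the backward reference */
        }
        decode_order[out++] = i;               /* I/P frame is decoded first... */
        if (have_run) {
            for (size_t k = run_start; k < i; k++)
                decode_order[out++] = k;       /* ...then the B frames shown before it */
            have_run = 0;
        }
    }
    if (have_run)                              /* trailing Bs with no later anchor */
        for (size_t k = run_start; k < n; k++)
            decode_order[out++] = k;
    return out;
}
```

For "IBBPBBP" (display indices 0 to 6) it produces the decode order 0, 3, 1, 2, 6, 4, 5.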

P frame

A P frame, also called a forward-predicted frame, represents the difference between this frame and the preceding I or P frame. When decoding, the difference stored in this frame is added to the previously decoded (cached) picture to produce the final picture.

Features:

1. A P frame is a coded frame that typically follows the I frame at an interval of 1 to 2 frames.

2. A P frame uses motion compensation: it transmits the difference between itself and the preceding I or P frame, together with the motion vectors.

3. When decoding, the complete P frame is reconstructed by adding the prediction error to the predicted value taken from the reference (I or P) frame.

4. P frames use forward inter-frame prediction: they refer only to the nearest preceding I or P frame.

5. A P frame can serve as the reference frame for the following P frame, and for the B frames before and after it.

6. Because a P frame is a reference frame, decoding errors in it can propagate.

7. Because only differences are transmitted, P frames have a relatively high compression ratio.
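The "prediction plus difference" idea behind P and B frames can be illustrated with a toy one-dimensional model in C. This is only an illustration under simplifying assumptions: real H.264 works on 2-D blocks with sub-pixel motion vectors, the function names are hypothetical, and out-of-range positions are simply clamped to the frame edge. The B-frame average uses (a + b + 1) >> 1, the default (unweighted) bi-prediction rounding in H.264; weighted prediction also exists and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a reconstructed sample to the 8-bit range. */
static uint8_t clip_pixel(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Fetch ref[x + mv], clamping the position to the frame edges. */
static int ref_at(const uint8_t *ref, size_t width, long x, long mv)
{
    long pos = x + mv;
    if (pos < 0) pos = 0;
    if (pos >= (long)width) pos = (long)width - 1;
    return ref[pos];
}

/* P frame: prediction from one forward reference plus the residual. */
void reconstruct_p(const uint8_t *ref, long mv, const int16_t *residual,
                   uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; x++)
        out[x] = clip_pixel(ref_at(ref, width, (long)x, mv) + residual[x]);
}

/* B frame: rounded average of a forward and a backward prediction,
 * (a + b + 1) >> 1, plus the residual. */
void reconstruct_b(const uint8_t *fwd, long mv_fwd,
                   const uint8_t *bwd, long mv_bwd,
                   const int16_t *residual, uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        int a = ref_at(fwd, width, (long)x, mv_fwd);
        int b = ref_at(bwd, width, (long)x, mv_bwd);
        out[x] = clip_pixel(((a + b + 1) >> 1) + residual[x]);
    }
}
```

With ref = {10, 20, 30, 40}, a motion vector of 1 and a residual of 1 everywhere, reconstruct_p yields {21, 31, 41, 41}; the last sample comes from the clamped edge.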

Now that the basic frame types are covered, let's analyze the bitstream with code.

Bitstream analysis

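A raw H.264 stream (the Annex B byte stream, sometimes called a "bare" stream) consists of a sequence of NALUs (Network Abstraction Layer Units), each preceded by a start code. If the slice carried by a NALU begins a frame, the start code is the 4 bytes 0x00 00 00 01; otherwise it is the 3 bytes 0x00 00 01 (parameter sets such as SPS and PPS also use the 4-byte form, and a parser should accept both forms anywhere). To analyze an H.264 bitstream, first search the stream for start codes (0x00 00 00 01 or 0x00 00 01), then split it into NALUs, and finally parse the fields of each one. The NALU header types are: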
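The start code search described above can be sketched in C as follows. The function names are my own (hypothetical), and the buffer is assumed to hold the raw stream in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the length of an Annex B start code at buf[pos]
 * (4 for 00 00 00 01, 3 for 00 00 01), or 0 if there is none. */
size_t start_code_len(const uint8_t *buf, size_t size, size_t pos)
{
    if (pos + 4 <= size && buf[pos] == 0 && buf[pos + 1] == 0 &&
        buf[pos + 2] == 0 && buf[pos + 3] == 1)
        return 4;
    if (pos + 3 <= size && buf[pos] == 0 && buf[pos + 1] == 0 &&
        buf[pos + 2] == 1)
        return 3;
    return 0;
}

/* Scan byte by byte from `from` and return the offset of the next start
 * code, or `size` if none is left. *sc_len receives its length (0 if none). */
size_t find_start_code(const uint8_t *buf, size_t size, size_t from, size_t *sc_len)
{
    for (size_t i = from; i < size; i++) {
        size_t len = start_code_len(buf, size, i);
        if (len) {
            *sc_len = len;
            return i;
        }
    }
    *sc_len = 0;
    return size;
}
```

A 4-byte start code contains a 3-byte one beginning one byte later, so after a match the scan should resume after the whole start code (offset + length), not at offset + 1.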

type	description
0	Unspecified
1	Coded slice of a non-IDR picture, without data partitioning
2	Coded slice data partition A (non-IDR picture)
3	Coded slice data partition B (non-IDR picture)
4	Coded slice data partition C (non-IDR picture)
5	Coded slice of an IDR picture
6	Supplemental enhancement information (SEI)
7	SPS (sequence parameter set)
8	PPS (picture parameter set)
9	Access unit delimiter
10	End of sequence
11	End of stream
12	Filler data
13	Sequence parameter set extension
14	Prefix NAL unit
15	Subset sequence parameter set
16-18	Reserved
19	Coded slice of an auxiliary coded picture without data partitioning
20	Coded slice extension
21-23	Reserved
24-31	Unspecified
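Every NALU begins with a one-byte header right after the start code: 1 bit forbidden_zero_bit, 2 bits nal_ref_idc and 5 bits nal_unit_type, which is why the types in the table run from 0 to 31. A minimal C sketch of decoding that byte follows; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the one-byte NAL unit header that follows the start code. */
typedef struct {
    int forbidden_zero_bit; /* bit 7: must be 0 in a valid stream */
    int nal_ref_idc;        /* bits 6-5: 0 means "not used as a reference" */
    int nal_unit_type;      /* bits 4-0: the type from the table above */
} NalHeader;

NalHeader parse_nal_header(uint8_t b)
{
    NalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc        = (b >> 5) & 0x03;
    h.nal_unit_type      =  b       & 0x1F;
    return h;
}

/* Short names for the types met most often in practice. */
const char *nal_type_name(int type)
{
    switch (type) {
    case 1:  return "SLICE";  /* non-IDR slice */
    case 5:  return "IDR";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    case 9:  return "AUD";
    default: return "OTHER";
    }
}
```

For example, 0x67 decodes to nal_ref_idc 3 and type 7 (SPS), 0x68 to type 8 (PPS) and 0x65 to type 5 (IDR slice), which is why these bytes often appear right after the start codes at the beginning of an .h264 file.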

In practice, the types used most often are 1, 5, 7 and 8. Let's analyze them with code:

Code analysis

The code is simple: it reads the file, searches for start codes and then reads the data byte by byte. I post the results directly here; the full code is available via the link.
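For readers who want to follow the steps without the link, here is my own minimal sketch of the same idea in C; it is not the linked code. parse_h264_file is a hypothetical name, the whole file is loaded into memory, and emulation prevention bytes (0x000003) are ignored because they only matter when parsing a NALU's payload, not when splitting the stream.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Short names for the NALU types that matter most in practice. */
static const char *type_name(int type)
{
    switch (type) {
    case 1: return "SLICE";
    case 5: return "IDR";
    case 6: return "SEI";
    case 7: return "SPS";
    case 8: return "PPS";
    default: return "OTHER";
    }
}

/* Print one NALU: index, offset of its header byte, size, ref_idc, type. */
static void print_nalu(FILE *out, long idx, const uint8_t *buf, size_t off, size_t len)
{
    uint8_t hdr = buf[off];
    fprintf(out, "%5ld | %8zu | %7zu | %d | %2d %s\n",
            idx, off, len, (hdr >> 5) & 0x03, hdr & 0x1F, type_name(hdr & 0x1F));
}

/* Split a raw Annex B file at start codes and print a summary line per NALU.
 * Returns the number of NALUs found, or -1 on I/O or allocation error. */
long parse_h264_file(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    /* Load the whole file: fine for a demo, not for very large streams. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long fsize = ftell(fp);
    if (fsize < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return -1; }
    size_t size = (size_t)fsize;
    uint8_t *buf = malloc(size > 0 ? size : 1);
    if (!buf || fread(buf, 1, size, fp) != size) {
        free(buf);
        fclose(fp);
        return -1;
    }
    fclose(fp);

    fprintf(out, "  NUM |   OFFSET |    SIZE |REF| TYPE\n");
    long count = 0;
    size_t nal_start = 0, i = 0;
    int in_nal = 0;
    while (i + 3 <= size) {
        size_t sc = 0;
        if (i + 4 <= size && buf[i] == 0 && buf[i + 1] == 0 &&
            buf[i + 2] == 0 && buf[i + 3] == 1)
            sc = 4;                              /* 00 00 00 01 */
        else if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            sc = 3;                              /* 00 00 01 */

        if (sc == 0) { i++; continue; }          /* not a start code: next byte */
        if (in_nal && i > nal_start)             /* close the previous NALU */
            print_nalu(out, count++, buf, nal_start, i - nal_start);
        nal_start = i + sc;                      /* header byte follows the start code */
        in_nal = 1;
        i += sc;
    }
    if (in_nal && nal_start < size)              /* last NALU runs to end of file */
        print_nalu(out, count++, buf, nal_start, size - nal_start);

    free(buf);
    return count;
}
```

Calling parse_h264_file("test.h264", stdout) on a typical stream should list an SPS (7) and a PPS (8) before the first IDR slice (5), followed by non-IDR slices (1).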

Reference

Lei Xiaohua, "Introduction to Audio and Video Data Processing: H.264 Video Bitstream Parsing"
