If you have worked on audio and video projects, you are probably already familiar with the concepts behind H.264. Why am I writing this article? Partly to consolidate my own knowledge, and partly to serve as a reference for readers who are just getting started with audio and video.

Basic concepts

H.264, also known as MPEG-4 AVC (Advanced Video Coding), is a block-oriented video coding standard based on motion compensation. It is currently the most widely used video coding format. On Android, you can create a hardware encoder with MediaCodec.createEncoderByType("video/avc"), or create a software encoder through FFmpeg with avcodec_find_encoder(AV_CODEC_ID_H264) or avcodec_find_encoder_by_name("libx264"). Since the topic of this article is analyzing the H.264 stream, I won't go into how to encode it. Let's get straight to some common concepts.
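For reference, here is a minimal sketch of the hard-coding path on Android; the class name, resolution, bit rate, frame rate, and I-frame interval below are illustrative assumptions, not recommendations:

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;

import java.io.IOException;

public class AvcEncoderFactory {

    // Creates and configures a hardware H.264 (AVC) encoder.
    // The bit rate and frame rate below are placeholder values.
    public static MediaCodec createAvcEncoder(int width, int height) throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat(
                MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, width * height * 4);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        // Request one I frame per second; this setting determines the GOP length discussed below.
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

        MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        return encoder;
    }
}
```

Note KEY_I_FRAME_INTERVAL in particular: it controls how often an I frame is emitted, which in turn sets the GOP length discussed next.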

GOP

A GOP (Group Of Pictures) is the group of frames between two I frames. For example, a stream might look like I B B P B B P ... I, where the second I frame starts the next GOP.

I frame

An I frame is also called a keyframe. You can think of it as a complete image of one frame; an I frame can be decoded directly on its own.

Features:

1. It is an intra-coded frame: the entire image is compressed and transmitted on its own, much like a JPEG image

2. A complete image can be reconstructed from the I frame's data alone when decoding

3. An I frame describes the details of the image background and the moving subject

4. I frames are generated without reference to other frames

5. An I frame is the reference frame for the following P/B frames (its quality directly affects the quality of the rest of the frames in the same group)

6. An I frame is the first frame of a GOP, and there is only one I frame per group

7. I frames do not need to consider motion vectors

8. An I frame carries a large amount of data

B frame

A B frame is also called a bidirectional difference frame: it records the differences between this frame and both the preceding and following frames. In plain terms, to decode a B frame you need not only the previously cached picture, but also the picture that comes after it; the final image is produced by combining the current frame's data with both of them.

Features:

1. A B frame is predicted from the preceding I or P frame and the following P frame

2. A B frame transmits the prediction error and the motion vectors relative to the preceding I or P frame and the following P frame

3. A B frame is a bidirectionally predicted coded frame

4. B frames have the highest compression ratio, because they only record the changes of the moving subject between the reference frames, so the prediction is more accurate

5. A B frame is generally not used as a reference frame, so it does not propagate decoding errors

P frame

A P frame is also called a forward-predicted coded frame. It records the difference between this frame and the preceding I or P frame. When decoding, the difference carried by this frame is superimposed on the cached picture to produce the final image.

Features:

1. A P frame is a coded frame that comes 1 to 2 frames after an I frame

2. A P frame uses motion compensation to transmit the difference between itself and the preceding I or P frame, together with the motion vectors

3. When decoding, a complete P frame image is reconstructed only after summing the prediction from the reference frame with the prediction error

4. A P frame uses forward-predictive inter-frame coding; it only references the nearest preceding I or P frame

5. A P frame can be the reference frame for the P frame that follows it, or for the B frames before and after it

6. Since a P frame is a reference frame, it may propagate decoding errors

7. Because it only transmits differences, a P frame has a relatively high compression ratio

Now that the basic frame concepts are clear, let's analyze the bitstream through code.

Bitstream analysis

An H.264 raw stream (also called an elementary stream) is made up of multiple NALUs. If the slice carried by a NALU is the start of a frame, the NALU is preceded by a 4-byte start code, 0x00 00 00 01; otherwise a 3-byte start code, 0x00 00 01, is used. To analyze an H.264 bitstream, first search the stream for the start codes (0x00 00 00 01 or 0x00 00 01), then split out the NALUs, and finally parse their fields. The NALU header types are listed below; a sketch of reading the header fields follows the table.

Type      Description
0         Unspecified
1         Coded slice of a non-IDR picture, without data partitioning
2         Slice data partition A
3         Slice data partition B
4         Slice data partition C
5         Coded slice of an IDR picture
6         Supplemental Enhancement Information (SEI)
7         Sequence Parameter Set (SPS)
8         Picture Parameter Set (PPS)
9         Access unit delimiter
10        End of sequence
11        End of stream
12        Filler data
13        Sequence parameter set extension
14        Prefix NAL unit
15        Subset sequence parameter set
16 to 18  Reserved
19        Coded slice of an auxiliary coded picture, without data partitioning
20        Coded slice extension
21 to 23  Reserved
24 to 31  Unspecified
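The type sits in the low five bits of the one-byte NALU header; the remaining bits are forbidden_zero_bit (1 bit) and nal_ref_idc (2 bits). A minimal sketch in Java of pulling those fields out of the header byte (the NaluHeader class name is just for illustration):

```java
public final class NaluHeader {

    // NALU header layout (1 byte):
    // forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)

    public static int type(byte header) {
        return header & 0x1F;        // e.g. 1 = non-IDR slice, 5 = IDR, 7 = SPS, 8 = PPS
    }

    public static int refIdc(byte header) {
        return (header >> 5) & 0x03; // 0 means this NALU is not used as a reference
    }
}
```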

In actual development, the most commonly used types are 1, 5, 7, and 8. Let's analyze them in code:

Code analysis

The code is very simple: read the file byte by byte, search for the start codes, and split out each NALU.
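Here is a minimal sketch of that approach in Java; the file name test.h264 and reading the whole stream into memory are simplifying assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class H264Parser {

    public static void main(String[] args) throws IOException {
        // Simplifying assumption: the whole Annex B stream fits in memory.
        byte[] data = Files.readAllBytes(Paths.get("test.h264"));

        int naluStart = -1; // offset of the current NALU's header byte
        for (int i = 0; i + 2 < data.length; i++) {
            int startCodeLen = 0;
            if (data[i] == 0 && data[i + 1] == 0) {
                if (data[i + 2] == 1) {
                    startCodeLen = 3; // 0x00 00 01
                } else if (i + 3 < data.length && data[i + 2] == 0 && data[i + 3] == 1) {
                    startCodeLen = 4; // 0x00 00 00 01, start of a frame
                }
            }
            if (startCodeLen == 0) continue;

            if (naluStart >= 0) {
                printNalu(data, naluStart, i); // previous NALU ends at this start code
            }
            naluStart = i + startCodeLen;
            i = naluStart - 1; // continue scanning after the start code
        }
        if (naluStart >= 0) {
            printNalu(data, naluStart, data.length); // last NALU runs to end of file
        }
    }

    private static void printNalu(byte[] data, int start, int end) {
        int type = data[start] & 0x1F; // nal_unit_type, see the table above
        String name;
        switch (type) {
            case 1:  name = "non-IDR slice"; break;
            case 5:  name = "IDR slice";     break;
            case 7:  name = "SPS";           break;
            case 8:  name = "PPS";           break;
            default: name = "type " + type;  break;
        }
        System.out.println(name + " (" + (end - start) + " bytes)");
    }
}
```

Running this over a raw H.264 file will typically print an SPS and a PPS first, followed by an IDR slice and then a run of non-IDR slices.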

How to learn more

There is very little systematic, entry-level material on audio and video, and a fresh graduate may find it hard to get started, because audio and video involves a lot of theory, and writing the code requires combining that theory. That is why understanding the theory of audio and video, encoding, and decoding is so important.

I have been working on audio and video projects since my internship. I have read many people's articles, and by chance came across an open source project on GitHub with 6.8K stars. I would like to share it here, so that more students preparing to learn audio and video can get started faster.

Here are some sections of the development document:

Stage 1: Android multimedia

Chapter 1: Three ways to draw pictures

Chapter 2: AudioRecord collects PCM audio

Chapter 3: AudioTrack plays PCM audio

Chapter 4: Camera video capture

Chapter 5: MediaExtractor and MediaMuxer for demuxing and muxing video

Chapter 6: MediaCodec hard codec workflow and practice

Stage 2: OpenGL ES

Chapter 7: Basic concepts of OpenGL ES

Chapter 8: GLSL and the shader rendering pipeline

Chapter 9: Drawing plane graphics with OpenGL ES

Chapter 10: GLSurfaceView source code analysis & the EGL environment

Chapter 11: OpenGL ES matrix transformations and coordinate systems

Chapter 12: OpenGL ES textures

Chapter 13: OpenGL ES filters

Chapter 14: OpenGL ES real-time filters

Chapter 15: OpenGL ES particle system - fountain

Chapter 16: OpenGL ES particle effects - fireworks explosion

Stage 3: JNI & NDK

Chapter 17: Learning and using JNI and NDK

Chapter 18: JNI - reference types, exception handling, function registration

Chapter 19: NDK build methods - ndk-build and CMake

Chapter 20: Pointers, memory model, references

Chapter 21: Operator overloading, inheritance, polymorphism, templates

Chapter 22: STL containers

Algorithm sub-series

Chapter 23: Algorithm series - bubble sort

Chapter 24: Algorithm series - quicksort

Chapter 25: Algorithm series - heap sort

Chapter 26: Algorithm series - selection sort, insertion sort, and the STL sort implementation

Chapter 27: Algorithm series - binary search tree

Chapter 28: Algorithm series - balanced binary tree

Chapter 29: Algorithm series - hash table

Stage 4: FFmpeg

Chapter 30: Audio and video basics

Chapter 31: Common FFmpeg commands

Chapter 32: FFmpeg + OpenSL ES audio decoding and playback

Chapter 33: FFmpeg + OpenGL ES video decoding and playback

Friends in need can [click here] to get it from me for free.

Summary

The audio and video industry has been developing for many years. With the rise of mobile in recent years, more and more audio and video apps have appeared, pushing the field to a new peak. But the learning cost of audio and video is very high for many developers. To keep pace with the times, friends in need can get the material above for free and break through the "high threshold" of audio and video. I hope we can make progress together.

In a word, audio and video is rising strongly, and I believe the next decade will be the decade of audio and video. Combining audio and video technology with computer vision and artificial intelligence will lead the next two decades.

I will share more related articles in the future. Follow me so you don't get lost!

Now is the best time to learn audio and video technology. Seize the opportunity and keep pace with the times, and you will achieve great things in the future.