Audio and Video (Part 4): How the H264 Video Stream Is Assembled from Slices and NAL Units, and a Hands-On H264 Bitstream Analysis

This is day four of my participation in the November writing challenge (the last writing challenge of 2021).

How the H264 video stream is assembled

This is part four of my advanced audio/video series, written so that beginners can follow it. The material is too long for a single article, so it is being published in several parts. If you would like the complete PDF, you can contact me through my profile page.

Introduction to H264

In the previous part we learned what a macroblock is. Macroblocks are the smallest unit of compressed video, and they must be organized before they can be transmitted over a network.

Simply sending raw macroblock data would be chaotic, like cargo piled randomly onto ships before shipping containers existed.

Loading and unloading was very painful. When containers came along, everything changed and transport efficiency increased dramatically.

The container can be thought of as the H264 coding standard: it defines a common transmission format in which macroblocks are organized, structured, and arranged in sequence into a bitstream. The bitstream can either be transmitted over the network (for example, read from an InputStream) or encapsulated into a file for storage.

H264 (also written H.264/AVC) is a widely used video coding standard whose bitstream format is designed above all for transmission.

1.1 Composition of the H264 bitstream

The H264 bitstream is made up of the following levels, from largest to smallest:

video sequence, picture (frame), slice group, slice, NALU, macroblock, pixel.

This is similar to the hierarchy Earth > country > city > town > village.

1.1.1 H264 coding layers

NAL (Network Abstraction Layer): whenever H264 is sent over a network it has to fit into packets. An Ethernet packet carries at most 1500 bytes (the MTU), while an H264 frame is often larger than 1500 bytes, so a frame must be split into multiple packets for transmission. All of this splitting and reassembly is handled at the NAL layer.
VCL (Video Coding Layer): compresses the raw video data.
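As an illustration of that splitting, here is a minimal Python sketch of FU-A fragmentation as defined by the RTP payload format for H.264 (RFC 6184). Using RTP is an assumption, since the article does not name a transport, and the function names and the 1400-byte payload budget (`MAX_PAYLOAD`, chosen to leave room for IP/UDP/RTP headers under the 1500-byte MTU) are hypothetical.

```python
MAX_PAYLOAD = 1400  # assumed per-packet payload budget, below the 1500-byte MTU
FU_A = 28           # NAL unit type used for FU-A fragments in RFC 6184


def fragment_nalu(nalu: bytes, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split one NAL unit (start code removed) into FU-A payloads."""
    if len(nalu) <= max_payload:
        return [nalu]  # small enough to send as a single NAL unit packet
    header = nalu[0]
    fu_indicator = (header & 0xE0) | FU_A  # keep F and NRI bits, type = 28
    nal_type = header & 0x1F
    body = nalu[1:]  # original header byte is rebuilt from the FU bytes
    chunk = max_payload - 2  # 2 bytes go to the FU indicator and FU header
    packets = []
    for offset in range(0, len(body), chunk):
        start_bit = 0x80 if offset == 0 else 0               # S: first fragment
        end_bit = 0x40 if offset + chunk >= len(body) else 0  # E: last fragment
        fu_header = start_bit | end_bit | nal_type
        packets.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return packets


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the original NAL unit from the packets produced above."""
    if len(packets) == 1 and (packets[0][0] & 0x1F) != FU_A:
        return packets[0]  # it was never fragmented
    first = packets[0]
    header = (first[0] & 0xE0) | (first[1] & 0x1F)  # F/NRI + original type
    return bytes([header]) + b"".join(p[2:] for p in packets)
```

A real sender would also add RTP headers and sequence numbers; the sketch only shows how one oversized NAL unit becomes several MTU-sized pieces and back.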

1.1.2 Transmission of H264

An H264 bitstream is like a river with no beginning and no end. How do you extract the data you want from such a continuous stream?

The H264 standard defines a packaging format called the Annex B byte stream format. It is the most common byte stream format for H264-encoded data.

Almost all encoders on the market output this format. Each NAL unit is preceded by a start code, either 0x00 00 00 01 (4 bytes) or 0x00 00 01 (3 bytes).

The bytes between two consecutive start codes form one NAL unit.
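The start-code rule can be sketched in Python. The function below is a hypothetical helper (not from any library) that scans an in-memory Annex B stream for the 3-byte pattern 0x000001, which also matches the tail of the 4-byte form, and returns the bytes between consecutive start codes. Zero bytes right before a start code are dropped, because the standard does not allow a NAL unit to end in 0x00, so they belong to a 4-byte start code or to padding.

```python
def _strip_trailing_zeros(chunk: bytes) -> bytes:
    # A NAL unit never ends in 0x00, so trailing zeros are the leading byte
    # of a 4-byte start code (or trailing padding) and can be dropped.
    return chunk.rstrip(b"\x00")


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units with start codes removed."""
    nalus = []
    start = None  # index of the first byte of the current NAL unit
    i = 0
    while i + 2 < len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            if start is not None:
                nalu = _strip_trailing_zeros(stream[start:i])
                if nalu:
                    nalus.append(nalu)
            i += 3
            start = i  # the next NAL unit begins right after the start code
        else:
            i += 1
    if start is not None and start < len(stream):
        nalu = _strip_trailing_zeros(stream[start:])
        if nalu:
            nalus.append(nalu)
    return nalus
```

Because of the emulation prevention described in section 2.1, the pattern 0x000001 cannot occur inside a NAL unit, which is what makes this simple scan safe.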

1.1.3 Coding structure

Slice header: contains information about the slice, such as the slice type, the address of the first macroblock it covers, and the ID of the picture parameter set it uses.

1.1.4 Hierarchical structure diagram of the H264 bitstream

Hands-on analysis of the H264 video bitstream

1.1 H.264 encoding format

H.264 has two layers of functionality:

Video Coding Layer (VCL)

Network Abstraction Layer (NAL)

VCL data is the output of the encoding process and represents the compressed, encoded video data sequence. Before VCL data is transmitted or stored, it is mapped or encapsulated into NAL units.

Each NAL unit consists of a NAL header followed by a raw byte sequence payload (RBSP) that carries the encoded video data.

An RBSP is the original encoded data with trailing bits appended at the end: a single "1" bit (the stop bit) followed by as many "0" bits as needed for byte alignment.
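The trailing-bit rule works in both directions, as this small Python sketch shows. The function names are hypothetical; `add_trailing_bits` takes the encoded data as a string of "0"/"1" characters purely for readability, and `sodb_bit_length` finds how many data bits precede the stop bit.

```python
def add_trailing_bits(bits: str) -> bytes:
    """Append the stop bit '1' and zero bits up to the next byte boundary."""
    bits += "1"                      # rbsp_stop_one_bit
    bits += "0" * (-len(bits) % 8)   # alignment zero bits
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def sodb_bit_length(rbsp: bytes) -> int:
    """Return the number of data bits before the stop bit."""
    i = len(rbsp) - 1
    while i >= 0 and rbsp[i] == 0:   # skip any all-zero trailing bytes
        i -= 1
    if i < 0:
        raise ValueError("no stop bit found")
    last = rbsp[i]
    trailing_zeros = (last & -last).bit_length() - 1  # zeros below the stop bit
    return i * 8 + (7 - trailing_zeros)               # bits above the stop bit
```

For example, the 3 data bits `101` become `1011 0000`, i.e. the single byte 0xB0, and `sodb_bit_length(b"\xb0")` recovers the length 3.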

1.2 H.264 network transmission

An H.264 coded video sequence consists of a series of NAL units, each containing one RBSP.

See Table 1. Coded slices (including IDR slices and data partitions) are defined as VCL NAL units; all other types, such as parameter sets, SEI, and the end-of-sequence marker, are non-VCL NAL units.
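A partial version of Table 1 can be written as a Python lookup. This sketch covers only nal_unit_type values 0 to 12 from the H.264 specification and treats types 1 to 5 as VCL, as a baseline (Annex A) decoder does; the dictionary and function names are hypothetical, and later extension types (13 and above) are reported as "other/reserved".

```python
NAL_TYPES = {
    0: "unspecified",
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "SEI",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
}


def is_vcl(nal_unit_type: int) -> bool:
    """Coded slices (types 1 to 5) are VCL; everything else is non-VCL."""
    return 1 <= nal_unit_type <= 5


def describe(nal_unit_type: int) -> str:
    name = NAL_TYPES.get(nal_unit_type, "other/reserved")
    kind = "VCL" if is_vcl(nal_unit_type) else "non-VCL"
    return f"{nal_unit_type}: {name} ({kind})"
```

Calling `describe(7)` yields "7: sequence parameter set (SPS) (non-VCL)".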

A typical RBSP unit sequence is shown in Figure 2.

Each RBSP is transmitted in its own NAL unit. The unit's one-byte header identifies the type of RBSP it carries, and the rest of the NAL unit is RBSP data.

2.1 H.264 bitstream structure diagram

Start code: if the slice carried by the NALU is the start of a frame, a 4-byte start code (0x00000001) is used; otherwise a 3-byte start code (0x000001) is used.

NAL header: forbidden_zero_bit, nal_ref_idc (priority), and nal_unit_type (type).

Emulation prevention: to keep the NALU body from containing something that looks like a start code, the encoder inserts a 0x03 byte whenever two consecutive 0x00 bytes would be followed by a byte in the range 0x00 to 0x03. When decoding, these 0x03 bytes are removed again.
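Emulation prevention can be sketched in a few lines of Python. The function names are hypothetical, and the sketch leaves out one edge case from the standard (an extra 0x03 appended when the payload itself ends in 0x00, which only happens with CABAC padding).

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after every 0x0000 that is followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop every 0x03 that directly follows two 0x00 bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the inserted byte and restart the zero count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

For instance, the payload bytes `00 00 01 25 00 00 00` are sent as `00 00 03 01 25 00 00 03 00`, which no longer contains the start code pattern.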

The nal_ref_idc (NRI) field of the NAL header marks how important a NAL unit is for reconstruction:

A value of 0 indicates that the NAL unit is not used for prediction and can therefore be discarded by the decoder without error propagation.
A value greater than 0 indicates that the NAL unit is needed for drift-free reconstruction; the higher the value, the greater the impact of losing the unit.
The forbidden bit (forbidden_zero_bit) of the NAL header is set to 0 by the H.264 encoder. It can be set to 1 when the network detects a bit error in the unit. This bit mainly helps H.264 adapt to different types of network environments (such as wired and wireless).
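The three header fields occupy 1, 2, and 5 bits of the first NAL byte, so they can be pulled out with shifts and masks. This Python sketch uses hypothetical names (`NalHeader`, `parse_nal_header`, `is_discardable`); the discard rule simply restates the NRI = 0 case above.

```python
from dataclasses import dataclass


@dataclass
class NalHeader:
    forbidden_zero_bit: int  # 1 bit, 0 unless the network flagged an error
    nal_ref_idc: int         # 2 bits, importance for reconstruction
    nal_unit_type: int       # 5 bits, what the RBSP contains


def parse_nal_header(first_byte: int) -> NalHeader:
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def is_discardable(header: NalHeader) -> bool:
    """NRI 0: not used for prediction, so dropping it causes no error propagation."""
    return header.nal_ref_idc == 0
```

For example, the common SPS header byte 0x67 (binary 0110 0111) parses as forbidden bit 0, NRI 3, type 7.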

The decoding process for a NAL unit is as follows: first, the RBSP syntax structure is extracted from the NAL unit, and then it is processed as shown in Figure 4. The input is a NAL unit and the output is the decoded samples of the current picture. Sequence parameter sets (SPS) and picture parameter sets (PPS) are each carried in their own NAL units, and other NAL units refer to them during transmission. Each slice header specifies the picture parameter set it uses through the syntax element pic_parameter_set_id, and each picture parameter set specifies the sequence parameter set it uses through the syntax element seq_parameter_set_id.
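The ID chain from slice to PPS to SPS can be sketched in Python. Both IDs are coded as unsigned Exp-Golomb values, ue(v), so the sketch includes a small bit reader. It relies on the H.264 syntax order: a PPS begins with pic_parameter_set_id followed by seq_parameter_set_id, and a slice header begins with first_mb_in_slice, slice_type, then pic_parameter_set_id. It assumes its inputs are RBSPs (NAL header byte removed, emulation prevention bytes already stripped); all class and function names are hypothetical.

```python
class BitReader:
    """Reads bits MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        self.pos += 1
        return (byte >> (7 - (self.pos - 1) % 8)) & 1

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """ue(v): count leading zeros, then codeNum = 2^zeros - 1 + next bits."""
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.read_bits(zeros)


def pps_ids(pps_rbsp: bytes) -> tuple[int, int]:
    """Return (pic_parameter_set_id, seq_parameter_set_id) from a PPS."""
    r = BitReader(pps_rbsp)
    return r.read_ue(), r.read_ue()


def slice_pps_id(slice_rbsp: bytes) -> int:
    r = BitReader(slice_rbsp)
    r.read_ue()         # first_mb_in_slice
    r.read_ue()         # slice_type
    return r.read_ue()  # pic_parameter_set_id


def sps_for_slice(slice_rbsp: bytes, pps_table: dict[int, int]) -> int:
    """Follow slice -> PPS -> SPS using a map of pps_id to sps_id."""
    return pps_table[slice_pps_id(slice_rbsp)]
```

In use, a decoder would fill `pps_table` with `pps_ids(...)` for every PPS it receives, then call `sps_for_slice` for each slice to find which SPS applies.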