The introduction

For more articles in this series, see Do-it-yourself H.264 decoder.

There are currently two popular H.264 packaging formats, AnnexB and avcC. Different platforms and vendors support them to different degrees. For example, Android's hardware decoding API, MediaCodec, only accepts AnnexB data, while Apple's VideoToolbox only supports avcC. Practitioners therefore need to understand both formats. In this chapter, we introduce AnnexB first.

AnnexB

If we write multiple NALUs into one file by simply concatenating them, the result cannot be parsed: NALUs have different lengths and carry no marker of their own that says where one ends and the next begins, so a reader has no way to tell the concatenated NALUs apart. To solve this, we have to add some data around the NALUs that separates them. AnnexB is one such format for wrapping NALUs.

The AnnexB format is very simple: each NALU is preceded by a three- or four-byte sequence, either 0, 0, 1 or 0, 0, 0, 1. When we read an H.264 stream, as soon as we encounter 0, 0, 0, 1 or 0, 0, 1, we know that a new NALU starts. These delimiter bytes are therefore commonly called the start code.
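The splitting rule described above can be sketched in a few lines of Python. This is only an illustration, not a production parser: the function name `split_annexb` is hypothetical, and the sketch assumes the whole stream is already in memory. It searches for the three-byte pattern 0, 0, 1 and strips trailing zero bytes from each NALU, which also covers the four-byte start code, since a NAL unit never ends with a zero byte.

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # the 4-byte form is this pattern preceded by one more 0


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an AnnexB byte stream into NALUs with the start codes removed."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)           # first byte after the start code
        nxt = stream.find(START_CODE, start)    # position of the next start code
        end = len(stream) if nxt == -1 else nxt
        # A zero byte left at the end belongs to the next 4-byte start code.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


if __name__ == "__main__":
    data = (b"\x00\x00\x00\x01\x67\x42"
            b"\x00\x00\x01\x68\xce"
            b"\x00\x00\x00\x01\x65\x88\x84")
    for nalu in split_annexb(data):
        print(nalu.hex())
```

The sample bytes are made up for the demonstration; a real stream would be read from a file or a network source.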

Emulation Prevention Bytes

But adding only a start code before each NALU is problematic, because the bytes 0, 0, 0, 1 or 0, 0, 1 may also occur inside the NALU data itself, causing the reader to mistakenly split one NALU into several. To prevent this, AnnexB introduced the concept of Emulation Prevention Bytes.

Emulation prevention works like this: before the start code is added to a NALU, the NALU's bytes are scanned for the sequences 0 0 0, 0 0 1, 0 0 2, and 0 0 3, which are then modified as follows:

0 0 0 => 0 0 3 0
0 0 1 => 0 0 3 1
0 0 2 => 0 0 3 2
0 0 3 => 0 0 3 3

That is, in each of the four cases above, a byte with value 3 is inserted after 0, 0. After this processing, the NALU data can no longer contain a start code (0 0 1 or 0 0 0 1), so there is no conflict.
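As a rough sketch of the insertion rule, the following Python function walks the bytes once and inserts a 3 whenever two zero bytes are followed by a byte in the range 0 to 3. The name `add_emulation_prevention` is hypothetical, and the sketch assumes the input is the raw NALU data before any start code is attached.

```python
def add_emulation_prevention(raw: bytes) -> bytes:
    """Insert a 0x03 byte after every 0, 0 that is followed by 0, 1, 2 or 3."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes at the end of `out`
    for b in raw:
        if zeros >= 2 and b <= 3:
            out.append(3)  # the emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    print(add_emulation_prevention(bytes([0, 0, 1])).hex(" "))
    print(add_emulation_prevention(bytes([0, 0, 0, 0, 0, 1])).hex(" "))
```

Resetting the zero counter after each inserted 3 matters: a long run of zeros needs a 3 after every second zero, not just one at the beginning.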

Of course, during decoding, once the NALU data has been split at the start codes, the emulation prevention bytes must be removed again:

0 0 3 0 => 0 0 0
0 0 3 1 => 0 0 1
0 0 3 2 => 0 0 2
0 0 3 3 => 0 0 3

This is how you get the true NALU stream.
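The removal step can be sketched the same way. The function name `remove_emulation_prevention` is hypothetical, and the sketch assumes its input is a single NALU that has already been separated from its start code: every 3 that directly follows two zero bytes is dropped.

```python
def remove_emulation_prevention(nalu: bytes) -> bytes:
    """Drop every 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes seen just before this byte
    for b in nalu:
        if zeros >= 2 and b == 3:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    escaped = bytes([0x65, 0, 0, 3, 1, 0x88, 0, 0, 3, 3])
    print(remove_emulation_prevention(escaped).hex(" "))
```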

avcC

AnnexB works by writing a special start code in front of each NALU and using it as the separator between NALUs. avcC takes a different approach: each NALU is preceded by a few bytes that form a big-endian integer giving the length of the entire NALU. When reading, you first read this integer to get the NALU's length, and then read that many bytes as the NALU.
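To make the length-prefix idea concrete, here is a minimal Python sketch of the writing side. The function name `pack_avcc` and the default prefix size of 4 bytes are assumptions chosen for illustration.

```python
from typing import Iterable


def pack_avcc(nalus: Iterable[bytes], length_size: int = 4) -> bytes:
    """Concatenate NALUs, each preceded by its length as a big-endian integer."""
    out = bytearray()
    for nalu in nalus:
        # The prefix counts only the NALU bytes, not the prefix itself.
        out += len(nalu).to_bytes(length_size, "big")
        out += nalu
    return bytes(out)


if __name__ == "__main__":
    packed = pack_avcc([b"\x67\x42", b"\x65\x88\x84"])
    print(packed.hex(" "))
```

Because the reader knows each NALU's length in advance, it never has to scan the data for a delimiter.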

SPS and PPS in avcC

The SPS and PPS hold the parameters necessary to decode an H.264 stream. That is, if you want to decode an H.264 stream, you must obtain the SPS and PPS first. We will cover SPS and PPS in more detail later in the course; for now you just need to know that they are special and important NALUs.

In AnnexB, SPS and PPS are treated as ordinary NALUs. In avcC, they are treated as special information.

The first part of an H.264 stream packaged as avcC is a piece of data called Extradata, which defines the basic properties of the H.264 stream and contains the SPS and PPS data.

Let's look at the Extradata format:

bits      field
8         version ( always 0x01 )
8         avc profile ( sps[0][1] )
8         avc compatibility ( sps[0][2] )
8         avc level ( sps[0][3] )
6         reserved ( all bits on )
2         NALULengthSizeMinusOne  // The value is (prefix length - 1)
3         reserved ( all bits on )
5         number of SPS NALUs ( usually 1 )

repeated once per SPS:
16        SPS size
variable  SPS NALU data

8         number of PPS NALUs ( usually 1 )

repeated once per PPS:
16        PPS size
variable  PPS NALU data
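The layout above can be read field by field. The following Python sketch is one way to do it; the function name `parse_extradata` and the keys of the returned dictionary are hypothetical, and the sketch assumes a complete Extradata buffer with no error handling for truncated input.

```python
def parse_extradata(data: bytes) -> dict:
    """Parse avcC Extradata following the field layout listed above."""
    info = {
        "version": data[0],
        "profile": data[1],
        "compatibility": data[2],
        "level": data[3],
        # The low 2 bits hold NALULengthSizeMinusOne; the upper 6 are reserved.
        "nalu_length_size": (data[4] & 0x03) + 1,
        "sps": [],
        "pps": [],
    }
    num_sps = data[5] & 0x1F  # low 5 bits; the upper 3 bits are reserved
    pos = 6
    for _ in range(num_sps):
        size = int.from_bytes(data[pos:pos + 2], "big")  # 16-bit SPS size
        pos += 2
        info["sps"].append(data[pos:pos + size])
        pos += size
    num_pps = data[pos]  # a full 8-bit count
    pos += 1
    for _ in range(num_pps):
        size = int.from_bytes(data[pos:pos + 2], "big")  # 16-bit PPS size
        pos += 2
        info["pps"].append(data[pos:pos + size])
        pos += size
    return info


if __name__ == "__main__":
    # Made-up SPS/PPS bytes, for demonstration only.
    sps = bytes([0x67, 0x42, 0x00, 0x1E, 0xAB])
    pps = bytes([0x68, 0xCE, 0x3C, 0x80])
    extradata = (bytes([0x01, 0x42, 0x00, 0x1E, 0xFF, 0xE1])
                 + len(sps).to_bytes(2, "big") + sps
                 + bytes([0x01])
                 + len(pps).to_bytes(2, "big") + pps)
    print(parse_extradata(extradata))
```

In the sample, the byte 0xFF combines the six reserved bits with NALULengthSizeMinusOne = 3, and 0xE1 combines the three reserved bits with an SPS count of 1.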

Note the value NALULengthSizeMinusOne. Adding 1 to it gives the number of bytes in the length prefix (the integer indicating the length) that precedes each subsequent NALU.

For example, if NALULengthSizeMinusOne is 3, the prefix before each NALU is 4 bytes long. When reading the data that follows, we first read four bytes and convert them into an integer, which is the length of the NALU. Note that this length does not include the four prefix bytes themselves; it is the length of the NALU alone.
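The reading procedure just described might look like this in Python. The function name `split_avcc` is hypothetical, and the sketch assumes the buffer holds only length-prefixed NALUs, with the Extradata already parsed separately.

```python
from typing import List


def split_avcc(data: bytes, nalu_length_size_minus_one: int) -> List[bytes]:
    """Split length-prefixed avcC data into individual NALUs."""
    length_size = nalu_length_size_minus_one + 1  # e.g. 3 -> 4-byte prefix
    nalus = []
    pos = 0
    while pos + length_size <= len(data):
        # Big-endian length of the NALU alone, not counting the prefix.
        nalu_len = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if pos + nalu_len > len(data):
            raise ValueError("length prefix points past the end of the data")
        nalus.append(data[pos:pos + nalu_len])
        pos += nalu_len
    return nalus


if __name__ == "__main__":
    sample = b"\x00\x00\x00\x02\x67\x42\x00\x00\x00\x03\x65\x88\x84"
    for nalu in split_avcc(sample, 3):
        print(nalu.hex())
```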