Introduction

For more articles in this series, see the Do-it-yourself H.264 decoder series.

In the previous installment, we decoded the syntax elements from the source stream one by one against the syntax tables in the standard. There are many of them, and most will be described in more detail later. In this installment, we introduce a few important, soon-to-be-used elements, then use them to work out the width and height of the video.

Macroblocks

Before we begin, let’s introduce a concept: the macroblock. We’ll be using it constantly in this series, so here is a brief introduction.

In H.264, we usually don’t process images pixel by pixel. Instead, we divide an image into small blocks of fixed width and height and process it block by block. These blocks are called macroblocks.

Suppose we have an image 176 pixels wide and 144 pixels high. With the 16×16 macroblocks defined by H.264, the image is divided as follows:

pic_width_in_mbs_minus1 and pic_height_in_map_units_minus1

H.264 has two syntax elements for this: pic_width_in_mbs_minus1 is the number of horizontal macroblocks minus 1, and pic_height_in_map_units_minus1 is the number of vertical macroblocks minus 1. In other words, adding one to each gives the number of macroblocks horizontally and vertically. Since a macroblock is 16 pixels wide and 16 pixels high, we can calculate the width and height of the video:

width = (pic_width_in_mbs_minus1 + 1) * 16;
height = (pic_height_in_map_units_minus1 + 1) * 16;

But the width and height given by this formula are always multiples of 16, and obviously many real-world videos are not. So how does H.264 handle video whose dimensions are not multiples of 16?

frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, frame_crop_bottom_offset

The SPS also contains four values, stored explicitly when frame_cropping_flag is 1 and absent otherwise. They represent the cropping offsets at the top, bottom, left, and right of the image. Since macroblocks can only describe dimensions that are multiples of 16, H.264 pads pixels along the edges of the image when its width or height is not a multiple of 16, and then records how much was padded in frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, and frame_crop_bottom_offset. This process is called cropping.

So do we get the original width and height by multiplying the macroblock counts by 16 and subtracting the four crop offsets?

Not really. There are a few things we need to consider.

Field coding

The first special case to consider is field coding. Most videos today are frame-coded, but field coding also exists. In field coding, one frame equals two fields, and a field has half as many vertical macroblocks as the frame. Moreover, in a field, a crop of 1 pixel corresponds to 2 pixels of the frame, so frame_crop_xxx_offset also needs special handling.

In SPS, frame_mbs_only_flag indicates field coding: when it is 1, only frame coding is used; when it is 0, field coding may be present.

So, with field coding taken into account, are we done?

Not really. We have more to consider.

YUV 420, YUV 422, YUV 444 Crop

As we know, the data before H.264 compression is raw data in YUV format. Most commonly this is YUV 420, where four Y samples share one pair of UV samples, but H.264 also supports YUV 422, YUV 444, and Y-only monochrome. Let’s first look at the pixel arrangement of the three YUV formats.

  • The pixels of YUV 420 are arranged as follows:

  • The pixels of YUV 422 are arranged as follows:

  • The pixel arrangement of YUV 444 is as follows:

So here is the question: for a YUV 420 image, can we crop off a single row of pixels?

Take this picture, for example: can we remove the row of Y samples in the red box? No, because in YUV 420 four Y samples share one pair of UV samples, and removing a single row of Y leaves the remaining data incomplete. So for YUV 420, only an even number of pixels can be cropped in either direction. For 422, any number of pixels can be cropped vertically, but only an even number horizontally.

In other words, in YUV 420 each unit of crop recorded by H.264 corresponds to 2 pixels; in 422 it corresponds to 2 pixels horizontally only. So how do we compute the crop amounts correctly for each format?

chroma_format_idc

First, we need to know what format our stream is in. The SPS contains a syntax element that records the raw data format, chroma_format_idc; its values are shown in the following table:

value                  meaning
chroma_format_idc = 0  monochrome (Y only)
chroma_format_idc = 1  YUV 4:2:0
chroma_format_idc = 2  YUV 4:2:2
chroma_format_idc = 3  YUV 4:4:4

It is important to note that this value does not exist explicitly for all streams.

Looking at the syntax table, you can see that chroma_format_idc is coded explicitly only when profile_idc equals certain values; otherwise chroma_format_idc takes its default. It is easy to make a mistake here: many people assume the default is 0, but 0 means Y-only monochrome, while the actual default is YUV 420. So the default chroma_format_idc is 1.

Pay close attention to this, or the data parsed later will be wrong.

separate_colour_plane_flag

After reading chroma_format_idc from the bitstream, we immediately encounter another syntax element:

This syntax element is present only when chroma_format_idc equals 3, that is, YUV 444 mode. So what does this value mean?

When the picture is YUV 444, the three components carry equal weight, so there are two ways to encode them. The first is to attach the UV components to the Y component, as in the other formats. The second is to code UV separately from Y. separate_colour_plane_flag defaults to 0, meaning UV is attached to and coded together with Y; when it is 1, UV and Y are coded as separate colour planes. Separately coded planes follow the same rules as monochrome.

ChromaArrayType

Next, let’s look at another variable that appears many times in the H.264 standard but is not in the syntax tables. It is derived from separate_colour_plane_flag and chroma_format_idc as follows:

if (separate_colour_plane_flag == 0) {
    ChromaArrayType = chroma_format_idc;
} else {
    ChromaArrayType = 0;
}

This variable will see more use later; for now we only need it to derive two more quantities:

SubWidthC and SubHeightC

SubWidthC and SubHeightC are the horizontal and vertical sampling ratios between the Y component and the UV components. When ChromaArrayType equals 0, there is only a Y component (monochrome, or the separately coded planes of YUV 444), so SubWidthC and SubHeightC are undefined.

  • YUV 420:

    In both the horizontal and vertical directions, 2 Y samples share 1 pair of UV samples, so SubWidthC is 2 and SubHeightC is 2.

  • YUV 422:

    In the horizontal direction, 2 Y samples share 1 pair of UV samples, so SubWidthC is 2; in the vertical direction, 1 Y goes with 1 pair of UV samples, so SubHeightC is 1.

  • YUV 444:

    In the horizontal direction, 1 Y goes with 1 pair of UV samples, so SubWidthC is 1; in the vertical direction, likewise, so SubHeightC is 1.

The complete derivation code is as follows:

if (ChromaArrayType == 1) {
    SubWidthC = 2;
    SubHeightC = 2;
} else if (ChromaArrayType == 2) {
    SubWidthC = 2;
    SubHeightC = 1;
} else if (ChromaArrayType == 3) {
    SubWidthC = 1;
    SubHeightC = 1;
}
// SubWidthC and SubHeightC are undefined when ChromaArrayType == 0

The final formula for calculating the width and height

Armed with this information, we can derive the final calculation formula:

width = (pic_width_in_mbs_minus1 + 1) * 16;
height = (2 - frame_mbs_only_flag) * (pic_height_in_map_units_minus1 + 1) * 16;

if(frame_cropping_flag){
    int crop_unit_x = 0;
    int crop_unit_y = 0;

    if(ChromaArrayType == 0){
        crop_unit_x = 1;
        crop_unit_y = 2 - frame_mbs_only_flag;
    }
    else if(ChromaArrayType == 1 || ChromaArrayType == 2 || ChromaArrayType == 3){
        crop_unit_x = SubWidthC;
        crop_unit_y = SubHeightC * (2 - frame_mbs_only_flag);
    }

    width -= crop_unit_x * (frame_crop_left_offset + frame_crop_right_offset);
    height -= crop_unit_y * (frame_crop_top_offset + frame_crop_bottom_offset);
}
