Advanced Geometric-Based Inter Prediction for Versatile Video Coding

Compared with triangular partitioning, geometric partitioning can better fit the contour of an object.

Related work

Triangular partition mode (TPM) in VVC

The triangular partition mode (TPM) is a VVC inter prediction tool. It splits a block into two triangular regions along either the main diagonal or the anti-diagonal. Each region uses uni-directional inter prediction, so it needs only one motion vector (MV). Motion compensation with the two MVs generates two intermediate W×H prediction blocks P_i, and the final prediction block P_B is obtained as a weighted blend of the two.

Here W0 and W1 are integer weights with W0 + W1 = 8. Each weight is determined by the Manhattan distance from the pixel position to the partition line, for example W0 = clip(0, 8, w_TMP0 + 4), where w_TMP0 is the distance. For the partition running from the top-left corner to the bottom-right corner, the distance is given by the following equation:

Here a and b are constant factors determined by the block aspect ratio.
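The clipping and blending described above can be sketched as follows. This is a minimal illustration, not the codec's actual implementation: the function names are hypothetical, the distance is taken as a ready-made input, and the rounding offset of 4 followed by a right shift of 3 is an assumption that follows from the weights summing to 8.

```python
def tpm_weight(dist):
    """W0 = clip(0, 8, dist + 4), with dist the signed distance to the partition line."""
    return max(0, min(8, dist + 4))


def blend_sample(p0, p1, dist):
    """Blend two predicted samples with integer weights that sum to 8.

    The "+ 4" rounding offset and ">> 3" normalization are assumed here.
    """
    w0 = tpm_weight(dist)
    w1 = 8 - w0
    return (w0 * p0 + w1 * p1 + 4) >> 3


# Far on one side of the line only P0 contributes, far on the other side only P1,
# and on the line itself the two samples are averaged.
far_p0 = blend_sample(100, 200, 4)
far_p1 = blend_sample(100, 200, -4)
on_line = blend_sample(100, 200, 0)
```

Only samples within a few pixels of the line get a mixed weight, which is what smooths the transition between the two regions.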

Wedge prediction in AV1

AV1 defines 16 wedge partitions. The wedge boundary is horizontal, vertical, or oblique with a slope of ±2 or ±0.5 (depending on the block shape). As with TPM, the two prediction blocks are finally blended with weights.

Geometric-based inter prediction (GIP)

Geometric-based inter prediction (GIP) complements TPM and adapts better to the shape of an object. It uses the same MV merge method and the same coding method as TPM. GIP supports 82 partitions in total and applies only to blocks larger than 8×8. The encoder determines the GIP index Si ∈ {0, …, 81} and signals it to the decoder using truncated binary coding. Because the boundaries defined by GIP fit object contours more closely, coding efficiency improves.
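To make the signalling cost concrete, here is a small sketch of a textbook truncated binary code applied to an alphabet of 82 symbols. The function name is hypothetical, and the exact binarization used in the codec is not given in this article, so treat this only as an illustration of the general technique.

```python
def truncated_binary(x, n):
    """Return the truncated binary codeword (as a bit string) for x in 0..n-1, n >= 2."""
    if not 0 <= x < n or n < 2:
        raise ValueError("need 0 <= x < n and n >= 2")
    k = n.bit_length() - 1          # floor(log2(n))
    u = (1 << (k + 1)) - n          # number of short (k-bit) codewords
    if x < u:
        return format(x, f"0{k}b")  # short codeword
    return format(x + u, f"0{k + 1}b")  # long codeword, k + 1 bits


# For 82 symbols: k = 6 and u = 128 - 82 = 46, so indices 0..45 cost 6 bits
# and indices 46..81 cost 7 bits.
codes = [truncated_binary(i, 82) for i in range(82)]
```

Compared with a fixed 7-bit code, more than half of the indices save one bit.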

Partition boundary definition

The partition boundary is defined in a polar coordinate system by two parameters, the angle ϕ and the offset ρ. The distance from a pixel (x_c, y_c) to the boundary is calculated by the following formula (the coordinate origin is the center of the block):

Note: the distance formula I derived is inconsistent with the one given in the paper. I found the corresponding formula in the authors' article "Geometric Partitioning Mode in Versatile Video Coding: Algorithm Review and Analysis" (TCSVT 2020). The formula in the TCSVT article is consistent with my derivation and is as follows:

ϕ and ρ are the parameters that define the partition boundary, as follows:
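As a rough illustration of how an angle and an offset define a boundary, the sketch below uses the textbook Hesse normal form of a line, d = x·cos ϕ + y·sin ϕ − ρ, with the origin at the block center. This is an assumption for illustration only: it is not claimed to match the exact expression or sign convention of either paper, and the function names are hypothetical.

```python
import math


def signed_distance(xc, yc, phi, rho):
    """Signed distance from (xc, yc) to the line with unit normal (cos phi, sin phi) at offset rho."""
    return xc * math.cos(phi) + yc * math.sin(phi) - rho


def partition_mask(width, height, phi, rho):
    """Label each pixel 0 or 1 by the side of the line it falls on (origin at block center)."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            xc = x - (width - 1) / 2    # shift so the block center is the origin
            yc = y - (height - 1) / 2
            row.append(1 if signed_distance(xc, yc, phi, rho) >= 0 else 0)
        mask.append(row)
    return mask


# phi = 0 and rho = 0 gives a vertical line through the block center.
mask = partition_mask(4, 4, 0.0, 0.0)
```

Changing ϕ rotates the line around the block center, and changing ρ shifts it along its normal.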

Quantization of boundary parameters

The boundary parameters ϕ and ρ need to be quantized so that the 82 partition modes divide the space evenly.

The parameter ϕ is quantized to the predefined values ϕj, j ∈ {0, …, 23}. The ϕj divide the range [0, 2π) non-uniformly so that tan(ϕj) takes only fixed values: tan(ϕj) ∈ {0, ±1/4, ±1/2, ±1, ±2, ±4, ∞}.
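The set of tangent values can be checked by enumerating the directions it implies. The sketch below builds every direction whose slope is in the set and confirms that there are 24 of them over a full turn. Sorting the angles in increasing order is an assumption made here; the article does not state how the index j is assigned.

```python
import math


def quantized_angles():
    """Angles in [0, 2*pi) whose tangent is 0, +-1/4, +-1/2, +-1, +-2, +-4 or infinite."""
    directions = {(1, 0), (-1, 0), (0, 1), (0, -1)}      # tan = 0 and tan = infinity
    for dx, dy in [(4, 1), (2, 1), (1, 1), (1, 2), (1, 4)]:
        for sx in (1, -1):
            for sy in (1, -1):
                directions.add((sx * dx, sy * dy))       # one direction per quadrant
    return sorted(math.atan2(dy, dx) % (2 * math.pi) for dx, dy in directions)


angles = quantized_angles()
# Five slope magnitudes in four quadrants plus four axis-aligned directions: 20 + 4 = 24.
steps = [b - a for a, b in zip(angles, angles[1:])]
```

The step between neighboring angles is not constant, which is the non-uniform division mentioned above. Slopes that are powers of two are also convenient for integer arithmetic.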

 

The parameter ρ is quantized to the predefined values ρk, k ∈ {0, …, 3}. To avoid an uneven distribution of partition lines across blocks of different sizes, ρk is obtained as follows:

When j < 12, ρx,k and ρy,k are negative; otherwise they are positive.

The figure above shows the GIP partitions. GIP has a total of NGIP = Nϕ·Nρ − Nϕ/2 − 2 = 82 modes, with Nϕ = 24 and Nρ = 4. The two modes removed by the last term are the horizontal and vertical splits through the block center, which give the same result as binary-tree partitioning.
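The mode count can be reproduced by enumeration. The sketch below is one reading of the formula, under stated assumptions: with zero offset (k = 0) the angles ϕ and ϕ + π describe the same line, which would account for the Nϕ/2 term, and the index j is assumed to run over the angles in increasing order starting from an axis-aligned direction, so that j = 0 and j = 6 are the two axis-aligned lines among j < 12. Neither assumption is stated in this article.

```python
N_PHI = 24  # number of quantized angles
N_RHO = 4   # number of quantized offsets


def enumerate_gip_modes():
    """List the (j, k) pairs that remain after removing redundant partitions."""
    modes = []
    for j in range(N_PHI):
        for k in range(N_RHO):
            if k == 0:
                if j >= N_PHI // 2:
                    continue  # same centered line as angle index j - 12 (assumed)
                if j % (N_PHI // 4) == 0:
                    continue  # centered horizontal or vertical split, same as binary tree
            modes.append((j, k))
    return modes


modes = enumerate_gip_modes()
formula = N_PHI * N_RHO - N_PHI // 2 - 2
```

Both the enumeration and the closed-form expression give 82, matching the index range Si ∈ {0, …, 81}.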

GIP weighting

The two prediction blocks produced by the geometric partition are blended with weights to generate the final prediction block. The weight of each pixel depends on its distance to the partition boundary.

The function F_B is shown in the figure below.

Here is a weighting example:

Experimental results

The following table shows the experimental results under the RA and LD configurations.

The figure below shows the percentage of pixels coded with TPM and GIP at different QPs in the RA configuration.
