Learning-based coding (vii) : Lambda field rate control based on CNN’s intra Frame

The algorithm introduced in this paper is from the JVEt-M0215 proposal.

Lambda and bitrate model based on SATD for intra frame bitrate control is constructed as follows:

 

C represents content complexity, which can be estimated by SATD of element pixels. For the first I frame alpha and beta are set as constants, and subsequent I frames are gradually updated. The proposal proposes a CNN model that can directly predict the alpha and beta of each CTU.

The network structure

The network structure of CNN is shown as follows:

 

Before CTU coding, its brightness and chromaticity components are taken out and input into two trained CNNS. The output is the CTU parameters alpha and beta.

Boundary CTU correction

CNN is a component of CTU, such as a 128×128 brightness component or a 64×64 chroma component. For CTU with graph boundary, its size can be less than 128×128. In this case, boundary pixels need to be filled with 128×128, as shown below.

 

In order to eliminate the influence of filling value, the output of CNN needs to be adjusted. The adjustment mode is as follows:

 

CTU level bit rate allocation

When the alpha and beta parameters of CTU are obtained, the target bit rate of each CTU can be obtained:

 

When encoding the ith CTU, its target bit rate also needs to be adjusted according to the actual encoding of the previous (i-1) CTU:

 

Tar is the sum of target bit rates from the ith CTU to the last CTU, Rem is the remaining bit rates after coding (i-1) CTU, and SW is the sliding window.

training

The training set used a total of 2800 images in DIV2K, Flickr and RAISE datasets. Image is first converted into YUV420 format, and use the VTM compression, QP for 20,22,24,26,28,30,32,34,36,38,40} {, obtain each CTU lambda and bit rate under different QP R. Then calculate alpha and beta for each CTU as a label as follows:

1) Log (lambda) and logR are obtained by logarithmic processing of 11 pairs (R,lambda).

2) Use least square regression to fit the following equation

 

3) In order to avoid overfitting, a regularization term is added to fit the loss:

 

W1 and w2 are both set to 0.01.

4) Use the fitting results log(alpha) and log(beta) as the label of this CTU.

In order to train CNN, the following loss function is defined:

 

With SGD training, the initial learning rate was set at 0.001, with a decline of 0.1 at the 150000th, 250000, and 350000 iterations.

The experiment

In VTM3.0, QP{22,27,32,37} codes are used without rate control in All Intra configuration, and the rate after four QP codes is used as the target rate. The experimental results are as follows:

 

The fluctuation of bit rate is as follows:

 

The fluctuation of bit rate is measured by the following formula:

If you are interested, please pay attention to wechat public account Video Coding