Learning-Based Coding (VI): DRNLF

The algorithm in this post comes from JVET-L0242. The Dense Residual Network based in-Loop Filter (DRNLF) is inserted into the in-loop filtering stage of VTM, after the deblocking filter (DBF) and before SAO and ALF, as shown in the figure below.

[Figure: position of DRNLF in the VTM in-loop filter chain, after DBF and before SAO and ALF]

Whether DRNLF is applied is decided by rate-distortion optimization (RDO).
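The RDO decision amounts to comparing the rate-distortion cost with and without the filter. A minimal sketch of that comparison (the function names and cost inputs here are illustrative, not VTM's actual code):

```python
def rd_cost(distortion, bits, lmbda):
    # Classic rate-distortion cost: J = D + lambda * R
    return distortion + lmbda * bits

def choose_drnlf(dist_off, bits_off, dist_on, bits_on, lmbda):
    # Enable DRNLF only if it lowers the RD cost.
    # bits_on would include the flag signalling that the filter is used.
    return rd_cost(dist_on, bits_on, lmbda) < rd_cost(dist_off, bits_off, lmbda)
```

Note that even when the network improves distortion, the encoder may still disable it if the signalling overhead outweighs the gain at the chosen lambda.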

The network structure

The structure of the DRNLF is shown below:

[Figure: DRNLF network structure]

N denotes the number of Dense Residual Units (DRUs), and M denotes the number of convolution kernels.

The algorithm improves on JVET-K0391. The structure of the DRU in K0391 is shown below:

[Figure: DRU structure in JVET-K0391]

There are five main improvements:

  1. The 3×3 convolution layer on the global identity skip connection was removed to speed up training.
  2. A normalized QP map is fed into the network together with the reconstructed image, so a single model can adapt to different QPs.
  3. The network is trained in YUV space rather than RGB.
  4. To reduce computational complexity, the number of DRUs was reduced from 8 to 4 and the number of convolution kernels from 64 to 32.
  5. The 3×3 convolution layers are replaced with 3×3 depth-wise separable convolution (DSC) layers.
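Improvement 2 can be illustrated by constructing the extra input channel. The normalization constant below (dividing by the maximum QP of 63) is an assumption for illustration; the proposal may normalize differently:

```python
import numpy as np

def make_qp_plane(height, width, qp, qp_max=63):
    # Constant plane holding the normalized QP value; it is stacked with
    # the reconstructed frame as an additional input channel.
    return np.full((height, width), qp / qp_max, dtype=np.float32)

def network_input(recon, qp):
    # recon: (H, W) reconstructed plane scaled to [0, 1]
    h, w = recon.shape
    return np.stack([recon, make_qp_plane(h, w, qp)], axis=0)  # (2, H, W)
```

Because the QP is part of the input rather than baked into the weights, one set of weights serves all QPs, avoiding a separate model per operating point.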

Together, these five changes reduce the model size from 810K parameters to 22K.
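The savings from improvement 5 are easy to verify with a parameter count. For a 3×3 layer with 32 input and 32 output channels (biases ignored), a depth-wise separable convolution needs roughly a seventh of the weights:

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: k*k weights per (input, output) channel pair.
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    # Depth-wise separable convolution: one k*k depth-wise filter per
    # input channel, followed by a 1x1 point-wise convolution.
    return k * k * c_in + c_in * c_out

print(conv_params(3, 32, 32))  # 9216
print(dsc_params(3, 32, 32))   # 1312
```

Combined with halving the channel count (64 to 32, a roughly 4x reduction since parameters scale with the product of input and output channels) and halving the number of DRUs, this accounts for the drop from 810K to 22K parameters.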

Training

DIV2K is used to build the training set (800 images) and the validation set (100 images). Since the network is trained in YUV space, the DIV2K images are first converted from RGB to YUV. VTM-2.0.1 then compresses the images at different QPs under the All-Intra (AI) configuration. Each compressed image, together with its QP, serves as the network input, and the original (uncompressed) image serves as the ground truth. Denoting the set of compressed images by {X} and the corresponding ground-truth set by {Y}, the loss function is as follows:
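The RGB-to-YUV conversion step might look like the following. The full-range BT.601 coefficients used here are an assumption for illustration; the proposal does not state which conversion matrix was used:

```python
import numpy as np

def rgb_to_yuv(rgb):
    # rgb: (..., 3) array with values in [0, 255].
    # Full-range BT.601 coefficients (assumed, not from the proposal).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    return np.stack([y, u, v], axis=-1)
```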

[Equation: loss function]
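The exact loss from the proposal is not reproduced here; a common choice for this kind of restoration network is the mean squared error between the network output f(X) and the ground truth Y, which can be sketched as:

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error over all pixels; pred is the network output f(X)
    # and target is the uncompressed ground truth Y.
    return float(np.mean((pred - target) ** 2))
```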

The experiment

In VTM-2.0.1 under the AI configuration, with QP in {22, 27, 32, 37} and a CPU-only environment, the experimental results are as follows:
