Abstract: Thanks to the ability of deep neural networks to extract source features, deep learning has achieved better results than traditional methods in source compression coding.

This article is shared from Huawei Cloud Community “Image and Video Compression Coding Based on Deep Learning”, originally written by Luo Peng.

Because deep neural networks can extract source features, deep learning has achieved better results than traditional methods in source compression coding.

Image compression coding based on deep learning

Autoencoder

Ballé et al. [1] proposed an end-to-end image compression model based on a variational autoencoder, which adopts a hyperprior scheme that transmits side information. The model is shown in the figure below.

Q denotes quantization; AE and AD denote arithmetic encoding and decoding, respectively. Convolution parameters are written as number of filters × kernel height × kernel width / downsampling or upsampling factor, where ↑ denotes upsampling and ↓ denotes downsampling.
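To make the role of the side information concrete, here is a minimal sketch of how a scale recovered from the side information could be used to estimate the bit cost of a quantized latent value. It assumes a zero-mean Gaussian entropy model integrated over each quantization bin. The function names and the `sigmas` values are hypothetical and are not taken from the original article; in the real model the scales would be produced by a learned decoder network.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantize(y):
    """Q: round a latent value to the nearest integer."""
    return int(round(y))

def estimated_bits(y_hat, sigma):
    """Approximate cost in bits of coding the integer y_hat when its
    probability is the Gaussian mass on [y_hat - 0.5, y_hat + 0.5]."""
    p = gaussian_cdf(y_hat + 0.5, sigma) - gaussian_cdf(y_hat - 0.5, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)

latents = [0.2, -1.7, 3.4]   # toy latent values (assumption)
sigmas = [0.5, 2.0, 4.0]     # hypothetical scales decoded from side information
y_hat = [quantize(y) for y in latents]
total_bits = sum(estimated_bits(q, s) for q, s in zip(y_hat, sigmas))
```

A small scale concentrates probability near zero, so a latent close to zero costs few bits; a large scale spreads the probability and raises the cost of the same symbol. This is why transmitting a compact description of the scales can pay for itself.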

Minnen et al. [2] proposed a scheme that combines the hyperprior with an autoregressive model.
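The key property of the autoregressive part is that each latent element is predicted only from elements that have already been decoded. The sketch below illustrates this with a toy causal context in raster-scan order. Averaging the causal neighbors is a hypothetical stand-in for a learned context model, and the function names are illustrative rather than taken from the cited work.

```python
def causal_neighbors(row, col, width):
    """Positions inside a 3x3 window that precede (row, col) in raster order."""
    candidates = [(row - 1, col - 1), (row - 1, col), (row - 1, col + 1),
                  (row, col - 1)]
    return [(r, c) for r, c in candidates if r >= 0 and 0 <= c < width]

def predict_mean(decoded, row, col):
    """Toy context model: average of the causal neighbors (0.0 if there are none).
    A learned model would use a masked convolution here instead."""
    width = len(decoded[0])
    values = [decoded[r][c] for r, c in causal_neighbors(row, col, width)]
    return sum(values) / len(values) if values else 0.0

# Quantized latent (toy values); the decoder can form the same predictions
# because each one depends only on positions decoded earlier.
latent = [[2, 2, 3],
          [2, 3, 3]]
residuals = [[latent[r][c] - predict_mean(latent, r, c) for c in range(3)]
             for r in range(2)]
```

Because the prediction residuals are typically smaller than the raw latent values, an entropy model centered on the prediction can assign them shorter codes. The cost is that decoding becomes sequential.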

Recurrent Neural Network (RNN)

A Google team [3] proposed a variable-rate image compression method built on a neural network architecture that uses Long Short-Term Memory (LSTM) units. The model is shown in the figure below.

The figure above shows the residual encoder based on convolution and deconvolution. The proposed model replaces the second and third convolution/deconvolution modules in the upper and lower layers with LSTM modules.
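The variable rate in a residual encoder comes from iteration: each pass encodes what the previous passes failed to reconstruct, so the decoder can stop after any number of passes. The following is a toy scalar sketch of that idea. The sign binarizer and the halving step size are assumptions chosen for illustration and stand in for the learned recurrent encoder and decoder.

```python
def encode_progressively(x, iterations):
    """Emit one bit (+1 or -1) per iteration for a scalar x in [-1, 1]."""
    bits, residual = [], x
    for t in range(iterations):
        bit = 1 if residual >= 0 else -1
        bits.append(bit)
        # Subtract this iteration's reconstruction; the rest is left
        # for the next iteration to encode.
        residual -= bit * 0.5 ** (t + 1)
    return bits

def decode_progressively(bits):
    """Sum the per-iteration reconstructions; any prefix of bits is decodable."""
    return sum(bit * 0.5 ** (t + 1) for t, bit in enumerate(bits))

x = 0.3
bits = encode_progressively(x, 8)
coarse = decode_progressively(bits[:2])  # low rate: first 2 bits only
fine = decode_progressively(bits)        # higher rate: all 8 bits
```

Truncating the bit sequence lowers the rate and increases the error, which is the trade-off a single trained model can offer without retraining for each target rate.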

Building on this work, the Google team [4] introduced GRU and ResNet modules and used entropy coding to further improve the compression ratio.

The model is shown in the figure below.

Generative Adversarial Network (GAN)

Agustsson et al. [5] proposed a GAN-based image compression scheme that can selectively generate part or all of an image from its semantic label map. When decoding, the regions that were compressed conventionally are decoded normally, and the remaining regions are synthesized by the GAN. The model is shown in the figure below.

E is the encoder; Q is the quantizer; G is the decoder and generator; D is the discriminator.

Video compression coding based on deep learning

Deep learning-based video coding can be divided into two types:

  • Deep learning is adopted to replace some modules in traditional video coding
  • End-to-end deep learning coding compression

Partial replacement scheme

Deep neural networks can replace individual modules of a traditional video codec, including intra-frame/inter-frame prediction, transform, upsampling and downsampling, in-loop filtering, and entropy coding [6].

End-to-end solution

Lu et al. [7] proposed an end-to-end deep learning scheme for video compression. A convolutional optical flow estimation network performs motion estimation, and two autoencoders compress the optical flow information and the residual information. The coding framework is shown in the following figure:

A convolutional network module for optical flow estimation [8] serves as the motion estimator. An autoencoder compresses the optical flow information. The autoencoder network is shown in the figure below:

The motion-compensated image is obtained by combining the previous frame with the optical flow information. The motion compensation network is shown in the figure below:

The residual is the difference between the original image and the motion-compensated image, and it is also compressed by an autoencoder.
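The steps above (motion estimation, motion compensation, residual coding) can be sketched on a tiny grayscale frame. This is a simplified illustration under stated assumptions: the flow is integer-valued and given rather than estimated, motion compensation is plain backward warping instead of a learned network, and both the flow and the residual are passed through losslessly where the actual scheme would compress them with autoencoders. All function names are hypothetical.

```python
def warp(prev_frame, flow):
    """Backward warping: pixel (r, c) of the prediction is read from
    (r + dy, c + dx) in the previous frame, clamped to the frame border."""
    h, w = len(prev_frame), len(prev_frame[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            dy, dx = flow[r][c]
            src_r = min(max(r + dy, 0), h - 1)
            src_c = min(max(c + dx, 0), w - 1)
            row.append(prev_frame[src_r][src_c])
        out.append(row)
    return out

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

prev_frame = [[10, 20, 30, 40],
              [50, 60, 70, 80]]
# Current frame: content shifted one pixel to the right, plus one new pixel value.
cur_frame = [[10, 10, 20, 30],
             [50, 50, 60, 99]]
flow = [[(0, -1)] * 4 for _ in range(2)]   # every pixel comes from its left neighbor

predicted = warp(prev_frame, flow)          # motion-compensated image
residual = subtract(cur_frame, predicted)   # what the prediction missed
reconstructed = add(predicted, residual)    # decoder side
```

With accurate motion the residual is mostly zero, so only the flow and a sparse residual need to be transmitted. In a lossy codec the reconstruction would differ slightly from the current frame because both signals are quantized.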

Rippel et al. [9] proposed an end-to-end video compression scheme based on machine learning (including deep learning). Multi-frame reference optical flow estimation is used for motion estimation, the optical flow information and the residual are compressed by autoencoders, and the bit rate is controlled by machine learning.

References

  1. [2018 ICLR] Variational Image Compression with a Scale Hyperprior
  2. [2018 NIPS] Joint Autoregressive and Hierarchical Priors for Learned Image Compression
  3. [2016 ICLR] Variable Rate Image Compression with Recurrent Neural Networks
  4. [2017 CVPR] Full Resolution Image Compression with Recurrent Neural Networks
  5. [2019 ICCV] Generative Adversarial Networks for Extreme Learned Image Compression
  6. [2019 MM] Deep Learning-Based Video Coding: A Review and A Case Study
  7. [2019 CVPR] DVC: An End-to-end Deep Video Compression Framework
  8. [2017 CVPR] Optical Flow Estimation using a Spatial Pyramid Network
  9. [2019 ICCV] Learned Video Compression
