In the Internet era, application scenarios such as social media sharing, autonomous driving, augmented reality, satellite communications, high-definition television and video surveillance create strong demand for images and video, so compression algorithms have received a great deal of attention. Different applications, however, place different requirements on a compression algorithm: some put high image quality first, while others put small file size first and can tolerate some loss of picture quality.

So how can deep learning be used to design compression algorithms? This article gives a brief introduction.


Deep learning image compression framework and basic concepts



Figure 1. Image compression based on an autoencoder network

As shown in Figure 1, a typical autoencoder-based image compression framework includes an encoder, quantization, de-quantization, a decoder, codeword estimation, entropy coding and rate-distortion optimization.

Taking Figure 1 as an example, the role of each module in the autoencoder compression network can be explained as follows. Suppose the input image has size $H \times W$, the coding features produced by the encoder and quantization have dimensions $h \times w \times c$, and each coding feature unit occupies an average of $R$ bits after entropy coding. The coding bit rate (in bits per pixel) is then:

$$\text{bpp} = \frac{h \times w \times c \times R}{H \times W} \tag{1}$$

Analyzing formula (1): the feature dimensions $h \times w \times c$ depend on the input image and are determined by the network structure, while $R$ is determined by the quantization, the distribution of the coding features and the entropy coding.
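As a concrete illustration of formula (1), here is a minimal Python sketch (the function and variable names are ours, chosen for illustration) that computes bits per pixel from the feature dimensions and the average bits per feature unit:

```python
def bits_per_pixel(feature_shape, bits_per_unit, image_height, image_width):
    """Formula (1): bpp = (h * w * c * R) / (H * W)."""
    h, w, c = feature_shape
    total_bits = h * w * c * bits_per_unit   # bits consumed by the entropy-coded features
    return total_bits / (image_height * image_width)

# Example: a 768x512 image encoded into 48x32x192 features at ~1 bit per unit
print(bits_per_pixel((48, 32, 192), 1.0, 512, 768))  # ~0.75 bpp
```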

The compressed features can be restored through entropy decoding, de-quantization and the decoder. With the quality of the decoded image unchanged, the lower R is, the higher the compression efficiency.

The encoder and decoder structures are the basis of the autoencoder network; they are designed to learn nonlinear transforms.
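A minimal sketch of such an encoder/decoder pair, written in PyTorch purely for illustration; the layer counts, kernel sizes and channel widths below are assumptions, not the structure actually used by TNG:

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Nonlinear analysis transform: image -> compact feature map."""
    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Nonlinear synthesis transform: quantized features -> reconstructed image."""
    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, y_hat):
        return self.net(y_hat)
```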

Quantization converts floating-point values to integers or binary codes; de-quantization does the opposite.

Quantization is an important way to reduce codewords, but it is also the primary cause of information loss. In theory, the finer the quantization, the less information is lost, but finer quantization may increase the number of codewords and make training harder, so designing an efficient quantization algorithm is very important.
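One common way to make rounding quantization usable in training (hard rounding has zero gradient almost everywhere) is to replace it with additive uniform noise during training. The sketch below shows this widely used surrogate; it is an illustrative choice, not necessarily the quantization design described above:

```python
import torch

def quantize(y, training):
    """Rounding quantization with a differentiable surrogate for training.

    At test time the features are hard-rounded to integers; during training the
    rounding is approximated by additive uniform noise in [-0.5, 0.5] (a common
    surrogate) so that gradients can flow through the quantizer.
    """
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return torch.round(y)
```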

To improve compression efficiency, a codeword estimation module is needed to constrain R during training.

In codeword estimation, a prior probability model is used to estimate the distribution of the coding features, keeping the estimated distribution as close as possible to the actual one; the number of bits after entropy coding is then estimated by computing the entropy.

In general, a parametric probability model can be used to model the prior, for example a Gaussian mixture model fitted to the feature distribution:

$$p(\hat{y}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\hat{y};\, \mu_k, \sigma_k^2\right) \tag{2}$$
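Since the features are quantized to integers, the probability of a quantized value is typically obtained by integrating the mixture density over the quantization bin. The following sketch assumes that convention (common in learned-compression work, though not stated explicitly above):

```python
import torch
from torch.distributions import Normal

def gmm_bin_probability(y_hat, weights, means, scales):
    """Probability of quantized features under a Gaussian mixture prior (formula (2)).

    The continuous mixture density is integrated over the quantization bin
    [y_hat - 0.5, y_hat + 0.5] so the result is a proper probability mass.
    weights / means / scales hold one entry per mixture component.
    """
    prob = torch.zeros_like(y_hat)
    for w, mu, sigma in zip(weights, means, scales):
        comp = Normal(mu, sigma)
        prob = prob + w * (comp.cdf(y_hat + 0.5) - comp.cdf(y_hat - 0.5))
    return prob.clamp(min=1e-9)  # avoid log(0) in the bit estimate
```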

Based on the estimated feature distribution, the entropy coding module first computes the context probability of each feature and then compresses the coding features, which further reduces R. Codeword estimation predicts the number of bits that arithmetic coding will consume, on the premise that the arithmetic coder is executed efficiently. The lower bound on the codeword size can be expressed as the entropy:

$$R \ge H(\hat{y}) = -\sum_{\hat{y}} p(\hat{y}) \log_2 p(\hat{y}) \tag{3}$$

Codeword estimation is mainly used during training. In practice, the same prior model can drive adaptive arithmetic coding to generate the actual bitstream.

From the perspective of information theory, the more concentrated the coding features are, the lower the entropy, and the fewer bits the entropy-coded codewords require; but the representation ability of the network also suffers, the quality of the reconstructed image drops, and distortion appears.

Therefore, there is a tradeoff between the entropy-coded bit rate and the image reconstruction quality, known as rate-distortion optimization: rate corresponds to the coding bit rate, distortion to the reconstruction quality.

In general, the loss function constructed through rate-distortion optimization is used to train the autoencoder compression network.
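A minimal sketch of such a rate-distortion loss, combining the bit estimate from formula (3) with an MSE distortion term; the trade-off weight lam and the choice of MSE here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, prob_y_hat, lam=100.0):
    """L = R + lam * D.

    R: estimated bits per pixel for the quantized features, -sum(log2 p(y_hat)) / pixels,
       following formula (3).
    D: mean squared error between the input x and the reconstruction x_hat.
    lam is an arbitrary trade-off weight; larger values favor reconstruction quality.
    """
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]      # N * H * W
    rate = -torch.log2(prob_y_hat).sum() / num_pixels      # bits per pixel
    distortion = F.mse_loss(x_hat, x)
    return rate + lam * distortion
```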


The elements of the compression algorithm

Our current approach is to improve the performance of existing image compression algorithms, especially at low bit rates, by optimizing the autoencoder network structure, designing new quantization methods, designing new prior models for the compressed features, and performing rate-distortion optimization.

Technical difficulties and barriers: based on a variational autoencoder network, it is hard to answer how to optimize the autoencoder structure, how to jointly optimize quantization and prior modeling, and how to improve image compression performance while reducing complexity and improving practicality.

Some experience to share: accurate prior probability estimation benefits adaptive arithmetic coding, and it also benefits the constraint on the codeword distribution during training. In practice, parametric probability models such as a Gaussian mixture model are used to model the prior.

Even with an accurate prior estimate, there is still a tradeoff: the more concentrated the coding features are, the lower the entropy, but the representation ability of the network is weakened, which reduces reconstruction quality. Two questions need to be answered:

1) How to estimate the prior accurately;

2) How to trade off bit rate against reconstruction quality.


Why deep learning

Compression algorithms are in demand in social media sharing, augmented reality, autonomous driving, medical imaging, HDTV and other applications.

It is not easy to tune existing compression algorithms for different application requirements, and the performance gains of image and video compression algorithms have hit a bottleneck.

From the perspective of image compression, the biggest advantage of deep learning-based technology is that it can be designed and trained for specific applications, and can be trained directly against subjective or objective metrics.

From the perspective of video compression, deep learning-based compression adopts a different architecture from H.264, H.265 and H.266; it is built around convolutional neural networks, which makes it easier to apply optical flow estimation and other machine vision algorithms to inter-frame modeling and to design efficient video compression algorithms.

In addition, we will combine super resolution and other technologies to further optimize the traditional algorithm and reduce the communication bandwidth.



Figure 2. Comparison between the TNG subjective and objective training models and other algorithms

The biggest advantage of autoencoder-based compression at the present stage is that it can be optimized for specific metrics, which can significantly improve the subjective quality of reconstructed images.

GANs (Generative Adversarial Networks) are good at improving subjective performance and visual quality, but objective quality suffers: algorithms designed around a GAN generally generate details that are inconsistent with the original data. At present we give priority to improving objective quality (PSNR), while the subjective quality has also achieved good results.

In the future, we will decide whether to adopt a GAN according to the application requirements. Although we focus on objective performance at this stage, we also found in experiments that the better the objective quality, i.e. the smaller the difference from the original image, the better the subjective quality as well. Meanwhile, we found that combining MSE and MS-SSIM can yield very good MS-SSIM performance.
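A sketch of such a combined distortion term; it assumes an MS-SSIM implementation such as the ms_ssim function from the third-party pytorch_msssim package, and the blending weight alpha is an illustrative assumption rather than a value from the article:

```python
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # third-party MS-SSIM implementation

def combined_distortion(x, x_hat, alpha=0.9):
    """Blend MS-SSIM (as a loss, 1 - MS-SSIM) with MSE.

    alpha controls the balance: alpha=1 is pure MS-SSIM, alpha=0 is pure MSE.
    Images are assumed to be scaled to [0, 1].
    """
    msssim_loss = 1.0 - ms_ssim(x_hat, x, data_range=1.0)
    mse_loss = F.mse_loss(x_hat, x)
    return alpha * msssim_loss + (1.0 - alpha) * mse_loss
```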

As shown in Figure 2, the MS-SSIM scores of the model trained with MS-SSIM as the loss function (TNG Subjective) are significantly better than those of the model trained with MSE as the loss function (TNG Objective).

Under the same subjective MS-SSIM index, TNG Subjective consumes half of the code words of BPG and saves 50% of the traffic. Compared to JPEG, TNG consumes only 25-30% of the original traffic.


Conclusion

The current market demand for compression algorithms is great, which accelerates research on the new generation of compression algorithms. But because market requirements differ, implementing a new generation of compression algorithms is difficult. At present, TNG uses a variational autoencoder approach and does not add a GAN, because our primary need is objective quality. As mentioned above, the results we achieved with this design are: TNG compression is twice as efficient as BPG compression and 3.5-4 times more efficient than JPEG compression. In daily practice, we can design an appropriate image compression algorithm according to our specific compression requirements.