This article was originally published by AI Frontier.
dwz.cn/7yzX0F


Author | Tuya Technology


Editor | Vincent

In recent years, deep learning has come to dominate the field of computer vision. Whether the task is image recognition or super-resolution reconstruction, deep learning has become an essential tool for image research, and it has now entered the field of image compression as well. Take Tiny Network Graphics (TNG), the latest image compression format developed by Tuya Technology, as an example: built around a deep convolutional neural network, it produces files only 45% the size of JPEG at the same image quality.


This article explains how to design an image compression algorithm using deep convolutional neural networks.

Among existing image compression algorithms, the most influential technologies on the market are WebP and BPG.

WebP: an image file format that offers both lossy and lossless compression. It is based on VP8 coding, and support for lossless compression and transparency has been available since November 2011. Facebook, eBay, and other companies already use the format.

BPG: an image format developed by Fabrice Bellard, the well-known programmer behind projects such as FFmpeg and QEMU. It uses HEVC as its coding core and produces files roughly half the size of JPEG at comparable quality. BPG also supports features such as 8-bit and 16-bit channels. Although BPG compresses very well, HEVC carries high patent fees, so it sees little use in the current market.

BPG outperforms WebP in compression, but the patent fees attached to its HEVC kernel have kept it from wide adoption. Against this backdrop, deep learning offers a way to design new image compression algorithms.

How to design an image compression algorithm with deep learning

One goal of designing a compression algorithm with deep learning is to beat today's commercial image codecs; another is that deep learning makes it possible to build a simpler, end-to-end algorithm. In image and video compression, the main deep learning tool is the convolutional neural network (CNN). As shown in Figure 1, a CNN is assembled like building blocks from modules such as convolution, pooling, nonlinear functions, and normalization layers. What the network finally outputs depends on the application: in face recognition, for example, it extracts a string of numbers (technically called features) to represent a face image, and faces are then recognized by comparing how similar those features are.

Figure 1: Schematic diagram of a convolutional neural network
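
For readers who want a concrete picture of the "building blocks" metaphor, here is a minimal PyTorch sketch that stacks the modules named above (convolution, normalization, a nonlinear function, pooling). It is purely illustrative and is not TNG's actual network.

```python
# Illustrative only: a tiny CNN assembled from the modules named in the text.
# This is not TNG's architecture, just a sketch of how such blocks are stacked.
import torch
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution
    nn.BatchNorm2d(16),                          # normalization layer
    nn.ReLU(),                                   # nonlinear function
    nn.MaxPool2d(2),                             # pooling (halves H and W)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
)

x = torch.randn(1, 3, 512, 768)   # a dummy RGB image, N x C x H x W
features = tiny_cnn(x)
print(features.shape)             # torch.Size([1, 32, 256, 384])
```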

A complete deep-learning-based image compression framework includes a CNN encoder, quantization, inverse quantization, a CNN decoder, entropy coding, codeword estimation, and rate-distortion optimization. The encoder transforms the image into compressed features, and the decoder recovers the original image from those features. Both can be built from convolution, pooling, and nonlinear modules.

Figure 2: Image compression using deep learning
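
As a rough illustration of the encoder/decoder pair in Figure 2, the following PyTorch sketch maps a 768×512 RGB image to a 96×64×192 feature block (the 8× spatial downsampling assumed in the running example below) and back. Quantization, entropy coding, and rate-distortion optimization are omitted, and the layer choices are assumptions rather than TNG's published architecture.

```python
# A minimal CNN encoder/decoder sketch under assumed layer choices;
# not TNG's actual network.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2),     # /2
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2),   # /4
    nn.ReLU(),
    nn.Conv2d(128, 192, kernel_size=5, stride=2, padding=2),  # /8
)

decoder = nn.Sequential(
    nn.ConvTranspose2d(192, 128, kernel_size=4, stride=2, padding=1),  # x2
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # x4
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # x8
)

image = torch.randn(1, 3, 512, 768)   # H=512, W=768
features = encoder(image)             # -> [1, 192, 64, 96], i.e. 96x64x192 units
reconstruction = decoder(features)    # -> [1, 3, 512, 768]
print(features.shape, reconstruction.shape)
```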

How to evaluate compression algorithms

Before diving into the technical details, let's look at how compression algorithms are evaluated. Three indices matter most: Peak Signal-to-Noise Ratio (PSNR), Bits Per Pixel (BPP), and the Multi-Scale Structural Similarity index (MS-SSIM). All data in a computer is stored as bits, and the more bits an image needs, the more storage it occupies. PSNR measures how faithfully the decoded image is restored, BPP is the number of bits used per pixel of the image, and MS-SSIM measures the subjective quality of the image. Simply put, at the same rate/BPP, a higher PSNR means better reconstruction quality, and a higher MS-SSIM means a better subjective impression.
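
As a concrete reference, here is a small NumPy sketch of the PSNR calculation for 8-bit images; MS-SSIM requires a dedicated multi-scale implementation and is left out, and the image arrays here are dummy data.

```python
# A hedged sketch of PSNR for 8-bit images (peak value 255).
import numpy as np

def psnr(original: np.ndarray, restored: np.ndarray) -> float:
    """Peak Signal-to-Noise Ratio between two uint8 images, in dB."""
    mse = np.mean((original.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * np.log10(255.0 ** 2 / mse)

# Random data standing in for an original and a decoded image:
img = np.random.randint(0, 256, (512, 768, 3), dtype=np.uint8)
noisy = np.clip(img + np.random.normal(0, 5, img.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(img, noisy):.2f} dB")
```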

For example, suppose a 768×512 image of about 1 MB is encoded with a deep learning codec, and the encoding network produces compressed features consisting of 96×64×192 data units. If each data unit consumes 1 bit on average, encoding the whole image takes 96×64×192 bits. The number of bits per pixel after compression is (96×64×192)/(768×512) = 3, so the BPP is 3 bits/pixel and the compression ratio is 24:3 = 8:1. In other words, a 1 MB image takes only 0.125 MB after compression; eight compressed images fit in the space of one original.
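
The arithmetic in this example can be checked with a few lines of Python; the numbers are exactly the ones used above.

```python
# BPP and compression ratio for the 768x512 example with 96x64x192 data
# units at 1 bit each; a raw 8-bit RGB image uses 24 bits per pixel.
width, height = 768, 512
feat_w, feat_h, feat_c = 96, 64, 192
bits_per_unit = 1

total_bits = feat_w * feat_h * feat_c * bits_per_unit
bpp = total_bits / (width * height)
compression_ratio = 24 / bpp

print(bpp)                # 3.0 bits per pixel
print(compression_ratio)  # 8.0, i.e. an 8:1 compression ratio
```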

How to compress images with deep learning

Returning to that example: after a 768×512 three-channel image passes through the encoding network's forward pass, we obtain compressed features occupying 96×64×192 data units. Readers with a computer science background might ask what each data unit should hold: a floating-point number, an integer, or a binary value? From the standpoint of image restoration and of how neural networks work, floating-point features give the highest restored image quality. But a floating-point number occupies 32 bits, which works out to (96×64×192×32)/(768×512) = 96 bits per pixel. Instead of shrinking from 24 bits per pixel, the image balloons to 96; that is the opposite of compression and clearly a bad result.

So to design a workable algorithm, we use a technique called quantization, which converts floating-point numbers into integers or binary values. The simplest operation is to drop the fractional part of each float. Once the floats become 8-bit integers, each one occupies only 8 bits, which corresponds to 24 bits per pixel. Correspondingly, the decoder can apply inverse quantization to map the quantized features back to floating point, for example by adding a random fraction to each integer; this reduces the impact of quantization on the network's accuracy to some extent and improves the quality of the restored image.
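
Below is a minimal NumPy sketch of this quantization / inverse-quantization idea, assuming float32 features quantized to 8-bit integers; the floor-then-add-a-random-fraction scheme is just the simple variant described above, not TNG's actual method.

```python
# Quantization at the encoder and inverse quantization at the decoder,
# in the simplest form described in the text.
import numpy as np

features = np.random.uniform(-4, 4, size=(192, 64, 96)).astype(np.float32)

# Quantization: drop the fractional part, 32-bit floats -> 8-bit integers.
quantized = np.floor(features).astype(np.int8)

# Inverse quantization: add a random fraction in [0, 1) to recover floats.
dequantized = quantized.astype(np.float32) + \
    np.random.uniform(0, 1, quantized.shape).astype(np.float32)

print(features.dtype, quantized.dtype)         # float32 (32 bits) vs int8 (8 bits)
print(np.abs(features - dequantized).mean())   # average quantization error
```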

An 8:1 compression ratio is still not a satisfying result, even when each compressed feature data unit occupies only one bit. How can the algorithm be optimized further? Look again at the BPP formula: assuming each compressed feature data unit occupies one bit, it reads (96×64×192×1)/(768×512) = 3 bits/pixel. For compression, the smaller the BPP the better. The denominator is fixed by the image; the adjustable part is the numerator, where the three numbers 96, 64, and 192 are determined by the network structure. As we design better network structures, these three numbers can be made smaller.
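
To see how those three numbers drive the result, the sketch below recomputes BPP for a 768×512 image under a few assumed encoder configurations (spatial downsampling factor and channel count); the specific configurations are hypothetical.

```python
# BPP as a function of the encoder's downsampling factor and channel count,
# still assuming 1 bit per compressed feature data unit.
def bpp(width, height, downsample, channels, bits_per_unit=1):
    feat_w, feat_h = width // downsample, height // downsample
    return feat_w * feat_h * channels * bits_per_unit / (width * height)

print(bpp(768, 512, downsample=8,  channels=192))  # 3.0, the example above
print(bpp(768, 512, downsample=16, channels=192))  # 0.75, deeper downsampling
print(bpp(768, 512, downsample=8,  channels=96))   # 1.5, fewer channels
```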

Which module affects the remaining "1"? That 1 means each compressed feature data unit occupies 1 bit on average. Quantization influences this number, but it is not the only factor: rate control and entropy coding matter too. The goal of rate control is to make the values in the compressed feature units as concentrated as possible and their range as small as possible while still guaranteeing restoration quality. Entropy coding can then push the average below 1 bit per data unit, further improving the compression ratio.
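
Why rate control helps entropy coding can be seen from Shannon entropy, which lower-bounds the average number of bits per data unit: the more concentrated the value distribution, the fewer bits an entropy coder needs on average. The short sketch below compares a spread-out distribution with a concentrated one; both distributions are made up for illustration.

```python
# Shannon entropy of a symbol distribution, in bits per symbol.
import numpy as np

def entropy_bits(probabilities):
    p = np.asarray(probabilities, dtype=np.float64)
    p = p[p > 0]                      # ignore zero-probability symbols
    return float(-(p * np.log2(p)).sum())

uniform      = [0.25, 0.25, 0.25, 0.25]   # values spread evenly
concentrated = [0.90, 0.05, 0.03, 0.02]   # most units take one value

print(entropy_bits(uniform))       # 2.0 bits per symbol
print(entropy_bits(concentrated))  # ~0.62 bits per symbol
```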

Video compression with deep learning can be regarded as an extension of deep-learning-based image compression: by exploiting spatio-temporal information such as the optical flow between frames of a video sequence, it can reduce the bit rate further than compressing each frame on its own.

Advantages of deep learning image compression

TNG, the image compression format that Tuya Technology developed with deep learning, has surpassed both WebP and BPG in internal tests. The following figures show the evaluation results on the Kodak24 standard dataset, reported as PSNR and MS-SSIM.

Figures 3 and 4: Evaluation results on the Kodak24 standard dataset; PSNR results above, MS-SSIM results below

Readers familiar with image compression can see it directly from the PSNR and MS-SSIM values: TNG's PSNR and MS-SSIM are significantly higher than those of WebP, JPEG 2000, and JPEG. TNG's PSNR exceeds BPG's at high bit rates, and its MS-SSIM is higher than BPG's across the board.

  • Comparison of compression results between TNG and WebP at low bit rates

Figures 5 and 6: Comparison of compression results between TNG (Figure 5) and WebP (Figure 6) at low bit rates

Compared with TNG, WebP retains more detail but introduces more distortion, which is not conducive to later restoration. TNG uses edge-preserving filtering to reduce distortion, and its overall image quality is better than WebP's.

  • Comparison between TNG and BPG at high bit rates

Figures 7 and 8: Comparison of compression results between TNG (Figure 7) and BPG (Figure 8) at high bit rates

The two pictures above show the high-bit-rate case. In the actual test, BPG exhibits the kind of color distortion shown above, whereas TNG shows essentially none of it.

This is because BPG encodes and decodes the Y, U, and V channels separately when compressing a picture, which introduces some chromatic aberration.

TNG, by contrast, treats the picture as a whole and encodes the channels jointly, which avoids this problem.

  • Comparison between TNG and BPG at low bit rates

Figures 9 and 10: Comparison of compression results between TNG (Figure 9) and BPG (Figure 10) at low bit rates

At low bit rates, false contours and blocking artifacts appear in the BPG-compressed image, and the continuity of the whole picture is poor, while TNG preserves image continuity and object contours better.

Image compression is used in a wide variety of fields, from social applications and news clients to games: wherever there are images, there is a need for image compression. More advanced compression technology helps companies that serve large numbers of images save substantial bandwidth costs, and helps users save image traffic and shorten image loading times.

Conclusion

In general, designing image compression algorithms with deep learning is a very promising but also very challenging line of work. In the era of full high-definition screens, deep-learning image compression can give people a better visual experience. Meanwhile, in fields such as gaming and spatial image sensing, it can deliver higher resolution with smaller storage footprints, again providing users with a better visual experience.

Here is a link to TNG’s test:

www.tucodec.com/picture/ind…

You can test it yourself (testing on a PC is recommended). After the test, you can download the compressed pictures and the binary files, and after downloading and installing the decoder you can also restore the compressed pictures.
