This article surveys the development of GANs (generative adversarial networks) in recent years and representative work on measuring distribution differences, IPM and regularization, dual learning, conditions and control, resolution improvement, and evaluation metrics, in the hope of helping readers who have not followed GAN-related work before.

Author: Heng Heng

Machines don’t learn

Editor: happyGirl

Recently, under my advisor's guidance, I have been doing some research on GAN + GCN/video. It has to be said that, even after being popular for so long, GAN still shows strong vitality in crossover and application areas such as graph convolution and video analysis; even the "1 + 1 = 2" work of combining two fields leaves plenty of problems to solve. During this research I first tried a number of basic models and recorded the representative ones here. Later I will continue to record GAN + GCN, GAN + video, and the PyTorch GAN zoo I am implementing.

By 2019, the goal of "everything can be embedded" has basically been realized: representation learning is widely valued and generative learning is in full bloom. In my recent research I found that little low-hanging "glue" work is left in the crossover areas where 1 + 1 = 2. After the 1 + 1 = 2 step, however, there remain small problems tied to the characteristics of each task. To solve the small problems in my own field, I have summarized and reproduced the classic GAN networks, hoping this is helpful to readers who have not followed GAN-related work before.


Many existing machine learning tasks can be boiled down to domain transformation: converting data from a source domain to a target domain, such as generating images from text, generating the next frame from the previous frame, converting one style to another, and so on. Existing neural network modules can already map source data to a target of any shape, and traditional loss functions such as MSE, MAE, and Huber loss can measure the difference between a generated sample and a target-domain sample.

However, models built this way (such as auto-encoders) often produce unsatisfactory results after training with backpropagation.

During research, some works found that when guiding the network's updates, these traditional loss functions can only compute a rough gradient from the average error over all pixels, so many marginal distributions and local differences are never learned.

Figure 1: Limitations of MSE

As shown in the figure above, the two generated images modify the same number of pixels of the original image, so their MSE errors are identical. However, common sense says the second image clearly no longer looks like a 0. A good loss function should assign a larger loss to the second generated image.
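A toy illustration of this point: the image below is a made-up 28×28 "0"-like blob (the shapes and pixel positions are invented for demonstration, not taken from the article). Two corruptions flip the same number of pixels, one along the stroke and one in the background, and MSE cannot tell them apart.

```python
import numpy as np

# Hypothetical 28x28 binary image of a "0" (purely illustrative).
original = np.zeros((28, 28))
original[4:24, 10:18] = 1.0          # a crude "0"-like blob

# Corruption A: flip 20 pixels on the stroke (still roughly looks like a 0).
corrupt_a = original.copy()
corrupt_a[4, 10:18] = 0.0            # 8 pixels
corrupt_a[23, 10:18] = 0.0           # 8 pixels
corrupt_a[5, 10:14] = 0.0            # 4 pixels -> 20 flipped in total

# Corruption B: flip 20 pixels in the empty background (clearly not a 0 anymore).
corrupt_b = original.copy()
corrupt_b[14, 0:10] = 1.0            # 10 pixels
corrupt_b[15, 0:10] = 1.0            # 10 pixels -> 20 flipped in total

mse = lambda a, b: np.mean((a - b) ** 2)
print(mse(original, corrupt_a), mse(original, corrupt_b))  # identical MSE
```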

Based on this, GAN introduces a learnable loss function, the discriminator, which adaptively measures the difference between two population distributions, i.e., continuous probability distributions (unlike fixed loss functions such as MSE, MAE, and Huber loss, which measure the difference between two individual samples, i.e., discrete probability distributions).

Figure 2: Instead of measuring the difference between samples, GAN measures the difference between two population distributions

In the derivations, most works start from statistical arguments, maximizing the likelihood of the target data under the generator's distribution, or equivalently minimizing a divergence between the generated and target distributions.

In my opinion, a research direction with real vitality may not always perform well, but it should at least be divisible into different sub-problems that can blossom and bear fruit separately. If everyone merely stacks and swaps modules on the same problem, the direction is likely to be short-lived. GAN, a topic still developing vigorously in 2019, can be divided into the following six categories within CV:

1. Measuring distribution differences

Improving the accuracy and diversity of generated results by better measuring the difference between the generated distribution and the target distribution

2. IPM and regularization

Improving the stability of GAN convergence by clipping and regularizing the discriminator's gradients

3. Dual learning

Using cycle consistency, i.e., adding constraints between the source domain and the reconstructed domain, to make full use of the data

4. Conditions and control

Fusing known conditions into the model to control the generation process and the properties of the generated results

5. Improving resolution

Traditional GANs tend to produce blurry large images; a series of works improves the resolution of generated images

6. Evaluation metrics

Metrics for measuring and comparing the generation quality of different GANs

1. Measuring distribution differences

As mentioned earlier, the essential goal of GAN is to bring the generated distribution as close as possible to the target distribution. But how do we measure the difference between these probability distributions?

GAN

Figure 3: GAN consists of generator and discriminator

Goodfellow first proposed the minimax game, which opened the chapter of GAN. GAN trains two models simultaneously: a generative model that captures the data distribution, and a discriminative model that estimates whether a sample comes from the real data. The generator is trained to maximize the probability that the discriminator makes a mistake, i.e., to optimize the generated distribution so that the discriminator believes the generated fake samples are real. The discriminator is trained to minimize its probability of making a mistake, i.e., to catch the fake samples produced by the generator. The loss can be expressed as:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

In practice, the discriminator and generator of GAN are optimized alternately (often one step each, sometimes at a 5:1 ratio), and their objectives can be written separately:

max_D E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 − D(G(z)))]
min_G E_{z~p_z}[log(1 − D(G(z)))]  (in practice the non-saturating form max_G E_z[log D(G(z))] is often used)
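Below is a minimal sketch of this alternating update in PyTorch. The generator G, discriminator D (assumed to end in a sigmoid), the optimizers, and the data batch are all assumed to be defined elsewhere; it uses the non-saturating generator loss mentioned above.

```python
import torch
import torch.nn as nn

def train_step(G, D, real, opt_g, opt_d, latent_dim=100, d_steps=1):
    """One alternating GAN update; D is assumed to output a probability."""
    bce = nn.BCELoss()
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # Discriminator: maximize log D(x) + log(1 - D(G(z)))
    for _ in range(d_steps):                      # some setups use d_steps = 5
        z = torch.randn(real.size(0), latent_dim)
        fake = G(z).detach()                      # block gradients into G
        loss_d = bce(D(real), ones) + bce(D(fake), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: maximize log D(G(z)) (non-saturating form)
    z = torch.randn(real.size(0), latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```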

Paper: arXiv (arxiv.org/abs/1406.26…)

Code: GitHub (github.com/eriklindern…)

LSGAN

LSGAN encodes generated samples and real samples as a and b respectively (with c as the value the generator wants the discriminator to assign to fake samples), and replaces GAN's logistic loss with a squared error:

min_D 1/2 E_{x~p_data}[(D(x) − b)^2] + 1/2 E_{z~p_z}[(D(G(z)) − a)^2]
min_G 1/2 E_{z~p_z}[(D(G(z)) − c)^2]

A common choice is a = 0 and b = c = 1.

The experiments show that LSGAN partially alleviates unstable GAN training and poor image quality. However, because the squared error penalizes outliers heavily, it may lead to excessive imitation of the real samples and reduce the diversity of the generated results.
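A minimal sketch of the LSGAN losses with the common coding a = 0 (fake) and b = c = 1 (real / generator target). The discriminator D is assumed to output a raw score (no sigmoid); G and the batches are assumed to exist.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def lsgan_d_loss(D, real, fake):
    # discriminator: push D(real) toward b=1 and D(fake) toward a=0
    d_real, d_fake = D(real), D(fake.detach())
    return 0.5 * (mse(d_real, torch.ones_like(d_real)) +
                  mse(d_fake, torch.zeros_like(d_fake)))

def lsgan_g_loss(D, fake):
    # generator: push D(fake) toward c=1
    d_fake = D(fake)
    return 0.5 * mse(d_fake, torch.ones_like(d_fake))
```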

Paper: arXiv. Code: GitHub (…WGAN-GP-DRAGAN-Pytorch/blob/master/v0/train_celeba_lsgan.py)

f-GAN

f-GAN further generalizes the GAN loss function: the JS divergence used by GAN and the Pearson chi-square divergence used by LSGAN are both special cases of f-divergences, and other distances or divergences can equally be used to measure the gap between the real and generated distributions. On this basis, f-GAN designs a family of losses, one per divergence:

min_G max_T E_{x~p_data}[T(x)] − E_{x~p_G}[f*(T(x))]

Here f* (the convex conjugate of the divergence's generating function f) takes a different expression for each divergence; and since the output of the discriminator T is required to lie in the domain of f*, the activation function of the discriminator's output layer also has to be replaced accordingly:

Figure 4: Various forms of f-GAN

Paper: arXiv (arxiv.org/abs/1606.00…)

Code: GitHub (github.com/shayneobrie…)

EBGAN

While f-GAN unifies GANs from the divergence perspective, EBGAN treats the discriminator as an energy function, i.e., a trainable loss function. The energy function assigns low energy to regions close to the real distribution and high energy to regions far from it, and the generator tries to produce fake samples with the lowest possible energy. From this perspective, the discriminator's architecture and loss function become more flexible. EBGAN proposes using an autoencoder structure and replacing the classifier's output with the reconstruction error:

Figure 5: EBGAN discriminator adopts autoencoder structure

That is, D(x) = ||Dec(Enc(x)) − x||. When designing the loss function, the authors add a margin m to make the energy model more stable:

L_D = D(x) + [m − D(G(z))]^+
L_G = D(G(z))

where [·]^+ = max(0, ·).
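A minimal sketch of these losses with an autoencoder discriminator. The encoder Enc, decoder Dec, generator output `fake`, and real batch `real` are assumed; the margin value is just a placeholder.

```python
import torch
import torch.nn.functional as F

def energy(x, Enc, Dec):
    # D(x): per-sample autoencoder reconstruction error
    return F.mse_loss(Dec(Enc(x)), x, reduction='none').flatten(1).mean(dim=1)

def ebgan_losses(real, fake, Enc, Dec, m=10.0):
    e_real = energy(real, Enc, Dec)
    e_fake = energy(fake.detach(), Enc, Dec)
    loss_d = e_real.mean() + torch.clamp(m - e_fake, min=0).mean()  # D(x) + [m - D(G(z))]^+
    loss_g = energy(fake, Enc, Dec).mean()                          # D(G(z))
    return loss_d, loss_g
```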

Paper: arXiv (arxiv.org/abs/1609.03…)

Code: GitHub (github.com/eriklindern…)

2. IPM and regularization

Much of the time, adversarial learning makes GAN convergence far from ideal. IPM-based methods (integral probability metrics) change the discriminator's output from a probability to an unbounded real number and constrain its gradient to a certain range through regularization, which effectively prevents the discriminator from being optimized too early and causing the generator's gradient to vanish.

WGAN

After analyzing the causes of GAN's unstable convergence, WGAN argues that the difficulty of controlling the discriminator's training is what makes convergence unstable: if the discriminator is trained too well, the generator's gradient vanishes and its loss can hardly decrease; if the discriminator is trained poorly, the generator's gradient is inaccurate and the loss wanders. Only within a narrow balance between discriminator and generator does this zero-sum game train properly.

WGAN has made the following changes:

1. Remove the sigmoid from the discriminator's last layer, so it outputs an unbounded score (a "critic") rather than a probability.

2. Clip the discriminator's weights to a fixed interval [−c, c] after every update (weight clipping).

3. Use RMSProp or SGD with a low learning rate.

The loss function can be expressed as:

min_G max_{||D||_L ≤ 1} E_{x~p_data}[D(x)] − E_{z~p_z}[D(G(z))]

The Lipschitz constraint ||D||_L ≤ 1 prevents D from changing too drastically; in the implementation it is enforced by clipping every weight of D to the interval [−c, c] (e.g., c = 0.01).
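A minimal sketch of one WGAN critic update with weight clipping. The critic D (no sigmoid), generator G, noise z, batch `real`, and optimizer opt_d (RMSProp/SGD with a low learning rate) are assumed.

```python
import torch

def wgan_critic_step(D, G, real, z, opt_d, c=0.01):
    # maximize E[D(x)] - E[D(G(z))]  ->  minimize the negative
    loss_d = -(D(real).mean() - D(G(z).detach()).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # weight clipping: keep every critic weight inside [-c, c]
    for p in D.parameters():
        p.data.clamp_(-c, c)
    return loss_d.item()

# Generator step (run once per several critic steps): minimize -E[D(G(z))]
# loss_g = -D(G(z)).mean(); opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```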

Paper: arXiv (arxiv.org/abs/1701.07…). Code: GitHub (https://github.com/Zeleni9/pytorch-wgan/blob/master/models/wgan_clipping.py)

WGAN-GP

Soon after WGAN was proposed, its authors replaced weight clipping with a gradient penalty and proposed WGAN-GP: instead of clipping weights, the critic's gradient norm is penalized for deviating from 1 at points sampled between real and generated samples.
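A minimal sketch of that gradient penalty term. It assumes 4D image batches `real` and `fake` of the same shape, a critic D, and the commonly used penalty weight of 10.

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    # interpolate between real and fake samples
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_interp = D(interp)
    # gradient of the critic output w.r.t. the interpolated inputs
    grads = torch.autograd.grad(outputs=d_interp, inputs=interp,
                                grad_outputs=torch.ones_like(d_interp),
                                create_graph=True, retain_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

# critic loss: -E[D(x)] + E[D(G(z))] + gradient_penalty(D, real, fake)
```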

Paper: arXiv (arxiv.org/abs/1704.00…). Code: GitHub (https://github.com/caogang/wgan-gp/blob/master/gan_cifar10.py)

BEGAN

BEGAN further combines the ideas of WGAN and EBGAN. On the one hand, like EBGAN, it uses an autoencoder as the discriminator and measures the difference between generated and real samples with the reconstruction error:

Figure 6: The discriminator of BEGAN also adopts an autoencoder structure

On the other hand, BEGAN maintains a balancing variable k_t, updated during training, to balance the optimization speed of the discriminator and generator:
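A sketch of that balancing term, where L(·) denotes the autoencoder reconstruction loss of the discriminator; the default values of gamma and lambda_k are assumptions based on common BEGAN settings.

```python
def began_losses(L_real, L_fake, k, gamma=0.5, lambda_k=0.001):
    """L_real = L(x), L_fake = L(G(z)): autoencoder reconstruction losses."""
    loss_d = L_real - k * L_fake                         # discriminator loss
    loss_g = L_fake                                      # generator loss
    # k_{t+1} = k_t + lambda_k * (gamma * L(x) - L(G(z))), clipped to [0, 1]
    k_next = min(max(k + lambda_k * (gamma * L_real - L_fake), 0.0), 1.0)
    return loss_d, loss_g, k_next
```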

Paper: arXiv (arxiv.org/abs/1703.10…). Code: GitHub (https://github.com/shayneobrien/generative-models/blob/master/src/be_gan.py)

3. Dual learning

Some works use dual learning to extend GAN's generate-discriminate process into a generate-discriminate plus reconstruct-discriminate process, making full use of the information in both the source and target domains. DualGAN, CycleGAN, and DiscoGAN have very similar network structures, but the differences in their motivations are quite interesting:

DualGAN

DualGAN observes that converting the source distribution to the target distribution and converting the target distribution back to the source distribution form a dual problem, and the two can be optimized jointly.

Figure 7: DualGAN's network structure

CycleGAN

CycleGAN proposes the cycle-consistency principle: after an image is mapped into the other domain, it should be possible to map it back to the original image through the inverse mapping.

Figure 8: CycleGAN network structure
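A minimal sketch of the cycle-consistency loss. G_AB maps domain A to B and G_BA maps B to A (both assumed to be defined elsewhere); lambda_cyc is the usual weighting factor, and the adversarial losses from the two discriminators would be added on top.

```python
import torch.nn as nn

def cycle_consistency_loss(G_AB, G_BA, real_a, real_b, lambda_cyc=10.0):
    l1 = nn.L1Loss()
    fake_b = G_AB(real_a)
    fake_a = G_BA(real_b)
    # map each image to the other domain and back; it should come back unchanged
    return lambda_cyc * (l1(G_BA(fake_b), real_a) + l1(G_AB(fake_a), real_b))
```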

Paper: arXiv (arxiv.org/abs/1703.10…). Code: GitHub (https://github.com/aitorzip/PyTorch-CycleGAN/blob/master/models.py)

DiscoGAN

To learn the mapping between two domains, DiscoGAN first considers adding a second generator together with a reconstruction loss term that compares real and reconstructed images.

Figure 9: Single mapping network of DiscoGAN

However, a model designed this way is unidirectional and cannot simultaneously learn the mapping from the target domain back to the source domain. In addition, because MSE penalizes outliers heavily, the model suffers from mode collapse and makes only minor modifications to the source image. The authors therefore propose the bidirectional DiscoGAN:

Figure 10: DiscoGAN double mapping network

Paper: arXiv (arxiv.org/abs/1703.05…). Code: GitHub (https://github.com/carpedm20/DiscoGAN-pytorch/blob/master/models.py)

4. Conditions and control

The samples generated by a vanilla GAN are not controllable. Conditional GANs guide the generation process by injecting a prior/condition, thereby controlling the generated samples so that they satisfy certain properties.

cGAN

GAN can generate distributions close to the target distribution, such as the digits 0 to 9. However, we cannot intervene in the generation process of a traditional GAN, for example to specify that the digit 1 should be generated. cGAN therefore replaces the probability distributions in GAN with conditional probabilities:

min_G max_D E_{x~p_data}[log D(x|y)] + E_{z~p_z}[log(1 − D(G(z|y)))]

Specifically, we concatenate known conditional vectors in both generator and discriminator inputs:

Figure 11: Network structure of cGAN

In the figure, z is the noise sampled from a normal distribution, x is a sample drawn from the real distribution, and y is the condition vector, e.g., the one-hot encoding of the sample's label. When judging generated samples, the discriminator discriminates conditioned on y, forcing the generator to respect the condition vector when generating samples.
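A minimal sketch of this concatenation for flattened images (the layer sizes are illustrative, not the paper's configuration): the condition y is simply concatenated with z in the generator and with x in the discriminator.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, latent_dim=100, n_classes=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh())

    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=1))   # condition joins the noise

class CondDiscriminator(nn.Module):
    def __init__(self, n_classes=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))   # judge x *given* y
```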

Paper: arXiv (arxiv.org/abs/1411.17…). Code: GitHub (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/cgan/cgan.py)

IcGAN

Initially, cGAN only takes the one-hot encoding of the sample label as input and controls generation at the label level. How can individual attributes of a generated sample be fine-tuned? IcGAN learns, through an encoder, the mapping from an image to its latent/attribute vectors, and then generates images with the desired attributes by modifying entries of those vectors and feeding them to the generator:

Figure 12: Network structure of IcGAN

ACGAN

Instead of feeding the condition (the sample's class) directly into the discriminator, ACGAN trains the discriminator to classify samples: the discriminator not only judges whether each sample is real or fake, but also predicts the known condition (the sample's class), which adds a classification loss.

Figure 13: Network structure of ACGAN

One benefit of ACGAN is that, because the discriminator outputs the condition, models pretrained on other datasets can be reused as a prior, which yields sharper images and mitigates mode collapse. In addition, as shown in the figure above, there are other similar designs that add prior distributions to GAN, such as SemiGAN and InfoGAN, with only minor differences.
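A minimal sketch of an ACGAN-style discriminator head: one branch outputs real/fake, the other predicts the class, and the total loss adds a classification term. The feature extractor size and image dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    def __init__(self, img_dim=784, feat_dim=256, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(img_dim, feat_dim), nn.LeakyReLU(0.2))
        self.adv_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())  # real/fake
        self.cls_head = nn.Linear(feat_dim, n_classes)                       # class logits

    def forward(self, x):
        h = self.features(x)
        return self.adv_head(h), self.cls_head(h)

# Discriminator loss on a real batch (x, labels):
# validity, class_logits = D(x)
# loss = nn.BCELoss()(validity, torch.ones_like(validity)) \
#        + nn.CrossEntropyLoss()(class_logits, labels)
```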

Paper: arXiv (arxiv.org/abs/1610.09…). Code: GitHub (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/acgan/acgan.py)

5. Improving resolution

In its initial form, GAN was limited by the dimensionality of the normally distributed noise and could only produce low-resolution images of around 32×32. A series of works has studied how to generate high-resolution images.

DCGAN

DCGAN introduced CNNs into GAN for the first time (previous GANs were mostly built from fully connected layers) and proposed a CNN + GAN architecture that converges stably. Its many tricks laid the foundation for subsequent studies (a minimal generator sketch follows the list):

Figure 14: DCGAN generator

1. Downsampling uses strided convolutions rather than pooling

2. Upsampling uses transposed convolutions (deconvolution) instead of interpolation

3. The discriminator uses Leaky ReLU as its activation function

4. BatchNorm layers are used (note: not applicable in WGAN)

5. The generator and discriminator use mirrored (dual) architectures, etc.
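The sketch below reflects these tricks in a DCGAN-style generator: transposed convolutions for upsampling, BatchNorm, and ReLU in the generator (LeakyReLU belongs in the discriminator). The layer sizes are illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, channels=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            # 1x1 -> 4x4 -> 8x8 -> 16x16 -> 32x32
            nn.ConvTranspose2d(latent_dim, feat * 4, 4, 1, 0), nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1), nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1), nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, channels, 4, 2, 1), nn.Tanh())

    def forward(self, z):                      # z: (batch, latent_dim, 1, 1)
        return self.net(z)
```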

Paper: arXiv (arxiv.org/abs/1511.06…). Code: GitHub (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/dcgan/dcgan.py)

SAGAN

As research deepened, modules commonly used elsewhere in CNN-based CV were gradually introduced into GANs. SAGAN proposes adding a Self-Attention module to both the generator and the discriminator to capture information from distant but related regions and improve the sharpness of generated images.

Figure 15: Structure of Self Attention

In the original implementation, Self Attention only needs to be added to the last two layers of the generator and discriminator.
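A sketch of a self-attention block in this style: query, key, and value are 1×1 convolutions, and the attended output is added back as a residual weighted by a learnable gamma that starts at zero.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # (b, hw, c//8)
        k = self.key(x).view(b, -1, h * w)                       # (b, c//8, hw)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)            # (b, hw, hw)
        v = self.value(x).view(b, -1, h * w)                     # (b, c, hw)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x
```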

Paper: arXiv (arxiv.org/abs/1805.08…). Code: GitHub (https://github.com/heykeetae/Self-Attention-GAN/blob/master/sagan_models.py)

BigGAN

As the number of available modules grew, so did the arms race over parameter counts. As a milestone in the history of GANs, BigGAN achieved a leap in generation quality (at 128×128 resolution). Although the model is large and hard to reproduce locally, BigGAN's use of Self-Attention, residual blocks, large channel and batch sizes, and gradient accumulation provides a reference for subsequent studies.

Figure 16: BigGAN structure

Paper: arXiv (arxiv.org/abs/1809.11…). Code: GitHub (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/BigGANdeep.py)

LAPGAN

LAPGAN, building on cGAN, applies the ideas of iteration and hierarchy to image generation. LAPGAN argues that instead of producing a high-resolution image in one shot, generation should start from a low-resolution image; at each upsampling step the generator produces the missing details, i.e., a "residual" image, which is added to the upsampled image to obtain a higher-resolution result:

Figure 17: LAPGAN’s reasoning process

During training, at each resolution LAPGAN downsamples the image and upsamples it again, then learns to generate the information lost in the process, i.e., the residual, conditioned on the upsampled image as a prior:

Figure 18: LAPGAN's training process

Paper: arXiv (arxiv.org/abs/1506.05…)

Code: GitHub (github.com/AaronYALai/…)

6. Evaluation metrics

The generator's loss measures how well generated images fool the discriminator, but it cannot measure their accuracy or diversity. Therefore, in addition to subjective evaluation, objective metrics such as IS and FID (analogous to PSNR for image quality) have appeared in recent years to evaluate the accuracy and diversity of generated images. (Some students ask whether these metrics can be used as a loss: they only reflect certain statistical properties of the generated data and cannot guide GAN optimization as a loss.)

IS

The Inception Score is an early evaluation metric. It measures GAN outputs along two dimensions, accuracy (separability) and diversity: a clear generated image should be assigned a very high probability for one class and low probabilities for the others (it can be classified confidently by Inception V3); at the same time, if a GAN can generate sufficiently diverse images, its outputs should be distributed evenly across classes (rather than concentrated on a few, i.e., mode collapse).

Note that a larger IS indicates a better GAN.
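A minimal sketch of the score itself, IS = exp(E_x KL(p(y|x) || p(y))), computed from a matrix of Inception V3 softmax outputs; resizing, feature extraction, and the usual splitting into subsets are omitted.

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    """preds: (num_images, num_classes) softmax probabilities from Inception V3."""
    p_y = preds.mean(axis=0, keepdims=True)                  # marginal p(y)
    kl = preds * (np.log(preds + eps) - np.log(p_y + eps))   # KL(p(y|x) || p(y)) per image
    return float(np.exp(kl.sum(axis=1).mean()))              # exp(E_x KL)
```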

Code: GitHub (github.com/sbarratt/in…)

FID

However, IS has the problem that real images never take part in evaluating the generated images. FID therefore proposes to evaluate the accuracy and diversity of generated images by comparing them with real images at the feature-map level of Inception V3.

Note that a smaller FID indicates a better GAN.
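A minimal sketch of the FID computation between real and generated Inception feature matrices, FID = ||mu_r − mu_g||² + Tr(C_r + C_g − 2(C_r C_g)^{1/2}); the feature extraction itself is omitted.

```python
import numpy as np
from scipy import linalg

def fid(real_feats, fake_feats):
    """real_feats, fake_feats: (num_images, feature_dim) Inception activations."""
    mu_r, mu_f = real_feats.mean(0), fake_feats.mean(0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)   # matrix square root
    covmean = covmean.real                                  # drop tiny imaginary parts
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * covmean))
```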

Code: GitHub (github.com/mseitzer/py…)

Others

Both FID and IS are evaluation methods based on feature extraction: feature maps can describe whether certain features appear, but not the spatial relations among them. In recent years, articles such as GAN Dissection and On GANs and GMMs have therefore analyzed the generation quality of GANs further.

An interesting conclusion is that, compared with the original GAN, most GAN variants bring no substantial improvement in quality, only faster and more stable convergence. Therefore, when tackling cross-domain problems, I usually start with a conventional WGAN-GP to get a rough baseline, and then decide whether to dig deeper or to explore the task-specific problems.

Endnotes

To close, here is a quote I found very apt as guidance for research work; may it encourage us all:

Hierarchy does not mean that discipline X is “just an application of Y”. Each new level requires entirely new laws, concepts, and generalizations, and the research process, like its predecessor, requires a great deal of inspiration and creativity. Psychology is not applied biology, and biology is not applied chemistry.
