
GAN (Generative Adversarial Network) study notes

1. The story behind GAN’s birth:

Ian Goodfellow, the creator of GAN, came up with its initial idea while discussing research with colleagues over drinks at a bar, but his colleagues were not convinced at the time. Returning from the bar and finding his girlfriend already asleep, he stayed up all night coding the idea and found that it actually worked. Thus GAN was born, a pioneering piece of work: the generative adversarial network was first proposed in the paper Generative Adversarial Nets.

Link to the paper: arxiv.org/abs/1406.26…

2. Principle of GAN:

GAN’s main inspiration comes from the idea of the zero-sum game in game theory. Applied to deep neural networks, a generator (G) and a discriminator (D) play a game through which G learns the distribution of the data. When used for image generation, a trained G can turn a random noise vector into a realistic image. The main roles of G and D are:

  • G is the generative network. It receives a random noise vector z and generates an image from it.

  • D is the discriminator network, which judges whether a picture is “real”. Its input x is an image, and its output D(x) is the probability that x is a real image: 1 means certainly real, 0 means it cannot be real.

During training, the goal of the generator G is to produce images realistic enough to fool the discriminator D, while D tries to tell G’s fakes apart from real images. G and D thus form a dynamic “game”, and the final state of this game is a Nash equilibrium.
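
For reference, the original paper formalizes this game as a minimax objective over a value function V(D, G):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

For a fixed G, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), which becomes 1/2 everywhere once p_g matches p_data — exactly the 50/50 situation described in the counterfeiting analogy below.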


Colloquially, it is like the contest between criminals forging banknotes and the police identifying counterfeit money:

[1] The generative model G plays the counterfeiter. Its goal is to produce, based on the real notes it has seen and the police's detection techniques, counterfeit money so realistic that the police cannot identify it.

[2] The discriminative model D plays the police, whose goal is to identify as much of the counterfeit money produced by the criminals as possible. Through this competition between counterfeiter and police, with both sides improving at their own objective, the process finally reaches a Nash equilibrium: the model generates money that looks as real as possible, and the police can no longer judge whether it is real or fake (the probability of either verdict is 0.5).


As shown in the figure:


3. GAN schematic diagram:


4. Characteristics of GAN:

  1. Compared with traditional models, GAN has two different networks rather than a single one, and it is trained adversarially

  2. The gradient information used to update G comes from the discriminator D, not from the data samples (a short sketch of this follows below)
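
A minimal, self-contained sketch of point 2 (the toy layer sizes here are illustrative, not taken from the article): the generator's loss depends only on the discriminator's verdict on generated samples, so its gradients flow back through D and never touch the data directly.

import tensorflow as tf
from tensorflow.keras import layers

# Toy generator and discriminator, just to show where G's gradients come from
G = tf.keras.Sequential([layers.Dense(4, input_shape=(2,))])
D = tf.keras.Sequential([layers.Dense(1, input_shape=(4,))])
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

z = tf.random.normal([8, 2])                              # random noise, no real data involved
with tf.GradientTape() as tape:
    fake_logits = D(G(z), training=True)                  # D's verdict on G's output
    g_loss = bce(tf.ones_like(fake_logits), fake_logits)  # G wants D to say "real"
g_grads = tape.gradient(g_loss, G.trainable_variables)    # the gradient is backpropagated through D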


5. Advantages of GAN:

  1. GAN is a generative model that, in contrast to other generative models (Boltzmann machines and GSNs), needs only back-propagation and does not require a complex Markov chain

  2. GAN can produce clearer and more realistic samples than most other generative models

  3. GAN training is a form of unsupervised learning, so GANs can be widely used in unsupervised and semi-supervised learning settings

  4. Compared to variational autoencoders, GANs do not introduce any deterministic bias. Variational methods optimize a lower bound of the log-likelihood rather than the likelihood itself (the bound is written out after this list), which seems to make the samples generated by VAEs blurrier than those of GANs

  5. Unlike VAEs, GANs involve no variational lower bound. If the discriminator is trained well, the generator can learn the distribution of the training samples perfectly. In other words, GANs are asymptotically consistent, while VAEs are biased

  6. When GAN is applied to scenarios such as image style transfer, super-resolution, image completion and denoising, it avoids the difficulty of designing a loss function by hand: as long as there is a benchmark to compare against, the discriminator takes care of it and everything else is left to adversarial training.
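
The "lower bound" mentioned in points 4 and 5 is the variational (evidence) lower bound that VAEs optimize in place of the log-likelihood itself:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z\mid x)}[\log p_\theta(x\mid z)] - \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)$$

GANs sidestep this bound entirely, which is why, with an ideal discriminator, the generator can in principle match the data distribution exactly.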


6. Disadvantages of GAN:

  1. Training a GAN requires reaching a Nash equilibrium, which gradient descent sometimes achieves and sometimes does not. We do not yet have a reliable way of reaching it, so GAN training is not as stable as training a VAE or PixelRNN, although in practice it is arguably still more stable than training a Boltzmann machine

  2. GAN is not suitable for processing discrete forms of data, such as text

  3. GANs suffer from training instability, vanishing gradients and mode collapse (problems that later variants have largely mitigated)


7. Some tricks for training a GAN:

  1. Normalize the inputs to (-1, 1) and use tanh as the activation function of the last layer (except for BEGAN).

  2. Use the Wasserstein GAN (WGAN) loss function.

  3. If labeled data is available, make use of the labels; some people report that flipping (reversing) the labels works well, and use label smoothing, either one-sided or two-sided (see the sketch after this list).

  4. Use mini-batch norm; if batch norm is not used, instance norm or weight norm can be substituted.

  5. Avoid ReLU and pooling layers to reduce the chance of sparse gradients; use the LeakyReLU activation function instead.

  6. Prefer the Adam optimizer and do not set the learning rate too high; 1e-4 is a reasonable starting point, and the learning rate can be decayed further as training progresses.

  7. Adding Gaussian noise to the discriminator's layers acts as a form of regularization.
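
A minimal Keras sketch of tips 3, 5 and 7 combined (the layer sizes, noise level 0.1 and smoothing value 0.9 are illustrative assumptions, not prescriptions):

import tensorflow as tf
from tensorflow.keras import layers

# Tip 7: Gaussian noise on the discriminator's input acts as a regularizer
# Tip 5: LeakyReLU instead of ReLU avoids sparse gradients
def noisy_discriminator():
    return tf.keras.Sequential([
        layers.GaussianNoise(0.1, input_shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(512),
        layers.LeakyReLU(0.2),
        layers.Dense(1),   # raw logit, no activation
    ])

# Tip 3: one-sided label smoothing -- real labels become 0.9 instead of 1.0
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def smoothed_d_loss(real_logits, fake_logits):
    real_loss = bce(tf.ones_like(real_logits) * 0.9, real_logits)  # smoothed "real" targets
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)       # hard "fake" targets
    return real_loss + fake_loss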


8. Common extensions of GAN:

  • DCGAN
  • CGAN
  • ACGAN
  • InfoGAN
  • WGAN
  • SSGAN
  • Pix2Pix GAN
  • CycleGAN

9. What can GAN do: The answer is to generate data

Generate audio; generate images (animals such as cats and dogs, face photos, turning face photos into anime-style pictures, and so on)…

Let’s take a break with a food picture before we continue.

Continue !!!!!

10. The classic GAN example: generating handwritten digit images

  • The source code and how to get the dataset are below

  • Two formats are provided: .py and .ipynb (the code is the same)

The code is as follows:

# -*- coding: utf-8 -*-
""" Created on 2020-10-31@author: """
# Import data package
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
#get_ipython().run_line_magic('matplotlib', 'inline')
import numpy as np
import glob
import os
 
 
# # Load the MNIST dataset
(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()


train_images = train_images.astype('float32')


# # Data preprocessing: add a channel dimension
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')


# Normalize to [-1, 1] to match the generator's tanh output
train_images = (train_images - 127.5) / 127.5
 
 
BATCH_SIZE = 256
BUFFER_SIZE = 60000


# Input pipeline
datasets = tf.data.Dataset.from_tensor_slices(train_images)


# Shuffle the images and group them into batches
datasets = datasets.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
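# Each element of datasets is now a float32 batch of shape (BATCH_SIZE, 28, 28, 1)
# (the final batch may be smaller), with pixel values in [-1, 1].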
 
 
# # Generator model
def generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(256, input_shape=(100,), use_bias=False))
    # use_bias=False because a BatchNormalization layer follows
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())  # activation

    # Second hidden layer
    model.add(layers.Dense(512, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())  # activation

    # Output layer
    model.add(layers.Dense(28*28*1, use_bias=False, activation='tanh'))
    model.add(layers.BatchNormalization())

    model.add(layers.Reshape((28, 28, 1)))  # reshape the flat output into an image; Reshape takes a tuple

    return model
    
# # Discriminator model
def discriminator_model():
    model = keras.Sequential()
    model.add(layers.Flatten())

    model.add(layers.Dense(512, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())  # activation

    model.add(layers.Dense(256, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())  # activation

    model.add(layers.Dense(1))  # single output logit; probability > 0.5 after a sigmoid means "real"

    return model
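
# Optional sanity check (illustrative only, not part of training): a 100-dim noise
# vector should map to a 28x28x1 image, and the discriminator should return one logit.
# g, d = generator_model(), discriminator_model()
# print(g(tf.random.normal([1, 100])).shape)       # expected: (1, 28, 28, 1)
# print(d(g(tf.random.normal([1, 100]))).shape)    # expected: (1, 1)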
    
# # Loss function
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)  # from_logits=True because the last layer has no activation


# # Generator loss
def generator_loss(fake_out):  # the generator wants its fakes to be judged "real"
    return cross_entropy(tf.ones_like(fake_out), fake_out)


# # Discriminator loss
def discriminator_loss(real_out, fake_out):
    # The discriminator should output 1 for real images and 0 for fake ones
    real_loss = cross_entropy(tf.ones_like(real_out), real_out)
    fake_loss = cross_entropy(tf.zeros_like(fake_out), fake_out)
    return real_loss + fake_loss
 
 
 
 
# # Optimizers
generator_opt = tf.keras.optimizers.Adam(1e-4)  # learning rate 1e-4
discriminator_opt = tf.keras.optimizers.Adam(1e-4)


EPOCHS = 500
noise_dim = 100            # length of the random noise vector fed to the generator
num_exp_to_generate = 16   # number of example images generated for visualization
seed = tf.random.normal([num_exp_to_generate, noise_dim])  # fixed noise, so progress is observed on the same samples
 
 
# # training
generator=generator_model()
discriminator=discriminator_model()
 
 
 
 
# # Per-batch training step
def train_step(images):
    # One noise vector per real image in the batch (BATCH_SIZE, not the 16 visualization samples)
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # Discriminator's verdict on real images
        real_out = discriminator(images, training=True)
        # Generate fake images from noise
        gen_image = generator(noise, training=True)
        # Discriminator's verdict on the generated images
        fake_out = discriminator(gen_image, training=True)

        # Compute the two losses
        gen_loss = generator_loss(fake_out)
        disc_loss = discriminator_loss(real_out, fake_out)

    # Gradients of each loss w.r.t. the corresponding trainable parameters
    gradient_gen = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradient_disc = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    # Apply the gradients with the optimizers
    generator_opt.apply_gradients(zip(gradient_gen, generator.trainable_variables))
    discriminator_opt.apply_gradients(zip(gradient_disc, discriminator.trainable_variables))
        
# # Visualization
def generator_plot_image(gen_model, test_noise):
    pre_images = gen_model(test_noise, training=False)
    # Draw the 16 generated images on a 4x4 grid
    fig = plt.figure(figsize=(4, 4))
    for i in range(pre_images.shape[0]):
        plt.subplot(4, 4, i + 1)  # subplot indices start at 1
        plt.imshow((pre_images[i, :, :, 0] + 1) / 2, cmap='gray')  # map [-1, 1] back to [0, 1], grayscale
        plt.axis('off')  # hide the axes
    plt.show()
 
 
def train(dataset, epochs):
    for epoch in range(epochs):
        for image_batch in dataset:
            train_step(image_batch)
        # Show the fixed-seed samples every 10 epochs
        if epoch % 10 == 0:
            print('Results after epoch ' + str(epoch + 1))
            generator_plot_image(generator, seed)


train(datasets, EPOCHS)
 
 
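Once training has finished, brand-new digits can be sampled simply by feeding fresh noise to the generator. A minimal usage sketch, reusing the names defined above (saving the model with generator.save is the standard Keras API and entirely optional):

# Sample 16 new digits from fresh random noise and plot them with the helper above
new_noise = tf.random.normal([num_exp_to_generate, noise_dim])
generator_plot_image(generator, new_noise)

# Optionally persist the trained generator
generator.save('mnist_gan_generator.h5')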

Training results:

  • Results after the 1st epoch

  • Results after the 100th epoch

Conclusion:

After about 100 epochs the digits can already be seen clearly, and this remains true up to around 300 epochs; after 300, 400 and 500 epochs, however, the results gradually degrade and the generated images become blurry again.