Friends, if you reprint this article, please credit the source: blog.csdn.net/jiangjunsho…

We’ve already covered the basics of GANs, but you can’t internalize them without putting them into practice, so in this article we’ll use TensorFlow to implement a naive GAN. (TensorFlow 1.x syntax is used throughout.)

We’ll build the simplest possible GAN and then train it to produce handwritten digit images that look just like real ones. Let’s get straight to the code.

(1) Import third-party libraries.

    import tensorflow as tf
    import numpy as np
    import pickle
    import matplotlib.pyplot as plt

We use TensorFlow to implement the GAN’s network architecture and to train the constructed GAN; NumPy to generate random noise as input data for the generator; and pickle to persist variables. Finally, matplotlib is used to visualize how the losses of the two networks change during training, along with the images the GAN generates.

(2) To train the GAN to generate images from the MNIST handwritten digit dataset, we need to read in the real images from MNIST as the real data for training discriminator D. TensorFlow provides a helper for handling MNIST, which can be used to read the data.

    from tensorflow.examples.tutorials.mnist import input_data

    # Read in the MNIST data
    mnist = input_data.read_data_sets('./data/MNIST_data')
    img = mnist.train.images[500]
    plt.imshow(img.reshape((28, 28)), cmap='Greys_r')
    plt.show()

After reading in the MNIST images, each image is represented as a one-dimensional array.

    print(type(img))
    print(img.shape)

    <class 'numpy.ndarray'>
    (784,)

Note that since TensorFlow 1.9, the input_data.read_data_sets method no longer downloads the dataset automatically; if the MNIST data is not available locally, it raises an error, so we must download it beforehand.
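As a quick sanity check, here is a minimal sketch for verifying that the four standard MNIST archives (these are the standard archive names; the directory path matches the read_data_sets call above) are present locally before training:

    import os

    # Check that the four standard MNIST archives are present locally
    data_dir = './data/MNIST_data'
    for name in ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
                 't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']:
        path = os.path.join(data_dir, name)
        print(path, 'found' if os.path.exists(path) else 'MISSING')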

We then define a method for receiving input, using TensorFlow placeholders (tf.placeholder) to take in the input data.

    def get_inputs(real_size, noise_size):
        real_img = tf.placeholder(tf.float32, [None, real_size], name='real_img')
        noise_img = tf.placeholder(tf.float32, [None, noise_size], name='noise_img')
        return real_img, noise_img

Now we can implement the generator and the discriminator. Let’s look at the generator first; the code is as follows.

    def generator(noise_img, n_units, out_dim, reuse=False, alpha=0.01):
        '''
        Generator

        :param noise_img: noise tensor fed to the generator
        :param n_units: number of units in the hidden layer
        :param out_dim: output tensor size, 28x28=784
        :param reuse: whether to reuse the variables in this scope
        :param alpha: slope of Leaky ReLU
        '''
        with tf.variable_scope("generator", reuse=reuse):
            # Hidden layer
            hidden1 = tf.layers.dense(noise_img, n_units)
            # Leaky ReLU activation
            hidden1 = tf.maximum(alpha * hidden1, hidden1)
            # Dropout to prevent overfitting
            hidden1 = tf.layers.dropout(hidden1, rate=0.2, training=True)
            # Output layer: dense + tanh
            logits = tf.layers.dense(hidden1, out_dim)
            outputs = tf.tanh(logits)
            return logits, outputs

As you can see, the generator’s network structure is very simple: a neural network with a single hidden layer, with the overall structure input layer → hidden layer → output layer. For now we are writing the simplest possible GAN; in more advanced material later, the structures of the generator and discriminator will become more complex.

To briefly explain the code above: we first create a scope named generator using tf.variable_scope. The main purposes of this scope are to allow variables to be reused and to cleanly separate the components of the different networks.
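As an aside, here is a minimal sketch of what reuse means in TensorFlow 1.x (the scope name "demo" and variable "w" are hypothetical, purely for illustration): the second scope with reuse=True looks up the existing variable instead of creating a new one.

    with tf.variable_scope("demo"):
        w = tf.get_variable("w", shape=[1])        # creates variable demo/w
    with tf.variable_scope("demo", reuse=True):
        w_again = tf.get_variable("w", shape=[1])  # reuses the existing demo/w

This is exactly why the discriminator can be called twice below: the second call with reuse=True scores the generated images using the same variables that scored the real ones.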

Then the dense method under tf.layers is used to fully connect the input layer and the hidden layer. The tf.layers module provides many higher-level methods; with them we can construct the corresponding network structure much more easily. Here the dense method is used, whose role is to implement a fully connected layer.

We chose Leaky ReLU as the activation function for the hidden layer, using the tf.maximum method to return the larger of alpha * hidden1 and hidden1, which is exactly the Leaky ReLU activation.

Next, the dropout method of tf.layers is used; it randomly drops units in the network with a certain probability (i.e., sets those units’ outputs to 0) to prevent overfitting. Dropout should only be used during training, not during testing. Finally, the full connection between the hidden layer and the output layer is implemented through the dense method, and tanh is used as the activation function of the output layer (tanh has been found to work better as the generator’s output activation in practice). The output range of the tanh function is −1 to 1, which means the pixel range of the generated images is −1 to 1. However, the pixel range of the real images in the MNIST dataset is 0 to 1, so during training the real images’ pixel range must be adjusted to match the generated images.
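This rescaling appears again in the training loop below; mapping pixels from [0, 1] to [−1, 1] is just a linear transform:

    def rescale(images):
        # Map pixels from [0, 1] (MNIST) to [-1, 1] (tanh's output range)
        return images * 2 - 1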

The Leaky ReLU function is a variant of the ReLU function; it differs in that ReLU sets all negative values to zero, whereas Leaky ReLU multiplies negative values by a small slope (alpha).
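Written out explicitly, Leaky ReLU is exactly the tf.maximum expression used in the generator above (a minimal sketch):

    def leaky_relu(x, alpha=0.01):
        # For x >= 0 this returns x; for x < 0 it returns alpha * x instead of 0
        return tf.maximum(alpha * x, x)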

Now let’s look at the code for the discriminator.

    def discriminator(img, n_units, reuse=False, alpha=0.01):
        '''
        Discriminator

        :param img: image tensor fed to the discriminator
        :param n_units: number of units in the hidden layer
        :param reuse: whether to reuse the variables in this scope
        :param alpha: slope of Leaky ReLU
        :return: logits and sigmoid outputs
        '''
        with tf.variable_scope('discriminator', reuse=reuse):
            hidden1 = tf.layers.dense(img, n_units)
            hidden1 = tf.maximum(alpha * hidden1, hidden1)
            logits = tf.layers.dense(hidden1, 1)
            outputs = tf.sigmoid(logits)
            return logits, outputs

The discriminator’s implementation is not much different from the generator’s, except that the discriminator’s output layer has only one unit and uses sigmoid as the activation function of the output layer. The sigmoid function’s output values range from 0 to 1.
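For completeness, sigmoid squashes any real-valued logit into (0, 1), which is what lets the discriminator’s output be read as a score (a minimal NumPy sketch):

    def sigmoid(x):
        # Squash a real-valued logit into the open interval (0, 1)
        return 1.0 / (1.0 + np.exp(-x))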

Once the generator and discriminator are written, it’s time to build the actual computation graph, starting with some initialization, such as defining the required variables and resetting the default graph.

    img_size = mnist.train.images[0].shape[0]  # real image size, 784
    noise_size = 100       # noise size, the generator's initial input
    g_units = 128          # generator hidden-layer units
    d_units = 128          # discriminator hidden-layer units
    alpha = 0.01           # Leaky ReLU slope
    learning_rate = 0.001  # learning rate
    smooth = 0.1           # label smoothing

    # Reset the default computation graph and its nodes
    tf.reset_default_graph()

We then use the get_inputs method to obtain the real-image input and the noise input, and pass them into the generator and discriminator for training. Of course, at this point we are only building the training structure of the whole GAN network.

    real_img, noise_img = get_inputs(img_size, noise_size)

    # Generator
    g_logits, g_outputs = generator(noise_img, g_units, img_size)

    # Discriminator: score the real images first, then reuse the same variables
    # to score the generated images
    d_logits_real, d_outputs_real = discriminator(real_img, d_units)
    d_logits_fake, d_outputs_fake = discriminator(g_outputs, d_units, reuse=True)

The code above passes the noise, the number of units in the generator’s hidden layer, and the real image size to the generator, because the generator is required to produce images of the same size as the real ones.

For the discriminator, we first pass in the real images and the discriminator’s hidden-layer size to score the real images, and then score the generated images with the same parameters (hence reuse=True).

After the training structure is built, the generator and discriminator losses are defined. Recall the earlier discussion of losses. The discriminator’s loss consists of the difference between the score it gives to real images and the expected score, plus the difference between the score it gives to generated images and the expected score. The highest score is 1 and the lowest is 0; that is, the discriminator wants to give real images a score of 1 and generated images a score of 0. The generator’s loss is essentially the difference between the probability distributions of the generated and real images, which is transformed here into the difference between the score the generator hopes the discriminator will give its images and the score the discriminator actually gives them.

    # Discriminator loss on real images (labels smoothed from 1 to 1 - smooth)
    d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_real, labels=tf.ones_like(d_logits_real) * (1 - smooth)))
    # Discriminator loss on generated images
    d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
    # Total discriminator loss
    d_loss = tf.add(d_loss_real, d_loss_fake)
    # Generator loss: the generator wants the discriminator to score its images as real
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_fake, labels=tf.ones_like(d_logits_fake) * (1 - smooth)))

The sigmoid_cross_entropy_with_logits method is used to compute the losses: it first applies the sigmoid function to the logits passed in and then computes their cross-entropy loss. At the same time, the method uses a numerically optimized formulation of the cross entropy so that the result does not overflow. The method’s name describes exactly what it does.
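For intuition, here is a minimal NumPy sketch of the numerically stable formula that TensorFlow documents for this method, where x is the logit and z the label:

    def sigmoid_cross_entropy(x, z):
        # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)),
        # which avoids computing exp(x) for large positive x
        return np.maximum(x, 0) - x * z + np.log(1 + np.exp(-np.abs(x)))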

Once the losses are defined, all that remains is to minimize them.

    # All trainable variables in the graph
    train_vars = tf.trainable_variables()
    # Generator variables
    g_vars = [var for var in train_vars if var.name.startswith("generator")]
    # Discriminator variables
    d_vars = [var for var in train_vars if var.name.startswith("discriminator")]
    # Minimize the losses with AdamOptimizer
    d_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)
    g_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)

To minimize the losses, we first obtain the parameters of the corresponding network structures, i.e. the variables of the generator and the discriminator; these are the objects that get modified to minimize the losses. The AdamOptimizer method is used here; it implements the Adam algorithm internally, which is based on gradient descent but dynamically adjusts the learning rate of each parameter.
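For reference, here is a minimal NumPy sketch of the per-parameter update that the Adam algorithm performs (with the standard defaults beta1=0.9, beta2=0.999, eps=1e-8); it is only meant to illustrate the “dynamically adjusted learning rate”:

    def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Exponential moving averages of the gradient and its square
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias-corrected moment estimates (t is the step count, starting at 1)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Effectively a per-parameter step size of lr / (sqrt(v_hat) + eps)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v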

At this point the whole computation graph is roughly defined; next we implement the concrete training logic, starting by initializing some training-related variables.

    batch_size = 64   # number of images per batch
    epochs = 500      # number of training epochs
    n_sample = 25     # number of samples to generate
    samples = []      # store the generated test samples
    losses = []       # store the losses
    # Save only the generator variables
    saver = tf.train.Saver(var_list=g_vars)

Now let’s write the training code itself.

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for e in range(epochs):
            for batch_i in range(mnist.train.num_examples // batch_size):
                batch = mnist.train.next_batch(batch_size)
                # 28 x 28 = 784
                batch_images = batch[0].reshape((batch_size, 784))
                # Rescale image pixels, because tanh outputs values in (-1, 1)
                batch_images = batch_images * 2 - 1
                # Generate noise images
                batch_noise = np.random.uniform(-1, 1, size=(batch_size, noise_size))
                # Train the discriminator first, then the generator
                _ = sess.run(d_train_opt, feed_dict={real_img: batch_images, noise_img: batch_noise})
                _ = sess.run(g_train_opt, feed_dict={noise_img: batch_noise})

            # Discriminator total loss for this epoch
            train_loss_d = sess.run(d_loss, feed_dict={real_img: batch_images, noise_img: batch_noise})
            # Discriminator loss on real images
            train_loss_d_real = sess.run(d_loss_real, feed_dict={real_img: batch_images, noise_img: batch_noise})
            # Discriminator loss on generated images
            train_loss_d_fake = sess.run(d_loss_fake, feed_dict={real_img: batch_images, noise_img: batch_noise})
            # Generator loss
            train_loss_g = sess.run(g_loss, feed_dict={noise_img: batch_noise})

            print("Epoch {}/{}...".format(e + 1, epochs),
                  "Discriminator total loss: {:.4f} (real-image loss: {:.4f} + fake-image loss: {:.4f})...".format(
                      train_loss_d, train_loss_d_real, train_loss_d_fake),
                  "Generator loss: {:.4f}".format(train_loss_g))
            # Record the losses
            losses.append((train_loss_d, train_loss_d_real, train_loss_d_fake, train_loss_g))

            # Generate a group of sample images after each epoch to visualize training progress
            sample_noise = np.random.uniform(-1, 1, size=(n_sample, noise_size))
            gen_samples = sess.run(generator(noise_img, g_units, img_size, reuse=True),
                                   feed_dict={noise_img: sample_noise})
            samples.append(gen_samples)
            # Save the generator variables
            saver.save(sess, './data/generator.ckpt')

    # Persist the generated samples
    with open('./data/train_samples.pkl', 'wb') as f:
        pickle.dump(samples, f)

We create the Session object at the beginning and then train the GAN with a double for loop: the outer loop controls how many epochs to train, and the inner loop iterates over the batches within each epoch. Training on all real images at once would be inefficient, so the common practice is to split them into batches and train batch by batch; here each batch contains 64 images.

Within the inner loop we read in a batch of real images. Because the generator uses tanh as its output-layer activation, the generated images have pixel values in the range −1 to 1, so we simply rescale the real images’ pixel range from 0 ~ 1 to −1 ~ 1, and then use NumPy’s uniform method to generate random noise between −1 and 1. With the real data and noise data ready, we can feed them to the generator and discriminator, and the data flows through the graph we designed earlier. It is worth noting that the discriminator is trained first, then the generator.

After each epoch of training on all the real images, we compute and record the generator and discriminator losses for that epoch so that the loss curves can be visualized after training. To get an intuitive sense of how the generator evolves during training, we also generate and store a group of images after each epoch. Once the training logic is written, we can run the training code; it produces output like the following.

Epoch 1/500… Discriminator total loss: 0.0190 (real-image loss: 0.0017 + fake-image loss: 0.0173)…

Generator loss: 4.1502

Epoch 2/500… Discriminator total loss: 1.0480 (real-image loss: 0.3772 + fake-image loss: 0.6708)…

Generator loss: 3.1548

Epoch 3/500… Discriminator total loss: 0.5315 (real-image loss: 0.3580 + fake-image loss: 0.1736)…

Generator loss: 2.8828

Epoch 4/500… Discriminator total loss: 2.9703 (real-image loss: 1.5434 + fake-image loss: 1.4268)…

Generator loss: 0.7844

Epoch 5/500… Discriminator total loss: 1.0076 (real-image loss: 0.5763 + fake-image loss: 0.4314)…

Generator loss: 1.8176

Epoch 6/500… Discriminator total loss: 0.7265 (real-image loss: 0.4558 + fake-image loss: 0.2707)…

Generator loss: 2.9691

Epoch 7/500… Discriminator total loss: 1.5635 (real-image loss: 0.8336 + fake-image loss: 0.7299)…

Generator loss: 2.1342

The whole training run takes 30 to 40 minutes.
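The article records the losses but does not show the plotting code mentioned at the start; here is a minimal sketch of how they could be visualized with matplotlib, assuming losses holds the four-tuples appended in the training loop above:

    import numpy as np
    import matplotlib.pyplot as plt

    losses_arr = np.array(losses)  # columns: d_loss, d_loss_real, d_loss_fake, g_loss
    plt.plot(losses_arr[:, 0], label='Discriminator total loss')
    plt.plot(losses_arr[:, 3], label='Generator loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()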