Introduction to Generative Adversarial Networks (with TensorFlow code examples)

Tags: TensorFlow · deep learning · generative adversarial networks · neural networks

This post is from Aylien’s blog. If you have some experience with neural networks (for example, building deep learning classifiers with TensorFlow), this blog post will be easier to follow.


  • Discriminative vs. generative models
  • Example: Approximating a one-dimensional Gaussian distribution
  • Improving sample diversity
  • Final thoughts
  • Further discussion of GANs

    Recently, there has been renewed interest in generative models (see, for example, OpenAI’s case study on generative models). A generative model learns how to generate data that is very similar to the data we give it (real data). Let’s illustrate the idea with an example: if we want to build a model that can generate high-quality news articles, it must first learn from a large number of news articles. In other words, the model should contain a good internal representation of news documents. We can then hope to use this representation to help with related tasks, such as classifying news by topic.

    In practice, training such a model is not easy, but recent years have seen a surge of research in this area. One of the most famous model families is Generative Adversarial Networks (GANs). Yann LeCun, head of Facebook AI Research and a leading figure in deep learning, recently called GANs the most important development in deep learning:

    “There are many interesting recent development in deep learning… The most important one, in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks). This, and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.” – Yann LeCun

    The rest of this post describes the GAN formulation in detail and walks through a very simple example (with TensorFlow code) of using a GAN to solve a toy problem.

    Discriminative vs. generative models

    GANs are a very interesting idea, first introduced in 2014 by Ian Goodfellow, then at the University of Montreal (now at OpenAI). The idea behind a GAN is to pit two competing neural network models against each other. One model (the generator) takes noise as input and produces samples. The other model (the discriminator) receives both generated data and real data and must distinguish between the two sources. The two networks play a continuous game in which the generator learns to produce data that looks more and more like the real data, while the discriminator gets better and better at telling generated data from real data. The two networks are trained simultaneously, and in the end the generated data is ideally almost indistinguishable from real data.
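    Formally (this objective comes from the original GAN paper and is not spelled out in this post), the game between the generator G and the discriminator D can be written as a minimax problem, where D(x) is the discriminator’s estimated probability that x is real:

        \min_G \max_D V(D, G) =
            \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] +
            \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

    The discriminator loss we implement below is the negation of this quantity, averaged over a minibatch.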

    It is common to see the generator likened to a counterfeiter trying to produce fake currency, while the discriminator plays the police officer trying to spot the fakes. This setup may seem similar to reinforcement learning, in that the generator receives a reward-like signal from the discriminator indicating whether its output looks genuine. The key difference from reinforcement learning, however, is that we can backpropagate gradient information from the discriminator to the generator, so the generator knows exactly how to adjust its parameters to produce data that better fools the discriminator.

    Currently, GANs are mostly used for modeling natural images, where they perform very well on image generation tasks, producing sharper images than alternative models trained with maximum likelihood objectives. Here are some examples of images produced by GANs:

    [Image: samples generated by a GAN, from arxiv.org/abs/1511.06…]

    [Image: samples generated by a GAN, from arxiv.org/abs/1606.03…]

    Example: Approximating a one-dimensional Gaussian distribution

    To better understand how all this works, we will use a GAN to solve a toy problem: learning to approximate a one-dimensional Gaussian distribution. This example is based on a blog post by Eric Jang. The full demo code is available at github.com/AYLIEN/gan-… ; here we focus only on the interesting snippets.

    First, we create the “real” data distribution, a Gaussian with a mean of 4 and a standard deviation of 0.5. The following class lets us draw a number of samples from this distribution (sorted by value):

        
        import numpy as np

        class DataDistribution(object):
            def __init__(self):
                # Real data: a Gaussian with mean 4 and standard deviation 0.5
                self.mu = 4
                self.sigma = 0.5

            def sample(self, N):
                samples = np.random.normal(self.mu, self.sigma, N)
                samples.sort()
                return samples

    The distribution is shown below:
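    (The original figure is not reproduced here, but you can visualize the sampled distribution yourself. A quick sketch, assuming matplotlib is available:)

        import matplotlib.pyplot as plt

        d = DataDistribution()
        plt.hist(d.sample(10000), bins=100)  # a bell curve centered at 4
        plt.title('Real data distribution (mu=4, sigma=0.5)')
        plt.show()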

    We also define a generator whose input is drawn from a noise distribution (with a sample function similar to the one above). Following Eric Jang’s example, we use a stratified sampling approach for the generator’s input noise: the samples are first spaced uniformly over a range and then perturbed randomly.

        
        class GeneratorDistribution(object):
            def __init__(self, range):
                self.range = range

            def sample(self, N):
                # Stratified sampling: evenly spaced points plus a small random perturbation
                return np.linspace(-self.range, self.range, N) + \
                    np.random.random(N) * 0.01

    Our generator and discriminator networks are quite simple. The generator is a linear transformation passed through a nonlinearity (a softplus function), followed by another linear transformation:

        
        import tensorflow as tf

        def generator(input, hidden_size):
            # Linear -> softplus -> linear
            h0 = tf.nn.softplus(linear(input, hidden_size, 'g0'))
            h1 = linear(h0, 1, 'g1')
            return h1
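    The linear helper used above is not shown in the post; in the accompanying repo it is a standard fully connected layer. A minimal sketch (the exact variable names and initializers here are our assumption) might look like:

        def linear(input, output_dim, scope=None, stddev=1.0):
            # Fully connected layer y = x*W + b; variables live under a named
            # scope so the discriminator can be shared via scope.reuse_variables()
            with tf.variable_scope(scope or 'linear'):
                w = tf.get_variable(
                    'w',
                    [input.get_shape()[1], output_dim],
                    initializer=tf.random_normal_initializer(stddev=stddev)
                )
                b = tf.get_variable(
                    'b',
                    [output_dim],
                    initializer=tf.constant_initializer(0.0)
                )
                return tf.matmul(input, w) + b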

    In this case, we found that the discriminator needs to be more powerful than the generator, otherwise it cannot learn to tell the samples’ sources apart. So we use a deeper network with higher-dimensional hidden layers. It uses tanh nonlinearities in all layers except the final one, which is a sigmoid (whose output we can interpret as a probability):

        
        def discriminator(input, hidden_size):
            h0 = tf.tanh(linear(input, hidden_size * 2, 'd0'))
            h1 = tf.tanh(linear(h0, hidden_size * 2, 'd1'))
            h2 = tf.tanh(linear(h1, hidden_size * 2, 'd2'))
            # Final sigmoid layer: probability that the input is real
            h3 = tf.sigmoid(linear(h2, 1, 'd3'))
            return h3

    We can then connect these pieces together in a TensorFlow graph. We also define loss functions for each network, the generator’s goal being to fool the discriminator:

        
        with tf.variable_scope('G'):
            z = tf.placeholder(tf.float32, shape=(None, 1))
            G = generator(z, hidden_size)

        with tf.variable_scope('D') as scope:
            x = tf.placeholder(tf.float32, shape=(None, 1))
            D1 = discriminator(x, hidden_size)
            # Reuse the same discriminator variables for the generated data
            scope.reuse_variables()
            D2 = discriminator(G, hidden_size)

        # Discriminator: maximize log D(x) + log(1 - D(G(z)))
        loss_d = tf.reduce_mean(-tf.log(D1) - tf.log(1 - D2))
        # Generator: maximize log D(G(z)) (the non-saturating heuristic)
        loss_g = tf.reduce_mean(-tf.log(D2))
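    One detail worth flagging (a standard observation from the original GAN paper, not called out in this post): loss_g implements the non-saturating heuristic rather than the raw minimax objective, because the minimax generator loss gives vanishingly small gradients early in training, when the discriminator confidently rejects generated samples:

        \mathcal{L}_G = -\mathbb{E}_{z}[\log D(G(z))]
        \quad \text{instead of} \quad
        \mathbb{E}_{z}[\log(1 - D(G(z)))]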

    We use TensorFlow’s plain GradientDescentOptimizer with an exponentially decaying learning rate to optimize each network. Note that finding good optimization parameters here required some tuning:

        
        def optimizer(loss, var_list):
            initial_learning_rate = 0.005
            decay = 0.95
            num_decay_steps = 150
            batch = tf.Variable(0)
            learning_rate = tf.train.exponential_decay(
                initial_learning_rate,
                batch,
                num_decay_steps,
                decay,
                staircase=True
            )
            optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
                loss,
                global_step=batch,
                var_list=var_list
            )
            return optimizer

        vars = tf.trainable_variables()
        d_params = [v for v in vars if v.name.startswith('D/')]
        g_params = [v for v in vars if v.name.startswith('G/')]

        opt_d = optimizer(loss_d, d_params)
        opt_g = optimizer(loss_g, g_params)

    To train the model, we draw samples from the data distribution and the noise distribution, and alternate between optimizing the parameters of the discriminator and of the generator:

        
        # (TensorFlow pre-1.0 / Python 2 API, as in the original post)
        with tf.Session() as session:
            tf.initialize_all_variables().run()

            for step in xrange(num_steps):
                # update discriminator
                x_batch = data.sample(batch_size)
                z_batch = gen.sample(batch_size)
                session.run([loss_d, opt_d], {
                    x: np.reshape(x_batch, (batch_size, 1)),
                    z: np.reshape(z_batch, (batch_size, 1))
                })

                # update generator
                z_batch = gen.sample(batch_size)
                session.run([loss_g, opt_g], {
                    z: np.reshape(z_batch, (batch_size, 1))
                })
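    The snippets above reference a few globals (batch_size, num_steps, hidden_size, and the two distribution objects) that are defined elsewhere in the repo. Hypothetical values like these (ours, not necessarily the originals) would make the snippets self-contained:

        data = DataDistribution()
        gen = GeneratorDistribution(range=8)  # hypothetical noise range
        hidden_size = 4                       # hypothetical hidden layer width
        batch_size = 12                       # kept small; see the note below
        num_steps = 1200                      # hypothetical training length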

    [Video: youtu.be/mobnwr-U8PC] Here we can see that at the beginning the generator produces a distribution quite different from the real data. Before eventually converging (after around 750 iterations) to a reasonable approximation, it first concentrates around the mean of the input distribution. The final result of training is shown in the figure below:

    The intuition behind this is easy to understand: the discriminator looks at each sample independently, so the generator only needs to produce values near the mean of the real data to fool it.

    There are many ways to address this problem. In this example we could add some form of early stopping, halting training once the similarity between the two distributions reaches a threshold. It is difficult, however, to generalize this to larger problems, and even in this simple case it is hard to guarantee that the generator’s distribution will have reached a good point when training stops. A more appealing solution is to let the discriminator look at multiple samples at once.

    Improving sample diversity

    According to recent work by Tim Salimans and collaborators at OpenAI, one of the main failure modes of GANs is that the generator can collapse to a parameter setting where it outputs a very narrow distribution of points. Their proposed solution: allow the discriminator to look at multiple samples at once, a technique they call minibatch discrimination.

    In the paper, minibatch discrimination is defined as any method in which the discriminator looks at an entire batch of samples when deciding which come from the generator and which are real. They also present a concrete method that models the distance between a given sample and all the other samples in the same batch. These distances are combined with the original sample and passed through the discriminator, so it can use the distances as well as the sample values during classification.

    This approach can be summarised as follows (see the sketch after the code below for one way to wire it in):

    • Take the output of some intermediate layer of the discriminator.
    • Multiply it by a 3D tensor to produce a matrix (of size num_kernels x kernel_dim in the code below).
    • Compute the L1 distance between the rows of this matrix across all samples in the batch, then apply a negative exponential.
    • The minibatch features of a sample are the sums of these exponentiated distances.
    • Concatenate the original input to the minibatch layer (the output of the previous discriminator layer) with the newly created minibatch features, and pass this as input to the next layer of the discriminator.

    In TensorFlow, this looks like the following:

        
        def minibatch(input, num_kernels=5, kernel_dim=3):
            # Project each sample to a (num_kernels x kernel_dim) matrix
            x = linear(input, num_kernels * kernel_dim)
            activation = tf.reshape(x, (-1, num_kernels, kernel_dim))
            # Pairwise differences between all samples in the batch
            diffs = tf.expand_dims(activation, 3) - \
                tf.expand_dims(tf.transpose(activation, [1, 2, 0]), 0)
            # L1 distance over the kernel dimension, then a negative exponential
            abs_diffs = tf.reduce_sum(tf.abs(diffs), 2)
            minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), 2)
            # Append the minibatch features to the original input
            return tf.concat(1, [input, minibatch_features])
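    One way to wire this into the discriminator, following the step list above (the use_minibatch flag is our own illustration, not necessarily how the repo structures it), is to swap the minibatch layer in for the penultimate tanh layer:

        def discriminator(input, hidden_size, use_minibatch=True):
            h0 = tf.tanh(linear(input, hidden_size * 2, 'd0'))
            h1 = tf.tanh(linear(h0, hidden_size * 2, 'd1'))
            if use_minibatch:
                # Append per-sample minibatch features instead of a third tanh layer
                h2 = minibatch(h1)
            else:
                h2 = tf.tanh(linear(h1, hidden_size * 2, 'd2'))
            h3 = tf.sigmoid(linear(h2, 1, 'd3'))
            return h3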

    [Video: youtu.be/0r3G7-4bmyu] The new training process with minibatch discrimination is shown in the video above, and convergence is shown in the figure below:

    Finally, batch size matters much more as a hyperparameter in this setting. In our toy example we had to keep the batch small (less than around 16).

    Final thoughts

    Generative adversarial networks give us a whole new way to do unsupervised learning. Most successful applications of GANs so far have been in the image domain, but here at Aylien we are exploring applications in natural language processing. One important open question is how best to evaluate these models. With images, we can judge model quality by inspecting the generated samples, although this is clearly an imperfect measure. In the text domain this is much less useful. With maximum-likelihood-based models we can compute likelihood-based metrics on unobserved data, but that does not apply here. Some papers evaluate GANs by fitting a kernel density estimate to the generated samples and measuring likelihood under it, but this approach breaks down in high-dimensional data. Another option is to evaluate on some downstream task, such as classification.
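    To make the kernel density idea concrete, here is a rough sketch (the sample arrays below are stand-ins, not real model output) of fitting a KDE to generated samples and scoring held-out real data under it:

        from scipy.stats import gaussian_kde
        import numpy as np

        # Stand-in data: in practice 'generated' would come from a trained generator
        generated = np.random.normal(4.0, 0.6, 1000)
        real = np.random.normal(4.0, 0.5, 1000)

        kde = gaussian_kde(generated)       # fit a KDE to the generated samples
        print(np.mean(np.log(kde(real))))   # mean log-likelihood; higher is better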


    Further discussion of GANs

    Finally, here are some further discussions of GANs:

    • An introduction to Generative Adversarial Networks (with code in TensorFlow)
    • Ian Goodfellow’s explanation of why GANs are difficult to apply to NLP tasks
    • An introduction to recent developments in GANs from the perspective of adversarial examples
    • Ian Goodfellow’s NIPS 2016 tutorial on GANs

