Red Stone’s personal website: Redstonewill.com

Today, Red Stone takes you through a hot deep learning model: the Generative Adversarial Network (GAN). GAN is very interesting; I will explain it in the most straightforward language and, at the end, implement a simple GAN program to help you understand.

1. What is GAN?

Ok, so GAN is powerful, but what kind of model is it? The machine learning and neural network models we have studied before mainly do two things: prediction and classification, which we are all familiar with. But can a machine model automatically generate a picture or a piece of speech? And can we adjust different model input vectors to get specific images and sounds? For example, you could adjust the input parameters to get a face with red hair and blue eyes, or adjust them to get a female voice clip, and so on. In other words, such a model can automatically generate what we want on demand. Hence, GAN!

GAN, short for Generative Adversarial Network, mainly consists of two modules: a generative model (the generator) and a discriminative model (the discriminator). The generator and the discriminator play a game against each other and learn from it to produce good output. In the case of images, the generator's main task is to learn from the real image set so that the images it generates look ever closer to real images, in order to "fool" the discriminator. The discriminator's main task is to pick out the images produced by the generator, tell them apart from the real images, and judge true from false. Throughout training, the generator keeps trying to make its generated images look more and more real, while the discriminator keeps trying to tell real from fake. In this game between generator and discriminator, the two eventually reach a balance: the generator produces images that are very close to real ones, and the discriminator can hardly tell the difference, so its output probability is close to 0.5 for both real and fake images.

Still a little confused about GAN? It doesn’t matter. Let me give you a vivid example.

Recently, Red Stone wanted to learn painting, because after seeing Master Van's works he wanted to paint similar ones himself. Master Van's paintings look like this:

Red Stone asked Professor Wang, who has studied the works of Master Van for many years, for guidance. Professor Wang has rich experience and a keen eye; no imitation of Master Van's paintings on the market can escape him. Professor Wang said one thing to me: when your painting can fool me, you can call yourself a success.

Red Stone was very excited and immediately drew this picture for Professor Wang:

Professor Wang took one glance, his face darkened, and he trembled with exasperation: "0 points! Is that a painting? Not even close!" After hearing Professor Wang's words, Red Stone reflected on himself: the painting really wasn't good, it didn't even have eyes or a nose. So he drew another one:

Professor Wang looked at it and, in less than two seconds, dropped four words: "1 point! Redraw it!" Red Stone went back to study the style of Master Van's paintings and kept improving and repainting. One day, Red Stone took a new painting to Professor Wang:

Professor Wang took a look at it and said it was getting close, but he would have to examine it more carefully. In the end he said: no, no, the details are terrible, go back and redraw it. Alas, Professor Wang was becoming stricter and stricter! Red Stone sighed and went back to continue his research. At last, he brought Professor Wang a painting he was very satisfied with:

This time, Professor Wang put on his glasses and analyzed it carefully. After a long while, Professor Wang patted my shoulder and said: the painting is very good, even I cannot tell whether it is real or fake. Ha ha, having won Professor Wang's praise and affirmation, I felt so happy that I could finally create paintings like Master Van's. Perhaps I should consider a career change.

All right, that's it. This example is actually a GAN training process. Red Stone is the generator; his goal is to output a painting that can fool Professor Wang, so that Professor Wang cannot distinguish real from fake. Professor Wang is the discriminator; his goal is to identify Red Stone's paintings and judge them as fake. The whole process is a game of "generation versus adversary", and in the end Red Stone (the generator) produces a painting so convincing that even Professor Wang (the discriminator) cannot tell the difference.

This is GAN, you get the idea.

2. Basic structure of GAN model

Before we get to know the GAN model, let’s take a look at Yann LeCun’s personal views on the future breakthroughs in deep learning:

The most important one, in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks). This is an idea that was originally proposed by Ian Goodfellow when he was a student with Yoshua Bengio at the University of Montreal (he since moved to Google Brain and recently to OpenAI).

This, and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.

Yann LeCun believes that GAN is likely to bring new breakthroughs to deep learning models, calling it the most interesting idea in machine learning in the last 10 years. GAN has been gaining momentum in recent years. This graph shows the number of papers submitted to the ICASSP conference in recent years containing the keywords "generative", "adversarial", and "reinforcement".

The data shows an explosion in 2018 of papers containing the keywords "generative" and "adversarial". It is not hard to foresee even more papers on GAN in the coming years.

Let’s take a look at the basic structure of GAN. We already know that GAN consists of a generator and a discriminator, represented by G and D respectively. Taking the image generation application as an example, its model structure is shown as follows:

The basic GAN model consists of an input vector, a G network, and a D network, where G and D are generally neural networks. The output of G is a picture, here in flattened fully connected form. The output of G becomes one input of D; D also receives real samples from the training set. D then tries to output a high score for real samples and a low score for samples generated by G. In each iteration, the G network optimizes its parameters so that D cannot distinguish real from fake; meanwhile, the D network optimizes its parameters to improve its discrimination, so that the scores of real and fake samples are separated as much as possible.
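A minimal sketch of this wiring in PyTorch (the layer sizes and the flattened 28x28 image shape are illustrative assumptions, not the settings used in section 3):

import torch
import torch.nn as nn

z = torch.randn(16, 100)                       # input vector: a batch of 16 random 100-dim vectors
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784))         # G maps the vector to a flattened 28x28 "image"
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())  # D maps an image to a realness score in (0, 1)

fake_images = G(z)                             # the output of G is the input of D
fake_scores = D(fake_images)                   # D should push these scores down
# real images from the sample set also go through D, and D should push their scores up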

Finally, after several training iterations, the GAN model is established:

In the final GAN model, the samples generated by G are indistinguishable from the real ones, and the score output by D is close to 0.5, indicating that real and fake samples can hardly be told apart and the training has succeeded.

Here, the focus is on the input vector. What does the input vector do? In fact, each dimension of the input vector can represent some feature of the output image. For example, the first dimension of the vector could control the hair color of the generated image, changing it from red to black; the second dimension could adjust the skin tone of the generated image; the third dimension could adjust the emotion of the generated face, and so on.
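As a toy illustration, reusing the generator G and the 100-dim input vector from the sketch above (which dimension controls which attribute is a hypothetical assumption; in a real trained GAN the dimensions are learned and often entangled):

z = torch.randn(1, 100)          # one random input vector
outputs = []
for v in (-2.0, 0.0, 2.0):
    z_mod = z.clone()
    z_mod[0, 0] = v              # sweep only the first dimension, e.g. "hair color"
    outputs.append(G(z_mod))     # the other attributes should stay roughly the same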

This is where the power of GAN comes in. By tweaking the input vector, you can generate images with different characteristics. The resulting images are not in the actual sample set, but they are plausible and previously unseen. Isn't that interesting? The figure below shows how different vectors generate different images.

Now that we have seen the GAN model, let's take a brief look at the algorithm behind GAN. Since there are two modules, G and D, each module has its own network parameters.

First, let's look at module D. Its goal is to make the score of real samples as large as possible and the score of samples generated by G as small as possible. The loss function of D can then be written as follows:
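In the standard GAN notation (this matches the D_loss computed in the code of section 3 below):

$$\mathcal{L}_D = -\mathbb{E}_{x}\big[\log D(x)\big] - \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big]$$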

Here x is a real sample and G(z) is a sample generated by G. We want D(x) to be as large as possible and D(G(z)) to be as small as possible, so equivalently we want -log D(x) to be as small as possible and -log(1 - D(G(z))) to be as small as possible. That is exactly what the loss function expresses.

As for module G, its goal is to generate samples that score as high as possible under D. The loss function of G can then be written as:
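In the same notation (this matches the G_loss computed in the code of section 3 below):

$$\mathcal{L}_G = \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big]$$

Minimizing this pushes D(G(z)) toward 1, i.e. G tries to make its samples score as high as possible under D.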

With the loss function known, the model can then be trained using various optimization algorithms.

3. Write a GAN model

Next, I'll implement a simple GAN model using PyTorch. Again, using painting as an example, suppose we want to create the following "famous paintings" (using sine curves as an example):

The code to generate this “art painting” is as follows:

def artist_works():    # painting from the famous artist (real target)
   r = 0.02 * np.random.randn(1, ART_COMPONENTS)
   paintings = np.sin(PAINT_POINTS * np.pi) + r
   paintings = torch.from_numpy(paintings).float()
   return paintings
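The snippets in this section rely on a few global constants and imports that are defined elsewhere in the full code (BATCH_SIZE, N_IDEAS, ART_COMPONENTS, PAINT_POINTS, LR_G, LR_D). A plausible setup looks like this; the concrete values here are illustrative assumptions:

import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

BATCH_SIZE = 64        # number of paintings per training batch
N_IDEAS = 5            # dimension of the random input vector fed to G
ART_COMPONENTS = 15    # number of points that make up one "painting"
PAINT_POINTS = np.vstack([np.linspace(-1, 1, ART_COMPONENTS) for _ in range(BATCH_SIZE)])
LR_G = 0.0001          # learning rate of the generator
LR_D = 0.0001          # learning rate of the discriminator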

Then, G network and D network models are defined respectively:

G = nn.Sequential(                  # Generator
   nn.Linear(N_IDEAS, 128),        # random ideas (could from normal distribution)
   nn.ReLU(),
   nn.Linear(128, ART_COMPONENTS), # making a painting from these random ideas
)

D = nn.Sequential(                  # Discriminator
   nn.Linear(ART_COMPONENTS, 128), # receive art work either from the famous artist or a newbie like G
   nn.ReLU(),
   nn.Linear(128, 1),
   nn.Sigmoid(),                   # tell the probability that the art work is made by artist
)

We use the Adam algorithm for optimization:

opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)
opt_G = torch.optim.Adam(G.parameters(), lr=LR_G)

Finally, the GAN iterative training process is constructed:

plt.ion()    # turn on matplotlib interactive mode for continuous plotting

D_loss_history = []
G_loss_history = []
for step in range(10000):
   artist_paintings = artist_works()          # real painting from artist
   G_ideas = torch.randn(BATCH_SIZE, N_IDEAS) # random ideas
   G_paintings = G(G_ideas)                   # fake painting from G (random ideas)
   
   prob_artist0 = D(artist_paintings)         # D try to increase this prob
   prob_artist1 = D(G_paintings)              # D try to reduce this prob
   
   D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))
   G_loss = torch.mean(torch.log(1. - prob_artist1))
   
   D_loss_history.append(D_loss.item())      # store the scalar loss value, not the tensor with its graph
   G_loss_history.append(G_loss.item())
   
   opt_D.zero_grad()
   D_loss.backward(retain_graph=True)    # reusing computational graph
   opt_D.step()
   
   opt_G.zero_grad()
   G_loss.backward()
   opt_G.step()
   
   if step % 50 == 0:  # plotting
       plt.cla()
       plt.plot(PAINT_POINTS[0], G_paintings.data.numpy()[0], c='#4AD631', lw=3, label='Generated painting',)
       plt.plot(PAINT_POINTS[0], np.sin(PAINT_POINTS[0] * np.pi), c='#74BCFF', lw=3, label='standard curve')
       plt.text(1, 0.75, 'D accuracy=%.2f (0.5 for D to converge)' % prob_artist0.data.numpy().mean(), fontdict={'size': 8})
       plt.text(1, 0.5, 'D score= %.2f (-1.38 for G to converge)' % -D_loss.data.numpy(), fontdict={'size': 8})
       plt.ylim((-1, 1)); plt.legend(loc='lower right', fontsize=10); plt.draw(); plt.pause(0.01)

plt.ioff()
plt.show()

I used dynamic plotting to observe the GAN model training in real time.

When the iteration number is 1:

When the number of iterations is 200:

When the number of iterations is 1000:

When the number of iterations is 10000:

Perfect! After 10,000 iterations, the resulting curve was pretty close to the standard curve. D’s score was close to 0.5 as expected.

The complete code comes in two versions, .py and .ipynb. I have put them on GitHub; if you need them, please click the link below.

GitHub-GAN