Abstract: GANs can be interpreted from another, more intuitive perspective: that of probability distributions.

Compiled by Alireza Koochali McGL

Generative Adversarial Networks (GANs) are one of the hot topics in the field of AI. In this article, we will look at GANs from a different perspective: not as generators of beautiful images, but as probability distribution transformation functions. We will explore the core ideas of GANs without getting bogged down in implementation details or complex math. Let’s start by analyzing the type of problem at hand, and then observe how the requirements of the solution shape the thinking behind GANs.

Welcome to the amusement park

Suppose we own an amusement park. Inside, we have a machine that takes $10 and randomly returns a toy worth between $1 and $100. The machine is very popular with visitors because they win very cool, expensive toys from time to time. It is also very profitable for us: its toy-selection logic hits a sweet spot that keeps both us and our customers satisfied. We therefore want to add more machines to earn more profit. There is a problem, however: the machine is extremely expensive, so we would like to develop and build it ourselves. To do that, we need to figure out the machine’s toy-selection logic.

Obviously, the key parameter in choosing a toy is its value. Expensive toys should be less likely to be selected, to protect our profit; but if we reduce the chance of expensive toys too drastically, visitors will be dissatisfied. Our goal, then, is to understand the probability distribution of toy values as accurately as possible. We start with a list of the toys the machine has dispensed in the past and their corresponding prices, and study their distribution. If it resembles a well-known probability distribution, the problem is solved: we use that distribution as the core of our new machine’s toy-selection logic, sampling from it to decide which toy to return.
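As a sketch of that last step: suppose our study of past toys suggested the prices roughly follow an exponential distribution truncated to the $1–$100 range (a purely hypothetical choice; the real machine's distribution is unknown). The new machine's selection logic could then simply sample from it:

```python
import random

random.seed(0)

def sample_toy_price(rate=0.2, low=1.0, high=100.0):
    """Draw a toy price in [low, high] from a truncated exponential.

    Cheap toys are likely, expensive ones rare: a hypothetical stand-in
    for the machine's unknown toy-selection distribution.
    """
    while True:
        price = random.expovariate(rate)
        if low <= price <= high:
            return price

prices = [sample_toy_price() for _ in range(10_000)]
avg = sum(prices) / len(prices)
print(f"average toy value: ${avg:.2f}")
```

With these illustrative parameters the average payout stays comfortably below the $10 fee, which is exactly the profit-versus-satisfaction trade-off the distribution has to encode.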

Complex machine, complex problem

However, if the distribution of dispensed toys turns out to be complex, we need to devise a way to learn the probability distribution behind the generation process given only samples from it. In other words, we need a model that looks at our data and figures out the machine’s logic. The key insight is that learning the probability distribution of the data is the primary task of data generation.

It’s generated by transformation

Let’s state our goal abstractly. We have a set of data, which from now on we call real data. Our goal is to forge artificial data that looks like the real thing; such artificial data is often called fake data. So we need a model that learns from real data and generates realistic fake data. The goal is clear. Now we need to move from this abstract goal to a more concrete description of our task, preferably in terms of something we are already familiar with. To do that, we change the way we look at the problem. First, let’s get familiar with transformation functions. Say we have samples from some probability distribution. By applying a transformation function, we can map these samples from their original distribution to a desired target distribution. In theory, we can convert any source distribution into any target distribution. In practice, however, computing these transformation functions is not always analytically feasible.
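For the cases where the transformation is analytically feasible, the classic example is inverse transform sampling: applying the inverse CDF of the target distribution maps uniform samples to samples from that target. A minimal sketch for the exponential distribution:

```python
import math
import random

random.seed(0)

def uniform_to_exponential(u, rate=1.0):
    """Inverse-CDF transform: maps a Uniform(0, 1) sample to Exponential(rate).

    The exponential CDF is F(x) = 1 - exp(-rate * x), so its inverse,
    F^{-1}(u) = -ln(1 - u) / rate, turns uniform samples into exponential ones.
    """
    return -math.log(1.0 - u) / rate

samples = [uniform_to_exponential(random.random(), rate=2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to the exponential mean 1/rate = 0.5
```

For rich, high-dimensional targets (like the distribution of face images) no such closed-form inverse exists, which is exactly the gap the rest of the article fills.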

Now, back to our problem. We can redefine our generation problem as a transformation task. We start from a given distribution; conventionally, a normal distribution with mean 0 and standard deviation 1. We call this distribution the “latent space”. Now we need a transformation function that maps samples from the latent space into the data space. In other words, our transformation function takes samples from the latent space and outputs samples from the data space, that is, data points. Look! We are generating data! There is only one problem: it is impossible to define this function analytically. But don’t we use neural networks precisely to approximate complex functions that can’t be defined analytically? Yes, we do, and that’s exactly what we’re going to do here. We use a neural network to approximate our transformation function. We call this network the “Generator”, because it ultimately produces data. Very reasonable. Since we want to train a neural network, we need to define a loss function. The loss function is the key to correct training and realistic data generation, so we must define it precisely according to our goal.
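To make the latent-space-to-data-space mapping concrete, here is a deliberately simple, hypothetical case where the data space is one-dimensional and the real data happens to follow N(4, 0.5). For such a target the transformation from the latent N(0, 1) has a closed form, x = mu + sigma * z; a generator network learns an analogous (but far more complicated) mapping when no closed form exists:

```python
import random
import statistics

random.seed(0)

# Hypothetical 1-D "data space": we pretend real data follows N(4.0, 0.5).
def generator(z, mu=4.0, sigma=0.5):
    """Closed-form stand-in for a generator: maps latent z ~ N(0, 1)
    into the data space, here N(mu, sigma)."""
    return mu + sigma * z

latent_samples = [random.gauss(0.0, 1.0) for _ in range(50_000)]
fake_data = [generator(z) for z in latent_samples]

print(statistics.mean(fake_data), statistics.stdev(fake_data))
```

The generated samples match the target mean and spread, which is all "generating data" means at this level of abstraction.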

Discriminator: Very helpful

In general, a loss function evaluates how well our neural network is performing against our goal and provides feedback (in the form of gradients) so the model can improve itself. Here, we need a loss function that measures how closely the data we generate follows the real data distribution; in other words, one that tells us how realistic our fake data is. Unfortunately, we have no information about the true data distribution; that was our problem from the beginning. However, we can achieve the same goal by distinguishing between real data and fake data.

Suppose our loss function can distinguish between real data and fake data. Then we can feed our fake data to it. For fake samples that are indistinguishable from real data, nothing needs to be done. For the rest, the loss function provides feedback to update and improve our generator. More concretely, we can use a classifier as the loss function to distinguish between real and fake data. If a generated data point is classified as real, it already resembles real data and needs no further action. For samples identified as fake, we ask the loss function how to update our generator to make them look more realistic; it answers in the form of gradients that update the weights of our neural network.

It looks like we’ve found the last piece of the solution! However, there is one more problem to deal with. Although our proposed loss function meets our requirements, it is not simple to implement in practice: it is a complex function whose desired behavior we can describe, but which we cannot write down directly. It looks like a dead end. But what prevents us from using a neural network to approximate this loss function too? Nothing! So let’s do it. We use a classifier neural network as our loss function, and we call this network the “Discriminator” because it discriminates between real and fake data. Very wise naming. Best of all, we are very familiar with using neural networks for classification: we know how to train them, what their loss functions are, and what their inputs and outputs should look like. However, training two neural networks simultaneously is unconventional. The final question, then, is how to train these two networks together.
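The "feedback in the form of gradients" idea can be sketched in one dimension. Assume a toy logistic discriminator with fixed, hand-picked weights (a real discriminator is a trained network; everything below is illustrative). The gradient of the generator's loss with respect to a fake sample tells the generator which way to shift its output:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical fixed discriminator weights: D outputs close to 1 for
# x well above 4 ("looks real") and close to 0 for x well below 4.
W, C = 1.0, -4.0

def discriminate(x):
    """D(x): estimated probability that x is real."""
    return sigmoid(W * x + C)

def generator_feedback(x_fake):
    """Gradient of the generator loss -log D(x_fake) w.r.t. x_fake.

    Its sign tells the generator which direction to move its output
    so the discriminator scores it as more real.
    """
    return -(1.0 - discriminate(x_fake)) * W

print(discriminate(0.0))        # far from the real region: near 0
print(generator_feedback(0.0))  # negative: gradient descent pushes x upward
```

Backpropagation carries this same signal past the fake sample into the generator's own weights.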

Let the training begin!

If we had a perfect classifier before starting to train our generator, training would be very easy. Unfortunately, at the beginning of the training process our discriminator is just as clueless as our generator. Worse, we cannot train the discriminator before training the generator, because we need fake data to train the discriminator. As you can see, the training of the two networks depends on each other: the generator needs feedback from the discriminator to improve, and the discriminator needs to be updated as the generator improves. So we train them alternately. On one batch, we train the discriminator to classify real and fake samples. On the next, we train the generator to produce samples that the discriminator recognizes as real. This method is called adversarial training. When we use adversarial training for a data generation task, we get a Generative Adversarial Network, or GAN for short.
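The alternating scheme can be shown end to end on a toy 1-D problem, with the gradients written out by hand rather than via an ML framework. Everything here is illustrative: real data follows N(4, 0.5), the generator is g(z) = a·z + b, and the discriminator is D(x) = sigmoid(w·x + c). Each iteration first takes a discriminator step, then a generator step:

```python
import math
import random

random.seed(42)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

a, b = 1.0, 0.0            # generator parameters g(z) = a*z + b
w, c = 0.0, 0.0            # discriminator parameters D(x) = sigmoid(w*x + c)
lr, batch, steps = 0.03, 16, 3000

for _ in range(steps):
    reals = [random.gauss(4.0, 0.5) for _ in range(batch)]
    zs = [random.gauss(0.0, 1.0) for _ in range(batch)]
    fakes = [a * z + b for z in zs]

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    gw = gc = 0.0
    for x in reals:
        d = sigmoid(w * x + c)
        gw += -(1 - d) * x        # gradient of -log D(x)
        gc += -(1 - d)
    for x in fakes:
        d = sigmoid(w * x + c)
        gw += d * x               # gradient of -log(1 - D(x))
        gc += d
    w -= lr * gw / (2 * batch)
    c -= lr * gc / (2 * batch)

    # Generator step: push D(fake) toward 1 (non-saturating loss -log D).
    ga = gb = 0.0
    for z, x in zip(zs, fakes):
        d = sigmoid(w * x + c)
        ga += -(1 - d) * w * z
        gb += -(1 - d) * w
    a -= lr * ga / batch
    b -= lr * gb / batch

fake_mean = sum(a * random.gauss(0.0, 1.0) + b for _ in range(5000)) / 5000
print(f"generated mean: {fake_mean:.2f} (real data mean is 4.0)")
```

The generator starts out producing samples centered at 0, and the discriminator's gradients steadily drag its output toward the real data region, which is the whole adversarial dynamic in miniature.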

However, looking at the training procedure alone, we do not see where the “adversarial” part comes in. To find out where the term comes from, we should look closely at the goals of the two networks. The discriminator’s goal is to classify real and fake data as accurately as possible, so during its training step it tries to correctly identify fake samples. The generator, on the other hand, is trained to produce realistic fake data. To pass the authenticity test, the generator must convince the discriminator that its output is real. In other words, the generator tries to fool the discriminator, and the discriminator tries not to be fooled. These conflicting goals set the training process in motion.

In the course of training, both networks improve at their objectives. Eventually, the generator becomes so good that the discriminator can no longer tell fake data from real data, and our training is complete.

Pitfalls

GANs are an elegant but complex solution to a very difficult problem. With GANs, we have a fast and efficient answer to this long-standing question, paving the way for many exciting applications. Before we get to applications, however, we should understand the common problems with GANs.

First, a generator is a neural network, which is by definition a black box. Although a trained generator encodes information about the real data distribution in its weights, we cannot access that information explicitly. For low-dimensional data we can recover some of it by sampling, but for high-dimensional data we can do little.

In addition, unlike most neural networks, the loss values of a GAN tell us little about training progress. During training, we need to manually inspect the generator’s samples to judge how it is doing.

Finally, as mentioned earlier, training works through the struggle between generator and discriminator. If they stop fighting each other, training stalls, and unfortunately they often do stop after a while. There are many reasons for this. For example, if one network improves much faster than the other, it can overwhelm its opponent and halt training. The two network architectures should therefore be balanced; but what does balance mean here? There is no clear answer, and in general one finds it by trial and error. As a result, GAN training is quite unstable. Many methods address the stability problem, but most of them fix one issue while introducing another, or only work under specific conditions. In short, improving GAN training is still an open question, and research around it is very active.

Conclusion: The tip of the iceberg

Let’s go back to our expensive machine. It symbolizes any data generation that is expensive in time and resources. Say we have a moderately sized dataset of faces, but our application needs a larger one. We could pick up our cameras, photograph people, and add the pictures to the dataset, but that is a time-consuming process. If instead we train a GAN on the available images, we can generate hundreds of new images in seconds. Data augmentation is therefore one of the most important applications of GANs.

Data scarcity is not the only motivation. Back to the face dataset: if we want to use those photos, we are likely to run into privacy concerns. But what if we instead use generated faces of people who don’t actually exist? Perfect! No one needs to worry. GANs thus provide a neat solution to data privacy problems.

The GAN research community is very active right now, with new applications or improvements being proposed every day. Still, much remains to be discovered. This is just the tip of the iceberg.

Source: towardsdatascience.com/gans-a-diff…