Compiled by | AI Tech Base Camp (rgznai100)


Contributors | s day, hu, pigeons






Good grief, these teenagers. How are those of us who have been programmers for years supposed to keep up? We have one last shred of confidence left, and that is…





Ahem. Let's get down to business…






Do you know what a variational autoencoder is?






Do you know why you need to understand variational autoencoders?






Do you know how to grasp variational autoencoders as quickly as possible?





I won't say another word. I'll let this 16-year-old genius tell you.





Kevin Frans is a high school student in Palo Alto, California, who has written two papers and is well versed in generative models. He made his name with a project called deepcolor.





He is currently working as an intern at OpenAI on reinforcement learning.





In this article, Kevin Frans uses his own examples to explain variational autoencoders. He gives an especially careful account of how autoencoders and variational autoencoders work, and of the pros and cons of using them. It is a rare gem for anyone trying to understand VAEs.





Now, let's take a look at what this high school student can do:


I have previously written about generative adversarial networks (GANs), with a simple example of using them to generate realistic images.





However, there are two major drawbacks to using a GAN alone.





First, the images here are generated from random noise. If you want to produce an image with particular features, there is no way to find the noise values that yield it other than searching over the entire distribution of initial noise points.





Second, a generative adversarial model only learns to distinguish "real" images from "fake" ones. There is no constraint forcing an image of a cat to actually look like a cat. As a result, a generated image may contain no recognizable object at all, even though its style looks like that of a picture.





How to solve these two problems?





In this article, I will introduce another kind of neural network, the variational autoencoder (VAE), to solve these two problems.





What is a variational autoencoder?


To understand the VAE, we start with a simple network and add components step by step.





A common way to describe a neural network is as an approximation of some function we want to model. However, a network can also be understood as a data structure that stores information.





Suppose we have a neural network composed of several deconvolution layers. We set the input to a unit vector and train the network to reduce the mean squared error between its output and a target image. The "data" for that image is then contained in the network's parameters.








Now, let's try this with multiple images. Instead of a unit vector, the input is a one-hot vector. For example, the input [1, 0, 0, 0] might mean a cat image, while [0, 1, 0, 0] might mean a dog. This works, but we can store at most four images this way. Making the network remember more images means using longer vectors, which in turn means more and more parameters.
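To make the limitation concrete, here is a small numpy sketch of a hypothetical one-hot "decoder" whose weights literally store the images, so each one-hot input can only select one stored image (the 4-image setup and 28x28 size are illustrative assumptions, not the author's code):

```python
import numpy as np

# Four memorized "images" stored directly in the decoder's weights.
stored = np.random.default_rng(0).random(size=(4, 28, 28))

def decode_one_hot(v):
    # A linear decoder: the one-hot vector simply picks out one row,
    # i.e. one stored image. Capacity is limited to len(v) images.
    return np.tensordot(v, stored, axes=1)

cat = np.array([1.0, 0.0, 0.0, 0.0])   # "cat" slot
dog = np.array([0.0, 1.0, 0.0, 0.0])   # "dog" slot
print(decode_one_hot(cat).shape)       # (28, 28)
```

Storing a fifth image would require a fifth slot, hence a longer vector and more weights, which is exactly the scaling problem described above.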





To fix that, we use real-valued vectors instead of one-hot vectors. We can think of such a vector as an encoding of an image: for example, [3.3, 4.5, 2.1, 9.8] might stand for a cat image and [3.4, 2.1, 6.7, 4.2] for a dog image. This is where the terms encoding/decoding come from. This initial vector is our latent variable.





Picking latent variables at random, as I did above, is obviously a bad idea. In an autoencoder, we add a component that automatically encodes the original image into a vector; the deconvolution layers above can then "decode" these vectors back into the original image.








With this, our model finally reaches a stage where it is useful. We can train the network on as many images as we need. If we save the encoded vector of an image, we can reconstruct that image at any time with the decoding component. All of this requires nothing more than a standard autoencoder.
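As a minimal sketch of this encode/decode idea, the toy below trains a linear autoencoder with plain gradient descent. Everything here (linear layers instead of deconvolutions, the sizes, the learning rate) is an illustrative assumption, not the author's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 8-dimensional vectors that really live on a 2-D subspace.
basis = rng.normal(size=(2, 8))
data = rng.normal(size=(64, 2)) @ basis        # 64 samples, 8 features each

W_enc = rng.normal(scale=0.1, size=(8, 2))     # encoder: image -> latent code
W_dec = rng.normal(scale=0.1, size=(2, 8))     # decoder: latent code -> image

def reconstruction_error():
    z = data @ W_enc                           # encode
    return np.mean((z @ W_dec - data) ** 2)    # decode and compare (MSE)

initial = reconstruction_error()
for _ in range(2000):                          # gradient descent on the MSE
    z = data @ W_enc
    grad_out = 2.0 * (z @ W_dec - data) / data.size
    g_dec = z.T @ grad_out                     # dLoss/dW_dec
    g_enc = data.T @ (grad_out @ W_dec.T)      # dLoss/dW_enc
    W_dec -= 0.1 * g_dec
    W_enc -= 0.1 * g_enc
final = reconstruction_error()
print(initial, final)                          # error drops as images are "stored"
```

After training, any saved latent code `data @ W_enc` can be turned back into its image with `W_dec`, which is all a standard autoencoder does.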





Here, however, we want to build a generative model, not just a fuzzy data structure that "remembers" images. Apart from encoding latent vectors from existing images, as we did above, we have no idea how to create such vectors, so we cannot generate any images out of thin air.





There is an easy way to do this. We add a constraint to the encoding network that forces the latent vectors it generates to roughly follow a unit Gaussian distribution. This constraint is what separates a variational autoencoder from a standard autoencoder.





Now it is easy to generate a new image: we simply sample a latent vector from the unit Gaussian distribution and pass it to the decoder.
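In numpy, that sampling step is one line. The decoder below is only a stand-in (a fixed random linear map, with an assumed latent size of 20 and a 28x28 output), since the real decoder would be a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_z = 20                                   # latent dimensionality (arbitrary)

def decode(z):
    # Stand-in for a trained decoder network: a fixed random linear map
    # from latent space to 28x28 "pixels" (hypothetical, for shapes only).
    W = np.random.default_rng(1).normal(size=(n_z, 28 * 28))
    return (z @ W).reshape(-1, 28, 28)

z = rng.standard_normal(size=(16, n_z))    # sample 16 latent vectors ~ N(0, I)
images = decode(z)                         # 16 brand-new "images"
print(images.shape)                        # (16, 28, 28)
```

No input image is needed anywhere: the unit Gaussian itself is the source of new latent vectors.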





In practice, we need to carefully balance the network's reconstruction accuracy against how closely the latent variables fit the unit Gaussian distribution.





The neural network can decide this trade-off for itself. We introduce two separate error terms: the generation loss, which measures how accurately the network reconstructs the image, and the latent loss, a KL divergence that measures how closely the latent variables fit the unit Gaussian distribution.





generation_loss = mean(square(generated_image - real_image))

latent_loss = KL-Divergence(latent_variable, unit_gaussian)

loss = generation_loss + latent_loss


To make the KL divergence tractable, we use a simple reparameterization trick: instead of the encoder generating a real-valued latent vector directly, it generates a vector of means and a vector of standard deviations.








Our KL divergence calculation looks like this:





# z_mean and z_stddev are two vectors generated by the encoder network

latent_loss = 0.5 * tf.reduce_sum(tf.square(z_mean) + tf.square(z_stddev) - tf.log(tf.square(z_stddev)) - 1, 1)
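That line is TensorFlow 1.x syntax; a numpy equivalent of the same closed-form KL divergence to the unit Gaussian, written here as an illustrative sketch rather than the author's code, makes the formula easy to sanity-check:

```python
import numpy as np

def kl_to_unit_gaussian(z_mean, z_stddev):
    # Closed-form KL(N(mean, std^2) || N(0, 1)), summed over latent dims:
    # 0.5 * sum(mean^2 + std^2 - log(std^2) - 1)
    return 0.5 * np.sum(
        np.square(z_mean) + np.square(z_stddev)
        - np.log(np.square(z_stddev)) - 1.0,
        axis=1,
    )

# A latent code already matching the unit Gaussian costs nothing...
zero_cost = kl_to_unit_gaussian(np.zeros((1, 4)), np.ones((1, 4)))
print(zero_cost)      # [0.]
# ...while deviating means (here mean=2 per dim) are penalized.
shifted = kl_to_unit_gaussian(np.full((1, 4), 2.0), np.ones((1, 4)))
print(shifted)        # [8.]
```

The loss is zero exactly when the encoder outputs mean 0 and standard deviation 1, which is what pulls the latent distribution toward the unit Gaussian.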


When computing the decoding network's error, we simply draw a sample from a standard normal distribution, scale it by the standard deviation vector, and add the mean vector to obtain our latent vector:





samples = tf.random_normal([batchsize, n_z], 0, 1, dtype=tf.float32)

sampled_z = z_mean + (z_stddev * samples)
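The point of this reparameterization is that the randomness lives entirely in the standard-normal `samples`, while `z_mean` and `z_stddev` enter through a deterministic (and therefore differentiable) transform. A numpy sketch (the particular means and standard deviations are arbitrary) confirms the resulting distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
batchsize, n_z = 10000, 2
z_mean = np.array([1.0, -3.0])       # encoder's mean vector (example values)
z_stddev = np.array([0.5, 2.0])      # encoder's std-dev vector (example values)

samples = rng.standard_normal(size=(batchsize, n_z))  # eps ~ N(0, I)
sampled_z = z_mean + z_stddev * samples               # deterministic transform

print(sampled_z.mean(axis=0))   # close to [1.0, -3.0]
print(sampled_z.std(axis=0))    # close to [0.5, 2.0]
```

Gradients can flow back into `z_mean` and `z_stddev`, even though `sampled_z` is random.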


Besides letting us generate random latent variables, this constraint also improves the generalization ability of the VAE network.





Figuratively, we can think of the latent variables as a code for the data, transmitted between encoder and decoder.





Suppose you have a set of number-name pairs on the interval [0, 10], where each real number stands for the name of an object. For example, 5.43 means an apple and 5.44 means a banana. When someone says 5.43, you know for certain they are talking about an apple. In principle, an infinite amount of information can be encoded this way, since there are, after all, infinitely many real numbers between 0 and 10.





But what if Gaussian noise with standard deviation one is added every time someone tells you a number? Say you receive 5.43: the original number could now be anywhere around [4.4, 6.4], so the speaker might really have meant 5.44 (banana).





The larger the standard deviation of the added noise, the less information a single mean value can convey.
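A quick numpy simulation of the apple/banana example above makes this vivid. With unit Gaussian noise, codes 0.01 apart become essentially indistinguishable (the nearest-code decoding rule here is my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
apple, banana = 5.43, 5.44                     # the two codes from the example
noisy = apple + rng.standard_normal(100_000)   # speaker always means "apple"

# Decode each noisy number to the nearest code. With noise of std 1,
# we recover "apple" only about half the time -- i.e. the channel
# carries almost no information about which of the two codes was sent.
correct = np.mean(np.abs(noisy - apple) < np.abs(noisy - banana))
print(correct)   # about 0.5
```

To stay distinguishable under this noise, codes would have to be spaced far apart, which is exactly the sense in which noise shrinks the channel's capacity.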





By the same logic, the latent variables passed between the encoder and decoder behave like these noisy numbers. The more efficiently we can encode the original image, the higher we can raise the standard deviation of the Gaussian, until it reaches one (a standard normal distribution).





This constraint forces the encoder to be very efficient, creating information-rich latent variables. The improved generalization means that latent vectors that are sampled randomly, or encoded from images the network was never trained on, will produce better results when decoded.






How well does a VAE do?





I ran some tests on the MNIST handwritten-digit dataset to show how well the variational autoencoder works.







Left: 1st epoch. Middle: 9th epoch. Right: original images.

It looks good! After running for 15 minutes on my laptop, with no GPU, it produced some good MNIST results.






Advantages of VAE:





Because they follow an encode-decode scheme, we can compare generated images directly against the originals, which is impossible with a GAN.






Deficiencies of VAE:





Because it uses mean squared error directly rather than an adversarial network, the network tends to generate blurrier images.





There is also research combining the VAE and the GAN: using the same encoder-decoder configuration, but training the decoder with an adversarial network.





For details, see the papers:


https://arxiv.org/pdf/1512.09300.pdf


http://blog.otoro.net/2016/04/01/generating-large-images-from-latent-vectors/


Code for this article:


https://github.com/kvfrans/variational-autoencoder


https://jmetzen.github.io/2015-11-27/vae.html


Original post:


http://kvfrans.com/variational-autoencoders-explained/