Image Completion with Deep Learning in TensorFlow, by Brandon Amos. Translated by MOLLY && Cold Xiaoyang ([email protected]), April 2017. Reference: blog.csdn.net/han_xiaoyan… Disclaimer: All rights reserved. Please contact the author and credit the source when reprinting.

  • Introduction
  • Step 1: Understand the image as a sample of a probability distribution

    • How do you fill in the blanks?
    • But how do you start counting? These are all images.
    • So how do we complete the image?
  • Step 2: Quickly generate fake images

    • In the case of unknown probability distribution, learning generates new samples
    • [ML-heavy] Generative Adversarial Net (GAN) architecture
    • Generating fake images with G(z)
    • [ML-Heavy] Training DCGANs
    • Existing GAN and DCGAN implementations
    • [ML-Heavy] Build DCGANs on Tensorflow
    • Run DCGAN on the photo gallery
  • Step 3: Find the best fake image for image completion

    • DCGAN is used for image completion
    • [ML-Heavy] Loss function for projecting onto p_g
    • [ML-Heavy] uses Tensorflow to complete DCGAN images
    • Completing the images
  • Conclusion

Introduction

Content-aware fill is a powerful tool that designers and photographers use to fill in unwanted or missing parts of an image. Image completion and inpainting are two closely related techniques for filling in missing or damaged parts of an image. There are many ways to implement content-aware fill, image completion, and inpainting. In this blog post, I will introduce a paper by Raymond Yeh, Chen Chen, et al., “Semantic Image Inpainting with Perceptual and Contextual Losses,” posted on arXiv on July 26, 2016, which describes how to use a DCGAN for image completion. This post is intended for readers with a general technical background; some parts require a machine learning background. I’ve tagged the relevant sections [ML-heavy], so you can skip them if you don’t want to get into too much detail. We will only cover the case of filling in missing parts of face images. The TensorFlow code has been posted on GitHub: bamos/dcgan-completion.tensorflow. Image completion proceeds in three steps.

  • First we think of the image as a sample of a probability distribution.
  • With this understanding, learn how to generate fake images.
  • Then we find the best fake image to fill back in.



Use Photoshop to complete the missing part of the image



Use Photoshop to automatically delete unwanted parts



Image completion results are shown below. The center of each image is automatically generated. The source code can be downloaded here.

These images are a random sample that I took from the LFW dataset.

Step 1: Understand the image as a sample of a probability distribution

How do you fill in the blanks?

In the example above, imagine that you are building a system that can fill in the missing pieces. How would you do it? What do you think the human brain does? What kind of information would you use? In this post, we will focus on two kinds of information. Contextual information: you can infer the missing pixels from the surrounding pixels. Perceptual information: you fill in parts that look “normal,” like what you see in real life or in other pictures. Both are important. Without contextual information, how do you know what to fill in? Without perceptual information, an infinite number of completions are consistent with the same context. Some images that look “normal” to a machine learning system may not look normal to humans. It would be nice if there were an exact, intuitive algorithm that captures both of the properties mentioned in the introduction. Constructing such an algorithm is feasible for special cases, but there is no general method. The best solutions so far use statistics and machine learning to find an approximation.

But how do you start counting? These are all images.

To get you thinking, let’s start with a probability distribution that is well understood and can be written down succinctly: the normal distribution. Here is the probability density function (PDF) of the normal distribution. You can think of the PDF as sweeping across the input space, with the vertical axis giving the probability density of each value. (If you’re interested, the code that draws these figures can be downloaded from bamos/dcgan-completion.tensorflow: simple-distributions.py.)

So if you sample this distribution, you get some data. What needs to be clear is the connection between the PDF and the sample.
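As a minimal sketch of that connection (this is not the original simple-distributions.py, just an illustration using numpy and scipy), we can evaluate the PDF of a standard normal distribution and compare it against a histogram of samples:

import numpy as np
from scipy.stats import norm

# Evaluate the PDF of a standard normal distribution on a grid.
xs = np.linspace(-4, 4, 200)
pdf = norm.pdf(xs)

# Draw samples from the same distribution.
samples = np.random.normal(loc=0.0, scale=1.0, size=1000)

# A normalized histogram of the samples approximates the PDF.
hist, edges = np.histogram(samples, bins=50, density=True)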



Sampling from a normal distribution



PDF and samples for a 2D distribution. The PDF is shown as a contour plot with the sample points drawn on top.

This was a 1-dimensional distribution, because the input lies along a single dimension. You can do the same thing in two dimensions. The key connection between images and statistics is that we can interpret an image as a sample from a high-dimensional probability distribution. The probability distribution ranges over the pixels of the image. Imagine you’re taking a picture with a camera. The resulting image is composed of a finite number of pixels. When you take a picture, you are sampling from this complex probability distribution. This distribution determines whether an image looks normal or abnormal. For pictures, unlike the normal distribution, we don’t know the true probability distribution; we only collect samples. In this article, we will use color images represented with RGB. Our images are 64 pixels wide and 64 pixels high, so our probability distribution has 64⋅64⋅3 ≈ 12k dimensions.
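As a tiny illustration of this view (the image array here is just a random placeholder, not a real photo), a 64x64 RGB image is a single point in a 64⋅64⋅3 = 12,288-dimensional space:

import numpy as np

# A 64x64x3 image is one sample from a ~12k-dimensional distribution.
# `image` is a random placeholder standing in for a real photo.
image = np.random.rand(64, 64, 3)
x = image.flatten()
print(x.shape)  # (12288,)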

So how do we complete the image?

First consider the multivariate normal distribution, just to get some inspiration. Given x=1, what is the most likely value of y? We can fix the value of x and then find the y that maximizes PDF.



In a multidimensional normal distribution, given x, you get the highest possible y

This concept generalizes naturally to image probability distributions. We know some values and we want to fill in the missing ones. This can be posed simply as a maximization problem: we search over all possible missing values, and the completion is the most likely one. From samples of a normal distribution, we can recover the PDF just from the samples: pick your favorite statistical model and fit it to the data. However, we don’t actually use this method. For simple distributions, PDFs are easy to recover, but for more complex image distributions it is very difficult and intractable. Part of the difficulty comes from complicated conditional dependencies: the value of one pixel depends on the values of the other pixels in the image. Also, maximizing a general PDF is a very difficult, intractable non-convex optimization problem.
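As a concrete sketch of the maximization idea (the mean and covariance below are arbitrary values chosen for illustration, not anything from this post), we can fix x = 1 for a correlated 2D normal distribution and search over y for the value with the highest PDF:

import numpy as np
from scipy.stats import multivariate_normal

# Given x = 1, find the y that maximizes the PDF of a correlated
# 2D normal distribution (illustrative mean and covariance).
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
rv = multivariate_normal(mean, cov)

ys = np.linspace(-3, 3, 601)
points = np.column_stack([np.full_like(ys, 1.0), ys])  # fix x = 1, vary y
best_y = ys[np.argmax(rv.pdf(points))]
print(best_y)  # close to 0.8, the conditional mean of y given x = 1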

Step 2: Quickly generate fake images

In the case of unknown probability distribution, learning generates new samples

Besides learning how to compute PDFs, another well-developed idea in statistics is learning how to generate new (random) samples with a generative model. Generative models are often hard to train or intractable, but the deep learning community has made surprising progress in this area. Yann LeCun gives a great discussion of how generative models can be trained in this Quora answer, calling them the most interesting idea in machine learning in the last 10 years.



Yann LeCun’s introduction to generative adversarial networks



Think of generative adversarial networks as arcade games. The two networks work against each other and advance together. It’s like two humans playing against each other in a game.

Other deep learning methods, such as Variational Autoencoders(VAEs), can also be used to train generative models. In this blog post, we use Generative Adversarial Nets (GANs).

[ML-heavy] Generative Adversarial Net (GAN) architecture

This idea comes from the landmark paper “Generative Adversarial Nets” (GANs) by Ian Goodfellow et al., presented at the Neural Information Processing Systems (NIPS) conference in 2014. The idea is that we define a simple, well-known distribution and call it p_z. In what follows, p_z is the uniform distribution on the closed interval from -1 to 1. We write a sample from this distribution as z ∼ p_z. If p_z is five-dimensional, we can sample it with one line of Python using numpy:

z = np.random.uniform(-1, 1, 5)
# array([ 0.77356483,  0.95258473, -0.18345086,  0.69224724, -0.34718733])

Now that we have a simple distribution we can sample from, we’d like to define a function G(z) that produces samples from our original probability distribution of images.

def G(z):
   ...
   return imageSample
z = np.random.uniform(-1, 1, 5)
imageSample = G(z)

So how do we define G(z) so that it takes a vector as input and returns an image? We’ll use a deep neural network. There are many tutorials on the basics of neural networks, so I won’t cover them here. Some good references are the Stanford CS231n course, Ian Goodfellow’s Deep Learning book, Image Kernels Explained Visually, and the convolution arithmetic guide.

There are many ways to build a G(z) with deep learning. The original GAN paper presented the idea, a training procedure, and preliminary experimental results. The idea has since been greatly expanded. One extension is described in the paper “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” by Alec Radford, Luke Metz, and Soumith Chintala, presented at the International Conference on Learning Representations (ICLR). This paper proposes deep convolutional GANs (DCGANs), which use fractionally-strided convolutions to upsample images.

So what is a fractionally-strided convolution and how does it upsample an image? Vincent Dumoulin and Francesco Visin’s “A guide to convolution arithmetic for deep learning” and the conv_arithmetic project are great introductions to convolution arithmetic in deep learning, and the figures there give an excellent intuition for how fractionally-strided convolutions work. First, make sure you understand how a normal convolution slides a kernel over the input space (blue) to produce the output space (green). Here, the output is smaller than the input. (If this is unclear, refer to the CS231n CNN section or the convolution arithmetic guide.)



Illustration of a convolution operation; blue is the input and green is the output.

Next, suppose you have a 3x3 input. Our goal is to upsample it so that we get a larger output. You can think of a fractionally-strided convolution as enlarging the input image by inserting zeros between the pixels, and then performing an ordinary convolution on this enlarged image, which gives a larger output. Here, the output is 5x5.



Illustration of a fractionally-strided convolution; blue is the input, green is the output.
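Here is a rough numpy sketch of that intuition, separate from the actual DCGAN layers: insert zeros between the pixels of a 3x3 input, pad, and apply an ordinary 3x3 convolution to obtain a 5x5 output.

import numpy as np

# Zero-insertion picture of a fractionally-strided convolution:
# a 3x3 input is "stretched" to 5x5 by inserting zeros between pixels,
# then an ordinary 3x3 convolution (with padding 1) produces a 5x5 output.
x = np.arange(1, 10, dtype=float).reshape(3, 3)   # 3x3 input

up = np.zeros((5, 5))
up[::2, ::2] = x                                  # zeros between the pixels

k = np.ones((3, 3)) / 9.0                         # an arbitrary 3x3 kernel
padded = np.pad(up, 1, mode='constant')
out = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        out[i, j] = np.sum(padded[i:i+3, j:j+3] * k)
print(out.shape)                                  # (5, 5)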

Note: the convolutional layer that performs upsampling goes by many names: full convolution, in-network upsampling, fractionally-strided convolution, backwards convolution, deconvolution, upconvolution, or transposed convolution. Using the term “deconvolution” is strongly discouraged because it already means something else: in certain mathematical settings and in other areas of computer vision, the term has a completely different meaning.

Now that we have the fractionally-strided convolution building block, we can construct a representation of G(z) that takes a vector z ∼ p_z as input and outputs a 64x64x3 RGB image.



One way to construct a DCGAN generator. Image from the DCGAN paper.

The DCGAN paper also presents other techniques and adjustments for training DCGANs, such as batch normalization and leaky ReLUs.
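For reference, a leaky ReLU is a one-liner; the lrelu helper used in the code later (defined in ops.py) is along these lines, though the exact definition there may differ slightly:

import tensorflow as tf

# A leaky ReLU: like a ReLU, but with a small slope `leak` for negative
# inputs instead of clamping them to zero.
def lrelu(x, leak=0.2, name='lrelu'):
    with tf.variable_scope(name):
        return tf.maximum(x, leak * x)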

Generating fake images with G(z)

Let’s pause and appreciate how powerful G(z) is. The DCGAN paper shows what a DCGAN can do when trained on a bedroom dataset: G(z) produces the following fake images of what the generator thinks bedrooms look like. None of the images below are in the original dataset.



You can also do vector arithmetic in the input space z. Below is a network trained to generate faces.



DCGAN face arithmetic, from the DCGAN paper.

[ML-Heavy] Training DCGANs

Now we’ve defined G(z) and seen how powerful it is. So how do we train it? We have many unknown variables (parameters) to find, and this is where the adversarial network comes in. First we define some notation. The (unknown) probability distribution of the data is denoted p_data. G(z), where z ∼ p_z, can then be interpreted as drawing samples from a probability distribution; let’s call this distribution p_g.

A probability distribution

The discriminator network D(x) takes an image x as input and returns the probability that image x was sampled from the data distribution p_data. In theory, the discriminator outputs a value close to 1 when the input image comes from p_data, and a value close to 0 when the input is a fake image, such as one sampled from p_g. In DCGANs, D(x) is a conventional convolutional neural network.



The discriminator convolutional neural network, image from the image inpainting paper.

The goals of training the discriminator are:

  • 1. Maximize D(x) for every image x from the real data distribution, x ∼ p_data.
  • 2. Minimize D(x) for every image x not from the real data distribution, x ≁ p_data.

The training goal of the generator G(z) is to produce samples that fool D. The generator’s output is an image that can be fed to the discriminator as input. So the generator wants to maximize D(G(z)), or equivalently minimize 1 − D(G(z)), because D outputs a probability between 0 and 1.

The adversarial network is trained with the following min-max objective:

min_G max_D  E_{x ∼ p_data}[log D(x)] + E_{z ∼ p_z}[log(1 − D(G(z)))]

The expectation in the first term is over the real data distribution, and the expectation in the second term is over samples z ∼ p_z, i.e. G(z) ∼ p_g.

You can train the networks by taking gradient steps of this expression with respect to the parameters of D and G. We know how to quickly evaluate each part of the expression: the expectations can be estimated with a minibatch of size m, and the inner maximization can be approximated with k gradient steps. It turns out that k = 1 works well for training.

We denote the parameters of the discriminator by θ_d and the parameters of the generator by θ_g. The gradients of the losses with respect to θ_d and θ_g can be computed with backpropagation, since D and G are both composed of standard neural network modules. Below is the training algorithm from the GAN paper. In theory, after training, p_g == p_data, so G(z) generates samples that follow the p_data distribution.



Training algorithm from the GAN paper.

Existing GAN and DCGAN implementations

On GitHub, there are many great GAN and DCGAN implementations:

  • goodfeli/adversarial: Theano GAN implementation by the GAN paper authors.
  • tqchen/mxnet-gan: unofficial MXNet GAN implementation.
  • Newmu/dcgan_code: Theano DCGAN implementation by the DCGAN paper authors.
  • soumith/dcgan.torch: Torch DCGAN implementation by Soumith Chintala, one of the DCGAN paper authors.
  • carpedm20/DCGAN-tensorflow: unofficial TensorFlow DCGAN implementation.
  • openai/improved-gan: code behind OpenAI’s first paper, built on top of carpedm20/DCGAN-tensorflow with many modifications.
  • mattya/chainer-DCGAN: unofficial Chainer DCGAN implementation.
  • jacobgil/keras-dcgan: unofficial (unfinished) Keras DCGAN implementation.

We will build our model on carpedm20/DCGAN-tensorflow.

[ML-Heavy] Build DCGANs on Tensorflow

The implementation in this section is in my bamos/dcgan-completion.tensorflow GitHub repository. I want to emphasize that this part of the code comes from Taehoon Kim’s carpedm20/DCGAN-tensorflow. I use it in my own repository so that we can use it for image completion in the next section.

Most of the implementation code is in the DCGAN Python class in model.py. Putting everything into a single class has many benefits: the intermediate state can be saved after training and loaded later for use.
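For reference, checkpointing in this generation of TensorFlow is built on tf.train.Saver; the variable and checkpoint path below are placeholders, and the DCGAN class wraps this pattern in its own save/load helpers:

import tensorflow as tf

# A minimal TF 1.x-style checkpointing sketch with tf.train.Saver.
# The variable and checkpoint path here are placeholders.
w = tf.Variable(tf.zeros([1]), name='w')
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training would happen here ...
    saver.save(sess, 'dcgan.ckpt')      # persist the trained variables

with tf.Session() as sess:
    saver.restore(sess, 'dcgan.ckpt')   # reload them later for use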

First we define the generator and discriminator architectures. The linear, conv2d_transpose, conv2d, and lrelu functions are defined in ops.py.

def generator(self, z):
    self.z_, self.h0_w, self.h0_b = linear(z, self.gf_dim*8*4*4, 'g_h0_lin', with_w=True)

    self.h0 = tf.reshape(self.z_, [-1, 4, 4, self.gf_dim * 8])
    h0 = tf.nn.relu(self.g_bn0(self.h0))

    self.h1, self.h1_w, self.h1_b = conv2d_transpose(h0,
        [self.batch_size, 8, 8, self.gf_dim*4], name='g_h1', with_w=True)
    h1 = tf.nn.relu(self.g_bn1(self.h1))

    h2, self.h2_w, self.h2_b = conv2d_transpose(h1,
        [self.batch_size, 16, 16, self.gf_dim*2], name='g_h2', with_w=True)
    h2 = tf.nn.relu(self.g_bn2(h2))

    h3, self.h3_w, self.h3_b = conv2d_transpose(h2,
        [self.batch_size, 32, 32, self.gf_dim*1], name='g_h3', with_w=True)
    h3 = tf.nn.relu(self.g_bn3(h3))

    h4, self.h4_w, self.h4_b = conv2d_transpose(h3,
        [self.batch_size, 64, 64, 3], name='g_h4', with_w=True)

    return tf.nn.tanh(h4)

def discriminator(self, image, reuse=False):
    if reuse:
        tf.get_variable_scope().reuse_variables()

    h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
    h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
    h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
    h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
    h4 = linear(tf.reshape(h3, [-1, 8192]), 1, 'd_h3_lin')

    return tf.nn.sigmoid(h4), h4

When we initialize this class, we use these two functions to build the model. We need two versions of the discriminator that share (reuse) parameters: one for the minibatch of images from the data distribution, and one for the minibatch of images produced by the generator.

self.G = self.generator(self.z)
self.D, self.D_logits = self.discriminator(self.images)
self.D_, self.D_logits_ = self.discriminator(self.G, reuse=True)

Next, we define the loss functions. Instead of using the min-max formulation directly, we use the cross entropy between D’s predictions and the labels we want them to have, because it is easier to work with. The discriminator should predict 1 for all “real” data and 0 for all “fake” data produced by the generator, and the generator wants the discriminator to predict 1 for the data it generates.

self.d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(self.D_logits,
                                            tf.ones_like(self.D)))
self.d_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(self.D_logits_,
                                            tf.zeros_like(self.D_)))
self.g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(self.D_logits_,
                                            tf.ones_like(self.D_)))
self.d_loss = self.d_loss_real + self.d_loss_fake

The variables for each model are aggregated so that they can be trained separately.

t_vars = tf.trainable_variables()

self.d_vars = [var for var in t_vars if 'd_' in var.name]
self.g_vars = [var for var in t_vars if 'g_' in var.name]

Now let’s optimize the parameters with ADAM, an adaptive optimization method for non-convex problems that is competitive with SGD and generally does not require hand-tuning of the learning rate, momentum, or other hyperparameters.

d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.d_loss, var_list=self.d_vars)
g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.g_loss, var_list=self.g_vars)

Now let’s iterate over the data. In each iteration, we sample a minibatch and use the optimizers to update the networks. Interestingly, if G is updated only once per iteration, the discriminator’s loss does not go to zero. Also, I think the final calls to d_loss_fake and d_loss_real perform some unnecessary computation, since those values are already computed as part of d_optim and g_optim. As an exercise in TensorFlow, you can try optimizing this part and send a PR to the original repo.

for epoch in xrange(config.epoch):
    ...
    for idx in xrange(0, batch_idxs):
        batch_images = ...

        batch_z = np.random.uniform(-1, 1, [config.batch_size, self.z_dim]) \
                    .astype(np.float32)

        # Update D network
        _, summary_str = self.sess.run([d_optim, self.d_sum],
            feed_dict={ self.images: batch_images, self.z: batch_z })


        # Update G network
        _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={ self.z: batch_z })


        # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
        _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={ self.z: batch_z })


        errD_fake = self.d_loss_fake.eval({self.z: batch_z})
        errD_real = self.d_loss_real.eval({self.images: batch_images})
        errG = self.g_loss.eval({self.z: batch_z})

Done! Of course, the full code has more comments; see model.py.

Run DCGAN on the photo gallery

If you skipped the previous section but want to run the code, it’s available in the bamos/dcgan-completion.tensorflow GitHub repository. Again, this code comes from Taehoon Kim’s carpedm20/DCGAN-tensorflow; we use my repository here because it makes the next step easier. Warning: if you don’t have a CUDA-enabled GPU, training this network will be very slow.

Please let me know if any of the following steps don’t work for you.

First, clone my bamos/dcgan-completion.tensorflow GitHub repository and OpenFace locally. We will use the Python-only part of OpenFace for image preprocessing. Don’t worry, you don’t need to install OpenFace’s Torch dependencies. Create a new working directory and clone the repositories below.

git clone https://github.com/cmusatyalab/openface.git
git clone https://github.com/bamos/dcgan-completion.tensorflow.git

Next, install OpenCV and dlib with Python 2 support. If you’re interested, try adding dlib support for Python 3. There are a few tricks to installing them; I wrote some notes in the OpenFace setup guide, including which versions I installed and how to install them. Then install OpenFace’s Python library so we can preprocess images. If you are not using a virtual environment, you will need sudo to run setup.py for a global installation. (If this part is too difficult for you, you can also use OpenFace’s Docker installation.)

Download a dataset of face images. It doesn’t matter whether the dataset has labels; we will delete them. An incomplete list: MS-Celeb-1M, CelebA, CASIA-WebFace, FaceScrub, LFW, and MegaFace. Put the images in dcgan-completion.tensorflow/data/your-dataset/raw to indicate that they are the dataset’s raw data.

Now we use OpenFace’s alignment tool to preprocess the images to 64x64.

./openface/util/align-dlib.py data/dcgan-completion.tensorflow/data/your-dataset/raw align innerEyesAndBottomLip data/dcgan-completion.tensorflow/data/your-dataset/aligned --size 64

Finally, we flatten the directory of processed images so that there are only images in the directory and no subfolders.

cd dcgan-completion.tensorflow/data/your-dataset/aligned
find . -name '*.png' -exec mv {} . \;
find . -type d -empty -delete
cd ../../..

Now we can train DCGAN. Install Tensorflow and start training.

./train-dcgan.py --dataset ./data/your-dataset/aligned --epoch 20

You can check what random samples from the generator look like in the samples folder. I trained on the CASIA-WebFace and FaceScrub datasets because I had them at hand. After 14 epochs of training, my samples looked like this.



Samples from DCGAN after 14 epochs of training on CASIA-WebFace and FaceScrub.

You can also view the TensorFlow graph and the loss functions on TensorBoard.

tensorboard --logdir ./logs



TensorBoard loss visualization, updated in real time during training.



TensorBoard visualization for DCGAN networks

Step 3: Find the best fake image for image completion

DCGAN is used for image completion

Now that we have the discriminator D(x) and the generator G(z), how can we use them for image completion? This section introduces the paper by Raymond Yeh, Chen Chen, et al., “Semantic Image Inpainting with Perceptual and Contextual Losses,” posted on arXiv on July 26, 2016.

To complete some image y, a reasonable-sounding but infeasible approach is to maximize D(y) over the missing pixels. The result would be neither from the data distribution (p_data) nor from the generative distribution (p_g). What we want instead is to project y onto the generative distribution.



(a): An ideal reconstruction of y onto the generative distribution (the blue surface). (b): A failed attempt to reconstruct y by maximizing D(y). Image from the inpainting paper.

[ML-Heavy] Loss function for projecting onto p_g

To define the projection precisely, we first introduce some notation for image completion. We use a binary mask M, which has only 0 and 1 values. A value of 1 means we want to keep that part of the image, and a value of 0 means we want to complete that part. Now we can define how to complete y given the binary mask M: multiply the elements of y by the elements of M. This element-wise product of two matrices is called the Hadamard product and is written M ⊙ y. M ⊙ y gives the original, known part of the image.



Binary mask legend

Next, suppose we have found some ẑ for which G(ẑ) gives a reasonable reconstruction of the missing parts. The completed pixels (1 − M) ⊙ G(ẑ) can then be added to the original pixels to give the reconstructed image:

x_reconstructed = M ⊙ y + (1 − M) ⊙ G(ẑ)
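A small numpy sketch of this masking arithmetic, with toy 4x4 grayscale “images” standing in for real ones and a random array standing in for G(ẑ):

import numpy as np

# Toy illustration of the masking arithmetic with 4x4 grayscale "images".
y = np.random.rand(4, 4)        # the image to complete
G_zhat = np.random.rand(4, 4)   # stand-in for the generator's output G(z_hat)

M = np.ones((4, 4))
M[1:3, 1:3] = 0.0               # 0 marks the missing center region

known_part = M * y                      # M ⊙ y: the pixels we keep
completed_part = (1 - M) * G_zhat       # (1 − M) ⊙ G(z_hat): filled-in pixels
reconstruction = known_part + completed_part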

Now all we need to do is find some G(ẑ) that does a good job of completing the image. To find ẑ, let’s revisit the contextual and perceptual information from the beginning of this post and pose them in the context of DCGANs. To do this, we define a loss function for an arbitrary z ∼ p_z. A smaller loss means z is a better candidate for completion.

Contextual loss: To keep the same context as the input image, we need G(z) to be as similar as possible to the known pixels of y. So G(z) must be penalized when its output does not match y at the known positions. To do this, we subtract the pixels of G(z) from the corresponding pixels of y and measure how much they differ:



L_contextual(z) = || M ⊙ G(z) − M ⊙ y ||₁

where || · ||₁ is the ℓ1 norm (the ℓ2 norm is another reasonable choice). Ideally, the known pixels of y and G(z) are identical: G(z)_i = y_i at every known pixel position i, so || M ⊙ G(z) − M ⊙ y ||₁ = 0 and L_contextual(z) = 0.

Perceptual loss: To reconstruct an image that looks real, we need to make sure the discriminator believes the image is real. To do this, we use the same criterion as when training the DCGAN:

L_perceptual(z) = log(1 − D(G(z)))

Finally, combining the contextual and perceptual losses, we can find ẑ:

L(z) = L_contextual(z) + λ · L_perceptual(z)
ẑ = arg min_z L(z)

λ is a hyperparameter that controls how important the contextual loss is relative to the perceptual loss. (I used the default λ = 0.1 and did not explore this value in depth.) Then, as described above, G(ẑ) is used to reconstruct the missing parts of y.



Poisson blending

[ML-Heavy] uses Tensorflow to complete DCGAN images

This section presents my modifications to Taehoon Kim’s carpedm20/DCGAN-tensorflow code for image completion. We add a TensorFlow placeholder for the binary mask M:

self.mask = tf.placeholder(tf.float32, [None] + self.image_shape, name='mask')

We can iteratively find arg min_z L(z) with gradient descent on ∇_z L(z). Once we define the loss function, TensorFlow’s automatic differentiation computes this gradient for us! So a complete DCGAN-based implementation can be obtained by adding just four lines of TensorFlow code to an existing DCGAN implementation. (Of course, some non-TensorFlow code is also needed to implement it.)

self.contextual_loss = tf.reduce_sum(
tf.contrib.layers.flatten(
    tf.abs(tf.mul(self.mask, self.G) - tf.mul(self.mask, self.images))), 1)
self.perceptual_loss = self.g_loss
self.complete_loss = self.contextual_loss + self.lam*self.perceptual_loss
self.grad_complete_loss = tf.gradients(self.complete_loss, self.z)

Next, we define the mask. I have only added one for the center of the image; feel free to add something else, like a random mask, and send a pull request.

if config.maskType == 'center':
    scale = 0.25
    assert(scale <= 0.5)
    mask = np.ones(self.image_shape)
    l = int(self.image_size*scale)
    u = int(self.image_size*(1.0-scale))
    mask[l:u, l:u, :] = 0.0

For gradient descent, we use projected gradient descent with minibatches and momentum, projecting z onto [-1, 1].

for idx in xrange(0, batch_idxs):
    batch_images = ...
    batch_mask = np.resize(mask, [self.batch_size] + self.image_shape)
    zhats = np.random.uniform(-1, 1, size=(self.batch_size, self.z_dim))

    v = 0
    for i in xrange(config.nIter):
        fd = {
            self.z: zhats,
            self.mask: batch_mask,
            self.images: batch_images,
        }
        run = [self.complete_loss, self.grad_complete_loss, self.G]
        loss, g, G_imgs = self.sess.run(run, feed_dict=fd)

        v_prev = np.copy(v)
        v = config.momentum*v - config.lr*g[0]
        zhats += -config.momentum * v_prev + (1+config.momentum)*v
        zhats = np.clip(zhats, -1, 1)

Completing the images

Choose some images to complete and put them in dcgan-completion.tensorflow/data/your-test-data/raw. Then align them as before into dcgan-completion.tensorflow/data/your-test-data/aligned. I pulled some random images from LFW; my DCGAN was not trained on images from LFW.

You can complete the image like this:

./complete.py ./data/your-test-data/aligned/* --outDir outputImages

This code generates the images and periodically writes them to the --outDir folder. You can use ImageMagick to generate a GIF:

cd outputImages
convert -delay 10 -loop 0 completed/*.png completion.gif



Final image completions. The center of each image is automatically generated. The source code can be downloaded here. These are random samples that I picked from LFW.

Conclusion

Thanks for reading, we made it! In this post, we covered a method for image completion:

1. Understand the image as a sample from a probability distribution.
2. Generate fake images.
3. Find the best fake image for completion.

My example was faces, but DCGANs can also be applied to other types of images. In general, GANs are difficult to train, and we don’t yet understand how to train them on certain classes of objects or on large images. But they are a promising model, and I’m excited to see what GANs will do for us!