This article is the third in my "AI painting" series. Click here to view the whole series. The source code for this article can be obtained by replying "Image Style Transfer" to the WeChat public account "01 binary".

Introduction

So-called image style transfer is a technique that fuses the content of a content image A with the style of a style image B, generating a new image C that has the content of A and the style of B. The technique is already widely used; for example, an app called "Great Painter" can automatically transform users' photos into images in the style of a famous artist.

Preparation

When I started writing this article, I planned to explain the underlying theory in detail, but I found there were too many formulas; even if I wrote them all out, few people would read them, and this article is aimed at newcomers who mainly want to get something working. Besides, there are already plenty of blog posts explaining the theory of style transfer. So instead of dwelling on theory, I will focus on how to implement a fast style transfer application with TensorFlow. If you just want to run the code, you can skip to the "Run" section.

First, install TensorFlow. If you have a GPU, see my earlier article on setting up a GPU environment: "AI Drawing First Shot — Using the GPU to Speed Up Your Training Process". If you want to run TensorFlow on the CPU, just execute the following line:

pip install numpy tensorflow scipy 
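To check that the installation works, you can print the version. Note that the code in this article uses the TensorFlow 1.x API (sessions, tf.Variable); on TensorFlow 2.x you would need the tf.compat.v1 compatibility layer.

import tensorflow as tf
print(tf.__version__)  # the code below targets the 1.x session-based API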

Principle

This article is based on the paper A Neural Algorithm of Artistic Style, which describes a neural-network algorithm for artistic style transfer.

To fuse the style of the style image with the content of the content image, the generated image should be as close as possible to the content image in content and to the style image in style. We therefore define a content loss function and a style loss function, and take their weighted sum as the total loss function.

The pre-trained model

A CNN can abstract and understand images, so the outputs of its convolutional layers can be taken as a representation of an image's content. Here we use a pre-trained VGG19 model for transfer learning. A convolutional network is generally understood to extract features step by step, from simple features in the early layers to complex features in the deep ones. A trained model has already learned how to extract image features, and since this model was trained on the ImageNet dataset, it can in principle be used directly to extract features from other images. The results may not be as good as a model trained on the target data itself, but it saves a great deal of training time, which is very useful in many situations.

Loading the pre-trained model

def vggnet(self):
    # Read the pre-trained VGG model
    # (assumes: import numpy as np, import tensorflow as tf, import scipy.io, and the project's settings module)
    vgg = scipy.io.loadmat(settings.VGG_MODEL_PATH)
    vgg_layers = vgg['layers'][0]
    net = {}
    # Build the convolutional and pooling layers of the VGG network from the pre-trained parameters
    # The fully connected layers are not needed
    # Note that all parameters except the input are constants
    # Unlike usual training, we do not train the VGG parameters; they stay fixed
    # What we train is the input, which is the final generated image
    net['input'] = tf.Variable(np.zeros([1, settings.IMAGE_HEIGHT, settings.IMAGE_WIDTH, 3]), dtype=tf.float32)
    # The layer index of each parameter set can be found in the VGG model diagram
    net['conv1_1'] = self.conv_relu(net['input'], self.get_wb(vgg_layers, 0))
    net['conv1_2'] = self.conv_relu(net['conv1_1'], self.get_wb(vgg_layers, 2))
    net['pool1'] = self.pool(net['conv1_2'])
    net['conv2_1'] = self.conv_relu(net['pool1'], self.get_wb(vgg_layers, 5))
    net['conv2_2'] = self.conv_relu(net['conv2_1'], self.get_wb(vgg_layers, 7))
    net['pool2'] = self.pool(net['conv2_2'])
    net['conv3_1'] = self.conv_relu(net['pool2'], self.get_wb(vgg_layers, 10))
    net['conv3_2'] = self.conv_relu(net['conv3_1'], self.get_wb(vgg_layers, 12))
    net['conv3_3'] = self.conv_relu(net['conv3_2'], self.get_wb(vgg_layers, 14))
    net['conv3_4'] = self.conv_relu(net['conv3_3'], self.get_wb(vgg_layers, 16))
    net['pool3'] = self.pool(net['conv3_4'])
    net['conv4_1'] = self.conv_relu(net['pool3'], self.get_wb(vgg_layers, 19))
    net['conv4_2'] = self.conv_relu(net['conv4_1'], self.get_wb(vgg_layers, 21))
    net['conv4_3'] = self.conv_relu(net['conv4_2'], self.get_wb(vgg_layers, 23))
    net['conv4_4'] = self.conv_relu(net['conv4_3'], self.get_wb(vgg_layers, 25))
    net['pool4'] = self.pool(net['conv4_4'])
    net['conv5_1'] = self.conv_relu(net['pool4'], self.get_wb(vgg_layers, 28))
    net['conv5_2'] = self.conv_relu(net['conv5_1'], self.get_wb(vgg_layers, 30))
    net['conv5_3'] = self.conv_relu(net['conv5_2'], self.get_wb(vgg_layers, 32))
    net['conv5_4'] = self.conv_relu(net['conv5_3'], self.get_wb(vgg_layers, 34))
    net['pool5'] = self.pool(net['conv5_4'])
    return net
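The vggnet method relies on three helpers — conv_relu, pool, and get_wb — that are not shown above. Here is a minimal sketch of what they might look like, assuming the standard layout of the imagenet-vgg-verydeep-19.mat file (the exact indexing into the loaded .mat structure may differ between releases of that file):

def conv_relu(self, input, wb):
    # Convolve with the pre-trained weights, add the bias, then apply ReLU
    conv = tf.nn.conv2d(input, wb[0], strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + wb[1])

def pool(self, input):
    # 2x2 max pooling, halving the spatial resolution
    return tf.nn.max_pool(input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def get_wb(self, layers, i):
    # Extract the weights and bias of layer i as constants, since they are never trained
    w = tf.constant(layers[i][0][0][0][0][0])
    bias = layers[i][0][0][0][0][1]
    b = tf.constant(np.reshape(bias, (bias.size)))
    return w, b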

The training approach

We use the outputs of several VGG layers to represent the content and style features of an image. For example, I use ['conv4_2', 'conv5_2'] for the content features and ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1'] for the style features. This is configured in settings.py:

# A list of VGG layer names and corresponding weights used to compute the content loss
CONTENT_LOSS_LAYERS = [('conv4_2', 0.5), ('conv5_2', 0.5)]
# A list of VGG layer names and corresponding weights used to compute the style loss
STYLE_LOSS_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2), ('conv4_1', 0.2), ('conv5_1', 0.2)]

Content loss function

The content loss of a single layer is defined as

$$L_{content} = \frac{1}{2MN}\sum_{i,j}\left(X_{ij} - P_{ij}\right)^2$$

where X is the feature matrix of the noise image and P is the feature matrix of the content image; M is the length times the width of P, and N is the number of channels. The final content loss is the weighted sum of the per-layer content losses, averaged over the number of layers.

I know many people get a headache when they see a mathematical formula, but the simple way to understand it is that this formula lets the model keep pulling the generated image's content toward the content image during training.
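To make this concrete, here is a minimal sketch of how the per-layer content loss could be computed against the network built above (content_loss is an illustrative standalone function; net is the dictionary returned by vggnet and content_img is the preprocessed content image — both names are assumptions about the surrounding project):

def content_loss(sess, net, content_img):
    # Run the content image through the network and capture its features as fixed values
    sess.run(tf.assign(net['input'], content_img))
    loss = 0.0
    for layer_name, weight in settings.CONTENT_LOSS_LAYERS:
        p = sess.run(net[layer_name])   # fixed features of the content image
        x = net[layer_name]             # features of the trainable generated image
        M = p.shape[1] * p.shape[2]     # length * width of the feature map
        N = p.shape[3]                  # number of channels
        loss += (1.0 / (2 * M * N)) * tf.reduce_sum(tf.pow(p - x, 2)) * weight
    return loss / len(settings.CONTENT_LOSS_LAYERS)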

Style loss function

To calculate the style loss, we use the Gram matrix of the style image's feature matrix at each specified layer to measure its style: for a feature matrix F reshaped to M rows and N columns, the Gram matrix is G = FᵀF. The style loss is then defined from the squared difference between the Gram matrix of the style image's features and that of the noise image's features.

For the style loss of each layer, we have

$$L_{style} = \frac{1}{4M^2N^2}\sum_{i,j}\left(G_{ij} - A_{ij}\right)^2$$

where M is the length times the width of the feature matrix and N is its number of channels; G is the Gram matrix of the noise image's features and A is the Gram matrix of the style image's features. The final style loss is the weighted sum of the per-layer style losses, averaged over the number of layers.

Again, it doesn't matter if you can't follow the formula; just think of it as something that captures the image's style during training.
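Correspondingly, here is a minimal sketch of the style loss under the same assumptions as the content loss sketch (gram and style_loss are illustrative names):

def gram(x, size, depth):
    # Flatten the feature map to shape (M, N) and compute the Gram matrix F^T F
    x = tf.reshape(x, (size, depth))
    return tf.matmul(tf.transpose(x), x)

def style_loss(sess, net, style_img):
    # Run the style image through the network and capture its features as fixed values
    sess.run(tf.assign(net['input'], style_img))
    loss = 0.0
    for layer_name, weight in settings.STYLE_LOSS_LAYERS:
        a = sess.run(net[layer_name])   # fixed features of the style image
        x = net[layer_name]             # features of the trainable generated image
        M = a.shape[1] * a.shape[2]
        N = a.shape[3]
        A = gram(a, M, N)               # Gram matrix of the style features
        G = gram(x, M, N)               # Gram matrix of the generated features
        loss += (1.0 / (4 * M ** 2 * N ** 2)) * tf.reduce_sum(tf.pow(G - A, 2)) * weight
    return loss / len(settings.STYLE_LOSS_LAYERS)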

Computing the total loss and training the model

Finally, we just need to combine the content loss function and the style loss function into the total loss. All that remains is to control the content and style weights:

# Content loss weight
ALPHA = 1
# Style loss weight
BETA = 500

The larger ALPHA is, the more content information the result preserves; similarly, the larger BETA is, the more stylized the resulting image.
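In code, the total loss is then just the weighted sum of the two losses sketched above:

loss = settings.ALPHA * content_loss(sess, net, content_img) + settings.BETA * style_loss(sess, net, style_img)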

When training starts, we generate a noise image from the content image plus random noise, feed it to the network, compute the loss, and adjust the noise image according to the loss. The adjusted image is fed to the network again, the loss is recomputed, the image is adjusted again, and so on, until the specified number of iterations is reached. At that point the noise image has both the content of the content image and the style of the style image, and can be saved.
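A minimal sketch of that loop, under the same assumptions as the earlier sketches (gen_noise_image and the step count are illustrative; the real project keeps such values in settings.py):

def gen_noise_image(content_img, noise_ratio=0.5):
    # Start from the content image blended with uniform random noise
    noise = np.random.uniform(-20, 20, content_img.shape).astype('float32')
    return noise * noise_ratio + content_img * (1 - noise_ratio)

with tf.Session() as sess:
    net = model.vggnet()  # the VGG graph built earlier
    # The ALPHA/BETA-weighted sum shown above
    cost = settings.ALPHA * content_loss(sess, net, content_img) + settings.BETA * style_loss(sess, net, style_img)
    optimizer = tf.train.AdamOptimizer(1.0).minimize(cost)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.assign(net['input'], gen_noise_image(content_img)))
    for step in range(1000):
        sess.run(optimizer)  # each step adjusts the input image, never the VGG weights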

Run

If you are not interested in the theory above and just want to run the code, you can reply "Image Style Transfer" to the WeChat public account "01 binary" to get the source. Here is how to use the code to produce your own painting.

First look at the project structure:

The output directory holds the files generated during training; the .mat file is the pre-trained model; models.py is the file we wrote to read the pre-trained model and build the network; settings.py is the configuration file; and train.py is the training entry point.

To run the project, just execute python train.py. To change the style or content, simply replace the corresponding images in the images directory, or change the paths in settings.py:

# Content image path
CONTENT_IMAGE = 'images/content.jpg'
# Style image path
STYLE_IMAGE = 'images/style.jpg'
# Output image path
OUTPUT_IMAGE = 'output/output'
# Pre-trained VGG model path
VGG_MODEL_PATH = 'imagenet-vgg-verydeep-19.mat'

Let's take a look at the generated picture after training:

Final thoughts

Although the code above does achieve image style transfer, it has one big drawback: the trained result cannot be saved as a model, so every change of style or content requires training from scratch. On a CPU, 1000 iterations take about 30 minutes, so using a GPU for training is strongly recommended.

However, even with a GPU, the training time is too long for commercial use. So is there a way to save a trained style model and quickly generate target images from it directly? There is. Fei-Fei Li's group at Stanford published a paper, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, which replaces the per-pixel loss with a perceptual loss computed from a pre-trained VGG model, simplifying the original loss calculation, and adds a transform network that directly generates the styled version of a content image. I won't say more here; those who are interested can refer to the following two links:

  1. Fun with Deep Learning | 30: Fast Image Style Transfer
  2. Principles behind Style Transfer and Its TensorFlow Implementation

That's all for this article. I found it very interesting to build. My ability is limited, so if there are any mistakes in the article, please point them out. Thank you very much!

References

  1. A Neural Algorithm of Artistic Style
  2. Deep Learning in Practice (I): Understanding and Implementing Style Transfer Quickly
  3. Fun with Deep Learning | 04: Image Style Transfer
  4. Study Notes: Image Style Transfer

Welcome to follow my WeChat public account "01 binary"; after following, you can get the computer science materials I have carefully collected.