To see the illustrated content of the article, go to studyai.com/pytorch-1.4…

This tutorial explains how to implement the Neural-Style algorithm developed by Leon A. Gatys. Neural-Style, or Neural-Transfer, allows you to take an image and reproduce it with a new artistic style. The algorithm takes three images, an input image, a content-image, and a style-image, and changes the input so that it resembles the content of the content-image and the artistic style of the style-image.

The underlying principle

The principle is simple: we define two distances, one for the content ($D_C$) and one for the style ($D_S$). $D_C$ measures how different the content is between two images, while $D_S$ measures how different the style is between two images. We then take a third image, the input, and transform it to minimize both its content distance from the content image and its style distance from the style image. Now we can import the necessary packages and begin the neural transfer.

Import packages and select devices

The packages listed below are all used to implement neural Transfer.

torch, torch.nn, numpy (indispensable packages for neural networks with PyTorch)
torch.optim (efficient gradient descent optimization)
PIL, PIL.Image, matplotlib.pyplot (packages for loading and displaying images)
torchvision.transforms (transform PIL images into tensors)
torchvision.models (train or load pre-trained models)
copy (to deep copy the models; system package)

from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
import matplotlib.pyplot as plt

import torchvision.transforms as transforms
import torchvision.models as models

import copy

Next, we need to choose which device to run the network on and import the content and style images. Running the neural transfer algorithm on large images takes longer, and it runs much faster on a GPU. We can use torch.cuda.is_available() to check whether a GPU is available. Next, we set the torch.device to be used throughout the tutorial. In addition, the .to(device) method is used to move a tensor or module to the desired device.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
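As a minimal sketch of how .to(device) behaves, the snippet below moves a small tensor and a small module to the selected device. The tensor `x` and the layer `layer` are made up for illustration and are not part of the tutorial.

```python
import torch
import torch.nn as nn

# Same device selection as above: use the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative objects (not part of the tutorial).
x = torch.rand(1, 3, 4, 4)
layer = nn.Conv2d(3, 2, kernel_size=3, padding=1)

# For a tensor, .to(device) returns a tensor on the target device
# (the same tensor if it is already there). For a module, it moves
# the parameters and returns the module itself.
x = x.to(device)
layer = layer.to(device)

y = layer(x)
print(y.device.type == device.type)
```

Both the input and the module must live on the same device before the forward pass, which is why the tutorial calls .to(device) on the images and on the network.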

Load the image

Now we will import the style and content images. The original PIL images have values between 0 and 255, but when converted to torch tensors their values are scaled to lie between 0 and 1. The images also need to be resized to have the same dimensions. An important detail to note is that neural networks from the torch library are trained with tensor values ranging from 0 to 1. If you try to feed the network tensor images with values from 0 to 255, the activated feature maps will be unable to sense the intended content and style. However, pre-trained networks from the Caffe library are trained with tensor images with values from 0 to 255.
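To make the two value ranges concrete, here is a small sketch that rescales 8-bit pixel values by hand, which is the same kind of rescaling that transforms.ToTensor applies to an 8-bit PIL image. The pixel values are invented for illustration.

```python
import torch

# A made-up 2x2 RGB "image" with 8-bit values,
# laid out as height x width x channels.
pixels = torch.tensor([[[0, 128, 255], [255, 255, 255]],
                       [[64, 64, 64], [0, 0, 0]]], dtype=torch.uint8)

# Reorder to channels x height x width and rescale from [0, 255] to [0, 1].
scaled = pixels.permute(2, 0, 1).float().div(255)

print(scaled.shape)  # torch.Size([3, 2, 2])
print(scaled.min().item(), scaled.max().item())  # 0.0 1.0
```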

Note

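Here are the download addresses for the two images used in this tutorial: picasso.jpg and dancing.jpg. Download the two images and place them in a folder named images in your current working directory. Note that the loading code below reads them from ./data/images/neural-style/, so either use that folder or adjust the two paths.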

# desired size of the output image
imsize = 512 if torch.cuda.is_available() else 128  # If you don't have a GPU, make it smaller

loader = transforms.Compose([
    transforms.Resize(imsize),  # Zoom the imported image
    transforms.ToTensor()])  # Convert it to Torch Tensor


def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit the network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)


style_img = image_loader("./data/images/neural-style/picasso.jpg")
content_img = image_loader("./data/images/neural-style/dancing.jpg")

assert style_img.size() == content_img.size(), \
    "we need to import style and content images of the same size"
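The loader above adds a fake batch dimension with unsqueeze(0), because the network expects input of shape [batch, channels, height, width]. A minimal sketch of that step, using a random tensor as a stand-in for a loaded image:

```python
import torch

# Stand-in for one loaded image: channels x height x width, values in [0, 1].
image = torch.rand(3, 128, 128)

batched = image.unsqueeze(0)   # add a batch dimension of size 1 at the front
restored = batched.squeeze(0)  # remove it again, e.g. before displaying

print(batched.shape)   # torch.Size([1, 3, 128, 128])
print(restored.shape)  # torch.Size([3, 128, 128])
```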

Now, let’s create a function that displays the image by converting a copy of the image to PIL format and displaying the copy using plt.imshow. We will try to display the content image and the style image to ensure that they are imported correctly.

unloader = transforms.ToPILImage()  # Convert to PIL Image again

plt.ion()

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


plt.figure()
imshow(style_img, title='Style Image')

plt.figure()
imshow(content_img, title='Content Image')

Loss function

Content Loss

Content loss is a function that represents a weighted version of the content distance for an individual layer. The function takes the feature maps $F_{XL}$ of a layer $L$ in a network processing input $X$ and returns the weighted content distance $w_{CL} \cdot D_C^L(X, C)$ between the image $X$ and the content image $C$. The feature maps of the content image, $F_{CL}$, must be known by the function in order to calculate the content distance. We implement this function as a torch module with a constructor that takes $F_{CL}$ as input. The distance $\|F_{XL} - F_{CL}\|^2$ is the mean square error between the two sets of feature maps, and can be computed using nn.MSELoss.
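As a quick check of what that mean square error computes, the sketch below compares a hand-written version with PyTorch's built-in functions. The two random tensors are stand-ins for the feature maps of the input and content images, not real network activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
f_x = torch.rand(1, 4, 8, 8)  # stand-in for the input image's feature maps
f_c = torch.rand(1, 4, 8, 8)  # stand-in for the content image's feature maps

# Mean of the squared element-wise differences.
manual = ((f_x - f_c) ** 2).mean()

# The module form and the functional form compute the same value.
module_form = nn.MSELoss()(f_x, f_c)
functional_form = F.mse_loss(f_x, f_c)

print(torch.allclose(manual, module_form))
print(torch.allclose(manual, functional_form))
```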

We will add this content loss module directly after the convolution layer(s) used to compute the content distance. This way, each time the network is fed an input image, the content losses are computed at the desired layers, and thanks to autograd all the gradients are computed. Now, in order to make the content loss layer transparent, we must define a forward method that computes the content loss and then returns the layer's input. The computed loss is saved as an attribute of the module (self.loss).

class ContentLoss(nn.Module):

    def __init__(self, target):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

Note

Important detail: although this module is named ContentLoss, it is not a true PyTorch loss function. If you want to define your content loss as a PyTorch loss function, you have to create a PyTorch autograd function and compute/implement the gradient manually in its backward method.
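For readers curious what that would involve, the following is a rough sketch of a content distance written as a torch.autograd.Function with a hand-written backward. The class name ContentDistance is hypothetical, and the tutorial itself does not need it: autograd already derives the same gradient from the module above, as the comparison at the end suggests.

```python
import torch
import torch.nn.functional as F


class ContentDistance(torch.autograd.Function):
    """Hypothetical hand-written mean squared error with a manual gradient."""

    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        return (input - target).pow(2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        # d/d(input) of mean((input - target)^2) = 2 * (input - target) / numel
        grad_input = grad_output * 2.0 * (input - target) / input.numel()
        return grad_input, None  # no gradient for the fixed target


torch.manual_seed(0)
target = torch.rand(1, 2, 3, 3)
x_manual = torch.rand(1, 2, 3, 3, requires_grad=True)
x_auto = x_manual.detach().clone().requires_grad_()

# Gradient from the hand-written backward ...
ContentDistance.apply(x_manual, target).backward()
# ... and the gradient autograd derives on its own.
F.mse_loss(x_auto, target).backward()

print(torch.allclose(x_manual.grad, x_auto.grad))
```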

Style Loss

The style loss module is implemented similarly to the content loss module. It acts as a transparent layer in the network that computes the style loss of that layer. In order to calculate the style loss, we need to compute the Gram matrix $G_{XL}$. A Gram matrix is the result of multiplying a given matrix by its transpose. In this application the given matrix is a reshaped version of the feature maps $F_{XL}$ of a layer $L$. $F_{XL}$ is reshaped to form $\hat{F}_{XL}$, a $K \times N$ matrix, where $K$ is the number of feature maps at layer $L$ and $N$ is the length of any vectorized feature map $F_{XL}^k$. For example, the first row of $\hat{F}_{XL}$ corresponds to the first vectorized feature map $F_{XL}^1$.
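A tiny worked example may help here. The sketch below reshapes two made-up 2x2 feature maps into a 2x4 matrix and multiplies it by its transpose, so each entry of the result is the dot product of two vectorized feature maps. The numbers are invented for illustration.

```python
import torch

# Two made-up 2x2 feature maps (K = 2), batch size 1.
feature_maps = torch.tensor([[[[1., 2.], [3., 4.]],
                              [[0., 1.], [0., 1.]]]])
a, b, c, d = feature_maps.size()

f_hat = feature_maps.view(a * b, c * d)  # K x N matrix, here 2 x 4
gram = torch.mm(f_hat, f_hat.t())        # K x K matrix of dot products

print(f_hat)  # rows: [1, 2, 3, 4] and [0, 1, 0, 1]
print(gram)   # [[30, 6], [6, 2]]
```

The result is symmetric and its size depends only on the number of feature maps, not on the spatial size of the image.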

Finally, the Gram matrix must be normalized by dividing each element by the total number of elements in the matrix. This normalization counteracts the fact that $\hat{F}_{XL}$ matrices with a large $N$ dimension yield larger values in the Gram matrix. These larger values would cause the first layers (before the pooling layers) to have a larger impact during the gradient descent. Style features tend to be in the deeper layers of the network, so this normalization step is crucial.
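To illustrate why the normalization matters, the sketch below computes the Gram matrix of constant feature maps at two spatial sizes, with and without the division. The helper `gram` and its `normalize` flag are hypothetical and exist only for this comparison.

```python
import torch


def gram(features, normalize):
    """Gram matrix of a [batch, K, H, W] tensor; optionally normalized."""
    a, b, c, d = features.size()
    f_hat = features.view(a * b, c * d)
    g = torch.mm(f_hat, f_hat.t())
    return g.div(a * b * c * d) if normalize else g


small = torch.ones(1, 2, 4, 4)    # N = 16 values per feature map
large = torch.ones(1, 2, 16, 16)  # N = 256 values per feature map

# Without normalization the entries grow with N.
print(gram(small, False)[0, 0].item(), gram(large, False)[0, 0].item())  # 16.0 256.0
# With normalization they no longer depend on N.
print(gram(small, True)[0, 0].item(), gram(large, True)[0, 0].item())    # 0.5 0.5
```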

def gram_matrix(input):
    a, b, c, d = input.size()  # a=batch size(=1)
    # b=number of feature maps
    # (c,d)=dimensions of a f. map (N=c*d)

    features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

    G = torch.mm(features, features.t())  # compute the gram product

    # we 'normalize' the values of the gram matrix
    # by dividing by the total number of elements in the feature maps.
    return G.div(a * b * c * d)

The style loss module now looks almost exactly like the content loss module. The style distance is computed using the mean square error between $G_{XL}$ and $G_{SL}$.

class StyleLoss(nn.Module):

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input

Importing the model

Now we need to introduce a pre-trained neural network. We will use a 19-layer VGG network like the one used in the paper.

PyTorch's implementation of VGG is a module divided into two child Sequential modules: features (containing the convolution and pooling layers) and classifier (containing the fully connected layers). We will use the features module because we need the output of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode using .eval().

cnn = models.vgg19(pretrained=True).features.to(device).eval()
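To show what "behaving differently" can mean, here is a minimal sketch using nn.Dropout, chosen purely as a familiar example of a mode-dependent layer. It is an illustration only and is not taken from the network used in this tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

layer.train()
train_out = layer(x)  # some entries zeroed, the rest scaled by 1 / (1 - p)

layer.eval()
eval_out = layer(x)   # identity in evaluation mode

print(train_out)
print(torch.equal(eval_out, x))
```

Calling .eval() on a whole network switches every child module to evaluation mode in the same way.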

In addition, VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. We will use them to normalize the image before sending it into the network.

cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

# create a module to normalize the input image so that we can easily put it
# in a nn.Sequential
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std to make them [C x 1 x 1] so that they can
        # directly work with image Tensor of shape [B x C x H x W].
        # B is batch size. C is number of channels. H is height and W is width.
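        # as_tensor accepts lists or tensors and avoids the copy warning
        # that torch.tensor raises when given an existing tensor
        self.mean = torch.as_tensor(mean).view(-1, 1, 1)
        self.std = torch.as_tensor(std).view(-1, 1, 1)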

    def forward(self, img):
        # normalize img
        return (img - self.mean) / self.std
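The reshaping to [C x 1 x 1] in the module above relies on broadcasting. As a small sketch of that step in isolation, the snippet below normalizes a made-up batch in which every pixel equals 0.5, so each channel ends up with a single distinct value.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1)  # shape [3, 1, 1]
std = torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1)   # shape [3, 1, 1]

img = torch.full((1, 3, 2, 2), 0.5)  # made-up batch: every pixel is 0.5
normalized = (img - mean) / std      # broadcasts over batch, height and width

print(normalized.shape)        # torch.Size([1, 3, 2, 2])
print(normalized[0, :, 0, 0])  # one value per channel
```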

A Sequential module contains an ordered list of child modules. For instance, vgg19.features contains a sequence (Conv2d, ReLU, MaxPool2d, Conv2d, ReLU…) aligned in the right order of depth. We need to add our content loss and style loss layers immediately after the convolution layer they are detecting. To do this, we must create a new Sequential module that has content loss and style loss modules correctly inserted.

# desired depth layers to compute style/content losses:
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    # normalization module
    normalization = Normalization(normalization_mean, normalization_std).to(device)

    # just in order to have an iterable access to or list of content/style
    # losses
    content_losses = []
    style_losses = []

    # assuming that cnn is a nn.Sequential, so we make a new nn.Sequential
    # to put in modules that are supposed to be activated sequentially
    model = nn.Sequential(normalization)

    i = 0  # increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # The in-place version doesn't play very nicely with the ContentLoss
            # and StyleLoss we insert below. So we replace with out-of-place
            # ones here.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)

    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break

    model = model[:(i + 1)]

    return model, style_losses, content_losses
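The same rebuild-insert-trim pattern can be tried on a toy network without downloading VGG. In this sketch, `toy` is a made-up stand-in for the pre-trained feature extractor and `Recorder` is a hypothetical transparent layer that plays the role of the loss modules.

```python
import torch
import torch.nn as nn


class Recorder(nn.Module):
    """Hypothetical transparent layer: stores the shape it sees, returns the input."""

    def forward(self, x):
        self.seen_shape = tuple(x.shape)
        return x


# Toy stand-in for a pre-trained feature extractor.
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(4, 8, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
)

model = nn.Sequential()
i = 0  # incremented every time we see a conv
for layer in toy.children():
    if isinstance(layer, nn.Conv2d):
        i += 1
        name = 'conv_{}'.format(i)
    elif isinstance(layer, nn.ReLU):
        name = 'relu_{}'.format(i)
        layer = nn.ReLU(inplace=False)
    else:
        name = 'pool_{}'.format(i)
    model.add_module(name, layer)
    if name == 'conv_2':
        recorder = Recorder()
        model.add_module('recorder_{}'.format(i), recorder)

# Trim everything after the last inserted Recorder.
for j in range(len(model) - 1, -1, -1):
    if isinstance(model[j], Recorder):
        break
model = model[:j + 1]

model(torch.rand(1, 3, 8, 8))
names = [name for name, _ in model.named_children()]
print(names)                # ['conv_1', 'relu_1', 'conv_2', 'recorder_2']
print(recorder.seen_shape)  # (1, 8, 8, 8)
```

Trimming keeps the forward pass as short as possible, since layers after the last inserted module cannot affect any recorded value.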

Next we select the input image. You can use a copy of the content image or a white noise image as the input image.

input_img = content_img.clone()
# If you want to use white noise, uncomment the following line:
# input_img = torch.randn(content_img.data.size(), device=device)

# Add the original input image to the figure:
plt.figure()
imshow(input_img, title='Input Image')

Gradient descent

The authors of the algorithm suggest using the L-BFGS algorithm to run the gradient descent. Unlike training a network, we want to train the input image in order to minimize the content/style losses. We will create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass our image to it as the tensor to optimize.

def get_input_optimizer(input_img):
    # this line to show that input is a parameter that requires a gradient
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer

Finally, we must define a function that performs the neural transfer. For each iteration of the network, it is fed an updated input and computes new losses. We call backward on the summed loss so that the gradients of every loss module are computed dynamically. The optimizer requires a “closure” function, which re-evaluates the model and returns the loss.
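The closure pattern is easier to see on a toy problem. The sketch below uses optim.LBFGS to optimize a tensor directly, rather than network weights, so that it matches a made-up target. The quadratic loss and the values are assumptions for illustration only.

```python
import torch
import torch.optim as optim

target = torch.tensor([0.2, 0.8, 0.5])  # made-up values to recover
x = torch.zeros(3, requires_grad=True)  # the tensor being optimized
optimizer = optim.LBFGS([x])


def closure():
    # L-BFGS may evaluate the loss several times per step, so the
    # loss and its gradient are recomputed inside this function.
    optimizer.zero_grad()
    loss = ((x - target) ** 2).sum()
    loss.backward()
    return loss


for _ in range(5):
    optimizer.step(closure)

print(torch.allclose(x.detach(), target, atol=1e-4))
```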

We still have one final constraint to address. The optimizer may push the input to values outside the 0 to 1 range of a tensor image. We can address this by clamping the input values to between 0 and 1 each time the network is run.
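A minimal sketch of such a clamp on a tensor that requires gradients is shown below. It wraps the in-place clamp in torch.no_grad(), which is an alternative to going through the tensor's .data attribute; the values are made up.

```python
import torch

x = torch.tensor([-0.3, 0.2, 1.4], requires_grad=True)

# Clamp in place without recording the operation in the autograd graph.
with torch.no_grad():
    x.clamp_(0, 1)

print(x)  # values are now 0.0, 0.2 and 1.0
```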

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model.. ')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)

    print('Optimizing.. ')
    run = [0]
    while run[0] <= num_steps:

        def closure():
            # correct the values of updated input image
            input_img.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0

            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss

            style_score *= style_weight
            content_score *= content_weight

            loss = style_score + content_score
            loss.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run[0]))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # a last correction...
    input_img.data.clamp_(0, 1)

    return input_img

Finally, we can run the algorithm.

output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()