Structure of an autoencoder

An autoencoder is a kind of neural network model. It consists of two parts: an encoder and a decoder.

The encoder encodes the input data, mapping it to a lower-dimensional latent space and producing the encoded data.

The decoder restores (decodes) the encoded data in the latent space back into "input data". The quotation marks are used because the restored "input data" has some loss compared to the original input data, so it is not the true input data.

The following figure shows the structure of an autoencoder:
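The data flow can also be sketched in a few lines of PyTorch (a minimal sketch; the layer sizes here are purely illustrative):

```python
import torch
import torch.nn as nn

# Minimal sketch: single-layer encoder and decoder (sizes are illustrative)
encoder = nn.Linear(784, 8)    # maps input data to a low-dimensional latent space
decoder = nn.Linear(8, 784)    # maps a latent code back to the input dimension

x = torch.randn(1, 784)        # input data
z = encoder(x)                 # encoded data (latent representation)
x_prime = decoder(z)           # lossy reconstruction of the input
```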

Applications of autoencoders

Data dimensionality reduction / feature extraction

From the structure of the autoencoder, it is easy to imagine this use. In the training stage, the input $X$ is mapped by the encoder to a low-dimensional code $z$; the decoder then restores from $z$ an output $X'$ with the same dimension and similar content, and backpropagation updates the network weights to minimize the loss between the input $X$ and the output $X'$.

Here $z$ is the dimensionality-reduced data: if the input $X$ can be restored from $z$, then $z$ must have captured most of the features of $X$.

At the same time, $z$ is a highly compressed (dimensionality-reduced) representation of the input data, so the autoencoder can also be regarded as a feature extractor, with $z$ as the extracted feature representation.
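For example, once training is done, the decoder can be discarded and the encoder used on its own (a sketch; the `nn.Linear` below stands in for a trained encoder, with illustrative dimensions):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(784, 8)    # stand-in for a trained encoder
x = torch.randn(100, 784)      # 100 samples of 784-dimensional data

with torch.no_grad():          # inference only, no gradient tracking needed
    z = encoder(x)             # z: dimensionality-reduced features of x
print(z.shape)                 # torch.Size([100, 8])
```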

Image denoising

Image denoising means removing noise from an image. For example, in the figure below, the left half is noise-free while the right half is noisy. Our goal is to train an autoencoder whose input is a noisy image and whose output is that image with the noise removed.

The training method is also very simple. Prepare some noise-free images $X$, then manually add noise to them to obtain noisy images $X_{noise}$. Feed $X_{noise}$ through the encoder and then the decoder to get the output $X'$, and update the weights by backpropagation to minimize the loss between $X'$ and the clean $X$. In this way the output $X'$ will be close to the noise-free $X$, which achieves the denoising.

In application, simply feed a noisy image into the trained autoencoder, and the output is the image with the noise removed. A sketch of one such training step is shown below.
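This sketch reuses the names `model`, `criterion`, and `optimizer` from the full training loop later in this article; the noise scale 0.3 is an illustrative choice:

```python
import torch

# One denoising training step (sketch; x is a batch of clean, flattened images)
x_noise = x + 0.3 * torch.randn_like(x)   # manually corrupt the clean images
x_noise = x_noise.clamp(0.0, 1.0)         # keep pixel values in [0, 1]

optimizer.zero_grad()
output = model(x_noise)                   # feed the *noisy* images in...
loss = criterion(output, x)               # ...but compare against the *clean* images
loss.backward()
optimizer.step()
```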

Image “generation”

In fact, image "generation" is already implicit in the two applications above. Remove the encoder from a trained autoencoder, and what remains can be regarded as a "generator": feed a low-dimensional latent feature $z$ into the decoder, and out comes a generated picture.

But there is a problem. Taking pictures as an example: if a latent feature $z$ previously produced by the encoder is fed into the decoder, the decoder outputs a picture similar to the input that $z$ was encoded from. However, if we feed the decoder random noise with the same dimension as the latent features, we get a meaningless noise picture.

Therefore, the "generation" here is not generation in the true sense; more accurately, it should be called "reconstruction".
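The contrast can be sketched as follows (assuming `decoder` is the trained decoder half and `z_encoded` is a latent code that the encoder produced from a real image):

```python
import torch

# A latent code obtained by encoding a real image decodes to a sensible picture
img_ok = decoder(z_encoded)

# Random noise of the same dimension decodes to meaningless output, because the
# decoder was never trained on arbitrary points of the latent space
z_random = torch.randn_like(z_encoded)
img_bad = decoder(z_random)
```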

Variational autoencoders can remedy this shortcoming, which will be discussed in a subsequent article.

Writing an autoencoder in PyTorch

Now, let's train an autoencoder on the MNIST handwritten digit dataset.

Each image in the handwritten digit dataset originally has shape $1 \times 28 \times 28$. Since we train a fully connected network here, we flatten each image into a $1 \times 28 \times 28 = 784$-dimensional feature vector. In other words, with $N$ images, our data has shape $N$ rows by $784$ columns.
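Concretely, the flattening looks like this (a random stand-in batch, just to show the shapes):

```python
import torch

x = torch.randn(64, 1, 28, 28)   # N = 64 images of shape 1*28*28
x_flat = x.view(-1, 784)         # flatten each image into 1*28*28 = 784 features
print(x_flat.shape)              # torch.Size([64, 784])
```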

First import the required libraries:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import matplotlib.pyplot as plt
import numpy as np
```

Then prepare the handwritten numeric data set:

```python
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

train_dataset = torchvision.datasets.MNIST(
    root="torch_datasets", train=True, transform=transform, download=True
)

test_dataset = torchvision.datasets.MNIST(
    root="torch_datasets", train=False, transform=transform, download=True
)

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True
)

test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=32, shuffle=False, num_workers=4
)
```

The above code automatically downloads the dataset from the Internet to the path you specify with `root`.

Next, build the autoencoder network:

```python
class AE(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        # encoder
        self.encoder_hidden_layer = nn.Linear(
            in_features=kwargs["input_shape"], out_features=128)
        self.encoder_output_layer = nn.Linear(
            in_features=128, out_features=128)
        # decoder
        self.decoder_hidden_layer = nn.Linear(
            in_features=128, out_features=128)
        self.decoder_output_layer = nn.Linear(
            in_features=128, out_features=kwargs["input_shape"])

    def forward(self, features):
        # encoder
        activation = torch.relu(self.encoder_hidden_layer(features))
        code = torch.relu(self.encoder_output_layer(activation))
        # decoder
        activation = torch.relu(self.decoder_hidden_layer(code))
        reconstructed = torch.relu(self.decoder_output_layer(activation))
        # Return the reconstructed image
        return reconstructed
```
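A quick sanity check of the shapes (the random batch below is only for verification):

```python
model = AE(input_shape=784)
dummy_batch = torch.randn(4, 784)   # a fake batch of 4 flattened images
out = model(dummy_batch)
print(out.shape)                    # torch.Size([4, 784]), same shape as the input
```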

Set some more parameters:

```python
# Set a random seed so that the result is the same every time
# Reason: some of PyTorch's operations are random
seed = 42
torch.manual_seed(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
# Set the number of training epochs and the learning rate
epochs = 20
learning_rate = 1e-3

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AE(input_shape=784).to(device)
# Use the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Use mean squared error as the loss function
criterion = nn.MSELoss()
```

Start training:


```python
for epoch in range(epochs):
    loss = 0
    for batch_features, _ in train_loader:
        # Flatten the N*1*28*28 images into N 784-dimensional feature vectors
        batch_features = batch_features.view(-1, 784).to(device)

        optimizer.zero_grad()

        # Forward propagation
        outputs = model(batch_features)

        # Calculate the loss
        train_loss = criterion(outputs, batch_features)

        # Backpropagate to compute gradients
        train_loss.backward()

        # Update the network weights
        optimizer.step()

        # Accumulate the loss
        loss += train_loss.item()

    # Average loss over this epoch
    loss = loss / len(train_loader)

    # Print training progress
    print("epoch : {}/{}, loss = {:.6f}".format(epoch + 1, epochs, loss))
```

As you can see from the printed loss values, the loss keeps decreasing as training progresses.

After training is complete, use the first batch of the test set for testing:


```python
# Select the first batch for testing
with torch.no_grad():
    for batch_features in test_loader:
        # Take only the images, not the labels
        batch_features = batch_features[0]
        # Flatten the images into 784-dimensional vectors
        test_examples = batch_features.view(-1, 784).to(device)
        # Forward pass (inference)
        reconstruction = model(test_examples)
        break
```

`reconstruction` consists of 32 reconstructed images, each represented by a 784-dimensional feature vector, which can be reshaped back to $1 \times 28 \times 28$ for display:

```python
with torch.no_grad():
    number = 10
    plt.figure(figsize=(20, 4))
    for index in range(number):
        # Visualize the original image
        ax = plt.subplot(2, number, index + 1)
        plt.imshow(test_examples[index].cpu().numpy().reshape(28, 28))
        plt.gray()
        # Do not display the axes
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        # Visualize the reconstructed image
        ax = plt.subplot(2, number, index + 1 + number)
        # .cpu() moves the tensor off the GPU before converting to numpy
        plt.imshow(reconstruction[index].cpu().numpy().reshape(28, 28))
        plt.gray()
        # Do not display the axes
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
    plt.show()
```

The output image is as follows:

The first row shows the original images, and the second row shows the corresponding reconstructions. The reconstructed pictures are basically the same as the originals, but with some loss, visible as small black dots in the images.

This article was first published on the WeChat public account "I will find you in the Antarctic". Welcome to follow!
