Good books to share: Introduction to Machine learning

0. Recognition results

If you want to try it yourself, go to my official account [Thumb Notes] and reply “MLP” in the background.

1. Multi-layer Perceptron (MLP)

This section introduces the concept of multilayer neural networks, using the multilayer perceptron as an example.

1.1 The hidden layer

The figure below is the neural network diagram of a multilayer perceptron.

A multilayer perceptron introduces one or more hidden layers on top of a single-layer neural network. The hidden layer shown in the figure has 5 hidden units. Since the input layer does not involve computation, this multilayer perceptron has 2 layers. Both the hidden layer and the output layer shown in the figure are fully connected layers.

For a multilayer perceptron with a single hidden layer of h hidden units, denote the hidden layer output by H. Since both the hidden layer and the output layer are fully connected layers, let the weight and bias parameters of the hidden layer be W_h and b_h, and the weight and bias parameters of the output layer be W_o and b_o.

Thus we obtain the relation between the input X, the hidden layer output H, and the output O of a single-hidden-layer neural network:

H = XW_h + b_h,  O = HW_o + b_o    (1)

To find the relationship between the input and the output, substitute the first equation into the second:

O = (XW_h + b_h)W_o + b_o = XW_hW_o + (b_hW_o + b_o)    (2)

It can be seen from Equation (2) that although such a neural network introduces (one or more) hidden layers, it is still equivalent to a single-layer neural network. The root cause is the fully connected layer: a fully connected layer only performs an affine transformation on the data, and the composition of multiple affine transformations is still a single affine transformation. To solve this problem, a nonlinear transformation, namely the activation function, is introduced.

For those of you who don’t know the activation function, click here
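To make this concrete, here is a small sketch (my own illustration, not from the original post) showing that two stacked fully connected layers with no activation in between can be reproduced exactly by a single fully connected layer, while inserting a ReLU breaks that equivalence:

import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(4, 8)            # a toy batch: 4 samples, 8 features

hidden = nn.Linear(8, 5)         # "hidden layer": an affine transformation
output = nn.Linear(5, 3)         # "output layer": another affine transformation

# Merge the two affine transformations into a single one:
# O = (X W_h^T + b_h) W_o^T + b_o = X (W_o W_h)^T + (W_o b_h + b_o)
merged = nn.Linear(8, 3)
with torch.no_grad():
    merged.weight.copy_(output.weight @ hidden.weight)
    merged.bias.copy_(output.weight @ hidden.bias + output.bias)

# Without an activation, the stacked layers equal the single merged layer
print(torch.allclose(output(hidden(X)), merged(X), atol=1e-6))                  # True

# With a ReLU in between, the equivalence no longer holds (in general)
print(torch.allclose(output(torch.relu(hidden(X))), merged(X), atol=1e-6))      # False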

2. Implement the multilayer perceptron

In this section, a multilayer perceptron will be used to classify the Fashion-MNIST dataset.

Start by importing the required libraries.

import torch
from torch import nn
import numpy as np
import sys
sys.path.append("..") 

import torchvision
from IPython import display
from time import time
import matplotlib.pyplot as plt
import torchvision.transforms as transforms

2.1 Obtaining and reading data

This part still uses the Fashion-MNIST dataset from the previous posts.

batch_size = 256

# Get the training set
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
# Get the test set
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())

# Generate iterators
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=0)

test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=0)
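As a quick sanity check (this snippet is my own addition, not part of the original post), you can print the dataset sizes and the shape of one mini-batch:

print(len(mnist_train), len(mnist_test))
# 60000 training samples, 10000 test samples

X, y = next(iter(train_iter))
print(X.shape, y.shape)
# torch.Size([256, 1, 28, 28]) torch.Size([256]): 256 images of 1 x 28 x 28 pixels and their 256 labels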

2.2 Define and initialize model parameters

Each image in the Fashion-MNIST dataset is 28*28 pixels, i.e. 784 feature values, and the dataset has ten categories. So the model needs 784 inputs and 10 outputs. We set the number of hidden units to 256 (a hyperparameter that can be adjusted).

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Initialize the hidden layer weights, hidden layer biases, output layer weights and output layer biases.
W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)

params = [W1, b1, W2, b2]

# Enable gradient tracking for W1, b1, W2, b2
for param in params:
	param.requires_grad_(requires_grad=True)

2.3 Define the activation function

def relu(X):
	return torch.max(input=X,other = torch.tensor(0.0))
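A quick check (my own example) that this relu clamps negative values to zero and leaves positive values unchanged:

x = torch.tensor([-2.0, 0.0, 3.0])
print(relu(x))    # tensor([0., 0., 3.])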

2.4 Define the model

def net(X):
    X = X.view((-1, num_inputs))
    # Flatten each input image into a row vector of 784 features
    H = relu(torch.matmul(X, W1) + b1)
    # Compute the output of the hidden layer
    O = torch.matmul(H, W2) + b2
    # Compute the output layer
    return O

2.5 Define the loss function

Here we use PyTorch's built-in function, which combines the softmax computation and the cross-entropy loss, for better numerical stability.

loss = torch.nn.CrossEntropyLoss()
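Why combining the two steps matters can be seen with a small illustration (my own, using an artificially large logit): a hand-written softmax overflows, while CrossEntropyLoss, which works in log space internally, stays finite.

def naive_softmax(x):
    # exp overflows for large inputs, which is why a separate softmax step is fragile
    return torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

logits = torch.tensor([[1000.0, 0.0]])
target = torch.tensor([0])

print(naive_softmax(logits))                          # tensor([[nan, 0.]])
print(torch.nn.CrossEntropyLoss()(logits, target))    # tensor(0.)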

2.6 Optimization function

Mini-batch stochastic gradient descent is used.

def sgd(params, lr, batch_size):
    # lr: learning rate; params: weight and bias parameters
    for param in params:
        param.data -= lr * param.grad / batch_size
        # Updating .data modifies the parameter in place without being recorded by autograd
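One design note: this hand-written sgd updates param.data directly so that the update itself is not tracked by autograd, and it divides the gradient by batch_size. PyTorch's built-in torch.optim.SGD performs no such division, so if you swapped it in (an alternative, not what this post uses), the learning rate would need rescaling. A minimal sketch under that assumption:

from torch import optim

# lr / batch_size compensates for the division the manual sgd performs
optimizer = optim.SGD(params, lr=lr / batch_size)

# Inside the training loop the usual pattern would then be:
#   optimizer.zero_grad()
#   l.backward()
#   optimizer.step()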

2.7 Calculate classification accuracy

Principle of calculation accuracy:

We take the category with the highest predicted probability as the output category. If it matches the true category y, the prediction is correct. Classification accuracy is the ratio of the number of correct predictions to the total number of predictions.

First we need to get the predicted results.

Find the index corresponding to the highest probability in each row of the predicted probabilities (the variable y_hat).

# argmax(f(x)) returns the point x at which f(x) attains its maximum.
# With dim=1, argmax returns the index of the maximum value in each row.
A = y_hat.argmax(dim=1)
# The result is a one-dimensional tensor with as many entries as y_hat has rows

Then we need to compare the category with the highest predicted probability against the true category (y) to determine whether each prediction is correct.

B = (y_hat.argmax(dim=1) == y).float()
# (y_hat.argmax(dim=1) == y) gives a tensor of 0s and 1s; .float() converts it to a float tensor

Finally, we need to calculate the classification accuracy.

We know that the number of rows of y_hat corresponds to the total number of samples, so the mean of B is the classification accuracy.

(y_hat.argmax(dim=1)==y).float().mean()

The result of the previous step is still a tensor of the form tensor(x), so one more step is needed to obtain a Python number.

(y_hat.argmax(dim=1)==y).float().mean().item()
# A Python number is obtained via .item()

Putting this together, we get the classification accuracy function:

def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()

As a generalization, we can also write a function that evaluates the accuracy of a model net on a dataset data_iter.

def net_accuracy(data_iter, net):
    right_sum, n = 0.0, 0
    for X, y in data_iter:
        # Get X and y from the iterator data_iter
        right_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        # Count the number of correct predictions
        n += y.shape[0]
        # y.shape[0] gives the number of samples in this batch
    return right_sum / n

2.8 Training Model

In training the model, the number of epochs num_epochs, the number of hidden units num_hiddens and the learning rate lr are all adjustable hyperparameters. By tuning these hyperparameters, a more accurate model can be obtained.

num_epochs, lr = 5, 100

def train_MLP(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr, optimizer, net_accuracy):
    for epoch in range(num_epochs):
        # Initialize the running loss, the number of correct predictions and the sample count.
        train_l_sum, train_right_sum, n = 0.0, 0.0, 0

        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # Loss for this mini-batch

            # Clear the gradients before back-propagation
            for param in params:
                if param.grad is not None:
                    param.grad.data.zero_()
            l.backward()	# Compute the gradient of the loss function
            optimizer(params, lr, batch_size)

            train_l_sum += l.item()
            train_right_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]

        test_acc = net_accuracy(test_iter, net)	# Accuracy on the test set
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f' % (epoch + 1, train_l_sum / n, train_right_sum / n, test_acc))

train_MLP(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr, sgd, net_accuracy)

2.9 Training Results

It can be seen that the recognition accuracy improves after using the multilayer perceptron, while the running time becomes longer (about 80 seconds for 5 epochs).

2.10 Recognize the test set

Use the trained model to make predictions on the test set.

The ultimate goal of building a model is not training itself, so let's try recognizing some images from the test set.

# Convert the sample's category number into text
def get_Fashion_MNIST_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
    # labels is a list of numbers, so a list comprehension looks up the text label for each

# Display images
def show_fashion_mnist(images, labels):
    display.set_matplotlib_formats('svg')
    # Draw vector graphics
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    # Set the number and size of the subplots
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

# Get samples and labels from the test set
X, y = next(iter(test_iter))

true_labels = get_Fashion_MNIST_labels(y.numpy())
pred_labels = get_Fashion_MNIST_labels(net(X).argmax(dim=1).numpy())

# Put the true label and the predicted label in each image title
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

show_fashion_mnist(X[0:9], titles[0:9])


Recognition results


Writing is not easy. If you find this useful, please follow my official account [Thumb Notes], where I post my study notes every day~