Original link: mp.weixin.qq.com/s/3hXlcOVuJ…

The first two articles in this PyTorch quick-start series:

  • Quick Start PyTorch (1): Installation, Tensors and Gradients
  • Quick Start PyTorch (2): How to Build a Neural Network

This is the third and final article of the PyTorch quick-start series. It trains a simple image classifier on the CIFAR10 dataset, walking through the whole pipeline: defining the network, loading and processing the data, training the model, and finally testing its performance. It also shows how to train the model with multiple GPUs.



4. Train the classifier

The previous section described how to build a neural network, compute the loss, and update the network's weight parameters. What you need to do next is use that to implement an image classifier.

4.1 Training Data

Before training the classifier, you of course need to consider the data. Typically, when dealing with data such as images, text, speech, or video, you use standard Python libraries to load it and convert it into a NumPy array, and then into a PyTorch tensor; a small loading example follows the list below.

  • For images, you can use Pillow or OpenCV;
  • For speech, you can use scipy and librosa;
  • For text, you can load the data with native Python or Cython, or use NLTK and SpaCy.
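
For example, here is a minimal sketch of loading a single image with Pillow, converting it to a NumPy array, and then to a PyTorch tensor (the file name cat.jpg is hypothetical):

import numpy as np
import torch
from PIL import Image

img = Image.open('cat.jpg')      # hypothetical image file
arr = np.asarray(img)            # H x W x C uint8 NumPy array
tensor = torch.from_numpy(arr)   # tensor sharing memory with the array
print(tensor.shape, tensor.dtype)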

For computer vision specifically, PyTorch provides the torchvision package, which includes data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as image data transformers. The relevant modules are torchvision.datasets and torch.utils.data.DataLoader.

In this tutorial, you will use the CIFAR10 dataset, which contains 10 categories: plane, car, bird, cat, deer, dog, frog, horse, ship, and truck. The images in the dataset are all of size 3x32x32, i.e., 3-channel color images of 32x32 pixels. Some examples are as follows:

4.2 Train image classifier

The training process is as follows:

  1. Load and normalize the CIFAR10 training and test sets using torchvision;
  2. Construct a convolutional neural network;
  3. Define a loss function;
  4. Training network on training set;
  5. Test network performance on a test set.
4.2.1 Loading and normalization of CIFAR10

First import the necessary packages:

import torch
import torchvision
import torchvision.transforms as transforms

The images produced by torchvision datasets are PILImage objects with values in the range [0, 1]. They need to be transformed into tensors with values in [-1, 1]; since Normalize computes (x - mean) / std, a mean and std of 0.5 map 0 to -1 and 1 to 1, as shown below:

# Normalize images from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
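
As a quick sanity check that this transform really maps [0, 1] to [-1, 1] (a minimal sketch; transforms.Normalize computes (x - mean) / std per channel):

norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
x = torch.tensor([[[0.0]], [[0.5]], [[1.0]]])  # a 3x1x1 "image" with channel values 0, 0.5 and 1
print(norm(x).flatten())  # tensor([-1., 0., 1.])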

After downloading the data, some of the training images can be visualized. The code is as follows:

import matplotlib.pyplot as plt
import numpy as np

# display image function
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Show pictures
imshow(torchvision.utils.make_grid(images))
# Print the image labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

The displayed images are as follows:

The category labels are:

 frog plane   dog  ship
4.2.2 Build a convolutional neural network

In fact, you could directly reuse the network defined in the previous section here, except that conv1's input channels change from 1 to 3, because this time the network receives 3-channel color images.

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
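
To see where the 16 * 5 * 5 in fc1 comes from, trace the shapes: each 5x5 convolution (no padding) shrinks the feature maps from 32 to 28 and from 14 to 10, and each 2x2 max pooling halves them (28 to 14, 10 to 5), leaving 16 feature maps of size 5x5. A quick sanity check with a dummy input (a minimal sketch, assuming net is defined as above):

dummy = torch.randn(1, 3, 32, 32)  # one fake 3-channel 32x32 image
print(net(dummy).shape)            # torch.Size([1, 10]): one score per class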
4.2.3 Define loss functions and optimizers

The cross-entropy loss and the SGD optimization method with momentum are used here:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
4.2.4 Training network

The fourth step is naturally to train the network: specify the number of epochs to iterate, feed in the data, and print the current training information at a specified interval, such as the loss or other performance measures.

import time
start = time.time()
for epoch in range(2):
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data
        inputs, labels = data
        # Zero the gradient buffers
        optimizer.zero_grad()
        
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time()-start)

Here, training runs for two epochs in total. The training log is shown below; it takes about 77 seconds.

[1,  2000] loss: 2.226
[1,  4000] loss: 1.897
[1,  6000] loss: 1.725
[1,  8000] loss: 1.617
[1, 10000] loss: 1.524
[1, 12000] loss: 1.489
[2,  2000] loss: 1.407
[2,  4000] loss: 1.376
[2,  6000] loss: 1.354
[2,  8000] loss: 1.347
[2, 10000] loss: 1.324
[2, 12000] loss: 1.311

Finished Training! Total cost time:  77.24696755409241
4.2.5 Test model performance

After training the network model, you need to test its generalization ability on the test set. For image classification tasks, accuracy is generally used as the evaluation metric.

First, run a small test with one batch of images; here the batch size is 4, i.e., 4 images. The code is as follows:

dataiter = iter(testloader)
images, labels = next(dataiter)

# Print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The images and labels are as follows:

GroundTruth:    cat  ship  ship plane

Then feed these four images into the network and see what it predicts:

# Network output
outputs = net(images)

# Predicted results
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

The output is:

Predicted:    cat  ship  ship  ship

The first three images are predicted correctly, while the fourth, a plane, is incorrectly predicted as a ship.

Now, let’s see how accurate we can get across the entire test set.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

The output is as follows

Accuracy of the network on the 10000 test images: 55 %

Your accuracy here may differ; the result in the official tutorial is 51%, and the variation is likely due to random weight initialization. Compared with randomly guessing among 10 categories (10% accuracy), the network has clearly learned something, although 55% is still quite poor. Then again, this is only a 5-layer network used as the tutorial's sample code.

Next, check the accuracy for each category. (predicted == labels).squeeze() outputs True (1) or False (0) for each sample according to whether the predicted label and the true label are equal. Therefore, when counting the number of correct predictions for each category, these values can be added directly: a correct prediction adds 1, and a wrong prediction adds 0, i.e., leaves the count unchanged.
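
A tiny demonstration of that comparison, using hypothetical predictions and labels:

predicted = torch.tensor([3, 8, 8, 0])  # hypothetical predicted labels
labels = torch.tensor([3, 8, 8, 1])     # hypothetical true labels
c = (predicted == labels).squeeze()
print(c)                                # tensor([ True,  True,  True, False])
print(c[0].item() + c[3].item())        # 1: True counts as 1, False as 0

The full per-category accuracy computation is as follows: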

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
        

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

As you can see from the output below, cat, bird, and deer are the three categories predicted least accurately, while ship and truck are predicted most accurately.

Accuracy of plane : 58 %
Accuracy of   car : 59 %
Accuracy of  bird : 40 %
Accuracy of   cat : 33 %
Accuracy of  deer : 39 %
Accuracy of   dog : 60 %
Accuracy of  frog : 54 %
Accuracy of horse : 66 %
Accuracy of  ship : 70 %
Accuracy of truck : 72 %

4.3 Training on GPU

Deep learning naturally calls for GPUs to speed up training, so this section shows how to train on a GPU.

First, you need to check if there is an available GPU for training, with the following code:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

The following output indicates that your first (or only) GPU is available; otherwise cpu will be printed.

cuda:0

Now that a GPU is available, the next step is to train on it. The code needs two modifications: the network parameters and the data must each be moved onto the GPU:

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

Modified training code:

import time
# Note: both the network and the data need to be put on the GPU
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start = time.time()
for epoch in range(2):
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data and move it onto the GPU
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # Zero the gradient buffers
        optimizer.zero_grad()
        
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time() - start)

Note that the optimizer has to be defined after calling net.to(device), so that it receives the CUDA-tensor network parameters. The training results are similar to before. In fact, because the network is so small, moving to the GPU does not bring much of a speedup; my training actually seemed to slow down, possibly because of the weak GPU in my laptop.

If you need to speed things up even further, you can use multiple GPUs; see the data parallelism tutorial below (this part is optional).

Pytorch.org/tutorials/b…

The tutorial for this section:

Pytorch.org/tutorials/b…

The code for this section:

Github.com/ccc013/Deep…


5. Data parallelism

In this tutorial you will learn how to use DataParallel to train a network with multiple GPUs.

First, training the model on a GPU is simple: as the following code shows, define a device object and then use the .to() method to place the network model's parameters on the specified GPU.

device = torch.device("cuda:0")
model.to(device)

The next step is to put all the tensor variables on the GPU as well:

mytensor = my_tensor.to(device)

Notice that my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of modifying my_tensor in place, so you need to assign it to a new tensor and then use that tensor.
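
A minimal sketch of that behavior (assuming a CUDA device is available):

cpu_tensor = torch.ones(3)
gpu_tensor = cpu_tensor.to(device)  # returns a new copy on the GPU
print(cpu_tensor.device)            # cpu: the original tensor is unchanged
print(gpu_tensor.device)            # cuda:0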

PyTorch uses only one GPU by default, so to train with multiple GPUs you need to wrap the model in DataParallel, as shown in the following code:

model = nn.DataParallel(model)

This line of code is the key to this tutorial and will be explained in more detail below.

5.1 Import and Parameters

Start by importing the necessary libraries and defining some parameters:

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This mainly defines the network's input and output sizes, the batch size, and the size of the dataset, along with a device object.
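
Note that with data_size = 100 and batch_size = 30, the DataLoader defined below will yield three full batches of 30 plus a final batch of 10; that last batch of 10 is visible at the end of the logs later on. A quick check using the parameters above:

full_batches, remainder = divmod(data_size, batch_size)
print(full_batches, remainder)  # 3 full batches of 30, plus a final batch of 10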

5.2 Build a dummy dataset

The next step is to build a fake (random) data set. The implementation code is as follows:

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

5.3 Simple Model

Then, construct a simple network model containing only a fully-connected layer, with a print() statement added to monitor the sizes of the network's input and output tensors:

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

5.4 Create model and data parallelism

This is the core of this section: define an instance of the model and check whether there are multiple GPUs. If so, wrap the model in nn.DataParallel, and then put it on the GPU via model.to(device). The code is as follows:

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)

5.5 Run the Model

Then you can run the model and see the printed information:

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

The output is as follows:

        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.6 Running Results

If there is one GPU or none, then with a batch size of 30 the model receives inputs and produces outputs of size 30 each time, as above. With multiple GPUs, the results are as follows:

2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
3 GPUs
Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
8 GPUs
Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.7 Summary

DataParallel automatically splits the data and sends the sub-batches to model replicas on multiple GPUs. After each replica finishes its job, DataParallel collects and merges the results before returning them.
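
As a rough mental model of that scatter/gather along dim 0 (this is only an illustration with plain tensor operations, not DataParallel's actual implementation):

import torch

batch = torch.randn(30, 5)             # one input batch
weight = torch.randn(5, 2)             # stand-in for the model's parameters
chunks = torch.chunk(batch, 2, dim=0)  # scatter: two sub-batches of 15, as with 2 GPUs
outs = [c @ weight for c in chunks]    # each replica runs forward on its own chunk
merged = torch.cat(outs, dim=0)        # gather: merge the results back into [30, 2]
print(merged.shape)                    # torch.Size([30, 2])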

More detailed data parallelism tutorial:

Pytorch.org/tutorials/b…

The tutorial for this section:

Pytorch.org/tutorials/b…


Summary

The third part mainly implements a simple image classification workflow: selecting the dataset, building the network model, defining the loss function and optimization method, training the network, testing its performance, and checking the accuracy of each category. Of course, this is only a very simple pipeline.

It then covers how to train the network with multiple GPUs.

Next you can choose:

  • Train a neural network to play video games
  • Train ResNet on ImageNet
  • Train a face generator using a GAN
  • Train a word-level language model using a recurrent LSTM network
  • More examples
  • More tutorials
  • Discuss PyTorch in the Forums community

Welcome to follow my WeChat official account, Machine Learning and Computer Vision, or scan the QR code below, so we can communicate, learn, and make progress together!

Previous recommendations

Machine learning series
  • A hands-on machine learning tutorial for beginners!
  • Model evaluation, over-fitting, under-fitting and hyperparameter tuning methods
  • Summary and Comparison of Commonly used Machine Learning Algorithms
  • Summary and Comparison of Common Machine Learning Algorithms (PART 1)
  • How to Build a Complete Machine Learning Project
  • Data Preprocessing for feature Engineering (PART 1)
  • Learn about eight applications of computer vision
GitHub projects & resource tutorial recommendations
  • [GitHub project recommendation] A better site for reading and finding papers
  • TensorFlow is now available in Chinese
  • Must-read AI and Deep learning blog
  • An easy-to-understand TensorFlow tutorial
  • Recommend some Python books and tutorials, both beginner and advanced!
  • [Github project recommendation] Machine learning & Python
  • [Github Project Recommendations] Here are three tools to help you get the most out of Github
  • Github provides information about universities and foreign open course videos
  • Did you pronounce all these words correctly? Plus three English tutorials made just for programmers!