10 minutes to fully understand PyTorch

PyTorch is an increasingly popular deep learning framework and is arguably the first choice for newcomers to the field. Compared with TensorFlow, PyTorch is easier to learn, quicker to get started with, and makes it easier to implement the demo you have in mind. This article starts from the basics of PyTorch to help you get up and running.

First of all, this article assumes you have some grounding in deep learning theory: you should know basic concepts such as CNNs and RNNs, and processes such as forward propagation and backpropagation. After all, this is mainly a practical tutorial.

Secondly, I want to approach this article from a broader perspective. When learning, we should pay attention not only to the content itself but also to how to design a learning path whenever we encounter new knowledge. This way of thinking may not suit everyone, but it can certainly teach you something, and you can adapt it into something that works better for you.

The following steps will help you get started with PyTorch:

1. Start a simple classifier

2. Implement a CNN on MNIST

3. Common network layers

4. Tensorboard visualization

5. Implement VGG as an example, with some tricks for deep networks

6. GPU acceleration and saving/loading models

7. RNN and LSTM for classification and regression

8. How an example of parallel data generation points the way to more advanced PyTorch

These eight steps correspond to my eight sets of study notes. This article lays out the learning path as a series of ideas; the details of each step are covered in the corresponding articles. At the end of each step, and at the end of the full text, we provide links to those articles. Enjoy!

1. Start a simple classifier

When learning a new language, framework, or technology, my personal priority is to get feedback quickly. Many PyTorch tutorials start with tensors. I followed such a tutorial myself, and it is quite exhaustive, but I don't recommend starting that way for two reasons: 1) it is boring and gives little sense of accomplishment; 2) some of the content belongs to the general fundamentals of deep learning and is too verbose for this purpose.

So I think the best way to approach new knowledge is to build the overall structure first and then fill in the details gradually. That is why, in the first part of this article, I chose to build a simple classifier, so you can see what the code flow of a PyTorch program should look like.

If you have studied C, you know it is better to start with a Hello World program than to begin by studying the #include on the first line.

For our first PyTorch program, the goal is simply to get the whole process running. Since it is a simple classifier, the dataset should not be too complex. We therefore approach it from three angles: 1) generate custom points divided into two classes; 2) learn how to build a shallow neural network; 3) walk through the training and testing process in PyTorch.

1.1 Generating a custom dataset

First, we generate our own dataset. Using torch's built-in zeros and ones methods, we generate random points belonging to two classes: for example, points scattered randomly around the means (2, 2) and (-2, -2), one class each. That gives us the dataset we want.
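
A minimal sketch of such a dataset, assuming 100 points per class and a standard deviation of 1 (both illustrative choices, not values from the original note). Here torch.ones supplies the means, torch.normal samples random points around them, and torch.zeros/torch.ones supply the labels:

```python
import torch

torch.manual_seed(0)

n_per_class = 100                     # illustrative size
n_data = torch.ones(n_per_class, 2)   # shape (100, 2), all ones

x0 = torch.normal(2 * n_data, 1.0)    # class 0: points around (2, 2)
y0 = torch.zeros(n_per_class)         # labels 0
x1 = torch.normal(-2 * n_data, 1.0)   # class 1: points around (-2, -2)
y1 = torch.ones(n_per_class)          # labels 1

# Combine into one dataset; CrossEntropyLoss expects float inputs and long labels
x = torch.cat((x0, x1), 0).float()    # shape (200, 2)
y = torch.cat((y0, y1), 0).long()     # shape (200,)
```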

1.2 Learning how to build a network

Second, we build a shallow neural network. Here is a code example showing how a basic PyTorch network is built:

import torch


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        # Hidden layer: maps the input features to n_hidden units
        self.n_hidden = torch.nn.Linear(n_feature, n_hidden)
        # Output layer: one score (logit) per class
        self.out = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x_layer):
        x_layer = torch.relu(self.n_hidden(x_layer))
        # Return raw logits: CrossEntropyLoss applies log-softmax itself,
        # so an explicit softmax here would be applied twice
        x_layer = self.out(x_layer)
        return x_layer


net = Net(n_feature=2, n_hidden=10, n_output=2)
# print(net)

optimizer = torch.optim.SGD(net.parameters(), lr=0.02)
loss_func = torch.nn.CrossEntropyLoss()

The Net class is the code skeleton we build, and each object created from it is a network we can train and test. In this class, the __init__ method sets up the structure of each network layer, and the forward() method defines the order in which the layers are applied and how they connect to each other.

optimizer is the optimizer, which holds the parameters to be optimized; loss_func is the loss function we chose.

As with writing Hello World, at this stage we only need to know how a network is constructed. When we need to make changes, we simply swap out the corresponding modules.

1.3 Training and testing

Next comes the training and testing phase. For training, the key thing to know is that three lines of code form the core:

optimizer.zero_grad()   # clear gradients left over from the previous step
loss.backward()         # backpropagate to compute new gradients
optimizer.step()        # update the parameters

The core ideas here are clearing gradients, backpropagation, and updating parameters, and the three lines correspond to these steps in order. In PyTorch, gradients accumulate rather than being overwritten, so zero_grad() is called to clear them; loss.backward() then backpropagates the loss to compute the gradients; finally, optimizer.step() updates every parameter to be optimized using the optimizer we defined.
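
To see why the gradients must be cleared, here is a tiny made-up example with a single scalar parameter w: calling backward() twice without clearing adds the second gradient to the first.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

loss = 2 * w                 # d(loss)/dw = 2
loss.backward()
first = w.grad.item()        # 2.0

loss = 2 * w
loss.backward()              # the new gradient is added to w.grad, not written over it
accumulated = w.grad.item()  # 4.0

w.grad.zero_()               # clear it manually; optimizer.zero_grad() resets every managed parameter
cleared = w.grad.item()      # 0.0
```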

The testing phase is simple: feed in the inputs and look at the predictions. If we randomly generate some new data points as a test set, we can see that the classification result is clear-cut.
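
Putting the pieces together, a minimal end-to-end sketch might look like the following. The point counts, the 200 training iterations, and the helper make_points are illustrative assumptions; the layer sizes and optimizer settings follow the code above, and the Net here returns raw logits because CrossEntropyLoss applies log-softmax internally.

```python
import torch

torch.manual_seed(0)


def make_points(n):
    # Hypothetical helper: n points around (2, 2) labelled 0, n around (-2, -2) labelled 1
    base = torch.ones(n, 2)
    x = torch.cat((torch.normal(2 * base, 1.0), torch.normal(-2 * base, 1.0)), 0)
    y = torch.cat((torch.zeros(n), torch.ones(n)), 0).long()
    return x, y


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.n_hidden = torch.nn.Linear(n_feature, n_hidden)
        self.out = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        return self.out(torch.relu(self.n_hidden(x)))  # raw logits


net = Net(n_feature=2, n_hidden=10, n_output=2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.02)
loss_func = torch.nn.CrossEntropyLoss()

# Training: the three core lines inside a loop
x_train, y_train = make_points(100)
for step in range(200):
    loss = loss_func(net(x_train), y_train)
    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # backpropagate
    optimizer.step()        # update parameters

# Testing: feed in fresh points and compare the predicted class with the label
x_test, y_test = make_points(50)
with torch.no_grad():
    pred = net(x_test).argmax(dim=1)
accuracy = (pred == y_test).float().mean().item()
print(f"test accuracy: {accuracy:.2f}")
```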

At this point, we have covered a simple classifier. Of course, if you have no background in the material so far, this section alone may not be enough to get you started. My first note, PyTorch Learning Notes (1): Start a simple classifier, describes in detail how to implement a simple classifier.

2. Implement a CNN on MNIST

Once we have completed a simple classifier, what should the learning path look like next? I think the best thing is to modify the "Hello World" first: change its most intuitive part and see what happens. This also gives you very direct feedback on your progress.

In deep learning, the task I know best is image classification with CNNs: convolutions extract features from an image layer by layer to achieve classification. Now that we know how to implement a classifier, let's see how to classify images with a CNN.

The dataset we chose is MNIST, a collection of handwritten digit images that is a common starting point for image classification. When installing torch, most tutorials also recommend installing torchvision, which provides a datasets module containing many common datasets; MNIST is naturally one of them.

Using these datasets is not difficult; you mainly set parameters such as root and transform. The DataLoader class in torch.utils.data can then be used to produce batches of data. How to generate data in parallel is explained in the last part of this article; all you need to know here is that DataLoader can generate batches of data in parallel.
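
To see what DataLoader produces without downloading MNIST, here is a small sketch that wraps random tensors shaped like MNIST samples (1×28×28 images, labels 0 to 9) in a TensorDataset. The fake data and the batch size of 64 are assumptions for illustration; with the real data you would hand a torchvision MNIST dataset (configured via root and transform) to DataLoader instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Fake stand-in for MNIST: 1000 single-channel 28x28 images with labels 0-9
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# DataLoader shuffles the samples and groups them into mini-batches
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_shapes = []
for b_x, b_y in train_loader:
    batch_shapes.append((tuple(b_x.shape), tuple(b_y.shape)))

print(len(batch_shapes))  # 16 batches: 15 full ones plus a last batch of 40
print(batch_shapes[0])    # ((64, 1, 28, 28), (64,))
```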

The core task is building the CNN. The earlier classifier used only a single hidden layer, so to build a CNN we naturally need to add convolution layers, activation layers, pooling layers, and so on.

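import torch.nn as nn


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Block 1: conv + ReLU + max pooling; (N, 1, 28, 28) -> (N, 16, 7, 7)
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=2,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Block 2: conv + ReLU; (N, 16, 7, 7) -> (N, 32, 7, 7)
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
        )
        # Fully connected layer: flattened features -> 10 digit classes
        self.out = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten to (N, 32 * 7 * 7)
        output = self.out(x)
        return output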

In this way, we can adapt the structure of the earlier simple classifier into our CNN. As the code shows, the first convolution block consists of convolution + activation + max pooling, and the second consists of convolution + activation. A final fully connected layer completes the design of the CNN.
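
One way to check the 32 * 7 * 7 input size of the final layer is to push a dummy batch through the blocks and look at the shapes. The sketch below rebuilds the same layers as the CNN above, assuming 28×28 single-channel MNIST inputs, and traces a batch of 4 random images:

```python
import torch
import torch.nn as nn

conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
conv2 = nn.Sequential(nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU())
out = nn.Linear(32 * 7 * 7, 10)

x = torch.rand(4, 1, 28, 28)           # dummy batch of 4 images
h1 = conv1(x)                          # stride-2 conv: 28 -> 14, pooling: 14 -> 7
h2 = conv2(h1)                         # 5x5 conv with padding 2 keeps 7x7
logits = out(h2.view(h2.size(0), -1))  # flatten, then map to 10 classes

print(h1.shape)      # torch.Size([4, 16, 7, 7])
print(h2.shape)      # torch.Size([4, 32, 7, 7])
print(logits.shape)  # torch.Size([4, 10])
```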

When trained, the network reaches a classification accuracy of over 97% on MNIST, which shows the particular strength of CNNs in image classification.

With a few appropriate modifications, we now have a CNN for image classification. For more details, see PyTorch Learning Notes (2): Implement a CNN on MNIST.

After this step, we might ask: if I want a more customized network structure, how do I build it? How do I know what to change, and what to change it to? So the next thing to learn is which common network layers torch provides.

3. Common network layers

Through two progressive examples, we now know how to implement a basic CNN. But as mentioned above, if you want to change something, how should you go about it?

From a learning standpoint, this is the time to introduce the common network layers. Then we can play the role of a "package caller", simply picking the right module for the network structure we want to design.

In this section we introduce the network layers provided by PyTorch in the following categories:

Convolution layers: one-, two-, and three-dimensional convolutions;

Pooling layers: max pooling, average pooling, and others;

Dropout: one- and two-dimensional variants;

BN layers: optional batch normalization;

Activation functions: ELU, ReLU, Sigmoid, Tanh, Softmax, and others;

Loss functions: MSE, CrossEntropy, and others.
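
A quick tour of these layers, as a sketch with arbitrary illustrative sizes (batch of 8, 3 input channels, 16 output channels), showing how each one is constructed and what it does to a tensor:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x1d = torch.rand(8, 3, 50)          # (batch, channels, length)
x2d = torch.rand(8, 3, 32, 32)      # (batch, channels, height, width)
x3d = torch.rand(8, 3, 8, 16, 16)   # (batch, channels, depth, height, width)

# Convolution layers: 1D, 2D and 3D (padding=1 keeps the spatial size)
c1 = nn.Conv1d(3, 16, kernel_size=3, padding=1)(x1d)
c2 = nn.Conv2d(3, 16, kernel_size=3, padding=1)(x2d)
c3 = nn.Conv3d(3, 16, kernel_size=3, padding=1)(x3d)

# Pooling layers: max and average pooling, each halving height and width here
p_max = nn.MaxPool2d(2)(c2)
p_avg = nn.AvgPool2d(2)(c2)

# Dropout zeroes single elements, Dropout2d zeroes whole channels (train mode only)
d1 = nn.Dropout(p=0.5)(c2)
d2 = nn.Dropout2d(p=0.5)(c2)

# Batch normalization over the 16 channels
bn = nn.BatchNorm2d(16)(c2)

# Activation functions (Softmax normalizes across the channel dimension here)
acts = [nn.ELU(), nn.ReLU(), nn.Sigmoid(), nn.Tanh(), nn.Softmax(dim=1)]
activated = [act(c2) for act in acts]

# Loss functions: cross-entropy on class scores, MSE on regression outputs
scores = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
ce = nn.CrossEntropyLoss()(scores, targets)
mse = nn.MSELoss()(torch.rand(8, 1), torch.rand(8, 1))
```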

PyTorch Learning Notes (3): Introduction to common network layers

4. Tensorboard visualization

We can now build a basic custom network and train and test it on our own dataset. But how do we get a more intuitive view of performance during training? How do we visualize the network structure? What does the dataset actually look like?

All of this can be done with a tool called TensorBoard, which is also widely used with TensorFlow.

How should you learn TensorBoard? We suggest these steps: first, get a simple code example running; then visualize the whole training process; finally, visualize the contents of the dataset and the structure of the network.

4.1 Run an example

Here we run an example from the official tutorial to see the basic TensorBoard workflow:

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
x = range(100)
for i in x:
    writer.add_scalar('y=2x', i * 2, i)
writer.close()

As this flow shows, we import the SummaryWriter class, create a writer object, and add a value each time add_scalar() is called in the for loop.

After running this code, type the following in the terminal:

tensorboard --logdir='runs'

We get the straight line y = 2x, which reveals the essence of TensorBoard: each value is written to a file in the 'runs' folder, and TensorBoard, launched from the terminal, then reads the saved data and draws the desired plot.

The main purpose of this step is to understand the workflow above; next we will see how to swap in the data we care about to generate the plots we want.

4.2 Visualizing the CNN's training process

In part 2, we defined a CNN to classify images. How do the accuracy and loss change during training?

# Training loop; cnn, loss_func, optimizer, train_loader, test_x, test_y
# and writer are assumed to be defined as in the earlier sections
for step, (b_x, b_y) in enumerate(train_loader):
    output = cnn(b_x)
    loss = loss_func(output, b_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 50 == 0:
        with torch.no_grad():
            test_output = cnn(test_x)
        pred_y = torch.max(test_output, 1)[1]
        accuracy = (pred_y == test_y).sum().item() / test_y.size(0)
        writer.add_scalar("Train/Accuracy", accuracy, step)

    writer.add_scalar("Train/Loss", loss.item(), step)

This is the training section, with one addition to what we had before: two calls to add_scalar(). In effect, the model is tested every 50 steps during training and the test accuracy is recorded, while the loss is saved at every step.

Finally, we run tensorboard --logdir=runs in the terminal again, and TensorBoard displays the accuracy and loss curves.

4.3 Visualizing images and models

Besides recording scalar values as above, TensorBoard can also visualize images and models. Instead of add_scalar(), we use add_image() and add_graph().

add_image() saves image data, taking one batch of input each time; in other words, the batch size determines how many images are visualized.

add_graph() saves the model's structure, which is then displayed automatically in the visualization.

PyTorch Learning Notes (4): TensorBoard visualization

5. VGG and Some Tricks

This part is relatively simple: we implement a more classic deep network to consolidate what we have covered so far. We also introduce a method that simplifies the construction of deep networks.

Implementing VGG itself is not too difficult; reading the paper tells you how the network is configured. But the point is not VGG specifically; it is how to build a deep network in general.

Building a long network essentially follows the same idea as before: stacking layer upon layer. Let's look at how a convolution layer is built using Sequential:
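
As a stand-in sketch of such a layer, assuming the first VGG layer uses 64 output channels with 3×3 convolutions on RGB input (as in the VGG paper) and appending BN and ReLU; the name conv1 is chosen to match the forward() method later in this section:

```python
import torch
import torch.nn as nn

# One "layer" of the network: 3x3 convolution + batch normalization + ReLU
conv1 = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.rand(2, 3, 32, 32)   # dummy batch of two 32x32 RGB images
y = conv1(x)
print(y.shape)                 # torch.Size([2, 64, 32, 32])
```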

This is what the first layer looks like. Following the network definition we want to implement (for example, the configuration in the VGG paper), we define the parameters of the convolution layer and add a BN layer and a ReLU activation.

Together these form one layer as we defined it; the other layers can be built the same way, stacking the common network layers introduced earlier like building blocks.

Following the thread of the earlier tutorials, then, a deep neural network such as VGG can essentially be implemented by simple stacking. Finally, we define the forward function as follows:

def forward(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    x = self.conv3(x)
    x = self.conv4(x)
    x = self.conv5(x)
    x = self.conv6(x)
    x = self.conv7(x)
    x = self.conv8(x)
    x = self.conv9(x)
    x = self.conv10(x)
    x = self.conv11(x)
    x = self.conv12(x)
    x = self.conv13(x)
    x = x.view(x.size(0), -1)


    output = self.out(x)
    return output

As you can see, this is just an extension of the earlier CNN with nothing technically new. But it is also clearly a bit ugly: a long forward method that keeps repeating itself, and programmers can't stand constant repetition.

So simplifying the model takes two steps: pass the whole network structure into Sequential as a list of layers, and find a more convenient way to generate that list. What exactly does this mean? Let's expand on it a little.

When setting up a network, we use Sequential to define a block of layers. Such a block usually means convolution + activation + pooling, though not always. In other words, a single Sequential already contains more than one layer. Can we put the whole network into one Sequential? The answer is yes!

Sequential accepts all the layers as variadic arguments. That is, we can pass its input in the form *[layer 1, layer 2, ..., layer n]. Note the * before the list: it unpacks the list so that each element is passed as a separate argument.
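
A small sketch of that unpacking, with an arbitrary three-layer list chosen for illustration:

```python
import torch
import torch.nn as nn

layer_list = [
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
]

# The * unpacks the list, so this is the same as
# nn.Sequential(layer_list[0], layer_list[1], layer_list[2])
block = nn.Sequential(*layer_list)

print(len(block))                           # 3
print(block(torch.rand(1, 3, 8, 8)).shape)  # torch.Size([1, 16, 4, 4])
```
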
But passing layers this way requires a very, very long list, and writing that list out by hand makes the model even uglier. So we need an elegant way to generate a list in which each element is the desired layer. It can be generated as follows:

import torch.nn as nn


def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3  # RGB input
    for v in cfg:
        if v == 'M':
            # 'M' -> 2x2 max pooling, halves the spatial size
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # a number -> 3x3 conv with v output channels (spatial size kept)
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']

It is easy to see that the resulting layers list is exactly what we want: it contains every layer we need. The for loop generates the layers according to the cfg parameter, where a number is the number of output channels of a convolution layer and the letter 'M' stands for max pooling.
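
As a usage sketch: this cfg corresponds to the 13 convolution layers of VGG16. Assuming 32×32 RGB inputs (CIFAR-sized, chosen only for illustration), the five 'M' pooling steps shrink the feature map from 32×32 to 1×1:

```python
import torch
import torch.nn as nn


def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
features = make_layers(cfg, batch_norm=True)

n_convs = sum(isinstance(m, nn.Conv2d) for m in features)  # count conv layers
out = features(torch.rand(2, 3, 32, 32))                   # 32 -> 16 -> 8 -> 4 -> 2 -> 1
print(n_convs)    # 13
print(out.shape)  # torch.Size([2, 512, 1, 1])
```

Changing the architecture now only means editing the cfg list, not the forward method.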

In this way, we can design a deep neural network just by adjusting the cfg list. For a more detailed introduction and design, see my earlier note: PyTorch Study Notes (5): VGG implementation and some tricks.

6. GPU acceleration and how to save and load models

Now that our networks are getting deeper, it is time to think about GPU acceleration. Fortunately, PyTorch provides a very friendly interface for using GPUs. Let's look at how to use GPU acceleration and how to save a trained model and load it back for testing.

6.1 First, how to use the GPU

Let's see how easy it is to use a GPU in PyTorch, starting with a simple command:

torch.cuda.is_available()

This command checks whether your PyTorch build has CUDA support and a usable GPU is present. If it returns True, we can move on to the next step.

In PyTorch, there are three things to remember: move the data over, move the model over, and move the results back.

First, moving the data means transferring it to the GPU. This is where GPU memory matters: the more memory, the more data can be moved at once.

Second, moving the model means that the network we defined is also transferred to the GPU, so it can be trained and tested there on the transferred data.

Finally, moving the results back means returning the test results to the CPU for further processing, such as computing accuracy.

Here's a small example showing the three steps:

# Specify the GPU device; on a single-GPU machine this is usually 0.
device = "cuda:0"

# Move the data
images = images.to(device)
labels = labels.to(device)

# Move the network: transfer our CNN to the GPU.
cnn.to(device)

# training...
# test... Generate test result pred_y

# Move pred_y back to the CPU.
pred_y = pred_y.cpu()

This example clearly shows how the three steps above are carried out. As long as all three parts are present in your code, and the training and testing in between stay the same, you have achieved GPU acceleration.
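
Since moving tensors to "cuda:0" fails on a machine without a GPU, a common pattern is to fall back to the CPU. Below is a small self-contained sketch of the three steps with that fallback; the tiny linear model and random data are placeholders, not the CNN from earlier:

```python
import torch
import torch.nn as nn

# Use the first GPU if available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 3)              # placeholder model
inputs = torch.rand(5, 4)            # placeholder data
labels = torch.randint(0, 3, (5,))

# Step 1: move the data
inputs, labels = inputs.to(device), labels.to(device)
# Step 2: move the model (for modules, .to() moves the parameters in place)
model.to(device)

with torch.no_grad():
    pred_y = model(inputs).argmax(dim=1)

# Step 3: move results back to the CPU, e.g. before converting to NumPy
pred_y = pred_y.cpu()
accuracy = (pred_y == labels.cpu()).float().mean().item()
print(device, pred_y.device, accuracy)
```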

6.2 How to save and load a trained model

There are generally two ways to save a model in PyTorch: save only the network parameters, or save the entire network.

The first thing to know is that all of a network's parameters in PyTorch are held in a dict, returned by the network object's state_dict() method. So saving what we need is not complicated underneath.

Saving takes a single statement: torch.save(content, path) saves the given content to the target path. The only question is how to distinguish between saving only the network parameters and saving the entire network.

When saving only the network parameters, content is cnn.state_dict(); when saving the entire network, it is cnn itself. The following two lines save only the parameters and the entire network, respectively:

torch.save(cnn.state_dict(), PATH)
torch.save(cnn, PATH)

As you can see, saving takes a single function call. What is the corresponding way to load?

Loading involves two functions: torch.load() and load_state_dict().

As the names suggest, load_state_dict() is for parameters, while torch.load() on its own returns the entire network when the whole network was saved. However, if we load only the parameters, we must first create the corresponding network object and then fill its structure with the parameters read by torch.load().
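
A minimal round trip for the parameters-only approach, assuming a small placeholder network and a temporary file path: the network object is created first, and load_state_dict() then fills it with the saved parameters.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_net():
    # Placeholder architecture; the loading side must rebuild the same structure
    return nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))


torch.manual_seed(0)
trained = build_net()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "net_params.pth")
    torch.save(trained.state_dict(), path)       # save parameters only

    restored = build_net()                        # 1) define the network first
    restored.load_state_dict(torch.load(path))    # 2) fill in the saved parameters

x = torch.rand(3, 2)
same_output = torch.equal(trained(x), restored(x))
print(same_output)  # True
```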

For details on using GPU acceleration and on saving and loading trained networks, with code and examples, see PyTorch Learning Notes (6): GPU and how to save and load models.

7. RNN for regression

So far we have covered how to build a CNN and the common network layers, and on that basis some related operations such as GPU acceleration. Now let's look at the final part of the series: how to use RNNs.

Using an RNN, we build a regressor to show how RNNs are used in PyTorch and help you get started with their workflow.

7.1 RNN parameters

We won't go over the definition of RNNs here; that is covered in detail in the article linked at the end of this section. We only describe the parameters that can be set on the RNN class in PyTorch.

input_size: the dimension of the input features. For example, if the input is a sentence, this is the dimension of each word's word vector.

hidden_size: comparable to the number of output channels of a convolution layer in a CNN; it is the dimension that the input_size-dimensional input is mapped to.

num_layers: the number of stacked recurrent layers. For example, setting num_layers to 2 stacks two RNNs, with the output of the first layer used as the input of the second. The default is 1.

nonlinearity: selects the activation function. PyTorch currently supports 'tanh' and 'relu', with 'tanh' as the default.

bias: whether to use bias terms. The default is True.

batch_first: describes the data layout. In PyTorch we usually train on batches of data, and this parameter indicates whether the batch is the first dimension of the input. If it is, set it to True; the default is False, meaning the batch is the second dimension.

dropout: if non-zero, adds a dropout layer on the outputs of each layer except the last. Because the last layer's output gets no dropout, this parameter only has an effect when num_layers is greater than 1. The default is 0.

bidirectional: if True, the RNN is bidirectional. The default is False.

With these parameters, we can easily configure the RNN we want. input_size and hidden_size must be passed in so the network knows which dimension to map the input from and to; the other parameters have commonly used defaults.
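
To see how these parameters shape the tensors, here is a sketch with arbitrary sizes (batch of 4, sequences of 10 steps, 1 input feature) using a two-layer bidirectional RNN with batch_first=True:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(
    input_size=1,         # one feature per time step
    hidden_size=32,       # mapped to a 32-dimensional hidden state
    num_layers=2,         # two stacked RNN layers
    nonlinearity="tanh",  # the default
    batch_first=True,     # input is (batch, time_step, input_size)
    dropout=0.1,          # applied between the two layers only
    bidirectional=True,   # run forward and backward over the sequence
)

x = torch.rand(4, 10, 1)
output, h_n = rnn(x)      # initial hidden state defaults to zeros

# output: (batch, time_step, num_directions * hidden_size)
print(output.shape)  # torch.Size([4, 10, 64])
# h_n: (num_layers * num_directions, batch, hidden_size), unaffected by batch_first
print(h_n.shape)     # torch.Size([4, 4, 32])
```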

7.2 Regressor: Use sine to predict cosine

Here is a very easy-to-understand example. Instead of a complicated dataset, we again use a simple custom one: the sine function as the input data and the cosine function as the label. Since the focus is on learning to use RNNs, we skip the test set and simply watch how well the model fits during training to judge whether it converges.

First, we define the RNN structure, then explain the details:

import torch
import torch.nn as nn


class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            batch_first=True,
        )
        self.out = nn.Linear(32, 1)

    def forward(self, x, h_state):
        # x: (batch, time_step, 1); h_state: (1, batch, 32) or None
        r_out, h_state = self.rnn(x, h_state)
        outs = []
        for time_step in range(r_out.size(1)):
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state

In nn.RNN we set three parameters: input_size, hidden_size, and batch_first.

input_size is set to 1 because there is only one value per time step; the data is one-dimensional.

hidden_size is set to 32, meaning we map this data into a 32-dimensional hidden space. The value is a free choice, but it should be neither too small nor too large: too small limits the fitting ability, while too large wastes computing resources.

batch_first is set to True, meaning the first dimension of our data is the batch.

In summary, following the parameter descriptions above, we built a single-layer RNN whose input at each time step is one-dimensional. Mapping it into a 32-dimensional hidden space lets the network learn its relationship to the label data.

Now look at the forward function, which differs somewhat from the CNN's forward. In the CNN we could simply chain the layers together, whereas here there are some unfamiliar parameters. Why is that?

Recall that at each time step the RNN combines the hidden state from the previous step with the current input. The call self.rnn(x, h_state) returns r_out, the outputs at every time step, and h_state, the hidden state after the last step.

The second line creates outs as an empty list. What does it store? Read on.

Next comes a for loop over r_out.size(1). What does this value mean? r_out is the output, and with batch_first=True its shape is (batch, time_step, hidden_size), mirroring the input layout, so r_out.size(1) is the number of time steps in this batch, i.e. how many points the sequence contains. For each time step, self.out() maps the 32-dimensional output back to one dimension, and the result is appended to outs.
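
As a side note, nn.Linear operates on the last dimension of its input, so the per-time-step loop can also be replaced by a single call on the whole r_out tensor. The sketch below uses random tensors with the shapes described above (batch of 2 and 10 time steps are assumptions) and checks that both give the same result:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
out = nn.Linear(32, 1)
r_out = torch.rand(2, 10, 32)   # (batch, time_step, hidden_size)

# Loop version, as in forward(): apply out() to each time step and stack
looped = torch.stack([out(r_out[:, t, :]) for t in range(r_out.size(1))], dim=1)

# Vectorized version: Linear maps the last dimension 32 -> 1 for every step at once
direct = out(r_out)

print(looped.shape, direct.shape)              # torch.Size([2, 10, 1]) for both
print(torch.allclose(looped, direct, atol=1e-6))  # True
```

The loop makes the per-step mapping explicit; the single call is shorter and usually faster.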

So outs holds the per-time-step predictions, which are stacked along the time dimension and returned from forward.

Looking at the fit during training, the model's prediction (the blue line) gradually converges onto the target cosine curve (the red line).
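
A minimal training sketch for this regressor: each step takes a window of 10 points, feeds sin(t) as input and cos(t) as target, and carries the hidden state into the next window, detached so gradients do not flow across windows. The window length, the Adam optimizer, the learning rate of 0.02, and the 200 steps are assumptions for illustration. For brevity, forward() here applies self.out to all time steps in one call, which gives the same result as the per-time-step loop in the class above because nn.Linear acts on the last dimension.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)


class RNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
        self.out = nn.Linear(32, 1)

    def forward(self, x, h_state):
        r_out, h_state = self.rnn(x, h_state)
        return self.out(r_out), h_state  # Linear applied to every time step


TIME_STEP = 10
rnn = RNN()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.02)
loss_func = nn.MSELoss()

h_state = None  # None -> zero initial hidden state
losses = []
for step in range(200):
    # Consecutive, non-overlapping windows of length pi along the time axis
    t = step * math.pi + torch.arange(TIME_STEP) * (math.pi / TIME_STEP)
    x = torch.sin(t).view(1, TIME_STEP, 1)  # (batch=1, time_step, input_size)
    y = torch.cos(t).view(1, TIME_STEP, 1)

    prediction, h_state = rnn(x, h_state)
    h_state = h_state.detach()              # keep the value, cut the graph

    loss = loss_func(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"mean loss, first 10 steps: {sum(losses[:10]) / 10:.4f}, last 10: {sum(losses[-10:]) / 10:.4f}")
```
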

More details on this section, including a brief introduction to how RNNs work and the network's step-by-step operation, can be found in note 7. That note also gives a further example of classifying the MNIST dataset with an LSTM, which helps with learning to use LSTMs in PyTorch. It is highly recommended: PyTorch Learning Notes (7): RNN and LSTM for classification and regression.

How to Progress

This section is not an afterthought; it is more an explanation of why I structured the learning process this way. From the previous sections, you have certainly gained a basic introduction to PyTorch, and at least won't feel lost when implementing a network. But if our goal is to master the framework, this is certainly far from enough.

So how do we go further with such a popular framework? First, we should build up a picture of the whole process along the lines of the previous sections, so we know how to get started. Experts may tell you how to accelerate at a low level, how to optimize details, how to load data in parallel, and so on. But if we dive into that level of detail first, we may be left confused, unsure what those detailed articles are even talking about.

So here we use parallel data loading as an example to show that, once we have an overall understanding of PyTorch, we can easily add whatever we need to learn next.

This example shows how to speed up data loading. You can use PyTorch's built-in DataLoader class with your own custom dataset type so that data generation does not become the bottleneck in training. For details, see: An example shows how data should be generated in parallel in PyTorch.
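
As a rough sketch of what such a custom data type can look like: subclass torch.utils.data.Dataset, implement __len__ and __getitem__, and let DataLoader batch it. SquaresDataset is a made-up example. DataLoader's num_workers argument sets how many worker processes load batches in parallel; it is left at 0 (load in the main process) here so the snippet runs anywhere. On platforms that start workers by spawning (Windows, macOS), code using num_workers > 0 should sit under an if __name__ == "__main__": guard.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical dataset: sample i is (i, i**2), generated on demand."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Expensive loading or preprocessing would happen here, per sample
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx) ** 2])
        return x, y


NUM_WORKERS = 0  # set to e.g. 4 to load batches in parallel worker processes

loader = DataLoader(SquaresDataset(100), batch_size=32, shuffle=False,
                    num_workers=NUM_WORKERS)

sizes = [b_x.size(0) for b_x, b_y in loader]
print(sizes)  # [32, 32, 32, 4]
```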

The point is not the article itself but teaching you how to fish. With this approach, you can keep refining your own knowledge system and filling in the details. After completing this series of tutorials, you can easily move on to more advanced articles on optimizing your PyTorch skills.

Conclusion

That concludes this introduction to PyTorch. The approach applies not only to learning this framework but also to learning other frameworks, programming languages, and so on.
