Selected from GitHub. Compiled by Heart of the Machine.

This tutorial takes you from understanding tensors to training a simple neural network with PyTorch; it is a very basic resource for getting started. PyTorch is built on Python and the Torch library and provides an abstract, NumPy-like way to represent tensors (multi-dimensional arrays), along with the ability to leverage a GPU for better performance. The code in this tutorial is incomplete; see the original Jupyter Notebook for the full details.
PyTorch makes it easy to get started with deep learning even if you have little background in the field. At a minimum, it helps to know that a multilayer neural network is a graph of nodes connected by weights, and that you can estimate those weights from data using an optimization procedure (such as gradient descent) based on forward and back propagation.

  • What to know: This tutorial assumes familiarity with Python and NumPy.
  • Prerequisite: You will need to install PyTorch before running the original Jupyter Notebook. The notebook has code cells to verify that you are ready.
  • Required hardware: You will need an NVIDIA GPU with the CUDA SDK installed; this is reported to give a 10-100x speedup. If you do not set this up, you can still run PyTorch on the CPU only. But remember, life is short when it comes to training neural network models, so use the GPU whenever possible!
Project address: https://github.com/hpcgarage/accelerated_dl_pytorch



1. The necessary PyTorch background

  • PyTorch is a Python package built on top of the Torch library to speed up deep learning applications.
  • PyTorch provides an abstract, NumPy-like way to represent tensors (multidimensional arrays) and can use the GPU to speed up training.

1.1 PyTorch tensor

PyTorch's key data structure is the tensor, a multidimensional array. It works much like NumPy's ndarray object, and we can use torch.Tensor() to create one.

import torch

# generate a 2-D PyTorch tensor (i.e., a matrix)
pytorch_tensor = torch.Tensor(10, 20)
print("type: ", type(pytorch_tensor), " and size: ", pytorch_tensor.shape)
If you need a NumPy-compatible representation, or if you want to create a PyTorch tensor from an existing NumPy array, it is easy:

# convert the PyTorch tensor to a NumPy array
numpy_tensor = pytorch_tensor.numpy()
print("type: ", type(numpy_tensor), " and size: ", numpy_tensor.shape)

# convert the NumPy array back to a PyTorch tensor
print("type: ", type(torch.Tensor(numpy_tensor)), " and size: ", torch.Tensor(numpy_tensor).shape)

1.2 PyTorch vs. NumPy

PyTorch is not a drop-in replacement for NumPy, but it implements much of NumPy's functionality. One inconvenience is its naming conventions, which sometimes differ noticeably from NumPy's. Let's go through a few examples to illustrate the differences:

1. Tensor creation

t = torch.rand(2, 4, 3, 5)
2. Tensor slicing

t = torch.rand(2, 4, 3, 5)
a = t.numpy()

pytorch_slice = t[0, 1:3, :, 4]
numpy_slice = a[0, 1:3, :, 4]

print('Tensor[0, 1:3, :, 4]:\n', pytorch_slice)
print('NdArray[0, 1:3, :, 4]:\n', numpy_slice)

---------------------------------------------------------------------
Tensor[0, 1:3, :, 4]:
 0.2032  0.1594  0.3114
 0.9073  0.6497  0.2826
[torch.FloatTensor of size 2x3]

NdArray[0, 1:3, :, 4]:
 [[0.20322084  0.15935552  0.31143939]
 [0.90726137  0.64966112  0.28259504]]
3. Tensor masking

t = t - 0.5
a = t.numpy()

pytorch_masked = t[t > 0]
numpy_masked = a[a > 0]
4. Tensor reshaping

pytorch_reshaped = t.view(6, 5, 4)
numpy_reshaped = a.reshape(6, 5, 4)

1.3 PyTorch variable

  • A thin wrapper around a PyTorch tensor
  • Helps build the computational graph
  • An essential part of autograd (the automatic differentiation library)
  • Stores the gradients with respect to the variable in .grad (see the short sketch below)

Structure: (the original notebook includes a diagram of a Variable's internal structure here)
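To make these points concrete, here is a minimal sketch (using the pre-0.4 Variable API that this tutorial assumes) of wrapping a tensor in a Variable, running a small computation, and reading the gradient from .grad:

import torch
from torch.autograd import Variable

# wrap a tensor; requires_grad=True asks autograd to track gradients for it
v = Variable(torch.ones(2, 2), requires_grad=True)

# build a tiny computation graph and backpropagate from a scalar
out = (v * 3).sum()
out.backward()

# d(out)/dv = 3 for every element, stored in v.grad
print(v.grad)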

Computation graphs and variables: in PyTorch, a neural network is represented as a computation graph of interconnected variables. PyTorch lets you build the network model by writing code that constructs this computation graph, and then it simplifies the process of estimating the model weights, for example by computing gradients automatically.

For example, suppose we want to build a two-layer model. First create tensor variables for the inputs and outputs by wrapping PyTorch tensors in Variable objects:

from torch.autograd import Variable
import torch.nn.functional as F

x = Variable(torch.randn(4, 1), requires_grad=False)
y = Variable(torch.randn(3, 1), requires_grad=False)
For the weights we will set requires_grad to True, indicating that we want their gradients computed automatically; these gradients are used during backpropagation to optimize the weights. The inputs and targets above do not need gradients, so their requires_grad is False.

Now let’s define the weights:

w1 = Variable(torch.randn(5, 4), requires_grad=True)
w2 = Variable(torch.randn(3, 5), requires_grad=True)
Training model:

def model_forward(x):
    return F.sigmoid(w2 @ F.sigmoid(w1 @ x))

print(w1)
print(w1.data.shape)
print(w1.grad)  # initially there are no gradients

---------------------------------------------------------------------
Variable containing:
 1.6068 -1.3304 -0.6717 -0.6097
-0.3414 -0.5062 -0.2533  1.0260
-0.0341 -1.2144 -1.5983 -0.1392
-0.5473  0.0084  0.4054  ...
[torch.FloatTensor of size 5x4]

torch.Size([5, 4])
None

1.4 PyTorch backpropagation

We now have the inputs, the targets, and the model weights, so it's time to train the model. We need three more components:

Loss function: describes how far the model's predictions are from the target;

import torch.nn as nn

criterion = nn.MSELoss()
Optimization algorithm: used to update the weights;

import torch.optim as optim

optimizer = optim.SGD([w1, w2], lr=0.001)
Backpropagation steps:

for epoch in range(10):
    loss = criterion(model_forward(x), y)
    optimizer.zero_grad()  # zero out the previous gradients
    loss.backward()        # compute the new gradients
    optimizer.step()       # apply the gradients

print(w1)

---------------------------------------------------------------------
Variable containing:
 1.6067 -1.3303 -0.6717 -0.6095
-0.3414 -0.5062 -0.2533  1.0259
-0.0340 -1.2145 -1.5983 -0.1396
-0.5476  0.0085  0.4055  0.0976
 0.3597  0.5986 -0.0324  0.6113
[torch.FloatTensor of size 5x4]

1.5 PyTorch CUDA interface

One of PyTorch’s advantages is that it provides CUDA interfaces for tensor and Autograd libraries. Using a CUDA GPU, you can speed up not only neural network training and inference, but also any workload mapped to the PyTorch tensor.

You can check whether CUDA is available in PyTorch by calling torch.cuda.is_available().

cuda_gpu = torch.cuda.is_available()
if cuda_gpu:
    print("Great, you have a GPU!")
else:
    print("Life is short -- consider a GPU!")
Good, now you have a GPU.

After that, accelerating code with CUDA is as simple as calling .cuda(). If you call .cuda() on a tensor, its data is moved from the CPU to the CUDA GPU. If you call .cuda() on a model, it not only moves all of the model's internal storage to the GPU but also maps the entire computation graph onto the GPU.

To copy a tensor or model back to the CPU, for example to interact with NumPy, call .cpu().

if cuda_gpu:
    x = x.cuda()
    print(type(x.data))

x = x.cpu()
print(type(x.data))

---------------------------------------------------------------------
<class 'torch.cuda.FloatTensor'>
<class 'torch.FloatTensor'>
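The same calls work on whole models. As a small sketch (the nn.Linear module below is just a stand-in for any nn.Module), moving a model between devices looks like this:

import torch
import torch.nn as nn

net = nn.Linear(10, 2)  # any nn.Module behaves the same way

if torch.cuda.is_available():
    net.cuda()  # moves all parameters and buffers to the GPU
    print(next(net.parameters()).is_cuda)  # True

net.cpu()  # moves everything back to the CPU
print(next(net.parameters()).is_cuda)  # False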
Let's define two functions (a training function and a test function) that use our model to perform training and inference. This code is also taken from the official PyTorch tutorial, from which we have picked out all the steps needed for training and inference.

For training and testing the network, we need to perform a series of actions that map directly to the PyTorch code:

1. We switch the model into training or inference mode;

2. We train the model by iterating over the dataset in batches of images;

3. For each batch, we load the data and labels and run the forward pass of the network to obtain the model output;

4. We define a loss function that computes the loss between the model output and the target for each batch;

5. During training, we zero the gradients and use the loss function, optimizer, and backpropagation defined earlier to compute all the gradients of the loss with respect to the model parameters;

6. During training, we perform the weight update step.

def train(model, epoch, criterion, optimizer, data_loader):
    model.train()
    for batch_idx, (data, target) in enumerate(data_loader):
        if cuda_gpu:
            data, target = data.cuda(), target.cuda()
            model.cuda()
        data, target = Variable(data), Variable(target)
        output = model(data)
        optimizer.zero_grad()
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 400 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(data_loader.dataset),
                100. * (batch_idx + 1) / len(data_loader), loss.data[0]))

def test(model, epoch, criterion, data_loader):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in data_loader:
        if cuda_gpu:
            data, target = data.cuda(), target.cuda()
            model.cuda()
        data, target = Variable(data), Variable(target)
        output = model(data)
        test_loss += criterion(output, target).data[0]
        pred = output.data.max(1)[1]  # index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(data_loader)  # average the loss over the mini-batches
    acc = correct / len(data_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(data_loader.dataset), 100. * acc))
    return (acc, test_loss)
Now that we’re done, let’s get started on our data science tour!



2. Use PyTorch for data analysis

  • Build the model using the torch.nn library
  • Train the model using the torch.autograd library
  • Wrap the data into the torch.utils.data.Dataset class
  • Use the NumPy interface to connect your model, your data, and your favorite tools
Before we look at complex models, let's start with a simpler one: linear regression on a simple synthetic dataset, which we can generate with the sklearn tools.

from sklearn.datasets import make_regression
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

sns.set()

x_train, y_train, w_target = make_regression(n_samples=100, n_features=1, noise=10, coef=True)

df = pd.DataFrame(data={'X': x_train.ravel(), 'Y': y_train.ravel()})
sns.lmplot(x='X', y='Y', data=df, fit_reg=True)
plt.show()

x_torch = torch.FloatTensor(x_train)
y_torch = torch.FloatTensor(y_train)
y_torch = y_torch.view(y_torch.size()[0], 1)

PyTorch's nn library has a number of useful modules, one of which is the Linear module. As the name suggests, it performs a linear transformation of its input, i.e. linear regression.

class LinearRegression(torch.nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

model = LinearRegression(1, 1)
To train the linear regression model, we need to add a suitable loss function from the nn library. For linear regression, we will use MSELoss(), the mean squared error loss function.

We also need an optimizer (SGD) and a backpropagation loop similar to the previous example. In essence, we repeat the steps of the train() function defined above. The reason we cannot use that function directly is that it was written for classification rather than regression: it uses a cross-entropy loss and takes the index of the maximum element as the model's prediction, whereas for linear regression we use the output of the linear layer itself as the prediction.

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    data, target = Variable(x_torch), Variable(y_torch)
    output = model(data)

    optimizer.zero_grad()
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

predicted = model(Variable(x_torch)).data.numpy()
Now we can plot the original data together with the line fitted by our PyTorch linear regression.

plt.plot(x_train, y_train, 'o', label='Raw data')
plt.plot(x_train, predicted, label='Fitted line')
plt.legend()
plt.show()

To move on to a more complex model, we download the MNIST dataset into the "Dataset" folder and try out some of the initial preprocessing that PyTorch provides. PyTorch ships with data loaders and processors for various common datasets; once a dataset is downloaded, you can use it at any time. You can also create your own data loader class by wrapping your data into PyTorch tensors.
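As a sketch of that last point (the class and tensor names here are illustrative, not from the original notebook), a custom dataset only needs to implement __len__ and __getitem__ and can then be fed to a DataLoader:

import torch
from torch.utils.data import Dataset, DataLoader

class MyTensorDataset(Dataset):
    # wraps a tensor of inputs and a tensor of labels
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return self.inputs.size(0)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]

inputs = torch.randn(100, 3)                   # 100 samples, 3 features each
labels = torch.LongTensor(100).random_(0, 2)   # 100 binary labels
loader = DataLoader(MyTensorDataset(inputs, labels), batch_size=16, shuffle=True)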

Batch size is a machine learning term referring to the number of training samples used in one iteration. The batch size can take one of three forms (a small sketch contrasting them follows the list):

  • Batch mode: the batch size equals the size of the entire dataset, so one iteration and one epoch are the same thing;
  • Mini-batch mode: the batch size is greater than 1 but smaller than the entire dataset. Usually it is chosen so that it divides the dataset size evenly;
  • Random (stochastic) mode: the batch size equals 1, so the gradient and the neural network parameters are updated after every sample.
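Here is a small sketch of how the three modes map onto DataLoader's batch_size argument (the toy dataset below is hypothetical):

import torch
from torch.utils.data import DataLoader, TensorDataset

toy = TensorDataset(torch.randn(1000, 3), torch.randn(1000, 1))

batch_loader      = DataLoader(toy, batch_size=len(toy))  # batch mode: one update per epoch
mini_batch_loader = DataLoader(toy, batch_size=50)        # mini-batch mode: 20 updates per epoch
stochastic_loader = DataLoader(toy, batch_size=1)         # random mode: 1000 updates per epoch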
from torchvision import datasets, transforms

batch_num_size = 64

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_num_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_num_size, shuffle=True)

3. The LeNet convolutional neural network (CNN) in PyTorch

Now let's create our first simple neural network from scratch. The network performs image classification, recognizing the handwritten digits of the MNIST dataset. It is a four-layer convolutional neural network (CNN), a common architecture for MNIST. The code is adapted from the official PyTorch tutorial; you can find more examples at http://pytorch.org/tutorials/.

We will use several modules from the torch.nn library:

1. Linear: a fully connected layer whose weights apply a linear transformation to the input tensor;

2. Conv1 and Conv2: convolution layers; each output element is the dot product between a convolution kernel (a small weight tensor) and an input region of the same size;

3. ReLU: the rectified linear unit, an element-wise activation function max(0, x);

4. Pooling layer: downsamples the input over a fixed region (typically 2×2 pixels) using the max operation;

5. Dropout2d: randomly zeroes entire channels of the input tensor. When adjacent feature maps are strongly correlated, Dropout2d promotes independence between them;

6. Softmax: applies the log(softmax(x)) function to an n-dimensional input tensor; the softmax itself squashes its outputs into the range (0, 1).

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
After creating the LeNet class, create the object and move it to the GPU:

model = LeNet()
if cuda_gpu:
    model.cuda()

print('MNIST_net model:\n')
print(model)

---------------------------------------------------------------------
MNIST_net model:

LeNet(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)
To train the model, we will use stochastic gradient descent (SGD) with momentum; the learning rate and momentum value are set in the code below.

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
With just 5 epochs (an epoch is one complete pass through the entire training dataset to update the weights), we can train a fairly accurate LeNet model. The code below checks whether a pretrained model file already exists; if it does, the model is loaded from disk, otherwise a new one is trained and saved to disk.

import os

epochs = 5

if os.path.isfile('pretrained/MNIST_net.t7'):
    print('Loading model')
    model.load_state_dict(torch.load('pretrained/MNIST_net.t7',
                                     map_location=lambda storage, loc: storage))
    acc, loss = test(model, 1, criterion, test_loader)
else:
    print('Training model')
    for epoch in range(1, epochs + 1):
        train(model, epoch, criterion, optimizer, train_loader)
        acc, loss = test(model, 1, criterion, test_loader)
    torch.save(model.state_dict(), 'pretrained/MNIST_net.t7')

---------------------------------------------------------------------
Loading model

Test set: Average loss: 0.0471, Accuracy: 9859/10000 (99%)
Now let's look at the model. First, print out the model's information. The print function displays all the layers (such as Dropout, which is implemented as a separate layer) along with their names and parameters. There is also an iterator that runs over all the named modules of the model. This helps when you have a complex network with several "internal" models. Iterating over all named modules lets us build a model parser that reads the model parameters and reconstructs modules similar to the network (a short sketch that reads out the parameters follows the code below).

print('Internal models:')
for idx, m in enumerate(model.named_modules()):
    print(idx, '->', m)
    print('---------------------------------------------------------------------')

---------------------------------------------------------------------
Internal models:
0 -> ('', LeNet(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
))
---------------------------------------------------------------------
1 -> ('conv1', Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1)))
---------------------------------------------------------------------
2 -> ('conv2', Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1)))
---------------------------------------------------------------------
3 -> ('conv2_drop', Dropout2d(p=0.5))
---------------------------------------------------------------------
4 -> ('fc1', Linear(in_features=320, out_features=50, bias=True))
---------------------------------------------------------------------
5 -> ('fc2', Linear(in_features=50, out_features=10, bias=True))
---------------------------------------------------------------------
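In the same spirit, a short sketch of reading out the model's parameters through the named_parameters() iterator (printing only names and shapes):

for name, param in model.named_parameters():
    print(name, '->', tuple(param.size()))

# conv1.weight -> (10, 1, 5, 5)
# conv1.bias   -> (10,)
# conv2.weight -> (20, 10, 5, 5)
# ...and so on for conv2.bias, fc1.weight, fc1.bias, fc2.weight, fc2.bias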
You can use the .cpu() method to move a tensor to the CPU (or to make sure it is already there), or, when a GPU is available (torch.cuda.is_available()), use the .cuda() method to move it to the GPU. You can tell where a tensor lives from its type: a tensor on the GPU has type torch.cuda.FloatTensor, while a tensor on the CPU has type torch.FloatTensor.

print(type(t.cpu().data))

if torch.cuda.is_available():
    print("Cuda is available")
    print(type(t.cuda().data))
else:
    print("Cuda is NOT available")

---------------------------------------------------------------------
<class 'torch.FloatTensor'>
Cuda is available
<class 'torch.cuda.FloatTensor'>
If the tensor is on the CPU, we can convert it to a NumPy array that shares the same memory location; changing one will change the other.

if torch.cuda.is_available():
    try:
        print(t.data.numpy())
    except RuntimeError as e:
        "you can't convert a GPU tensor to a numpy ndarray; you have to copy the tensor to the CPU first and then get the numpy array"

print(type(t.cpu().data.numpy()))
print(t.cpu().data.numpy().shape)
print(t.cpu().data.numpy())
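To see the shared memory in action, here is a tiny sketch (the tensor names are illustrative) showing that an in-place change to a CPU tensor is visible through the NumPy array created from it:

cpu_t = torch.ones(3)
arr = cpu_t.numpy()   # shares memory with cpu_t

cpu_t.add_(1)         # in-place update of the tensor...
print(arr)            # ...shows up in the NumPy array: [ 2.  2.  2.]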
Now that we know how to convert tensors into NumPy arrays, we can use that knowledge to visualize them with Matplotlib. Let's plot the convolution filters of the first convolution layer.

data = model.conv1.weight.cpu().data.numpy()
print(data.shape)

kernel_num = data.shape[0]
fig, axes = plt.subplots(ncols=kernel_num, figsize=(2 * kernel_num, 2))

for col in range(kernel_num):
    axes[col].imshow(data[col, 0, :, :], cmap=plt.cm.gray)

plt.show()

This has been a brief tour of the tutorial resources; more content and experiments can be found in the original project.