This is the seventh day of my participation in the First Challenge 2022

Today let's talk about a few PyTorch tricks that you may or may not have used before. If any of these tips help you use or learn PyTorch, please give the post a thumbs up.

Specify the device directly when creating a tensor

Specifying the device at tensor creation time is much faster than creating the tensor on the CPU first and then moving it to another device.

import time
import torch

start_time = time.time()

for _ in range(100):
  cpu_tensor = torch.ones((1000, 64, 64))
  gpu_tensor = cpu_tensor.cuda()

print("total time:{:.3f}".format(time.time() - start_time))

total time:10.020

Now create the tensor directly on the GPU instead:

start_time = time.time()
for _ in range(100):
  gpu_tensor = torch.ones((1000, 64, 64), device='cuda')
print("total time:{:.3f}".format(time.time() - start_time))

total time:0.015
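
In practice, a common pattern is to pick the device once and pass it explicitly whenever a tensor is created; a small sketch of that idiom:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.ones((1000, 64, 64), device=device)  # created directly on the chosen device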

Use nn.Sequential for cleaner model code

First, let's define a simple network in the usual layer-by-layer way:

from torch import nn

class ExampleModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size,hidden_size)
    self.input_activation = nn.ReLU()

    self.mid_layer = nn.Linear(hidden_size,hidden_size)
    self.mid_activation = nn.ReLU()

    self.output_size = nn.Linear(hidden_size,output_size)

  def forward(self,x):
    z = self.input_layer(x)
    z = self.input_activation(z)

    z = self.mid_layer(z)
    z = self.mid_activation(z)

    out = self.output_size(z)

    return out

Defining the network this way is fairly straightforward: it is a simple structure made up of an input layer, a hidden layer, and an output layer, with ReLU as the activation function.

example_model = ExampleModel()
print(example_model)
print('Output: shape {}'.format(example_model(torch.ones([100, 2])).shape))

ExampleModel(
  (input_layer): Linear(in_features=2, out_features=16, bias=True)
  (input_activation): ReLU()
  (mid_layer): Linear(in_features=16, out_features=16, bias=True)
  (mid_activation): ReLU()
  (output_size): Linear(in_features=16, out_features=3, bias=True)
)
Output: shape torch.Size([100, 3])

While this works, it is not the best way. A better approach is to use nn.Sequential to group these layers together, which makes the code look much cleaner.

class ExampleSequential(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.layers = nn.Sequential(
      nn.Linear(input_size,hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size,hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size,output_size)
    )

  def forward(self,x):
    out = self.layers(x)
    return out

example_sequential = ExampleSequential()
print(example_sequential)
print('Output: shape {}'.format(example_sequential(torch.ones([100, 2])).shape))

In forward we only need to call the nn.Sequential container once instead of calling each layer one by one. And when we print the network structure, we can see that the individual layers are grouped inside the Sequential container.

ExampleSequential(
  (layers): Sequential(
    (0): Linear(in_features=2, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=3, bias=True)
  )
)
Output: shape torch.Size([100, 3])

Using a plain Python list as a layer container brings potential pitfalls

class ExampleListModel(nn.Module):
  def __init__(self):
    super().__init__()
    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size,hidden_size)
    self.input_activation = nn.ReLU()

    # The hidden layers are collected in a plain Python list
    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size,hidden_size))
      self.mid_layers.append(nn.ReLU())

    self.output_layer = nn.Linear(hidden_size,output_size)

  def forward(self,x):
    z = self.input_layer(x)
    z = self.input_activation(z)

    for layer in self.mid_layers:
      z = layer(z)

    out = self.output_layer(z)

    return out

This code is not very different from the example above, except that five hidden layers are now stacked in the middle. Since they all have the same structure, they are appended to a Python list and then called by iterating over the list in forward. This looks fine, and it does run.

example_list_model = ExampleListModel()
print(example_list_model)
print('Output: shape {}'.format(example_list_model(torch.ones([100, 2])).shape))

But if we look at the output, the five Linear layers that were put into the list seem to be missing. The model cannot manage or track layers added to a plain Python list, because they are never registered with the model. This causes a concrete problem: when the model is moved to another device, the change is not applied to these layers.


ExampleListModel(
  (input_layer): Linear(in_features=2, out_features=16, bias=True)
  (input_activation): ReLU()
  (output_layer): Linear(in_features=16, out_features=3, bias=True)
)
Output: shape torch.Size([100, 3])
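
Because those layers live in a plain Python list, they are also missing from parameters() and state_dict(), so an optimizer built from model.parameters() would silently skip them. A quick check along these lines makes that visible:

# Only input_layer and output_layer are registered, so only their
# weights and biases show up; the 5 Linear layers in the list do not.
print(len(list(example_list_model.parameters())))  # expected: 4
print(example_list_model.state_dict().keys())      # no 'mid_layers' entries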

The problem shows up when we want to run the model on the GPU. Even though we create the input tensor on the CUDA device and move the model there by calling .cuda() on the model instance, the hidden layers stored in the list were never registered with the model, so they stay on the CPU and an error is raised.

gpu_input = torch.ones([100, 2], device='cuda')
gpu_example_list_model = example_list_model.cuda()
print("Output: shape {}".format(gpu_example_list_model(gpu_input).shape))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

The problem can be solved simply by making the following change to the code above and introducing nn.Sequential:

class ExampleListModel(nn.Module):
  def __init__(self):
    super().__init__()
    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size,hidden_size)
    self.input_activation = nn.ReLU()

    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size,hidden_size))
      self.mid_layers.append(nn.ReLU())
    # Wrapping the list in nn.Sequential registers the layers with the model
    self.mid_layers = nn.Sequential(*self.mid_layers)

    self.output_layer = nn.Linear(hidden_size,output_size)

  def forward(self,x):
    z = self.input_layer(x)
    z = self.input_activation(z)
    z = self.mid_layers(z)
    # for layer in self.mid_layers:
    #   z = layer(z)

    out = self.output_layer(z)

    return out

example_list_model = ExampleListModel()
print(example_list_model)
print('Output: shape {}'.format(example_list_model(torch.ones([100, 2])).shape))

ExampleListModel(
  (input_layer): Linear(in_features=2, out_features=16, bias=True)
  (input_activation): ReLU()
  (mid_layers): Sequential(
    (0): Linear(in_features=16, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=16, bias=True)
    (5): ReLU()
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Linear(in_features=16, out_features=16, bias=True)
    (9): ReLU()
  )
  (output_layer): Linear(in_features=16, out_features=3, bias=True)
)
Output: shape torch.Size([100, 3])
gpu_input = torch.ones([100, 2], device='cuda')
gpu_example_list_model = example_list_model.cuda()
print("Output: shape {}".format(gpu_example_list_model(gpu_input).shape))

Output: shape torch.Size([100, 3])
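
As an aside, if you would rather keep iterating over the layers yourself in forward, nn.ModuleList also registers the layers properly. A minimal sketch of that variant (the class name here is just for illustration):

class ExampleModuleListModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.input_layer = nn.Linear(2, 16)
    self.input_activation = nn.ReLU()
    # nn.ModuleList registers every layer, unlike a plain Python list
    mid_layers = []
    for _ in range(5):
      mid_layers.append(nn.Linear(16, 16))
      mid_layers.append(nn.ReLU())
    self.mid_layers = nn.ModuleList(mid_layers)
    self.output_layer = nn.Linear(16, 3)

  def forward(self, x):
    z = self.input_activation(self.input_layer(x))
    for layer in self.mid_layers:  # a ModuleList is still iterable in forward
      z = layer(z)
    return self.output_layer(z)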

The useful torch.distributions module

The torch.distributions package provides a number of probability distribution classes. They are rarely used in everyday code, but it is worth knowing they exist so you can reach for them when you need them.

example_model = ExampleModel()
input_tensor = torch.rand(3, 2)
output = example_model(input_tensor)

One handy use is sampling an action from the model output, which is a common pattern in reinforcement learning.
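
A minimal sketch, assuming a Categorical distribution over the model's three outputs and continuing from the code above:

from torch import distributions

# Treat each row of the output as unnormalized log-probabilities over 3 actions
dist = distributions.Categorical(logits=output)
action = dist.sample()            # one sampled action per input row, shape [3]
log_prob = dist.log_prob(action)  # log-probability of each sampled action, useful for policy gradients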

How to use detach() and item()

example_model = ExampleModel()
data_batches = [torch.rand((10, 2)) for _ in range(5)]
criterion = nn.MSELoss(reduction='mean')

losses = []
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output,target)
  losses.append(loss)

print(losses)

The output losses are full tensors, and each one still carries a grad_fn, which keeps the whole computation graph of that batch alive in memory:

[tensor(0.2293, grad_fn=<MseLossBackward0>), tensor(0.3611, grad_fn=<MseLossBackward0>), tensor(0.2629, grad_fn=<MseLossBackward0>), tensor(0.3523, grad_fn=<MseLossBackward0>), tensor(0.3340, grad_fn=<MseLossBackward0>)]

If we append loss.item() instead, only plain Python floats are stored and the computation graph of each batch can be freed:

losses = []
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output,target)
  losses.append(loss.item())

print(losses)

[0.28973084688186646, 0.28803274035453796, 0.22044895589351654, 0.18474963307380676, 0.19146400690078735]
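
detach() is the related tool when you want to keep the values as a tensor but cut them off from the computation graph; a minimal sketch:

losses = []
for batch in data_batches:
  output = example_model(batch)
  target = torch.rand((10, 3))
  loss = criterion(output, target)
  # detach() returns the same values as a tensor without a grad_fn,
  # so the graph for this batch can still be freed
  losses.append(loss.detach())

print(losses)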

How do I remove the model from the GPU

import gc

example_model = ExampleModel().cuda()
del example_model          # drop the Python reference to the model

gc.collect()               # make sure the objects are actually collected
torch.cuda.empty_cache()   # release cached GPU memory back to the driver
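
If you want to confirm that the memory really was released, torch.cuda.memory_allocated() and torch.cuda.memory_reserved() can be compared before and after the cleanup; a rough check:

print(torch.cuda.memory_allocated())  # bytes currently occupied by tensors, should drop after the cleanup
print(torch.cuda.memory_reserved())   # bytes still held by PyTorch's caching allocator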

You need to call eval() before you validate, so that layers such as Dropout and BatchNorm switch to their inference behaviour.

example_model = ExampleModel()

# Do training

example_model.eval()

# Do testing
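
A minimal validation sketch putting this together (wrapping the loop in torch.no_grad() is an extra detail that also avoids building graphs and saves memory):

example_model = ExampleModel()
val_batches = [torch.rand((10, 2)) for _ in range(3)]

example_model.eval()              # Dropout/BatchNorm layers switch to inference mode
with torch.no_grad():             # no computation graphs are built during validation
  for batch in val_batches:
    predictions = example_model(batch)

example_model.train()             # switch back before resuming training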