Original: www.reddit.com/r/MachineLe…

A few helpful PyTorch tips (examples included)

Translation by KBSC13

Contact information:

GitHub: github.com/ccc013/AI_a…

WeChat Official Account: AI algorithm notes


Preface

This comes from the Machine Learning subreddit, where someone summarized 7 useful PyTorch tips, along with a Colab notebook of code examples and a video. The code and video links are as follows:

Code: colab.research.google.com/drive/15vGz…

Video: youtu.be/BoC8SGaT3GE

The video has also been uploaded to my Bilibili account; the link is as follows:

www.bilibili.com/video/BV1YK…

In addition, the code and video can be obtained by replying "12" to the WeChat official account.


1. Create tensors directly on the target device

The first technique is to create a tensor directly on the target device using the device parameter.

The first approach creates tensors on the CPU and then moves them to the GPU using .cuda(), as follows:

import time
import torch

start_time = time.time()

# Creating on the CPU, then transferring to the GPU
for _ in range(100):
  cpu_tensor = torch.ones((1000, 64, 64))
  gpu_tensor = cpu_tensor.cuda()

print('Total time: {:.3f}s'.format(time.time() - start_time))

The second is to create the tensors directly on the target device as follows:

start_time = time.time()

# Creating on the GPU directly
for _ in range(100):
  gpu_tensor = torch.ones((1000, 64, 64), device='cuda')

print('Total time: {:.3f}s'.format(time.time() - start_time))

The running times of the two methods are as follows:

You can see that creating tensors directly on the target device is much faster.

2. Use Sequential whenever possible

The second trick is to use Sequential layers to make the code look more concise.

The code for the first network model is as follows:

from torch import nn

class ExampleModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    self.mid_layer = nn.Linear(hidden_size, hidden_size)
    self.mid_activation = nn.ReLU()

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)

    z = self.mid_layer(z)
    z = self.mid_activation(z)

    out = self.output_layer(z)

    return out

The result of running it is as follows:

The same network model written with Sequential looks like this:

class ExampleSequentialModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.layers = nn.Sequential(
      nn.Linear(input_size, hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size, hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size, output_size))

  def forward(self, x):
    out = self.layers(x)
    return out

The result of running it is as follows:

You can see that building the network model with nn.Sequential is much more concise.

3. Do not use lists to store network layers

The third tip is that it is not recommended to hold created network layers in a plain Python list, because the nn.Module class cannot register them; as a result, calls such as .cuda() and .parameters() will not see those layers. Instead, the list should be passed into nn.Sequential.

First, an example of the mistake:

class BadListModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    # Fairly common when using residual layers
    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
      self.mid_layers.append(nn.ReLU())

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)

    for layer in self.mid_layers:
      z = layer(z)

    out = self.output_layer(z)

    return out

bad_list_model = BadListModel()
print('Output shape:', bad_list_model(torch.ones([100, 2])).shape)
gpu_input = torch.ones([100, 2], device='cuda')
gpu_bad_list_model = bad_list_model.cuda()
print('Output shape:', bad_list_model(gpu_input).shape)

The second print statement raises an error, because .cuda() does not move the layers held in the plain Python list, so they stay on the CPU while the input is on the GPU:

The correct way to write it:

class CorrectListModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    # Fairly common when using residual layers
    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
      self.mid_layers.append(nn.ReLU())
    self.mid_layers = nn.Sequential(*self.mid_layers)

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)
    z = self.mid_layers(z)
    out = self.output_layer(z)

    return out

correct_list_model = CorrectListModel()
gpu_input = torch.ones([100, 2], device='cuda')
gpu_correct_list_model = correct_list_model.cuda()
print('Output shape:', correct_list_model(gpu_input).shape)

The printed result:
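As a side note that is not in the original post: if you need to iterate over or index the layers yourself rather than run them strictly in order, nn.ModuleList gives the same registration guarantee as nn.Sequential. A minimal sketch, with a made-up class name and sizes chosen purely for illustration:

import torch
from torch import nn

class ModuleListModel(nn.Module):
  def __init__(self, hidden_size=16):
    super().__init__()
    # nn.ModuleList registers every layer, so .cuda() and .parameters() see them
    self.mid_layers = nn.ModuleList()
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
      self.mid_layers.append(nn.ReLU())

  def forward(self, x):
    # Unlike nn.Sequential, we control the loop ourselves
    for layer in self.mid_layers:
      x = layer(x)
    return x

module_list_model = ModuleListModel()
print('Has registered parameters:', len(list(module_list_model.parameters())) > 0)
print('Output shape:', module_list_model(torch.ones([100, 16])).shape)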

4. Make good use of torch.distributions

The fourth tip concerns PyTorch's torch.distributions package. It contains some nice objects and methods for working with probability distributions, but it is not widely used. The documentation link:

Pytorch.org/docs/stable…

Here’s an example of how to use it:
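The example from the original notebook and video is not reproduced in the text, so the following is a minimal sketch of my own; the choice of Categorical and Normal distributions (and the probabilities used) is purely illustrative and not taken from the original:

import torch
from torch.distributions import Categorical, Normal

# A categorical distribution over 4 classes (the probabilities are made up)
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])
dist = Categorical(probs=probs)
sample = dist.sample()                     # draw a class index
print('Sample:', sample)
print('Log prob:', dist.log_prob(sample))  # log-probability of that sample
print('Entropy:', dist.entropy())

# A Gaussian with reparameterized sampling (gradients can flow through loc/scale)
normal = Normal(loc=torch.zeros(3), scale=torch.ones(3))
z = normal.rsample()
print('rsample:', z)
print('Log prob:', normal.log_prob(z).sum())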

5. Use detach() for long-term metrics

The fifth trick is to use .detach() to prevent memory leaks if you need to store tensor metrics between epochs; without it, each stored tensor keeps its entire computation graph alive.

Let’s use a code example to illustrate this, starting with the initial configuration:

# Setup
example_model = ExampleModel()
data_batches = [torch.rand((10, 2)) for _ in range(5)]
criterion = nn.MSELoss(reduction='mean')

An example of incorrect code:

losses = []

# Training loop
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output, target)
  losses.append(loss)

  # Optimization happens here

print(losses)

The print result is as follows; note that each stored loss is still a tensor carrying its grad_fn, which keeps the whole computation graph in memory:

The correct way to write it:

losses = []

# Training loop
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output, target)
  losses.append(loss.detach()) # Or `loss.item()`

  # Optimization happens here

print(losses)

The print result is as follows:

Here .detach() (or .item()) should be called so that only the loss value of each epoch is kept, not the attached computation graph.

6. Tips for deleting models on the GPU

The sixth tip is that you can clean up the GPU cache using the torch.cuda.empty_cache() method. This is useful when working in a notebook, especially if you want to delete a large model and recreate it.

The following is an example:

import gc

example_model = ExampleModel().cuda()

del example_model

gc.collect()
# The model will normally stay in the cache until something takes its place
torch.cuda.empty_cache()

7. Call eval() before testing

Finally, don’t forget to call model.eval() before you start testing. This is simple but easy to forget. The call switches the behavior of network layers that are set up differently during the training and evaluation phases. The affected modules include:

  • Dropout
  • Batch Normalization
  • RNNs
  • Lazy Variants

For reference, see: stackoverflow.com/questions/6…

The following is an example:

example_model = ExampleModel()

# Do training

example_model.eval()

# Do testing

example_model.train()

# Do training again