Cut the chatter, show the code! A compilation of nearly 100 PyTorch code snippets across 7 areas

Author: @Atomic A Qiang, Harbin Engineering University

The code in this article is based on PyTorch 1.0 and requires the following packages:

import collections
import os
import shutil
import tqdm
import numpy as np
import PIL.Image
import torch
import torchvision

1. Basic configuration

Check out the PyTorch version

torch.__version__               # PyTorch version
torch.version.cuda              # Corresponding CUDA version
torch.backends.cudnn.version()  # Corresponding cuDNN version
torch.cuda.get_device_name(0)   # GPU type

Fix the random seed

torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

Run the program on specified GPU cards

Specify environment variables on the command line

CUDA_VISIBLE_DEVICES=0,1 python train.py

Or specified in code

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

Check whether CUDA support is available

torch.cuda.is_available()
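
A common follow-up idiom (an added sketch, not from the original) is to select the device once based on this check and move the model and data onto it

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
tensor = tensor.to(device)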

Set cuDNN benchmark mode

Benchmark mode improves computation speed, but because of randomness in the computation, the network's feedforward results may differ slightly on each run.

torch.backends.cudnn.benchmark = True


To avoid this fluctuation in results, set

torch.backends.cudnn.deterministic = True

Clearing GPU memory

Sometimes GPU memory is not released in time after the program is stopped with Ctrl+C, and you need to clear it manually. Inside PyTorch you can run

torch.cuda.empty_cache()

Or on the command line, you can use ps to find the PID of the program and then use kill to end the process

ps aux | grep python
kill -9 [pid]

Or directly reset a GPU whose memory has not been released

nvidia-smi --gpu-reset -i [gpu_id]

2. Tensor processing

Basic tensor information

tensor.type()   # Data type
tensor.size()   # Shape of the tensor; a subclass of Python tuple
tensor.dim()    # Number of dimensions

Data type conversion

# Set default tensor type. Float in PyTorch is much faster than double.
torch.set_default_tensor_type(torch.FloatTensor)

Type conversions

tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

Conversion between torch.Tensor and np.ndarray

# torch.Tensor -> np.ndarray.
ndarray = tensor.cpu().numpy()

# np.ndarray -> torch.Tensor.
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()  # If ndarray has negative stride

Conversion between torch.Tensor and PIL.Image

Tensors in PyTorch are in the order of N×D×H×W by default, and the data range is [0, 1], requiring transpose and normalization.

# torch.Tensor -> PIL.Image.
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255
    ).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalent way
# PIL.Image -> torch.Tensor.
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))
    ).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way

Conversion between np.ndarray and PIL.Image

# np.ndarray -> PIL.Image.
image = PIL.Image.fromarray(ndarray.astype(np.uint8))
# PIL.Image -> np.ndarray.
ndarray = np.asarray(PIL.Image.open(path))

Extract values from tensors containing only one element

This is especially useful when recording the loss during training. Otherwise the computation graph keeps accumulating, and the GPU memory footprint grows larger and larger.

value = tensor.item()

Tensor reshaping

Tensor reshaping is often needed when feeding convolutional-layer features into a fully connected layer. Compared with torch.view, torch.reshape can automatically handle input tensors that are not contiguous.

tensor = torch.reshape(tensor, shape)

Shuffle the order

tensor = tensor[torch.randperm(tensor.size(0))]  # Shuffle the first dimension

Horizontal flipping

PyTorch does not support negative-stride operations like tensor[::-1]. Horizontal flipping can be implemented with tensor indexing.

# Assume tensor has shape N*D*H*W.
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]

Copying tensors

There are three ways to copy a tensor, each suited to different needs.

# Operation | New/Shared memory | Still in computation graph |
tensor.clone()            # | New | Yes |
tensor.detach()           # | Shared | No |
tensor.detach().clone()   # | New | No |

Concatenating tensors

Note the difference between torch.cat and torch.stack: torch.cat concatenates along a given dimension, while torch.stack inserts a new dimension. For example, given three 10×5 tensors, torch.cat produces a 30×5 tensor, while torch.stack produces a 3×10×5 tensor.

tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)
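
A quick sanity check of the shapes described above (an added example)

list_of_tensors = [torch.rand(10, 5) for _ in range(3)]
assert torch.cat(list_of_tensors, dim=0).size() == (30, 5)
assert torch.stack(list_of_tensors, dim=0).size() == (3, 10, 5)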

Convert integer labels to one-hot encodings

Labels in PyTorch start from 0 by default.

N = tensor.size(0)
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1), src=torch.ones(N, num_classes).long())

Get non-zero/zero elements

torch.nonzero(tensor)               # Index of non-zero elements
torch.nonzero(tensor == 0)          # Index of zero elements
torch.nonzero(tensor).size(0)       # Number of non-zero elements
torch.nonzero(tensor == 0).size(0)  # Number of zero elements

Test whether two tensors are equal

torch.allclose(tensor1, tensor2)  # float tensor
torch.equal(tensor1, tensor2)     # int tensor

Tensor expansion

# Expand tensor of shape 64*512 to shape 64*512*7*7.
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)

Matrix multiplication

# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p).
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2

Calculate the pairwise Euclidean distance between two sets of data

# X1 is of shape m*d, X2 is of shape n*d.
dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2))

3. Model definition

Convolution layer

The most common convolution layer configuration is

conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=True)
conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=True)

If the convolutional layer configuration is complicated and it is not convenient to calculate the output size by hand, the following visualization tool can help

Convolution Visualizer: https://ezyang.github.io/convolution-visualizer/index.html
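
Alternatively, a small helper can compute the output size directly (an added sketch; the formula follows the torch.nn.Conv2d documentation)

def conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    # Output-size formula from the torch.nn.Conv2d documentation.
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

assert conv2d_output_size(224, kernel_size=3, stride=1, padding=1) == 224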

GAP (Global Average pooling) layer

gap = torch.nn.AdaptiveAvgPool2d(output_size=1)

Bilinear pooling [1]

X = torch.reshape(X, (N, D, H * W))                   # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)  # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)   # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                  # L2 normalization

Multi-GPU synchronized Batch Normalization (SyncBN)

When running on multiple GPU cards, PyTorch's BN layer by default computes the mean and standard deviation independently on each card. Synchronized BN uses the data on all GPU cards together to compute the BN layer's mean and standard deviation, which alleviates inaccurate estimates when the batch size is small, and is an effective trick for improving performance in tasks such as object detection.

https://github.com/vacancy/Synchronized-BatchNorm-PyTorch

PyTorch now officially supports synchronous BN operations

sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1,
                                 affine=True, track_running_stats=True)

Change all BN layers of existing networks to synchronous BN layers

def convertBNtoSyncBN(module, process_group=None):
    """Recursively replace all BN layers with SyncBN layers.

    Args:
        module[torch.nn.Module]: network containing BN layers.
    """
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum, 
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            sync_bn.weight = module.weight.clone().detach()
            sync_bn.bias = module.bias.clone().detach()
        return sync_bn
    else:
        for name, child_module in module.named_children():
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module

BN-like moving average

To implement a BN-like moving average, update the moving average with an in-place operation in the forward function.

class BN(torch.nn.Module):
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)

Count the total number of model parameters

num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())

Output model information with a Keras-like model.summary()

https://github.com/sksq96/pytorch-summary
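
A typical usage sketch (an added example; assumes the package above is installed, e.g. via pip install torchsummary)

from torchsummary import summary

model = torchvision.models.resnet18(pretrained=True).cuda()
summary(model, input_size=(3, 224, 224))  # Prints per-layer output shapes and parameter counts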

Model weight initialization

Note the difference between model.modules() and model.children(): model.modules() iterates over all submodules of the model recursively, while model.children() iterates only over the model's immediate children.

# Common practice for initialization.
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

# Initialization with a given tensor.
layer.weight = torch.nn.Parameter(tensor)

Load the model saved on the GPU to the CPU

model.load_state_dict(torch.load('model.pth', map_location='cpu'))

4. Data preparation, feature extraction and fine tuning

Image Shuffle/Region Confusion Mechanism (RCM) [2]

# X is torch.Tensor of size N*D*H*W.
# Shuffle rows
Q = (torch.unsqueeze(torch.arange(num_blocks), dim=1) * torch.ones(1, num_blocks).long()
     + torch.randint(low=-neighbour, high=neighbour, size=(num_blocks, num_blocks)))
Q = torch.argsort(Q, dim=0)
assert Q.size() == (num_blocks, num_blocks)

X = [torch.chunk(row, chunks=num_blocks, dim=2)
     for row in torch.chunk(X, chunks=num_blocks, dim=1)]
X = [[X[Q[i, j].item()][j] for j in range(num_blocks)]
     for i in range(num_blocks)]

# Shuffle columns.
Q = (torch.ones(num_blocks, 1).long() * torch.unsqueeze(torch.arange(num_blocks), dim=0)
     + torch.randint(low=-neighbour, high=neighbour, size=(num_blocks, num_blocks)))
Q = torch.argsort(Q, dim=1)
assert Q.size() == (num_blocks, num_blocks)
X = [[X[i][Q[i, j].item()] for j in range(num_blocks)]
     for i in range(num_blocks)]

Y = torch.cat([torch.cat(row, dim=2) for row in X], dim=1)

Get basic information of video data

import cv2

video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()

TSN: sample one frame from each segment [3]

K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment.
        frame_indices = num_frames // K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((                              
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]

Extract convolutional features from one layer of an ImageNet-pretrained model

# VGG-16 relu5-3 feature.
model = torchvision.models.vgg16(pretrained=True).features[:-1]
# VGG-16 pool5 feature.
model = torchvision.models.vgg16(pretrained=True).features
# VGG-16 fc7 feature.
model = torchvision.models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-3])
# ResNet GAP feature.
model = torchvision.models.resnet18(pretrained=True)
model = torch.nn.Sequential(collections.OrderedDict(
    list(model.named_children())[:-1]))

with torch.no_grad():
    model.eval()
    conv_representation = model(image)

Extract convolutional features from multiple layers of an ImageNet-pretrained model

class FeatureExtractor(torch.nn.Module):
    """Helper class to extract several convolution features from the given pre-trained model. Attributes: _model, torch.nn.Module. _layers_to_extract, list
       
         or set
        
          Example: >>> model = torchvision.models.resnet152(pretrained=True) >>> model = torch.nn.Sequential(collections.OrderedDict( list(model.named_children())[:-1])) >>> conv_representation = FeatureExtractor( pretrained_model=model, layers_to_extract={'layer1', 'layer2', 'layer3', 'layer4'})(image) "
        
       ""
    def __init__(self, pretrained_model, layers_to_extract):
        torch.nn.Module.__init__(self)
        self._model = pretrained_model
        self._model.eval()
        self._layers_to_extract = set(layers_to_extract)
    
    def forward(self, x):
        with torch.no_grad():
            conv_representation = []
            for name, layer in self._model.named_children():
                x = layer(x)
                if name in self._layers_to_extract:
                    conv_representation.append(x)
            return conv_representation

Fine-tune only the fully connected layer

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(512, 100)  # Replace the last fc layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)

Fine-tune the fully connected layer with a large learning rate and the convolutional layers with a small learning rate

model = torchvision.models.resnet18(pretrained=True)
finetuned_parameters = list(map(id, model.fc.parameters()))
conv_parameters = (p for p in model.parameters() if id(p) not in finetuned_parameters)
parameters = [{'params': conv_parameters, 'lr': 1e-3}, 
              {'params': model.fc.parameters()}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

5. Model training

Commonly used training and validation data preprocessing

ToTensor converts a PIL.Image or np.ndarray of shape H×W×D and range [0, 255] to a torch.Tensor of shape D×H×W and range [0, 1].

train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224, scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])
val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

Basic training code framework

for t in range(80):
    for images, labels in tqdm.tqdm(train_loader, desc='Epoch %3d' % (t + 1)):
        images, labels = images.cuda(), labels.cuda()
        scores = model(images)
        loss = loss_function(scores, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Label smoothing [4]

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    N = labels.size(0)
    # C is the number of classes.
    smoothed_labels = torch.full(size=(N, C), fill_value=0.1 / (C - 1)).cuda()
    smoothed_labels.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1), value=0.9)

    score = model(images)
    log_prob = torch.nn.functional.log_softmax(score, dim=1)
    loss = -torch.sum(log_prob * smoothed_labels) / N
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Mixup[5]

beta_distribution = torch.distributions.beta.Beta(alpha, alpha)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # Mixup images.
    lambda_ = beta_distribution.sample([]).item()
    index = torch.randperm(images.size(0)).cuda()
    mixed_images = lambda_ * images + (1 - lambda_) * images[index, :]

    # Mixup loss.    
    scores = model(mixed_images)
    loss = (lambda_ * loss_function(scores, labels) 
            + (1 - lambda_) * loss_function(scores, labels[index]))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

L1 regularization

l1_regularization = torch.nn.L1Loss(reduction='sum')
loss = ...  # Standard cross-entropy loss
for param in model.parameters():
    loss += lambda_ * torch.sum(torch.abs(param))  # lambda_ is the regularization weight
loss.backward()

No L2 regularization / weight decay for bias terms

bias_list = (param for name, param in model.named_parameters() if name[-4:] == 'bias')
others_list = (param for name, param in model.named_parameters() if name[-4:] != 'bias')
parameters = [{'params': bias_list, 'weight_decay': 0},
              {'params': others_list}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

Gradient clipping

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)

Calculate the accuracy of Softmax output

score = model(images)
prediction = torch.argmax(score, dim=1)
num_correct = torch.sum(prediction == labels).item()
accuracy = num_correct / labels.size(0)

Visualize the model's forward computation graph

szagoruyko/pytorchviz
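
A usage sketch (an added example; assumes the package above is installed, e.g. via pip install torchviz)

from torchviz import make_dot

x = torch.randn(1, 3, 224, 224)
dot = make_dot(model(x), params=dict(model.named_parameters()))
dot.render('model_graph')  # Renders the graph via graphviz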

Get the current learning rate

# If there is one global learning rate (which is the common case).
lr = next(iter(optimizer.param_groups))['lr']

# If there are multiple learning rates for different layers.
all_lr = []
for param_group in optimizer.param_groups:
    all_lr.append(param_group['lr'])

Learning rate decay

# Reduce learning rate when validation accuracy plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=5, verbose=True)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step(val_acc)

# Cosine annealing learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)

# Reduce learning rate by 10 at given epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
for t in range(0, 80):
    scheduler.step()
    train(...)
    val(...)

# Learning rate warmup by 10 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: t / 10)
for t in range(0, 10):
    scheduler.step()
    train(...)
    val(...)

Save and load checkpoints

Note that in order to be able to resume training, we need to save both the state of the model and the optimizer, along with the current number of training rounds.

# Save checkpoint.
is_best = current_acc > best_acc
best_acc = max(best_acc, current_acc)
checkpoint = {
    'best_acc': best_acc,    
    'epoch': t + 1,
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
model_path = os.path.join('model', 'checkpoint.pth.tar')
best_model_path = os.path.join('model', 'best_checkpoint.pth.tar')
torch.save(checkpoint, model_path)
if is_best:
    shutil.copy(model_path, best_model_path)

# Load checkpoint.
if resume:
    model_path = os.path.join('model', 'checkpoint.pth.tar')
    assert os.path.isfile(model_path)
    checkpoint = torch.load(model_path)
    best_acc = checkpoint['best_acc']
    start_epoch = checkpoint['epoch']
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print('Load checkpoint at epoch %d.' % start_epoch)

Compute accuracy, precision, and recall

# data['label'] and data['prediction'] are groundtruth label and prediction 
# for each image, respectively.
accuracy = np.mean(data['label'] == data['prediction']) * 100

# Compute precision and recall for each class.
for c in range(num_classes):
    tp = np.dot((data['label'] == c).astype(int),
                (data['prediction'] == c).astype(int))
    tp_fp = np.sum(data['prediction'] == c)
    tp_fn = np.sum(data['label'] == c)
    precision = tp / tp_fp * 100
    recall = tp / tp_fn * 100

6. Model testing

Compute per-class precision, recall, F1 and the overall micro-averaged metrics

import sklearn.metrics

all_label = []
all_prediction = []
for images, labels in tqdm.tqdm(data_loader):
     # Data.
     images, labels = images.cuda(), labels.cuda()
     
     # Forward pass.
     score = model(images)
     
     # Save label and predictions.
     prediction = torch.argmax(score, dim=1)
     all_label.append(labels.cpu().numpy())
     all_prediction.append(prediction.cpu().numpy())

# Compute RP and confusion matrix.
all_label = np.concatenate(all_label)
assert len(all_label.shape) == 1
all_prediction = np.concatenate(all_prediction)
assert all_label.shape == all_prediction.shape
micro_p, micro_r, micro_f1, _ = sklearn.metrics.precision_recall_fscore_support(
     all_label, all_prediction, average='micro', labels=range(num_classes))
class_p, class_r, class_f1, class_occurence = sklearn.metrics.precision_recall_fscore_support(
     all_label, all_prediction, average=None, labels=range(num_classes))
# Ci,j = #{y=i and hat_y=j}
confusion_mat = sklearn.metrics.confusion_matrix(
     all_label, all_prediction, labels=range(num_classes))
assert confusion_mat.shape == (num_classes, num_classes)

Write various results to a spreadsheet

import csv

# Write results onto disk.
with open(os.path.join(path, filename), 'wt', encoding='utf-8') as f:
     f = csv.writer(f)
     f.writerow(['Class', 'Label', '# occurrence', 'Precision', 'Recall', 'F1',
                 'Confused class 1', 'Confused class 2', 'Confused class 3',
                 'Confused class 4', 'Confused class 5'])
     for c in range(num_classes):
         index = np.argsort(confusion_mat[:, c])[::-1][:5]
         f.writerow([
             label2class[c], c, class_occurence[c], '%4.3f' % class_p[c],
             '%4.3f' % class_r[c], '%4.3f' % class_f1[c],
             '%s:%d' % (label2class[index[0]], confusion_mat[index[0], c]),
             '%s:%d' % (label2class[index[1]], confusion_mat[index[1], c]),
             '%s:%d' % (label2class[index[2]], confusion_mat[index[2], c]),
             '%s:%d' % (label2class[index[3]], confusion_mat[index[3], c]),
             '%s:%d' % (label2class[index[4]], confusion_mat[index[4], c])])
     f.writerow(['All', '', np.sum(class_occurence), micro_p, micro_r, micro_f1,
                 '', '', '', '', ''])

7. Other PyTorch notes

  1. It is recommended to define layers with parameters and pooling layers using torch.nn modules, and to call torch.nn.functional directly for activation functions. The difference between torch.nn and torch.nn.functional is that torch.nn modules call torch.nn.functional under the hood, but a torch.nn module also holds the layer's parameters and can handle both the training and test network states. Pay attention to the network state when using torch.nn.functional, for example

def forward(self, x):
    ...
    x = torch.nn.functional.dropout(x, p=0.5, training=self.training)
  2. Switch network state with model.train() and model.eval() before calling model(x).

  3. Wrap blocks of code that do not need gradient computation in with torch.no_grad().

  4. The difference between model.eval() and torch.no_grad(): model.eval() switches the network to its test state, e.g. BN and Dropout use different computations in the training and test phases; torch.no_grad() turns off PyTorch's automatic differentiation to reduce memory usage and speed up computation, and results obtained under it cannot be used for loss.backward().

  5. torch.nn.CrossEntropyLoss does not need Softmax applied to its input. torch.nn.CrossEntropyLoss is equivalent to torch.nn.functional.log_softmax + torch.nn.NLLLoss, as the quick check below illustrates.
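
A quick numerical check of this equivalence (an added example)

scores = torch.randn(4, 10)  # Un-normalized logits
labels = torch.randint(high=10, size=(4,), dtype=torch.long)
loss1 = torch.nn.functional.cross_entropy(scores, labels)
loss2 = torch.nn.functional.nll_loss(
    torch.nn.functional.log_softmax(scores, dim=1), labels)
assert torch.allclose(loss1, loss2)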

  6. Clear accumulated gradients with optimizer.zero_grad() before loss.backward(). optimizer.zero_grad() has the same effect as model.zero_grad().

  7. In torch.utils.data.DataLoader, set pin_memory=True whenever possible; however, on small datasets such as MNIST, pin_memory=False can actually be faster. The best num_workers value needs to be found experimentally.

  8. Use del to delete unneeded intermediate variables promptly to save GPU memory.

  9. Use in-place operations to save GPU memory, e.g. x = torch.nn.functional.relu(x, inplace=True). GPU memory can also be saved with torch.utils.checkpoint, which keeps only part of the intermediate results during the forward pass and recomputes what the backward pass needs from the most recent saved results; see the sketch below.
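
A minimal sketch of gradient checkpointing (an added example; block1 and block2 are hypothetical sub-modules of the network)

import torch.utils.checkpoint

def forward(self, x):
    # Activations inside each block are recomputed during backward instead of being stored.
    x = torch.utils.checkpoint.checkpoint(self.block1, x)
    x = torch.utils.checkpoint.checkpoint(self.block2, x)
    return x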

  10. Reduce data transfer between the CPU and GPU. For example, to track the loss and accuracy of every mini-batch in an epoch, it is faster to accumulate them on the GPU and transfer them back to the CPU once at the end of the epoch than to do a GPU-to-CPU transfer for every mini-batch, as sketched below.
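
A sketch of this idea (an added example)

epoch_loss = torch.zeros(1).cuda()
for images, labels in train_loader:
    ...
    epoch_loss += loss.detach()  # Accumulate on the GPU; no per-batch transfer
print(epoch_loss.item() / len(train_loader))  # One GPU-to-CPU transfer per epoch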

  11. Using half-precision floats via half() gives some speedup, depending on the GPU model, but be careful about the numerical stability of low precision; a minimal sketch follows.
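
A minimal sketch (an added example; assumes a GPU with fp16 support)

model = model.cuda().half()
images = images.cuda().half()  # Inputs must match the model's dtype
with torch.no_grad():
    scores = model(images)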

  12. Always use assert tensor.size() == (N, D, H, W) as a debugging tool to make sure tensor dimensions are what you expect.

  13. Apart from the label y, use one-dimensional tensors as little as possible; prefer N×1 two-dimensional tensors instead, which avoids some unexpected results of one-dimensional tensor computations. The example below shows one such surprise.
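
An example of such a surprise (an added example): broadcasting a 1-D tensor against an N×1 tensor silently produces an N×N result

a = torch.ones(3)     # Shape (3,)
b = torch.ones(3, 1)  # Shape (3, 1)
assert (a + b).size() == (3, 3)  # Probably not what was intended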

  14. Profile the time spent in each part of the code

with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as profile:
    ...
print(profile)

Or run it from the command line

python -m torch.utils.bottleneck main.py

Reference:

[1] T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.

[2] Y. Chen, Y. Bai, W. Zhang, and T. Mei. Destruction and construction learning for fine-grained image recognition. In CVPR, 2019.

[3] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In ECCV, 2016.

[4] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.

[5] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In ICLR, 2018.
