The code mainly follows bubbliiing's YOLOv3 repository on GitHub: github.com/bubbliiiing…

Interpretation of source code

Training section

The train.py file

"' training target detection model must need to pay attention to the following: 1, carefully check before training its own format whether meet the requirements, the library requires data set format for VOC format, need to prepare good content input images and labels for the input image. JPG images, no fixed size, before the incoming will automatically resize. Grayscale images will be automatically converted into RGB images for training, without their own modification. Input pictures if the suffix is not JPG, you need to batch convert to JPG before starting training. The label is in. XML format, and the file contains the target information to be detected. The label file corresponds to the input image file. 2. Trained weight files are stored in logs folder, and each epoch will be saved once. If only a few steps have been trained, the epoch and step will not be saved. In the process of training, the code is not set to save only the lowest loss, so there will be 100 weights after training according to the default parameters, if the space is not enough can be deleted. This is not to save as little as possible or as much as possible, some people want to save all, some people want to save only a little, in order to meet most needs, or save all can be highly selective. 3. The size of the loss value is used to judge whether the model converges or not. What is more important is that the model has a tendency of convergence, that is, the loss of the verification set decreases continuously. The exact size of the loss is meaningless. It depends only on the way the loss is calculated, not close to zero. If you want to make the loss look nice, you can just go into the corresponding loss function and divide by 10,000. The lost values during the training will be saved in the logs folder loss_%Y_%m_%d_%H_% m_% S folder. 4. Tuning parameters is a pretty important knowledge, no parameters are always good, the existing parameters are the parameters THAT I have tested and can be trained normally, so I would suggest using the existing parameters. However, the parameter itself is not absolute. For example, with the increase of batch, the learning rate can also increase and the effect will be better. Too deep network do not use too large learning rate and so on. These are on experience, can rely on each classmate query data and oneself to try more. ' ' '  
if __name__ == "__main__":
    #-------------------------------#
    #   Whether to use CUDA
    #   Set to False if no GPU is available
    #-------------------------------#
    Cuda            = False
    #--------------------------------------------------------------------#
    #   classes_path points to the txt file under model_data that lists
    #   the classes of the dataset being trained on.
    #   Before training, be sure to modify classes_path so that it matches
    #   your own dataset.
    #--------------------------------------------------------------------#
    
    # Change this to your own class-label file when training
    classes_path    = 'model_data/voc_classes.txt'
    #--------------------------------------------------------------------#
    #   anchors_path points to the txt file with the prior (anchor) boxes; generally not modified.
    #   anchors_mask helps the code find the corresponding prior boxes; generally not modified.
    #--------------------------------------------------------------------#
    
    # Basic anchor information, the same as used in prediction; generally not modified
    anchors_path    = 'model_data/yolo_anchors.txt'
    anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
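    # For reference, yolo_anchors.txt normally contains the nine standard YOLOv3 anchors
    # (assumed here; check your own file): 10,13  16,30  33,23  30,61  62,45  59,119  116,90  156,198  373,326.
    # anchors_mask then assigns the three largest anchors (indices 6,7,8) to the 13x13 head, the middle three (3,4,5)
    # to the 26x26 head, and the smallest three (0,1,2) to the 52x52 head.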
    #----------------------------------------------------------------------------------------------------------------------------#
    #   See the README for the weight files, which can be downloaded from a net disk. The pre-trained weights of the model are
    #   universal across different datasets because the features they encode are universal.
    #   The important part of the pre-trained weights is the backbone feature-extraction network, which is used to extract features.
    #   Pre-trained weights should be used in 99% of cases; otherwise the backbone weights are too random, feature extraction is
    #   poor, and the training result will not be good.
    #
    #   If training was interrupted, you can set model_path to a weight file under the logs folder to reload the partially
    #   trained weights. At the same time, adjust the freeze/unfreeze parameters below to keep the epoch numbering consistent.
    #
    #   When model_path = '', the weights of the whole model are not loaded.
    #
    #   Here the weights of the whole model are used, so they are loaded in train.py; the pretrained flag below does not affect
    #   the weight loading here.
    #   If you want to train from the backbone's pre-trained weights, set model_path = '' and pretrained = True below; only the
    #   backbone is then loaded.
    #   If you want to train from scratch, set model_path = '', pretrained = False and Freeze_Train = False; the backbone is
    #   then not frozen.
    #
    #   In general, a network trained from scratch performs very poorly because the weights are too random and feature
    #   extraction is ineffective, so training from scratch is very, very strongly not recommended!
    #   If you really must train from scratch, consider the ImageNet dataset: first train a classification model to obtain the
    #   backbone weights (the backbone of the classification model is shared with this model), then train on top of that.
    #----------------------------------------------------------------------------------------------------------------------------#
    
    # Path to the model weights
    model_path      = 'model_data/yolo_weights.pth'
    #------------------------------------------------------#
    #   input_shape: the input size, which must be a multiple of 32
    #------------------------------------------------------#
    
    # Input size: a larger input image gives more accurate results but slower training; it must be a multiple of 32
    input_shape     = [416, 416]
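    # With a 416x416 input the three YOLO heads predict on 13x13, 26x26 and 52x52 grids
    # (416 divided by the strides 32, 16 and 8), which is why the size must be a multiple of 32.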
    #----------------------------------------------------------------------------------------------------------------------------#
    #   pretrained: whether to use the pre-trained weights of the backbone network. These are the backbone weights only,
    #   so they are loaded when the model is constructed.
    #   If model_path is set, the backbone weights do not need to be loaded and the value of pretrained is meaningless.
    #   If model_path is not set and pretrained = True, only the backbone is loaded and training starts from it.
    #   If model_path is not set, pretrained = False and Freeze_Train = False, training starts from scratch and the backbone is not frozen.
    #----------------------------------------------------------------------------------------------------------------------------#
    
    # pretrained decides whether the darknet53 backbone is initialized from its own pre-trained weights when the model is built.
    # Since model_path is set above, the whole-model weights are loaded there and pretrained can stay False.
    # With model_path = '' and pretrained = False the network would train from scratch.
    pretrained      = False
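    # The three configurations described above, spelled out with illustrative values:
    #   full-model weights : model_path = 'model_data/yolo_weights.pth' (pretrained is then irrelevant)
    #   backbone only      : model_path = '', pretrained = True
    #   from scratch       : model_path = '', pretrained = False, Freeze_Train = False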
    
    #----------------------------------------------------------------------------------------------------------------------------#
    #   Training is split into two phases, a freeze phase and an unfreeze phase. The freeze phase exists to accommodate
    #   machines with limited performance.
    #   Freeze training needs less video memory; with a very weak graphics card you can set Freeze_Epoch equal to
    #   UnFreeze_Epoch and do freeze training only.
    #
    #   Some suggestions for the parameter settings are given here; adjust them flexibly to your own needs:
    #   (1) Training from the pre-trained weights of the whole model:
    #       Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True (default)
    #       Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False
    #       UnFreeze_Epoch can be adjusted between 100 and 300. optimizer_type = 'sgd', Init_lr = 1e-2.
    #   (2) Training from the pre-trained weights of the backbone network:
    #       Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 300, Freeze_Train = True
    #       Init_Epoch = 0, UnFreeze_Epoch = 300, Freeze_Train = False
    #       Since training starts from the backbone's pre-trained weights, which may not suit detection, more training is
    #       needed to escape local optima.
    #       UnFreeze_Epoch can be adjusted between 200 and 300; 300 is recommended for YOLOv5 and YOLOX. optimizer_type = 'sgd', Init_lr = 1e-2.
    #   (3) Setting batch_size:
    #       Within what the graphics card can handle, larger is better. Running out of memory has nothing to do with dataset
    #       size; if you get an OOM / CUDA out of memory error, reduce batch_size.
    #       Because of the BatchNorm layers, the minimum batch_size is 2, not 1.
    #       Freeze_batch_size is recommended to be 1-2 times Unfreeze_batch_size. Do not make the gap too large, since it
    #       affects the automatic learning-rate adjustment.
    #----------------------------------------------------------------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Freeze-phase training parameters
    #   During this phase the backbone of the model is frozen and the feature-extraction network does not change.
    #   Memory usage is small; only the rest of the network is fine-tuned.
    #   Init_Epoch          the epoch the model starts training from; it can be larger than Freeze_Epoch, e.g.
    #                       Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
    #                       skips the freeze phase, starts directly at epoch 60 and adjusts the learning rate accordingly.
    #                       (used when resuming from a checkpoint)
    #   Freeze_Epoch        the number of epochs the model trains with the backbone frozen
    #                       (ignored when Freeze_Train = False)
    #   Freeze_batch_size   the batch_size used during freeze training
    #                       (ignored when Freeze_Train = False)
    #------------------------------------------------------------------#
    
    # When freeze training is enabled, note that only the backbone (darknet53) is frozen, i.e. its parameters do not change; the remaining feature-extraction layers and the prediction heads are still trained and their parameters keep changing. Since the frozen part needs no gradient computation, memory usage is small and batch_size can be made a bit larger
    Init_Epoch          = 0
    Freeze_Epoch        = 50
    Freeze_batch_size   = 16
    #------------------------------------------------------------------#
    #   Unfreeze-phase training parameters
    #   During this phase the backbone of the model is no longer frozen and the feature-extraction network changes.
    #   Memory usage is larger; all parameters of the network change.
    #   UnFreeze_Epoch        the total number of epochs the model trains for
    #   Unfreeze_batch_size   the batch_size used after unfreezing
    #------------------------------------------------------------------#
    
    # The darknet53 parameters are now also computed and updated, so video memory usage goes up and batch_size has to be set smaller
    UnFreeze_Epoch      = 100
    Unfreeze_batch_size = 8
    #------------------------------------------------------------------#
    #   Freeze_Train    whether to do freeze training
    #                   By default the backbone is trained frozen first and then unfrozen.
    #------------------------------------------------------------------#
    Freeze_Train        = True

    #------------------------------------------------------------------#
    #   Other training parameters: learning rate, optimizer, learning-rate decay
    #------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Init_lr     maximum learning rate of the model
    #   Min_lr      minimum learning rate of the model; defaults to 0.01 * Init_lr
    #------------------------------------------------------------------#
    
    # Learning rate settings
    Init_lr             = 1e-2
    Min_lr              = Init_lr * 0.01
    #------------------------------------------------------------------#
    #   optimizer_type  the optimizer to use: adam or sgd
    #                   Init_lr = 1e-3 is recommended when using the Adam optimizer
    #                   Init_lr = 1e-2 is recommended when using the SGD optimizer
    #   momentum        momentum parameter used inside the optimizer
    #   weight_decay    weight decay, to prevent overfitting
    #------------------------------------------------------------------#
    
    # The optimization method we use, its momentum parameter, and the weight-decay setting
    optimizer_type      = "sgd"
    momentum            = 0.937
    weight_decay        = 5e-4
    #------------------------------------------------------------------#
    #   lr_decay_type   the learning-rate decay schedule to use: 'step' or 'cos'
    #------------------------------------------------------------------#
   
    lr_decay_type       = "cos"
    #------------------------------------------------------------------#
    #   save_period     save the weights every save_period epochs; by default every epoch is saved
    #------------------------------------------------------------------#
    
    # How many epochs to train before saving the whole model; the default is to save after every epoch.
    # My usual strategy: in the early stage the loss fluctuates a lot, so space out the saves, e.g. every 5 epochs
    # during the first 50 epochs. After 50 epochs the loss has dropped to a stable level and we are fine-tuning
    # with a smaller learning rate, so save every epoch. This saves a lot of disk space.
    save_period         = 1
    #------------------------------------------------------------------#
    #   num_workers     whether to use multi-threaded data loading
    #                   Speeds up data reading but uses more memory.
    #                   Set it to 2 or 0 on machines with little memory.
    #------------------------------------------------------------------#
    
    # Number of workers used to build the dataset loader; usually 0, 2 or 4
    num_workers         = 4

    #----------------------------------------------------#
    #   Get the image paths and labels
    #----------------------------------------------------#
    
    # Each line of the annotation txt has the form: image_path x1,y1,x2,y2,class_id ...
    # e.g.: C:\Users\XXX\yolo3-pytorch-master\VOCdevkit/VOC2007/JPEGImages/000007.jpg 141,50,500,330,6
    train_annotation_path   = '2007_train.txt'
    val_annotation_path     = '2007_val.txt'

    #----------------------------------------------------#
    #   Get the classes and anchors
    #----------------------------------------------------#
    
    # Get the class info and anchor info; I covered this in the prediction part, so if you have forgotten you can look it up in the YOLO class
    class_names, num_classes = get_classes(classes_path)
    anchors, num_anchors     = get_anchors(anchors_path)

    #------------------------------------------------------#
    #   Create the YOLO model
    #------------------------------------------------------#
    
    # Create our model
    model = YoloBody(anchors_mask, num_classes, pretrained=pretrained)
    
    # If no pre-trained parameters are used, we have to initialize the parameters ourselves; see the interpretation of weights_init below for details
    if not pretrained:
        weights_init(model)
    
    # Next we load the model parameters.
    # First we decide whether to load the backbone pre-trained parameters: if not, the model keeps its own initialization; if so, the pre-trained parameters are used.
    # Then we decide whether to load full-model parameters: if they are loaded, they overwrite whatever was set before, even the pre-trained parameters.
    # If no model parameters are loaded, the model keeps the initialization described above.
    # Priority: model_path >> pretrained >> weights_init
    if model_path != '':
        #------------------------------------------------------#
        #   See the README for the weight files
        #------------------------------------------------------#
        print('Load weights {}.'.format(model_path))
        device          = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model_dict      = model.state_dict()
        pretrained_dict = torch.load(model_path, map_location = device)
        pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) == np.shape(v)}
        model_dict.update(pretrained_dict)
        model.load_state_dict(model_dict)

    # The construction of the YOLO loss class is explained in chapter (7), Loss
    yolo_loss    = YOLOLoss(anchors, num_classes, input_shape, Cuda, anchors_mask)
    
    # Write the loss of each epoch to the log; in the visualization part we can watch the curve to see whether training converges
    loss_history = LossHistory("logs/", model, input_shape=input_shape)

    # Model training
    model_train = model.train()
    if Cuda:
        model_train = torch.nn.DataParallel(model)
        cudnn.benchmark = True
        model_train = model_train.cuda()

    #---------------------------#
    #   Read the txt files of the dataset
    #---------------------------#
    
    # The main purpose of reading the txt files is to build the dataset and dataloader
    with open(train_annotation_path) as f:
        train_lines = f.readlines()
    with open(val_annotation_path) as f:
        val_lines   = f.readlines()
    num_train   = len(train_lines)
    num_val     = len(val_lines)

    #------------------------------------------------------#
    #   The features of the backbone network are universal, so freezing it speeds up training
    #   and also prevents the weights from being destroyed at the beginning of training.
    #   Init_Epoch      the starting epoch
    #   Freeze_Epoch    the number of epochs trained with the backbone frozen
    #   UnFreeze_Epoch  the total number of training epochs
    #   If you get an OOM / "out of video memory" error, reduce batch_size
    #------------------------------------------------------#
    if True:
        UnFreeze_flag = False
        #------------------------------------#
        #   Freeze part of the network for training
        #------------------------------------#
        
        # Freeze training: in the loop below we set requires_grad = False on the backbone (darknet53) parameters so they are not updated
        if Freeze_Train:
            for param in model.backbone.parameters():
                param.requires_grad = False
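        # Optional sanity check (my own addition, not in the original script): confirm how much of the network
        # is trainable here; with Freeze_Train = True the backbone parameters should be excluded from the count.
        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total     = sum(p.numel() for p in model.parameters())
        print('trainable / total parameters: %d / %d' % (trainable, total))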

        #-------------------------------------------------------------------#
        #   If freeze training is not used, set batch_size directly to Unfreeze_batch_size
        #-------------------------------------------------------------------#
        
        # The batch_size differs between the frozen state and the unfrozen state
        batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size

        #-------------------------------------------------------------------#
        #   Compare the current batch_size against a nominal 64 and adjust the learning rate adaptively
        #-------------------------------------------------------------------#
        
        # Automatically adjust learning rate
        nbs         = 64
        Init_lr_fit = max(batch_size / nbs * Init_lr, 1e-4)
        Min_lr_fit  = max(batch_size / nbs * Min_lr, 1e-6)
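        # For example, with the defaults here: Freeze_batch_size = 16 and Init_lr = 1e-2 give
        # Init_lr_fit = max(16 / 64 * 1e-2, 1e-4) = 2.5e-3 and Min_lr_fit = max(16 / 64 * 1e-4, 1e-6) = 2.5e-5,
        # i.e. a smaller batch gets a proportionally smaller learning rate, clipped at a lower bound.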

        #---------------------------------------#
        #   Select the optimizer according to optimizer_type
        #---------------------------------------#
        
        # Here we actually build the optimizer. The parameters are split into three groups: BatchNorm weights go into pg0, ordinary weights into pg1, and biases into pg2
        pg0, pg1, pg2 = [], [], []  
        for k, v in model.named_modules():
            if hasattr(v, "bias") and isinstance(v.bias, nn.Parameter):
                pg2.append(v.bias)    
            if isinstance(v, nn.BatchNorm2d) or "bn" in k:
                pg0.append(v.weight)    
            elif hasattr(v, "weight") and isinstance(v.weight, nn.Parameter):
                pg1.append(v.weight)
                
        # Select the optimizer via optimizer_type; this dictionary-lookup pattern is worth learning
        optimizer = {
            'adam'  : optim.Adam(pg0, Init_lr_fit, betas = (momentum, 0.999)),
            'sgd'   : optim.SGD(pg0, Init_lr_fit, momentum = momentum, nesterov=True)
        }[optimizer_type]
        
        # Add the remaining parameter groups to the optimizer (pg1 with weight decay, pg2 without)
        optimizer.add_param_group({"params": pg1, "weight_decay": weight_decay})
        optimizer.add_param_group({"params": pg2})

        #---------------------------------------#
        #   Get the learning-rate scheduling function
        #---------------------------------------#
        lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
        
        #---------------------------------------#
        #   Determine the number of steps per epoch
        #---------------------------------------#
        
        # Number of steps in one epoch: integer division keeps full batches; a remainder that cannot fill a step is dropped
        # train_step
        epoch_step      = num_train // batch_size
        # val_step
        epoch_step_val  = num_val // batch_size
        
        # If the dataset cannot fill even one step, raise an exception
        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError("Dataset too small to continue training, please expand dataset.")

        #---------------------------------------#
        #   Build the dataset loaders
        #---------------------------------------#
        
        # Build the Dataset objects and wrap them in DataLoaders
        train_dataset   = YoloDataset(train_lines, input_shape, num_classes, train = True)
        val_dataset     = YoloDataset(val_lines, input_shape, num_classes, train = False)
        gen             = DataLoader(train_dataset, shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True,drop_last=True, collate_fn=yolo_dataset_collate)
        gen_val         = DataLoader(val_dataset  , shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True, drop_last=True, collate_fn=yolo_dataset_collate)

        #---------------------------------------#
        #   Start model training
        #---------------------------------------#
        
        # UnFreeze_Epoch is the total number of epochs to train, so we loop directly from Init_Epoch to UnFreeze_Epoch
        for epoch in range(Init_Epoch, UnFreeze_Epoch):
            #---------------------------------------#
            #   If the model has a frozen part,
            #   unfreeze it and reset the related parameters
            #---------------------------------------#
            
            # The early stage is freeze training and the later stage is unfreeze training; the batch_size differs, so we have to switch
            if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train:
                batch_size = Unfreeze_batch_size

                #-------------------------------------------------------------------#
                #   Compare the current batch_size against a nominal 64 and adjust the learning rate adaptively
                #-------------------------------------------------------------------#
                
                # Because batch_size changes, the adaptively scaled learning rate has to change as well
                nbs         = 64
                Init_lr_fit = max(batch_size / nbs * Init_lr, 1e-4)
                Min_lr_fit  = max(batch_size / nbs * Min_lr, 1e-6)
                #---------------------------------------#
                #   Get the learning-rate scheduling function
                #---------------------------------------#
                
                # get function for learning rate drop
                lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
                # We are entering the unfreeze phase, so set requires_grad back to True on the backbone parameters
                for param in model.backbone.parameters():
                    param.requires_grad = True
                
                # When batch_size changes, the number of steps per epoch changes as well
                epoch_step      = num_train // batch_size
                epoch_step_val  = num_val // batch_size

                # If the dataset cannot fill even one step, raise an exception
                if epoch_step == 0 or epoch_step_val == 0:
                    raise ValueError("Dataset too small to continue training, please expand dataset.")

                # Likewise, when batch_size changes the dataloaders are rebuilt
                gen     = DataLoader(train_dataset, shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True,drop_last=True, collate_fn=yolo_dataset_collate)
                gen_val = DataLoader(val_dataset  , shuffle = True, batch_size = batch_size, num_workers = num_workers, pin_memory=True, drop_last=True, collate_fn=yolo_dataset_collate)
                
                # Unfrozen training flag set to True
                UnFreeze_flag = True
                
            # Set the optimizer's learning rate for this epoch; arguments are (optimizer, lr scheduling function, epoch)
            set_optimizer_lr(optimizer, lr_scheduler_func, epoch)
            
            # Real training methods (core)
            fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, save_period)
        # Close the log writer
        loss_history.writer.close()
  • Parse the function called above
    • Weights_init interpretation
    def weights_init(net, init_type='normal', init_gain=0.02):
        # init method
        def init_func(m):
            classname = m.__class__.__name__
            # Assuming we have a weight in our property and a convolution layer
            if hasattr(m, 'weight') and classname.find('Conv') != -1:
                # 'normal': initialize with a normal distribution with mean 0 and std init_gain
                if init_type == 'normal':
                    torch.nn.init.normal_(m.weight.data, 0.0, init_gain)
                # Xavier initialized
                elif init_type == 'xavier':
                    torch.nn.init.xavier_normal_(m.weight.data, gain=init_gain)
                # Kaiming initialization
                elif init_type == 'kaiming':
                    torch.nn.init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
                # Orthogonal initialization
                elif init_type == 'orthogonal':
                    torch.nn.init.orthogonal_(m.weight.data, gain=init_gain)
                # Only one of the methods above may be chosen; otherwise raise an exception
                else:
                    raise NotImplementedError('initialization method [%s] is not implemented' % init_type)
            # For BN layer, we use normal distribution initialization and constant initialization
            elif classname.find('BatchNorm2d') != -1:
                torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
                torch.nn.init.constant_(m.bias.data, 0.0)
        # init_func is defined; now apply it to the whole network
        print('initialize network with %s type' % init_type)
        net.apply(init_func)
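    A quick usage sketch (my own illustration, not part of the original script): train.py simply calls weights_init(model) with the default 'normal' method; the other init types can be selected like this:
    model = YoloBody(anchors_mask, num_classes, pretrained=False)
    weights_init(model, init_type='kaiming', init_gain=0.02)  # init_gain is ignored by the 'kaiming' branch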
    • The interpretation of YoloDataset
      • Interpretation of YoloDataset: juejin.cn/post/707962… The detailed analysis is all in this section
    • The interpretation of LossHistory
      • For LossHistory: juejin.cn/post/707962… The detailed analysis is all in this section
    • The interpretation of get_lr_scheduler
    # lr_decay_type = "cos", lr = max(batch_size / nbs * Init_lr, 1e-4)
    # min_lr = max(batch_size / nbs * Min_lr, 1e-6), total_iters = total number of training epochs
    # Analysis of the decay schedule
    def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio=0.1, warmup_lr_ratio=0.1, no_aug_iter_ratio=0.3, step_num=10):
        # YOLOX-style warm-up + cosine schedule
        def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters):
            if iters <= warmup_total_iters:
                # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start
                lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2) + warmup_lr_start
            elif iters >= total_iters - no_aug_iter:
                lr = min_lr
            else:
                lr = min_lr + 0.5 * (lr - min_lr) * (
                    1.0 + math.cos(math.pi* (iters - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter))
                )
            return lr
    
        def step_lr(lr, decay_rate, step_size, iters):
            if step_size < 1:
                raise ValueError("step_size must above 1.")
            n       = iters // step_size
            out_lr  = lr * decay_rate ** n
            return out_lr
    
        if lr_decay_type == "cos":
            warmup_total_iters  = min(max(warmup_iters_ratio * total_iters, 1), 3)
            warmup_lr_start     = max(warmup_lr_ratio * lr, 1e-6)
            no_aug_iter         = min(max(no_aug_iter_ratio * total_iters, 1), 15)
            func = partial(yolox_warm_cos_lr ,lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter)
        else:
            decay_rate  = (min_lr / lr) ** (1 / (step_num - 1))
            step_size   = total_iters / step_num
            func = partial(step_lr, lr, decay_rate, step_size)
    
        return func
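    A quick usage sketch (my own illustration, not from the repo): the returned partial only needs the current epoch, which is exactly how set_optimizer_lr uses it below:
    lr_scheduler_func = get_lr_scheduler("cos", lr=2.5e-3, min_lr=2.5e-5, total_iters=100)
    for epoch in (0, 1, 2, 50, 99):
        print(epoch, lr_scheduler_func(epoch))  # quadratic warm-up for the first ~3 epochs, then cosine decay down to min_lr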
    • The interpretation of set_optimizer_lr
    # Set the learning rate for the given epoch
    def set_optimizer_lr(optimizer, lr_scheduler_func, epoch):
        # Use the specified LR_scheduler learning rate drop method to complete the learning rate drop
        lr = lr_scheduler_func(epoch)
        # Write the new learning rate into the optimizer's parameter groups
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
    • Interpretation of fit_one_epoch (a key focus)
      • The overall structure is fairly standard; what is worth learning is the customized tqdm usage
      • Key method to understand: yolo_loss(), the loss computation; see:
    def fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, save_period):
    
        # Define training loss and val_Loss
        loss        = 0
        val_loss    = 0
    
        # Put the model into training mode
        model_train.train()
        print('Start Train')
        
        # We can customize the output style for our TQDM usage
        with tqdm(total=epoch_step,desc=f'Epoch {epoch +1}/{Epoch}',postfix=dict,mininterval=0.3) as pbar:
            
            # Start looping over the DataLoader
            for iteration, batch in enumerate(gen):
                # If the number of iterations exceeds the number of steps of an epoch we previously specified, we will skip it
                if iteration >= epoch_step:
                    break
                
                # Get the image data and the label data
                images, targets = batch[0], batch[1]
                with torch.no_grad():
                    # If using the GPU, convert the data to tensors and move them onto CUDA
                    if cuda:
                        images  = torch.from_numpy(images).type(torch.FloatTensor).cuda()
                        targets = [torch.from_numpy(ann).type(torch.FloatTensor).cuda() for ann in targets]
                    # Otherwise just convert the data to tensors
                    else:
                        
                        images  = torch.from_numpy(images).type(torch.FloatTensor)
                        targets = [torch.from_numpy(ann).type(torch.FloatTensor) for ann in targets]
                #----------------------#
                #   Zero the gradients
                #----------------------#
                
                # Standard step: zero the gradients; otherwise gradients accumulate and the computation is wrong
                optimizer.zero_grad()
                #----------------------#
                #   Forward propagation
                #----------------------#
                
                # Run the network on the data
                outputs         = model_train(images)
                
                # Accumulate the loss of this step
                loss_value_all  = 0
                #----------------------#
                #   Compute the loss
                #----------------------#
                
                # There are three outputs, shaped [bs,13,13,75], [bs,26,26,75] and [bs,52,52,75]; we loop over each feature map's output and compute its loss against the ground-truth labels
                for l in range(len(outputs)):
                    # The core of calculating loss is: yolo_loss
                    loss_item = yolo_loss(l, outputs[l], targets)
                    loss_value_all  += loss_item
                # Get the loss of this step
                loss_value = loss_value_all
    
                #----------------------#
                #   Backpropagation
                #----------------------#
                
                # Standard step: backpropagate and let the optimizer update the parameters
                loss_value.backward()
                optimizer.step()
    
                loss += loss_value.item()
                
                # Update the customized tqdm postfix, mainly showing the running loss and the learning rate
                pbar.set_postfix(**{'loss'  : loss / (iteration + 1), 'lr'    : get_lr(optimizer)})
                # TQDM update
                pbar.update(1)
    
        print('Finish Train')
        
        # After one training epoch is finished, run evaluation
        model_train.eval()
        print('Start Validation')
        
        # Same procedure as above for the validation set
        with tqdm(total=epoch_step_val, desc=f'Epoch {epoch +1}/{Epoch}',postfix=dict,mininterval=0.3) as pbar:
            for iteration, batch in enumerate(gen_val):
                if iteration >= epoch_step_val:
                    break
                images, targets = batch[0], batch[1]
                with torch.no_grad():
                    if cuda:
                        images  = torch.from_numpy(images).type(torch.FloatTensor).cuda()
                        targets = [torch.from_numpy(ann).type(torch.FloatTensor).cuda() for ann in targets]
                    else:
                        images  = torch.from_numpy(images).type(torch.FloatTensor)
                        targets = [torch.from_numpy(ann).type(torch.FloatTensor) for ann in targets]
                    #----------------------#
                    #   Zero the gradients
                    #----------------------#
                    optimizer.zero_grad()
                    #----------------------#
                    #   Forward propagation
                    #----------------------#
                    outputs         = model_train(images)
    
                    loss_value_all  = 0
                    #----------------------#
                    #   Compute the loss
                    #----------------------#
                    for l in range(len(outputs)):
                        loss_item = yolo_loss(l, outputs[l], targets)
                        loss_value_all  += loss_item
                    loss_value  = loss_value_all
    
                val_loss += loss_value.item()
                pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)})
                pbar.update(1)
    
        print('Finish Validation')
        
        # Then record our loss information in our log
        loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val)
        print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch))
        print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val))
        
        # Save the model if the current epoch is divisible by save_period or it is the last epoch
        if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch:
            torch.save(model.state_dict(), 'logs/ep%03d-loss%.3f-val_loss%.3f.pth' % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val))
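    To resume training from one of these checkpoints later, point model_path at the saved file and adjust Init_Epoch, as described in the comments at the top of train.py (the file name below is illustrative; the real one depends on your run):
    model_path = 'logs/ep050-loss3.123-val_loss3.456.pth'
    Init_Epoch = 50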