YOLO, short for “You Only Look Once,” is an object detection algorithm based on convolutional neural networks (CNNs). YOLO v3 is the third version of the series (YOLO, YOLO9000, and YOLO v3), and its detection is both more accurate and more robust than its predecessors.

For more details, see YOLO’s website.

(As a slang term, “YOLO” also means “You Only Live Once.”)

This article walks through the implementation details of the YOLO v3 algorithm in the Keras framework. This is chapter 1, covering training. Of course, there are chapters 2 through n to come; after all, this is the full edition 🙂

GitHub source: github.com/SpikeKing/k…

Already published:

  • Article 1, training: mp.weixin.qq.com/s/T9LshbXoe…
  • Article 2, model: mp.weixin.qq.com/s/N79S9Qf1O…
  • Article 3, network: mp.weixin.qq.com/s/hC4P7iRGv…
  • Article 4, ground truth: mp.weixin.qq.com/s/5Sj7QadfV…
  • Article 5, loss: mp.weixin.qq.com/s/4L9E4WGSh…

Welcome to follow the WeChat official account DeepAlgorithm (ID: DeepAlgorithm) and learn more deep learning techniques!


1. Parameters

The model is trained with five parameters:

(1) The annotated image dataset. Each line contains an image path followed by one or more boxes, each given as four coordinates plus a category ID (xmin,ymin,xmax,ymax,label_id), for example:

dataset/image.jpg 788,351,832,426,0 805,208,855,270,0
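As a quick sanity check, here is a minimal sketch (not from the repo; the line format above is the only assumption) of parsing one annotation line into a NumPy array of boxes:

import numpy as np

line = 'dataset/image.jpg 788,351,832,426,0 805,208,855,270,0'
parts = line.split()
image_path = parts[0]
# Each remaining token is "xmin,ymin,xmax,ymax,label_id"
boxes = np.array([list(map(int, box.split(','))) for box in parts[1:]])
print(image_path)   # dataset/image.jpg
print(boxes.shape)  # (2, 5)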

(2) The list of annotation-box categories, i.e., all object classes labeled in the dataset, as follows:

aeroplane
bicycle
bird
...

(3) The pre-trained model, used for fine-tuning in transfer learning; an optional choice is the COCO weights already trained for YOLO v3, namely:

pretrained_path = 'model_data/yolo_weights.h5'

(4) The set of anchor boxes for the prediction feature maps:

  • The feature maps come at 3 scales, with 3 anchor boxes per scale, 9 anchor boxes in total, arranged from small to large;
  • Anchors 1~3 are used for the large (52×52) feature map, 4~6 for the medium (26×26) map, and 7~9 for the small (13×13) map;
  • The large feature map detects small objects, while the small feature map detects large objects;
  • The 9 anchors are derived by k-means clustering of the dataset's bounding boxes.

Among these, COCO’s anchors are as follows:

10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
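In the keras-yolo3 code these values are usually kept in a small text file; assuming that one-line comma-separated format, a minimal sketch of turning the string into a (9, 2) array of (width, height) pairs looks like this:

import numpy as np

anchors_text = '10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326'
anchors = np.array([float(v) for v in anchors_text.split(',')]).reshape(-1, 2)
print(anchors.shape)  # (9, 2), ordered from the smallest anchor to the largest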

(5) The image input size, 416×416 by default.

  • The image size must be a multiple of 32, because the DarkNet network contains 5 downsampling convolutions of stride 2 (32 = 2^5). The downsampling convolution is implemented as follows:
x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
  • At the deepest level, the feature map size should be odd, e.g. 13, so that the image center falls in a unique cell; if it were even, the center would fall at the corner of the four central cells, causing ambiguity. A quick check of the grid sizes follows.
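A quick sanity check in plain Python (no assumptions beyond the strides described above) of how the 416×416 input maps to the three grid sizes:

input_size = 416  # must be a multiple of 32
strides = [32, 16, 8]  # total downsampling at the three prediction scales
grid_sizes = [input_size // s for s in strides]
print(grid_sizes)  # [13, 26, 52]; 13 is odd, so the image center falls in a single cell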

2. Create the model

To create the YOLO v3 network model, the inputs are:

  • input_shape: image size;
  • anchors: the 9 anchor boxes;
  • num_classes: the number of classes;
  • freeze_body: freeze mode; 1 freezes the DarkNet53 layers, 2 freezes all but the last 3 layers;
  • weights_path: the path of the pre-trained weights.

Implementation:

model = create_model(input_shape, anchors, num_classes,
                     freeze_body=2,
                     weights_path=pretrained_path)
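The freezing itself is plain Keras: set layer.trainable = False before compiling. The following is a self-contained toy sketch of the freeze_body=2 behavior, using a small stand-in model rather than the real yolo_body:

from keras.layers import Input, Conv2D
from keras.models import Model

# Toy stand-in for the YOLO body, just to illustrate the freezing pattern
inputs = Input(shape=(416, 416, 3))
x = Conv2D(8, (3, 3), strides=(2, 2), padding='same', name='body_conv1')(inputs)
x = Conv2D(16, (3, 3), strides=(2, 2), padding='same', name='body_conv2')(x)
x = Conv2D(32, (3, 3), strides=(2, 2), padding='same', name='body_conv3')(x)
outputs = Conv2D(18, (1, 1), name='head_conv')(x)
model_body = Model(inputs, outputs)

# freeze_body=2 style: freeze everything except the last 3 layers
for layer in model_body.layers[:-3]:
    layer.trainable = False
print([(layer.name, layer.trainable) for layer in model_body.layers])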

Here, the last three layers of the network are:

three 1×1 convolution layers (used in place of fully connected layers), which convert the feature maps at the three scales into predictions at the three scales.

Implementation:

out_filters = num_anchors * (num_classes + 5)
# ...
DarknetConv2D(out_filters, (1, 1))

That is:

conv2d_59 (Conv2D)      (None, 13, 13, 18)   18450       leaky_re_lu_58[0][0]    
conv2d_67 (Conv2D)      (None, 26, 26, 18)   9234        leaky_re_lu_65[0][0]    
conv2d_75 (Conv2D)      (None, 52, 52, 18)   4626        leaky_re_lu_72[0][0]    
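The 18 filters in these layers follow directly from the formula above; with 3 anchors per scale, 18 implies that this particular model summary was produced with a single class:

num_anchors = 3   # anchors per prediction scale
num_classes = 1   # implied by the printout above: 3 * (1 + 5) = 18
out_filters = num_anchors * (num_classes + 5)  # 5 = 4 box coordinates + 1 objectness score
print(out_filters)  # 18, matching conv2d_59 / conv2d_67 / conv2d_75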

3. Sample split

Shuffle the samples and split the dataset into 10 parts: 9 for training and 1 for validation.

Implementation:

val_split = 0.1  # Fraction of the data used for validation
with open(annotation_path) as f:
    lines = f.readlines()
np.random.seed(47)
np.random.shuffle(lines)
np.random.seed(None)
num_val = int(len(lines) * val_split)  # Number of validation samples
num_train = len(lines) - num_val  # Number of training samples

4. Stage 1 training

In the first stage, part of the network is frozen and only the weights of the remaining (unfrozen) layers are trained.

  • The optimizer is the common Adam;
  • The loss function directly uses the model output y_pred and ignores the ground truth y_true.

Implementation:

model.compile(optimizer=Adam(lr=1e-3), loss={
    # Use the custom Yolo_Loss Lambda layer
    'yolo_loss': lambda y_true, y_pred: y_pred})  # Loss function

Regarding the loss function yolo_loss and the roles of y_true and y_pred:

y_true is taken as an input, forming a multi-input model, and the loss is written as a layer (a Lambda layer) that becomes the model's final output. When building the model, the model output only needs to be defined as the loss. When compiling, the loss is set directly to y_pred, because the model output already is the loss, i.e. y_pred is the loss, so y_true can be ignored. During training, it is enough to feed in a y_true array of matching shape. A toy sketch of this pattern follows.
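The pattern is easier to see on a toy example. Below is a minimal, self-contained sketch (toy data and a toy mean-squared-error loss, not the real yolo_loss) of wiring the ground truth in as an input and emitting the loss as the model output:

import numpy as np
import keras.backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras.optimizers import Adam

x_in = Input(shape=(4,))
y_true_in = Input(shape=(1,))  # the ground truth enters as a model input
pred = Dense(1)(x_in)
# The Lambda layer computes the loss itself and is the model's only output
loss_out = Lambda(lambda args: K.mean(K.square(args[0] - args[1]), axis=-1, keepdims=True),
                  name='toy_loss')([y_true_in, pred])

model = Model([x_in, y_true_in], loss_out)
model.compile(optimizer=Adam(lr=1e-3),
              loss={'toy_loss': lambda y_true, y_pred: y_pred})  # y_pred already is the loss

# The target passed to fit only needs a matching shape; the loss ignores it
x = np.random.rand(8, 4)
y = np.random.rand(8, 1)
model.fit([x, y], np.zeros((8, 1)), epochs=1, verbose=0)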

A Python lambda expression behaves like this:

f = lambda y_true, y_pred: y_pred
print(f(1, 2))  # outputs 2

The model fits the data through a data generation wrapper (data_generator_wrapper), which produces training and validation data batch by batch; a sketch of what it yields is shown after the code. Finally, the model saves its weights. The implementation is as follows:

batch_size = 32  # batch size
model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                    steps_per_epoch=max(1, num_train // batch_size),
                    validation_data=data_generator_wrapper(
                        lines[num_train:], batch_size, input_shape, anchors, num_classes),
                    validation_steps=max(1, num_val // batch_size),
                    epochs=50,
                    initial_epoch=0,
                    callbacks=[logging, checkpoint])
# Save the final stage-1 weights; intermediate weights are also saved via the callbacks
model.save_weights(log_dir + 'trained_weights_stage_1.h5')
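data_generator_wrapper itself is not shown here. As a rough, self-contained sketch, the dummy stand-in below shows only the structure of what such a generator yields: a batch of images, one y_true tensor per scale, and the zero dummy targets consumed by the Lambda loss (the real generator additionally reads and augments images and encodes the annotation boxes into y_true):

import numpy as np

def dummy_data_generator(batch_size, input_shape, anchors, num_classes):
    # Toy stand-in for data_generator_wrapper, showing only the yielded shapes
    h, w = input_shape
    grid_shapes = [(h // s, w // s) for s in (32, 16, 8)]  # 13x13, 26x26, 52x52 for 416x416
    anchors_per_scale = len(anchors) // 3
    while True:
        image_data = np.random.rand(batch_size, h, w, 3)
        y_true = [np.zeros((batch_size, gh, gw, anchors_per_scale, 5 + num_classes))
                  for gh, gw in grid_shapes]
        yield [image_data, *y_true], np.zeros(batch_size)

gen = dummy_data_generator(4, (416, 416), anchors=list(range(9)), num_classes=1)
batch_inputs, dummy_targets = next(gen)
print([a.shape for a in batch_inputs])  # (4,416,416,3), (4,13,13,3,6), (4,26,26,3,6), (4,52,52,3,6)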

During training, the model weights are also saved at the end of epochs: only the weights are stored (save_weights_only), only the best result is kept (save_best_only), and the check is performed every 3 epochs (period), namely:

checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                             monitor='val_loss', save_weights_only=True,
                             save_best_only=True, period=3)  # Save only weights, and only the best

5. Stage 2 training

In stage 2, training continues from the network weights obtained in stage 1:

  • All weights are set to trainable (in stage 1, some of them were frozen);
  • The optimizer is again Adam, only with the learning rate (lr) reduced from 1e-3 to 1e-4, to fine-tune the weights more delicately;
  • The loss function again uses only y_pred and ignores y_true.

Implementation:

for i in range(len(model.layers)):
    model.layers[i].trainable = True

model.compile(optimizer=Adam(lr=1e-4),
              loss={'yolo_loss': lambda y_true, y_pred: y_pred})

The model fits the data in the second stage in the same way as in the first. Training starts from the 50th epoch and continues to the 100th, unless the early-stopping conditions are triggered first. Two additional callbacks, reduce_lr and early_stopping, are added to control the learning rate and early termination:

  • reduce_lr: when the monitored metric (val_loss) has not improved for 3 consecutive epochs (patience), the learning rate is multiplied by 0.1 (factor);
  • early_stopping: when the monitored validation loss improves by less than min_delta (0) for 10 consecutive epochs (patience), training is terminated.

Implementation:

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)  # Reduce the learning rate when val_loss stops improving
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)  # Stop training when val_loss stops improving

batch_size = 32
model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                    steps_per_epoch=max(1, num_train // batch_size),
                    validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors,
                                                           num_classes),
                    validation_steps=max(1, num_val // batch_size),
                    epochs=100,
                    initial_epoch=50,
                    callbacks=[logging, checkpoint, reduce_lr, early_stopping])
model.save_weights(log_dir + 'trained_weights_final.h5')

At this point, after the second training stage, the saved network weights are the final model weights.


Supplement 1: K-Means

The k-means algorithm is a clustering algorithm that divides a set of data into several groups, each with its own center.

In YOLO v3, all the annotation boxes in the dataset are collected and clustered into 9 groups with the k-means algorithm; the 9 cluster centers, sorted by area from small to large, become the 9 anchor boxes. A rough sketch of this procedure is shown below.
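This is a minimal sketch of that idea using scikit-learn on synthetic box sizes. The original YOLO work clusters with an IoU-based distance; the sketch below uses plain Euclidean k-means on (width, height) pairs purely to illustrate the procedure:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic (width, height) pairs standing in for the boxes extracted from the dataset
rng = np.random.RandomState(0)
box_wh = rng.randint(10, 380, size=(500, 2))

kmeans = KMeans(n_clusters=9, random_state=0).fit(box_wh)
anchors = kmeans.cluster_centers_

# Sort the 9 cluster centers by area, small to large, as described above
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
print(np.round(anchors).astype(int))  # 9 (width, height) anchor boxes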

To simulate the k-means algorithm:

  1. Create test points, where X is the data and y the labels, e.g. X of shape (300, 2) and y of shape (300,);
  2. Cluster the data into 9 groups;
  3. Feed in the data X and fit;
  4. Predict the cluster of each point in X, giving y_kmeans;
  5. Draw a scatter plot with scatter, using the viridis colormap;
  6. Get the cluster centers cluster_centers_, shown as black dots.

Source:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # for plot styling
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs  # samples_generator path is removed in newer scikit-learn


def test_of_k_means():
    # Create test points: X is the data, y the labels; X has shape (300, 2), y shape (300,)
    X, y_true = make_blobs(n_samples=300, centers=9, cluster_std=0.60, random_state=0)
    kmeans = KMeans(n_clusters=9)  # Cluster the data
    kmeans.fit(X)  # data X
    y_kmeans = kmeans.predict(X)  # prediction

    # color range viridis: https://matplotlib.org/examples/color/colormaps_reference.html
    plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=20, cmap='viridis')  # c is color, s is size

    centers = kmeans.cluster_centers_  # Center of clustering
    plt.scatter(centers[:, 0], centers[:, 1], c='black', s=40, alpha=0.5)  # Center point is black

    plt.show()  # show


if __name__ == '__main__':
    test_of_k_means()

Output: a scatter plot of the points colored by cluster, with the 9 cluster centers drawn as black dots.


Supplement 2: EarlyStopping

EarlyStopping is a subclass of Callback, which defines operations to be performed at the beginning and end of each training stage. Keras already ships with simple callbacks that monitor quantities such as acc, val_acc, loss, and val_loss, as well as more complex subclasses such as ModelCheckpoint (for saving model weights) and TensorBoard (for plotting).

The Callback interface is as follows:

def on_epoch_begin(self, epoch, logs=None):
def on_epoch_end(self, epoch, logs=None):
def on_batch_begin(self, batch, logs=None):
def on_batch_end(self, batch, logs=None):
def on_train_begin(self, logs=None):
def on_train_end(self, logs=None):

EarlyStopping is the Callback subclass used to stop training early. Specifically, when the loss on the training or validation set no longer decreases, that is, when the decrease is smaller than a certain threshold, training stops. This improves tuning efficiency and avoids wasting resources.

When fitting the model, callbacks is set as a list, and multiple callbacks are supported, such as:

callbacks=[logging, checkpoint, reduce_lr, early_stopping]

The parameters of EarlyStopping:

  • monitor: the quantity to monitor, e.g. acc, val_acc, loss, val_loss;
  • min_delta: the stopping threshold, i.e. the smallest increase or decrease that counts as an improvement, used together with mode;
  • mode: min, max, or auto; works together with min_delta;
  • patience: the number of epochs tolerated after the threshold is reached, to avoid stopping on jitter;
  • verbose: verbosity of the log; the larger the value, the more information is printed.

min_delta and patience work together to keep the model from stopping during jitter, so they should be tuned in coordination: when min_delta is decreased, decrease patience as well; when min_delta is increased, increase patience as well.

Example:

early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

OK, that’s all! Enjoy it!

Welcome to follow the WeChat official account DeepAlgorithm (ID: DeepAlgorithm) to learn more deep learning techniques!