YOLO, short for You Only Look Once, is an object detection algorithm based on a deep convolutional neural network. YOLO V3 is the third version of YOLO, and it is both faster and more accurate than its predecessors.

Source code for this article: https://github.com/SpikeKing/keras-yolo3-detection

Follow me on GitHub: https://github.com/SpikeKing

YOLO V3 provides model parameters trained on the COCO (Common Objects in Context) dataset. We can use these COCO parameters as pre-trained weights and then train on our own dataset to build our own detection model.

This example uses the WIDER FACE dataset to train a high-precision face detection model.

WIDER

Dataset: WIDER FACE

Release date: 2015-11-19

WIDER FACE is a benchmark dataset for face detection. Its images are selected from the WIDER (Web Image Dataset for Event Recognition) dataset. It contains 32,203 images and 393,703 labeled faces, which vary widely in scale, pose and occlusion. The dataset is organized into 61 event categories; within each category, 40% of the images are randomly selected for training, 10% for validation and 50% for testing. Bounding box ground truth is provided for the training and validation sets, but not for the test set.

The dataset can be downloaded from the official website. In the Face Annotations package, wider_face_train_bbx_gt.txt holds the bounding box ground truth for the training set, in the following format:

0--Parade/0_Parade_marchingband_1_849.jpg
1
449 330 122 149 0 0 0 0 0 0 

Data description:

  • Line 1: the relative path and file name of the image;
  • Line 2: the number of bounding boxes;
  • Lines 3~n: the bounding box and attributes of each face:
    • Values 1 to 4 are x1, y1, w, h (top-left corner, width and height);
    • Blur: 0 clear, 1 normal, 2 heavy;
    • Expression: 0 typical, 1 exaggerated;
    • Illumination: 0 normal, 1 extreme;
    • Invalid: 0 valid, 1 invalid;
    • Occlusion: 0 none, 1 partial, 2 heavy;
    • Pose: 0 typical, 1 atypical.

wider_face_val_bbx_gt.txt uses the same format for the validation set.
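
To make the record layout concrete, here is a minimal sketch of a reader for this format. The function name and the field labels are made up for illustration and are not part of the project; the invalid flag follows the dataset's readme. A line containing "/" is treated as an image path, a single number as the face count, and anything else as a box row.

```python
from typing import Dict, Iterable, List

# Labels chosen for this sketch, in the order the values appear in a box row.
FIELDS = ("x1", "y1", "w", "h", "blur", "expression",
          "illumination", "invalid", "occlusion", "pose")


def parse_wider_annotations(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group the face rows of a WIDER FACE annotation file by image path."""
    records: Dict[str, List[dict]] = {}
    current = None
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if "/" in raw:
            # a path line starts a new record
            current = records.setdefault(raw.strip(), [])
        elif len(tokens) == 1:
            continue  # the face count; the rows that follow carry the same information
        elif current is not None:
            current.append(dict(zip(FIELDS, (int(t) for t in tokens))))
    return records


sample = [
    "0--Parade/0_Parade_marchingband_1_849.jpg",
    "1",
    "449 330 122 149 0 0 0 0 0 0 ",
]
faces = parse_wider_annotations(sample)
print(faces)
```

Keeping the attributes around makes it easy to filter out, say, heavily blurred faces before training, although the conversion below only uses the first four values.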

The images are of moderate resolution; they share a common width of 1024 pixels, while their heights vary.

Data conversion

To meet the training requirements, the bounding box format of the WIDER dataset has to be converted into the format expected by the training code.

Each line holds the image path, followed by one xmin,ymin,xmax,ymax,label entry per box:

data/WIDER_val/images/10--People_Marching/10_People_Marching_People_Marching_2_433.jpg 614,346,771,568,0 245,382,392,570,0 353,222,461,390,0 498,237,630,399,0
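
As a quick check of this format, the following sketch reads one training line back into a path and a list of boxes. It is only an illustration: the function name and the short example path are hypothetical, and it assumes the image path contains no spaces.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int, int]  # xmin, ymin, xmax, ymax, class index


def parse_train_line(line: str) -> Tuple[str, List[Box]]:
    """Split one training annotation line into the image path and its boxes."""
    path, *box_fields = line.split()
    boxes: List[Box] = []
    for field in box_fields:
        xmin, ymin, xmax, ymax, label = (int(v) for v in field.split(","))
        boxes.append((xmin, ymin, xmax, ymax, label))
    return path, boxes


example_line = "images/example.jpg 614,346,771,568,0 245,382,392,570,0"
example_path, example_boxes = parse_train_line(example_line)
print(example_path, example_boxes)
```

Because the boxes are separated by whitespace and the values inside a box by commas, a stray space after a comma would break the parsing, so the converter must not insert one.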

The conversion script walks through the data folder, parses the annotation file line by line, and writes the converted lines to the output file. Note:

  1. WIDER stores each box as x, y, w, h, whereas the training data uses xmin,ymin,xmax,ymax;
  2. Only one category, face, is detected, so the category index is always 0.
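
Both notes boil down to a little arithmetic on each box row. A minimal sketch follows; the function name is invented for illustration and is not taken from the project.

```python
def wider_box_to_train_box(row: str, class_index: int = 0) -> str:
    """Convert a WIDER row 'x y w h ...' into 'xmin,ymin,xmax,ymax,class'."""
    x, y, w, h = (int(v) for v in row.split()[:4])  # the attribute flags are ignored
    return ",".join(str(v) for v in (x, y, x + w, y + h, class_index))


converted = wider_box_to_train_box("449 330 122 149 0 0 0 0 0 0 ")
print(converted)
```

For the sample row shown earlier, xmax is 449 + 122 and ymax is 330 + 149.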

Refer to the project’s wider_annotation.py script for details.

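# traverse_dir_files, read_file and write_line are helper functions defined elsewhere in the project.
def generate_train_file(bbx_file, data_folder, out_file):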
    paths_list, names_list = traverse_dir_files(data_folder)
    name_dict = dict()
    for path, name in zip(paths_list, names_list):
        name_dict[name] = path

    data_lines = read_file(bbx_file)

    sub_count = 0
    item_count = 0
    out_list = []

    for data_line in data_lines:
        item_count += 1
        if item_count % 1000 == 0:
            print('item_count: ' + str(item_count))

        data_line = data_line.strip()
        l_names = data_line.split('/')
        if len(l_names) == 2:
            if out_list:
                out_line = ' '.join(out_list)
                write_line(out_file, out_line)
                out_list = []

            name = l_names[-1]
            img_path = name_dict[name]
            sub_count = 1
            out_list.append(img_path)
            continue

        if sub_count == 1:
            sub_count += 1
            continue

        if sub_count >= 2:
            n_list = data_line.split(' ')
            x_min = n_list[0]
            y_min = n_list[1]
            x_max = str(int(n_list[0]) + int(n_list[2]))
            y_max = str(int(n_list[1]) + int(n_list[3]))
            p_list = ','.join([x_min, y_min, x_max, y_max, '0'])  # the label is always 0 (face)
            out_list.append(p_list)
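            continue

    if out_list:  # write the last image, which is never flushed inside the loop
        write_line(out_file, ' '.join(out_list))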

The class file wider_classes.txt contains a single line: face.

Training

The YOLO V3 training process takes the following parameters: annotation data, categories, storage path, pre-trained model, anchors and input size.

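import numpy as np
import keras.backend as K
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
# get_classes, get_anchors, yolo_body, yolo_loss, preprocess_true_boxes and
# get_random_data come from the project's own modules.

annotation_path = 'dataset/WIDER_train.txt'  # annotation data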
classes_path = 'configs/wider_classes.txt'  # category

log_dir = 'logs/002/'  # log folder

pretrained_path = 'model_data/yolo_weights.h5'  # Pre-training model
anchors_path = 'configs/yolo_anchors.txt'  # anchors

class_names = get_classes(classes_path)  # Category list
num_classes = len(class_names)  # category number
anchors = get_anchors(anchors_path)  # list of anchors

input_shape = (416, 416)  # input image size, a multiple of 32
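
get_classes and get_anchors are not shown in this article. The sketch below is only a guess at what such helpers do, under two assumptions: the class file has one name per line, and the anchor file is a single comma-separated line of width,height pairs. The function names are hypothetical, and the anchor values are the commonly distributed YOLO V3 defaults, used here purely as sample input.

```python
from typing import List, Tuple


def parse_classes(text: str) -> List[str]:
    """Return the non-empty lines of a class file as class names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_anchors(text: str) -> List[Tuple[float, float]]:
    """Return the anchors as (width, height) pairs."""
    values = [float(v) for v in text.replace("\n", " ").split(",") if v.strip()]
    if len(values) % 2:
        raise ValueError("anchors must come in width,height pairs")
    return list(zip(values[0::2], values[1::2]))


class_names = parse_classes("face\n")
anchors = parse_anchors(
    "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326")
print(len(class_names), len(anchors), len(anchors) // 3)
```

With nine anchors and three output scales, each scale gets three anchors, which is where the num_anchors // 3 in the model code comes from.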

Create the model:

  1. input_shape is the size of the input image;
  2. anchors are the sizes of the anchor boxes;
  3. num_classes is the number of categories;
  4. freeze_body selects what to freeze: mode 1 freezes the Darknet-53 body, mode 2 freezes everything except the last three layers;
  5. weights_path is the path of the pre-trained weights;
  6. logging is the TensorBoard callback, and checkpoint is the callback that stores weights.
model = create_model(input_shape, anchors, num_classes,
                     freeze_body=2,
                     weights_path=pretrained_path)  # make sure you know what you freeze

logging = TensorBoard(log_dir=log_dir)
checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                             monitor='val_loss', save_weights_only=True,
                             save_best_only=True, period=3)  # store weights only, keeping the best, checked every 3 epochs

Training data and validation data:

val_split = 0.1  # fraction of the data used for validation
with open(annotation_path) as f:
    lines = f.readlines()
np.random.seed(10101)
np.random.shuffle(lines)
np.random.seed(None)
num_val = int(len(lines) * val_split)  # number of validation samples
num_train = len(lines) - num_val  # number of training samples
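
The same split can be tried out in isolation. This sketch swaps np.random for Python's standard random module so that it runs without NumPy; the function name is made up, and the resulting order will differ from what the NumPy seed produces.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(lines: Sequence[str], val_split: float = 0.1,
                    seed: int = 10101) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then cut off the last val_split share for validation."""
    shuffled = list(lines)                  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)   # fixed seed gives the same split every run
    num_val = int(len(shuffled) * val_split)
    num_train = len(shuffled) - num_val
    return shuffled[:num_train], shuffled[num_train:]


demo_lines = ["img%d.jpg" % i for i in range(25)]
train_lines, val_lines = split_train_val(demo_lines)
print(len(train_lines), len(val_lines))
```

Fixing the seed matters here: if the split changed between runs, images used for validation in one run could leak into training in the next, and the val_loss values would no longer be comparable.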

Compile the model and fit the data:

  1. The loss function simply returns y_pred, because the yolo_loss Lambda layer already computes the loss inside the model;
  2. The batch size is 32;
  3. Both the training data and the validation data are produced by data_generator_wrapper;
  4. During training, weights are stored by the checkpoint callback; when training finishes, the final weights are stored as well.
model.compile(optimizer=Adam(lr=1e-3), loss={
    # use custom yolo_loss Lambda layer.
    'yolo_loss': lambda y_true, y_pred: y_pred})  # Loss function

batch_size = 32  # batch size
print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                    steps_per_epoch=max(1, num_train // batch_size),
                    validation_data=data_generator_wrapper(
                        lines[num_train:], batch_size, input_shape, anchors, num_classes),
                    validation_steps=max(1, num_val // batch_size),
                    epochs=200,
                    initial_epoch=0,
                    callbacks=[logging, checkpoint])
model.save_weights(log_dir + 'trained_weights_stage_1.h5')  # store the final weights; during training they are stored by the checkpoint callback
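
The steps_per_epoch and validation_steps arguments are plain integer division with a floor of one step. A small sketch, where the function name and the sample counts are hypothetical and only serve to show the arithmetic:

```python
def steps_for(num_samples: int, batch_size: int) -> int:
    """Number of generator batches per epoch; at least one, even for tiny datasets."""
    return max(1, num_samples // batch_size)


train_steps = steps_for(11592, 32)  # hypothetical training sample count
val_steps = steps_for(1288, 32)     # hypothetical validation sample count
tiny_steps = steps_for(10, 32)      # fewer samples than one batch
print(train_steps, val_steps, tiny_steps)
```

The samples left over after the division are not lost: the generator wraps around its list, so they simply end up in the first batch of the next epoch.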

Model creation:

  1. Create the YOLO V3 model with yolo_body, whose parameters are the image input, the number of anchors per scale and the number of categories;
  2. Load the pre-trained weights and freeze the parameters, keeping only the last three layers trainable;
  3. Wrap the loss function in a custom Lambda layer of the model;
  4. The inputs are the YOLO model input and the ground truth, and the output is the loss.
def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
                 weights_path='model_data/yolo_weights.h5'):
    K.clear_session()  # clear the current Keras session
    image_input = Input(shape=(None, None, 3))  # image input: any height and width, 3 channels
    h, w = input_shape  # size
    num_anchors = len(anchors)  # anchor number

    # The three YOLO scales: grid size per scale, anchors per scale, and num_classes + 4 box coordinates + 1 confidence
    y_true = [Input(shape=(h // {0: 32, 1: 16, 2: 8}[l], w // {0: 32, 1: 16, 2: 8}[l],
                           num_anchors // 3, num_classes + 5)) for l in range(3)]

    model_body = yolo_body(image_input, num_anchors // 3, num_classes)  # model
    print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:  # load the pre-trained model
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)  # load weights by layer name, skipping mismatched layers
        print('Load weights {}.'.format(weights_path))
        if freeze_body in [1, 2]:  # Freeze darknet53 body or freeze all but 3 output layers.
            num = (185, len(model_body.layers) - 3)[freeze_body - 1]
            for i in range(num):
                model_body.layers[i].trainable = False  # Turn off training for other layers
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

    model_loss = Lambda(yolo_loss,
                        output_shape=(1,), name='yolo_loss',
                        arguments={'anchors': anchors,
                                   'num_classes': num_classes,
                                   'ignore_thresh': 0.5})(model_body.output + y_true)  # loss layer inputs: model outputs followed by the ground truth
    model = Model([model_body.input] + y_true, model_loss)  # model inputs and output

    return model
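
Two details of create_model are easy to misread: the shapes of the three y_true inputs, and the tuple indexing that picks the number of frozen layers. The sketch below reproduces both with plain Python. The function names are invented for illustration, and the total of 252 layers is a hypothetical value, not a figure from this article.

```python
from typing import List, Sequence, Tuple

STRIDES = (32, 16, 8)  # down-sampling factor of the three YOLO output scales


def y_true_shapes(input_shape: Sequence[int], num_anchors: int,
                  num_classes: int) -> List[Tuple[int, int, int, int]]:
    """Shape of each ground-truth input: grid height, grid width, anchors per scale, values per box."""
    h, w = input_shape
    if h % 32 or w % 32:
        raise ValueError("input height and width must be multiples of 32")
    # per box: 4 coordinates + 1 confidence + one score per class
    return [(h // s, w // s, num_anchors // 3, num_classes + 5) for s in STRIDES]


def frozen_layer_count(freeze_body: int, total_layers: int) -> int:
    """Mirror of (185, total_layers - 3)[freeze_body - 1] for modes 1 and 2."""
    if freeze_body == 1:
        return 185               # the Darknet-53 body, as hard-coded in create_model
    if freeze_body == 2:
        return total_layers - 3  # everything except the three output layers
    return 0                     # any other value freezes nothing


shapes = y_true_shapes((416, 416), num_anchors=9, num_classes=1)
frozen = frozen_layer_count(2, total_layers=252)
print(shapes, frozen)
```

For a 416x416 input with one class, the grids are 13x13, 26x26 and 52x52, each cell holding three anchors of six values. This is also why the input size has to be a multiple of 32.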

Data generator:

  1. data_generator_wrapper checks the arguments before creating the generator;
  2. The input annotation lines are shuffled;
  3. For each batch, the images are collected in image_data, and the boxes are converted into the ground truth y_true;
  4. Each step yields the images and ground truth as model inputs, plus a zero-filled array of length batch_size as a placeholder target.
def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i == 0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)  # Get pictures and boxes
            image_data.append(image)  # add image
            box_data.append(box)  # Add box
            i = (i + 1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)  # truth value
        yield [image_data] + y_true, np.zeros(batch_size)


def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
    """Check the arguments before creating the generator."""
    n = len(annotation_lines)  # number of annotation lines
    if n == 0 or batch_size <= 0: return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)
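
The index handling in data_generator is worth a closer look: the list is reshuffled whenever the index wraps back to 0, even in the middle of a batch. The stripped-down sketch below keeps only that logic and leaves out image loading and box preprocessing. It uses the standard random module with a seed instead of np.random, works on a copy of the list, and the function name is hypothetical.

```python
import random
from itertools import islice
from typing import Iterator, List, Sequence


def cyclic_batches(items: Sequence[str], batch_size: int,
                   seed: int = 0) -> Iterator[List[str]]:
    """Yield batches forever, reshuffling at the start of every pass over the data."""
    pool = list(items)            # copy, so the caller's list is not reordered
    if not pool or batch_size <= 0:
        raise ValueError("need at least one item and a positive batch size")
    rng = random.Random(seed)
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if i == 0:
                rng.shuffle(pool)  # new order each time the index wraps around
            batch.append(pool[i])
            i = (i + 1) % len(pool)
        yield batch


items = ["a", "b", "c", "d", "e"]
batches = list(islice(cyclic_batches(items, batch_size=2), 5))
print(batches)
```

With five items and a batch size of two, the third batch straddles two passes: its first element closes the first pass, and its second element opens the freshly shuffled second pass. Every item is still seen exactly once per pass.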

For the complete source code, see yolo3_train.py; running it produces the face detection model.


Validation

In yolo3_predictor.py, replace the model parameters with the trained weights:

Model: ep108-loss44.018-val_loss43.270.h5
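
The file name is produced by the checkpoint pattern from the training step, which uses ordinary Python format fields. A small sketch shows how this name breaks down into epoch, training loss and validation loss; the values are simply read off the name above.

```python
# Pattern copied from the ModelCheckpoint call in the training code.
pattern = 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

# Values taken from the model file name used for prediction.
weights_name = pattern.format(epoch=108, loss=44.018, val_loss=43.270)
print(weights_name)
```

Reading the name this way tells us that these weights were saved at epoch 108, with a training loss of 44.018 and a validation loss of 43.270.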

Detection picture:

Note: Adding more datasets, together with images from your specific use case, can further improve the detection results.

OK, that’s all! Enjoy it!