The project is divided into the following modules:

1. config.py: stores most of the project-wide parameters;

2. calculate_IOU.py: computes the IoU between the default boxes and the ground-truth boxes, used to select positive and negative samples, and defines the functions that encode and decode box coordinates;

3. nms.py: defines the non-maximum suppression function;

4. random_crop.py: defines a Cropper class that augments the data through random cropping and random flipping;

5. read_data.py: defines a Reader class that reads the VOC2012 dataset;

6. anchors.py: generates default boxes of the appropriate size and number for each feature layer;

7. label_anchors.py: matches the default boxes with the ground-truth boxes;

8. network.py: defines a Net class implementing the SSD network structure, used for training and saving models;

9. loss_function.py: defines the loss function, which samples positive and negative examples at a 1:3 ratio;

10. ssd_api.py: defines the SSD_detector class, which loads the model and runs object detection on input images.

1. config.py

This file stores the project's parameters. First, the code:

# config.py

import numpy as np

import os

NMS_THRESHOLD = 0.3 # NMS (non-maximum suppression) threshold

DATA_PATH = '../VOC2012' # dataset path

ImageSets_PATH = os.path.join(DATA_PATH, 'ImageSets') # path holding the image coordinate and category information

BLOCKS = ['block4', 'block7', 'block8',
          'block9', 'block10', 'block11', 'block12'] # names of the feature layers to extract

MAX_SIZE = 1000 # Maximum side length of the image

MIN_SIZE = 600 # Minimum size of the image

EPOCHES = 2000 # number of training epochs

BATCHES = 64 # batches per epoch

THRESHOLD = 0.5 # IoU threshold separating positive and negative samples during matching

SCORE_THRESHOLD = 0.997 # score threshold for keeping positive predictions at test time

MIN_CROP_RATIO = 0.6 # minimum ratio for random cropping

MAX_CROP_RATIO = 1.0 # maximum ratio for random cropping

MODEL_PATH = './model/' # model save path

LEARNING_RATE = 2E-4 # Learning Rate

CLASSES = ['', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
           'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
           'train', 'tvmonitor'] # object categories; the first (empty) entry is the background class

# mean pixel values of the three image channels
PIXEL_MEANS = np.array([[122.7717, 115.9465, 102.9801]])

# Aspect ratio of preset boxes for different layers

RATIOS = [[2, .5],
          [2, .5, 3, 1./3],
          [2, .5, 3, 1./3],
          [2, .5, 3, 1./3],
          [2, .5, 3, 1./3],
          [2, .5], [2, .5]]

# stride (downsampling factor) of each feature layer

STRIDES = [8, 16, 32, 64, 128, 256, 512]

# S corresponds to the paper's s_k: the side length of each layer's default boxes, expressed as a ratio

S = [0.04, 0.1, 0.26, 0.42, 0.58, 0.74, 0.9, 1.06]

# Default box size for each layer. The second element is the default box size for the next layer

Sk = [(20.48, 51.2),

      (51.2, 133.12),

      (133.12, 215.04),

      (215.04, 296.96),

      (296.96, 378.88),

      (378.88, 460.8),

      (460.8, 542.72)]

# scaling factors applied to the box-regression targets in the loss

PRIOT_SCALING = (0.1, 0.1, 0.2, 0.2)

The parameters all carry comments, so I'll only highlight a few of the more important ones:

1. BLOCKS: holds the names of the seven feature layers. The first one, 'block4', is an intermediate layer of VGG; the other six are extra layers that SSD appends after the VGG backbone, which is also reflected in the STRIDES parameter.

2. RATIOS: holds the aspect ratios of the default boxes for the seven layers. For example, the first layer's entry [2, .5] means that every feature point on the first feature layer gets two additional default boxes, with aspect ratios of 2 and 0.5 respectively.

3. Sk: holds the side lengths of the default boxes for each feature layer. Note that these side lengths differ from the values in the original paper; see the sketch below.
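Incidentally, the Sk pairs appear to be exactly the S ratios scaled by a 512-pixel reference size, and in the paper's formulation a default box of scale s and aspect ratio r has width s*sqrt(r) and height s/sqrt(r). A small sketch (the 512 reference size and the shape formula are inferences, not stated in this project's code):

import numpy as np

S = [0.04, 0.1, 0.26, 0.42, 0.58, 0.74, 0.9, 1.06]

# each Sk pair is (S[i], S[i+1]) scaled by an assumed 512-pixel reference size
Sk = [(S[i] * 512, S[i + 1] * 512) for i in range(len(S) - 1)]
print(Sk[0])  # approximately (20.48, 51.2), matching the first entry above

# assuming the paper's formulation, a box of scale s and aspect ratio r:
s, r = 20.48, 2.0
w, h = s * np.sqrt(r), s / np.sqrt(r)
print(round(w, 2), round(h, 2))  # 28.96 14.48 -> an aspect-ratio-2 box of scale 20.48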

Other modules then import config.py with import config as cfg and access each parameter as cfg.PARAMETER_NAME, for example:
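import config as cfg

print(cfg.LEARNING_RATE)  # 0.0002
print(cfg.CLASSES[1])     # 'aeroplane'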

2. calculate_IOU.py

This file defines the function that computes the IoU between the default boxes and the ground-truth boxes, used to select positive and negative samples, along with the functions that encode and decode box coordinates.

First, the code:

# calculate_IOU.py

import numpy as np

import config as cfg

def encode_targets(true_box, anchors, prior_scaling=cfg.PRIOT_SCALING):
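    # Convert ground-truth boxes into regression targets relative to the anchors:
    # center offsets normalized by anchor size plus log-scale height/width
    # ratios, finally divided element-wise by prior_scaling.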

    anchor_y_min = anchors[:, 0]

    anchor_x_min = anchors[:, 1]

    anchor_y_max = anchors[:, 2]

    anchor_x_max = anchors[:, 3]

    anchor_ctr_y = (anchor_y_max + anchor_y_min) / 2

    anchor_ctr_x = (anchor_x_max + anchor_x_min) / 2

    anchor_h = anchor_y_max - anchor_y_min

    anchor_w = anchor_x_max - anchor_x_min

    true_box_y_min = true_box[:, 0]

    true_box_x_min = true_box[:, 1]

    true_box_y_max = true_box[:, 2]

    true_box_x_max = true_box[:, 3]

    true_box_ctr_y = (true_box_y_max + true_box_y_min) / 2

    true_box_ctr_x = (true_box_x_max + true_box_x_min) / 2

    true_box_h = true_box_y_max - true_box_y_min

    true_box_w = true_box_x_max - true_box_x_min

    target_dy = (true_box_ctr_y-anchor_ctr_y)/anchor_h

    target_dx = (true_box_ctr_x-anchor_ctr_x)/anchor_w

    target_dh = np.log(true_box_h/anchor_h)

    target_dw = np.log(true_box_w/anchor_w)

    targets = np.stack([target_dy, target_dx, target_dh, target_dw], axis=1)

    return np.reshape(targets, (-1, 4)) / prior_scaling

def decode_targets(anchors, targets, image_shape, prior_scaling=cfg.PRIOT_SCALING):
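    # Invert encode_targets: apply the predicted offsets to the anchors and
    # clip the resulting boxes to the image boundary.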

    y_min = anchors[:, 0]

    x_min = anchors[:, 1]

    y_max = anchors[:, 2]

    x_max = anchors[:, 3]

    height, width = image_shape[:2]

    ctr_y = (y_max + y_min) / 2

    ctr_x = (x_max + x_min) / 2

    h = y_max - y_min

    w = x_max - x_min

    targets = targets * prior_scaling

    dy = targets[:, 0]

    dx = targets[:, 1]

    dh = targets[:, 2]

    dw = targets[:, 3]

    pred_ctr_y = dy*h + ctr_y

    pred_ctr_x = dx*w + ctr_x

    pred_h = h*np.exp(dh)

    pred_w = w*np.exp(dw)

    y_min = pred_ctr_y - pred_h/2

    x_min = pred_ctr_x - pred_w/2

    y_max = pred_ctr_y + pred_h/2

    x_max = pred_ctr_x + pred_w/2

    y_min = np.clip(y_min, 0, height)

    y_max = np.clip(y_max, 0, height)

    x_min = np.clip(x_min, 0, width)

    x_max = np.clip(x_max, 0, width)

    boxes = np.stack([y_min, x_min, y_max, x_max], axis=1)

    return boxes

def fast_bbox_overlaps(holdon_anchor, true_boxes):
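    # Compute the (m, n) IoU matrix between m true boxes and n candidate
    # anchors via broadcasting, and return it transposed to (n, m).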

    num_true = true_boxes.shape[0]  # number of ground-truth boxes, m

    num_holdon = holdon_anchor.shape[0]  # number of remaining candidate anchors, n

    true_y_max = true_boxes[:, 2]

    true_y_min = true_boxes[:, 0]

    true_x_max = true_boxes[:, 3]

    true_x_min = true_boxes[:, 1]

    anchor_y_max = holdon_anchor[:, 2]

    anchor_y_min = holdon_anchor[:, 0]

    anchor_x_max = holdon_anchor[:, 3]

    anchor_x_min = holdon_anchor[:, 1]

    true_h = true_y_max - true_y_min

    true_w = true_x_max - true_x_min

    true_h = np.expand_dims(true_h, axis=1)

    true_w = np.expand_dims(true_w, axis=1)

    anchor_h = holdon_anchor[:, 2] – holdon_anchor[:, 0]

    anchor_w = holdon_anchor[:, 3] – holdon_anchor[:, 1]

    true_area = true_w * true_h

    anchor_area = anchor_w * anchor_h
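    # Intersection via broadcasting: compare the (m, 1) true-box coordinates
    # against the (n,) anchor coordinates to get (m, n) overlap extents.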

    min_y_up = np.expand_dims(true_y_max, axis=1) < anchor_y_max

    min_y_up = np.where(min_y_up, np.expand_dims(

        true_y_max, axis=1), np.expand_dims(anchor_y_max, axis=0))

    max_y_down = np.expand_dims(true_y_min, axis=1) > anchor_y_min

    max_y_down = np.where(max_y_down, np.expand_dims(

        true_y_min, axis=1), np.expand_dims(anchor_y_min, axis=0))

    lh = min_y_up - max_y_down

    min_x_up = np.expand_dims(true_x_max, axis=1) < anchor_x_max

    min_x_up = np.where(min_x_up, np.expand_dims(

        true_x_max, axis=1), np.expand_dims(anchor_x_max, axis=0))

    max_x_down = np.expand_dims(true_x_min, axis=1) > anchor_x_min

    max_x_down = np.where(max_x_down, np.expand_dims(

        true_x_min, axis=1), np.expand_dims(anchor_x_min, axis=0))

    lw = min_x_up - max_x_down

    pos_index = np.where(

        np.logical_and(

            lh > 0, lw > 0

        )

    )

    overlap_area = lh * lw  # (n, m)

    overlap_weight = np.zeros(shape=lh.shape, dtype=np.int32)

    overlap_weight[pos_index] = 1

    all_area = true_area + anchor_area

    delta_S = all_area - overlap_area  # union area of each box pair

    delta_S = np.where(delta_S > 0, delta_S, all_area)

    IOU = np.divide(overlap_area, delta_S)

    IOU = np.where(overlap_weight, IOU, 0)

    IOU_s = np.transpose(IOU)

    return IOU_s.astype(np.float32)  # (n, m) after transposing

if __name__ == "__main__":

    pass
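As a quick sanity check (a minimal sketch with made-up box values), encoding a ground-truth box against an anchor and then decoding the result should recover the original box, and fast_bbox_overlaps reports their IoU:

import numpy as np
from calculate_IOU import encode_targets, decode_targets, fast_bbox_overlaps

anchors = np.array([[100., 100., 200., 220.]])   # [y_min, x_min, y_max, x_max]
true_box = np.array([[110., 105., 210., 230.]])

targets = encode_targets(true_box, anchors)
print(decode_targets(anchors, targets, image_shape=(600, 800)))
# approximately [[110. 105. 210. 230.]]

print(fast_bbox_overlaps(anchors, true_box))  # ~[[0.73]]: the (n, m) IoU matrix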

3. nms.py

Non-maximum suppression (NMS) is used to remove redundant detection boxes and retain only the best one for each object.

Without NMS, the detector's output is cluttered with overlapping, near-duplicate boxes around every object.
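nms.py itself isn't reproduced below, so here is a minimal sketch of the classic greedy CPU implementation; the exact signature (rows of [y_min, x_min, y_max, x_max, score], a threshold defaulting to cfg.NMS_THRESHOLD) is an assumption inferred from the call py_cpu_nms(np.hstack([pos_boxes, pos_scores])) in ssd_api.py:

# nms.py -- a sketch of classic greedy NMS (assumed, not the author's exact code)
import numpy as np
import config as cfg

def py_cpu_nms(dets, thresh=cfg.NMS_THRESHOLD):
    # dets: (n, 5) array whose rows are [y_min, x_min, y_max, x_max, score]
    y_min, x_min = dets[:, 0], dets[:, 1]
    y_max, x_max = dets[:, 2], dets[:, 3]
    scores = dets[:, 4]
    areas = (y_max - y_min) * (x_max - x_min)
    order = scores.argsort()[::-1]  # box indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]       # the best remaining box is always kept
        keep.append(i)
        # intersection of box i with every other remaining box
        yy1 = np.maximum(y_min[i], y_min[order[1:]])
        xx1 = np.maximum(x_min[i], x_min[order[1:]])
        yy2 = np.minimum(y_max[i], y_max[order[1:]])
        xx2 = np.minimum(x_max[i], x_max[order[1:]])
        inter = np.maximum(0.0, yy2 - yy1) * np.maximum(0.0, xx2 - xx1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop every box that overlaps box i by more than the threshold
        order = order[np.where(iou <= thresh)[0] + 1]
    return keep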

nms.py is consumed by ssd_api.py, whose code follows: it defines the SSD_detector class that loads the trained model and applies py_cpu_nms to the raw detections:

# ssd_api.py

import tensorflow as tf

from network import Net

import config as cfg

import cv2

import numpy as np

from label_anchors import decode_targets

import matplotlib.pyplot as plt

from nms import py_cpu_nms

class SSD_detector(object):

    def __init__(self):

        self.net = Net(is_training=False)

        self.model_path = cfg.MODEL_PATH

        self.pixel_means = cfg.PIXEL_MEANS

        self.min_size = cfg.MIN_SIZE

        self.pred_loc, self.pred_cls = self.net.output

        self.score_threshold = cfg.SCORE_THRESHOLD

    def pre_process(self, image_path):

        image = cv2.imread(image_path)

        image = image.astype(np.float32)

        image, scale = self.resize_image(image)

        value = {'image': image, 'scale': scale, 'image_path': image_path}

        return value

    def resize_image(self, image):
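        # Scale the image so its shorter side equals cfg.MIN_SIZE, keeping the
        # aspect ratio; the scale factor is returned so boxes can be mapped back.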

        image_shape = image.shape

        size_min = np.min(image_shape[:2])

        size_max = np.max(image_shape[:2])

        scale = float(self.min_size) / float(size_min)

        image = cv2.resize(image, dsize=(0, 0), fx=scale, fy=scale)

        return image, scale

    def test_ssd(self, image_paths):

        if isinstance(image_paths, str):

            image_paths = [image_paths]

        with tf.compat.v1.Session() as sess:

            sess.run(tf.compat.v1.global_variables_initializer())

            ckpt = tf.train.get_checkpoint_state(cfg.MODEL_PATH)

            if ckpt and ckpt.model_checkpoint_path:

                # restore the trained model if a checkpoint exists

                self.net.saver.restore(sess, ckpt.model_checkpoint_path)

                print('Model restored successfully!')

            for path in image_paths:

                value = self.pre_process(path)

                image = value['image'] - self.pixel_means

                feed_dict = {self.net.x: image}

                pred_loc, pred_cls, layer_anchors = sess.run(

                    [self.pred_loc, self.pred_cls, self.net.anchors], feed_dict

                )

                pos_loc, pos_cls, pos_anchors, pos_scores = self.decode_output(

                    pred_loc, pred_cls, layer_anchors)

                pos_boxes = decode_targets(pos_anchors, pos_loc, image.shape)

                pos_scores = np.expand_dims(pos_scores, axis=-1)

                self.draw_result(

                    value['image'], pos_boxes, pos_cls, value['scale']

                )

                keep_index = py_cpu_nms(np.hstack([pos_boxes, pos_scores]))

                self.draw_result(

                    value['image'], pos_boxes[keep_index], pos_cls[keep_index], value['scale']

                )

    def draw_result(self, image, pos_boxes, pos_cls, scale, font=cv2.FONT_HERSHEY_SIMPLEX):

        image = cv2.resize(image, dsize=(0, 0), fx=1/scale, fy=1/scale)        

        image = image.astype(np.uint8)  # cv2 drawing and plt.imshow expect uint8

        pos_boxes = pos_boxes * (1/scale)

        for i in range(pos_boxes.shape[0]):

            bbox = pos_boxes[i]

            label = cfg.CLASSES[pos_cls[i]]

            y_min, x_min, y_max, x_max = bbox.astype(np.int32)

            cv2.rectangle(image, (x_min, y_min),

                          (x_max, y_max), (0, 0, 255), thickness=2)

            cv2.putText(image, label, (x_min+20, y_min+20),

                        font, 1, (255, 0, 0), thickness=2)

        plt.imshow(image[:, :, [2, 1, 0]])

        plt.show()

    def decode_output(self, pred_loc, pred_cls, layer_anchors):
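        # Gather, per feature layer, the locations, classes, anchors, and
        # scores of predictions whose non-background score clears the threshold.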

        pos_loc, pos_cls, pos_anchors, pos_scores = [], [], [], []

        for i in range(len(pred_cls)):

            loc_ = pred_loc[i]

            cls_ = pred_cls[i]  # cls_ holds the per-class scores

            anchors = layer_anchors[i].reshape((-1, 4))

            max_scores = np.max(cls_[:, 1:], axis=-1)  # highest non-background score

            cls_ = np.argmax(cls_, axis=-1)  # index of the highest-scoring class

            pos_index = np.where(max_scores > self.score_threshold)[0]

            pos_loc.append(loc_[pos_index])

            pos_cls.append(cls_[pos_index])

            pos_anchors.append(anchors[pos_index])

            pos_scores.append(max_scores[pos_index])

        pos_loc = np.vstack(pos_loc)

        pos_cls = np.hstack(pos_cls)

        pos_anchors = np.vstack(pos_anchors)

        pos_scores = np.hstack(pos_scores)

        return pos_loc, pos_cls, pos_anchors, pos_scores

if __name__ == "__main__":

    detector = SSD_detector()

    detector.test_ssd('./1.jpg')