

Hello everyone, I am Jizhi Vision. This article walks through the design and implementation of the YOLOv1 algorithm in detail, including training.

This is the first article in my object detection series, starting from YOLOv1, the originator of regression-based object detection. YOLO (You Only Look Once) is a family of algorithms that combine artistry with practicality, and YOLOv1 is where it all began, proposed in the paper "You Only Look Once: Unified, Real-Time Object Detection". The YOLO detectors are famous for running in real time, and successive versions have gradually improved accuracy as well; you can see some demo results on YOLO's official website.

So let’s start. Again, we’re going to talk not only about the principles but also about implementation and training.

1. YOLOv1 principle

In fact, the paper proposes not only the baseline YOLOv1 model but also Fast YOLOv1, which is meant to show how far YOLO's efficiency can be pushed. First, let's look at the experimental data in the paper. The main reference points for YOLOv1 are Fast R-CNN and DPM; the accuracy (mAP) and speed (FPS) numbers are as follows:

[Table: mAP and FPS of real-time and less-than-real-time detectors on PASCAL VOC 2007, from the YOLOv1 paper]

The training/test dataset is PASCAL VOC 0712, and the hardware is an Nvidia Titan X. Here we are mainly interested in the real-time detectors. Compared with DPM and Fast R-CNN, YOLOv1 balances accuracy and efficiency well. Note also that Fast YOLOv1 reached 155 FPS, i.e. about 1000 / 155 ≈ 6.5 ms per image for a single forward pass, which was pretty fast at the time.

Compared with Fast R-CNN, YOLOv1 is better at distinguishing background from the targets to be detected. However, YOLOv1 is not a perfect network; it has some weaknesses (rooted in its network design), as follows:

[Figure: Fast R-CNN vs. YOLOv1 error analysis]

The figure above is the Fast R-CNN vs. YOLOv1 error analysis, whose main points are:

(1) The overall prediction accuracy of YOLOv1 is lower than that of Fast R-CNN;

(2) YOLOv1 has a higher localization error rate, because it regresses box locations directly, which is coarser than the propose-then-refine approach of Fast R-CNN;

(3) YOLOv1 has a lower background error rate, i.e. it mistakes background for objects less often;

Guided by these experimental conclusions, let's look at the design of the YOLOv1 network.

1.1 Network structure design

The entire network consists of 24 convolutional layers (inspired by GoogLeNet, with many alternating 1 x 1 and 3 x 3 structures) and 2 fully connected layers. The input image size is 448 x 448, and the output is 7 x 7 x 30. YOLOv1 divides the input image into 7 x 7 grid cells; each cell predicts 2 bounding boxes, each with the attributes x, y, w, h and confidence, plus one set of 20 class probabilities per cell. That gives an output tensor of 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30. As follows:

[Figure: YOLOv1 network structure, from the 448 x 448 input to the 7 x 7 x 30 output]

Because YOLOv1 divides the image into 7 x 7 grid cells, and although each cell predicts two boxes only the one with the higher confidence is kept, the network can predict at most 7 x 7 = 49 objects per image. If the centers of two targets fall into the same grid cell, one of them is bound to be missed, which is why YOLOv1 performs poorly on closely adjacent targets. A decoding sketch of this output layout follows.
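To make the output layout concrete, here is a minimal NumPy sketch (my own illustration, not darknet's code) that decodes a 7 x 7 x 30 prediction tensor, keeping only the higher-confidence box in each grid cell:

```
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

def decode(pred, conf_thresh=0.2):
    """Decode a (S, S, B*5 + C) YOLOv1 output tensor into boxes.

    Per-cell layout: [x, y, w, h, conf] * B, then C class scores;
    x, y are offsets inside the cell, w, h are relative to the image.
    """
    boxes = []
    for i in range(S):            # grid row
        for j in range(S):        # grid column
            cell = pred[i, j]
            cls_probs = cell[B * 5:]
            # keep only the higher-confidence box of the B predictions
            b = max(range(B), key=lambda k: cell[k * 5 + 4])
            x, y, w, h, conf = cell[b * 5:b * 5 + 5]
            score = conf * cls_probs.max()   # class-specific confidence
            if score < conf_thresh:
                continue
            # convert the cell-relative center to image-relative coords
            cx, cy = (j + x) / S, (i + y) / S
            boxes.append((cx, cy, w, h, score, int(cls_probs.argmax())))
    return boxes

# toy usage with a random tensor
print(decode(np.random.rand(S, S, B * 5 + C))[:3])
```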

1.2 Loss function design

The loss function of YOLOv1, now a classic, is as follows:

$$
\begin{aligned}
Loss ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
$$

Going through it term by term:

(1) The first and second terms are both positional losses. The first is the loss on the box center, computed as a simple sum of squared errors. Here 1_ij^obj is an indicator that is 1 when the j-th predicted box of grid cell i is responsible for a ground-truth object, and 0 otherwise;

(2) The second term is the width and height loss. The square roots of w and h are used so that the same absolute error costs a small box more than a large one, damping the tendency of predictions to drift toward larger boxes; 1_ij^obj has the same meaning as in the first term;

(3) The third and fourth terms go together: both are confidence losses for the predicted boxes, the third for cells that contain an object and the fourth for cells that do not. Among the 49 grid cells, cells that actually contain an object are the minority (a long-tail situation), so the no-object term would otherwise dominate; the paper balances this with λcoord = 5 on the positional losses and λnoobj = 0.5 on the no-object confidence loss.

(4) The fifth term is the class loss, a sum of squared errors over the 20 class probabilities, computed only for cells that contain an object. A code sketch of the whole loss follows this list.
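To tie the five terms together, here is a compact NumPy sketch of the loss (a simplified illustration, not darknet's implementation: the responsibility mask obj_ij is assumed precomputed, and the target confidence is assumed already filled in, e.g. with the IoU as in the paper):

```
import numpy as np

S, B, C = 7, 2, 20            # grid, boxes per cell, classes
l_coord, l_noobj = 5.0, 0.5   # loss weights from the paper

def yolov1_loss(pred_box, pred_cls, tgt_box, tgt_cls, obj_ij):
    """Sum-squared-error YOLOv1 loss (illustrative).

    pred_box, tgt_box: (S, S, B, 5) tensors of [x, y, w, h, conf]
    pred_cls, tgt_cls: (S, S, C) class probabilities
    obj_ij:            (S, S, B) 1 where box j of cell i is responsible
    """
    noobj_ij = 1.0 - obj_ij
    obj_i = obj_ij.max(axis=-1)   # 1 where the cell holds an object

    # (1) center loss
    xy = ((pred_box[..., :2] - tgt_box[..., :2]) ** 2).sum(-1)
    # (2) width/height loss on square roots (w, h assumed non-negative)
    wh = ((np.sqrt(pred_box[..., 2:4]) - np.sqrt(tgt_box[..., 2:4])) ** 2).sum(-1)
    # (3)(4) confidence loss, split by object presence
    conf = (pred_box[..., 4] - tgt_box[..., 4]) ** 2
    # (5) class loss, only for cells that contain an object
    cls = ((pred_cls - tgt_cls) ** 2).sum(-1)

    return (l_coord * (obj_ij * (xy + wh)).sum()
            + (obj_ij * conf).sum()
            + l_noobj * (noobj_ij * conf).sum()
            + (obj_i * cls).sum())
```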

So much for the principles; now on to practice.

2. YOLOv1 implementation

GitHub address: github.com/AlexeyAB/da…

I assume darknet is already compiled. If you have trouble with that, please refer to my earlier articles for an introduction.

Next, let's make the merged VOC0712 dataset.

2.1 Making the merged VOC0712 dataset

Download the datasets:

```
# download the datasets
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

# unpack the archives
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
```

Make the image lists:

```
# generate darknet labels plus 2007_train.txt, 2007_val.txt,
# 2007_test.txt, 2012_train.txt and 2012_val.txt
wget https://pjreddie.com/media/files/voc_label.py
python voc_label.py

# use 2007_train, 2007_val, 2007_test and 2012_train as the training set,
# keeping 2012_val as the validation set
cat 2007_* 2012_train.txt > train.txt
```
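Before training, it is worth sanity-checking the generated list. Here is a small helper of my own (not part of darknet), assuming the voc_label.py layout where a labels/ directory sits next to JPEGImages:

```
# check_list.py -- quick sanity check for train.txt (my own helper)
import os

with open("train.txt") as f:
    paths = [line.strip() for line in f if line.strip()]

missing = [p for p in paths if not os.path.isfile(p)]
print(f"{len(paths)} images listed, {len(missing)} missing")

# voc_label.py writes darknet labels next to JPEGImages, under labels/
no_label = [p for p in paths
            if not os.path.isfile(
                p.replace("JPEGImages", "labels").rsplit(".", 1)[0] + ".txt")]
print(f"{len(no_label)} images without a label file")
```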

The directory structure at this point should look roughly like this (a sketch based on the commands above, not a verbatim listing):
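```
.
├── 2007_test.txt
├── 2007_train.txt
├── 2007_val.txt
├── 2012_train.txt
├── 2012_val.txt
├── train.txt
├── voc_label.py
├── VOCdevkit
│   ├── VOC2007    (Annotations, ImageSets, JPEGImages, labels, ...)
│   └── VOC2012    (Annotations, ImageSets, JPEGImages, labels, ...)
└── VOC*.tar
```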

2.2 Training

Then you can train. Note that the YOLOv1 training command differs from that of YOLOv2 and YOLOv3/v4. It looks like this:

```
./darknet yolo train cfg/yolov1/yolo.train.cfg
```

In case you are curious: the training image list is not specified in the command above because it is hardcoded in the source, and the weight output directory backup/ is hardcoded as well. In this darknet tree the YOLOv1 entry point is train_yolo() in src/yolo.c, which defines the list path and backup directory as string constants near the top (the exact names, likely train_images and backup_directory, may vary with the fork). Change them to point at your train.txt and your backup directory, then recompile the framework:

```
make clean
make -j32
```

Then run the training command to start training. Here we use the default training configuration:

[Figure: the default training configuration in yolo.train.cfg]

Some of these parameters (typically [net] fields such as batch, subdivisions, learning_rate and max_batches) can be modified to suit your needs.

2.3 Validation

Training takes quite a while. When it finishes, you can test the results:

```
./darknet yolo test cfg/yolov1/yolo.cfg backup/yolo.weights
```

[Figure: darknet loads the network and prompts with Enter Image Path]

At the Enter Image Path prompt, enter the image to detect, e.g. data/dog.jpg, to see the detection result:

[Figure: detection result on data/dog.jpg]

This completes the whole YOLOv1 workflow: dataset preparation, training, and validation.

The principles and practice of YOLOv1 have been shared in detail above. I hope this can be of some help to your study.

