Abstract: This article introduces the network structure of Yolov3 in detail.

Yolov3 network structure

In the blog “Yolo Development history and Network Structure”, we explained the network structure of Yolov1 in detail and briefly mentioned the improvements to the network structures of Yolov2 and Yolov3. This blog introduces the network structure of Yolov3 in detail; the content is relatively straightforward.

Figure 1: Yolov3 network structure diagram

As can be seen from the figure, Yolov3 is mainly composed of the following parts:

  • The input
  • Basic network (backbone): the backbone can be chosen according to specific needs. In the original paper, the author used Darknet-53, a network of his own design
  • Three output branches: Y1, Y2, and Y3
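The three branches predict on grids of different resolutions. A minimal sketch, assuming the common 416×416 input and downsampling strides of 32, 16 and 8 for Y1, Y2 and Y3 (neither the input size nor the strides are stated in this article), shows how the 13×13, 26×26 and 52×52 grids arise; `grid_sizes` is a hypothetical helper:

```python
from typing import List, Sequence

# Assumed strides for the Y1, Y2 and Y3 branches (large, medium, small objects)
DEFAULT_STRIDES = (32, 16, 8)


def grid_sizes(input_size: int, strides: Sequence[int] = DEFAULT_STRIDES) -> List[int]:
    """Return the side length of each branch's output grid for a square input."""
    largest = max(strides)
    # The coarsest branch must divide the input evenly, so the input
    # side has to be a multiple of the largest stride.
    if input_size % largest != 0:
        raise ValueError(f"input size must be a multiple of {largest}")
    return [input_size // s for s in strides]


print(grid_sizes(416))  # [13, 26, 52]
```

Under the same assumption, a 608×608 input would give grids of 19, 38 and 76.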

Network Components

DBL: Darknetconv2d_BN_Leaky in the code, shown in the bottom left of Figure 1, is the basic component of Yolo_v3. It consists of convolution + BN + Leaky ReLU. In v3, BN and Leaky ReLU are inseparable from the convolution layer (except for the last convolution of each branch), and together they form the smallest component.
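A DBL applies a convolution, then batch normalization, then Leaky ReLU. The plain-Python sketch below covers only the per-value BN + Leaky ReLU step and the spatial-shape rule of the convolution. The slope of 0.1, the BN epsilon, and the padding of kernel // 2 are assumptions, and all function names are hypothetical:

```python
import math
from typing import Tuple

LEAKY_SLOPE = 0.1  # assumed Leaky ReLU slope
BN_EPS = 1e-5      # assumed numerical-stability constant for BN


def batch_norm(x: float, mean: float, var: float,
               gamma: float = 1.0, beta: float = 0.0) -> float:
    """Inference-time batch normalization of one convolution output value."""
    return gamma * (x - mean) / math.sqrt(var + BN_EPS) + beta


def leaky_relu(x: float, slope: float = LEAKY_SLOPE) -> float:
    """Pass positive values through; scale negative values by `slope`."""
    return x if x > 0 else slope * x


def dbl_activation(conv_out: float, mean: float, var: float,
                   gamma: float = 1.0, beta: float = 0.0) -> float:
    """BN followed by Leaky ReLU, i.e. the part of DBL after the convolution."""
    return leaky_relu(batch_norm(conv_out, mean, var, gamma, beta))


def dbl_output_shape(h: int, w: int, filters: int,
                     kernel: int, stride: int) -> Tuple[int, int, int]:
    """(H, W, C) after the convolution, assuming padding = kernel // 2."""
    pad = kernel // 2
    out_h = (h + 2 * pad - kernel) // stride + 1
    out_w = (w + 2 * pad - kernel) // stride + 1
    return (out_h, out_w, filters)


print(dbl_output_shape(416, 416, 32, kernel=3, stride=1))  # (416, 416, 32)
print(dbl_output_shape(416, 416, 64, kernel=3, stride=2))  # (208, 208, 64)
```

A stride-2 DBL halves the spatial size, which is how the backbone downsamples. The last convolution of each branch would skip `dbl_activation` and keep the raw convolution output.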

Resn: n is a number (res1, res2,… ,res8, etc.) indicating how many res_units the res_block contains. This is the large component of Yolo_v3. It borrows the residual structure of ResNet, which makes a deeper network possible (from v2’s Darknet-19, which has no residual structure, to v3’s Darknet-53). An intuitive illustration of res_block is in the lower right corner of Figure 1; its basic component is also the DBL.
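A res_unit adds its input to the output of two stacked DBLs, so both tensors must have identical shapes. The sketch below assumes the Darknet-53 layout reported in the YOLOv3 paper: blocks res1, res2, res8, res8 and res4, each preceded by one stride-2 downsampling DBL, with each res_unit made of a 1×1 DBL that halves the channels and a 3×3 DBL that restores them. Plain lists stand in for tensors, and the helper names are hypothetical:

```python
from typing import List

# Number of res_units in each res_block of Darknet-53 (res1, res2, res8, res8, res4)
RES_BLOCKS = [1, 2, 8, 8, 4]


def res_unit_channels(c: int) -> List[int]:
    """Channel counts through one res_unit: 1x1 DBL halves, 3x3 DBL restores."""
    return [c, c // 2, c]


def residual_add(x: List[float], fx: List[float]) -> List[float]:
    """Element-wise shortcut addition; shapes must match exactly."""
    if len(x) != len(fx):
        raise ValueError("residual add requires equal shapes")
    return [a + b for a, b in zip(x, fx)]


def darknet53_conv_layers() -> int:
    """Count conv layers: one stem conv, one downsampler per block, two per res_unit."""
    stem = 1
    downsamplers = len(RES_BLOCKS)
    res_convs = 2 * sum(RES_BLOCKS)
    return stem + downsamplers + res_convs


print(darknet53_conv_layers())  # 52
```

The 52 convolutional layers plus the fully connected layer of the classification version give the "53" in Darknet-53.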

Concat: tensor concatenation. It joins one of Darknet’s intermediate layers with the upsampled output of a later layer. Concatenation differs from the add operation in the residual layer: concatenation expands the tensor along the channel dimension, while add simply sums element-wise without changing the tensor’s shape.
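The difference between concat and add shows up directly in the tensor shapes. A minimal sketch over (height, width, channels) tuples, assuming 2× upsampling and the channel counts of the reference Darknet configuration (a 256-channel 13×13 map upsampled to 26×26 and concatenated with a 512-channel backbone layer); the function names are illustrative:

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def upsample2x_shape(s: Shape) -> Shape:
    """2x upsampling doubles height and width; channels are unchanged."""
    h, w, c = s
    return (2 * h, 2 * w, c)


def concat_shape(a: Shape, b: Shape) -> Shape:
    """Concatenation along channels: spatial sizes must match, channels add up."""
    if a[:2] != b[:2]:
        raise ValueError("concat requires equal height and width")
    return (a[0], a[1], a[2] + b[2])


def add_shape(a: Shape, b: Shape) -> Shape:
    """Element-wise addition: shapes must be identical and stay unchanged."""
    if a != b:
        raise ValueError("add requires identical shapes")
    return a


upsampled = upsample2x_shape((13, 13, 256))      # (26, 26, 256)
print(concat_shape(upsampled, (26, 26, 512)))    # (26, 26, 768)
print(add_shape((26, 26, 512), (26, 26, 512)))   # (26, 26, 512)
```

Upsampling is what makes the later, coarser layer match the intermediate layer's height and width, which concat requires.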

Three branches of the YOLOv3 network

Multi-scale detection - Y1

Suitable for: large objects

Path: marked with the green line

Output dimension: 13×13×255

Specific explanation of output dimensions: 13×13: grid size of the output feature map; 255 = (80 + 5) × 3, where 80 is the number of object classes, 5 = x, y, w, h and c (confidence), and 3 is the number of bounding boxes predicted at each grid cell.
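The 255 channels at each grid cell hold three predictions of 85 values each. A minimal sketch, assuming the 80 COCO classes and a layout in which each box's 85 values are stored contiguously (a real implementation may order them differently); `head_channels` and `split_cell` are hypothetical helpers:

```python
from typing import Dict, List

NUM_CLASSES = 80  # COCO classes
NUM_BOXES = 3     # boxes predicted per grid cell at each scale
BOX_ATTRS = 5     # x, y, w, h, confidence


def head_channels(num_classes: int = NUM_CLASSES, num_boxes: int = NUM_BOXES) -> int:
    """Channels of one output branch: (classes + 5) * boxes."""
    return (num_classes + BOX_ATTRS) * num_boxes


def split_cell(values: List[float], num_classes: int = NUM_CLASSES) -> List[Dict[str, object]]:
    """Split one grid cell's channel vector into per-box predictions."""
    step = num_classes + BOX_ATTRS
    if len(values) % step != 0:
        raise ValueError("vector length must be a multiple of classes + 5")
    boxes = []
    for start in range(0, len(values), step):
        chunk = values[start:start + step]
        boxes.append({
            "xywh": chunk[0:4],          # box offsets
            "confidence": chunk[4],      # objectness score c
            "class_scores": chunk[5:],   # one score per class
        })
    return boxes


print(head_channels())          # 255
cell = [0.0] * head_channels()  # dummy values for one of the 13x13 cells
print(len(split_cell(cell)))    # 3
```

With a 20-class dataset the same formula would give (20 + 5) × 3 = 75 channels.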

Multi-scale detection - Y2

Suitable for: medium objects

Path: marked in yellow

Output dimension: 26×26×255

Specific explanation of output dimensions: 26×26: grid size of the output feature map; 255 = (80 + 5) × 3, where 80 is the number of object classes, 5 = x, y, w, h and c (confidence), and 3 is the number of bounding boxes predicted at each grid cell.
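The x, y, w, h values predicted for each box are offsets rather than pixel coordinates. The sketch below follows the decoding given in the YOLOv3 paper (b_x = σ(t_x) + c_x, b_w = p_w·e^(t_w)), scaled to pixels by the branch stride. It assumes a 416×416 input, so each cell of the 26×26 grid covers 16×16 pixels, and uses (30, 61) pixels as an example anchor; `decode_box` is a hypothetical helper:

```python
import math
from typing import Tuple


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx: float, ty: float, tw: float, th: float,
               cx: int, cy: int, anchor: Tuple[float, float],
               stride: int) -> Tuple[float, float, float, float]:
    """Turn raw offsets for grid cell (cx, cy) into (center_x, center_y, w, h) in pixels.

    The sigmoid keeps the center inside its cell; the anchor size
    is scaled by exp(tw) and exp(th).
    """
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    bw = anchor[0] * math.exp(tw)
    bh = anchor[1] * math.exp(th)
    return (bx, by, bw, bh)


# All-zero offsets put the box at the cell center with the anchor's own size
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=5, cy=7, anchor=(30, 61), stride=16))
# (88.0, 120.0, 30.0, 61.0)
```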

Multi-scale detection - Y3

Suitable for: small objects

Path: marked with the purple line

Output dimension: 52×52×255

Specific explanation of output dimensions: 52×52: grid size of the output feature map; 255 = (80 + 5) × 3, where 80 is the number of object classes, 5 = x, y, w, h and c (confidence), and 3 is the number of bounding boxes predicted at each grid cell.
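Putting the three branches together, the number of boxes predicted per image is the sum over the grids of (grid side)² × 3. A quick check, assuming the 13×13, 26×26 and 52×52 grids listed for Y1, Y2 and Y3:

```python
from typing import Dict

BOXES_PER_CELL = 3
GRIDS = {"Y1": 13, "Y2": 26, "Y3": 52}  # grid side per branch


def boxes_per_branch(grids: Dict[str, int] = GRIDS) -> Dict[str, int]:
    """Number of predicted boxes produced by each branch."""
    return {name: side * side * BOXES_PER_CELL for name, side in grids.items()}


counts = boxes_per_branch()
print(counts)                # {'Y1': 507, 'Y2': 2028, 'Y3': 8112}
print(sum(counts.values()))  # 10647
```

Most candidate boxes come from the fine 52×52 grid of Y3: 8,112 of 10,647, or about 76%.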
