Simple implementation of YOLOv5 based on PyTorch

[GiantPandaCV Guide] This article walks through a PyTorch implementation of the YOLOv5 network structure. It aims to make the code easier to understand, simplifies the configuration file, and sorts out how the four YOLOv5 network sizes are built. Working through this gave me a deeper understanding of the V5 network. I hope readers get something out of the article, and I welcome discussion on how the code could be written better so that we can improve together.

1. Complete network code

The common building blocks from the V5 code are kept, because that part is easy to understand and keeps the overall code fairly simple. What changes is mainly how the overall network is assembled: the original builds it by parsing YAML files, which is not very friendly to some developers.

Some of the variables in the network

c1: input channels
c2: output channels
k: convolution kernel size
s: stride
p: padding
g: groups
act: activation function
e: expansion factor
gw: network width factor
gd: network depth factor
n: number of module repeats
nc: number of classes
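How `e`, `gw`, and `gd` act on a layer can be sketched in plain Python. This is a minimal illustration that assumes the scaling rule used by the code in this article, plain `round(c * gw)` for channels and `round(n * gd)` for repeats. The helper names `scaled_channels` and `scaled_repeats` are hypothetical and not part of the article's code.

```python
def scaled_channels(c, gw):
    """Channel count after applying the width factor (assumed rule: round(c * gw))."""
    return round(c * gw)


def scaled_repeats(n, gd):
    """Number of module repeats after applying the depth factor (assumed rule: round(n * gd))."""
    return round(n * gd)


# yolov5s-style factors: gd = 0.33, gw = 0.50
gd, gw = 0.33, 0.50

# A layer declared as CBL(64, 128, e=gw) ends up with these real channel counts
c1, c2 = scaled_channels(64, gw), scaled_channels(128, gw)
print(c1, c2)  # 32 64

# A CSP block declared with n=9 repeats is built with fewer residual units
print(scaled_repeats(9, gd))  # 3
```

The nominal numbers written in the layer definitions (64, 128, n=9, and so on) therefore describe the largest template, and the two factors shrink or grow it.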

Backbone network code CSPDarknet53

import torch
import torch.nn as nn


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class CBL(nn.Module):
    # Conv2d + BatchNorm2d + activation (SiLU by default); e scales both channel counts

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, e=1.0):
        super(CBL, self).__init__()
        c1 = round(c1 * e)
        c2 = round(c2 * e)
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Focus(nn.Module):
    # Moves every 2x2 pixel neighborhood into channels, then applies a CBL

    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, act=True, e=1.0):
        super(Focus, self).__init__()
        c2 = round(c2 * e)
        self.conv = CBL(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        flatten_channel = torch.cat([x[..., 0::2, 0::2],
                                     x[..., 1::2, 0::2],
                                     x[..., 0::2, 1::2],
                                     x[..., 1::2, 1::2]], dim=1)
        return self.conv(flatten_channel)


class SPP(nn.Module):
    # Spatial pyramid pooling: concatenates the input with three max-pooled copies

    def __init__(self, c1, c2, k=(5, 9, 13), e=1.0):
        super(SPP, self).__init__()
        c1 = round(c1 * e)
        c2 = round(c2 * e)
        c_ = c1 // 2
        self.cbl_before = CBL(c1, c_, 1, 1)
        self.max_pool = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cbl_after = CBL(c_ * 4, c2, 1, 1)

    def forward(self, x):
        x = self.cbl_before(x)
        x_cat = torch.cat([x] + [m(x) for m in self.max_pool], 1)
        return self.cbl_after(x_cat)


class ResUnit_n(nn.Module):

    def __init__(self, c1, c2, n):
        super(ResUnit_n, self).__init__()
        self.shortcut = c1 == c2
        # Build a fresh unit per repeat; repeating one instance would share its weights
        self.res_unit_n = nn.Sequential(*[
            nn.Sequential(
                CBL(c1, c1, k=1, s=1, p=0),
                CBL(c1, c2, k=3, s=1, p=1)
            ) for _ in range(n)
        ])

    def forward(self, x):
        return x + self.res_unit_n(x) if self.shortcut else self.res_unit_n(x)


class CSP1_n(nn.Module):

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, n=1, e=None):
        super(CSP1_n, self).__init__()

        c1 = round(c1 * e[1])  # e = [gd, gw]
        c2 = round(c2 * e[1])
        n = round(n * e[0])
        c_ = c2 // 2
        self.up = nn.Sequential(
            CBL(c1, c_, k, s, autopad(k, p), g, act),
            ResUnit_n(c_, c_, n),
            # nn.Conv2d(c_, c_, 1, 1, 0, bias=False)  # removed in the latest YOLOv5 structure, so this differs slightly from the structure diagrams found online
        )
        self.bottom = nn.Conv2d(c1, c_, 1, 1, 0)
        self.tie = nn.Sequential(
            nn.BatchNorm2d(c_ * 2),
            nn.LeakyReLU(),
            nn.Conv2d(c_ * 2, c2, 1, 1, 0, bias=False)
        )

    def forward(self, x):
        total = torch.cat([self.up(x), self.bottom(x)], dim=1)
        out = self.tie(total)
        return out


class CSPDarkNet(nn.Module):

    def __init__(self, gd=0.33, gw=0.5):
        super(CSPDarkNet, self).__init__()
        self.truck_big = nn.Sequential(
            Focus(3, 64, e=gw),
            CBL(64, 128, k=3, s=2, p=1, e=gw),
            CSP1_n(128, 128, n=3, e=[gd, gw]),
            CBL(128, 256, k=3, s=2, p=1, e=gw),
            CSP1_n(256, 256, n=9, e=[gd, gw]),
        )
        self.truck_middle = nn.Sequential(
            CBL(256, 512, k=3, s=2, p=1, e=gw),
            CSP1_n(512, 512, n=9, e=[gd, gw]),
        )
        self.truck_small = nn.Sequential(
            CBL(512, 1024, k=3, s=2, p=1, e=gw),
            SPP(1024, 1024, e=gw)
        )

    def forward(self, x):
        h_big = self.truck_big(x)  # torch.Size([2, 128, 76, 76])
        h_middle = self.truck_middle(h_big)
        h_small = self.truck_small(h_middle)
        return h_big, h_middle, h_small


def darknet53(gd, gw, pretrained, **kwargs):
    model = CSPDarkNet(gd, gw)
    if pretrained:
        if isinstance(pretrained, str):
            model.load_state_dict(torch.load(pretrained))
        else:
            raise Exception(f"darknet requires a pretrained weights path, got [{pretrained}]")
    return model
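Why `autopad` gives 'same' padding can be checked with the standard output-size formula for convolution and pooling layers, out = floor((in + 2p - k) / s) + 1. The sketch below is plain Python and needs no PyTorch. `conv_out_size` is a hypothetical helper added for illustration, and it assumes a square input and no dilation.

```python
def autopad(k, p=None):
    """Same rule as in the article: default padding is half the kernel size."""
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(size, k, s=1, p=None):
    """Spatial output size of a convolution or pooling layer (square input, no dilation)."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1


# Stride 1 with an odd kernel keeps the size, which is what the SPP max pools rely on
print([conv_out_size(19, k) for k in (1, 3, 5, 9, 13)])  # [19, 19, 19, 19, 19]

# Stride 2 with k=3, p=1 halves the size, as in the downsampling CBL layers
print(conv_out_size(76, 3, s=2))  # 38
```

Because the three pooled maps in SPP keep the input resolution, they can be concatenated with the unpooled map along the channel axis.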

Overall network construction

import torch
import torch.nn as nn
from cspdarknet53v5 import darknet53


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class UpSample(nn.Module):

    def __init__(self):
        super(UpSample, self).__init__()
        self.up_sample = nn.Upsample(scale_factor=2, mode='nearest')

    def forward(self, x):
        return self.up_sample(x)


class CBL(nn.Module):

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, e=1.0):
        super(CBL, self).__init__()
        c1 = round(c1 * e)
        c2 = round(c2 * e)
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ResUnit_n(nn.Module):

    def __init__(self, c1, c2, n):
        super(ResUnit_n, self).__init__()
        self.shortcut = c1 == c2
        # Build a fresh unit per repeat; repeating one instance would share its weights
        self.res_unit_n = nn.Sequential(*[
            nn.Sequential(
                CBL(c1, c1, k=1, s=1, p=0),
                CBL(c1, c2, k=3, s=1, p=1)
            ) for _ in range(n)
        ])

    def forward(self, x):
        return x + self.res_unit_n(x) if self.shortcut else self.res_unit_n(x)


class CSP1_n(nn.Module):

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, n=1, e=None):
        super(CSP1_n, self).__init__()

        c1 = round(c1 * e[1])
        c2 = round(c2 * e[1])
        n = round(n * e[0])
        c_ = c2 // 2
        self.up = nn.Sequential(
            CBL(c1, c_, k, s, autopad(k, p), g, act),
            ResUnit_n(c_, c_, n),
            # nn.Conv2d(c_, c_, 1, 1, 0, bias=False)  # removed in the latest YOLOv5 structure, so this differs slightly from the structure diagrams found online
        )
        self.bottom = nn.Conv2d(c1, c_, 1, 1, 0)
        self.tie = nn.Sequential(
            nn.BatchNorm2d(c_ * 2),
            nn.LeakyReLU(),
            nn.Conv2d(c_ * 2, c2, 1, 1, 0, bias=False)
        )

    def forward(self, x):
        total = torch.cat([self.up(x), self.bottom(x)], dim=1)
        out = self.tie(total)
        return out


class CSP2_n(nn.Module):

    def __init__(self, c1, c2, e=0.5, n=1):
        super(CSP2_n, self).__init__()
        c_ = int(c1 * e)
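        # Build a fresh pair of CBLs per repeat (a repeated instance would share weights);
        # only the first pair sees c1 input channels, later pairs see c_
        self.cbl_2n = nn.Sequential(*[
            nn.Sequential(
                CBL(c1 if i == 0 else c_, c_, 1, 1, 0),
                CBL(c_, c_, 1, 1, 0),
            ) for i in range(n)
        ])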
        self.conv_up = nn.Conv2d(c_, c_, 1, 1, 0)
        self.conv_bottom = nn.Conv2d(c1, c_, 1, 1, 0)
        self.tie = nn.Sequential(
            nn.BatchNorm2d(c_ * 2),
            nn.LeakyReLU(),
            nn.Conv2d(c_ * 2, c2, 1, 1, 0)
        )

    def forward(self, x):
        up = self.conv_up(self.cbl_2n(x))
        total = torch.cat([up, self.conv_bottom(x)], dim=1)
        out = self.tie(total)
        return out


class yolov5(nn.Module):

    def __init__(self, nc=80, gd=0.33, gw=0.5):
        super(yolov5, self).__init__()
        # ------------------------------Backbone--------------------------------
        self.backbone = darknet53(gd, gw, None)

        # ------------------------------Neck------------------------------------
        self.neck_small = nn.Sequential(
            CSP1_n(1024, 1024, n=3, e=[gd, gw]),
            CBL(1024, 512, 1, 1, 0, e=gw)
        )
        self.up_middle = nn.Sequential(
            UpSample()
        )
        self.out_set_middle = nn.Sequential(
            CSP1_n(1024, 512, n=3, e=[gd, gw]),
            CBL(512, 256, 1, 1, 0, e=gw),
        )
        self.up_big = nn.Sequential(
            UpSample()
        )
        self.out_set_tie_big = nn.Sequential(
            CSP1_n(512, 256, n=3, e=[gd, gw])
        )

        self.pan_middle = nn.Sequential(
            CBL(256, 256, 3, 2, 1, e=gw)
        )
        self.out_set_tie_middle = nn.Sequential(
            CSP1_n(512, 512, n=3, e=[gd, gw])
        )
        self.pan_small = nn.Sequential(
            CBL(512, 512, 3, 2, 1, e=gw)
        )
        self.out_set_tie_small = nn.Sequential(
            CSP1_n(1024, 1024, n=3, e=[gd, gw])
        )
        # ------------------------------Prediction--------------------------------
        # prediction
        big_ = round(256 * gw)
        middle = round(512 * gw)
        small_ = round(1024 * gw)
        self.out_big = nn.Sequential(
            nn.Conv2d(big_, 3 * (5 + nc), 1, 1, 0)
        )
        self.out_middle = nn.Sequential(
            nn.Conv2d(middle, 3 * (5 + nc), 1, 1, 0)
        )
        self.out_small = nn.Sequential(
            nn.Conv2d(small_, 3 * (5 + nc), 1, 1, 0)
        )

    def forward(self, x):
        h_big, h_middle, h_small = self.backbone(x)
        neck_small = self.neck_small(h_small)  
        # ----------------------------up sample 38*38-------------------------------
        up_middle = self.up_middle(neck_small)
        middle_cat = torch.cat([up_middle, h_middle], dim=1)
        out_set_middle = self.out_set_middle(middle_cat)

        # ----------------------------up sample 76*76-------------------------------
        up_big = self.up_big(out_set_middle)  # torch.Size([2, 128, 76, 76])
        big_cat = torch.cat([up_big, h_big], dim=1)
        out_set_tie_big = self.out_set_tie_big(big_cat)

        # ----------------------------PAN 38*38-------------------------------------
        neck_tie_middle = torch.cat([self.pan_middle(out_set_tie_big), out_set_middle], dim=1)
        up_middle = self.out_set_tie_middle(neck_tie_middle)

        # ----------------------------PAN 19*19-------------------------------------
        neck_tie_small = torch.cat([self.pan_small(up_middle), neck_small], dim=1)
        out_set_small = self.out_set_tie_small(neck_tie_small)

        # ----------------------------prediction-------------------------------------
        out_small = self.out_small(out_set_small)
        out_middle = self.out_middle(up_middle)
        out_big = self.out_big(out_set_tie_big)

        return out_small, out_middle, out_big


if __name__ == '__main__':
    # How the configuration is written
    config = {
        #            gd    gw
        'yolov5s': [0.33, 0.50],
        'yolov5m': [0.67, 0.75],
        'yolov5l': [1.00, 1.00],
        'yolov5x': [1.33, 1.25]
    }
    # Change the model name here to pick a network size
    net_size = config['yolov5x']
    net = yolov5(nc=80, gd=net_size[0], gw=net_size[1])
    print(net)
    a = torch.randn(2, 3, 416, 416)
    y = net(a)
    print(y[0].shape, y[1].shape, y[2].shape)


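As a sanity check on what the final `print` should show, the expected output shapes can be worked out by hand. The sketch assumes the strides implied by the code above (Focus halves the resolution once and each stride-2 CBL halves it again, giving total strides of 8, 16, and 32), an input side that is a multiple of 32, and three anchors per scale as implied by `3 * (5 + nc)`. `expected_shapes` is a hypothetical helper, not part of the article's code.

```python
def expected_shapes(batch, size, nc=80, anchors=3):
    """Shapes of (out_small, out_middle, out_big) for a square input of side `size`."""
    channels = anchors * (5 + nc)  # 4 box values + 1 objectness + nc class scores per anchor
    # out_small comes from the stride-32 map, out_middle from stride 16, out_big from stride 8
    return [(batch, channels, size // s, size // s) for s in (32, 16, 8)]


for shape in expected_shapes(batch=2, size=416):
    print(shape)
# (2, 255, 13, 13)
# (2, 255, 26, 26)
# (2, 255, 52, 52)
```

The channel count of the outputs depends only on `nc`, not on `gw`, so all four model sizes should print the same shapes for the same input.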

2. Analysis of the network structure

ResUnit_n residual block

class ResUnit_n(nn.Module):

    def __init__(self, c1, c2, n):
        super(ResUnit_n, self).__init__()
        self.shortcut = c1 == c2
        # Build a fresh unit per repeat; repeating one instance would share its weights
        self.res_unit_n = nn.Sequential(*[
            nn.Sequential(
                CBL(c1, c1, k=1, s=1, p=0),
                CBL(c1, c2, k=3, s=1, p=1)
            ) for _ in range(n)
        ])

    def forward(self, x):
        return x + self.res_unit_n(x) if self.shortcut else self.res_unit_n(x)

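When a block is repeated n times, it matters whether the list comprehension repeats one already-built object or builds a new one on each iteration. In the first case every position in the stack is the same object, so for an `nn.Module` all n repeats would share one set of weights. The effect can be seen in plain Python with no PyTorch; `Unit` is a hypothetical stand-in for a module that owns parameters.

```python
class Unit:
    """Hypothetical stand-in for a layer that owns its own parameters."""

    def __init__(self):
        self.weight = [0.0]


# Repeating one already-built object: every entry is the same object
unit = Unit()
shared = [unit for _ in range(3)]

# Calling the constructor inside the comprehension: three independent objects
fresh = [Unit() for _ in range(3)]

# Update the first entry of each list, as a training step would
shared[0].weight[0] = 1.0
fresh[0].weight[0] = 1.0

print([u.weight[0] for u in shared])  # [1.0, 1.0, 1.0]
print([u.weight[0] for u in fresh])   # [1.0, 0.0, 0.0]
print(len({id(u) for u in shared}), len({id(u) for u in fresh}))  # 1 3
```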

CSP1_x structure

The CSP1_n code is organized by picturing the CSP block as an animal lying down, with its head on the left and its tail on the right. The up branch is the one nearer the sky, the bottom branch is the one nearer the ground, and tie is where the two are tied together at the tail.

class CSP1_n(nn.Module):

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, n=1, e=None):
        super(CSP1_n, self).__init__()

        c1 = round(c1 * e[1])
        c2 = round(c2 * e[1])
        n = round(n * e[0])
        c_ = c2 // 2
        self.up = nn.Sequential(
            CBL(c1, c_, k, s, autopad(k, p), g, act),
            ResUnit_n(c_, c_, n),
            # nn.Conv2d(c_, c_, 1, 1, 0, bias=False)  # removed in the latest YOLOv5 structure, so this differs slightly from the structure diagrams found online
        )
        self.bottom = nn.Conv2d(c1, c_, 1, 1, 0)
        self.tie = nn.Sequential(
            nn.BatchNorm2d(c_ * 2),
            nn.LeakyReLU(),
            nn.Conv2d(c_ * 2, c2, 1, 1, 0, bias=False)
        )

    def forward(self, x):
        total = torch.cat([self.up(x), self.bottom(x)], dim=1)
        out = self.tie(total)
        return out
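To make the up / bottom / tie picture concrete, the channel counts through one CSP1_n block can be traced by hand. The sketch below mirrors the arithmetic in the constructor and runs without PyTorch; `csp1_channels` is a hypothetical helper added for illustration.

```python
def csp1_channels(c1, c2, n, gd, gw):
    """Trace channel counts through CSP1_n(c1, c2, n=n, e=[gd, gw])."""
    c1 = round(c1 * gw)       # scaled input channels
    c2 = round(c2 * gw)       # scaled output channels
    n = round(n * gd)         # scaled number of residual units
    c_ = c2 // 2              # hidden width used by both branches
    return {
        "in": c1,
        "up": c_,             # CBL followed by the residual units
        "bottom": c_,         # plain 1x1 convolution
        "concat": 2 * c_,     # input to the tie: BatchNorm + LeakyReLU + 1x1 conv
        "out": c2,
        "res_units": n,
    }


print(csp1_channels(256, 256, n=9, gd=0.33, gw=0.50))
# {'in': 128, 'up': 64, 'bottom': 64, 'concat': 128, 'out': 128, 'res_units': 3}
```

Each branch carries half of the output width, so the concatenation at the tail restores the full width before the final 1x1 convolution.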

CSPDarknet backbone network construction

class CSPDarkNet(nn.Module):

    def __init__(self, gd=0.33, gw=0.5):
        super(CSPDarkNet, self).__init__()
        self.truck_big = nn.Sequential(
            Focus(3, 64, e=gw),
            CBL(64, 128, k=3, s=2, p=1, e=gw),
            CSP1_n(128, 128, n=3, e=[gd, gw]),
            CBL(128, 256, k=3, s=2, p=1, e=gw),
            CSP1_n(256, 256, n=9, e=[gd, gw]),
        )
        self.truck_middle = nn.Sequential(
            CBL(256, 512, k=3, s=2, p=1, e=gw),
            CSP1_n(512, 512, n=9, e=[gd, gw]),
        )
        self.truck_small = nn.Sequential(
            CBL(512, 1024, k=3, s=2, p=1, e=gw),
            SPP(1024, 1024, e=gw)
        )

    def forward(self, x):
        h_big = self.truck_big(x)
        h_middle = self.truck_middle(h_big)
        h_small = self.truck_small(h_middle)
        return h_big, h_middle, h_small
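The first backbone layer, Focus, trades resolution for channels: every 2x2 neighborhood of pixels is spread across four channels before the convolution, so a (b, c, h, w) input becomes (b, 4c, h/2, w/2). Below is a plain-Python sketch of the same slicing on a single 4x4 channel, using nested lists in place of tensors. `focus_slices` is a hypothetical helper added for illustration.

```python
def focus_slices(grid):
    """Split one H x W channel into four (H/2) x (W/2) channels, in the order Focus uses."""
    def pick(row_start, col_start):
        return [row[col_start::2] for row in grid[row_start::2]]

    # Same order as the torch.cat in Focus.forward:
    # x[..., 0::2, 0::2], x[..., 1::2, 0::2], x[..., 0::2, 1::2], x[..., 1::2, 1::2]
    return [pick(0, 0), pick(1, 0), pick(0, 1), pick(1, 1)]


grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
for channel in focus_slices(grid):
    print(channel)
# [[0, 2], [8, 10]]
# [[4, 6], [12, 14]]
# [[1, 3], [9, 11]]
# [[5, 7], [13, 15]]
```

No pixel is dropped: the 16 input values reappear as four 2x2 channels, which is why Focus halves the resolution without losing information.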

Overall network construction

class yolov5(nn.Module):

    def __init__(self, nc=80, gd=0.33, gw=0.5):
        super(yolov5, self).__init__()
        # ------------------------------Backbone--------------------------------
        self.backbone = darknet53(gd, gw, None)

        # ------------------------------Neck------------------------------------
        self.neck_small = nn.Sequential(
            CSP1_n(1024, 1024, n=3, e=[gd, gw]),
            CBL(1024, 512, 1, 1, 0, e=gw)
        )
        self.up_middle = nn.Sequential(
            UpSample()
        )
        self.out_set_middle = nn.Sequential(
            CSP1_n(1024, 512, n=3, e=[gd, gw]),
            CBL(512, 256, 1, 1, 0, e=gw),
        )
        self.up_big = nn.Sequential(
            UpSample()
        )
        self.out_set_tie_big = nn.Sequential(
            CSP1_n(512, 256, n=3, e=[gd, gw])
        )

        self.pan_middle = nn.Sequential(
            CBL(256, 256, 3, 2, 1, e=gw)
        )
        self.out_set_tie_middle = nn.Sequential(
            CSP1_n(512, 512, n=3, e=[gd, gw])
        )
        self.pan_small = nn.Sequential(
            CBL(512, 512, 3, 2, 1, e=gw)
        )
        self.out_set_tie_small = nn.Sequential(
            # CSP2_n(512, 512)
            CSP1_n(1024, 1024, n=3, e=[gd, gw])
        )
        # ------------------------------Prediction--------------------------------
        # prediction
        big_ = round(256 * gw)
        middle = round(512 * gw)
        small_ = round(1024 * gw)
        self.out_big = nn.Sequential(
            nn.Conv2d(big_, 3 * (5 + nc), 1, 1, 0)
        )
        self.out_middle = nn.Sequential(
            nn.Conv2d(middle, 3 * (5 + nc), 1, 1, 0)
        )
        self.out_small = nn.Sequential(
            nn.Conv2d(small_, 3 * (5 + nc), 1, 1, 0)
        )

    def forward(self, x):
        h_big, h_middle, h_small = self.backbone(x)
        neck_small = self.neck_small(h_small)
        # ----------------------------up sample 38*38-------------------------------
        up_middle = self.up_middle(neck_small)
        middle_cat = torch.cat([up_middle, h_middle], dim=1)
        out_set_middle = self.out_set_middle(middle_cat)

        # ----------------------------up sample 76*76-------------------------------
        up_big = self.up_big(out_set_middle)  # torch.Size([2, 128, 76, 76])
        big_cat = torch.cat([up_big, h_big], dim=1)
        out_set_tie_big = self.out_set_tie_big(big_cat)

        # ----------------------------PAN 38*38-------------------------------------
        neck_tie_middle = torch.cat([self.pan_middle(out_set_tie_big), out_set_middle], dim=1)
        up_middle = self.out_set_tie_middle(neck_tie_middle)

        # ----------------------------PAN 19*19-------------------------------------
        neck_tie_small = torch.cat([self.pan_small(up_middle), neck_small], dim=1)
        out_set_small = self.out_set_tie_small(neck_tie_small)

        # ----------------------------prediction-------------------------------------
        out_small = self.out_small(out_set_small)
        out_middle = self.out_middle(up_middle)
        out_big = self.out_big(out_set_tie_big)

        return out_small, out_middle, out_big
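The neck only works if every concatenation produces exactly the number of channels that the following CSP1_n block was declared with. That bookkeeping can be checked by hand in plain Python. The sketch below follows the layer arguments in the yolov5 class above; `neck_channels` is a hypothetical helper, and its internal asserts encode the matching condition.

```python
def neck_channels(gw):
    """Channel counts at each concatenation in the neck, following the layer arguments in yolov5."""
    def w(c):
        return round(c * gw)

    h_big, h_middle = w(256), w(512)             # backbone outputs used for concatenation
    neck_small = w(512)                          # CSP1_n(1024, 1024) -> CBL(1024, 512)
    middle_cat = neck_small + h_middle           # upsample + concat
    out_set_middle = w(256)                      # CSP1_n(1024, 512) -> CBL(512, 256)
    big_cat = out_set_middle + h_big             # upsample + concat
    out_big = w(256)                             # CSP1_n(512, 256)
    pan_middle_cat = w(256) + out_set_middle     # stride-2 CBL + concat
    out_middle = w(512)                          # CSP1_n(512, 512)
    pan_small_cat = w(512) + neck_small          # stride-2 CBL + concat
    out_small = w(1024)                          # CSP1_n(1024, 1024)

    # Each concat must match the scaled input width its CSP1_n block was declared with
    assert middle_cat == w(1024) and big_cat == w(512)
    assert pan_middle_cat == w(512) and pan_small_cat == w(1024)
    return out_big, out_middle, out_small


print(neck_channels(0.50))  # (128, 256, 512)
print(neck_channels(1.25))  # (320, 640, 1280)
```

The returned widths are the input channels of the three prediction convolutions, which is why the head computes `round(256 * gw)`, `round(512 * gw)`, and `round(1024 * gw)`.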

The four model sizes are written into the config dictionary, which holds the configuration parameters of the network model. Nothing else is kept in the configuration, although the number of classes could be placed there as well. In the network code above, the width factor is passed to each module as the argument e (CSP1_n receives both factors as e=[gd, gw]).
```
config = {
    #            gd    gw
    'yolov5s': [0.33, 0.50],
    'yolov5m': [0.67, 0.75],
    'yolov5l': [1.00, 1.00],
    'yolov5x': [1.33, 1.25]
}
net_size = config['yolov5x']
net = yolov5(nc=80, gd=net_size[0], gw=net_size[1])
```
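If the number of classes is also kept in the configuration, each entry could become a small dictionary. The sketch below is one possible layout, not the article's code: the `nc` key and the `head_channels` helper are assumptions added for illustration. It also shows the input and output channel counts that the three prediction convolutions end up with for each model size.

```python
config = {
    'yolov5s': {'gd': 0.33, 'gw': 0.50, 'nc': 80},
    'yolov5m': {'gd': 0.67, 'gw': 0.75, 'nc': 80},
    'yolov5l': {'gd': 1.00, 'gw': 1.00, 'nc': 80},
    'yolov5x': {'gd': 1.33, 'gw': 1.25, 'nc': 80},
}


def head_channels(cfg):
    """Input widths of the big/middle/small prediction convs and their shared output width."""
    inputs = [round(c * cfg['gw']) for c in (256, 512, 1024)]
    output = 3 * (5 + cfg['nc'])
    return inputs, output


for name, cfg in config.items():
    print(name, head_channels(cfg))
# yolov5s ([128, 256, 512], 255)
# yolov5m ([192, 384, 768], 255)
# yolov5l ([256, 512, 1024], 255)
# yolov5x ([320, 640, 1280], 255)
```

With this layout the model would be built as `yolov5(nc=cfg['nc'], gd=cfg['gd'], gw=cfg['gw'])`.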

In the original V5 code, the head (as it was called in V3) is written separately as a Detect class. The main reason is that V5 uses a number of training tricks, and the head behaves differently in training and in detection. The original V5 code is hard going for beginners. That said, configuring networks through YAML files is already used by many companies and may well become a standard way of writing engineering code, so it is still worth mastering.

3. Summary

My personal feeling is that network design and code writing should both leave room for unconstrained imagination, and that writing code can have the same elegance found in martial arts novels. (There are many network structure diagrams online; I used Jiang Dabai's diagram as the basis and adjusted it against the latest V5 code.)
The latest V5 network structure already includes a Transformer module, a sign that Transformers are starting to change engineering practice in the CV field. It is worth learning about.

Welcome to follow GiantPandaCV, where we stick to original content and share exclusive deep learning articles and the new things we learn every day.

