Contents

1. Model introduction

2. Model structure

3. Model characteristics

4. PyTorch implementation


1. Model introduction

VGGNet is a deep convolutional network architecture proposed by the Visual Geometry Group (VGG) at Oxford University. In the 2014 ILSVRC competition it took second place in the classification task with a 7.32% error rate (the champion, GoogLeNet, reached 6.65%) and first place in the localization task with a 25.32% error rate (GoogLeNet scored 26.44%). The network is named after the acronym of the group. VGGNet was one of the first models to push the image-classification error rate below 10%, and its use of small stacked convolution kernels became the basis of many later models. The paper was presented at the International Conference on Learning Representations (ICLR) in 2015 and has since been cited more than 14,000 times. Many object-detection models (such as SSD and M2Det) use VGGNet as their backbone, and it is also widely used for image style transfer and image segmentation, so VGGNet is one of the models we must learn in deep learning.

2. Model structure



The figure above shows the VGG16-3 model. The original paper describes the evolution of six versions: VGG11, VGG11-LRN, VGG13, VGG16-1, VGG16-3 and VGG19, as shown below.
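As a compact reference, the convolutional parts of these variants can be summarized as lists of channel counts, in the style popularized by torchvision's VGG configuration tables (the dictionary below and its keys are our own illustrative naming, not an official API; integers are the output channels of a 3×3 convolution and 'M' marks a 2×2 max pooling):

# Illustrative configuration lists for four of the VGG variants discussed above.
# VGG11-LRN and VGG16-1 differ from VGG11 / VGG16 only by the LRN layer and the
# 1x1 kernels described in the text, so they are omitted here.
vgg_cfgs = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}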

The suffixes distinguish the variants: VGG11-LRN is VGG11 with Local Response Normalization (LRN) applied after the first convolution layer; VGG16-1 uses 1×1 convolution kernels as the last convolution of the final three convolution blocks, while VGG16-3 uses 3×3 kernels throughout. The VGG16 discussed in this section is VGG16-3. The table below lists its parameter settings.

| Layer | Input size | Kernel size / stride | Output size | Parameters |
|---|---|---|---|---|
| Convolution C_11 | 224×224×3 | 3×3×64 / 1 | 224×224×64 | (3×3×3+1)×64 |
| Convolution C_12 | 224×224×64 | 3×3×64 / 1 | 224×224×64 | (3×3×64+1)×64 |
| Max pooling 1 | 224×224×64 | 2×2 / 2 | 112×112×64 | 0 |
| Convolution C_21 | 112×112×64 | 3×3×128 / 1 | 112×112×128 | (3×3×64+1)×128 |
| Convolution C_22 | 112×112×128 | 3×3×128 / 1 | 112×112×128 | (3×3×128+1)×128 |
| Max pooling 2 | 112×112×128 | 2×2 / 2 | 56×56×128 | 0 |
| Convolution C_31 | 56×56×128 | 3×3×256 / 1 | 56×56×256 | (3×3×128+1)×256 |
| Convolution C_32 | 56×56×256 | 3×3×256 / 1 | 56×56×256 | (3×3×256+1)×256 |
| Convolution C_33 | 56×56×256 | 3×3×256 / 1 | 56×56×256 | (3×3×256+1)×256 |
| Max pooling 3 | 56×56×256 | 2×2 / 2 | 28×28×256 | 0 |
| Convolution C_41 | 28×28×256 | 3×3×512 / 1 | 28×28×512 | (3×3×256+1)×512 |
| Convolution C_42 | 28×28×512 | 3×3×512 / 1 | 28×28×512 | (3×3×512+1)×512 |
| Convolution C_43 | 28×28×512 | 3×3×512 / 1 | 28×28×512 | (3×3×512+1)×512 |
| Max pooling 4 | 28×28×512 | 2×2 / 2 | 14×14×512 | 0 |
| Convolution C_51 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Convolution C_52 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Convolution C_53 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Max pooling 5 | 14×14×512 | 2×2 / 2 | 7×7×512 | 0 |
| Fully connected FC_1 | 7×7×512 | (7×7×512)×4096 | 1×4096 | (7×7×512+1)×4096 |
| Fully connected FC_2 | 1×4096 | 4096×4096 | 1×4096 | (4096+1)×4096 |
| Fully connected FC_3 | 1×4096 | 4096×1000 | 1×1000 | (4096+1)×1000 |
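The parameter counts in the last column can be checked mechanically. The short sketch below (assuming bias terms are counted, as in the table) sums (k·k·C_in + 1)·C_out over the convolution layers and (N_in + 1)·N_out over the fully connected layers; pooling layers contribute no parameters, which is why their entries are 0.

def conv_params(c_in, c_out, k=3):
    # matches the last column of the table: (k*k*c_in + 1) * c_out
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

convs = [(3, 64), (64, 64),                    # block 1
         (64, 128), (128, 128),                # block 2
         (128, 256), (256, 256), (256, 256),   # block 3
         (256, 512), (512, 512), (512, 512),   # block 4
         (512, 512), (512, 512), (512, 512)]   # block 5
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(i, o) for i, o in convs) + sum(fc_params(i, o) for i, o in fcs)
print(f"{total:,}")  # about 138 million parameters for VGG16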

3. Model characteristics

  • The whole network uses 3×3 convolution kernels and 2×2 max pooling.
  • The 1×1 convolutions (in VGG16-1) essentially apply a linear transformation across channels: the number of input and output channels stays the same, so there is no dimensionality reduction, but an extra non-linearity is added after each one.
  • Two stacked 3×3 convolution layers have an effective receptive field of 5×5, equivalent to a single 5×5 convolution; likewise, three stacked 3×3 layers cover the same 7×7 receptive field as one 7×7 convolution. The stacked version uses fewer parameters, and the additional activation functions between the layers give the network more capacity to learn features (see the quick check below).
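A quick check of the last point, assuming C input channels, C output channels and ignoring bias terms:

C = 512
three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 convolutions: 27*C^2 weights
one_7x7 = 7 * 7 * C * C           # one 7x7 convolution:            49*C^2 weights
print(three_3x3 / one_7x7)        # ~0.55, i.e. roughly 45% fewer weights

The stacked form also inserts a ReLU after each 3×3 layer, so it is both cheaper and more expressive.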

Note: VGGNet uses a small training trick: first train the shallow VGG11, then reuse its weights to initialize VGG13, and repeat this train-and-initialize procedure up to VGG19. This makes training converge faster. During training, multi-scale transformations are also used to augment the original data, which makes the model less prone to overfitting.
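A hedged sketch of this warm-start trick, assuming a trained shallower model and a deeper model whose overlapping layers happen to share parameter names and shapes (the helper below is our own illustration; the paper itself copies the first four convolution layers and the three fully connected layers):

def warm_start(deeper_model, shallower_model):
    # Copy every parameter whose name and shape also exist in the deeper model;
    # all remaining layers keep their random initialization.
    deeper_state = deeper_model.state_dict()
    matched = {k: v for k, v in shallower_model.state_dict().items()
               if k in deeper_state and deeper_state[k].shape == v.shape}
    deeper_model.load_state_dict(matched, strict=False)

# e.g. warm_start(vgg13, vgg11) after vgg11 has been trained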

4. PyTorch implementation

"""
vgg16
"""
class VGG16(nn.Module) :

    def __init__(self, num_classes) :
        super(VGG16, self).__init__()

        # calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2

        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=64,
                      kernel_size=(3.3),
                      stride=(1.1),
                      # (1(32-1)- 32 + 3)/2 = 1
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64,
                      out_channels=64,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_2 = nn.Sequential(
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128,
                      out_channels=128,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )
        
        self.block_3 = nn.Sequential(
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_4 = nn.Sequential(
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_5 = nn.Sequential(
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.classifier = nn.Sequential(
            nn.Linear(512.4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096.4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096, num_classes),
        )

        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
                nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
# nn.init.xavier_normal_(m.weight)
                if m.bias is not None:
                    m.bias.detach().zero_()

        # self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

    def forward(self, x) :

        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        # x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.classifier(x)
        probas = F.softmax(logits, dim=1)
        # probas = nn.Softmax(logits)
        return probas
        # return logits
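A minimal smoke test of the class above (num_classes=1000 here is just an illustrative choice):

model = VGG16(num_classes=1000)
model.eval()                          # inference mode (affects BatchNorm/Dropout)
dummy = torch.randn(1, 3, 224, 224)   # one fake RGB image
with torch.no_grad():
    probas = model(dummy)
print(probas.shape)                   # torch.Size([1, 1000])
print(float(probas.sum()))            # softmax probabilities sum to ~1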