Contents

1. Model introduction

2. Model structure

3. Model characteristics

4. PyTorch implementation


1. Model introduction

VGGNet is a deep convolutional network architecture proposed by the Visual Geometry Group (VGG) at Oxford University. In the 2014 ILSVRC competition it took second place in the classification task with a 7.32% error rate (the champion, GoogLeNet, reached 6.65%) and first place in the localization task with a 25.32% error rate (GoogLeNet scored 26.44%). The network is named after the acronym of the group. VGGNet was one of the first models to push the image-classification error rate below 10%, and its use of small stacked convolution kernels became the basis of many later models. The paper was presented at the International Conference on Learning Representations (ICLR) in 2015 and has since been cited more than 14,000 times. Many object-detection models (such as SSD and M2Det) use VGGNet as their backbone, and it is also widely used for image style transfer and image segmentation, so VGGNet is one of the models we must learn in deep learning.

2. Model structure



The figure above shows the VGG16-3 model. The original paper describes the evolution of six versions: VGG11, VGG11-LRN, VGG13, VGG16-1, VGG16-3 and VGG19, as shown below.
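As a compact reference, the convolutional parts of these variants can be summarized as lists of channel counts, in the style popularized by torchvision's VGG configuration tables (the dictionary below and its keys are our own illustrative naming, not an official API; integers are the output channels of a 3×3 convolution and 'M' marks a 2×2 max pooling):

# Illustrative configuration lists for four of the VGG variants discussed above.
# VGG11-LRN and VGG16-1 differ from VGG11 / VGG16 only by the LRN layer and the
# 1x1 kernels described in the text, so they are omitted here.
vgg_cfgs = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}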

The suffixes distinguish the variants: VGG11-LRN is VGG11 with Local Response Normalization (LRN) applied after the first convolution layer; VGG16-1 uses 1×1 convolution kernels as the last convolution of the final three convolution blocks, while VGG16-3 uses 3×3 kernels throughout. The VGG16 discussed in this section is VGG16-3. The table below lists its parameter settings.

| Layer | Input size | Kernel size / stride | Output size | Parameters |
|---|---|---|---|---|
| Convolution C_11 | 224×224×3 | 3×3×64 / 1 | 224×224×64 | (3×3×3+1)×64 |
| Convolution C_12 | 224×224×64 | 3×3×64 / 1 | 224×224×64 | (3×3×64+1)×64 |
| Max pooling 1 | 224×224×64 | 2×2 / 2 | 112×112×64 | 0 |
| Convolution C_21 | 112×112×64 | 3×3×128 / 1 | 112×112×128 | (3×3×64+1)×128 |
| Convolution C_22 | 112×112×128 | 3×3×128 / 1 | 112×112×128 | (3×3×128+1)×128 |
| Max pooling 2 | 112×112×128 | 2×2 / 2 | 56×56×128 | 0 |
| Convolution C_31 | 56×56×128 | 3×3×256 / 1 | 56×56×256 | (3×3×128+1)×256 |
| Convolution C_32 | 56×56×256 | 3×3×256 / 1 | 56×56×256 | (3×3×256+1)×256 |
| Convolution C_33 | 56×56×256 | 3×3×256 / 1 | 56×56×256 | (3×3×256+1)×256 |
| Max pooling 3 | 56×56×256 | 2×2 / 2 | 28×28×256 | 0 |
| Convolution C_41 | 28×28×256 | 3×3×512 / 1 | 28×28×512 | (3×3×256+1)×512 |
| Convolution C_42 | 28×28×512 | 3×3×512 / 1 | 28×28×512 | (3×3×512+1)×512 |
| Convolution C_43 | 28×28×512 | 3×3×512 / 1 | 28×28×512 | (3×3×512+1)×512 |
| Max pooling 4 | 28×28×512 | 2×2 / 2 | 14×14×512 | 0 |
| Convolution C_51 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Convolution C_52 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Convolution C_53 | 14×14×512 | 3×3×512 / 1 | 14×14×512 | (3×3×512+1)×512 |
| Max pooling 5 | 14×14×512 | 2×2 / 2 | 7×7×512 | 0 |
| Fully connected FC_1 | 7×7×512 | (7×7×512)×4096 | 1×4096 | (7×7×512+1)×4096 |
| Fully connected FC_2 | 1×4096 | 4096×4096 | 1×4096 | (4096+1)×4096 |
| Fully connected FC_3 | 1×4096 | 4096×1000 | 1×1000 | (4096+1)×1000 |
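The parameter counts in the last column can be checked mechanically. The short sketch below (assuming bias terms are counted, as in the table) sums (k·k·C_in + 1)·C_out over the convolution layers and (N_in + 1)·N_out over the fully connected layers; pooling layers contribute no parameters, which is why their entries are 0.

def conv_params(c_in, c_out, k=3):
    # matches the last column of the table: (k*k*c_in + 1) * c_out
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

convs = [(3, 64), (64, 64),                    # block 1
         (64, 128), (128, 128),                # block 2
         (128, 256), (256, 256), (256, 256),   # block 3
         (256, 512), (512, 512), (512, 512),   # block 4
         (512, 512), (512, 512), (512, 512)]   # block 5
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(i, o) for i, o in convs) + sum(fc_params(i, o) for i, o in fcs)
print(f"{total:,}")  # about 138 million parameters for VGG16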

3. Model characteristics

  • The whole network uses 3×3 convolution kernels and 2×2 max pooling.
  • The 1×1 convolutions (in VGG16-1) essentially apply a linear transformation across channels: the number of input and output channels stays the same, so there is no dimensionality reduction, but an extra non-linearity is added after each one.
  • Two stacked 3×3 convolution layers have an effective receptive field of 5×5, equivalent to a single 5×5 convolution; likewise, three stacked 3×3 layers cover the same 7×7 receptive field as one 7×7 convolution. The stacked version uses fewer parameters, and the additional activation functions between the layers give the network more capacity to learn features (see the quick check below).
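A quick check of the last point, assuming C input channels, C output channels and ignoring bias terms:

C = 512
three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 convolutions: 27*C^2 weights
one_7x7 = 7 * 7 * C * C           # one 7x7 convolution:            49*C^2 weights
print(three_3x3 / one_7x7)        # ~0.55, i.e. roughly 45% fewer weights

The stacked form also inserts a ReLU after each 3×3 layer, so it is both cheaper and more expressive.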

Note: VGGNet uses a small training trick: first train the shallow VGG11, then reuse its weights to initialize VGG13, and repeat this train-and-initialize procedure up to VGG19. This makes training converge faster. During training, multi-scale transformations are also used to augment the original data, which makes the model less prone to overfitting.
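A hedged sketch of this warm-start trick, assuming a trained shallower model and a deeper model whose overlapping layers happen to share parameter names and shapes (the helper below is our own illustration; the paper itself copies the first four convolution layers and the three fully connected layers):

def warm_start(deeper_model, shallower_model):
    # Copy every parameter whose name and shape also exist in the deeper model;
    # all remaining layers keep their random initialization.
    deeper_state = deeper_model.state_dict()
    matched = {k: v for k, v in shallower_model.state_dict().items()
               if k in deeper_state and deeper_state[k].shape == v.shape}
    deeper_model.load_state_dict(matched, strict=False)

# e.g. warm_start(vgg13, vgg11) after vgg11 has been trained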

4. PyTorch implementation

"""
vgg16
"""
class VGG16(nn.Module) :

    def __init__(self, num_classes) :
        super(VGG16, self).__init__()

        # calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2

        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=64,
                      kernel_size=(3.3),
                      stride=(1.1),
                      # (1(32-1)- 32 + 3)/2 = 1
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64,
                      out_channels=64,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_2 = nn.Sequential(
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128,
                      out_channels=128,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )
        
        self.block_3 = nn.Sequential(
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_4 = nn.Sequential(
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.block_5 = nn.Sequential(
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3.3),
                      stride=(1.1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2.2),
                         stride=(2.2))
        )

        self.classifier = nn.Sequential(
            nn.Linear(512.4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096.4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096, num_classes),
        )

        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
                nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
# nn.init.xavier_normal_(m.weight)
                if m.bias is not None:
                    m.bias.detach().zero_()

        # self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

    def forward(self, x) :

        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        # x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.classifier(x)
        probas = F.softmax(logits, dim=1)
        # probas = nn.Softmax(logits)
        return probas
        # return logits
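A minimal smoke test of the class above (num_classes=1000 here is just an illustrative choice):

model = VGG16(num_classes=1000)
model.eval()                          # inference mode (affects BatchNorm/Dropout)
dummy = torch.randn(1, 3, 224, 224)   # one fake RGB image
with torch.no_grad():
    probas = model(dummy)
print(probas.shape)                   # torch.Size([1, 1000])
print(float(probas.sum()))            # softmax probabilities sum to ~1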