WeChat official account: Ilulaoshi / Personal website: lulaoshi.info

AlexNet added more convolutional layers on top of LeNet and made many adjustments to the convolution window sizes, the numbers of output channels, and the order in which layers are assembled. Although AlexNet demonstrated the excellent performance of deep convolutional neural networks, it did not provide simple rules to guide later researchers in designing new networks.

The VGG network is named after the laboratory that proposed it, Oxford's Visual Geometry Group. The VGG paper presents the group's findings from the ImageNet 2014 Challenge and shows that deep models can be constructed by reusing simple base blocks.

VGG block

Generally speaking, the basic building block of a CNN is constructed as follows:

  1. A convolution layer
  2. A nonlinear activation function, e.g. ReLU
  3. A pooling layer, such as a max-pooling layer

The method proposed in the VGG paper is to stack several identical 3 × 3 convolutional layers with padding 1, followed by a 2 × 2 max-pooling layer with stride 2. The convolutional layers keep the height and width of the input unchanged, while the pooling layer halves them.

For a given receptive field (the region of the input that affects an output element), stacking small convolution kernels works better than using one large kernel: the extra depth lets the network learn more complex patterns, while the whole model has fewer parameters and a relatively low cost. For example, two stacked 3 × 3 layers cover the same 5 × 5 receptive field as a single 5 × 5 layer, but with fewer weights.
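As a quick illustration (this snippet is not from the original post, and the channel count of 64 is arbitrary), we can count the parameters of two stacked 3 × 3 convolutions versus a single 5 × 5 convolution with the same receptive field:

from torch import nn

def num_params(module):
    # Count all learnable weights and biases in a module
    return sum(p.numel() for p in module.parameters())

c = 64  # arbitrary channel count, for illustration only
stacked = nn.Sequential(nn.Conv2d(c, c, kernel_size=3, padding=1),
                        nn.Conv2d(c, c, kernel_size=3, padding=1))
single = nn.Conv2d(c, c, kernel_size=5, padding=2)
print(num_params(stacked), num_params(single))  # 73856 vs. 102464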

We implement this basic VGG block with the vgg_block() function, which takes three parameters: the number of convolutional layers in the block, and the numbers of input and output channels.

import torch
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    '''Returns a block of a convolutional neural network.

    Parameters:
        num_convs (int): number of convolutional layers in this block
        in_channels (int): number of input channels of this block
        out_channels (int): number of output channels of this block
    Returns:
        a nn.Sequential network
    '''
    layers = []
    for _ in range(num_convs):
        # output size: (input_size - 3 + 2 * 1) / 1 + 1 = input_size
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    # output size: (input_size - 2) / 2 + 1 = input_size / 2
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
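As a quick sanity check (not part of the original code), we can pass a dummy single-channel 224 × 224 image through one block and confirm that the channel count changes while the height and width are halved:

blk = vgg_block(2, 1, 64)
X = torch.rand(size=(1, 1, 224, 224))  # dummy batch: 1 image, 1 channel, 224 x 224
print(blk(X).shape)  # expected: torch.Size([1, 64, 112, 112])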

VGG network

Like AlexNet and LeNet, the VGG network is divided into two parts: the first is a module of convolutional and pooling layers, and the second is a module of fully connected layers.

The convolutional and pooling module is built by connecting several base blocks, i.e. the blocks returned by the vgg_block() function. We only need to tell vgg_block() how many convolutional layers the block contains and the numbers of input and output channels. We define the variable conv_arch, a list in which each element is a tuple; conv_arch determines the number of convolutional layers and the number of output channels in each VGG block. The fully connected module is the same as in AlexNet. The vgg() function below returns the entire VGG network, taking conv_arch as its input parameter.

def vgg(conv_arch):
    '''Returns the VGG network.

    Parameters:
        conv_arch ([(int, int)]): a list of block specs; in each tuple the first
            element is the number of convolutional layers in the block, the
            second is the number of output channels of the block.
    Returns:
        a nn.Sequential network
    '''
    # The convolutional part
    conv_blks = []
    in_channels = 1
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # The fully-connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

Now let’s construct a VGG network. It has 5 convolutional blocks: the first 2 blocks each use a single convolutional layer, and the last 3 blocks each use two convolutional layers. The first block has 1 input channel and 64 output channels. The number of input channels depends on the dataset: Fashion-MNIST images have 1 channel, while ImageNet images generally have 3. Each subsequent block doubles the number of output channels until it reaches 512. Because this network uses 8 convolutional layers and 3 fully connected layers, it is often referred to as VGG-11.

At the code level, we define the variable conv_arch, which contains several tuples. The first element of each tuple is the number of convolutional layers in the block, and the second element is the number of output channels.

conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

net = vgg(conv_arch)
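To see how the data flows through the network, a common check (shown here as an illustrative sketch, not code from the original post) is to feed a dummy 224 × 224 single-channel image through net and print the output shape of each top-level module:

X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:', X.shape)
# The five blocks halve the height and width step by step:
# 224 -> 112 -> 56 -> 28 -> 14 -> 7, giving 512 x 7 x 7 before nn.Flatten().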

The training procedure is basically the same as for the LeNet and AlexNet models shared previously. The full source code has been posted to GitHub.
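For completeness, here is a minimal training sketch, assuming torchvision is available; the Resize(224) transform matches the 224 × 224 input that VGG-11 expects, and the learning rate, batch size, and number of epochs are illustrative choices rather than values from the original code:

import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

# Resize Fashion-MNIST images to 224 x 224 so they fit the VGG-11 input size
transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
train_set = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = net.to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(net(X), y)
        loss.backward()
        optimizer.step()
    print(f'epoch {epoch + 1}, loss {loss.item():.3f}')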

Summary

  1. VGG constructs the network from multiple basic convolutional blocks. Each basic block is defined by its number of convolutional layers and its numbers of input and output channels.
  2. The VGG paper argues that stacking smaller (e.g., 3 × 3) convolutional layers is more efficient than using larger ones, because depth can be increased without adding too many parameters.

References

  1. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
  2. d2l.ai/chapter_con…