
GoogLeNet

GoogLeNet is the winner of the 2014 ILSVRC classification competition. Before introducing the complete network structure of GoogLeNet, we first introduce its basic building block, the Inception module; GoogLeNet is built by stacking such modules. The original Inception structure is as follows:

As can be seen, the Inception module applies three convolutions of different sizes (1×1, 3×3, 5×5) and a max-pooling operation to the same input in parallel, and then concatenates the four resulting feature maps into one large feature map along the channel dimension. This concatenation fuses features at different scales, so the final feature map contains information from different receptive fields. [Note: the feature maps produced by the four branches must have the same spatial size, otherwise they cannot be concatenated; this requires choosing the stride s, the padding, and the kernel size so that the output size matches the input size. Concretely, with s=1 a 3×3 convolution needs padding=1 to keep the feature-map size unchanged, the corresponding 5×5 convolution needs s=1 and padding=2, and the 3×3 max pooling uses s=1 and padding=1.]
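To make the padding choices concrete, here is a minimal PyTorch sketch of the naive Inception module; the channel counts are placeholders chosen for illustration, not values from the paper:

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception: four parallel branches concatenated on the channel axis."""
    def __init__(self, in_ch, c1, c3, c5):
        super().__init__()
        # All branches use stride 1 and "same" padding so their spatial
        # sizes match and the outputs can be concatenated.
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)             # 1x1, padding=0
        self.branch3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)  # 3x3, padding=1
        self.branch5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)  # 5x5, padding=2
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)   # keeps in_ch channels

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1)  # output channels: c1 + c3 + c5 + in_ch

x = torch.randn(1, 192, 28, 28)           # e.g. a 28x28 map with 192 channels
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)                            # torch.Size([1, 416, 28, 28])
```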

In the GoogLeNet paper, the Inception structure above is improved: a 1×1 convolution is inserted before the 3×3 and 5×5 convolutions, and after the max pooling, because this greatly reduces the number of parameters. The improved structure is shown in the figure below:

Let us explain why the 1×1 convolution reduces the number of parameters, comparing a plain 5×5 convolution with the same 5×5 convolution preceded by a 1×1 convolution. Taking the 5×5 branch of inception (3a) as an example (192 input channels, 32 output channels): a direct 5×5 convolution needs 5×5×192×32 = 153,600 weights, while first reducing to 16 channels with a 1×1 convolution needs only 1×1×192×16 + 5×5×16×32 = 3,072 + 12,800 = 15,872 weights (ignoring biases), roughly a tenfold reduction.

From the figure above, it is obvious that adding the 1×1 convolution greatly reduces the number of parameters.
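As a minimal PyTorch sketch of the improved module (the channel numbers below are the inception (3a) values from the paper's table, used here only for illustration):

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Improved Inception: 1x1 convolutions reduce channels before the
    expensive 3x3/5x5 convolutions and after the max pooling."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, 1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1),            # reduce channels first
            nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1))          # 1x1 after pooling

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# inception (3a): 192 -> 64 + 128 + 32 + 32 = 256 channels
m = Inception(192, 64, 96, 128, 16, 32, 32)
print(m(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])
```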

After the Inception module has been explained, the structure of GoogLeNet becomes clear: it consists mainly of 9 Inception modules stacked in sequence, as shown below. [Note that two auxiliary softmax classifiers are attached to the outputs of the inception (4a) and (4d) modules; they assist training but are not needed when testing the network.]
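For reference, here is a minimal sketch of one auxiliary classifier head as described in the paper (average pooling 5×5 with stride 3, a 1×1 convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, and a 1000-way linear layer with softmax):

```python
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary classifier attached to inception (4a)/(4d); training only."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),   # 14x14 -> 4x4
            nn.Conv2d(in_ch, 128, kernel_size=1),    # dimension reduction
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.7),
            nn.Linear(1024, num_classes))            # softmax applied by the loss

    def forward(self, x):
        return self.head(x)
```

During training, the paper adds the losses of these auxiliary heads to the total loss with a weight of 0.3; at test time they are simply discarded.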

This part will not walk through the feature-map changes at every step (there are too many 😭😭😭), but if you have followed the material above, they are very easy to derive yourself!! For your convenience, the table of input/output shapes from the paper is attached below:
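Each row of that table can be re-derived with the standard output-size formula output = floor((input + 2×padding - kernel) / stride) + 1. A quick sketch checking the first two rows (assuming, consistent with the paper's table, a 7×7/2 convolution with padding 3 on a 224×224 input, followed by a 3×3/2 max pool with padding 1):

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(224, kernel=7, stride=2, padding=3))  # 112 (conv 7x7/2)
print(conv_out(112, kernel=3, stride=2, padding=1))  # 56  (max pool 3x3/2)
```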

Note: in the table above, the params column gives the number of parameters each layer needs, but my own calculations (cross-checked against other sources) do not match some entries in the table; you can verify this yourself!!
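As a sketch of how to check, the parameter count of one convolution is kernel_size² × in_channels × out_channels, plus out_channels bias terms; the layer shapes below are taken from the paper's table:

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Number of learnable parameters in a 2D convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

print(conv_params(7, 3, 64))    # 9,472 for the 7x7/2 stem convolution
print(conv_params(5, 192, 32))  # 153,632 for a direct 5x5 (192 -> 32)
print(conv_params(1, 192, 16) + conv_params(5, 16, 32))  # 15,920 with a 1x1 reduce
```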