Contents

    • 1. Receptive field
    • 2. Two 3×3 convolution layers vs. one 5×5 convolution layer
    • 3. Computation: two 3×3 convolution layers vs. one 5×5 convolution layer
    • 4. Nonlinearity: two 3×3 convolution layers vs. one 5×5 convolution layer
    • 5. Parameter count: two 3×3 convolution layers vs. one 5×5 convolution layer

1. Receptive field

Receptive field: the region of the original image that each pixel of a convolutional network's output feature map is computed from. For example, if we convolve a 3×3 image with a 3×3 kernel, the output is 1×1, and the receptive field of that output pixel is 3. The receptive field reflects how much spatial context a convolution kernel extracts features from.
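For stride-1 convolutions without padding, each k×k layer grows the receptive field by (k − 1). A minimal sketch of this rule (`receptive_field` is a hypothetical helper name, not from a library):

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1, no-padding convolutions:
    start at 1 pixel, and each k x k layer adds (k - 1)."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(receptive_field([3]))     # one 3x3 layer  -> 3
print(receptive_field([5]))     # one 5x5 layer  -> 5
print(receptive_field([3, 3]))  # two 3x3 layers -> 5
```

Note that two stacked 3×3 layers reach the same receptive field (5) as a single 5×5 layer.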

2. Two 3×3 convolution layers vs. one 5×5 convolution layer

Let the input image have side length x (x ≥ 5). Sliding a 5×5 kernel over it with stride 1 takes (x − 5 + 1) steps in the x direction and likewise (x − 5 + 1) steps in the y direction, so the output image is (x − 4) × (x − 4). By the same reasoning, one 3×3 convolution produces an output of size (x − 3 + 1) × (x − 3 + 1) = (x − 2) × (x − 2); convolving that output with a 3×3 kernel again gives (x − 2 − 3 + 1) × (x − 2 − 3 + 1) = (x − 4) × (x − 4).

So on the same image, two 3×3 convolutions produce the same output size as one 5×5 convolution — the receptive fields are equally large, and in that sense the feature extraction coverage is the same. Why, then, does industry usually stack small convolution kernels instead of applying one large kernel? This can be explained from three angles.
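The output-size bookkeeping above can be checked directly. A minimal sketch, assuming stride 1 and no padding (`out_size` is a hypothetical helper name):

```python
def out_size(x, k):
    """Output side length of a k x k convolution over an x x x input
    (stride 1, no padding)."""
    return x - k + 1

# For every valid input size, one 5x5 pass and two 3x3 passes
# produce the same (x - 4) x (x - 4) output.
for x in range(5, 30):
    one_5x5 = out_size(x, 5)
    two_3x3 = out_size(out_size(x, 3), 3)
    assert one_5x5 == two_3x3 == x - 4

print(out_size(28, 5), out_size(out_size(28, 3), 3))  # 24 24
```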

3. Computation: two 3×3 convolution layers vs. one 5×5 convolution layer

Applying a 3×3 kernel at one output position costs 9 multiplications; a 5×5 kernel costs 25. The total computation of two 3×3 convolutions versus one 5×5 convolution is therefore:

3×3 kernels: 9 * (x - 2)^2 + 9 * (x - 4)^2 (first convolution + second convolution)

5×5 kernel: 25 * (x - 4)^2

Setting up the inequality:

9 * (x - 2)^2 + 9 * (x - 4)^2 <= 25 * (x - 4)^2

Solving: subtract 9 * (x - 4)^2 from both sides to get 9 * (x - 2)^2 <= 16 * (x - 4)^2; since both sides are positive for x > 4, take square roots to get 3(x - 2) <= 4(x - 4), i.e. x >= 10.

That is to say, when the image side length x > 10, two 3×3 convolutions require less computation than one 5×5 convolution!

And as x increases, the gap between the two costs widens further.

In practice, image sizes are generally greater than 10 — for example, the MNIST handwritten digit set has side length 28.
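Plugging concrete sizes into the two cost formulas above makes the crossover at x = 10 visible. A minimal sketch (function names are hypothetical):

```python
def cost_two_3x3(x):
    # 9 multiplications per output pixel: first layer has (x-2)^2
    # outputs, second layer has (x-4)^2 outputs.
    return 9 * (x - 2) ** 2 + 9 * (x - 4) ** 2

def cost_one_5x5(x):
    # 25 multiplications per output pixel over (x-4)^2 outputs.
    return 25 * (x - 4) ** 2

print(cost_two_3x3(10), cost_one_5x5(10))  # 900 900  (equal at x = 10)
print(cost_two_3x3(28), cost_one_5x5(28))  # 11268 14400 (two 3x3 is cheaper)
```

At MNIST's 28×28 the stacked 3×3 path already saves about 22% of the multiplications.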

4. Nonlinearity: two 3×3 convolution layers vs. one 5×5 convolution layer

Stacking small convolution kernels interleaves multiple nonlinear activation layers in place of a single one, which increases the network's discriminative ability: two 3×3 convolutions come with two activations (e.g., ReLU), whereas one 5×5 convolution has only one.
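The two paths can be sketched side by side. A minimal NumPy illustration, assuming ReLU activations, random kernels, and a naive "valid" convolution (`conv2d` and the kernel names are hypothetical, not a library API):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2-D cross-correlation, stride 1, no padding."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3a, k3b = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
k5 = rng.standard_normal((5, 5))

one_layer = relu(conv2d(x, k5))                      # one nonlinearity
two_layer = relu(conv2d(relu(conv2d(x, k3a)), k3b))  # two nonlinearities

print(one_layer.shape, two_layer.shape)  # (4, 4) (4, 4): same receptive field
```

Both paths map the same 8×8 input to a 4×4 output, but the stacked path applies ReLU twice.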

5. Parameter count: two 3×3 convolution layers vs. one 5×5 convolution layer

Parameters of two 3×3 convolutions: 2 * 3 * 3 = 18

Parameters of one 5×5 convolution: 5 * 5 = 25

(counted per input–output channel pair, biases ignored)

The number of parameters is significantly reduced — roughly 28% fewer.
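The parameter arithmetic can be generalized to C channels; the 18-vs-25 ratio is unchanged. A minimal sketch, assuming every layer maps C input channels to C output channels with biases ignored (`params` is a hypothetical helper name):

```python
def params(kernel_sizes, channels=1):
    """Weight count for a stack of square convolutions, assuming each
    layer maps `channels` input channels to `channels` output channels
    (biases ignored)."""
    return sum(k * k * channels * channels for k in kernel_sizes)

print(params([3, 3]))  # 18
print(params([5]))     # 25
# With C channels the ratio stays 18:25, e.g. C = 64:
print(params([3, 3], channels=64), params([5], channels=64))  # 73728 102400
```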