This is day 23 of my participation in the November Gengwen Challenge. See the event details: the last Gengwen Challenge of 2021.


You should first read Hands-on Deep Learning 6.1: Why convolutional layers | Deriving the convolution formula.

From the derivation and demonstration in section 6.1, we already know that the output shape of a convolution depends on the input shape and the shape of the convolution kernel.

If the input is $m \times n$ and the convolution kernel is $a \times b$, the output is $(m - a + 1) \times (n - b + 1)$.
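As a quick sanity check (a minimal sketch assuming PyTorch; the sizes below are my own example, not from the text), we can verify this formula with nn.Conv2d and no padding:

import torch
from torch import nn

# 4x7 input, 3x2 kernel, no padding: expect (4-3+1) x (7-2+1) = 2 x 6
X = torch.rand(1, 1, 4, 7)                  # batch size 1, 1 channel
conv = nn.Conv2d(1, 1, kernel_size=(3, 2))  # bias does not affect the shape
print(conv(X).shape)                        # torch.Size([1, 1, 2, 6])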

In fact, the output size is also affected by padding and stride, in addition to the input and kernel shapes.

Padding

Padding means adding a border of extra pixels around the edges of an image.

Let's add a one-pixel border around the image.

This changes the shape from $3 \times 6$ to $5 \times 8$.

Adding one pixel to every edge adds two rows and two columns, so the matrix gains two rows and two columns.
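A minimal sketch of this step with torch.nn.functional.pad (the zero tensor is just a stand-in for the image):

import torch
import torch.nn.functional as F

X = torch.zeros(3, 6)
# pad 1 pixel on each side; the tuple order is (left, right, top, bottom)
Y = F.pad(X, (1, 1, 1, 1))
print(Y.shape)  # torch.Size([5, 8])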

For an $n_h \times n_w$ input, padding adds $p_h$ rows and $p_w$ columns in total, so the output shape becomes $(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1)$. In general, we need to set

$p_h = k_h - 1$

$p_w = k_w - 1$

so that the input and output have the same height and width.

  • $p_h$, $p_w$: padding added to the input matrix
  • $k_h$, $k_w$: height and width of the convolution kernel
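For example (a quick check assuming PyTorch; the 5×8 input is my choice), a 5×5 kernel needs $p_h = p_w = 4$ in total, i.e. two rows/columns on each side, to keep the shape:

import torch
from torch import nn

X = torch.rand(1, 1, 5, 8)
# k = 5, total padding p = k - 1 = 4, i.e. padding=2 per side
conv = nn.Conv2d(1, 1, kernel_size=5, padding=2)
print(conv(X).shape)  # torch.Size([1, 1, 5, 8])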

Assuming $k_h$ is odd, we pad $p_h/2$ rows on both sides of the height.

If $k_h$ is even, one possibility is to pad $\lceil p_h/2 \rceil$ rows at the top of the input and $\lfloor p_h/2 \rfloor$ rows at the bottom.

We pad both sides of the width in the same way.

The advantage of choosing an odd number is that we can fill the same number of rows at the top and bottom, and the same number of columns at the left and right, while preserving the spatial dimension.
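nn.Conv2d's padding argument pads both sides of an axis equally, so for an even kernel one option (a sketch, using nn.ZeroPad2d with its (left, right, top, bottom) order; the sizes are my own) is to pad asymmetrically first:

import torch
from torch import nn

# even kernel k = 4, so p = k - 1 = 3: ceil(3/2) = 2 on top/left, floor(3/2) = 1 on bottom/right
pad = nn.ZeroPad2d((2, 1, 2, 1))       # (left, right, top, bottom)
conv = nn.Conv2d(1, 1, kernel_size=4)  # no built-in padding
X = torch.rand(1, 1, 5, 8)
print(conv(pad(X)).shape)              # torch.Size([1, 1, 5, 8])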

For any two-dimensional tensor X, if:

  1. The kernel size is odd;
  2. All sides are padded with the same number of rows and columns;
  3. The output has the same height and width as the input;

then we can conclude that the output Y[i, j] is computed by cross-correlating the convolution kernel with the window of the input centered at X[i, j].
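A sketch to illustrate this (the function corr2d_same and its details are mine, assuming an odd kernel): compute the "same" cross-correlation by hand, where each Y[i, j] comes from the padded window centered at X[i, j]:

import torch
import torch.nn.functional as F

def corr2d_same(X, K):
    # "same" cross-correlation for an odd kernel: output matches the input shape
    kh, kw = K.shape
    ph, pw = kh - 1, kw - 1
    Xp = F.pad(X, (pw // 2, pw // 2, ph // 2, ph // 2))
    Y = torch.zeros_like(X)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # the window Xp[i:i+kh, j:j+kw] is centered on the original X[i, j]
            Y[i, j] = (Xp[i:i + kh, j:j + kw] * K).sum()
    return Y

X = torch.rand(5, 8)
K = torch.rand(3, 3)
print(corr2d_same(X, K).shape)  # torch.Size([5, 8])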

import torch
from torch import nn

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# For convenience, we define a function to compute the convolution layer.
# This function initializes the convolution layer weights, and adds and
# removes the corresponding dimensions on the input and output
def comp_conv2d(conv2d, X):
    # (1, 1) indicates that both the batch size and the number of channels are 1
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # omit the first two dimensions: batch size and channel
    return Y.reshape(Y.shape[2:])

# Note that each side is padded with 1 row or 1 column,
# so a total of 2 rows and 2 columns are added
X = torch.rand(size=(5, 8))
print(comp_conv2d(conv2d, X).shape)
>>
torch.Size([5, 8])
  • After the convolution, the output is still a 5×8 tensor.
  • Note that in X.reshape((1, 1) + X.shape), adding two tuples concatenates them rather than adding numbers (see the tiny example below).
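A tiny illustration of that tuple behavior:

X_shape = (5, 8)
print((1, 1) + X_shape)  # (1, 1, 5, 8): the tuples are concatenated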

Stride

The stride is how far the kernel moves at each step. So far we have moved one position at a time (the default); now let's move two at a time.

This changes the shape from $3 \times 6$ to $2 \times 3$.

In general, when the vertical stride is $s_h$ and the horizontal stride is $s_w$, the output shape is $\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor$.

If we set $p_h = k_h - 1$ and $p_w = k_w - 1$, the output shape simplifies to $\lfloor (n_h + s_h - 1)/s_h \rfloor \times \lfloor (n_w + s_w - 1)/s_w \rfloor$. Further, if the height and width of the input are divisible by the vertical and horizontal strides, the output shape is $(n_h/s_h) \times (n_w/s_w)$.
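A small helper (the name conv_out_shape is mine, not from the text) that evaluates this formula directly, which the PyTorch result below should match:

def conv_out_shape(n, k, p, s):
    # floor((n - k + p + s) / s); p is the *total* padding on that axis
    return (n - k + p + s) // s

# n_h=5, k_h=3, p_h=2, s_h=2 -> 3;  n_w=8, k_w=3, p_w=2, s_w=2 -> 4
print(conv_out_shape(5, 3, 2, 2), conv_out_shape(8, 3, 2, 2))  # 3 4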

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
print(comp_conv2d(conv2d, X).shape)
>>
torch.Size([3, 4])

Using the same X as above, we set the stride of the two-dimensional cross-correlation to 2 in both the horizontal and vertical directions. The output shrinks accordingly and matches the formula derived above: $\lfloor (5 - 3 + 2 + 2)/2 \rfloor \times \lfloor (8 - 3 + 2 + 2)/2 \rfloor = 3 \times 4$.
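Stride and padding can also differ per axis. A slightly more involved example (my own values, reusing comp_conv2d and X from above; checked against the formula):

conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
print(comp_conv2d(conv2d, X).shape)
# floor((5-3+0+3)/3) x floor((8-5+2+4)/4) = 1 x 2 -> torch.Size([1, 2])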


  1. More from the Hands-on Deep Learning series can be found here: juejin.cn

  2. GitHub address: DeepLearningNotes/d2l (github.com)

Still being updated…