This is the second day of my participation in the First Challenge 2022

Once, while reproducing a classic network, I wanted to avoid an if-else to keep the code clean, so I went looking for a special module that outputs its input exactly as it is. I didn’t know which module does this, so I asked the members of a computer discussion group.

Conv2d(input_ch, input_ch, 1)
But 1*1 is kind of like a fully connected layer.

At this point I was already laughing: it looks like everyone’s grasp of the basics is about as shaky as mine. When I talked with other students about convolution, I also felt that convolution, and 1*1 convolution in particular, is a hotbed of misunderstanding. So today I want to sort out my earlier thoughts as a review.

Convolution

Convolutional neural networks are mainly used for processing image data, so we focus on image convolution. Having studied signals and systems, my understanding of convolution is the original, elementary, and strict one: the convolution of two functions can be understood as flipping and shifting one of them and then measuring the overlap between the two. In a convolutional neural network, however, "convolution layer" is strictly speaking a misnomer. The so-called convolution operation is actually a cross-correlation, because the kernel is not flipped. Li Mu points this out in Dive into Deep Learning (动手学深度学习), so I won’t repeat the details here.

Let’s take a look at how Dive into Deep Learning describes the two-dimensional operation:

In a two-dimensional cross-correlation operation, the convolution window starts at the upper left corner of the input tensor and slides from left to right and from top to bottom. When the window slides to a new position, the part of the input tensor inside the window is multiplied elementwise with the kernel tensor, and the resulting tensor is summed into a single scalar value, which is the value of the output tensor at that position.
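The sliding-window description above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function names `corr2d` and `conv2d_true`, the 3*3 input, and the 2*2 kernel are all chosen here for the example, and padding and stride are assumed to be 0 and 1.

```python
import numpy as np

def corr2d(X, K):
    """2D cross-correlation of input X with kernel K (no padding, stride 1)."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):          # slide top to bottom
        for j in range(Y.shape[1]):      # slide left to right
            # elementwise product of the window and the kernel, then sum
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_true(X, K):
    """Signal-processing convolution: flip the kernel on both axes first."""
    return corr2d(X, K[::-1, ::-1])

X = np.arange(9.0).reshape(3, 3)
K = np.array([[0.0, 1.0], [2.0, 3.0]])

print(corr2d(X, K))       # [[19. 25.] [37. 43.]]
print(conv2d_true(X, K))  # [[ 5. 11.] [23. 29.]]
```

The two results differ only because of the kernel flip. Since the kernel values are learned anyway, a network loses nothing by skipping the flip, which is presumably why frameworks compute the cross-correlation and still call it convolution.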

Let’s start with the conversation above.

At first glance, group member A’s initial answer, a 1*1 convolution kernel, seems correct, because for a single-channel input a 1*1 kernel (with its weight set to 1) does indeed leave the input unchanged. However, I quickly asked about multi-channel input, and the answer from the group was 1*1*N. Another group member then gave code and a more precise-sounding explanation, which was actually incorrect as well.

For a multi-channel input, each layer of the filter is cross-correlated with the corresponding input channel, and the per-channel results are then added together. With only one filter, the number of output channels can only be 1. Using a single 1*1*N filter is therefore equivalent to adding up the N input channels position by position (weighted by the filter values), which leaves a single-channel matrix holding the sum. This loss is irreversible: the N original channels cannot be recovered from that sum.
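A small NumPy sketch makes the information loss concrete. It assumes a single 1*1*N filter with all weights equal to 1 and N = 2; the helper name `conv1x1_single_filter` and the two toy inputs are made up for this illustration.

```python
import numpy as np

def conv1x1_single_filter(x, w):
    """Apply one 1*1*N filter. x: (N, H, W), w: (N,) -> output of shape (H, W)."""
    # Weighted sum over the channel axis at every spatial position.
    return np.tensordot(w, x, axes=([0], [0]))

# Two different 2-channel inputs: the ones sit in different channels.
a = np.zeros((2, 2, 2)); a[0] = 1.0
b = np.zeros((2, 2, 2)); b[1] = 1.0

w = np.ones(2)                       # a 1*1*2 filter with both weights set to 1
ya = conv1x1_single_filter(a, w)
yb = conv1x1_single_filter(b, w)

print(ya.shape)                      # (2, 2): a single output channel
print(np.array_equal(a, b))          # False: the inputs differ
print(np.array_equal(ya, yb))        # True: the outputs are identical
```

Because two different inputs map to the same output, no procedure can tell from the output which input produced it.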

The relationship between the convolution layer and the number of channels

Furthermore, the group’s answer keeps the number of input and output channels the same, which requires multiple filters. Input and output channels are actually the first point that many students get confused about. In a convolution operation, the number of input channels equals the number of convolution kernels in each filter, and the number of output channels equals the number of filters.

First, let’s take a look at the composition of the convolution layer, as shown in the figure:

We call the middle part, the kernels, a convolution layer. A convolution layer contains one or more filters, and each filter has several layers. The so-called number of convolution kernels is the number of layers in each filter. In the figure above, the number of convolution kernels in each filter is 3 and the number of filters is 2. These values, 3 and 2, also correspond to the number of input channels and the number of output channels.
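The same bookkeeping can be checked by shape in NumPy. The sketch below assumes 3 input channels and 2 filters as in the text; the 5*5 input, the 3*3 kernel size, and the function name `conv_layer` are assumptions made only for this example, since the figure's spatial sizes are not given.

```python
import numpy as np

def conv_layer(x, weight):
    """Multi-channel cross-correlation (no padding, stride 1).

    x:      (C_in, H, W)
    weight: (C_out, C_in, kH, kW)  -> C_out filters, each with C_in kernels
    returns (C_out, H - kH + 1, W - kW + 1)
    """
    c_out, c_in, kh, kw = weight.shape
    assert x.shape[0] == c_in, "each filter needs one kernel per input channel"
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    y = np.zeros((c_out, out_h, out_w))
    for o in range(c_out):                      # one output channel per filter
        for i in range(out_h):
            for j in range(out_w):
                # multiply all C_in kernels with their channels, sum everything
                y[o, i, j] = (x[:, i:i + kh, j:j + kw] * weight[o]).sum()
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 5))          # 3 input channels
weight = rng.standard_normal((2, 3, 3, 3))  # 2 filters, 3 kernels each
y = conv_layer(x, weight)
print(y.shape)                              # (2, 3, 3): 2 output channels
```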

Back to the group message: if the number of input and output channels is the same, then N filters of size 1*1*N are needed, that is, a 1*1*N*N weight tensor. The question is how the values of those kernels should be set. If they are all set to the same value, the N input channels become N identical outputs, which is obviously not correct. I admit that if the values in the filters form an invertible N by N matrix, then the input can be recovered from the output, but recovering the input afterwards is not the same thing as passing it through. Strictly speaking, the output equals the input only in the special case where that matrix is the identity (filter i has weight 1 on channel i and 0 everywhere else, with no bias), and those values have to be written in by hand; a convolution layer created in the usual way will not behave like this. In short, simply dropping in a convolution layer does not output a multi-channel input intact.
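The weight settings discussed above can be compared numerically, together with the identity matrix as the special case of an invertible matrix. This is a rough NumPy sketch with N = 3; the helper `conv1x1`, the random input, and the particular matrix `M` are assumptions picked for the example, and no bias term is used.

```python
import numpy as np

def conv1x1(x, M):
    """N filters of size 1*1*N. x: (N_in, H, W), M: (N_out, N_in) -> (N_out, H, W)."""
    # At every pixel, the output channels are M times the input channels.
    return np.einsum("oc,chw->ohw", M, x)

rng = np.random.default_rng(0)
N = 3
x = rng.standard_normal((N, 4, 4))

# Case 1: every weight identical -> all output channels are the same map.
same = conv1x1(x, np.ones((N, N)))
print(np.allclose(same[0], same[1]) and np.allclose(same[1], same[2]))  # True

# Case 2: an invertible matrix -> output differs from the input,
# but the input can be recovered by applying the inverse matrix.
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])          # determinant is 7, so M is invertible
y = conv1x1(x, M)
recovered = conv1x1(y, np.linalg.inv(M))
print(np.allclose(y, x))                 # False
print(np.allclose(recovered, x))         # True

# Case 3: hand-set identity weights -> the output equals the input.
identity_out = conv1x1(x, np.eye(N))
print(np.allclose(identity_out, x))      # True
```

Only the hand-set identity weights reproduce the input, which is very different from what a freshly created or trained layer would contain.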

What does 1*1 convolution have to do with the fully connected layer

Again, this is a common misunderstanding: I’ve seen more than one person claim that a 1*1 convolution is essentially a fully connected layer. It is a very basic question and not an especially meaningful one, but since I have written this far, I would like to record my thoughts.

Although a 1*1 convolution is not the same thing as a fully connected layer, the two are connected. A 1*1 convolution only mixes the channels at each spatial position separately, whereas a fully connected layer connects every input value to every output. A fully connected layer can, however, be replaced by a special convolution layer. Assume the input is 7*7*512 and the output of the fully connected layer is a vector of length 4096. If the fully connected layer is replaced by a convolution layer, then in that convolution layer:

  • There are 4096 filters
  • Each filter has 512 layers (512 convolution kernels)
  • The size of each layer is 7*7

Because each filter is exactly as large as the input, it fits at only one position and produces one value. The output dimension is therefore 1*1*4096, so the fully connected operation can be implemented with a convolution layer.
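The equivalence can be verified with a short NumPy check. To keep memory use small, this sketch assumes scaled-down sizes (8 channels and 16 outputs in place of 512 and 4096); the variable names are illustrative, and the weights are random rather than taken from any real network.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, OUT = 7, 7, 8, 16     # stand-ins for 7*7*512 input and 4096 outputs

x = rng.standard_normal((C, H, W))

# Fully connected layer: flatten the input, multiply by an (OUT, C*H*W) matrix.
fc_weight = rng.standard_normal((OUT, C * H * W))
fc_out = fc_weight @ x.reshape(-1)                 # shape (OUT,)

# Same weights viewed as OUT filters, each with C kernels of size H*W.
filters = fc_weight.reshape(OUT, C, H, W)

# Each filter covers the whole input, so there is exactly one window position:
# multiply elementwise and sum, once per filter.
conv_out = np.einsum("ochw,chw->o", filters, x).reshape(1, 1, OUT)

print(conv_out.shape)                              # (1, 1, 16)
print(np.allclose(conv_out.reshape(-1), fc_out))   # True
```

Note that the kernels here are 7*7, not 1*1: it is the full-size kernel that makes this convolution layer behave like a fully connected layer.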