Concept learning: convolutional kernel/filter, feature map, convolutional layer

As a foundation, it is recommended to first read the online book:

Neural Networks and Deep Learning by Michael Nielsen and Xiaohu Zhu/Freeman Zhang

This book explains the basic principles of neural networks clearly.

 

The terms feature map and channel sometimes mean the same thing. However, when emphasizing the input and output of a layer, we tend to say channel; when emphasizing the features extracted from an image by the network's operations, we tend to say feature map. The description below does not draw a sharp distinction between them: the feature maps output by one layer are the input channels of the next layer.

 

(1) Convolution kernel/filter

The convolution kernel is also called a filter.

Each convolution kernel has three dimensions: length, width and depth.

The length and width of the convolution kernel are specified by hand. Length × width is also known as the size of the kernel; commonly used sizes are 3×3, 5×5, etc.

When specifying a convolution kernel, only two parameters, length and width, are given, because the depth of the kernel (also called its number of channels) must equal the depth of the current input (the number of feature maps; for example, the three RGB channels are three feature maps).

During convolution, the number of channels in the input layer (the number of input feature maps) must equal the number of channels in each filter (the depth of the convolution kernel). The number of filters, however, is arbitrary, and it determines the number of channels output by the convolution (i.e., the number of output feature maps).

In many common architectures, the number of filters grows as the network gets deeper (for example, 64 in the second convolutional layer, 128 in the third, and so on).
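This shape bookkeeping can be checked with a minimal NumPy sketch (all sizes here are chosen arbitrarily for illustration; this is a naive stride-1, no-padding convolution, not an efficient implementation):

```python
import numpy as np

H, W = 28, 28        # input height and width (arbitrary)
C_in = 3             # input channels, e.g. RGB
N = 32               # number of filters -- chosen freely
k = 3                # kernel size, 3x3

image = np.random.rand(C_in, H, W)
# Only k x k is specified by hand; each filter's depth must equal C_in.
filters = np.random.rand(N, C_in, k, k)

# Naive valid (no padding) convolution, stride 1.
out_h, out_w = H - k + 1, W - k + 1
out = np.zeros((N, out_h, out_w))
for n in range(N):
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the k x k x C_in patch by filter n and sum.
            out[n, i, j] = np.sum(image[:, i:i+k, j:j+k] * filters[n])

print(out.shape)  # (32, 26, 26)
```

Note that the number of output feature maps (32) equals the number of filters, independent of the input depth.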

 

(2) Feature map

Input layer: for a grayscale image there is only one feature map (one channel); for a color image there are generally three feature maps (the red, green and blue channels).

Other layers: between layers there are a number of convolution kernels (also called filters). Convolving the feature maps (channels) of the previous layer with one kernel produces one feature map of the next layer; with N kernels, N feature maps (i.e., N output channels) are generated in the next layer.

 

(3) Convolution layer

Many convolutional architectures start with a first convolution unit that maps the 3-channel RGB input image to a series of internal filters. In a deep learning framework, this code might look like this:

 

out_1 = Conv2d(input=image, filters=32, kernel_size=(3, 3), strides=(1, 1))

relu_out = relu(out_1)

pool_out = MaxPool(relu_out, kernel_size=(2, 2), strides=2)

 

For one input image, 32 filters are used, each of size 3×3 and with a stride of 1.

The following diagram illustrates all the operations in the code snippet above:

 

In the figure above, each of the 32 filters (i.e., filter-1, filter-2, …) actually contains a set of three two-dimensional kernels (WT-R, WT-G and WT-B, i.e., a depth of 3). Each of these 2-D kernels corresponds to one of the red (R), green (G) and blue (B) channels of the input image.

During forward propagation, the R, G and B pixel values of the input image are convolved with the WT-R, WT-G and WT-B kernels respectively, producing three intermediate activation maps (not shown). The outputs of the three kernels are then added together, yielding one activation map per filter, 32 in total.
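The per-channel multiply-and-sum described above can be sketched in NumPy as follows (random data, with sizes chosen only for illustration):

```python
import numpy as np

def conv2d_single(channel, kernel):
    """Valid 2-D convolution (cross-correlation, as in deep learning)
    of one channel with one 2-D kernel, stride 1."""
    k = kernel.shape[0]
    h = channel.shape[0] - k + 1
    w = channel.shape[1] - k + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(channel[i:i+k, j:j+k] * kernel)
    return out

rng = np.random.default_rng(0)
R, G, B = rng.random((3, 8, 8))           # the three input channels
wt_r, wt_g, wt_b = rng.random((3, 3, 3))  # one filter = three 2-D kernels

# Each 2-D kernel convolves its own channel; the three intermediate
# maps are summed into a single activation map for this filter.
activation = conv2d_single(R, wt_r) + conv2d_single(G, wt_g) + conv2d_single(B, wt_b)
print(activation.shape)  # (6, 6)
```

A filter with 32 such kernel sets would repeat this 32 times, giving 32 activation maps.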

Each of these activation maps then passes through the ReLU function and finally (optionally) through a max-pooling layer, which reduces the spatial dimensions of the activation maps (i.e., their length × width; note that the stride used here is 2). In the end we obtain a group of 32 activation maps whose spatial size is half that of the input image (i.e., 32 feature maps, each only half the size of the input image).
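The whole conv → ReLU → max-pool pipeline can be sketched with naive NumPy implementations. Note one assumption: this sketch uses "same" padding so that pooling halves the size exactly; the snippet above did not specify a padding mode.

```python
import numpy as np

def conv2d(image, filters):
    """Naive multi-channel 'same'-padded convolution, stride 1.
    image: (C_in, H, W); filters: (N, C_in, k, k) -> (N, H, W)."""
    N, C_in, k, _ = filters.shape
    pad = k // 2
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = image.shape
    out = np.zeros((N, H, W))
    for n in range(N):
        for i in range(H):
            for j in range(W):
                out[n, i, j] = np.sum(padded[:, i:i+k, j:j+k] * filters[n])
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """size x size max pooling with stride = size halves height and width."""
    N, H, W = x.shape
    return x.reshape(N, H // size, size, W // size, size).max(axis=(2, 4))

rng = np.random.default_rng(1)
image = rng.random((3, 16, 16))            # RGB input (size chosen for illustration)
filters = rng.random((32, 3, 3, 3)) - 0.5  # 32 filters, 3x3, depth 3

pool_out = max_pool(relu(conv2d(image, filters)))
print(pool_out.shape)  # (32, 8, 8): 32 feature maps, each half the input size
```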

The output of one convolution layer is often used as the input to a subsequent convolution layer. So if our second convolution unit is as follows:

 

conv_out_2 = Conv2d(input=relu_out, filters=64)

 

This convolution unit has 64 filters, and each filter uses a group of 32 unique kernels (one kernel per channel of the feature maps output by the previous convolution layer; 32 feature maps require 32 kernels, i.e., the depth of each filter is 32).
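A quick shape check of this depth-matching rule (the array contents are irrelevant here, only the shapes matter):

```python
import numpy as np

relu_out = np.zeros((32, 16, 16))     # 32 feature maps from the first unit
filters_2 = np.zeros((64, 32, 3, 3))  # 64 filters, each of depth 32

# Each filter's depth must match the number of input channels,
# and the number of filters sets the number of output channels.
print(filters_2.shape[1] == relu_out.shape[0])  # True
print(filters_2.shape[0])                        # 64 output feature maps
```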

For the calculation process of the Conv2d() convolution function with relatively simple parameters, refer to:

blog.csdn.net/weixin_4194…

 

(4) Batch Normalization

However, there are many kinds of normalization applied to data.