The calculation process of a 2D convolutional neural network

2D convolution: the convolution kernel slides over the two spatial dimensions (height and width) of the input image.

Single channel convolution

Multichannel convolution

The input image has shape (input_height, input_width, input_channels). Here it is 7x7x3, where the three channels are the R, G, and B channels. The convolution kernel has shape (kernel_height, kernel_width, kernel_channels), here 3x3x3, so kernel_channels equals input_channels. There are two filters, so filter_num = 2, and filter_num = output_channels.

Each value of the feature_map is obtained by convolving each of the three input channels with the matching channel of the convolution kernel and then summing the three results. The information from the multiple channels is thus compressed, and the output of each convolution kernel is a two-dimensional feature_map. In multi-channel convolution, the kernel parameters differ from channel to channel.
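The accumulation over channels can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name conv2d_multichannel, the stride argument, the absence of padding, and the random input values are all assumptions, since the text does not state them. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_multichannel(image, kernels, stride=1):
    """Multi-channel 2D convolution without padding (assumed).

    image:   (input_height, input_width, input_channels)
    kernels: (filter_num, kernel_height, kernel_width, kernel_channels)
    returns: (output_height, output_width, filter_num)
    """
    in_h, in_w, in_c = image.shape
    n_f, k_h, k_w, k_c = kernels.shape
    assert k_c == in_c, "kernel_channels must equal input_channels"

    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    feature_maps = np.zeros((out_h, out_w, n_f))

    for f in range(n_f):                 # one 2D feature_map per filter
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                patch = image[r:r + k_h, c:c + k_w, :]
                # Multiply element-wise, then sum over height, width AND
                # channels: this is where the channels are accumulated.
                feature_maps[i, j, f] = np.sum(patch * kernels[f])
    return feature_maps

rng = np.random.default_rng(0)
image = rng.standard_normal((7, 7, 3))       # 7x7 RGB input from the text
kernels = rng.standard_normal((2, 3, 3, 3))  # filter_num = 2, each 3x3x3
out = conv2d_multichannel(image, kernels)
print(out.shape)  # (5, 5, 2)
```

With stride 1 and no padding, the 7x7x3 input and two 3x3x3 filters give a 5x5x2 output: the last dimension equals filter_num, and the input channel dimension has been summed away.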

2D convolution captures only spatial features, not temporal ones.


The calculation process of a 3D convolutional neural network

3D convolution: the convolution kernel slides along three dimensions of the input (depth, height, and width).

3D convolution adds a depth dimension, which usually corresponds to consecutive frames of a video or to different slices of a 3D (volumetric) image.

The input shape becomes (input_depth, input_height, input_width, input_channels). For example, input_depth is 3 for 3 consecutive frames of a video (note: these are 3 frames, not the 3 channels of one image).

The shape of the convolution kernel is (kernel_depth, kernel_height, kernel_width, kernel_channels). As shown above, kernel_depth is 2. Cells of the same color can be regarded as the kernel weights for one channel, and cells of different colors as the weights for different channels of the same convolution kernel (this is easier to understand by analogy with 2D convolution).

The convolution kernel in 3D convolution is itself three-dimensional. Within one channel, the same weights are reused at every position as the kernel slides, just as each channel of a 2D convolution kernel shares its weights across all positions. The weights of different channels may differ.
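The sliding of a 3D kernel along depth, height, and width can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: input_depth = 3 and kernel_depth = 2 come from the text, while the 7x7 frame size, the 3 channels, the 3x3 kernel height and width, stride 1, no padding, and the function name conv3d are assumptions chosen for the example.

```python
import numpy as np

def conv3d(volume, kernel):
    """3D convolution with a single filter, stride 1, no padding (assumed).

    volume: (input_depth, input_height, input_width, input_channels)
    kernel: (kernel_depth, kernel_height, kernel_width, kernel_channels)
    returns: (output_depth, output_height, output_width)
    """
    in_d, in_h, in_w, in_c = volume.shape
    k_d, k_h, k_w, k_c = kernel.shape
    assert k_c == in_c, "kernel_channels must equal input_channels"

    out_d = in_d - k_d + 1
    out_h = in_h - k_h + 1
    out_w = in_w - k_w + 1
    out = np.zeros((out_d, out_h, out_w))

    for d in range(out_d):           # slide along depth (e.g. time)
        for i in range(out_h):       # slide along height
            for j in range(out_w):   # slide along width
                block = volume[d:d + k_d, i:i + k_h, j:j + k_w, :]
                # The same kernel weights are reused at every position;
                # the sum runs over depth, height, width and channels.
                out[d, i, j] = np.sum(block * kernel)
    return out

rng = np.random.default_rng(0)
video = rng.standard_normal((3, 7, 7, 3))   # 3 frames, assumed 7x7 with 3 channels
kernel = rng.standard_normal((2, 3, 3, 3))  # kernel_depth = 2, assumed 3x3x3 otherwise
out = conv3d(video, kernel)
print(out.shape)  # (2, 5, 5)
```

Unlike the 2D case, the output of a single filter keeps a depth axis (here of length 3 - 2 + 1 = 2), which is how 3D convolution preserves information across neighboring frames or slices.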