A Convolutional Neural Network (CNN) is a feedforward neural network whose artificial neurons respond to nearby elements within their receptive field; CNNs perform exceptionally well on large-scale image processing. Its two characteristic building blocks are the convolutional layer and the pooling layer.

Comparison: convolutional neural network vs. fully connected neural network

Left figure: fully connected neural network (flat), composed of an input layer, activation functions, and fully connected layers.

Right figure: convolutional neural network (volumetric), consisting of an input layer, convolutional layers, activation functions, pooling layers, and fully connected layers.

There is an important concept in convolutional neural networks: depth

Convolution layer

Convolution: feature extraction on the raw input. In short, convolution extracts features from the original input one small region at a time. The calculation process of convolution is explained in detail later.

In the figure above, the left block is the input layer: a 3-channel image of size 32*32. The small cube on the right is a filter of size 5*5 and depth 3. The input is divided into regions, the fixed-size filter slides over the input and performs its operation at each position, and the result is a feature map with a depth of 1.

As shown in the figure above, multiple filters are generally used in a convolution, producing multiple feature maps.

In the figure above, six filters perform feature extraction, yielding six feature maps. Stacking these six maps together gives the output of the convolutional layer.

Convolution is not limited to the original input. The blue block is the result of convolving the original input with 6 filters, giving 6 feature maps. The green block is obtained by convolving the blue block again with 10 filters, giving 10 feature maps. The depth of each filter must equal the depth of the input coming from the previous layer, as the sketch below illustrates.
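A minimal shape sketch of this stacking in NumPy. The depths 3 -> 6 -> 10 come from the figure; the 5*5 filter size, stride 1, and no padding are assumptions for illustration:

```python
import numpy as np

# Shape sketch: each filter's depth must match the depth of its input.
x = np.zeros((32, 32, 3))              # original input, depth 3
filters_1 = np.zeros((6, 5, 5, 3))     # 6 filters, each of depth 3 (= input depth)
blue = np.zeros((28, 28, 6))           # stacking 6 feature maps -> depth 6
filters_2 = np.zeros((10, 5, 5, 6))    # 10 filters, each of depth 6 (= blue block's depth)
green = np.zeros((24, 24, 10))         # stacking 10 feature maps -> depth 10
print(x.shape, blue.shape, green.shape)
```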

Intuitive understanding of convolution

The picture above shows an example:

The first convolution extracts low-level features.

The second convolution extracts mid-level features.

The third convolution extracts higher-level features.

Features are repeatedly extracted and compressed until high-level features are obtained. In short, the original features are condensed step by step, and the final features become more reliable. The features of the last layer can then be used for various tasks: classification, regression, and so on.

Flow of convolution calculation

The three large matrices on the left are the original image input: the three RGB channels are represented by three matrices, giving an input of size 7*7*3.

Filter W0 is a filter of size 3*3 and depth 3 (three matrices); Filter W1 is a second filter of the same shape. Since two filters are used in this convolution, the output of the convolutional layer has a depth of 2 (there are two green matrices).

Bias B0 is the Bias of Filter W0, and Bias B1 is the Bias of Filter W1.

Output is the result of the convolution, with size 3*3 and depth 2.

Calculation process:

The input is fixed and the filters are specified, so the question is how the green output matrices are obtained. In the first step, a sliding window of the same size as the filter is placed on the input matrix, and the part of the input inside the window is multiplied element-wise with the corresponding positions of the filter matrix:

1. For the first input channel, the sum of the element-wise products is 0.

2. For the second input channel, the sum of the element-wise products is 2.

3. For the third input channel, the sum of the element-wise products is 0.

In the second step, we sum the results of the three matrices and add the bias term, i.e. 0 + 2 + 0 + 1 = 3, which gives the 3 in the upper-left corner of the output matrix:

In the third step, each filter performs this operation to obtain its own first output element:

In the fourth step, slide the window by 2 cells (the stride) and repeat the previous steps.

In the fifth step, the final output of depth 2, produced by convolving with the two filters, is obtained:
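A minimal NumPy sketch of this flow, assuming the shapes in the demo (7*7*3 padded input, two 3*3*3 filters, stride 2). The concrete numbers in the figure are replaced by random values, so only the mechanics and the 3*3*2 output shape carry over:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(7, 7, 3)       # padded input, depth 3
w = np.random.randn(2, 3, 3, 3)    # 2 filters, each 3*3*3
b = np.random.randn(2)             # one bias per filter
stride = 2

out_size = (7 - 3) // stride + 1   # = 3
out = np.zeros((out_size, out_size, 2))

for k in range(2):                 # for each filter
    for i in range(out_size):
        for j in range(out_size):
            window = x[i*stride:i*stride+3, j*stride:j*stride+3, :]
            # element-wise product with the filter, summed over all three
            # channels, plus the bias (the "0 + 2 + 0 + 1 = 3" step)
            out[i, j, k] = np.sum(window * w[k]) + b[k]

print(out.shape)                   # (3, 3, 2)
```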

Think about:

① Why does the window slide by 2 cells each time?

The step size is called the stride S. The smaller S is, the more features are extracted, but S is usually not set to 1, mainly for reasons of time efficiency. S should not be too large either, otherwise information in the image will be missed.

② Because the side length of the filter is larger than S, the windows overlap after each move. Overlapping means features in those positions are extracted multiple times, especially in the middle of the image, while the edges are covered far fewer times. What can be done about this?

The usual solution is to pad the edge of the image with zeros. Careful readers may have noticed that in the demo above, a ring of zeros was already added around the image; that is what "pad 1" means. In general, pad n means adding n rings of zeros.

③ What is the size of the output feature map after one convolution?

Please calculate Output=?
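The figure with the size formula is not reproduced here, but a minimal calculation using the standard output-size formula (W - F + 2P) / S + 1, assuming the 7*7 input in the demo already includes the pad of 1 (so the unpadded width W is 5):

```python
def conv_output_size(w, f, stride, pad):
    """Output spatial size of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

# Values matching the demo above: W = 5, F = 3, S = 2, P = 1
print(conv_output_size(w=5, f=3, stride=2, pad=1))   # 3  -> a 3*3 output per filter
```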

Note: there can be more than one filter in a convolution operation, and they must be of the same size.

Convolution parameter sharing principle

In convolutional neural networks, there is a very important feature: weight sharing.

Weight sharing means that one filter is used to scan the whole input image; the numbers in the filter are called weights. Every position of the image is scanned with the same filter, so the same weights are reused everywhere, hence "shared".
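A rough illustration of why sharing matters for the parameter count. The 32*32*3 input and the 6 filters of size 5*5*3 are assumed from the earlier figure; the fully connected comparison layer is hypothetical:

```python
# Convolutional layer with shared weights: each filter has 5*5*3 weights + 1 bias
conv_params = 6 * (5 * 5 * 3 + 1)
print(conv_params)        # 456

# A fully connected layer mapping the same 32*32*3 input to a 28*28*6 output
# would need one weight per (input, output) pair, plus biases
fc_params = (32 * 32 * 3) * (28 * 28 * 6) + 28 * 28 * 6
print(fc_params)          # about 14.5 million
```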

Pooling layer

As shown in the figure above, pooling compresses the feature map; it is also called down-sampling. The max or mean of each region replaces that region, condensing the whole map. Below is a demonstration of the pooling operation; the filter size, the stride, and the pooling method (max or mean) need to be specified:
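A minimal sketch of 2*2 max-pooling with stride 2 in NumPy. The input values and sizes are assumed for illustration; replacing .max() with .mean() gives mean-pooling:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    h, w = x.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = x[i:i+size, j:j+size].max()
    return out

x = np.array([[1., 1., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool(x))   # [[6. 8.]
                     #  [3. 4.]]
```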

Composition of convolutional neural networks

Convolution - activation - convolution - activation - pooling - ... - pooling - fully connected - classification or regression
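One way to write this stack down in code. PyTorch and all layer sizes are assumptions here (the article does not name a framework), with a 32*32*3 input in mind:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),    # convolution: 32*32*3 -> 28*28*6
    nn.ReLU(),                         # activation
    nn.Conv2d(6, 10, kernel_size=5),   # convolution: 28*28*6 -> 24*24*10
    nn.ReLU(),                         # activation
    nn.MaxPool2d(2),                   # pooling:     24*24*10 -> 12*12*10
    nn.Flatten(),
    nn.Linear(10 * 12 * 12, 10),       # fully connected -> classification scores
)
```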

Forward propagation and back propagation

The forward propagation of the convolutional layer has already been explained; here is a quick review with a picture:

The backward propagation process of the convolution layer is explained below:

The purpose of backpropagation is to update the parameter w, so we need to compute dJ/dw. Suppose the layer above passes down the gradient dJ/dout. By the chain rule, dJ/dw = dJ/dout * dout/dw = dJ/dout * x. For convenience when naming variables in code, dJ/dout is written as dout and dJ/dw is written as dw, as in the figure; that notation is used below.

First, be clear that dw and W have the same size: a scalar times a region gives a region of the same size. Backpropagation therefore amounts to multiplying one element of dout by the input matrix inside the corresponding window to get one dw contribution, then sliding the window and computing the next contribution, and so on. Finally, all the contributions are summed into dw and the update W = W - dw completes the backpropagation step, as sketched below.
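A minimal NumPy sketch of this dw accumulation for a single filter, assuming the shapes from the earlier demo (7*7*3 padded input, 3*3*3 filter, stride 2, 3*3 dout):

```python
import numpy as np

def conv_backward_dw(x, dout, f=3, stride=2):
    dw = np.zeros((f, f, x.shape[2]))
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            window = x[i*stride:i*stride+f, j*stride:j*stride+f, :]
            # one element of dout times the window it came from,
            # accumulated over every window position
            dw += dout[i, j] * window
    return dw

x = np.random.randn(7, 7, 3)
dout = np.random.randn(3, 3)      # gradient passed down from the layer above
dw = conv_backward_dw(x, dout)
print(dw.shape)                   # (3, 3, 3) -- same size as the filter W
# parameter update, as described in the text: W = W - dw
```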

The backpropagation above updates the parameters of one filter; the other filters are updated in the same way.

The following figure shows the forward propagation of the pooling layer under two different pooling methods:

Backpropagation through the pooling layer is handled differently for max-pooling and mean-pooling.

For max-pooling, the forward pass selects the maximum value in each 2*2 area and records its position within that area. During backpropagation, only that maximum contributes to the next layer, so the residual is passed back to the position of the maximum and the other 2*2 - 1 = 3 positions in the region are set to zero. The process is shown in the figure below, where the non-zero positions of the 4*4 matrix are the positions of the maxima recorded in the forward pass.

For mean-pooling, the residual is divided evenly into 2*2 = 4 parts and passed back to the 4 units of the corresponding small area in the previous layer. The process is shown below, followed by a small sketch of both cases:
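A small sketch of pooling backpropagation for a single 2*2 window; the window values and the residual d are assumed for illustration:

```python
import numpy as np

window = np.array([[1., 6.],
                   [3., 2.]])
d = 0.8   # residual (gradient) arriving for this window's single output

# max-pooling: all of the residual goes to the position of the maximum,
# the other 2*2 - 1 = 3 positions get zero
grad_max = np.zeros_like(window)
grad_max[np.unravel_index(window.argmax(), window.shape)] = d

# mean-pooling: the residual is split evenly over the 2*2 = 4 positions
grad_mean = np.full_like(window, d / 4)

print(grad_max)    # [[0.  0.8] [0.  0. ]]
print(grad_mean)   # [[0.2 0.2] [0.2 0.2]]
```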

Convolutional network architecture examples

VGGNet is much deeper, with many convolutional layers and pooling layers. The two common versions have 16 and 19 layers respectively.

VGGNet features:

All filters are only 3*3, which yields finer-grained computation and more features. The pooling parameters are likewise fixed.

Note: for traditional convolutional neural networks, more layers tend to give better results. In 2016, the deep residual network (ResNet) was introduced, reaching 152 layers; it will be covered later.

So how much memory overhead is there to train a VGGNet?

As can be seen from the figure, during training the network holds about 138M parameters, and a single 224*224*3 image occupies about 93MB of memory as it passes through. The number of images in each batch is therefore constrained by memory, that is, 93MB * number of images < total memory capacity.
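A back-of-the-envelope check of that constraint, using the roughly 93MB-per-image figure quoted above; the 8GB card is an assumed example:

```python
memory_per_image_mb = 93            # per-image activation memory quoted above
gpu_memory_mb = 8 * 1024            # e.g. an 8GB card (assumption)
max_batch = gpu_memory_mb // memory_per_image_mb
print(max_batch)                    # ~88 images, ignoring parameter/optimizer memory
```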
