Transpose convolution has always been difficult to understand. Today, we will work through the back propagation of transpose convolution in a simple two-layer CNN, with a detailed derivation, examples, and code.

Compiled by | Zhuanzhi

Contributors | Yingying, Xiaowen

Today, we are going to train a simple CNN with two convolution layers, as shown below.

 

Inspiration



The corn on the plate reminded me of the principle of deconvolution in the process of CNN back propagation.

The red box is a 2 by 2 output image

The green box is the 3 by 3 convolution kernel

The blue box is a 4 by 4 input image

“Since we get a 2 * 2 output image after convolving the 4 * 4 image, when performing back propagation we need to perform some operation on the 2 * 2 output image to get a 4 * 4 image back.”

But the corn made me realize that the goal isn’t to recover the original image. Instead, it is to get the error for each weight in the network, and in a multi-layer CNN we need to pass that error backwards through the layers. Let me try to explain what I mean with a concrete example and code.
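To make the shapes concrete before diving in, here is a minimal NumPy check (not from the original post; the array contents are just random placeholders): a valid convolution of a 4 * 4 image with a 3 * 3 kernel gives a 2 * 2 output, and a full convolution of a 2 * 2 error map with the rotated kernel gives back a 4 * 4 map, one value per input pixel.

import numpy as np
from scipy.signal import convolve2d

image = np.random.randn(4, 4)                  # blue box: 4x4 input image
kernel = np.random.randn(3, 3)                 # green box: 3x3 kernel
out = convolve2d(image, kernel, mode='valid')  # red box: 2x2 output
print(out.shape)                               # (2, 2)

# Backwards: a 'full' convolution of the 2x2 error map with the rotated
# kernel produces a 4x4 map again, one entry per input pixel.
error = np.random.randn(2, 2)
back = convolve2d(error, np.rot90(kernel, 2), mode='full')
print(back.shape)                              # (4, 4)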

The network structure



As shown above, the network structure is very simple: only two convolution layers and one fully connected layer. Note that to perform the convolution, we need to rotate the convolution kernel by 180 degrees (transpose it); notice the green box in the figure above.

Also, note that for simplicity I did not draw the activation layers. In the code, I use tanh() and arctan() as the activation functions.
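For orientation, here is a condensed sketch of the forward pass only, using the same shapes and activations as the full code at the end of the post (the weight values here are random placeholders):

import numpy as np
from scipy.signal import convolve2d

def tanh(x): return np.tanh(x)
def arctan(x): return np.arctan(x)

x = np.random.randn(4, 4)                # 4x4 input image
w1 = np.random.randn(2, 2)               # blue weights: layer 1 kernel
w2 = np.random.randn(2, 2)               # red weights: layer 2 kernel
w3 = np.random.randn(4, 1)               # green weights: fully connected layer

l1 = convolve2d(x, w1, mode='valid')     # 4x4 -> 3x3
l1A = tanh(l1)
l2 = convolve2d(l1A, w2, mode='valid')   # 3x3 -> 2x2
l2A = arctan(l2)
l3 = l2A.ravel().dot(w3)                 # flatten to 4 values, then FC -> scalar
out = arctan(l3)
print(out.shape)                         # (1,)

Note that scipy.signal.convolve2d implements a true convolution, so it already flips (rotates by 180 degrees) the kernel internally; that is the rotation highlighted by the green box.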

The forward propagation



Note: the author made a mistake in one of the columns; the two columns pointed to by the green arrows should be swapped.

So, as you can see above, the convolution operation can be written out as a line of equations. For reasons I’ll explain later, take careful note of the variables in the red boxes: they are the inputs to the next layer. This matters when performing back propagation.
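What “written as a line” means in practice: each element of a valid convolution output is just the dot product between one input patch and the flipped kernel, so the whole layer can be spelled out as a handful of linear equations. A small check with placeholder values, assuming SciPy’s convolve2d:

import numpy as np
from scipy.signal import convolve2d

x = np.arange(9.0).reshape(3, 3)          # layer input (e.g. the 3x3 l1A)
w = np.random.randn(2, 2)                 # red weights

out = convolve2d(x, w, mode='valid')      # 2x2 output

# out[0, 0] written out "as a line": one patch times the flipped kernel.
w_flipped = np.rot90(w, 2)                # convolution rotates the kernel 180 degrees
manual_00 = (x[0:2, 0:2] * w_flipped).sum()
print(np.allclose(out[0, 0], manual_00))  # True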

Back propagation (green weight in figure above)

 

The yellow box is the learning rate, and the whole back propagation is the standard process. I wrote down the gradient update equation as well. Finally, notice the symbol ‘k’ in the red box: I’ll use it again and again to stand for (out - y).
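As a compact sketch of that update (self-contained, with random placeholder values; variable names follow the full code at the end of the post, and k is (out - y)):

import numpy as np

def d_arctan(x): return 1.0 / (1.0 + x ** 2)

learning_rate = 0.1
l3IN = np.random.randn(1, 4)             # flattened layer-2 activation (FC input)
w3 = np.random.randn(4, 1)               # green weights
y = np.array([[0.5]])                    # target value

l3 = l3IN.dot(w3)                        # (1, 1)
out = np.arctan(l3)

k = out - y                              # red box: k = (out - y)
grad_3 = l3IN.T.dot(k * d_arctan(l3))    # chain rule: input^T . (k * f'(l3))
w3 = w3 - learning_rate * grad_3         # yellow box: learning-rate update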

Backpropagation (red weight in figure above)

Red box → (out - y)

Yellow box → learning rate

Black box → Rotate the kernel 180 degrees (or transpose) before the convolution operation (remember in the convolution operation, we rotate the convolution kernel).

Except for the purple box, everything is very simple and straightforward, so what’s the purple box for?

Purple box → rotate the matrix so that it matches the derivative of each weight.

Now the question arises, why? Why are we doing this?

Remember I told you to pay attention to the input of each layer? So let’s go back one more time.

Please look carefully at the color frame.

Orange box → the input that multiplies the red weight W(2,2)

Light green box → the input that multiplies the red weight W(2,1)

Blue box → the input that multiplies the red weight W(1,2)

Pink box → the input that multiplies the red weight W(1,1)

That’s simple enough, but what does it have to do with the rotated (transposed) convolution kernel? Because Out can be written as a line (see the equation in the black box), the gradients of the weights in the red box are as follows:

Numbers in the dark green boxes → the green weights.

As you can see, when we compute the derivative with respect to each red weight, the input coordinates involved are different for each weight. We need to match those coordinates with each weight, which is why we rotate the matrix by 180 degrees.
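Putting that together in code: the gradient of the red weights is a valid convolution between the layer input and the 180-degree-rotated error map, followed by one more 180-degree rotation so every value lands on its own weight. This mirrors the grad_2 line in the full code at the end of the post; the arrays here are random placeholders:

import numpy as np
from scipy.signal import convolve2d

def d_arctan(x): return 1.0 / (1.0 + x ** 2)

l1A = np.random.randn(3, 3)              # input to the red layer (output of layer 1)
l2 = np.random.randn(2, 2)               # pre-activation of the red layer
delta = np.random.randn(2, 2)            # k * green weights, reshaped to 2x2

# Convolve the layer input with the 180-degree-rotated error map,
# then rotate the result so each value matches its own red weight.
grad_2 = np.rot90(convolve2d(l1A, np.rot90(delta * d_arctan(l2), 2), mode='valid'), 2)
print(grad_2.shape)                      # (2, 2), same shape as the red kernel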

  • Blue weight back propagation part 1

Blue box → compute the convolution between (k * green weights) and (the zero-padded red weights)

Orange box → Rotate the matrix again to get the gradient of each weight

Black box → Rotate the convolution kernel before the convolution operation

Now, the question is: why the padding? Why do we need to pad the red weights?

We’ll explain that later.
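For now, here is a shape-level sketch of part 1, mirroring the grad_1_part_1 line in the full code at the end of the post (the arrays are random placeholders): the 2 * 2 red kernel padded with one ring of zeros becomes 4 * 4, and a valid convolution with the rotated 2 * 2 error map then yields a 3 * 3 map, the same size as the first layer’s output.

import numpy as np
from scipy.signal import convolve2d

w2 = np.random.randn(2, 2)               # red weights
delta = np.random.randn(2, 2)            # error at the red layer

w2_padded = np.pad(w2, 1, mode='constant')   # zero padding: 2x2 -> 4x4
error_back = convolve2d(w2_padded, np.rot90(delta, 2), mode='valid')
print(error_back.shape)                  # (3, 3): the error for the blue layer's output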

 

  • Blue weight back propagation part 2

Blue box → the matrix computed in Part 1

Black box → Transpose the convolution kernel before the convolution operation

Orange, light green, blue, pink boxes → Calculate the gradient of each blue weight

So far we have taken a closer look at rotating the convolution kernel while performing the convolution. But now let’s look at the input.

Again, since Out can be written as a line, the gradient of the blue weight looks like this:

Green box → green weight

Orange box → gradient of blue weight W(2,2)

Pink box → gradient of blue weight W(1,1)

So, we rotate (or transpose) the matrix again to match the gradient of each weight.

In addition, the reason we zero-pad the red weights should now be obvious: it is what lets us compute the gradient for each weight. Let me show once more what I mean by padding the red weights (see the section marked with the purple asterisk).
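A corresponding sketch of part 2, mirroring the grad_1 line in the full code at the end of the post (random placeholder values): multiply the back-propagated 3 * 3 error by the activation derivative, convolve the 4 * 4 input with its 180-degree rotation, and rotate once more to get the 2 * 2 gradient of the blue weights.

import numpy as np
from scipy.signal import convolve2d

def d_tanh(x): return 1.0 - np.tanh(x) ** 2

x = np.random.randn(4, 4)                # original 4x4 input image
l1 = np.random.randn(3, 3)               # pre-activation of the blue layer
error_back = np.random.randn(3, 3)       # 3x3 error from part 1

delta1 = error_back * d_tanh(l1)         # element-wise: activation derivative
grad_1 = np.rot90(convolve2d(x, np.rot90(delta1, 2), mode='valid'), 2)
print(grad_1.shape)                      # (2, 2), same shape as the blue kernel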

 

The activation function



Green box → the derivative of the activation function; since the dimensions match, we can multiply element-wise

Red box → Rotate convolution kernel to match gradient

Blue box → zero-pad the red weights (call the result W2)

Code



import numpy as np
# Func: Only for 2D convolution
from scipy.signal import convolve2d

np.random.seed(12314)

def ReLU(x):
    mask = (x > 0) * 1.0
    return mask * x
def d_ReLU(x):
    mask = (x > 0) * 1.0
    return mask

def tanh(x):
    return np.tanh(x)
def d_tanh(x):
    return 1 - np.tanh(x) ** 2

def arctan(x):
    return np.arctan(x)
def d_arctan(x):
    return 1 / (1 + x ** 2)

def log(x):
    return 1 / (1 + np.exp(-1 * x))
def d_log(x):
    return log(x) * (1 - log(x))

# 1. Prepare Data
num_epoch = 1000
learning_rate = 0.1
total_error = 0

# Training images (4x4). A few entries were illegible in the source text
# and are best-guess reconstructions.
x1 = np.array([
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [-1, 0, 1, 1],
    [1, 0, 1, 1]])
x2 = np.array([
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1]])
x3 = np.array([
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 1, 1]])
x4 = np.array([
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 1, 1]])
image_label = np.array([
    [1.42889927219],
    [0.785398163397],
    [0.0],
    [1.46013910562]])
image_matrix = np.array([x1, x2, x3, x4])

# Blue weights (layer 1 kernel), red weights (layer 2 kernel),
# green weights (fully connected layer)
w1 = (np.random.randn(2, 2) * 4.2) - 0.1
w2 = (np.random.randn(2, 2) * 4.2) - 0.1
w3 = (np.random.randn(4, 1) * 4.2) - 0.1

print('Prediction Before Training')
predictions = np.array([])
for image_index in range(len(image_matrix)):
    current_image = image_matrix[image_index]
    l1 = convolve2d(current_image, w1, mode='valid')
    l1A = tanh(l1)
    l2 = convolve2d(l1A, w2, mode='valid')
    l2A = arctan(l2)
    l3IN = np.expand_dims(l2A.ravel(), 0)
    l3 = l3IN.dot(w3)
    l3A = arctan(l3)
    predictions = np.append(predictions, l3A)
print('---Ground Truth----')
print(image_label.T)
print('--Prediction-----')
print(predictions.T)
print('--Prediction Rounded-----')
print(np.round(predictions).T)
print("\n")

for iter in range(num_epoch):
    for current_image_index in range(len(image_matrix)):
        current_image = image_matrix[current_image_index]
        current_image_label = image_label[current_image_index]

        # Forward pass
        l1 = convolve2d(current_image, w1, mode='valid')
        l1A = tanh(l1)
        l2 = convolve2d(l1A, w2, mode='valid')
        l2A = arctan(l2)
        l3IN = np.expand_dims(l2A.ravel(), 0)
        l3 = l3IN.dot(w3)
        l3A = arctan(l3)

        cost = np.square(l3A - current_image_label).sum() * 0.5
        total_error += cost

        # Gradient for the green (fully connected) weights
        grad_3_part_1 = l3A - current_image_label
        grad_3_part_2 = d_arctan(l3)
        grad_3_part_3 = l3IN
        grad_3 = grad_3_part_3.T.dot(grad_3_part_1 * grad_3_part_2)

        # Gradient for the red weights (layer 2 kernel)
        grad_2_part_IN = np.reshape((grad_3_part_1 * grad_3_part_2).dot(w3.T), (2, 2))
        grad_2_part_1 = grad_2_part_IN
        grad_2_part_2 = d_arctan(l2)
        grad_2_part_3 = l1A
        grad_2 = np.rot90(
            convolve2d(grad_2_part_3, np.rot90(grad_2_part_1 * grad_2_part_2, 2), mode='valid'), 2)

        # Gradient for the blue weights (layer 1 kernel):
        # zero-pad the red weights, convolve with the rotated error map,
        # then convolve the input with the rotated layer-1 error.
        grad_1_part_IN_pad_weight = np.pad(w2, 1, mode='constant')
        grad_1_part_IN = np.rot90(grad_2_part_1 * grad_2_part_2, 2)
        grad_1_part_1 = convolve2d(grad_1_part_IN_pad_weight, grad_1_part_IN, mode='valid')
        grad_1_part_2 = d_tanh(l1)
        grad_1_part_3 = current_image
        grad_1 = np.rot90(
            convolve2d(grad_1_part_3, np.rot90(grad_1_part_1 * grad_1_part_2, 2), mode='valid'), 2)

        w1 = w1 - learning_rate * grad_1
        w2 = w2 - learning_rate * grad_2
        w3 = w3 - learning_rate * grad_3
    # print('Current iter:  ', iter, ' current cost: ', cost, end='\r')
    total_error = 0

print('\n\n')
print('Prediction After Training')
predictions = np.array([])
for image_index in range(len(image_matrix)):
    current_image = image_matrix[image_index]
    l1 = convolve2d(current_image, w1, mode='valid')
    l1A = tanh(l1)
    l2 = convolve2d(l1A, w2, mode='valid')
    l2A = arctan(l2)
    l3IN = np.expand_dims(l2A.ravel(), 0)
    l3 = l3IN.dot(w3)
    l3A = arctan(l3)
    predictions = np.append(predictions, l3A)
print('---Ground Truth----')
print(image_label.T)
print('--Prediction-----')
print(predictions.T)
print('--Prediction Rounded-----')
print(np.round(predictions).T)
print("\n")

Full code link: https://repl.it/@jae_dukduk/transpose-conv

Original link:

https://towardsdatascience.com/only-numpy-understanding-back-propagation-for-transpose-convolution-in-multi-layer-cnn-with-c0a07d191981