At present there are many ready-made machine learning and deep learning toolkits available, and in some cases it is convenient and effective to call an established model directly, as with Caffe or TensorFlow. However, these toolkits demand considerable hardware resources and hide the implementation details, which is not ideal for beginners who want to practice and understand the fundamentals. To better understand and master the relevant knowledge, it is best to program the models yourself. This article shows how to build a Convolutional Neural Network (CNN) using NumPy.

The CNN is a kind of neural network proposed quite early that has become popular in recent years; it is arguably the most widely used network in the field of computer vision. CNN models are well implemented in existing toolkits, where the related library functions are fully developed: developers only need to call existing modules to build a model, avoiding the complexity of implementing it themselves. In practice, however, this leaves developers unaware of the implementation details, and sometimes a data scientist has to improve the model using details the toolkit does not expose. In that case, the only solution is to program a similar model yourself, which gives you the highest level of control over the implementation and a better understanding of how it works at each step. In this article, only NumPy will be used to implement the CNN, and three layer modules will be created: the convolution layer (conv), the ReLU activation function, and max pooling.

1. Read the input image

The following code reads an existing image from the skimage (scikit-image) Python library and converts it to grayscale:

import skimage.data
import skimage.color

# Reading the image
img = skimage.data.chelsea()
# Converting the image into gray.
img = skimage.color.rgb2gray(img)

Reading the image is the first step; the sizes used in the following steps depend on the size of the input image. The image is then converted to grayscale.
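As a rough illustration of what rgb2gray does, the conversion can be sketched in pure NumPy. The luma weights below are an assumption for illustration, not necessarily skimage's exact coefficients, and the tiny all-ones array merely stands in for a real photo:

```python
import numpy

def to_gray(rgb):
    # Weighted sum of the R, G, B channels; the weights are
    # approximate luma coefficients, assumed for illustration.
    return rgb[..., 0] * 0.2125 + rgb[..., 1] * 0.7154 + rgb[..., 2] * 0.0721

# A tiny 2x2 RGB image stands in for a real photo such as chelsea().
rgb = numpy.ones((2, 2, 3))
gray = to_gray(rgb)
print(gray.shape)  # the channel axis is gone: (2, 2)
```

The important point is the shape change: a (rows, columns, 3) color array becomes a two-dimensional (rows, columns) matrix, which is what the rest of the pipeline assumes.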

2. Prepare the filter

The following code prepares the filter bank for the first convolution layer (layer 1, abbreviated as l1, and similarly below):

l1_filter = numpy.zeros((2, 3, 3))

A zero array is created based on the number of filters and the size of each filter; the code above creates two filters of size 3x3. In the tuple (2, 3, 3), the first 2 is the number of filters (num_filters), the second number is the number of rows of each filter, and the third is the number of columns. Since the input image is grayscale, it is read in as a two-dimensional matrix, so each filter is also chosen as a two-dimensional array and the depth axis is omitted. If the image were a color image (with 3 RGB channels), each filter would have size (3, 3, 3), the last 3 indicating the depth, and the code above would be changed to (2, 3, 3, 3) accordingly. The size of the filter bank is chosen by the user, but no specific filter values are given here; random initialization is generally used. The following set of values can be used to detect vertical and horizontal edges:
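The indexing convention described above can be checked directly in NumPy; a minimal sketch, with shapes mirroring the text:

```python
import numpy

# Two 3x3 filters for a grayscale (single-channel) input.
l1_filter = numpy.zeros((2, 3, 3))
print(l1_filter.shape[0])  # number of filters: 2
print(l1_filter[0].shape)  # one filter: (3, 3)

# For a 3-channel RGB input, the bank would carry a depth axis instead:
rgb_filter = numpy.zeros((2, 3, 3, 3))
print(rgb_filter[0].shape)  # one filter: (3, 3, 3)
```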

l1_filter[0, :, :] = numpy.array([[[-1, 0, 1],
                                   [-1, 0, 1],
                                   [-1, 0, 1]]])
l1_filter[1, :, :] = numpy.array([[[1,   1,  1],
                                   [0,   0,  0],
                                   [-1, -1, -1]]])
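Why these particular values respond to edges can be seen on a hand-made toy patch (an illustration, not part of the original code): the first filter fires on a vertical intensity step and stays silent on a flat region.

```python
import numpy

vertical = numpy.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])

# A 3x3 patch with a vertical edge: dark columns on the left, bright on the right.
edge_patch = numpy.array([[0, 0, 9],
                          [0, 0, 9],
                          [0, 0, 9]])
# A flat patch with constant intensity.
flat_patch = numpy.full((3, 3), 5)

print(numpy.sum(vertical * edge_patch))  # strong response: 27
print(numpy.sum(vertical * flat_patch))  # no response: 0
```

The horizontal-edge filter behaves the same way with the roles of rows and columns swapped.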

3. Conv Layer

Once the filters are constructed, the next step is to convolve them with the input image. The following code uses the conv function to convolve the input image with the filter bank:

l1_feature_map = conv(img, l1_filter)

The conv function accepts just two parameters: the input image and the filter bank:

def conv(img, conv_filter):
    if len(img.shape) > 2 or len(conv_filter.shape) > 3: # Check if number of image channels matches the filter depth.
        if img.shape[-1] != conv_filter.shape[-1]:
            print("Error: Number of channels in both image and filter must match.")
            sys.exit()
    if conv_filter.shape[1] != conv_filter.shape[2]: # Check if filter dimensions are equal.
        print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
        sys.exit()
    if conv_filter.shape[1]%2==0: # Check if filter dimensions are odd.
        print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
        sys.exit()

    # An empty feature map to hold the output of convolving the filter(s) with the image.
    feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                                img.shape[1]-conv_filter.shape[1]+1,
                                conv_filter.shape[0]))

    # Convolving the image by the filter(s).
    for filter_num in range(conv_filter.shape[0]):
        print("Filter ", filter_num + 1)
        curr_filter = conv_filter[filter_num, :] # getting a filter from the bank.
        """
        Checking if there are multiple channels for the single filter.
        If so, then each channel will convolve the image.
        The result of all convolutions are summed to return a single feature map.
        """
        if len(curr_filter.shape) > 2:
            conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
            for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
                conv_map = conv_map + conv_(img[:, :, ch_num],
                                            curr_filter[:, :, ch_num])
        else: # There is just a single channel in the filter.
            conv_map = conv_(img, curr_filter)
        feature_maps[:, :, filter_num] = conv_map # Holding feature map with the current filter.
    return feature_maps # Returning all feature maps.

This function first ensures that the depth of each filter matches the number of image channels. The outer if statement checks whether both the image and the filter have a depth axis; if so, the inner check verifies that the numbers of channels are equal, and an error is reported if they do not match.

if len(img.shape) > 2 or len(conv_filter.shape) > 3: # Check if number of image channels matches the filter depth.
    if img.shape[-1] != conv_filter.shape[-1]:
        print("Error: Number of channels in both image and filter must match.")

In addition, each filter must be square, and its size must be odd. This is checked by the following two if blocks; if either condition is not met, the program prints an error and exits.

if conv_filter.shape[1] != conv_filter.shape[2]: # Check if filter dimensions are equal.
    print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
    sys.exit()
if conv_filter.shape[1]%2==0: # Check if filter dimensions are odd.
    print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
    sys.exit()
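These checks terminate the whole interpreter with sys.exit(). An alternative sketch (my own suggestion, not the author's code) is to raise an exception instead, which composes better when the function is embedded in a larger program:

```python
import numpy

def validate_filter_bank(conv_filter):
    # Raise instead of calling sys.exit() so callers can catch the error.
    if conv_filter.shape[1] != conv_filter.shape[2]:
        raise ValueError("Filter must be a square matrix.")
    if conv_filter.shape[1] % 2 == 0:
        raise ValueError("Filter must have an odd size.")

validate_filter_bank(numpy.zeros((2, 3, 3)))  # valid: square and odd-sized
try:
    validate_filter_bank(numpy.zeros((2, 4, 4)))  # even-sized: rejected
except ValueError as err:
    print(err)
```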

Once all the above conditions are satisfied, an empty array is initialized to hold the output feature maps:

# An empty feature map to hold the output of convolving the filter(s) with the image.
feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                            img.shape[1]-conv_filter.shape[1]+1,
                            conv_filter.shape[0]))

Since neither stride nor padding is set, the default stride of 1 with no padding is used. The feature maps obtained after the convolution then have size (img_rows - filter_rows + 1, img_columns - filter_columns + 1, num_filters), that is, the input size minus the filter size plus 1 in each spatial dimension. Note that each filter outputs one feature map.
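The size formula can be verified with concrete numbers. Using a hypothetical 100x100 grayscale input (the size is an assumption for illustration) and the (2, 3, 3) filter bank from above:

```python
import numpy

img_shape = (100, 100)            # hypothetical grayscale input size
conv_filter = numpy.zeros((2, 3, 3))

# Same arithmetic as in the conv function: input minus filter size plus one,
# with one output map per filter.
feature_maps = numpy.zeros((img_shape[0] - conv_filter.shape[1] + 1,
                            img_shape[1] - conv_filter.shape[1] + 1,
                            conv_filter.shape[0]))
print(feature_maps.shape)  # (98, 98, 2)
```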

# Convolving the image by the filter(s).
for filter_num in range(conv_filter.shape[0]):
    print("Filter ", filter_num + 1)
    curr_filter = conv_filter[filter_num, :] # getting a filter from the bank.
    """
    Checking if there are multiple channels for the single filter.
    If so, then each channel will convolve the image.
    The result of all convolutions are summed to return a single feature map.
    """
    if len(curr_filter.shape) > 2:
        conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
        for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
            conv_map = conv_map + conv_(img[:, :, ch_num],
                                        curr_filter[:, :, ch_num])
    else: # There is just a single channel in the filter.
        conv_map = conv_(img, curr_filter)
    feature_maps[:, :, filter_num] = conv_map # Holding feature map with the current filter.

The loop iterates over each filter in the bank; the current filter is selected with the following line:

curr_filter = conv_filter[filter_num, :] # getting a filter from the bank.

If the input image has more than one channel, the filter must have the same number of channels; only then can the convolution proceed properly, channel by channel. Finally, the per-channel outputs are summed into a single feature map. The following code checks the number of channels in the input image; if the image has only one channel, a single convolution completes the whole process:

if len(curr_filter.shape) > 2:
    conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
    for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
        conv_map = conv_map + conv_(img[:, :, ch_num],
                                    curr_filter[:, :, ch_num])
else: # There is just a single channel in the filter.
    conv_map = conv_(img, curr_filter)
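The per-channel summation rests on a simple identity that can be checked numerically: summing element-wise products channel by channel gives the same value as one sum over the full 3D product. A small sketch at a single filter position (random data, for illustration only):

```python
import numpy

rng = numpy.random.default_rng(0)
region = rng.random((3, 3, 2))   # one 3x3 window of a 2-channel input
filt = rng.random((3, 3, 2))     # a filter with matching depth

# Per-channel products summed one channel at a time...
per_channel = sum(numpy.sum(region[:, :, ch] * filt[:, :, ch]) for ch in range(2))
# ...equal a single sum over the full 3D product.
combined = numpy.sum(region * filt)
print(abs(per_channel - combined) < 1e-12)  # True
```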

The conv_ function in this code is different from the conv function above. conv takes the input image and the filter bank but does not perform the convolution itself; instead, it hands each filter (or filter channel) to conv_, which carries out the actual convolution. Here is the code of conv_:

def conv_(img, conv_filter):
    filter_size = conv_filter.shape[0]
    result = numpy.zeros((img.shape))
    # Looping through the image to apply the convolution operation.
    for r in numpy.uint16(numpy.arange(filter_size/2,
                          img.shape[0]-filter_size/2-2)):
        for c in numpy.uint16(numpy.arange(filter_size/2, img.shape[1]-filter_size/2-2)):
            # Getting the current region to get multiplied with the filter.
            curr_region = img[r:r+filter_size, c:c+filter_size]
            # Element-wise multiplication between the current region and the filter.
            curr_result = curr_region * conv_filter
            conv_sum = numpy.sum(curr_result) # Summing the result of multiplication.
            result[r, c] = conv_sum # Saving the summation in the convolution layer feature map.

    # Clipping the outliers of the result matrix.
    final_result = result[numpy.uint16(filter_size/2):result.shape[0]-numpy.uint16(filter_size/2),
                          numpy.uint16(filter_size/2):result.shape[1]-numpy.uint16(filter_size/2)]
    return final_result
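The sliding-window product can be checked against a hand-computed case. The helper below is a simplified reference (my own sketch, not the conv_ function above; it keeps only the "valid" area and no border at all), correlating a 4x4 image with a 3x3 filter of ones:

```python
import numpy

def valid_correlate(img, filt):
    # Reference 'valid' correlation: slide the filter over every
    # position where it fits entirely inside the image.
    f = filt.shape[0]
    out = numpy.zeros((img.shape[0] - f + 1, img.shape[1] - f + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = numpy.sum(img[r:r + f, c:c + f] * filt)
    return out

img = numpy.arange(16, dtype=float).reshape(4, 4)
ones = numpy.ones((3, 3))
out = valid_correlate(img, ones)
print(out.shape)   # (2, 2)
print(out[0, 0])   # sum of the top-left 3x3 block: 45.0
```

With an all-ones filter, each output value is simply the sum of the 3x3 region under it, which makes the result easy to verify by hand.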

Each filter slides over the image, and at every position a region of the same size as the filter is extracted:

curr_region = img[r:r+filter_size, c:c+filter_size]

After that, the image region and the filter are multiplied element-wise, and the products are summed to obtain a single output value:

# Element-wise multiplication between the current region and the filter.
curr_result = curr_region * conv_filter
conv_sum = numpy.sum(curr_result) # Summing the result of multiplication.
result[r, c] = conv_sum # Saving the summation in the convolution layer feature map.

After the input image has been convolved with each filter, the conv function returns the feature maps. The figure below shows the feature maps returned by the conv layer (since the l1 filter bank has shape (2, 3, 3), i.e. two 3x3 kernels, two feature maps are output):



Convolution image


4. ReLU activation function layer

The ReLU layer applies the ReLU activation function to each feature map output by the conv layer. It is invoked with the following line:

l1_feature_map_relu = relu(l1_feature_map)

The ReLU activation function is implemented as follows:

def relu(feature_map):
    # Preparing the output of the ReLU activation function.
    relu_out = numpy.zeros(feature_map.shape)
    for map_num in range(feature_map.shape[-1]):
        for r in numpy.arange(0, feature_map.shape[0]):
            for c in numpy.arange(0, feature_map.shape[1]):
                relu_out[r, c, map_num] = numpy.max([feature_map[r, c, map_num], 0])
    return relu_out

The idea of ReLU is very simple: compare each element of the feature map with 0, keep the original value if it is greater than 0, and set it to 0 otherwise. The output of the ReLU layer is shown below:
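Because the operation is a pure element-wise max with zero, the triple loop can also be replaced by a single vectorized call; a sketch of the equivalent (the loop version above is kept for clarity of exposition):

```python
import numpy

def relu_vectorized(feature_map):
    # Element-wise max(x, 0) over the whole array at once.
    return numpy.maximum(feature_map, 0)

x = numpy.array([[-1.5, 2.0],
                 [0.0, -3.0]])
print(relu_vectorized(x))  # negatives become 0, positives pass through
```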



The ReLU layer outputs images


5. Maximum pooling layer

The output of the ReLU layer serves as the input to the maximum pooling layer, and the maximum pooling operation is invoked with the following line:

l1_feature_map_relu_pool = pooling(l1_feature_map_relu, 2, 2)

The implementation code of Max pooling is as follows:

def pooling(feature_map, size=2, stride=2):
    # Preparing the output of the pooling operation.
    pool_out = numpy.zeros((numpy.uint16((feature_map.shape[0]-size+1)/stride),
                            numpy.uint16((feature_map.shape[1]-size+1)/stride),
                            feature_map.shape[-1]))
    for map_num in range(feature_map.shape[-1]):
        r2 = 0
        for r in numpy.arange(0, feature_map.shape[0]-size-1, stride):
            c2 = 0
            for c in numpy.arange(0, feature_map.shape[1]-size-1, stride):
                pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])
                c2 = c2 + 1
            r2 = r2 + 1
    return pool_out

This function takes three parameters: the ReLU layer output, the size of the pooling mask, and the stride. It first creates an empty array to hold the function's output; the size of that array is computed from the size of the input feature map, the mask size, and the stride.

pool_out = numpy.zeros((numpy.uint16((feature_map.shape[0]-size+1)/stride),
                        numpy.uint16((feature_map.shape[1]-size+1)/stride),
                        feature_map.shape[-1]))
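Plugging hypothetical numbers into the same arithmetic makes the output size concrete: a 98x98 feature map pooled with a 2x2 mask at stride 2 shrinks to roughly half in each spatial dimension (the 98x98x2 input shape here is an assumption for illustration):

```python
import numpy

feature_map = numpy.zeros((98, 98, 2))   # hypothetical conv/ReLU output
size, stride = 2, 2
# Same size arithmetic as in the pooling function.
pool_out = numpy.zeros((numpy.uint16((feature_map.shape[0] - size + 1) / stride),
                        numpy.uint16((feature_map.shape[1] - size + 1) / stride),
                        feature_map.shape[-1]))
print(pool_out.shape)  # (48, 48, 2): roughly half the rows and columns
```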

Max pooling is applied to each channel of the input feature map, keeping the maximum value within each region:

pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])

The output of the pooling layer is shown in the figure below. Note that although the images are drawn at the same display size, the output of the pooling operation is actually much smaller than its input.



The pooled layer outputs images

6. Stacking the layers

The previous sections implemented the basic layers of the CNN structure: conv, ReLU, and max pooling. Now stack them; the code is as follows:

# Second conv layer
l2_filter = numpy.random.rand(3, 5, 5, l1_feature_map_relu_pool.shape[-1])
print("\n**Working with conv layer 2**")
l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
print("\n**ReLU**")
l2_feature_map_relu = relu(l2_feature_map)
print("\n**Pooling**")
l2_feature_map_relu_pool = pooling(l2_feature_map_relu, 2, 2)
print("**End of conv layer 2**\n")

As can be seen from the code, l2 denotes the second convolution layer. It uses three 5x5 kernels (filters), which are convolved with the output of the first layer to produce three feature maps. This is followed by the ReLU activation function and max pooling. The results of each operation are visualized below:
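The shape bookkeeping of the stack can be traced without running the layers. Starting from a hypothetical 100x100 grayscale input (the size and the helper names conv_shape/pool_shape are mine, for illustration), each convolution shrinks the map by filter_size - 1 and each 2x2, stride-2 pooling roughly halves it, using the same size arithmetic as the functions above:

```python
import numpy

def conv_shape(h, w, filter_size):
    # Valid convolution: input minus filter size plus one.
    return h - filter_size + 1, w - filter_size + 1

def pool_shape(h, w, size=2, stride=2):
    # Same truncating arithmetic as the pooling function.
    return (int(numpy.uint16((h - size + 1) / stride)),
            int(numpy.uint16((w - size + 1) / stride)))

h, w = 100, 100              # hypothetical input size
h, w = conv_shape(h, w, 3)   # layer 1 conv, 3x3 -> (98, 98)
h, w = pool_shape(h, w)      # layer 1 pool     -> (48, 48)
h, w = conv_shape(h, w, 5)   # layer 2 conv, 5x5 -> (44, 44)
h, w = pool_shape(h, w)      # layer 2 pool     -> (21, 21)
print((h, w))  # (21, 21)
```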



Visualization of l2 layer processing

# Third conv layer
l3_filter = numpy.random.rand(1, 7, 7, l2_feature_map_relu_pool.shape[-1])
print("\n**Working with conv layer 3**")
l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)
print("\n**ReLU**")
l3_feature_map_relu = relu(l3_feature_map)
print("\n**Pooling**")
l3_feature_map_relu_pool = pooling(l3_feature_map_relu, 2, 2)
print("**End of conv layer 3**\n")

As can be seen from the code, l3 denotes the third convolution layer. It uses a single 7x7 kernel (filter), which is convolved with the output of the second layer to produce one feature map. This is followed by the ReLU activation function and max pooling. The results of each operation are visualized below:



Visualization of l3 layer processing


Note that the input of each layer is the output of the previous one:

l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)

7. Complete code

All the code has been uploaded to GitHub, and the visualization of each layer is implemented with the Matplotlib library.

The author is Ahmed Gad, whose research interests include deep learning, artificial intelligence, and computer vision. Personal homepage: www.linkedin.com/in/ahmedfga… This article was translated by the Alibaba Cloud Yunqi Community and is an abridged translation of "Building Convolutional Neural Network using NumPy from Scratch". For more details, please refer to the original text.