1 The past and present of CNNs

1.1 The brain

As humans, we constantly observe and analyze the world around us through our eyes, making predictions about what we see and acting on them without any deliberate “effort” to think. When we see something, we label each object based on what we have learned in the past. To illustrate, take a look at the picture below:

This is what we do subconsciously all day long. We see things, we label them, we predict and identify behaviors. But how do we do this? How can we explain what we saw?

It took nature more than 500 million years to create a system to achieve this. The cooperation between the eyes and the brain, called the primary visual pathway, is how we make sense of the world around us.


When you see an object, the photoreceptors in your eyes send signals through the optic nerve to the primary visual cortex, where the input is processed. There, the brain begins to make sense of what the eye sees.

All of this comes naturally to us. We hardly ever stop to think about how special it is that we can recognize all the objects and people we encounter in our lives. The deep, complex hierarchy of connections between neurons in the brain plays a major role in remembering and labeling objects.

Think about how we learn what an umbrella is. Or a duck, a lamp, a candle, or a book. In the beginning, our parents or family members tell us the names of the objects in our immediate environment. We learn from the examples given to us. Gradually, we begin to recognize certain things more and more often in our environment. They become so common that the next time we see them, we immediately know their names. They become part of our model of the world.

1.2 History of convolutional neural networks

Similar to the way children learn to recognize objects, we need to show an algorithm millions of images before it can generalize and make predictions about images it has never seen before.

Computers “see” things in a different way than we do. Their world consists only of numbers. Every image can be represented as a two-dimensional array of numbers, known as pixels.
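As a minimal sketch (the tiny 8×8 image and its values here are made up purely for illustration), this is what a grayscale picture looks like to a computer:

import numpy as np

# To a computer, an 8x8 grayscale image is just a 2-D array of
# pixel intensities (0 = black, 255 = white).
image = np.zeros((8, 8), dtype=np.uint8)  # an all-black 8x8 image
image[2:6, 3] = 255                       # a short white vertical stroke
print(image.shape)  # (8, 8): height x width
print(image)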

But the fact that they perceive images differently doesn’t mean we can’t train them to recognize patterns, just as we learned to recognize images. We simply have to think about what an image is in a different way.


Convolutional neural networks are inspired by the brain. Studies of the mammalian brain by D. H. Hubel and T. N. Wiesel in the 1950s and 1960s produced new models of how mammals perceive the world visually. They showed that the visual cortices of cats and monkeys contain neurons that respond individually to small regions of the visual field.

In their paper, they described two basic types of visual neuron cells in the brain, each of which acts in a different way: simple cells (S cells) and complex cells (C cells).

Simple cells, for example, activate when they detect basic shapes, such as lines in a fixed area and at a specific angle. Complex cells have larger receptive fields, and their output is insensitive to the exact position of a stimulus within the field.

A complex cell continues to respond to a given stimulus even if its absolute position on the retina changes. Here, complex means more flexible.

In vision, the receptive field of a single sensory neuron is the specific region of the retina in which a stimulus will affect the firing of that neuron (i.e., will activate the neuron). Every sensory neuron has a similar receptive field, and neighboring fields overlap.

In addition, the concept of hierarchy plays a significant role in the brain. Information is stored in sequences of patterns. The neocortex, the outermost layer of the brain, stores information hierarchically, in cortical columns: uniformly organized groups of neurons in the neocortex.

In 1980, a researcher named Kunihiko Fukushima proposed a hierarchical, multilayered neural network model. He called it the neocognitron. The model was inspired by the concepts of simple and complex cells. The neocognitron was able to recognize patterns by learning about the shapes of objects.

Later, in 1998, convolutional neural networks were introduced by Bengio, LeCun, Bottou, and Haffner. Their first convolutional neural network, called LeNet-5, was able to classify handwritten digits.

2 Convolutional neural network

Convolutional Neural Network, abbreviated CNN, is a model taught in virtually every deep learning course and book. CNNs are particularly powerful at image recognition, and many image recognition models are extensions of the basic CNN framework. It is also worth mentioning that the CNN is one of the few deep learning models designed with reference to the visual organization of the human brain, so understanding it is very helpful when learning other deep learning models. This article describes the principles of CNN and how to use one to recognize handwritten digits with about 99% accuracy. A conceptual diagram of a CNN is shown below:

2.1 Convolution layer

The convolution operation convolves the original image with a specific feature detector (filter), denoted by the symbol ⊗. Convolution multiplies the two 3×3 matrices shown below element by element and adds up the results, producing a single output value.
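A minimal sketch of that multiply-and-add step, with made-up values (NumPy is used purely for illustration):

import numpy as np

# One 3x3 image patch and one 3x3 filter: element-wise multiply, then sum.
patch = np.array([[0, 1, 1],
                  [0, 1, 0],
                  [1, 1, 0]])
filt = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]])
print(np.sum(patch * filt))  # 3: one value of the output feature map

# Sliding the filter over a 5x5 image produces a 3x3 feature map.
image = np.random.randint(0, 2, size=(5, 5))
feature_map = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(image[i:i + 3, j:j + 3] * filt)
print(feature_map)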

Several feature detectors (filters) are generated at random (e.g., 16 of them). The purpose of a feature detector is to help us extract certain features from the picture (e.g., shapes), much as the human brain decides what a picture shows based on shape.

We convolve the input many times, using a different filter for each operation. This produces different feature maps. Finally, we stack all these feature maps together as the final output of the convolution layer.

Just as in any other neural network, we use an activation function to make the output non-linear. In a convolutional neural network, the output of the convolution is passed through an activation function, such as ReLU.
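For example (a minimal sketch with made-up values), ReLU simply replaces every negative value in the feature map with zero:

import numpy as np

feature_map = np.array([[-2.0, 1.5],
                        [ 0.5, -0.3]])
print(np.maximum(0, feature_map))  # ReLU: negatives become 0
# [[0.  1.5]
#  [0.5 0. ]]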

Another concept here is the stride: the number of pixels the convolution filter moves at each step. The stride is usually 1, meaning the filter slides pixel by pixel. Increasing the stride makes the filter jump further across the input, so there is less overlap between windows.

The animation below shows a stride of 1.

To keep our feature map from shrinking, we surround the input with a layer of zero-valued pixels (zero padding). In addition to keeping the spatial size constant after convolution, padding also improves performance and ensures that the kernel and stride fit the input.
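The interplay of filter size F, padding P, and stride S follows the standard output-size formula output = (W - F + 2P) / S + 1, where W is the input width. A quick sketch with illustrative values:

def conv_output_size(w, f, p, s):
    # output = (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

print(conv_output_size(28, 5, 0, 1))  # 24: without padding the map shrinks
print(conv_output_size(28, 5, 2, 1))  # 28: 'SAME'-style padding keeps the size
print(conv_output_size(28, 5, 2, 2))  # 14: a stride of 2 halves it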

A good way to build intuition for the convolution layer is to visualize it; the GIF below shows what a convolution layer does.

2.2 Pooling layer

A pooling layer is usually added between successive convolution layers in a CNN. Its function is to progressively reduce the spatial dimensions, which cuts the number of parameters and computations in the network. This shortens training time and helps control overfitting.

The most common type of pooling is max pooling, which takes the maximum value within each window. The window size must be specified in advance. Pooling reduces the size of the feature map while preserving the important information.

The main advantages of max pooling are that shifting the image by a few pixels does not affect the result, and that it is robust to noise.
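A minimal sketch of 2×2 max pooling with stride 2 (values made up): each 2×2 window is replaced by its maximum, halving the width and height of the feature map:

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 1, 8]])

pooled = np.zeros((2, 2), dtype=fmap.dtype)
for i in range(2):
    for j in range(2):
        pooled[i, j] = fmap[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()

print(pooled)
# [[6 4]
#  [7 9]]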

2.3 Fully connected layer

Basically, the fully connected layer flattens the results of the previous layers and connects them to an ordinary neural network.
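A minimal sketch (shapes chosen to match the MNIST network in the next section, values random): flatten the final pooled feature maps into a vector and apply one dense layer:

import numpy as np

h_pool2 = np.random.rand(7, 7, 64)            # output of the last pooling layer
flat = h_pool2.reshape(-1)                    # a 3136-element vector
W = np.random.randn(7 * 7 * 64, 1024) * 0.01  # dense-layer weights
b = np.zeros(1024)
h_fc = np.maximum(0, flat @ W + b)            # ReLU(flat . W + b)
print(h_fc.shape)  # (1024,)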

3 Recognizing MNIST handwritten digits with a CNN

The following part covers how to implement a CNN with TensorFlow and apply it to handwritten digit recognition.

# CNN
import tensorflow as tf

def convolutional(x, keep_prob):

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    # Reshape the flat input back into 28x28 grayscale images.
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # First convolution (32 filters of 5x5) + max pooling.
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    # Second convolution (64 filters of 5x5) + max pooling.
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer.
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout: drop some values at random to prevent overfitting.
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Output layer: softmax over the 10 digit classes.
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    return y, [W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2]

If you have some basic knowledge of TensorFlow, this section should be easy to follow; it matches the logical order of the concept diagram above.
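For context, here is a hedged usage sketch (assuming the TensorFlow 1.x API that the code above uses; the loss and optimizer choices are illustrative, not necessarily those of the original project):

# Feed flattened 28x28 MNIST images into convolutional() defined above.
x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot digit labels
keep_prob = tf.placeholder(tf.float32)        # dropout keep probability

y, variables = convolutional(x, keep_prob)

# Cross-entropy loss and an Adam optimizer (illustrative choices).
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)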

Finally, following the learning materials from MOOCs, I combined TensorFlow and Flask to build a handwritten digit recognition web app around the CNN. The tricky parts were the front end and integrating the trained model with Flask. The final project is here:

TensorFlow-MNIST-WEBAPP

4 Summary

Finally, two personal takeaways:

  • CNN is widely applied in many scenarios, and there is a lot of material about it online. I knew a little about CNNs before, but I had never summarized and organized that knowledge; having sorted it out, I feel much more confident about it.
  • Combine theory with practice: implementing things yourself makes the knowledge far more practical.

5 Reference Materials

  • Lesson 5.1: Introduction to Convolutional Neural Network
  • An Intuitive Explanation of Convolutional Neural Networks — The Data Science Blog
  • Convolutional Neural Network (CNN) | Skymind
  • Convolutional Neural Networks (LeNet) — DeepLearning 0.1 documentation
  • CS231n: Convolutional Neural Networks for Visual Recognition
  • Convolutional neural network (CNN) learning notes 1: Introduction | Jey Zhang
  • Deep Learning Learning Notes Collection series (7) — CSDN blog