Article / Rookie Team – Lei Mao

Today, artificial intelligence is no longer an unfamiliar term for ordinary people. Smart voice assistants, intelligent driving, and similar everyday products are all incubated under artificial intelligence.

It is worth mentioning that artificial intelligence has become a national strategy; representative powers include China and Russia, and a series of policy plans such as the "National Artificial Intelligence Research and Development Strategic Plan", "Artificial Intelligence and National Security", and the "New Generation Artificial Intelligence Development Plan" have been issued. Its importance is plain to see.

The field of AI today can be roughly divided into problem solving, machine learning, natural language processing (NLP), speech recognition, computer vision, and robotics. These fields are not strictly separated; they overlap with one another.


Three waves of artificial intelligence

One thing to remember is that AI has evolved over the decades in three broad waves:

First wave (non-intelligent conversational robots, 1950s–1970s)

In October 1950, Turing published the paper that posed the question of machine intelligence and proposed the Turing test. In 1966, ELIZA, a psychotherapy chatbot, was born (it relied on a limited thesaurus).

Second wave (speech recognition, 1980–2000)

The biggest breakthrough of the second wave was a change of thinking, abandoning the symbolic school of thought and using statistical thinking to solve problems.

Third wave (deep learning + big data, 21st century onward)

The third wave arrived mainly because two conditions matured:

  • The rapid development of the Internet industry has generated massive amounts of data, and the cost of data storage keeps falling fast, making it feasible to store and analyze all of it.
  • The continuous maturation of GPUs provides the necessary computing power, improving the practicality of algorithms and reducing the cost of computation.

Nvidia, founded by Jensen Huang (Huang Renxun), has seen its market value grow twentyfold in the past five years; after the U.S. market closed on July 8, 2020, Nvidia surpassed Intel in market value for the first time, becoming the most valuable chip maker in the United States. Thanks to the rapid development of GPUs, artificial intelligence, starved of computing power for decades, has finally had its big explosion.

Okay, so today we're going to focus on deep learning, and we'll stop the history there. What you need to know first is that deep learning is a subdivision of machine learning, falling under supervised learning, one of the three main categories of machine learning; the other two are unsupervised learning and reinforcement learning.

Learn about deep learning

Let’s discuss the basic idea of machine learning:

  1. Abstract real-life problems into mathematical models, and understand what the different parameters in the model mean.
  2. Solve the mathematical model with mathematical methods, thereby solving the real-life problem.
  3. Evaluate whether this mathematical model really solves the real-life problem, and how well.


To sum up the ideas: modeling, tuning, solving, and evaluating.

At the heart of a machine learning model is its algorithm, but building any model involves six steps (a minimal code sketch follows the list):

  • Collect data
  • Prepare the data
  • Choose a model (which one?)
  • Train it (how?)
  • Evaluate it
  • Predict (put it into use)
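To make these steps concrete, here is a minimal sketch using scikit-learn and its bundled handwritten-digit dataset; the choice of logistic regression is arbitrary, purely for illustration:

```python
# A minimal sketch of the six steps with scikit-learn (illustrative only).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Collect data: scikit-learn ships a small digits dataset.
digits = load_digits()

# 2. Prepare the data: scale pixels to 0..1 and split train/test.
X = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(X, digits.target, random_state=0)

# 3. Choose a model (here, arbitrarily, logistic regression).
model = LogisticRegression(max_iter=1000)

# 4. Train.
model.fit(X_train, y_train)

# 5. Evaluate.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Predict (put it into use).
print("prediction:", model.predict(X_test[:1]))
```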

As for the algorithms themselves, since this article focuses on deep learning, here is simply a list:

| Algorithm | Learning type |
| --- | --- |
| Linear regression | Supervised |
| Logistic regression | Supervised |
| Linear discriminant analysis | Supervised |
| Decision tree | Supervised |
| Naive Bayes | Supervised |
| K-nearest neighbors | Supervised |
| Learning vector quantization | Supervised |
| Support vector machine | Supervised |
| Random forest | Supervised |
| AdaBoost | Supervised |
| Gaussian mixture model | Unsupervised |
| Restricted Boltzmann machine | Unsupervised |
| K-means clustering | Unsupervised |
| Expectation–maximization (EM) | Unsupervised |

Supervised and unsupervised learning

The type of algorithm determines the type of machine learning. The key point of supervised learning is that the data are labeled: the training data given to the machine must be annotated with what each example actually is!

Unsupervised learning does not label the data; the computer extracts features on its own and groups the data, and humans interpret the resulting groups afterwards. The two obviously suit different scenarios. Unsupervised learning is very good at anomaly detection: imagine a bank monitoring for abnormal user behavior; operations that differ from normal user behavior will stand out.

To summarize machine learning in one sentence: induce rules from a "specific" mass of data, generalize them into "knowledge", and then apply this "knowledge" to real situations to solve real problems.

As noted above, deep learning is a supervised type of machine learning, so it has all the characteristics of supervised learning.

What are the unique characteristics of deep learning?

So what are the unique characteristics of deep learning? How does it differ from other supervised learning? The answer is: neural networks!

**Deep learning mainly involves three approaches:**

  • Neural networks based on the convolution operation (convolutional neural networks).
  • Autoencoder ("self-coding") neural networks built from multiple layers of neurons.
  • Deep belief networks: multi-layer autoencoders used for pre-training, after which the network weights are fine-tuned using the discriminative (label) information.

The convolutional neural network is the most representative deep learning model. Its many layers of nodes are what the "deep" in deep learning refers to. Next, let's work through the structure of a whole neural network. First, a picture:


What, can't make sense of it? Don't panic; we can break it down piece by piece. Let's look at a picture of a generic neural network:


A neural network has three kinds of layers: an input layer, hidden layers, and an output layer. The nodes in each layer are called neurons, and each holds an activation value. Input and output are easy to understand; the main work happens in the hidden layers in between, where each value is computed from the previous layer using weights W, bias values b, and an activation function.

Let's talk about the principle

Let's learn how a neural network works through a case with 2 hidden layers. First, look at a set of 28 × 28 pixel images:




We humans can tell almost instantly what these digits are. The input to the human eye is a set of pixels, and we will mimic the steps of human vision:

  • Take in pixels through the pupil.
  • Make a preliminary identification of edges (each digit has different edges).
  • Abstract the edges into component parts.
  • Assemble the overall shape, and finally arrive at the result.

A neural network works in a similar way. To the machine, a picture becomes an array of pixel values; here, for convenience, each pixel is represented by a value between 0 and 1 (real color pixels would carry RGB channels).


So for the machine, what we feed in is 28 × 28 = 784 inputs.
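In code, "feeding in 784 inputs" just means flattening the 28 × 28 pixel grid into a single vector; a tiny NumPy sketch (the random array stands in for a real digit image):

```python
import numpy as np

# A stand-in 28x28 grayscale image with pixel values in [0, 1].
image = np.random.rand(28, 28)

# Flatten the 2-D grid into the 784 input activations of the first layer.
inputs = image.reshape(-1)
print(inputs.shape)  # (784,)
```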


Various calculations are then performed to produce an output; in the figure above the network identifies the digit as 9. This is the original multilayer perceptron. The lines are called weights W, each node is called a neuron, and the value inside each neuron is its activation value.

To build a bit of visual intuition for this structure, think about it from the point of view of human vision, which goes from edges to parts to the whole. Imagine breaking a digit apart:


And so on; put together, these pieces give us our result.



So the question is: do real neural networks work this way? Answer: they can, provided you set the weights correctly. In practice the model setup is much more complex than this; this version is just for ease of understanding.

Let’s talk about the math

The next question: how do we achieve this effect? Answer: through the weights, the lines of the neural network. Each activation value is computed from the values of the previous layer together with the weights:


In its most primitive form, the formula is: a(L) = W · a(L−1). That is:


So if we want to detect a horizontal line, we can set the weights of the pixels along that line to 1 and the rest to 0. Let's call the resulting value S, the weighted sum, and carry on in the same way for other features.

As we said above, this is the most primitive formula. In reality, we usually compute the weighted sum S and then pass it through an activation function, which keeps the activation value within a controlled range.

For example, suppose we need to keep the value between 0 and 1. We can use the sigmoid activation function, σ(x) = 1 / (1 + e^(−x)), a formula you may remember from school (😊), as shown below:



This formula keeps its output strictly between zero and one. OK, let's now write down the formula for the activation value. So far, the only parameters involved are the weights:




The formula we get then becomes: a = σ(S).


Sometimes, for whatever reason, we only want the neuron to "light up" when S > 10. Then we need to add a bias value, and the formula becomes: a = σ(S − 10).


b is the bias, the baseline offset. We can call the whole expression inside the activation function z: z = W · a(L−1) + b.


So finally we write it as: a(L) = σ(z) = σ(W · a(L−1) + b).
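To make this concrete, here is a minimal NumPy sketch of one layer's forward computation under exactly these formulas (the layer sizes match the 784 → 16 example below; the random values are placeholders):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

a_prev = rng.random(784)            # previous layer: the 784 pixel activations
W = rng.standard_normal((16, 784))  # weights wiring 784 inputs to 16 neurons
b = rng.standard_normal(16)         # one bias per neuron

z = W @ a_prev + b  # the weighted sum plus bias: z = W·a + b
a = sigmoid(z)      # activations of this layer, each in (0, 1)
print(a.shape)      # (16,)
```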


OK, that rounds out this level of understanding of neural networks. Now a question worth thinking over: how many bias values and weight parameters do we actually need?

Every neuron is wired to every neuron in the next layer, so there are 784 × 16 + 16 × 16 + 16 × 10 weights, and each subsequent calculation needs its bias values, so 16 + 16 + 10 of those. Adding it all up: 784 × 16 + 16 × 16 + 16 × 10 + 16 + 16 + 10 = 13,002. That's a lot of parameters for such a simple neural network. Surprising, isn't it?


Only a fully, well-trained network can recognize images correctly, and this abundance of parameters is precisely the precondition for its strong adaptability.

Ok, so now we actually understand the forward propagation of neural networks.

But that's still not enough. We said a good network gives good results, so the question is: how do we get a good neural network?

How do you get a good neural network?

The answer: the network regulates itself through continuous backpropagation, using the gradient descent algorithm.

So you must want to hit me, don’t you?


Honestly, this is a lot of work, but it usually comes down to setting the rules and letting the machine adjust itself. First, let's be clear about what we need to focus on:

  • Whether the result is accurate (and how to quantify that).
  • If it isn't, what to adjust and how.

First, accuracy. This part is relatively simple: compute the loss of the output layer, for example a squared-error cost such as C = Σᵢ (aᵢ(L) − yᵢ)², where y is the expected output:


So how do we adjust when the cost is high? Obviously, we have to adjust a(n) so that it gets close to y(n). Remember the formula for a(n)? The only things we can really adjust are the activation values a, the weights W, and the biases b; but each activation is itself computed from the previous layer's a, W, and b, and so on back to the first layer, whose "activations" are just the input values, which we can't change. So in the end, what we need to adjust are all the parameters: every weight W and every bias b.

All right, we have a target. The question now is: how do we adjust these parameters? The answer is gradient descent.

Before learning gradient descent, let's recall some elementary calculus: the Taylor formula. To first order, f(x + ε) ≈ f(x) + ε f′(x).


Here f′(x) is the gradient of the function f at x. For a one-dimensional function, the gradient is a scalar, better known as the derivative, as shown in the figure below:


Then we pick a constant η > 0 small enough, and substitute ε = −η f′(x), giving f(x − η f′(x)) ≈ f(x) − η f′(x)².


If the derivative f′(x) is not equal to 0, then η f′(x)² > 0, so, approximately, f(x − η f′(x)) < f(x).



This means that if we take the derivative of f(x) and set a small enough positive η, we can make f(x) smaller, and x − η f′(x) is the value to move to. This is the one-dimensional version:
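A minimal sketch of this one-dimensional update rule (the function f(x) = x², the learning rate, and the starting point are arbitrary illustrative choices):

```python
# One-dimensional gradient descent on f(x) = x**2, whose derivative is 2x.
def f_prime(x):
    return 2 * x

eta = 0.1  # the small positive constant η (the learning rate)
x = 5.0    # arbitrary starting point

for _ in range(50):
    x = x - eta * f_prime(x)  # the update x ← x − η·f'(x)

print(x)  # very close to 0, the minimum of f
```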

By now, I'm sure a lot of this is clicking. Our goal is to reduce the loss, and the loss C is a function of all the weights and biases (and inputs) across the layers. So we must compute the derivative of C with respect to every weight and bias to achieve the same effect. Collecting all of these adjustments gives a vector, which we can call the gradient:


Since the calculation works its way backward layer by layer, it is called backpropagation.

First, compute the adjustment for the output layer: denote the last layer as layer L, with N output neurons.


With that, the error adjustment for the outermost layer is done. Next, compute the second-to-last layer, L−1. According to the chain rule (more of that "primary school" math 😊),


Similarly, each layer's adjustment is computed via the chain rule, layer by layer, until we obtain the full gradient vector:
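To pin down what "backward layer by layer" means, here is a minimal sketch of one backpropagation step for the 784 → 16 → 16 → 10 network from the example, with the squared-error cost C = ½‖a − y‖² (initialization and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

rng = np.random.default_rng(0)

x = rng.random(784)           # input pixels
y = np.zeros(10); y[9] = 1.0  # target: "this image is a 9"

# The 784 -> 16 -> 16 -> 10 network (13,002 parameters in total).
W1, b1 = 0.1 * rng.standard_normal((16, 784)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 16)),  np.zeros(16)
W3, b3 = 0.1 * rng.standard_normal((10, 16)),  np.zeros(10)

# Forward pass.
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
z3 = W3 @ a2 + b3; a3 = sigmoid(z3)

# Backward pass: start at the output layer, then chain-rule backwards.
delta3 = (a3 - y) * sigmoid_prime(z3)         # output-layer error
delta2 = (W3.T @ delta3) * sigmoid_prime(z2)  # one layer back
delta1 = (W2.T @ delta2) * sigmoid_prime(z1)  # one more layer back

# Gradients for every weight and bias, then one gradient-descent step.
eta = 0.5
W3 -= eta * np.outer(delta3, a2); b3 -= eta * delta3
W2 -= eta * np.outer(delta2, a1); b2 -= eta * delta2
W1 -= eta * np.outer(delta1, x);  b1 -= eta * delta1
```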


Machines do all the complicated tasks, and we humans just fine-tune them. This is particularly evident in model training.

That completes our walkthrough of how a neural network works. Now let's see what a convolutional neural network is.

A convolutional neural network is really just a neural network with special layers: convolutional layers, pooling layers, and fully connected layers.


Each kind of layer has its own function. Let's look at the convolutional layer first, and express the meaning of convolution with a picture:


The essence of the convolutional layer is feature extraction: multiple convolution kernels are slid across the original image (the small grid in the middle of the figure is a convolution kernel). Comparing this with the weighted sum S in our plain neural network, it is not hard to see that a convolution kernel can be regarded as a simple arrangement of weights.

The parameters of a convolutional layer include the kernel size, the stride, and the padding. Let's introduce each in turn.

Get a feel for the convolution kernel from an animation:


The kernel here is clearly 3 × 3 and all ones; in other words, it simply adds up the numbers beneath it:

1 1 1
1 1 1
1 1 1

As can be seen above, the result of convolving the input image with the kernel loses some values: the edges of the input image are "pruned" away (only some of the edge pixels are detected, so a lot of information at the image boundary is lost). This is because a pixel on the edge can never sit at the center of the kernel, and the kernel cannot extend beyond the edge. Sometimes this is unacceptable, and we want the output to be the same size as the input. To solve this, we can pad the original matrix before the convolution operation: the boundary of the matrix is filled out with extra values, usually 0, to increase its size.
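To ground kernel size, stride, and padding in one place, here is a minimal NumPy sketch of the convolution operation (strictly speaking cross-correlation, as deep learning frameworks compute it); the 5 × 5 input is arbitrary:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2-D convolution with zero padding and a configurable stride."""
    if padding > 0:
        image = np.pad(image, padding)  # fill the border with zeros
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # a weighted sum, like S above
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))  # the all-ones 3x3 kernel from the animation

print(conv2d(image, kernel).shape)             # (3, 3): the edge is "pruned"
print(conv2d(image, kernel, padding=1).shape)  # (5, 5): same size as the input
print(conv2d(image, kernel, stride=2).shape)   # (2, 2): stride compresses more
```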


Padding is used to avoid information loss; conversely, sometimes we want to compress information, which we do by increasing the stride. See the image below:




This image shows the edge features extracted by a commonly used convolution kernel, the Laplacian operator; strengthening the extracted edges gives the familiar sharpening effect.


Convolution kernels of different sizes have different effects, as shown below:



Obviously, this picture is a bit more nuanced. So now we know what the convolution kernel does.

Let's move on to the pooling layer. Actually, I prefer to call it the sampling layer, because that's what it does. For the quickest understanding, look at the max pooling process:


From a human point of view, it picks out the most salient features. So why do we need to do this? A little history helps: this sampling layer greatly reduced the amount of computation early machines had to do. Another reason is that it helps prevent overfitting.
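A minimal sketch of 2 × 2 max pooling with stride 2, the usual configuration, which quarters the data while keeping the strongest responses (the feature map is made up for illustration):

```python
import numpy as np

def max_pool2d(image, size=2, stride=2):
    """Max pooling: keep only the strongest activation in each window."""
    oh = (image.shape[0] - size) // stride + 1
    ow = (image.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]], dtype=float)

print(max_pool2d(feature_map))
# [[6. 8.]
#  [3. 4.]]
```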

The last piece is the fully connected layer. There is no fixed number of these layers; their role is classification, and the principle is basically the same as the hidden layers of the plain neural network described earlier: the main functions are abstraction and classification. It is worth mentioning that the final layer usually uses the softmax activation function, which is very good at turning scores into class probabilities.
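Since softmax comes up here, a quick sketch of what it does to the final layer's raw scores (the scores are made up):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs are positive
    # and sum to 1, so they read as probabilities over the classes.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # ≈ [0.659 0.242 0.099] — the first class wins
```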

OK, a summary: 1. The input layer takes pixel values. 2. The convolutional layers extract features. 3. The pooling layers sample those features, keeping the most salient ones. 4. The fully connected layers combine the features non-linearly (classification). 5. The output layer gives the result.

So that's the structure of a convolutional neural network. Keep in mind that this structure is fairly primitive; many refinements have since been made, such as tiled convolution, deconvolution, dilated convolution, and so on. If you're interested, do your own research.

But that's not all. There are a couple of important concepts that need emphasizing here: fine-tuning and pre-trained models.

What is fine-tuning? Suppose we need to train the linear model y = x. If we start with a weight of 0.1, it may take many training steps to arrive at anything close.

But if someone simply hands you a good weight to start from, training goes much faster, right? A little backpropagation from there, and that's a fine-tune. Of course, in real model training nobody tells you the weights outright, but you can directly load someone else's excellent model: its parameters and weights are well validated, and your training will go much faster. Such a model is called a pre-trained model, the overall process is called transfer learning, and adjusting the parameters from that starting point is called **fine-tuning**.

Fine-tuning has two main advantages: 1. It saves resources, since you don't start from scratch. 2. The pre-trained model is already validated, with high accuracy and sensible parameters (e.g. ImageNet, ResNet).
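As an illustration, here is a minimal PyTorch-style sketch of loading a pre-trained ResNet and fine-tuning only a new final layer (the 10-class head is a hypothetical example, and the exact weights argument varies with the torchvision version):

```python
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-18 pre-trained on ImageNet: the validated weights.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained layers so training doesn't disturb them.
for param in model.parameters():
    param.requires_grad = False

# Swap in a new fully connected head for our own (hypothetical) 10-class
# task; only this layer will be trained -- that's the fine-tune.
model.fc = nn.Linear(model.fc.in_features, 10)
```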

Speaking of ImageNet: this is a seriously impressive computer-vision recognition project, currently the world's largest image-recognition database. It was built with the intention of simulating the human recognition system. The whole dataset holds about 15 million images across 22,000 categories, all manually annotated and quality-controlled, which is staggering. Unfortunately, I personally don't think it covers D2C, or we really could just use it directly.

If you're interested, take a look: www.tensorflow.org/js/models?h… image-net.org/

Image standardization: see developers.google.com/machine-lea… images…






