Machine learning often feels hard to get into. There are many unfamiliar concepts, such as “neural network”, “evaluation metric”, and “optimization algorithm”, so beginners feel like the blind men touching an elephant. Even understanding an official TensorFlow Demo is difficult, and many developers have gone through “machine learning: from getting started to giving up”. In this article, we analyze an official TensorFlow Demo to get a bird’s-eye view of a machine learning system, so that readers can see the whole elephant and get started with machine learning.

How to understand machine learning systems

The goal of “machine learning” is to use existing answers to find rules and make predictions. The difference between this and “traditional systems” is that:

  • The goal of the “traditional system” is to get answers
  • The goal of “machine learning” is to use existing answers to get rules
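
The contrast can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the Demo: the temperature conversion, the helper name fit_line, and the sample values are all made up. The first function is a rule written by hand; the second recovers a rule from existing answers with an ordinary least-squares fit.

```python
# Traditional system: a person writes the rule, the program produces answers.
def celsius_to_fahrenheit(celsius):
    return celsius * 1.8 + 32


# Machine learning: start from known (input, answer) pairs and recover the rule.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


inputs = [-40.0, 0.0, 37.0, 100.0]
answers = [celsius_to_fahrenheit(c) for c in inputs]  # the "existing answers"
slope, intercept = fit_line(inputs, answers)
print(round(slope, 3), round(intercept, 3))
```

The fitted slope and intercept come out very close to 1.8 and 32, the rule that produced the answers in the first place.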

Because the goal of machine learning is to get rules, people can use it to make all kinds of predictions: stock movements, lottery numbers, clothing recommendations, even when employees will leave their jobs. Image recognition is essentially rule finding too. For example, to decide whether a picture contains a cat, we can use whiskers, ears, fur, and so on as the features of a cat, and defining those features amounts to defining the rule for what makes up a cat.

Explain a machine learning Demo

The best way to learn a skill is to use it, so in this section we look at a TensorFlow Demo. TensorFlow is a deep learning framework from Google, but I won’t go over its basics here. What I want to show you is how to read this Demo. You might ask: is a Demo really that hard to understand? For beginners in machine learning, it is very difficult to read one without understanding concepts such as “neural network”, “loss function”, and “evaluation metric”.

Take a look at this Demo. There is not much code, so I have posted all of it.

What does it feel like to see all of this code? The first time I read it, my reaction was: “I understand all the syntax, but I have no idea what it is doing!” If you feel the same way, I suggest you read this article carefully. This Demo actually trains a Model that can recognize handwritten digits that look like this:

You may suddenly have a lot of questions. Handwritten digits? Where are the pictures? How are they recognized? Don’t worry, let me explain this Demo in detail.

Data preparation

What does data look like in artificial intelligence? The name TensorFlow already gives a hint: a flow of Tensors. In the realm of artificial intelligence, most data takes the form of a Tensor, and a Tensor can be thought of as a multidimensional array.

Take, for example, a picture that we want to feed into an artificial intelligence model. Our first instinct might be to digitize the image by encoding it in Base64, as raw binary, and so on. But for AI systems, the best approach is to convert the picture into a Tensor. Let’s try to express a 3 by 3 pixel picture with a white background and a black diagonal as a Tensor:

After running the code, we get the 3×3 image with a black diagonal. Represented as a Tensor, this single picture is a third-order Tensor with shape (1, 3, 3): one picture, three rows, three columns. Similarly, 6,000 pictures of 28×28 pixels form a Tensor with shape (6000, 28, 28).
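
A picture like this can also be sketched without TensorFlow, as nested Python lists. The pixel encoding (255 for white, 0 for black) and the helper name shape are assumptions made for this illustration; the Demo itself works with TensorFlow tensors.

```python
# A 3x3 grayscale picture: 255 is white, 0 is black (assumed pixel encoding).
WHITE, BLACK = 255, 0
picture = [
    [BLACK, WHITE, WHITE],
    [WHITE, BLACK, WHITE],
    [WHITE, WHITE, BLACK],
]

# A batch holding one picture: the outer list is the "how many pictures" axis.
batch = [picture]


def shape(tensor):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


print(shape(batch))  # (1, 3, 3)
```

Adding more pictures to batch only changes the first number of the shape, which is how a whole data set fits into a single Tensor.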

Now let’s read the code for the first part:

“MNIST” (Modified National Institute of Standards and Technology database) is a large database of handwritten digits built from samples collected by the National Institute of Standards and Technology of the United States. It contains a 60,000-sample training set and a 10,000-sample test set, with images that look like this:

These images are stored as matrices of pixel values:

Now we can see what this code means: it loads from MNIST the data set used for training (x_train, y_train) and the data set used for testing (x_test, y_test).

  • x_train has the shape (60000, 28, 28), representing 60,000 pictures of 28*28 pixels.
  • y_train has the shape (60000,) and holds the digit answer (label) for each picture in x_train.

What is a model

Now that we have the data set, can we start training the model? Not so fast. Let’s first be clear about what a model is. The TensorFlow documentation defines a model as follows:

In machine learning, a Model is a function with learnable parameters that maps an input to an output. The optimal parameters are obtained by training the model on data. A well-trained model will provide an accurate mapping from the input to the desired output.

A model is a function with built-in parameters whose values directly affect the model’s output. What is interesting is that these parameters are learnable: they are adjusted according to the training data until they reach a set of values that gives the model its best output.
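
To make "a function with learnable parameters" concrete, here is a minimal sketch with a single weight and a single bias. It is far smaller than the Demo's model, and the names model and mean_squared_error as well as the sample data are assumptions chosen for illustration.

```python
def model(params, x):
    """A tiny model: one weight and one bias map an input to an output."""
    weight, bias = params
    return weight * x + bias


def mean_squared_error(params, xs, ys):
    """Average squared gap between the model's outputs and the known answers."""
    return sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]          # answers produced by y = 2x + 1

untrained = (0.0, 0.0)        # parameters before any learning
trained = (2.0, 1.0)          # parameters a training run would ideally reach

print(mean_squared_error(untrained, xs, ys))
print(mean_squared_error(trained, xs, ys))
```

The function stays the same in both calls. Only the parameter values differ, and that alone decides whether the outputs match the answers.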

  • So what are the parameters in the model?
  • What are the four layers passed to the model in the Demo?
  • How is the model trained?

To answer these questions, we first need to learn about neural networks.

Neural Network

A Neural Network is just a Network made of connected neurons. So what is a neuron?

The word “neuron” in machine learning comes from biological neural networks, where biological neurons express “excitement” through changes in electrical potential. In machine learning, a neuron is really a unit of computation. It receives N input signals, which reach it through weighted connections, and it computes a value as the weighted sum of those signals. This value is then processed by an activation function to produce the output, which for some activation functions (such as sigmoid) is a number squeezed between 0 and 1.
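
A single neuron can be sketched as follows. The sigmoid activation, the bias term, and all numeric values are assumptions picked for illustration; the text above only describes weighted inputs followed by an activation function.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals, then an activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


signals = [0.5, 0.1, 0.9]        # outputs of three upstream neurons
weights = [0.4, -0.6, 0.2]       # one weight per connection (made-up values)
output = neuron(signals, weights, bias=0.0)
print(output)
```

Here the weighted sum is 0.32, and the sigmoid turns it into a value a little above 0.5.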

In the Demo, the first Layer expands the 28*28 image into a one-dimensional array containing 784 neurons.

    ...
    # First Layer: expand the image into a one-dimensional array of neurons
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    ...
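
What flattening does to the data can be sketched in plain Python. The helper flatten below is a stand-in written for this illustration, not the Keras layer itself.

```python
def flatten(image):
    """Concatenate the rows of a 2-D image into one flat list."""
    return [pixel for row in image for pixel in row]


image = [[0] * 28 for _ in range(28)]   # a blank 28x28 picture
flat = flatten(image)
print(len(flat))  # 784
```

No pixel value changes. The 28 rows are simply laid end to end so that the next layer can treat the picture as 784 separate inputs.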

Second Layer:

    ...
    tf.keras.layers.Dense(128, activation='relu'),
    ...

Layer2 is passed the parameter activation='relu', meaning that ReLU is used as the activation function. Let’s first understand what an activation function is.

When our brain receives a lot of information at once, it tries to make sense of it and sort it into “useful” and “less useful” information. A neural network needs a similar mechanism, because not all information is equally useful and some of it is just noise. This is the role of the activation function: it helps the network pass on important information and suppress irrelevant data points. In the Demo, for example, Layer1 outputs 784 neurons, not all of which are active, and only the active ones stimulate Layer2. Layer4 outputs 10 neurons; if the second neuron (the one for the digit 1) is activated with a value of 0.99, the model is saying that the picture is a 1 with 99% probability.

So ReLU is one kind of activation function: it computes a neuron’s final output from the stimulus it receives from the previous Layer. Layer2 has 128 neurons, each connected to all 784 neurons in Layer1, which gives 784 * 128 = 100352 connections, each with its own weight. The outputs of the Layer1 neurons are multiplied by the weights on their connections to Layer2 and summed, and the result is passed through the ReLU function, which outputs a new value as the output of the Layer2 neuron.
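
The calculation described above can be sketched in plain Python. The random weights, the zero biases, and the helper names relu and dense are assumptions for illustration; in the real model the weights are learned during training, and Keras runs this computation as optimized matrix operations.

```python
import random


def relu(z):
    """ReLU keeps positive values and replaces negative values with 0."""
    return max(0.0, z)


def dense(inputs, weights, biases):
    """Each output neuron: weighted sum of all inputs, plus bias, then ReLU."""
    return [
        relu(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


random.seed(0)
n_inputs, n_outputs = 784, 128
weights = [[random.uniform(-0.05, 0.05) for _ in range(n_inputs)]
           for _ in range(n_outputs)]
biases = [0.0] * n_outputs

layer1_output = [random.random() for _ in range(n_inputs)]
layer2_output = dense(layer1_output, weights, biases)

print(len(layer2_output))          # 128
print(n_inputs * n_outputs)        # 100352
```

Every value in layer2_output is zero or positive, because ReLU cuts off the negative weighted sums.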

The third Layer

    ...
    tf.keras.layers.Dropout(0.2),

The Dropout Layer’s main role is to prevent overfitting. Overfitting shows up as a model that performs well on the training set but poorly on the test set, that is, a model that generalizes poorly. Dropout is one way to counter overfitting: it randomly drops neurons during training. In the Demo, Dropout(0.2) randomly drops 20% of the neurons’ outputs at each training step.
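
The idea can be sketched in plain Python. The helper dropout below is written for this illustration and is not the Keras layer; like the Keras layer, it is only active during training and scales the surviving values up by 1 / (1 - rate) so that their expected sum stays the same.

```python
import random


def dropout(values, rate, training=True):
    """Zero each value with probability `rate`; rescale the survivors."""
    if not training:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if random.random() < rate else v * keep_scale for v in values]


random.seed(42)
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.2)
print(dropped.count(0.0) / len(dropped))
```

The printed fraction of zeroed values is close to 0.2. With training=False the values pass through unchanged, which is what happens when the model is tested.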

The fourth Layer

    ...
    tf.keras.layers.Dense(10, activation='softmax')
    ...

Layer4 has 10 neurons, one for each digit from 0 to 9, and uses softmax as its activation function. The output of these 10 neurons is the final result. The figure below shows the whole process of recognizing a handwritten 1: the neurons activate layer by layer, and the last layer outputs the prediction.
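
How softmax turns the 10 raw outputs into a prediction can be sketched as follows. The scores are made up for illustration, and the function is a plain-Python stand-in for the softmax activation, not code from the Demo.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shifted = [s - max(scores) for s in scores]   # avoids overflow in exp
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for the digits 0 to 9; index 1 has the largest score.
scores = [0.1, 6.0, 0.3, 0.2, 0.0, 0.1, 0.4, 0.2, 0.3, 0.1]
probabilities = softmax(scores)
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit)  # 1
```

The position of the largest probability is the recognized digit, and the probability itself says how confident the model is.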

At this point, by looking at how the four layers relate to each other, we have a basic understanding of how a neural network operates.

Model training supplement

To understand this code, an analogy helps. Suppose a man starts working out with the goal of reaching a chest size of 120cm while still looking trim (not too bulky):

  • After a period of training, his chest size is 110cm, so we can take Loss = |target (120cm) - current (110cm)| as a very simple Loss Function. The Demo uses sparse_categorical_crossentropy as its Loss Function, which is well suited to classification.
  • The loss function alone cannot tell us whether the goal has been reached. Looking fit and trim matters too, and Evaluation Metrics provide a way to measure that.
  • Next, we need to find what drives the Loss. Loss comes not only from a chest size below 120cm, but also from the loss of aesthetics when the chest size goes above 120cm. So for the best result, the training should be neither too light nor too hard. We give the training factors different Weights: protein supplementation gets W0, upper-chest training intensity W1, middle-chest training intensity W2, lower-chest training intensity W3, aerobic training intensity W4, and so on, which yields a set of weights [W0, W1, …, Wn]. Optimization Algorithms find the best chest workout by constantly adjusting [W0, W1, …, Wn].

With the neural network model, layers, weights, optimization algorithm, loss function, and evaluation metrics explained, we can now read the code in the Demo. Now try to draw a flow chart of the neural network.
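
The loss function and an evaluation metric can be sketched in plain Python. The function below follows the usual definition of sparse categorical cross-entropy for a single sample. Using accuracy as the metric is an assumption, since the text above does not say which metric the Demo uses, and the probabilities are made up.

```python
import math


def sparse_categorical_crossentropy(probabilities, true_label):
    """Loss for one sample: -log of the probability given to the right class."""
    return -math.log(probabilities[true_label])


def accuracy(predictions, labels):
    """Evaluation metric: the share of samples whose top class is correct."""
    hits = sum(1 for p, y in zip(predictions, labels) if p.index(max(p)) == y)
    return hits / len(labels)


# Made-up outputs of a 3-class model for two samples.
confident_and_right = [0.05, 0.90, 0.05]
unsure_but_right = [0.30, 0.40, 0.30]
labels = [1, 1]

print(sparse_categorical_crossentropy(confident_and_right, 1))
print(sparse_categorical_crossentropy(unsure_but_right, 1))
print(accuracy([confident_and_right, unsure_but_right], labels))
```

Both predictions count as correct for the metric, but the loss is much smaller for the confident one. The loss keeps pushing the weights even when the answer is already right, while the metric only reports whether the goal was met.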

Training and testing

This part is easy to understand: feed in the data to train and then test. One thing worth explaining is epochs. In the field of neural networks, an epoch is one complete pass over the entire training data set. Within an epoch, the data is processed in batches, and each batch goes through one forward pass and one backward pass (put simply, the forward pass produces the predictions, and the backward pass adjusts the weights to reduce the Loss).

The Demo uses epochs = 5 because a single epoch may not reach the optimal weights. If one pass is not enough, train for 5; if 5 is not enough, train for 10, until the Loss stops decreasing.
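
The effect of repeating epochs can be sketched with a one-weight model trained by plain gradient descent. This is an illustration under assumptions: the Demo's optimizer is not described in this article, and the function train, the learning rate, and the data are made up.

```python
def train(xs, ys, epochs, learning_rate=0.05):
    """Fit y = w * x by gradient descent; return the loss after each epoch."""
    w = 0.0
    history = []
    for _ in range(epochs):
        for x, y in zip(xs, ys):            # one epoch = one pass over the data
            prediction = w * x              # forward pass
            gradient = 2 * (prediction - y) * x   # backward pass for squared error
            w -= learning_rate * gradient   # weight update
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        history.append(loss)
    return w, history


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                        # answers produced by y = 2x
w, history = train(xs, ys, epochs=5)
print(history)
```

The recorded loss shrinks with every epoch and the weight approaches 2. Once the loss stops shrinking, further epochs add little.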


Conclusion

If you have read this far carefully, I believe you now have a general picture of AI. This article gives you a bird’s-eye view so that you no longer feel like a blind man touching an elephant. It is not magic that will instantly turn you into an AI guru, but a better understanding of the basic architecture will strengthen your ability to teach yourself AI. Whether you are a front-end, back-end, or full-stack developer, or just interested in AI, I hope this article gives you a new perspective and something to think about.

