Chapter 1 Basic concepts of TensorFlow


Let’s start with linear regression

Today we can finally start building a neural network, which is a little exciting to think about. In this article, we will go step by step from our understanding of neural networks to building our own linear regression prediction model.

We’re going to use NumPy’s arange() method to generate the numbers 0 through 24 (25 numbers in total) and then shape them into a matrix of 25 rows and 1 column using the reshape() method:

x = np.arange(0, 25, 1).reshape((25, 1))

Then we compute y through a linear transformation. To make things a little harder for TensorFlow, we add some noise to the result so that the generated points are scattered around the line rather than lying exactly on it.

# Generate random numbers as noise
noise = np.random.normal(0, 10, x.shape).astype(np.float32)
# Apply the linear transformation x*4 + 10 and add the noise to the result
y = x * 4 + 10 + noise

Now that our data is ready, we will train the model through a fully connected neural network to see if TensorFlow can perform regression on our data well.

Before we really get started, let’s take a look at our final results.

Understand how a neuron (node) works

This is the simplest three-layer fully connected neural network structure, containing an input layer, a hidden layer, and an output layer. In a fully connected neural network, every node in one layer is connected to every node in the next layer. We can see that each white circle (node) in the figure is connected to all the nodes in the next layer, which is the most typical fully connected structure.

To help you understand what happens at each node in a fully connected network structure, let’s take a magnifying glass and look at what happens between the input layer and the hidden layer.

First, suppose our input is a matrix with one row and two columns: [[x1, x2]]. When designing a fully connected neural network with TensorFlow, we often see code like the following:

weight = tf.get_variable('weight', [input_size, output_size], dtype=tf.float32,
                         initializer=tf.truncated_normal_initializer(stddev=0.1))

This is the code that initializes our weights w. To make it easier to follow, I have labeled the weights between the input layer and the hidden layer in the figure: w11 is the weight between node x1 and node a1, w21 is the weight between x2 and a1, and so on.

We assume that x1 has a value of 1 and x2 has a value of 2, and the values of w between the nodes are as shown in the figure. At this point, our input layer is a matrix of 1 row and 2 columns: [[1, 2]], and our weight is a matrix of 2 rows and 3 columns: [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]].

At this point we can calculate the values of the corresponding nodes a1, a2, and a3 through matrix multiplication.

a1 = x1*w11 + x2*w21 = 1*0.1 + 2*0.4 = 0.9

a2 = x1*w12 + x2*w22 = 1*0.2 + 2*0.5 = 1.2

a3 = x1*w13 + x2*w23 = 1*0.3 + 2*0.6 = 1.5

This gives us the value of each node in the hidden layer. By the same process we can then calculate the values of the output layer, which are the network's outputs. This step-by-step derivation from the input layer to the output layer is the forward propagation process.
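
If you want to verify this arithmetic yourself, here is a minimal NumPy sketch of the same forward step; the input [[1, 2]] and the weight matrix are just the example values assumed above, not part of the final model:

import numpy as np

# Example input: 1 row, 2 columns (the values of x1 and x2)
x_example = np.array([[1.0, 2.0]])
# Example weights: 2 rows, 3 columns (w11 .. w23 from the figure)
w_example = np.array([[0.1, 0.2, 0.3],
                      [0.4, 0.5, 0.6]])

# Forward propagation through one layer is just a matrix multiplication
a = x_example.dot(w_example)
print(a)  # [[0.9 1.2 1.5]] -> the values of a1, a2 and a3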

So let’s define a way to generate our neural layer.

def get_layer(x, input_size, output_size):
    # Initialize our weight matrix
    weight = tf.get_variable('weight', [input_size, output_size], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(stddev=0.1))
    # Matrix multiplication with tf.matmul gives the node matrix of the next layer
    output = tf.matmul(x, weight)
    # Return the newly generated neural layer
    return output

# Generate our hidden layer through the method we defined
get_layer(x, 2, 3)

We define a method that computes and generates our neural layer. It takes three parameters: the first is the input matrix, the second is the number of nodes in the previous layer, and the third is the number of nodes in the neural layer we are generating.

For example, in the input-to-hidden-layer example above, the first parameter is the input matrix [[1, 2]], the second parameter is the number of nodes in the input layer (2), and the third parameter is the number of nodes in the hidden layer we generate (3).

Inside the method, we use tf.get_variable() to create the variable and initialize our weights from a truncated normal distribution, then compute the matrix multiplication with tf.matmul() to obtain the node matrix of the hidden layer, which is our newly generated neural layer, and return it.

Don’t forget the bias term

We know that the linear equation y = wx must be a line through the point (0, 0). Of course, a lot of the data we get in the real world is not distributed around y = wx. So we use y = wx + b instead, which lets our regression line move more flexibly to fit the data; b is what we call the bias term.

To sum up, we should add a bias term to the fully connected neural network so that our neural layer can be more flexible.

Let’s modify the way we generate neural layers:

def get_layer(x, input_size, output_size):
    # Initialize our weight matrix
    weight = tf.get_variable('weight', [input_size, output_size], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(stddev=0.1))
    # Initialize our bias term
    bias = tf.get_variable('bias', [1, output_size], dtype=tf.float32,
                           initializer=tf.constant_initializer(0.1))
    # Matrix multiplication with tf.matmul, plus the bias, gives the node matrix of the next layer
    output = tf.matmul(x, weight) + bias
    # Return the newly generated neural layer
    return output

# Generate our hidden layer through the method we defined
get_layer(x, 2, 3)

In the layer-generating method we added code to initialize the bias variable. We usually only need to initialize the bias as a matrix of 1 row and N columns, where N is the number of nodes in the layer we are generating. Here we simply initialize every value to 0.1, and then add the bias to the generated layer matrix when computing the output.
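
As a side note, a 1-row bias can be added to a multi-row layer matrix because of NumPy/TensorFlow broadcasting; here is a tiny NumPy sketch with made-up numbers, not the tutorial's data:

import numpy as np

layer = np.array([[0.9, 1.2, 1.5],
                  [0.3, 0.4, 0.5]])   # a layer matrix with 2 samples and 3 nodes (example values)
bias = np.array([[0.1, 0.1, 0.1]])    # bias: 1 row, N=3 columns

# The 1-row bias is broadcast across every row of the layer matrix
print(layer + bias)
# roughly [[1.0 1.3 1.6]
#          [0.4 0.5 0.6]]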

What is the loss function

The loss function is an important part of a neural network. A loss function measures the deviation between the model and the samples. During training, we try our best to make the model learn the sample data so as to reduce the loss value.

In this example we use the mean squared error loss function: we compute the squared error between the model's predicted values and the actual values, and the goal of optimization is to reduce this loss value.

In the figure, the black line is the straight line fitted by our model and the blue points are the actual y values of our samples. For each sample we subtract the predicted value from the actual y value; this difference is the distance between the prediction and the sample. We square it so that positive and negative differences do not cancel out, then add up all the squared distances and divide by the number of samples to get the average. That is the error between our model and the samples.
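
As a rough illustration, the same calculation can be written out by hand in NumPy; the two arrays below are made-up example values, not the tutorial's data:

import numpy as np

y_true = np.array([1.0, 2.0, 3.0])   # actual sample values (example only)
y_pred = np.array([1.5, 1.5, 2.0])   # model predictions (example only)

# Square the differences so signs do not cancel, then average over all samples
mse = np.mean(np.square(y_true - y_pred))
print(mse)  # 0.5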

Going back to the data at the beginning of the article, our x values are the numbers 0 through 24, and we calculate our predictions by defining a hidden layer and an output layer.

By calculating the mean squared error between the predicted results and the actual y values, we obtain the loss value.

# Hidden layer
l1 = get_layer(x, 1, 25, 'l1')
# Output layer
_y = get_layer(l1, 25, 1, 'output')
# Define the loss function
loss = tf.reduce_mean(tf.square(y - _y))

Our goal is then to keep reducing this loss value by optimizing our weights, so as to reach the best possible model.

However, it is worth noting that the mean square error is suitable for linear regression models, but different practical problems require different loss functions. Later in the tutorial, we will combine different problems to learn different loss functions.

What is the learning rate

Before we look at what the learning rate is, let's familiarize ourselves with the following figure. It plots the loss against the weight, assuming the loss curve is f(w) = w². The ultimate goal of our neural network is to find the weight that minimizes the loss.

In calculus, we can look for extreme values by setting the derivative of a function to 0. In this example, the gradient with respect to the parameter w is f'(w) = 2w.

The learning rate η defines how far each parameter moves on every update. Intuitively, the learning rate controls the step size of each parameter update. The formula is:

new weight = current weight - learning rate * gradient, i.e. w_{n+1} = w_n - η * f'(w_n)

Assuming that the initial value of the parameter is 5 and the learning rate is 0.3, the weight optimization process is shown in the following table:

Round    Current weight    Gradient * learning rate    Updated weight
1        5                 2 * 5 * 0.3 = 3             5 - 3 = 2
2        2                 2 * 2 * 0.3 = 1.2           2 - 1.2 = 0.8
3        0.8               2 * 0.8 * 0.3 = 0.48        0.8 - 0.48 = 0.32

After three rounds the weight value is 0.32, steadily moving toward the optimal weight of 0.
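
The table above can be reproduced with a few lines of plain Python. This is only a sketch of the update rule for the example loss f(w) = w², with the starting value 5 and learning rate 0.3 taken from the table:

w = 5.0      # initial weight from the example
eta = 0.3    # learning rate from the example

for step in range(1, 4):
    grad = 2 * w             # derivative of f(w) = w**2
    w = w - eta * grad       # gradient descent update
    print(step, w)
# each step moves w toward 0: roughly 2.0, 0.8, 0.32 (up to floating point rounding)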

In a neural network, the optimization process can be divided into two stages. In the first stage, the predicted values are computed by the forward propagation algorithm and compared with the actual values. In the second stage, the back propagation algorithm computes the gradient of the loss function with respect to each parameter, and the gradient descent algorithm then updates each parameter according to its gradient and the learning rate.

In TensorFlow, these steps are already packaged for us as ready-made methods.

# Use the gradient descent algorithm with a learning rate of 0.001 to minimize the loss value
train = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

One small step for linear regression, one giant leap for neural networks

Complete code:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Generate data
x = np.arange(0, 25, 1).reshape((25, 1))
# Generate noise
noise = np.random.normal(0, 10, x.shape).astype(np.float32)
y = x * 4 + 10 + noise
# Draw the data with matplotlib
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(x, y)
plt.ion()  # comment this line out for a single run; do not comment it out when running the whole script
plt.show()

def get_layer(x, input_size, output_size, name):
    with tf.variable_scope(name):
        with tf.name_scope('Weight'):
            # Generate the weight values
            weight = tf.get_variable('weight', [input_size, output_size], dtype=tf.float32,
                                     initializer=tf.truncated_normal_initializer(stddev=0.1))
        with tf.name_scope('Bias'):
            # Generate the bias term
            bias = tf.get_variable('bias', [1, output_size], dtype=tf.float32,
                                   initializer=tf.constant_initializer(0.1))
        # Matrix multiplication plus the bias gives the next layer
        output = tf.matmul(x, weight) + bias
        return output

# Placeholders to hold the input data; the concrete values are fed in through feed_dict
input_x = tf.placeholder(dtype=tf.float32, shape=[25, 1], name='x_input')
input_y = tf.placeholder(dtype=tf.float32, shape=[25, 1], name='y_input')

# Generate the hidden layer
l1 = get_layer(input_x, 1, 25, 'l1')

# Generate the output layer to get the predictions
_y = get_layer(l1, 25, 1, 'output')

# Define the loss function
loss = tf.reduce_mean(tf.square(input_y-_y))

# Train the model with the gradient descent algorithm
train = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    # Train 10,000 steps
    for i in range(10000):
        # Feed the x and y values of the training samples into input_x and input_y through feed_dict
        sess.run(train,feed_dict={input_x: x, input_y: y})
        # Redraw the line every 100 steps
        if i % 100 == 0:
            try:
                ax.lines.remove(lines[0])
            except Exception:
                pass
            prediction = sess.run(_y,feed_dict={input_x:x,input_y:y})
            lines = ax.plot(x, prediction, 'r-', lw=5)
            plt.pause(0.1)

Above is the complete code for this section. We generated a hidden layer of 25 nodes and an output layer that gives us the 25 predicted values, measured the gap to the actual values with the mean squared error loss function, and then continuously updated our weights with the gradient descent algorithm to narrow that gap.

We trained the model for 10,000 steps with the for loop, updating the drawn line every 100 steps. When running the code, we can see the line slowly adjust as the number of training steps increases, fitting our actual values better and better.

Summary

This section used a simple linear regression example to understand how forward propagation works in a neural network, what loss functions and bias terms are, and finally the concepts of the gradient descent algorithm and the learning rate. Hopefully, this simple case has given you an intuitive feel for neural networks.