This article is an original work by **Luo Zhouyang ([email protected])**. Please credit the author and source when reprinting. It may not be used for commercial purposes without authorization.

Without using any deep learning framework, we implement a simple neural network for classification. This walkthrough builds the network step by step, covering the choice of loss function and hand-written backpropagation code.

Generate some data

We generate data that is not linearly separable.

import numpy as np
import matplotlib.pyplot as plt

N = 100 # Number of points generated for each category
D = 2 # Dimension of each point; we work in the plane, so the data is 2-dimensional
K = 3 # Number of categories, we generate a total of 3 category points

# All sample data, a total of 300 points, each point is represented by 2 dimensions
# All training data is a 300*2 two-dimensional matrix
X = np.zeros((N*K, D))
# Label data: 300 points in total, one class label per point,
# so the labels form a vector of length 300
y = np.zeros(N*K, dtype='uint8')

# Generate training data
for j in range(K):
    ix = range(N*j, N*(j+1))
    r = np.linspace(0.0, 1, N)  # radius
    t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2  # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
    
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()


Train a Softmax linear classifier

First, train a linear classifier using softmax and the cross-entropy loss.

In other words, softmax is used directly for multi-class classification, with cross-entropy as the loss function, to train a linear classification model.
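
Before the full training code, here is a tiny sketch (with made-up scores, not part of the original code) of what softmax and the cross-entropy loss compute for a single 3-class example:

import numpy as np

# Hypothetical raw scores for one sample over 3 classes
scores = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate, then normalize so the entries sum to 1
probs = np.exp(scores) / np.sum(np.exp(scores))  # roughly [0.66, 0.24, 0.10]

# Cross-entropy loss, assuming the correct class is index 0
loss = -np.log(probs[0])  # roughly 0.42
print(probs, loss)

With that in mind, the full training code is as follows: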

import numpy as np
import matplotlib.pyplot as plt

N = 100
D = 2
K = 3
X = np.zeros((N*K, D))
y = np.zeros(N*K, dtype='uint8')

for j in range(K):
    ix = range(N*j, N*(j+1))
    r = np.linspace(0.0, 1, N)
    t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
    
# plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
# plt.show()

# Initialize weights and biases
W = 0.01 * np.random.randn(D, K)
b = np.zeros((1, K))

step_size = 1e-0
reg = 1e-3 # regularization strength


# Obtain the number of training samples
num_examples = X.shape[0]

for i in range(200):
    # Compute the classification scores
    scores = np.dot(X, W) + b

    # Calculate softMax score
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

    # Use cross entropy loss
    correct_log_probs = -np.log(probs[range(num_examples), y])

    # Calculate the data loss of the training set by dividing the total loss by the number of samples
    data_loss = np.sum(correct_log_probs) / num_examples
    # To calculate reg loss, use L2 reg
    # reg is lambda
    reg_loss = 0.5 * reg * np.sum(W * W)
    # Total loss
    loss = data_loss + reg_loss
    
    if i % 10 == 0:
        print("iteration %4d loss: %f" % (i, loss))

    # Calculate the gradient, back propagation
    # Why dscores = probs? See the explanation below.
    dscores = probs
    dscores[range(num_examples), y] -= 1
    dscores /= num_examples

    dW = np.dot(X.T, dscores)
    db = np.sum(dscores, axis=0, keepdims=True)
    dW += reg * W  # gradient of the regularization term; dW already holds the data gradient, so accumulate rather than overwrite

    # Update parameters
    W += -step_size * dW
    b += -step_size * db

    
# Estimate accuracy after training
scores = np.dot(X, W) + b
# Pick the highest probability category in the second dimension (category dimension)
predicted_class = np.argmax(scores, axis=1)
print("Training accuracy: %.2f" % (np.mean(predicted_class == y)))

iteration    0 loss: 1.097993
iteration   10 loss: 0.908688
iteration   20 loss: 0.838372
iteration   30 loss: 0.806482
iteration   40 loss: 0.789911
iteration   50 loss: 0.780488
iteration   60 loss: 0.774783
iteration   70 loss: 0.771169
iteration   80 loss: 0.768801
iteration   90 loss: 0.767208
iteration  100 loss: 0.766114
iteration  110 loss: 0.765351
iteration  120 loss: 0.764811
iteration  130 loss: 0.764425
iteration  140 loss: 0.764147
iteration  150 loss: 0.763945
iteration  160 loss: 0.763797
iteration  170 loss: 0.763688
iteration  180 loss: 0.763608
iteration  190 loss: 0.763548
Training accuracy: 0.51

In the code above, one question remains: why is dscores = probs?

The softmax function produces a normalized probability vector; we write $p_k$ for the probability assigned to class $k$. So:

$$p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}$$

Then our cross-entropy loss for example $i$ is:

$$L_i = -\log\left(p_{y_i}\right)$$

Taking the derivative with respect to the scores $f$, we get:

$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k)$$

This formula shows that increasing the probability assigned to the correct class reduces the loss!

Suppose the probability vector is

$$p = [0.2, 0.3, 0.5]$$

and the second entry, 0.3, is the probability of the correct class. What does the gradient look like?

According to the formula above, only at the correct class does the gradient become the predicted probability minus 1; at every other position the gradient simply equals the predicted probability, namely:

$$\frac{\partial L_i}{\partial f} = [0.2, -0.7, 0.5]$$

And isn't the $f$ here just our scores? That is to say,

$$\frac{\partial L_i}{\partial \text{scores}_k} = p_k - \mathbb{1}(y_i = k)$$

So, at the position of the correct class we subtract 1 from the predicted probability, and everywhere else we keep it as it is.

Thus, we have the following code:

# dscores is the gradient of the loss with respect to the scores
dscores = probs
# At the correct class, subtract 1 from the probability at that position
dscores[range(num_examples), y] -= 1
# Average over the number of examples
dscores /= num_examples


In short, this all follows from the form of the softmax function itself.
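
If you want to verify this numerically, here is a small self-contained sketch (not from the original article) that compares the analytic gradient probs - 1(y) with a numerical gradient of the data loss. The variable names mirror the code above, but the toy data is made up:

import numpy as np

# Toy data, just for the gradient check
np.random.seed(0)
num_examples, num_classes = 5, 3
scores = np.random.randn(num_examples, num_classes)
y = np.random.randint(0, num_classes, num_examples)

def data_loss(s):
    # Softmax + average cross-entropy loss over the toy examples
    exp_s = np.exp(s)
    probs = exp_s / np.sum(exp_s, axis=1, keepdims=True)
    return np.sum(-np.log(probs[range(num_examples), y])) / num_examples

# Analytic gradient: probs, with 1 subtracted at the correct class, averaged
exp_scores = np.exp(scores)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
dscores = probs.copy()
dscores[range(num_examples), y] -= 1
dscores /= num_examples

# Numerical gradient via central differences
eps = 1e-5
numerical = np.zeros_like(scores)
for i in range(num_examples):
    for k in range(num_classes):
        plus, minus = scores.copy(), scores.copy()
        plus[i, k] += eps
        minus[i, k] -= eps
        numerical[i, k] = (data_loss(plus) - data_loss(minus)) / (2 * eps)

# The maximum difference should be tiny (on the order of 1e-10)
print(np.max(np.abs(dscores - numerical)))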

As you can see, the training accuracy is only 0.51.

This is to be expected: the data is not linearly separable, so a linear classifier cannot do much better.

The Stanford CS231n notes include a figure showing the decision boundaries of this model.
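
The figure itself is not reproduced here, but a similar plot can be generated by classifying every point of a grid covering the plane. A minimal sketch, assuming the X, y, W and b produced by the training code above:

import numpy as np
import matplotlib.pyplot as plt

# Build a grid of points covering the training data
grid_step = 0.02
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, grid_step),
                     np.arange(y_min, y_max, grid_step))

# Classify every grid point with the linear model and draw the class regions
grid_scores = np.dot(np.c_[xx.ravel(), yy.ravel()], W) + b
Z = np.argmax(grid_scores, axis=1).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()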

Train a neural network

The softmax linear classifier above does not work well, so let's train a neural network instead.

The code is as follows:

import numpy as np

N = 100
D = 2
K = 3

X = np.zeros((N*K, D))
y = np.zeros(N*K, dtype='uint8')

for j in range(K):
    ix = range(N*j, N*(j+1))
    r = np.linspace(0.0, 1, N)
    t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j

h = 100 # Number of hidden layer neurons

# First layer weights and bias initialization
W1 = 0.01 * np.random.randn(D, h)
b1 = np.zeros((1, h))

# Weights and bias initialization for the second layer
W2 = 0.01 * np.random.randn(h, K)
b2 = np.zeros((1, K))

step_size = 1e-0
reg = 1e-3 # regularization strength

# Obtain the number of training samples
num_examples = X.shape[0]

for i in range(10000):
    # Compute the first hidden layer output, using the ReLU activation function
    hidden_layer = np.maximum(0, np.dot(X, W1) + b1)
    # Compute the output layer result, which gives the final classification scores
    scores = np.dot(hidden_layer, W2) + b2
    
    # softmax
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]
  
    # Calculate the loss, same as before
    correct_logprobs = -np.log(probs[range(num_examples),y])
    data_loss = np.sum(correct_logprobs)/num_examples
    reg_loss = 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)
    loss = data_loss + reg_loss
    
    if i % 1000 == 0:
        print ("iteration %4d loss %f" % (i, loss))
  
    # Compute the gradient of the scores
    dscores = probs
    dscores[range(num_examples),y] -= 1
    dscores /= num_examples
  
    # Calculate the gradient, back propagation
    dW2 = np.dot(hidden_layer.T, dscores)
    db2 = np.sum(dscores, axis=0, keepdims=True)
    
    # Backpropagating hidden layers
    dhidden = np.dot(dscores, W2.T)
    # Backpropagate through the ReLU non-linearity
    dhidden[hidden_layer <= 0] = 0
    
    dW1 = np.dot(X.T, dhidden)
    db1 = np.sum(dhidden, axis=0, keepdims=True)
    
    # plus the regular term
    dW2 += reg * W2
    dW1 += reg * W1
    
    # Update parameters
    W1 += -step_size * dW1
    b1 += -step_size * db1
    W2 += -step_size * dW2
    b2 += -step_size * db2

# After training, estimate the training accuracy
hidden_layer = np.maximum(0, np.dot(X, W1) + b1)
scores = np.dot(hidden_layer, W2) + b2
predicted_class = np.argmax(scores, axis=1)
print("Training accuracy: %.2f" % (np.mean(predicted_class == y)))

iteration    0 loss 1.109818
iteration 1000 loss 0.277248
iteration 2000 loss 0.202578
iteration 3000 loss 0.192406
iteration 4000 loss 0.189857
iteration 5000 loss 0.189404
iteration 6000 loss 0.189292
iteration 7000 loss 0.189199
iteration 8000 loss 0.189143
iteration 9000 loss 0.189097
Training accuracy: 0.99

As you can see, the training accuracy has increased to 0.99.

The CS231n notes also include a figure showing the decision boundary of this neural network.
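
Again, the figure is not reproduced here, but the same grid-plotting sketch from the linear case works once the prediction step uses the two-layer forward pass (assuming xx and yy from the earlier sketch and the trained W1, b1, W2, b2):

import numpy as np
import matplotlib.pyplot as plt

# Forward the grid points through the two-layer network
grid = np.c_[xx.ravel(), yy.ravel()]
hidden = np.maximum(0, np.dot(grid, W1) + b1)
Z = np.argmax(np.dot(hidden, W2) + b2, axis=1).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()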

To contact me

  • Email: [email protected]
  • WeChat: luozhouyang0528
  • Personal WeChat official account (you may be interested)