In the previous logistic regression article, Machine Learning from Zero – Logistic Regression Principles and Practice!, I shared the basics of logistic regression and how to classify a simple data set.

Today, Deng Long will share with you how to use logistic regression to classify the 10 handwritten characters [0-9]. The data set looks like this:


Here I will walk you through the key code step by step; the complete code is in my GitHub repository: logistic_reg

1. Loading the handwritten character data

1.1 Reading the data set

import numpy as np

raw_X, raw_y = load_data('ex3data1.mat')

# (5000, 400)
print(raw_X.shape)

# (5000,)
print(raw_y.shape)
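
load_data itself isn't listed in this article; here is a minimal sketch, assuming the .mat file stores the features under the key 'X' and the labels under the key 'y':

from scipy.io import loadmat

def load_data(path):
    # load the Matlab file and flatten the (5000, 1) label column to 1-D
    data = loadmat(path)
    return data['X'], data['y'].flatten()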

The dataset has 5000 samples, each of which is a 20 x 20 = 400 pixel handwritten character image:


Handwritten character recognition is a supervised learning problem, so we have the true labels y of the training set: a vector of dimension 5000 giving the true digit of each of the 5000 training samples:


1.2 Adding the all-ones column

As usual, add a column of all 1s before the first column of the training samples (so the bias term can be handled by vectorized matrix multiplication):

# add an all-ones vector as the first column
X = np.insert(raw_X, 0, values=np.ones(raw_X.shape[0]), axis=1)

# 5000 rows, 401 columns
X.shape

After adding one column, the sample becomes 5,000 rows and 401 columns.
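
A quick toy check of what np.insert does here, with made-up numbers:

toy = np.array([[2., 3.],
                [4., 5.]])

# prepend a bias column of ones -> [[1. 2. 3.]
#                                   [1. 4. 5.]]
print(np.insert(toy, 0, values=np.ones(toy.shape[0]), axis=1))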

1.3 Vectorizing the labels

We turn the original labels (5000 values) into ten 0/1 indicator vectors, one per class, which is equivalent to replacing each sample's true label with a one-hot vector over 10 positions:

# represent each category of the original labels as a 0/1 vector
y_matrix = []

# k = 1 ... 10
# set the entry to 1 where raw_y == k, and 0 otherwise
for k in range(1, 11):
    y_matrix.append((raw_y == k).astype(int))

In this one-hot view each sample's label is spread over 10 positions: the digit 1 puts a 1 in the first position, the digit 2 puts a 1 in the second position, and so on; note that the digit 0 puts a 1 in the 10th position:


And each column gathers all occurrences of the same character in the original labels: the first column marks every sample whose true label is the digit 1, the second column every digit 2, and so on, while the 10th column marks every digit 0:
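
A tiny worked example with three toy labels (recall that 10 stands for the digit 0):

toy_y = np.array([1, 3, 10])

print((toy_y == 1).astype(int))   # [1 0 0] -> indicator for digit 1
print((toy_y == 3).astype(int))   # [0 1 0] -> indicator for digit 3
print((toy_y == 10).astype(int))  # [0 0 1] -> indicator for digit 0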


Because we are loading a Matlab data file (.mat), and Matlab indexing starts from 1, the original data set uses the label 10 for the digit 0 (the 10th class). For convenience in Python we move the digit-0 indicator to the front, so the classes are arranged in numerical order [0-9]:

# since Matlab subscripts start at 1, raw_y uses 10 for the label 0
# here we move the indicator vector of the label 0 to the front
y_matrix = [y_matrix[-1]] + y_matrix[:-1]

# stack into an array so that y[k] selects the indicator vector
# for digit k (this step is implied by the y[0] usage below)
y = np.array(y_matrix)

For comparison, here is the figure from the original exercise; the principle is the same, except that there the 10th column has not been moved to the front:


Why do we do this? Mainly so that we can predict all ten digits in one go.

2. Building the model functions

The principles of logistic regression and regularization have both been covered before; for those of you who haven't seen them:

  • Machine Learning from Zero – Logistic Regression Principles and Practice!
  • Machine Learning from Zero – Regularization Technology Principles and Programming!

I'll just include the key functions here and explain them briefly.

2.1 Logistic regression hypothesis function

The hypothesis function uses the usual sigmoid:
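
g(z) = 1 / (1 + e^(-z))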


def sigmoid(z):
    return 1 / (1 + np.exp(-z))

2.2 Logistic regression cost function
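
For m training samples the cross-entropy cost is:

J(θ) = (1/m) · Σ [ -y · log(g(Xθ)) - (1 - y) · log(1 - g(Xθ)) ]   (sum over the m samples)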


def cost(theta, X, y):
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))

2.3 Regularized logistic regression cost function
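
theta_0 is left out of the penalty; with regularization coefficient λ the cost becomes:

J_reg(θ) = J(θ) + (λ / (2m)) · Σ θ_j²   (sum over j = 1 … n)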


def regularized_cost(theta, X, y, l=1):
    theta_j1_to_n = theta[1:]

    # regularization cost (theta_0 is excluded)
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()

    return cost(theta, X, y) + regularized_term

2.4 Gradient Calculation
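
In vectorized form, the gradient over all m samples is:

∂J(θ)/∂θ = (1/m) · Xᵀ (g(Xθ) - y)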




def gradient(theta, X, y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)

2.5 Regularized gradient

Add the regularization term to the original gradient (for every component except theta_0):
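
∂J_reg(θ)/∂θ_j = ∂J(θ)/∂θ_j + (λ/m) · θ_j   (for j = 1 … n; theta_0 is not regularized)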


def regularized_gradient(theta, X, y, l=1):
    theta_j1_to_n = theta[1:]

    # regularize gradient
    regularized_theta = (l / len(X)) * theta_j1_to_n

    # theta_0 is not regularized, so prepend a 0
    regularized_term = np.concatenate([np.array([0]), regularized_theta])

    return gradient(theta, X, y) + regularized_term

2.6 Logistic regression training function

We optimize with scipy.optimize:

import scipy.optimize as opt

def logistic_regression(X, y, l=1):
    """Logistic regression training function
    Args:
        X: feature matrix, (m, n + 1), with an all-ones first column
        y: label vector, (m,)
        l: regularization coefficient
    Return:
        the trained parameter vector
    """
    # the parameter vector has one entry per column of the
    # feature matrix, i.e. number of features + 1
    theta = np.zeros(X.shape[1])

    # train with the regularized cost and gradient
    res = opt.minimize(fun=regularized_cost,
                       x0=theta,
                       args=(X, y, l),
                       method='TNC',
                       jac=regularized_gradient,
                       options={'disp': True})

    # the final trained parameters
    final_theta = res.x

    return final_theta

3. Training the model

We first train the model to recognize the single digit 0: y[0] (a 5000-element vector) marks all samples whose true label is 0; see the vectorized labels described above:

theta_0 = logistic_regression(X, y[0])

The training result theta_0 (a 401-element vector) is the parameter vector corresponding to the handwritten character 0.

4. Predicting the digit 0 on the training set

We use the trained theta_0 parameters to predict whether each character image in the training set is a 0, and check the accuracy:

def predict(x, theta):
    prob = sigmoid(x @ theta)
    return (prob >= 0.5).astype(int)

# predicted values for the character 0, also a 5000-element vector
y_pred = predict(X, theta_0)

y_pred is a 5000-element vector containing only 0s and 1s: a 1 means the sample is predicted to be the character 0, and a 0 means it is predicted not to be.


Then we compare the predicted values with the true values and take the mean of the matches as the output accuracy:

# print the accuracy of predicting the digit 0
print('Accuracy = {}'.format(np.mean(y[0] == y_pred)))

# Accuracy = 0.9974

This shows that images of the handwritten digit 0 are recognized on the training set with about 99.74% accuracy. That is classification for just one digit; let's now repeat the process and classify all 10 digits.

5. Classifying all 10 digits

Above we trained and predicted only the single character 0; with a for loop we can train all 10 characters in the same way:

# Train theta_[0 -> 9] parameter vectors for the 10 categories 0-9
theta_k = np.array([logistic_regression(X, y[k]) for k in range(10)])

theta_k holds the parameter vectors corresponding to the 10 digits (each row is one parameter vector):


# 10 rows 401 columns
print(theta_k.shape)

We predict over the whole feature matrix; note that theta_k is transposed here so the matrix multiplication lines up:

# X(5000, 401), theta_k.T(401, 10)
prob_matrix = sigmoid(X @ theta_k.T)

# prob_matrix(5000, 10)
prob_matrix

Print the prediction matrix (5000 rows, 10 columns):


Take the index of the largest entry in each row as the predicted digit and store it in y_pred; since the columns are now ordered [0-9], the column index is exactly the predicted digit:

y_pred = np.argmax(prob_matrix, axis=1)

# (5000,)
print(y_pred.shape)

y_pred

At this point y_pred is a 5000-element vector, and each entry is the digit the model predicts for that sample. Next, replace the 10s in the true labels with 0s (y_answer below is taken to be a copy of raw_y):

# y_answer: a copy of the raw labels (assumed step; the article
# uses it without defining it), then replace 10 with 0
y_answer = raw_y.copy()
y_answer[y_answer == 10] = 0

Print the prediction report for each handwritten digit on the training set:

from sklearn.metrics import classification_report

print(classification_report(y_answer, y_pred))

The report shows that the precision for every character on the training set is above 90%, indicating that the model fits the training set fairly well.
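
For a quick single-number summary, the overall training-set accuracy can also be checked (a supplementary sketch; the exact value depends on the run):

# overall accuracy across all 10 digits
print('Overall accuracy = {}'.format(np.mean(y_pred == y_answer)))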

OK, that's it for today's share; I hope you'll practice it yourself! Link to the complete runnable code: logistic_reg

Remember to come back and give me a Star (^▽^)!

For original articles on machine learning, algorithm programming, Python, robotics and more, scan the code to follow me and reply "1024", you know!