MNIST handwritten digit introduction

1. Get samples

The MNIST database of handwritten digits is available from the page below, with a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger collection provided by NIST. The digits have been size-normalized and centered in fixed-size images.

Download link: http://yann.lecun.com/exdb/mnist/

The dataset consists of four parts:

Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB uncompressed, contains 60,000 samples)

Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB uncompressed, contains 60,000 labels)

Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB uncompressed, contains 10,000 samples)

Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB uncompressed, contains 10,000 labels)

2. MNIST analysis

MNIST is a classic introductory demo for deep learning. It is composed of 60,000 training images and 10,000 test images, each 28*28 pixels in size (as shown below) and grayscale (each pixel is a floating-point number between 0 and 1; the darker the pixel, the closer its value is to 1). The images are digits from 0 to 9 handwritten by different people. TensorFlow encapsulates this dataset and the related operations in a library. Let's take a step-by-step look at the process of applying deep learning to MNIST.

Here are four MNIST images. They are not stored in the traditional PNG or JPG format, because those formats carry a lot of extraneous information (such as data blocks, image headers, image trailers, lengths, etc.). Instead, each image is processed into a very simple two-dimensional array, as shown in the figure:

You can see that the pattern of values in the matrix closely matches the pattern in the image on the left. Storing the data this way keeps the model simpler and clearer, and makes the features more obvious.
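To make the representation concrete, here is a minimal sketch that reshapes a 784-element vector into the 28x28 matrix described above. It uses a random stand-in vector (an assumption for illustration, since loading the real data comes later in this article):

import numpy as np

# Stand-in for one 784-pixel image vector; in the real dataset each value
# is a float in [0, 1], with darker pixels closer to 1
img = np.random.rand(784)
matrix = img.reshape(28, 28)
# Coarse ASCII rendering: dark pixels print as '#', light ones as '.'
for row in matrix:
    print(''.join('#' if v > 0.5 else '.' for v in row))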

II. kNN principle

1. Overview of kNN algorithm

The core idea of the kNN algorithm is that if the majority of the k nearest samples in feature space belong to a certain category, then the sample in question also belongs to that category and shares the characteristics of the samples in it. The method determines a sample's classification based only on the categories of its nearest one or few samples.

2. kNN algorithm introduction

The simplest, most elementary classifier records the categories of all training data; when a test object's attributes exactly match those of some training object, it can be classified. But how could every test object find a training object that matches it exactly? And a test object may match several training objects at once, so that one object would be assigned to multiple classes. kNN arose to address these problems.

kNN classifies by measuring the distance between feature values. The idea is that if the majority of the k most similar samples in feature space belong to a certain category, then the sample also belongs to that category. k is usually an integer not greater than 20. In the kNN algorithm, the selected neighbors are all correctly classified objects; the method determines a sample's classification based only on the categories of the nearest one or few samples.

Here is a simple example:

As shown below, which class should the green circle be assigned to, the red triangles or the blue squares? If k = 3, the green circle is assigned to the red-triangle class, because red triangles make up 2/3 of the neighbors; if k = 5, it is assigned to the blue-square class, because blue squares make up 3/5 of the neighbors.

This also shows that the result of the kNN algorithm largely depends on the choice of k.

In kNN, the problem of exactly matching objects is avoided by using the distance between objects as an index of their dissimilarity. The distance is generally the Euclidean distance or the Manhattan distance:

Euclidean distance: d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )

Manhattan distance: d(x, y) = Σᵢ |xᵢ − yᵢ|
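As a quick illustration, here is how the two distances can be computed with NumPy (a sketch with two made-up feature vectors):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
euclidean = np.sqrt(np.sum((x - y) ** 2))  # sqrt(9 + 4 + 0) ≈ 3.61
manhattan = np.sum(np.abs(x - y))          # 3 + 2 + 0 = 5
print(euclidean, manhattan)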

At the same time, kNN makes its decision based on the dominant category among k objects rather than a single object's category. These two points are advantages of the kNN algorithm.

Next, the idea of kNN algorithm is summarized as follows:

Given a training set whose labels are known, input the test data, compare the features of the test data against the corresponding features in the training set, and find the k training samples most similar to it; the predicted category of the test data is then the category that appears most frequently among those k samples (a minimal sketch follows the steps below). The algorithm is described as:

1) Calculate the distance between the test data and each training sample;

2) Sort by increasing distance;

3) Select the k points with the smallest distance;

4) Count the occurrence frequency of the categories of those k points;

5) Return the most frequent category among the k points as the predicted classification of the test data.
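Putting the five steps together, here is a minimal NumPy sketch (hypothetical names; labels are assumed to be plain integers rather than the one-hot vectors used later in this article):

import numpy as np

def knn_predict(testSample, trainData, trainLabel, k):
    # 1) Distance between the test sample and every training sample (Manhattan)
    distances = np.sum(np.abs(trainData - testSample), axis=1)
    # 2) and 3) Sort by increasing distance and keep the k nearest indices
    nearest = np.argsort(distances)[:k]
    # 4) Count how often each category appears among the k neighbors
    votes = np.bincount(trainLabel[nearest])
    # 5) Return the most frequent category as the prediction
    return np.argmax(votes)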

Original link: https://www.cnblogs.com/sxron/p/5451923.html

III. Illustration

1. Load MNIST data

TensorFlow provides a script that automatically downloads and imports the MNIST dataset. It automatically creates an 'MNIST_data' directory to store the data.

import tensorflow as tf
import numpy as np
import random 
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data',one_hot=True)
# one_hot introduced: https://blog.csdn.net/lanhaier0591/article/details/78702558

Here, mnist is a lightweight class. It stores the training, validation, and test sets as NumPy arrays. It also provides a function for retrieving minibatches during iteration, which we'll use later.
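For example, a minibatch can be drawn like this (a sketch of the input_data API as it behaves in TensorFlow 1.x):

batch_xs, batch_ys = mnist.train.next_batch(100)
# batch_xs.shape == (100, 784), batch_ys.shape == (100, 10)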

2. Set parameters

trainNum = 60000
# Total number of training images
testNum = 10000
# Total number of test images
trainSize = 500
# Number of images used for training
testSize = 5
# Number of images used for testing
k = 4
# Number of nearest images (smallest distance) to vote with

3. Data decomposition

We said each image has 28 * 28 = 784 pixels, so in testData.shape= (5, 784), 5 is the number of images and 784 is the number of pixels per image.

testLabel.shape= (5, 10)

First take a look at:

testLabel= [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]

Each label is a one-hot vector. The first label in testLabel, [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.], has its 1 at index 6, so it represents the digit 6 and corresponds to the first row of testData. Similarly, the second row is labeled 0, the third is labeled 3, the fourth is labeled 0, and the fifth is labeled 6.
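A quick way to recover the digits from the one-hot rows (a sketch, assuming testLabel holds the matrix above):

digits = np.argmax(testLabel, axis=1)
print(digits)
# [6 0 3 0 6]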

trainIndex = np.random.choice(trainNum,trainSize,replace=False)
# Draw trainSize distinct random indices from the trainNum training examples
testIndex = np.random.choice(testNum,testSize,replace=False)
# Draw testSize distinct random indices from the testNum test examples

trainData = mnist.train.images[trainIndex]
# Training images: mnist.train.images
trainLabel = mnist.train.labels[trainIndex]
# Training labels: mnist.train.labels
testData = mnist.test.images[testIndex]
# Test images: mnist.test.images
testLabel = mnist.test.labels[testIndex]
# Test labels: mnist.test.labels
print('trainData.shape=',trainData.shape)
print('trainLabel.shape=',trainLabel.shape)
print('testData.shape=',testData.shape)
print('testLabel.shape=',testLabel.shape)
print('testLabel=',testLabel)

Results:

trainData.shape= (500, 784)
trainLabel.shape= (500, 10)
testData.shape= (5, 784)
testLabel.shape= (5, 10)
testLabel= [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]

4. Data training

(1) Set variables

tf.placeholder(dtype, shape=None, name=None) creates a placeholder for data that will be fed in at run time. Its parameters:

  • dtype: the data type; tf.float32, tf.float64, and other numeric types are commonly used
  • shape: the data shape. Defaults to None, meaning any shape. It can also be multidimensional, e.g. [2,3] or [None, 3], where [None, 3] means 3 columns and a variable number of rows
  • name: the name of the operation.
trainDataInput = tf.placeholder(shape=[None,784],dtype=tf.float32)
trainLabelInput = tf.placeholder(shape=[None,10],dtype=tf.float32)
testDataInput = tf.placeholder(shape=[None,784],dtype=tf.float32)
testLabelInput = tf.placeholder(shape=[None,10],dtype=tf.float32)
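For instance, a placeholder is filled in at run time through feed_dict (a minimal standalone sketch, not part of the kNN pipeline):

x = tf.placeholder(shape=[None,3],dtype=tf.float32)
doubled = tf.multiply(x, 2.0)
with tf.Session() as sess:
    print(sess.run(doubled, feed_dict={x: [[1.0, 2.0, 3.0]]}))
    # [[2. 4. 6.]]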

(2) Calculate the kNN distances

Using Manhattan distance:

f1 = tf.expand_dims(testDataInput,1) 
# tf.expand_dims() inserts a dimension, giving shape (testSize, 1, 784)
f2 = tf.subtract(trainDataInput,f1)
# tf.subtract() broadcasts to a 3-D tensor of shape (testSize, trainSize, 784)
f3 = tf.reduce_sum(tf.abs(f2),reduction_indices=2)
# tf.abs() takes absolute values; tf.reduce_sum() sums over the pixel
# dimension, accumulating the Manhattan distances into f3
with tf.Session() as sess:
    p1 = sess.run(f1,feed_dict={testDataInput:testData[0:5]})
    print('p1=',p1.shape)
    # p1= (5, 1, 784)
    p2 = sess.run(f2,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5]})
    print('p2=',p2.shape)
    # p2= (5, 500, 784) 
    p3 = sess.run(f3,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5]})
    print('p3=',p3.shape)
    # p3= (5, 500)
    print('p3[0,0]=',p3[0,0])
    # p3[0,0]= 132.96472: the distance between the first test image and the first training image

Results:

p1= (5, 1, 784)
p2= (5, 500, 784)
p3= (5, 500)
p3[0,0]= 132.96472
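The same computation can be written in NumPy, which makes the broadcasting explicit (a sketch using random data with the shapes above, not the real MNIST arrays):

test = np.random.rand(5, 784)
train = np.random.rand(500, 784)
diff = train - test[:, np.newaxis, :]  # shape (5, 500, 784), like f2
dist = np.abs(diff).sum(axis=2)        # shape (5, 500), like f3
print(dist.shape)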

(3) Select the k images with the smallest distance

tf.nn.top_k(input, k, name=None) returns the k largest entries in each row of input, along with their indices.

Input parameters:

  • input: a tensor; its data type must be one of float32, float64, int32, int64, uint8, int16, int8. The shape is batch_size x num_classes.
  • k: an integer, at least 1; in each row, find the k largest values.
  • name: a name for the operation.

Output parameters:

  • A tuple of tensors: (values, indices).
  • values: a tensor with the same data type as input and shape batch_size x k, holding the k maxima of each row.
  • indices: an int32 tensor giving the position of each maximum within its input row.
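A tiny standalone example of tf.nn.top_k (a sketch; run under TensorFlow 1.x):

a = tf.constant([[5.0, 1.0, 3.0, 2.0]])
values, indices = tf.nn.top_k(a, k=2)
with tf.Session() as sess:
    v, i = sess.run((values, indices))
    print(v, i)
    # v= [[5. 3.]], i= [[0 2]]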
f4 = tf.negative(f3)
# tf.negative(x, name=None) negates elementwise, so f4 = -f3 (e.g. -132.96472)
f5,f6 = tf.nn.top_k(f4,k=4) 
# f5: the 4 largest values of f4, i.e. the 4 smallest values of f3
# f6: the indices of those 4 values
with tf.Session() as sess:
    p4 = sess.run(f4,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5]})
    print('p4=',p4.shape)
    print('p4[0,0]=',p4[0,0])
    p5,p6 = sess.run((f5,f6),feed_dict={trainDataInput:trainData,testDataInput:testData[0:5]})
    # p5= (5, 4), each test picture (a total of 5 pictures) corresponds to 4 recent training pictures, a total of 20 pictures
    print('p5=',p5.shape)
    print('p6=',p6.shape)
    print('p5',p5)
    print('p6',p6)

Results:

p4= (5, 500)
p4[0,0]= -132.96472
p5= (5, 4)
p6= (5, 4)
p5= [[-54.49804  -54.87059  -55.690197 -59.97647 ]
 [-49.09412  -64.74118  -68.22353  -68.76863 ]
 [-65.36079  -69.278435 -72.60785  -74.84314 ]
 [-75.46667  -78.19216  -78.36864  -80.44706 ]
 [-42.478436 -61.517654 -62.36863  -63.42353 ]]
p6= [[150 ...]
 [402 268 279 164]
 [300  97  78 237]
 [387 164 268 311]
 [258 107 226 207]]

(4) Determine the category frequencies of the k images

tf.gather() collects slices from params along the axis dimension according to indices. indices must be an integer tensor of any dimension (usually 0-D or 1-D). The output shape is params.shape[:axis] + indices.shape + params.shape[axis + 1:].

Input parameters:

  • params: a tensor from which values are gathered. Its rank must be at least axis + 1.
  • indices: a tensor of type int32 or int64. The values must lie in the range [0, params.shape[axis]).
  • axis: a tensor of type int32 or int64; the axis in params to gather indices from. Defaults to the first dimension; negative indexes are supported.
  • name: the name of the operation (optional).

Output parameters:

  • The function returns a tensor of the same type as params. Its values are gathered from params at the given indices, and its shape is params.shape[:axis] + indices.shape + params.shape[axis + 1:].
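A tiny standalone example of tf.gather (a sketch; run under TensorFlow 1.x):

params = tf.constant([10, 20, 30, 40])
idx = tf.constant([[3, 0], [1, 2]])
g = tf.gather(params, idx)
with tf.Session() as sess:
    print(sess.run(g))
    # [[40 10]
    #  [20 30]]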
f7 = tf.gather(trainLabelInput,f6)
# Gather the one-hot labels at the neighbor indices; f7 has shape (testSize, k, 10)
f8 = tf.reduce_sum(f7,reduction_indices=1)
# Sum over dimension 1 (the k neighbors), counting the votes for each class
f9 = tf.argmax(f8,dimension=1)
# Return the index of the maximum value in each row of f8, i.e. the predicted digit
with tf.Session() as sess:
    p7 = sess.run(f7,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5],trainLabelInput:trainLabel})
    print('p7=',p7.shape)
    print('p7[]=',p7)
    p8 = sess.run(f8,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5],trainLabelInput:trainLabel})
    print('p8=',p8.shape)
    print('p8[]=',p8)
    p9 = sess.run(f9,feed_dict={trainDataInput:trainData,testDataInput:testData[0:5],trainLabelInput:trainLabel})
    print('p9=',p9.shape)
    print('p9[]=',p9)

Results:

p7= (5, 4, 10)
p7[]= [[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]

 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

 [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]

 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
p8= (5, 10)
p8[]= [[0. 0. 0. 0. 0. 0. 4. 0. 0. 0.]
 [4. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 4. 0. 0. 0. 0. 0. 0.]
 [4. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 3. 0. 0. 1.]]
p9= (5,)
p9[]= [6 0 3 0 6]

(5) Test results

with tf.Session() as sess:
    p10 = np.argmax(testLabel[0:5],axis=1)
    # If p10 equals p9, the prediction is correct
    print('p10[]=',p10)

j = 0
for i in range(0,5):
    if p10[i] == p9[i]:
        j = j+1
print('ac=',j*100/5)
# Accuracy (percent)

Results:

p10[]= [6 0 3 0 6]
ac= 100.0
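Equivalently, the accuracy can be computed in one line (a sketch using the p9 and p10 arrays above):

ac = np.mean(p10 == p9) * 100
print('ac=', ac)
# ac= 100.0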