“This is the first day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021”

KNN algorithm

(From Machine Learning in Action.) The k-nearest neighbor (k-NN) algorithm classifies data by measuring the distance between feature values. How it works: there is a sample data set, the training set, in which every sample carries a classification label. When new, unlabeled data comes in, each of its features is compared with the corresponding feature of the samples in the training set, and the algorithm extracts the labels of the samples whose features are most similar (the nearest neighbors). In other words, it looks at the labels of the k closest samples and takes a majority vote. The k in k-NN is the number of most similar samples selected from the training set; k is usually an integer no greater than 20.
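
The voting procedure described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used later in this post: the function name knn_classify, the use of Euclidean distance, and the toy data are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_classify(query, samples, labels, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query point to every training sample.
    distances = np.sqrt(((samples - query) ** 2).sum(axis=1))
    # Indices of the k samples with the smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those k neighbors and return the most common one.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set (made-up values): two small clusters labeled "A" and "B".
samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

print(knn_classify(np.array([0.1, 0.2]), samples, labels, k=3))  # prints B
```

With k=3 the query point (0.1, 0.2) has two "B" neighbors and one "A" neighbor, so the vote goes to "B".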

In practice

Helen has been using online dating sites to find a match. Although the sites suggest different candidates, she does not like all of them. After summarizing her experience, she found that she had dated three types of people:

People she did not like (didntLike)
People she liked in small doses (smallDoses)
Extremely attractive people (largeDoses)

Helen has been collecting dating data for some time and stores it in a text file called datingTest.txt, with one sample per line and 1,000 lines in total. Each sample has the following three features:

Frequent flyer miles earned per year
Percentage of time spent playing video games
Liters of ice cream consumed per week

The data format is shown in the figure: [Figure: data format]

Use the KNN algorithm to help Helen classify her dates. The KNN implementation in the sklearn library is used to solve the problem.

Step 1: Extract data

import numpy as np
from numpy import zeros
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import accuracy_score
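# The file is tab-separated and has no header row.
data = pd.read_csv('datingTest.txt', sep='\t', header=None)
# Columns 0-2 hold the three features; the last column holds the label.
X = data.iloc[:, 0:3]
Y = data.iloc[:, -1]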

Step 2: Use train_test_split to hold out 20% of the data (test_size=0.2) for testing

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
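
To make the effect of test_size=0.2 concrete, here is a small sketch on synthetic data. The arrays X_demo and y_demo are made-up stand-ins with the same shape as the data described above (1,000 rows, 3 features, 3 classes), and random_state is added only so the shuffle is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dating data: 1,000 rows, 3 features, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.random((1000, 3))
y_demo = rng.integers(0, 3, size=1000)

# test_size=0.2 holds out 20% of the rows for testing.
# The return order is: train features, test features, train labels, test labels.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (800, 3) (200, 3)
```

Note that train_test_split returns four values when given two arrays, so the left-hand side needs four names.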

Step 3: Store labels

# Copy the label column into a plain Python list.
label = list(Y)

Final step: fit the model on the training set, predict on the test set, and compute the prediction accuracy

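datalist = np.array(data)
# mmat is a helper that is not defined in this post; define or import it before uncommenting.
# mmat(datalist, label)
model = KNeighborsClassifier(n_neighbors=7)
model.fit(X_train, Y_train)
# Evaluate on the held-out test set, not on the data the model was trained on.
predictions = model.predict(X_test)
print(predictions)
print(accuracy_score(Y_test, predictions))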
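
Since k is a free choice (usually no greater than 20, as noted earlier), one way to pick it is to compare test accuracy for several values. The sketch below does this on synthetic data generated with make_blobs, because the dating file is not reproduced here; the data, the list of k values, and the cluster_std setting are all assumptions for illustration, and the accuracies it prints say nothing about Helen's data.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-feature, 3-class data standing in for the dating samples.
X_demo, y_demo = make_blobs(
    n_samples=1000, n_features=3, centers=3, cluster_std=3.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Fit one classifier per candidate k and record its accuracy on the test set.
scores = {}
for k in (1, 3, 5, 7, 11, 15, 20):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, model.predict(X_test))

for k, acc in scores.items():
    print(f"k={k:2d}  test accuracy={acc:.3f}")
```

On real data it is safer to choose k with cross-validation on the training set and keep the test set for the final check only.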