[Machine learning] Analysis and model training on the Iris dataset

The Iris dataset is the "Hello, world" of machine learning. This article walks through the dataset from several angles and digs into its features. The focus is on feature analysis; the models are called directly from scikit-learn and are straightforward, and because the features separate the classes so well it is easy to reach accuracy close to 100%.

1. Preparation: import the machine learning libraries

# Introduce machine learning libraries
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn import svm
from sklearn import model_selection
from sklearn import metrics 
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Ignore unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

2. Data visualization analysis

  • This problem is small: only 150 samples, 4 features, and a simple 3-class label. Visualization therefore does not call for the kind of elaborate analysis needed for problems such as Titanic or crime prediction; plotting scatter diagrams of feature pairs in a shared coordinate system is enough to explore the subtle relationships between features.
# Load the iris dataset from the sklearn library
from sklearn.datasets import load_iris
iris = load_iris()
print (iris.data)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 ...
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
(150 rows x 4 columns; output truncated)
# There are 50 samples of each of the three iris species, 150 in total
print (iris.target)
print (len(iris.target))
# Each iris sample has 4 feature attributes
print (iris.data.shape)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
150
(150, 4)
  • The dataset object exposes iris.data and iris.target: iris.data is a 150×4 matrix holding the four features (sepal length, sepal width, petal length, petal width), and iris.target is a vector of 150 labels (classes 0, 1, 2). The three species appear 50 at a time: the first 50, middle 50, and last 50 rows of the dataset.
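For a quick tabular look at the same data, the arrays can be wrapped in a pandas DataFrame (pandas is already imported as pd above). This is just a convenience sketch; the column names are chosen here for readability and are not part of the loaded object:

# Wrap the feature matrix in a DataFrame for easier inspection
df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
df['species'] = iris.target
print(df.head())                      # first few rows
print(df.describe())                  # per-feature summary statistics
print(df['species'].value_counts())   # 50 samples per class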
# Extract columns 1 and 2 of the feature matrix (sepal length and width)
DD = iris.data  
X = [x[0] for x in DD]  
print (X)  
Y = [x[1] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()
[5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]
[3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]

  • The problem with this pair is that sepal length and width cannot effectively separate the blue (versicolor) and green (virginica) points, so these two features discriminate poorly between those classes; the quick correlation check below makes the same point.
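A quick way to back this up with a number is to look at how strongly each feature co-varies with the class label. Treating the integer label (0/1/2) as a numeric variable is only a rough heuristic, but it is enough to contrast the sepal and petal columns; a minimal sketch using numpy:

# Rough heuristic: Pearson correlation between each feature column and the numeric class label
feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']
for i, name in enumerate(feature_names):
    r = np.corrcoef(iris.data[:, i], iris.target)[0, 1]
    print('{0}: correlation with label = {1:.2f}'.format(name, r))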
# Extract columns 3 and 4 of the feature matrix (petal length and width)
DD = iris.data  
X = [x[2] for x in DD]  
print (X)  
Y = [x[3] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()
[1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]
[0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]


# Extract columns 1 and 4 of the feature matrix (sepal length and petal width)
DD = iris.data  
X = [x[0] for x in DD]  
print (X)  
Y = [x[3] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()

[5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]
[0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]


# Extract columns 1 and 3 of the feature matrix (sepal length and petal length)
DD = iris.data  
X = [x[0] for x in DD]  
print (X)  
Y = [x[2] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()

[5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]
[1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]


# Extract columns 2 and 3 of the feature matrix (sepal width and petal length)
DD = iris.data  
X = [x[1] for x in DD]  
print (X)  
Y = [x[2] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()

[3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]
[1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]


# Extract columns 2 and 4 of the feature matrix (sepal width and petal width)
DD = iris.data  
X = [x[1] for x in DD]  
print (X)  
Y = [x[3] for x in DD]  
print (Y)  
  
#plt.scatter(X, Y, c=iris.target, marker='x')
# Top 50 samples in Category 1
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
# 50 samples in the middle of the second category
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor') 
# The last 50 samples of the third category
plt.scatter(X[100:], Y[100:],color='green', marker='+', label='Virginica')
# legend
plt.legend(loc=2) # the upper left corner
plt.show()

[3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]
[0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]

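Instead of drawing every pair of features by hand as above, seaborn (imported as sns) can produce all pairwise scatter plots in a single call. A minimal sketch, again with column names chosen only for readability:

# All pairwise feature scatter plots in one figure, colored by species
iris_df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_df['species'] = iris.target
sns.pairplot(iris_df, hue='species')
plt.show()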

3. Logistic regression classification results

3.1 Classification using sepal features

# Take the first two columns of the dataset (sepal features)
X = iris.data[:, 0:2]
Y = iris.target           

# Logistic regression model
lr = LogisticRegression(C=1e5)  
lr.fit(X,Y)

# The meshgrid function generates two grid matrices
h = .02
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# The pcolormesh function draws the grid matrices xx and yy together with the corresponding predictions Z
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Draw a scatter plot
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:, 0], X[100:, 1], color='green', marker='s', label='Virginica')

plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.legend(loc=2) 
plt.show()


3.2 Classification using petal features

# Take the last two columns of the dataset (petal features)
X = iris.data[:, 2:4]
Y = iris.target           

# Logistic regression model
lr = LogisticRegression(C=1e5)  
lr.fit(X,Y)

# The meshgrid function generates two grid matrices
h = .02
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# The pcolormesh function draws the grid matrices xx and yy together with the corresponding predictions Z
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Draw a scatter plot
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:, 0], X[100:, 1], color='green', marker='s', label='Virginica')

plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.legend(loc=2) 
plt.show()

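The two decision-boundary plots suggest that the petal features separate the three classes much better than the sepal features. One way to quantify that impression is to compare the training accuracy of the two fitted models with score(); a sketch that refits both models with the same settings as above:

# Compare training accuracy of the sepal-only and petal-only logistic regression models
lr_sepal = LogisticRegression(C=1e5).fit(iris.data[:, 0:2], iris.target)
lr_petal = LogisticRegression(C=1e5).fit(iris.data[:, 2:4], iris.target)
print('sepal-only training accuracy: {0:.3f}'.format(lr_sepal.score(iris.data[:, 0:2], iris.target)))
print('petal-only training accuracy: {0:.3f}'.format(lr_petal.score(iris.data[:, 2:4], iris.target)))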

4. Dataset splitting (training set + test set)

x=iris.data
y=iris.target
x_train,x_test,y_train,y_test=model_selection.train_test_split(x,y,random_state=101,test_size=0.3)
print("split_train_data 70%:", x_train.shape, "split_train_target 70%:",y_train.shape, "split_test_data 30%", x_test.shape, "split_test_target 30%",y_test.shape)

split_train_data 70%: (105, 4) split_train_target 70%: (105,) split_test_data 30% (45, 4) split_test_target 30% (45,)

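train_test_split shuffles the data before splitting. Because the dataset is small and perfectly balanced, it can also be worth passing stratify=y so that each class keeps the same proportion in the training and test sets; a sketch of that variant (the rest of the article keeps using the split above):

# Optional variant: a stratified split keeps the 50/50/50 class balance in both subsets
x_train_s, x_test_s, y_train_s, y_test_s = model_selection.train_test_split(
    x, y, random_state=101, test_size=0.3, stratify=y)
print(np.bincount(y_train_s), np.bincount(y_test_s))   # class counts per subset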

5. Model training and test-set evaluation

5.1 Logistic regression

# Logistic Regression
model = LogisticRegression()
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the Logistic Regression is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the Logistic Regression is: 0.955555555556

5.2 Decision tree

# DecisionTreeClassifier
model=DecisionTreeClassifier()
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the DecisionTreeClassifier is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the DecisionTreeClassifier is: 0.955555555556

5.3 K-nearest neighbors

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the K-Nearest Neighbours is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the K-Nearest Neighbours is: 1.0


5.4 Support vector machines

# Support Vector Machine
model = svm.SVC()
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the SVM is: 1.0

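A single 70/30 split evaluates each model on only 45 samples, so the accuracies above are somewhat noisy. k-fold cross-validation averages over several splits and gives a more stable comparison; a minimal sketch using cross_val_score from the model_selection module already imported above:

# 5-fold cross-validation accuracy for the four classifiers
models = {
    'LogisticRegression': LogisticRegression(),
    'DecisionTree': DecisionTreeClassifier(),
    'KNN (k=3)': KNeighborsClassifier(n_neighbors=3),
    'SVM': svm.SVC(),
}
for name, m in models.items():
    scores = model_selection.cross_val_score(m, iris.data, iris.target, cv=5)
    print('{0}: mean accuracy = {1:.3f}'.format(name, scores.mean()))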

6. Neural network (Keras)

x=iris.data
y=iris.target
np.random.seed(seed=7)
y_Label=LabelBinarizer().fit_transform(y)
x_train,x_test,y_train,y_test=model_selection.train_test_split(x,y_Label,test_size=0.3,random_state=42)

from keras.models import Sequential
from keras.layers import Dense
model = Sequential() # Build a model

model.add(Dense(4,activation='relu',input_shape=(4,)))
model.add(Dense(6,activation='relu'))
model.add(Dense(3,activation='softmax'))

model.summary()

Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_68 (Dense)             (None, 4)                 20        
_________________________________________________________________
dense_69 (Dense)             (None, 6)                 30        
_________________________________________________________________
dense_70 (Dense)             (None, 3)                 21        
=================================================================
Total params: 71
Trainable params: 71
Non-trainable params: 0
_________________________________________________________________
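The parameter counts in the summary follow from inputs × units + units (weights plus biases) for each Dense layer: 4×4+4 = 20, 4×6+6 = 30 and 6×3+3 = 21, which adds up to the 71 trainable parameters reported.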

model.compile(loss='categorical_crossentropy',optimizer='rmsprop',metrics=['accuracy'])

step=25
history=model.fit(x_train,y_train,validation_data=(x_test,y_test),batch_size=10,epochs=step)
train_result=history.history

Train on 105 samples, validate on 45 samples
Epoch 1/25
105/105 [==============================] - 0s 2ms/step - loss: 0.1160 - accuracy: 0.9429 - val_loss: 0.0453 - val_accuracy: 1.0000
Epoch 2/25
105/105 [==============================] - 0s 123us/step - loss: 0.1121 - accuracy: 0.9524 - val_loss: 0.0486 - val_accuracy: 1.0000
...
Epoch 25/25
105/105 [==============================] - 0s 123us/step - loss: 0.1081 - accuracy: 0.9524 - val_loss: 0.0423 - val_accuracy: 1.0000
(training log truncated)
  • A small fully connected network built with Keras reaches 100% validation accuracy after only a few epochs, so it is also a good choice for the Iris dataset.
# Read the accuracy curves from the training history
acc=train_result['accuracy']
val_acc=train_result['val_accuracy']
epochs=range(1,step+1)
plt.plot(epochs,acc,'b-')
plt.plot(epochs,val_acc,'r')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.show()

t=model.predict(x_test)
resultsss=model.evaluate(x_test,y_test)
resultsss


45/45 [==============================] - 0s 44us/step
[0.5691331187884013, 0.9111111164093018]
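model.predict returns one softmax probability vector per sample, so the predictions have to be converted back to class indices with argmax before they can be compared against labels or fed to the sklearn metrics. A sketch, assuming the variable names used in the split above:

# Convert softmax probabilities to class indices and cross-check with sklearn metrics
proba = model.predict(x_test)
pred_labels = np.argmax(proba, axis=1)
true_labels = np.argmax(y_test, axis=1)   # y_test is one-hot encoded by LabelBinarizer
print(metrics.accuracy_score(true_labels, pred_labels))
print(metrics.confusion_matrix(true_labels, pred_labels))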

7. Training on sepal features and petal features separately

7.1 Training with sepal features

# Extract columns 1 and 2 of the feature matrix (sepal features)
DD = iris.data  
x=DD[ :,0:2]
print(x)
y=iris.target

[[5.1 3.5]
 [4.9 3. ]
 [4.7 3.2]
 [4.6 3.1]
 [5.  3.6]
 ...
 [6.5 3. ]
 [6.2 3.4]
 [5.9 3. ]]
(150 rows x 2 columns; output truncated)
x_train,x_test,y_train,y_test=model_selection.train_test_split(x,y,random_state=101,test_size=0.3)
print("split_train_data 70%:", x_train.shape, "split_train_target 70%:",y_train.shape, "split_test_data 30%", x_test.shape, "split_test_target 30%",y_test.shape)

split_train_data 70%: (105, 2) split_train_target 70%: (105,) split_test_data 30% (45, 2) split_test_target 30% (45,)

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the K-Nearest Neighbours is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the K-Nearest Neighbours is: 0.644444444444

7.2 Training with petal features

DD = iris.data  
x=DD[ :,2:4]
print(x)

[[1.4 0.2]
 [1.4 0.2]
 [1.3 0.2]
 [1.5 0.2]
 [1.4 0.2]
 ...
 [5.2 2. ]
 [5.4 2.3]
 [5.1 1.8]]
(150 rows x 2 columns; output truncated)
x_train,x_test,y_train,y_test=model_selection.train_test_split(x,y,random_state=101,test_size=0.3)
print("split_train_data 70%:", x_train.shape, "split_train_target 70%:",y_train.shape, "split_test_data 30%", x_test.shape, "split_test_target 30%",y_test.shape)

split_train_data 70%: (105, 2) split_train_target 70%: (105,) split_test_data 30% (45, 2) split_test_target 30% (45,)

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)
prediction=model.predict(x_test)
print('The accuracy of the K-Nearest Neighbours is: {0}'.format(metrics.accuracy_score(prediction,y_test)))

The accuracy of the K-Nearest Neighbours is: 0.97777777777777
  • Keeping the same KNN model (one of the best performers above) but training it on sepal features and petal features separately gives very different accuracy: sepal features alone discriminate poorly, while petal features alone already give a very good result, and training on all four features together can reach 100% accuracy.
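One way to make this observation concrete is to fit a decision tree on all four features and inspect its feature_importances_; the exact numbers depend a little on random_state, but the petal columns should dominate. A minimal sketch:

# Feature importances of a decision tree trained on all four features
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
for name, imp in zip(['sepal length', 'sepal width', 'petal length', 'petal width'], tree.feature_importances_):
    print('{0}: {1:.3f}'.format(name, imp))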