In class, the teacher assigned homework: k-means clustering on the iris data set, so I did it. It is very simple; I set the number of clusters to three. The key point is that I tried two ways of choosing the initial centers: plain random selection and a farthest-point trick. Random selection did not work as well as the trick. The trick is to pick a random point as center1, the point farthest from center1 as center2, and the point with the largest summed distance to center1 and center2 as center3.
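The farthest-point idea can be sketched as a small helper. This is only an illustration under assumptions: the function name `farthest_point_init` and the toy points are made up here, and distances are squared Euclidean distances.

```python
import numpy as np

def farthest_point_init(points, first_index):
    """Pick 3 initial centers: a given start point, then the farthest points.

    Hypothetical helper for illustration; not part of any library.
    """
    c1 = points[first_index]
    # Squared Euclidean distance from every point to center 1.
    d1 = np.sum((points - c1) ** 2, axis=1)
    c2 = points[np.argmax(d1)]
    # Center 3 maximizes the summed squared distance to centers 1 and 2.
    d2 = np.sum((points - c2) ** 2, axis=1)
    c3 = points[np.argmax(d1 + d2)]
    return np.stack([c1, c2, c3])

# Toy 2-D data (made up for this example).
toy = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.5, 1.0]])
centers = farthest_point_init(toy, first_index=0)
print(centers)
```

On the toy data this picks (0, 0), then (0.5, 1), then (1, 0). One caveat of the summed-distance rule: in some layouts (for example, points lying on a line) it can select a point that is already a center; maximizing the minimum distance to the chosen centers is a common way around that.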

When updating the centers during training, I did not use online mode, which updates a center one point at a time. Instead, I used mini-batch mode: I feed in a mini-batch of samples, assign each one to its nearest center, and only then update the centers. Enough talk; the full code is below.
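Before the full listing, here is the mini-batch step in isolation, as a rough sketch. The function name `minibatch_step` and the toy values are assumptions for illustration. The update averages each center together with the batch points assigned to it, so the divisor is one plus the number of assigned points.

```python
import numpy as np

def minibatch_step(centers, batch):
    """Assign each batch point to its nearest center, then update the centers.

    Hypothetical helper; uses the update rule (center + sum) / (1 + count).
    """
    # Squared distances, shape (batch_size, n_centers).
    dists = np.sum((batch[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = batch[nearest == k]
        # The old center counts as one extra point in the average.
        new_centers[k] = (centers[k] + members.sum(axis=0)) / (1 + len(members))
    return new_centers, nearest

# Toy values (made up for this example).
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
batch = np.array([[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
new_centers, nearest = minibatch_step(centers, batch)
print(nearest)
print(new_centers)
```

Here the first two batch points go to the first center and the third point to the second, so the centers move to (1, 1) and (9.5, 9.5). A center with no assigned points stays where it is.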

import csv
import random

import numpy as np

# Read the four feature columns and min-max normalize them
features = np.loadtxt('iris.csv', delimiter=',', usecols=(1, 2, 3, 4))
z_min, z_max = features.min(axis=0), features.max(axis=0)
features = (features - z_min) / (z_max - z_min)

# Transform the string class names into numeric labels
csv_file = open('iris.csv')
csv_reader_lines = csv.reader(csv_file)
classes_list = []
for i in csv_reader_lines:
    classes_list.append(i[-1])
csv_file.close()
labels = []
for i in classes_list:
    if i == 'setosa':
        labels.append(0)
    elif i == 'versicolor':
        labels.append(1)
    else:
        labels.append(2)
labels = np.array(labels).reshape(150, 1)  # transform list to numpy type

# Shuffle, then split into 120 training and 30 test samples
data_index = np.arange(features.shape[0])
np.random.shuffle(data_index)
train_input = features[data_index[0:120]]
train_label = labels[data_index[0:120]]
test_input = features[data_index[120:150]]
test_label = labels[data_index[120:150]]

train_length = 120
K = 3

# center1: a random training point (randint includes both endpoints)
center_1_pos = random.randint(0, train_length - 1)
center1 = train_input[center_1_pos]

# Plain random initialization instead (the data is already shuffled):
# center1 = train_input[0]
# center2 = train_input[1]
# center3 = train_input[2]
# print(center1)
# print(center2)
# print(center3)

# center2: the point farthest from center1
biggest_distance = 0.0
center_2_pos = 0
for i in range(train_length):
    dist = np.sum(pow((center1 - train_input[i]), 2))
    if dist > biggest_distance:
        biggest_distance = dist
        center_2_pos = i
center2 = train_input[center_2_pos]

# center3: the point with the largest summed distance to center1 and center2
biggest_distance = 0.0
center_3_pos = 0
for i in range(train_length):
    dist = np.sum(pow((center1 - train_input[i]), 2)) + np.sum(pow((center2 - train_input[i]), 2))
    if dist > biggest_distance:
        biggest_distance = dist
        center_3_pos = i
center3 = train_input[center_3_pos]

mini_batch = 20
for epoch in range(10):  # train 10 times on the whole data set
    for i in range(6):  # 6 mini-batches of 20 samples each
        belong1 = []
        belong2 = []
        belong3 = []
        for j in range(mini_batch):
            # Assign the sample to its nearest center
            temp_index = mini_batch * i + j
            belong = 1
            dist_1 = np.sum(pow((center1 - train_input[temp_index]), 2))
            temp_dist = dist_1
            dist_2 = np.sum(pow((center2 - train_input[temp_index]), 2))
            dist_3 = np.sum(pow((center3 - train_input[temp_index]), 2))
            if dist_2 < temp_dist:
                temp_dist = dist_2
                belong = 2
            if dist_3 < temp_dist:
                belong = 3
            if belong == 1:
                belong1.append(temp_index)
            elif belong == 2:
                belong2.append(temp_index)
            else:
                belong3.append(temp_index)

        # Update each center once per mini-batch
        for k in belong1:
            center1 = center1 + train_input[k]
        center1 = center1 / (1 + len(belong1))
        for k in belong2:
            center2 = center2 + train_input[k]
        center2 = center2 / (1 + len(belong2))
        for k in belong3:
            center3 = center3 + train_input[k]
        center3 = center3 / (1 + len(belong3))

    # After each epoch, assign the test samples and collect their true labels
    b_1 = []
    b_2 = []
    b_3 = []
    for l in range(test_input.shape[0]):
        belong = 1
        dist_1 = np.sum(pow((center1 - test_input[l]), 2))
        temp_dist = dist_1
        dist_2 = np.sum(pow((center2 - test_input[l]), 2))
        dist_3 = np.sum(pow((center3 - test_input[l]), 2))
        if dist_2 < temp_dist:
            temp_dist = dist_2
            belong = 2
        if dist_3 < temp_dist:
            belong = 3
        # int() keeps the printed lists as plain Python integers
        if belong == 1:
            b_1.append(int(test_label[l][0]))
        elif belong == 2:
            b_2.append(int(test_label[l][0]))
        else:
            b_3.append(int(test_label[l][0]))

    print()
    print('epoch : {} / 10'.format(epoch + 1))
    print('center1: ', b_1)
    print('center2', b_2)
    print('center3: ', b_3)

Here’s what happens when I run the program:

epoch : 1 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 2 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 3 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 4 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 5 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 6 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

epoch : 7 / 10
center1:  [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
center2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
center3:  [2, 2, 2, 2, 2]

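The result is pretty good on this small data set: the second cluster contains only label 0 (setosa), the third only label 2, and the first is mostly label 1 (versicolor) with a few label-2 samples mixed in.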
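To put a number on that, the printed lists can be tallied. This is a small sketch that is not part of the original program; `purity` is a hypothetical helper that scores each cluster by its most common label.

```python
from collections import Counter

# Test labels per cluster, copied from the program output above.
clusters = {
    "center1": [2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1],
    "center2": [0] * 10,
    "center3": [2] * 5,
}

def purity(clusters):
    """Fraction of samples that carry their cluster's majority label."""
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in clusters.values())
    total = sum(len(labels) for labels in clusters.values())
    return majority / total

score = purity(clusters)
print(round(score, 3))  # 26 of 30 samples -> 0.867
```

The four samples that lower the score are the label-2 points that landed in the first cluster.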

Note: before reading the iris data, I deleted the attribute names in the first line of the file; otherwise it would be troublesome to process.

Along the way, we learned how to normalize NumPy data, how to read a CSV file into NumPy, and how to convert the string classes in a CSV file to numeric labels.
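Those three steps can be tried without the real file. The sketch below is only an illustration: it reads a few sample rows from an in-memory string, and the column layout (index, four features, class name) and the `name_to_label` dictionary are assumptions made for this example.

```python
import io

import numpy as np

# A few sample rows in the assumed layout: index, four features, class name.
csv_text = """1,5.1,3.5,1.4,0.2,setosa
51,7.0,3.2,4.7,1.4,versicolor
101,6.3,3.3,6.0,2.5,virginica
"""

# Read the numeric feature columns (columns 1 to 4).
features = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=(1, 2, 3, 4))

# Min-max normalization: scale every column to the range [0, 1].
col_min, col_max = features.min(axis=0), features.max(axis=0)
features = (features - col_min) / (col_max - col_min)

# Read the class names from the last column and map them to integer labels.
names = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=5, dtype=str)
name_to_label = {"setosa": 0, "versicolor": 1, "virginica": 2}
labels = np.array([name_to_label[str(n)] for n in names])

print(features)
print(labels)
```

If the header row is still in the file, `np.loadtxt` also accepts `skiprows=1`, which skips the first line instead of requiring it to be deleted by hand.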