I. Overview

The K-nearest neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea is: if most of the k samples most similar to a given sample (i.e., its nearest neighbors in the feature space) belong to a certain category, then the sample also belongs to that category.

1 Definition

A sample is assigned to a category if most of its k most similar samples (i.e., its nearest neighbors in the feature space) belong to that category; in other words, a sample's category is inferred from its "neighbors". For example, with k = 5, if 3 of a sample's 5 nearest neighbors are labeled "anger", the sample is classified as "anger".

2 Distance formula

The distance between two samples can be calculated with the following formula, known as the Euclidean distance.
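For two d-dimensional feature vectors $a = (a_1, \ldots, a_d)$ and $b = (b_1, \ldots, b_d)$ the Euclidean distance is

$d(a, b) = \sqrt{\sum_{i=1}^{d} (a_i - b_i)^2}$

which is exactly what norm(testVector(:,i) - trainVector(:,j)) computes in the source code below.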

3 Steps of the KNN algorithm
(1) Calculate the distance between the current point and every point in the data set of known categories;
(2) Select the K points with the smallest distance to the current point;
(3) Count the frequency of each category among these K points;
(4) Return the category with the highest frequency among these K points as the predicted classification of the current point.
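A minimal MATLAB sketch of these four steps for a single query point follows; the names trainData (a d-by-N matrix of known samples), trainLabel (their categories) and x (the d-by-1 point to classify) are illustrative placeholders, not taken from the source code in section II.

% Brute-force KNN classification of one query point x.
k = 5;                                          % number of neighbors to consult
dist = zeros(1, size(trainData, 2));
for j = 1:size(trainData, 2)                    % (1) distance from x to every known point
    dist(j) = norm(x - trainData(:, j));
end
[~, order] = sort(dist);                        % (2) the K points with the smallest distance
nearestLabels = trainLabel(order(1:k));
[category, ~, idx] = unique(nearestLabels);     % (3) frequency of each category among the K points
counts = accumarray(idx(:), 1);
[~, best] = max(counts);                        % (4) the most frequent category is the prediction
predictedLabel = category(best);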

4 KNN principle







5 Advantages and disadvantages of KNN



6 KNN performance problems

Performance is also one of KNN's drawbacks. With KNN the model is trivially easy to build, but when classifying a new sample, obtaining its K nearest neighbors requires a brute-force search that scans all training samples and computes the distance between each of them and the sample to be classified, which incurs a large computational overhead.
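To make the overhead concrete: classifying M test samples against N training samples of dimension d means computing all N*M distances, i.e. O(N*M*d) work. A vectorized sketch of that full distance matrix, assuming the same column-sample layout as trainVector and testVector in the source code below (this replaces the double loop, not the complexity):

% Squared Euclidean distances between every training column and every test column.
% size(trainVector) = [d N], size(testVector) = [d M]  ->  D is N-by-M.
% Uses implicit expansion, available in MATLAB R2016b and later.
D = sum(trainVector.^2, 1)' + sum(testVector.^2, 1) - 2 * (trainVector' * testVector);
D = sqrt(max(D, 0));   % clamp tiny negative values caused by round-off before the square root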

II. Source code

clc; clear all; close all;

%% Load the feature vector matrix of each emotion
% (no .mat file providing fearVec is loaded in this excerpt of the listing)
load F_happiness.mat;
load N_neutral.mat;
load T_sadness.mat;
load W_anger.mat;

NumberOfTrain = size(fearVec,2)/2;   % half of the samples for training, half for testing
% Build the training sample set
trainVector = [fearVec(:,1:NumberOfTrain), hapVec(:,1:NumberOfTrain), neutralVec(:,1:NumberOfTrain), sadnessVec(:,1:NumberOfTrain), angerVec(:,1:NumberOfTrain)];
% Build the test sample set
testVector = [fearVec(:,(NumberOfTrain+1):size(fearVec,2)), hapVec(:,(NumberOfTrain+1):size(hapVec,2)), neutralVec(:,(NumberOfTrain+1):size(neutralVec,2)), sadnessVec(:,(NumberOfTrain+1):size(sadnessVec,2)), angerVec(:,(NumberOfTrain+1):size(angerVec,2))];

k = 9;   % number of nearest neighbors
distanceMatrix = zeros(size(trainVector,2), size(testVector,2));

%% Calculate the distance between each test sample and each sample in the training sample set
for i = 1:size(testVector,2)
    for j = 1:size(trainVector,2)
        distanceMatrix(j,i) = norm(testVector(:,i) - trainVector(:,j));
    end
end

TotalTestNumber = size(fearVec,2) - NumberOfTrain;
emtionCounter = zeros(1,5);   % per-emotion counter of correctly recognized test samples

% Column index boundaries of the five emotions in the training set ...
n1 = NumberOfTrain;
n2 = n1 + NumberOfTrain;
n3 = n2 + NumberOfTrain;
n4 = n3 + NumberOfTrain;
n5 = n4 + NumberOfTrain;
% ... and in the test set
p1 = size(fearVec,2) - NumberOfTrain;
p2 = p1 + size(hapVec,2) - NumberOfTrain;
p3 = p2 + size(neutralVec,2) - NumberOfTrain;
p4 = p3 + size(sadnessVec,2) - NumberOfTrain;
p5 = p4 + size(angerVec,2) - NumberOfTrain;
if (n5 ~= size(trainVector,2) || p5 ~= size(testVector,2))
    disp('data error');
    return;
end

%% Classify every test sample by majority vote among its k nearest training samples
for i = 1:size(distanceMatrix,2)
    flag = zeros(1,5);                              % votes per emotion
    [sortVec, index] = sort(distanceMatrix(:,i));   % training samples ordered by distance
    for j = 1:k
        if (n1 >= index(j) && index(j) >= 1)
            flag(1) = flag(1) + 1;
        elseif (n2 >= index(j) && index(j) > n1)
            flag(2) = flag(2) + 1;
        elseif (n3 >= index(j) && index(j) > n2)
            flag(3) = flag(3) + 1;
        elseif (n4 >= index(j) && index(j) > n3)
            flag(4) = flag(4) + 1;
        else
            flag(5) = flag(5) + 1;
        end
    end
    [~, index1] = sort(flag);
    % If the category with the most votes among the k nearest neighbors matches the
    % actual category of the test sample, the recognition is counted as correct.
    if ((p1 >= i && i >= 1) && index1(5) == 1)
        emtionCounter(index1(5)) = emtionCounter(index1(5)) + 1;
    elseif ((p2 >= i && i > p1) && index1(5) == 2)
        emtionCounter(index1(5)) = emtionCounter(index1(5)) + 1;
    elseif ((p3 >= i && i > p2) && index1(5) == 3)
        emtionCounter(index1(5)) = emtionCounter(index1(5)) + 1;
    elseif ((p4 >= i && i > p3) && index1(5) == 4)
        emtionCounter(index1(5)) = emtionCounter(index1(5)) + 1;
    elseif ((p5 >= i && i > p4) && index1(5) == 5)
        emtionCounter(index1(5)) = emtionCounter(index1(5)) + 1;
    end
end

%% Feature extraction for one utterance
function feature = featvector(filename)
[y, fs] = wavread(filename);
L = length(y);
ys = y;
for i = 1:length(y)-1
    if (abs(y(i)) < 1e-3)   % filter out very small samples before computing the short-time energy
        ys(i) = ys(i+1);
        L = L - 1;
    end
end
y1 = ys(1:L);
s  = enframe(y,  hamming(256), 128);   % framing with a Hamming window, 50% overlap
s1 = enframe(y1, hamming(256), 128);
[nframe,  framesize ] = size(s);
[nframe1, framesize1] = size(s1);
E = zeros(1, nframe1);
Z = zeros(1, nframe);
F = zeros(1, nframe);
for i = 1:nframe
    Z(i) = sum(abs(sign(s(i,2:framesize)) - sign(s(i,1:framesize-1))))/2;   % short-time zero-crossing rate
end
for i = 1:nframe1
    E(i) = sum(s1(i,:) .* s1(i,:));   % short-time energy
end
% Energy regression coefficient and squared regression error
% (x1..x4, E_shimmer and Eratio are not defined in this excerpt of the original listing)
s1 = x2 - x1;   % note: s1 is reused here as a scalar, overwriting the frame matrix above
s2 = x3 - x4;
E_Reg_coff = s1/s2;
x = 0;
for i = 1:nframe1
    t = E(i) - (mean(E) - s1/s2*x4/nframe1) - s1/s2*i;
    x = x + t^2/nframe1;
end
E_Sqr_Err = x;
feature(1:7,1) = [max(E); min(E); mean(E); var(E); E_shimmer; E_Reg_coff; E_Sqr_Err];
feature(8,1)   = Eratio;
end
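Note on running the listing: enframe comes from the VoiceBox speech-processing toolbox, and wavread has been removed from recent MATLAB releases, where audioread is its replacement. The variables x1–x4, E_shimmer and Eratio are not defined in the excerpt above, so featvector will not run as shown without the missing parts of the program.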

III. Running results