
Overview of support vector machines

A Support Vector Machine (SVM) is a generalized linear classifier that classifies data via supervised learning. Its decision boundary is the maximum-margin hyperplane learned from the training samples. Compared with logistic regression and neural networks, support vector machines provide a clearer and often more powerful way to learn complex nonlinear decision functions.

Soft margin, hard margin, and nonlinear SVM

If the data is completely linearly separable, the learned model is called a hard-margin support vector machine. In other words, a hard margin demands perfect classification: no training sample may be misclassified. A soft margin, by contrast, allows a certain number of samples to be misclassified.
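To make the distinction concrete, here is the standard soft-margin objective in textbook notation (a sketch of the usual formulation, not notation taken from this article): each slack variable $\xi_i$ measures how far sample $i$ is allowed to violate the margin, and the coefficient $C$ controls how heavily violations are penalized.

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\xi_i \quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0$$

As $C$ grows, violations become prohibitively expensive; in the limit the soft-margin machine reduces to the hard-margin one.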

The idea of the support vector machine algorithm

Find the samples that lie on the edge of each class's region (these are called support vectors), and use them to determine a separating plane (called the decision plane) such that the distance between the support vectors and the plane is maximized.
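In the standard formulation (a textbook sketch, assuming labels $y_i \in \{-1, +1\}$), maximizing this distance is equivalent to the constrained problem

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1,\qquad i = 1,\dots,N$$

since the margin width works out to $2/\lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.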

Hyperplane background knowledge:
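As a brief sketch of the standard definitions: a hyperplane in $\mathbb{R}^n$ is the set of points $x$ satisfying

$$w^{\top}x + b = 0$$

where $w$ is the normal vector that fixes the orientation and $b$ is the intercept. The distance from a point $x_0$ to the hyperplane is

$$d = \frac{\lvert w^{\top}x_0 + b \rvert}{\lVert w \rVert}$$

and the support vectors are exactly the training points at minimal distance $d$ from the decision plane.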

Linearly separable support vector machine

We visualize a linearly separable data set and fit a linear SVM using the SVM implementation in the sklearn library. The code is as follows:

#coding=gbk
import numpy as np
import pylab as pl
from sklearn import svm

def loadDataSet(fileName):
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():                 # read the file line by line
        lineArr = line.strip().split('\t')
        dataMat.append([float(lineArr[0]), float(lineArr[1])])  # add the data point
        labelMat.append(float(lineArr[2]))                      # add the label
    fr.close()
    return dataMat, labelMat

X, Y = loadDataSet('datasets_testSet.txt')

# fit the model
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# get the separating hyperplane: the generic w_0*x + w_1*y + intercept = 0
# can be rewritten as the 2D line y = -(w_0/w_1)*x - intercept/w_1
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - clf.intercept_[0] / w[1]

# plot the parallels to the separating hyperplane that pass
# through the support vectors
b = clf.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

print("w: ", w)
print("a: ", a)
print("support_vectors_: ", clf.support_vectors_)
print("clf.coef_: ", clf.coef_)

# plot the line, the points, and the nearest vectors to the plane
pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')
pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=80, facecolors='none')
pl.scatter([x[0] for x in X], [x[1] for x in X], c=Y, cmap=pl.cm.Paired)
pl.axis('tight')
pl.show()
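If you do not have the datasets_testSet.txt file, the following sketch generates a comparable linearly separable data set in the tab-separated format (x1, x2, label) that loadDataSet expects. The cluster centers, spread, and the -1/+1 labels are assumptions for illustration, not values from the original article.

import numpy as np

rng = np.random.default_rng(0)
# assumed cluster for class +1
pos = rng.normal(loc=[2.0, 2.0], scale=0.6, size=(20, 2))
# assumed cluster for class -1
neg = rng.normal(loc=[-2.0, -2.0], scale=0.6, size=(20, 2))

# write the points in the tab-separated x1, x2, label layout
with open('datasets_testSet.txt', 'w') as f:
    for x1, x2 in pos:
        f.write('%f\t%f\t1\n' % (x1, x2))
    for x1, x2 in neg:
        f.write('%f\t%f\t-1\n' % (x1, x2))

With this file in place, the script above prints the learned weights and draws the separating line, its two margin parallels, and the support vectors circled.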

The results are as follows: