• Implementing SVM and Kernel SVM with Python’s Scikit-learn
  • Usman Malik
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: rockyzhengwu
  • Proofreader: Zhusimaji, TrWestdoor

A support vector machine (SVM) is a supervised classification algorithm. SVMs were first proposed in the 1960s and further developed in the 1990s, but they have only recently become particularly popular, thanks to their ability to achieve excellent results. Compared with other machine learning algorithms, SVMs have some unique characteristics.

This article begins with a brief introduction to the theory behind support vector machines and shows how to implement them using Python's Scikit-learn library. We will then move on to more advanced SVM theory, namely kernel SVM, and again put it into practice with Scikit-learn.

Simple SVM

Consider the two-dimensional, linearly separable data shown in Figure 1. A typical machine learning algorithm tries to find a decision boundary that minimizes the classification error. If you look closely at Figure 1, you will notice that there is no unique boundary that classifies the data points correctly: the two dashed lines and the solid line all classify every point correctly.

Figure 1: Multiple decision boundaries

SVM chooses the boundary by maximizing the minimum distance between the decision boundary and the data points of every class, and this is the main difference between SVM and other algorithms. SVM doesn't just find a decision boundary; it finds the optimal decision boundary.

The optimal decision boundary is the one that maximizes the minimum distance from every class to the boundary. As shown in Figure 2, the points closest to the decision boundary are called support vectors. In support vector machines, the decision boundary is called the maximum margin classifier, or maximum margin hyperplane.

Figure 2: Support vectors for decision boundaries
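
For the curious, the optimization problem behind this picture can be stated compactly. This is the standard hard-margin formulation, shown here only for reference; it is not needed to follow the rest of the tutorial:

\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all training points } (x_i, y_i)

Since the margin width works out to 2 / \lVert w \rVert, minimizing \lVert w \rVert is the same as maximizing the margin.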

Finding the support vectors, computing the margin between the decision boundary and the support vectors, and maximizing that margin involve complex mathematics. This tutorial will not go into those mathematical details; we will simply see how to implement SVM and kernel SVM using Python's Scikit-learn library.

Implementing SVM with Scikit-learn

We will use the same data as in the decision tree tutorial.

Our task is to predict whether a bank note is genuine or forged from four attributes: the variance of the wavelet-transformed image, the skewness of the wavelet-transformed image, the kurtosis of the wavelet-transformed image, and the entropy of the image. This is a binary classification problem, and we will use SVM to solve it. The rest is a standard machine learning process.

Import libraries

The following code imports all the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Import data

The data can be downloaded from the following link:

Drive.google.com/file/d/13nw…

Details of the data can be found in the following links:

Archive.ics.uci.edu/ml/datasets…

Download the data from the Google Drive link and save it locally. In this example, the CSV file is saved in a "Datasets" folder on drive D of my Windows computer. The following code reads the data from that path; modify it to match the file's location on your own computer.

The easiest way to read a CSV file is with the read_csv method in pandas. The following code reads the bank note data into a pandas dataframe:

bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")

Exploratory data analysis

Python's libraries offer almost limitless ways to analyze data. For simplicity, we will just check the dimensions of the data and look at the first few records. To see the number of rows and columns, execute the following statement:

bankdata.shape

You’ll see that the output is (1372, 5). This means that the dataset has 1372 rows and 5 columns.

To get an idea of what the data looks like, run the following command:

bankdata.head()

The output (the first five rows of the dataframe) is as follows:

You can see that all of the attributes are numeric, and the class label is also numeric (0 and 1).
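
If you also want to see how the two classes are distributed, an optional quick check with pandas is:

bankdata['Class'].value_counts()

This prints how many genuine (0) and forged (1) notes the dataset contains.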

Data preprocessing

Data preprocessing involves (1) separating the attributes from the class labels and (2) splitting the data into training and test sets.

To separate the attribute from the category tag, execute the following code:

X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

The first line of the code above removes the class label column "Class" from the bankdata dataframe and assigns the result to the variable X. The drop() function drops the specified column.

The second line stores only the class column in the variable y. Now the variable X contains all the attributes, while y contains the corresponding class labels.

Now that the attributes and class labels are separated, the final preprocessing step is to split the data into training and test sets. Fortunately, the model_selection module of Scikit-learn provides the train_test_split function, which lets us split the data into training and test parts gracefully.

Execute the following code to perform the split:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
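
Note that train_test_split shuffles the data randomly, so the exact numbers you get below may differ slightly from run to run. If you want a reproducible split, you can pass a fixed random_state (an optional variant, not used in the original code):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)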

Algorithm training

We have divided the data into training and test sets; now we train on the training set. The svm module of the Scikit-learn library implements various SVM algorithms. Since we have a classification task, we will use the support vector classifier, implemented by the SVC class in the svm module. This class takes a parameter specifying the kernel type, and this parameter is very important. Here we consider the simplest SVM and set the kernel parameter to 'linear', since simple (linear) support vector machines only apply to linearly separable data. We introduce nonlinear kernels in the next section.

The training data is passed to the fit method of the SVC class to train the algorithm. Execute the following code:

from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

Making predictions

The predict method of the SVC class is used to predict the classes of new data. The code is as follows:

y_pred = svclassifier.predict(X_test)

Evaluating the algorithm

The confusion matrix, precision, recall, and F1 score are among the most commonly used evaluation metrics for classification tasks. Scikit-learn's metrics module provides the classification_report and confusion_matrix methods to compute these metrics quickly.

Here is the code to calculate the evaluation metrics:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

Results

Here are the results:

[[152   0]
 [  1 122]]
             precision    recall  f1-score   support

          0       0.99      1.00      1.00       152
          1       1.00      0.99      1.00       123

avg / total       1.00      1.00      1.00       275

From the above evaluation results, we can see that SVM slightly outperformed the decision tree: the SVM misclassified only one test instance, compared with four misclassifications for the decision tree.
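
If you prefer a single summary number, you can also compute the accuracy directly; a small optional check using Scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score

# fraction of test notes classified correctly; multiplying the error rate
# (1 - accuracy) by len(y_test) gives the number of misclassified notes
print(accuracy_score(y_test, y_pred))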

Kernel SVM

In the previous section we saw how the simple SVM algorithm finds a decision boundary for linearly separable data. However, when the data is not linearly separable, as in Figure 3, a straight line can no longer serve as the decision boundary.

Figure 3: Non-linearly separable data

For non-linearly separable datasets, the simple SVM algorithm no longer applies. A modified version, called kernel SVM, is used to classify non-linearly separable data.

Fundamentally, kernel SVM maps data that is not linearly separable in a low-dimensional space into a higher-dimensional space where it becomes linearly separable, so that data points of different classes end up in separable regions. Again, complex mathematics is involved, but you don't need to worry about it just to use SVM; kernel SVM is easy to implement with Python's Scikit-learn library.
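
To make this idea concrete, here is a tiny illustrative sketch (not part of this tutorial's pipeline) of how adding a dimension can turn inseparable data into separable data. One-dimensional points labelled by whether they lie inside an interval cannot be split by a single threshold, but after mapping x to (x, x^2) a straight line separates them. Kernel SVM achieves the same effect implicitly, without ever computing the mapping:

import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
labels = (np.abs(x) < 1).astype(int)     # class 1 inside (-1, 1), class 0 outside
X_mapped = np.column_stack([x, x ** 2])  # map each point x to (x, x^2)
# In the mapped 2-D space, the horizontal line "second feature = 1" separates
# the two classes, even though no single threshold on the original line could.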

Implementing kernel SVM with Scikit-learn

Implementing kernel SVM is much the same as implementing the simple SVM. In this part, we use the famous iris dataset to predict the species a plant belongs to based on four attributes: sepal width, sepal length, petal width and petal length.

The data can be accessed from the following link:

Archive.ics.uci.edu/ml/datasets…

The remaining steps are typical machine learning steps and need only brief explanation until we reach the part where we train our kernel SVM.

Import libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Import data

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)

Preprocessing

X = irisdata.drop('Class', axis=1)
y = irisdata['Class']

Training and test set partitioning

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

Algorithm training

We again use the SVC class from Scikit-learn's svm module; the difference lies in the value we pass for SVC's kernel parameter. For the simple SVM we used 'linear' as the kernel type, but for kernel SVM you can use a Gaussian, polynomial, sigmoid, or any other computable kernel. We will implement the polynomial, Gaussian, and sigmoid kernels and test which one performs best.

1. Polynomial kernel

In the case of the polynomial kernel, you also need to pass a degree parameter to the SVC class; this is the degree of the polynomial. The following code implements kernel SVM with a polynomial kernel:

from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=8)
svclassifier.fit(X_train, y_train)
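
Besides degree, Scikit-learn's polynomial kernel also depends on the gamma and coef0 parameters of SVC (the kernel it computes is (gamma * <x, x'> + coef0)^degree), and for high degrees their values can matter. An optional variant with explicitly chosen, untuned values:

svclassifier = SVC(kernel='poly', degree=8, gamma=0.1, coef0=1.0)  # illustrative, untuned values
svclassifier.fit(X_train, y_train)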

Making predictions

Now that we have trained the algorithm, the next step is to make predictions on the test set.

Run the following code to do this:

y_pred = svclassifier.predict(X_test)

Evaluating the algorithm

As usual, the final step is to evaluate the algorithm, here with the polynomial kernel. Run the following code:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

The output of the kernel SVM with the polynomial kernel is as follows:

[[11  0  0]
 [ 0 12  1]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       1.00      0.92      0.96        13
 Iris-virginica       0.86      1.00      0.92         6

    avg / total       0.97      0.97      0.97        30

Now let's repeat the above steps with the Gaussian and sigmoid kernels.

2. Gaussian kernel

Take a look at how we implement kernel SVM with the Gaussian kernel:

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)

To use the Gaussian kernel, you must set the kernel parameter of the SVC class to 'rbf'.
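
For background, the Gaussian (RBF) kernel computes exp(-gamma * ||x - x'||^2), so SVC's gamma parameter controls how far the influence of a single training example reaches; smaller values give smoother decision boundaries. An optional, untuned example:

svclassifier = SVC(kernel='rbf', gamma=0.5)  # illustrative value, not tuned
svclassifier.fit(X_train, y_train)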

Prediction and evaluation

y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

The output with the Gaussian kernel:

[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       1.00      1.00      1.00        13
 Iris-virginica       1.00      1.00      1.00         6

    avg / total       1.00      1.00      1.00        30

3. Sigmoid kernel

Finally, let's implement kernel SVM with the sigmoid kernel. Take a look at the following code:

from sklearn.svm import SVC
svclassifier = SVC(kernel='sigmoid')
svclassifier.fit(X_train, y_train)

To use the sigmoid kernel, set the kernel parameter of the SVC class to 'sigmoid'.

Prediction and evaluation

y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

With the sigmoid kernel, the output is as follows:

[[ 0  0 11]
 [ 0  0 13]
 [ 0  0  6]]
                 precision    recall  f1-score   support

    Iris-setosa       0.00      0.00      0.00        11
Iris-versicolor       0.00      0.00      0.00        13
 Iris-virginica       0.20      1.00      0.33         6

    avg / total       0.04      0.20      0.07        30

Comparing kernel performance

If we compare the three kernels, the sigmoid kernel clearly performs worst. This is because the sigmoid function returns two values, 0 and 1, so the sigmoid kernel is better suited to binary classification problems, whereas our example has three classes.

The Gaussian and polynomial kernels performed comparably: the Gaussian kernel achieved 100% prediction accuracy, while the polynomial kernel misclassified a single instance, so the Gaussian kernel performed slightly better. However, there is no hard and fast rule about which kernel performs best in every situation: it comes down to testing all of them and selecting the one that gives the best results on your test set.
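
One simple way to run such a test is to loop over the kernels on the same train/test split and compare accuracies; a minimal sketch reusing the variables defined above:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Try each kernel on the same split; 'degree' only affects the 'poly' kernel.
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    clf = SVC(kernel=kernel, degree=8)
    clf.fit(X_train, y_train)
    print(kernel, accuracy_score(y_test, clf.predict(X_test)))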

Resources

Want to learn more about Scikit-learn and machine learning algorithms? I recommend checking out further material, such as these online courses:

  • Python for Data Science and Machine Learning Bootcamp
  • Machine Learning A-Z: Hands-On Python & R In Data Science
  • Data Science in Python, Pandas, Scikit-learn, Numpy, Matplotlib

Conclusion

In this article, we studied both the simple SVM and kernel SVM. We looked at the intuition behind the SVM algorithm and saw how to implement it using Python's Scikit-learn library, including with different types of kernels. You may now want to apply these algorithms to real-world data, for example from Kaggle.com.

Finally, I suggest studying the mathematics behind SVM in more detail. Although you don't need that math just to use the SVM algorithm, it is helpful for understanding how the algorithm actually finds decision boundaries.

If you find any mistakes in this translation or other areas that could be improved, you are welcome to revise it and submit a PR through the Nuggets Translation Project, for which you can earn reward points. The permanent link at the beginning of this article is the Markdown link to this article on GitHub.


The Nuggets Translation Project is a community that translates high-quality technical articles from around the Internet, covering Android, iOS, front-end, back-end, blockchain, products, design, artificial intelligence and other fields. For more high-quality translations, please follow the Nuggets Translation Project and its official Weibo and Zhihu column.