Binary classification problems are everywhere in daily life: mail can be divided into spam and non-spam, a person is either sick or not sick. But there are also multi-class problems, such as weather, which can be sunny, cloudy, rainy, snowy, and so on.

Classifiers constructed by algorithms can be divided into binary classifiers and multiclass classifiers: the former distinguish two category labels, the latter more than two. Among common algorithms, SVM and logistic regression are strictly binary classification algorithms, while algorithms such as Naive Bayes and random forest can handle multiclass classification directly. Even so, it is feasible to use binary classifiers to solve multi-class problems. The following takes logistic regression combined with the iris dataset as an example.

OvA and OvO strategies

Using binary classifiers to solve a multi-class problem can follow one of two strategies:

  • The one-versus-all (OvA) strategy, also known as one-versus-rest (OvR): one category against all the rest.
  • The one-versus-one (OvO) strategy. I’m sure some of you have used OvO as a text emoji.

Anyone who has used the iris dataset knows that its label has three categories: setosa, versicolor, and virginica. Since there are three categories, three binary classifiers are constructed; call them the setosa classifier, the versicolor classifier, and the virginica classifier. During training, the samples of one category are treated as the positive class and the samples of all other categories as the negative class. For a sample of unknown category, each of the three classifiers then produces a decision score (probability), and the category whose classifier gives the highest score is taken as the category of that sample. This is the one-versus-all method.

The one-versus-one method constructs a binary classifier between every pair of categories, similar to a round-robin tournament. For example, the three labels above form the pairs setosa vs. versicolor, setosa vs. virginica, and versicolor vs. virginica. If there are N categories, the number of classifiers required is N(N - 1) / 2. For a sample of unknown category, the category with the highest decision score across these classifiers is taken as the final classification.
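To make the pair count concrete, here is a minimal sketch (not from the original article) that enumerates the pairwise classifiers for the three iris labels:

from itertools import combinations

labels = ['setosa', 'versicolor', 'virginica']
# Every unordered pair of categories gets its own binary classifier
pairs = list(combinations(labels, 2))
print(pairs)
'''
[('setosa', 'versicolor'), ('setosa', 'virginica'), ('versicolor', 'virginica')]
'''
print(len(pairs))  # 3, i.e. N * (N - 1) / 2 with N = 3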

From the above introduction, it is easy to summarize the advantages and disadvantages of each:

  • Disadvantage of OvA: because each classifier pits one category against all the others (a 1:N relationship), the training data is imbalanced and training may favor the larger "rest" side.
  • Advantage of OvA: if there are N categories, only N classifiers need to be built.
  • Disadvantage of OvO: if there are many label categories, many binary classifiers have to be built, which makes construction and training cumbersome.
  • Advantage of OvO: each classifier only needs to be trained on the part of the data containing its two categories, not the entire dataset.

The basic idea of both strategies is to solve the multi-class problem by constructing multiple binary classifiers. Most binary classification algorithms work with the OvA strategy, but not all of them; it depends on the characteristics of the dataset. Next, logistic regression is used to model the iris dataset. Since the dataset is relatively simple and this article only covers this method, steps like exploratory analysis are omitted.

Implementing the OvA strategy by hand

I personally prefer to convert datasets into the easier-to-inspect DataFrame format:

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
feature_names = iris.feature_names
dataset_data = pd.DataFrame(iris.data, columns=feature_names)
dataset_target = pd.DataFrame(iris.target, columns=['target'])
data = pd.concat([dataset_data, dataset_target], axis=1)

The dataset consists of 150 samples, with four features and one category label.
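A quick way to confirm this (a small sketch using the data DataFrame built above):

print(data.shape)  # (150, 5): 150 samples, 4 features plus the target column
print(data.head())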

We use one-versus-all (the OvA strategy) combined with logistic regression to solve this multi-class problem. Let's start with the hand-rolled approach: the modeling process is the same as for binary classification, but we need to keep the idea of the OvA strategy in mind.

First of all, we need to split the dataset: 70% of the samples serve as the training set and the remaining 30% as the test set (the way to get the complete code is given at the end of this article). The OvA strategy builds as many classifiers as there are categories, so we need to know all the categories of the target variable, obtained with the unique method, and then store all the classifiers in dictionary format.
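For reference, a minimal sketch of such a split with train_test_split, producing the X_train, X_test, y_train, y_test names the later code assumes (the random_state is arbitrary):

from sklearn.model_selection import train_test_split

X = data[feature_names].values
y = data['target'].values
# 70% of the samples for training, the remaining 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)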

# Get the categories of the target variable
unique_targets = data['target'].unique()
'''
array([0, 1, 2])
'''
# With the OvA strategy, three categories correspond to three models, stored in dictionary format
models = {}

Each classifier treats one category as the positive class and all remaining categories as the negative class, so on each pass of the loop the labels equal to the current target are set to 1 and the other two categories to 0. For code brevity, a pipeline is used to encapsulate the standardization step and each classifier.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Keep the original multi-class labels, since y_train is overwritten below
y_train_copy = y_train.copy()
for target in unique_targets:
    # Encapsulate standardization and the classifier in a pipeline
    models[target] = make_pipeline(StandardScaler(), LogisticRegression())
    y_train_list = y_train_copy.tolist()
    # Relabel the training set each time: the current category becomes 1, every other category becomes 0
    for i in range(len(y_train_list)):
        if y_train_list[i] == target:
            y_train_list[i] = 1
        else:
            y_train_list[i] = 0
    y_train = np.array(y_train_list)

    models[target].fit(X_train, y_train)

After creating the corresponding classifiers, all that remains is to apply them to the test set; the three classifiers yield the predicted probabilities for the three labels.

test_probs = pd.DataFrame(columns=unique_targets)
for target in unique_targets:
    # [:, 1] is the probability of class 1, [:, 0] the probability of class 0
    test_probs[target] = models[target].predict_proba(X_test)[:, 1]
print(test_probs)

The resulting DataFrame of probabilities is as follows; taking the column with the highest probability in each row gives the predicted category:

predicted_target = test_probs.idxmax(axis=1)
'''
0    0
1    0
2    1
3    0
4    2
5    1
......
The model error rate is 6.67%
'''

Finally, the accuracy of the model can be calculated by comparing the predictions with the original labels. That is how the hand-rolled approach uses binary classifiers to solve a multi-class classification problem.
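As a reference, a minimal sketch of that comparison, assuming the y_test array from the split above (3 wrong predictions out of 45 test samples would give the 6.67% error rate):

# Compare predicted categories with the true labels
error_rate = (predicted_target.values != y_test).mean()
print('The model error rate is %.2f%%' % (error_rate * 100))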

Calling sklearn

sklearn also provides classes that implement the OvA and OvO strategies, which is simpler and more convenient than the hand-rolled approach: OneVsRestClassifier and OneVsOneClassifier, respectively.

from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

OvO = OneVsOneClassifier(make_pipeline(StandardScaler(), LogisticRegression()))
OvO.fit(X_train, y_train)
ovo_predict = OvO.predict(X_test)
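OneVsRestClassifier is used in exactly the same way; a minimal sketch mirroring the OvO code above:

OvR = OneVsRestClassifier(make_pipeline(StandardScaler(), LogisticRegression()))
OvR.fit(X_train, y_train)
ovr_predict = OvR.predict(X_test)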

With three categories, both strategies happen to need the same number of binary classifiers: OvA needs N = 3 and OvO needs N(N - 1) / 2 = 3. This can be verified from the fitted model:

print('Category labels are: %s' % OvO.classes_)
print('Number of classifiers: %d' % len(OvO.estimators_))
'''
Category labels are: [0 1 2]
Number of classifiers: 3
'''

The OvR (OvA) strategy is also available through the multi_class parameter of logistic regression, but there is no OvO option there. After all, pushing a strategy through by hand makes its idea easier to understand, so that when we call a class or set a parameter we know what we are doing. In summary, this is how binary classification algorithms can be used to solve multi-class classification problems.
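As a sketch of that parameter (note that recent sklearn versions deprecate multi_class on LogisticRegression in favor of wrapping the estimator in OneVsRestClassifier):

# multi_class='ovr' trains one binary problem per category, like the hand-rolled version above
lr_ovr = LogisticRegression(multi_class='ovr')
lr_ovr.fit(X_train, y_train)
print(lr_ovr.predict(X_test)[:5])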

Reference links: [1]. blog.csdn.net/zm714981790…

Reply with the keyword "multivariate classification" to the public account [milk candy cat] to get the code reference for this article.