Note: multi-label classification problem

Machine learning multi-label classification


Recently I ran into a labelling problem: assigning multiple labels to a single object. I searched around online; Baidu turned up little of use, and I eventually found reliable material through Google. Here is a summary of the multi-label problem.

Multi-label methods can be roughly divided into two categories, namely problem transformation and algorithm adaptation.

Let us describe the problem first: in multi-label classification, each instance x is associated with a set of labels Y ⊆ {1, …, q} rather than a single label, and the goal is to predict that whole set.

The problem transformation methods are introduced first.

Problem transformation methods

The first major category consists of label-based transformation methods.


The first is Binary Relevance (BR).

The data are recast label by label into positive and negative samples: for each of the q labels we train a separate base classifier, giving an overall complexity of q × O(C), where O(C) is the complexity of the base classification algorithm. BR is therefore suited to cases where the number of labels q is relatively small. In many scenarios, however, labels are related in a tree-like hierarchy, and BR does not take the correlations between them into account.
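As a minimal sketch of BR: scikit-learn's `OneVsRestClassifier` performs exactly this per-label decomposition when given a binary indicator label matrix.

```python
# Binary Relevance sketch: one independent binary classifier per label.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data: 100 samples, q = 5 labels; Y is a (100, 5) binary indicator matrix
X, Y = make_multilabel_classification(n_samples=100, n_classes=5, random_state=0)

# One base classifier is trained per label column of Y (5 in total)
br = OneVsRestClassifier(LogisticRegression(max_iter=1000))
br.fit(X, Y)

pred = br.predict(X)  # also a (100, 5) binary indicator matrix
print(pred.shape)
```

Each of the 5 fitted base classifiers is available in `br.estimators_`, which makes the q × O(C) cost explicit.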

The second is Classifier Chain (CC). To address BR's neglect of label correlation, CC connects the base classifiers Cj, j = 1 … q, in series to form a chain, with the output of each base classifier fed as an extra input to the next one.
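A minimal CC sketch, using scikit-learn's `ClassifierChain` with a fixed chain order:

```python
# Classifier Chain sketch: base classifiers are linked so that the
# predictions for earlier labels are appended to the features of later ones.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=100, n_classes=5, random_state=0)

# order=[0, 1, 2, 3, 4]: classifier j sees X plus the labels 0 .. j-1
cc = ClassifierChain(LogisticRegression(max_iter=1000), order=[0, 1, 2, 3, 4])
cc.fit(X, Y)

pred = cc.predict(X)
print(pred.shape)
```

Note that the chain order matters: errors made early in the chain propagate to later classifiers, which is why ensembles of chains with random orders are often used in practice.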

The second category consists of instance-based transformation methods.


The first is Label Powerset (LP), which treats every distinct combination of labels that appears in the training set as one new class of an ordinary multi-class problem.



This comes at the cost of an increased number of classes, some of which have only a few instances, but LP has the advantage that correlations between labels are captured naturally.
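A hand-rolled LP sketch, where the mapping between label sets and single class ids is done manually for illustration:

```python
# Label Powerset sketch: each distinct label *combination* becomes one
# class of an ordinary multi-class problem.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression

X, Y = make_multilabel_classification(n_samples=100, n_classes=5, random_state=0)

# Encode each row of the binary label matrix as a single integer class id
combos, y_lp = np.unique(Y, axis=0, return_inverse=True)
print(len(combos))  # number of distinct label sets actually observed

clf = LogisticRegression(max_iter=1000).fit(X, y_lp)

# Decode: predicted class id -> original label vector
Y_pred = combos[clf.predict(X)]
print(Y_pred.shape)
```

The number of distinct combinations printed above is exactly the "increased number of classes" mentioned: in the worst case it can grow as 2^q, and combinations unseen in training can never be predicted.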

The second is to decompose instances that carry multiple labels into several single-label copies.



The idea in the diagram above is that we can reuse the training data several times, an approach called cross-training. That is, instance E1 in the figure is treated both as a positive sample when training the y2 classifier and as a positive sample when training the y3 classifier, which amounts to much the same thing as the Binary Relevance (BR) algorithm.

Algorithm adaptation methods

Algorithm adaptation modifies a particular algorithm so that it handles multiple labels directly; a list of such classifiers can be found at Scikit.ml/API/Classif… .

Neural networks

Here we introduce a neural network algorithm from the paper Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization.

It is a simple deep network:



However, attention must be paid to the choice of loss function. Suppose we choose an ordinary per-label loss, scoring each output against its 0/1 target independently.



That is equivalent to considering each label value in isolation, 0 or 1, without considering the correlations between different labels, so we change the loss to the following pairwise form:

Above, k ranges over the subscripts of the relevant (positive) labels and l over the irrelevant ones; the loss pushes the output for every relevant label to be larger than the output for every irrelevant label.
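Assuming the pairwise exponential loss from the BP-MLL paper, a single-sample sketch looks like this (`bpmll_loss` is an illustrative name, not from any library):

```python
# Sketch of the BP-MLL pairwise exponential loss for one sample:
# the output c is penalised whenever the score of a relevant label k
# is not well above the score of an irrelevant label l.
import numpy as np

def bpmll_loss(c, y):
    """c: real-valued network outputs per label; y: binary label vector."""
    pos = np.where(y == 1)[0]  # subscripts k of relevant labels
    neg = np.where(y == 0)[0]  # subscripts l of irrelevant labels
    # exp(-(c_k - c_l)) over all (k, l) pairs, normalised by |Y| * |Y-bar|
    diffs = c[pos][:, None] - c[neg][None, :]
    return np.exp(-diffs).sum() / (len(pos) * len(neg))

c = np.array([2.0, -1.0, 0.5, -2.0])
y = np.array([1, 0, 1, 0])
loss = bpmll_loss(c, y)
print(round(loss, 4))
```

Widening the margin between positive and negative scores drives the loss toward zero, which is exactly the ranking behaviour described above.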

Finally, we introduce a newer neural network model, from Learning Deep Latent Spaces for Multi-Label Classification (C2AE).

Its model is as follows:



Here Fx, Fe and Fd are three DNNs, responsible for feature extraction, label encoding and latent-vector decoding respectively, and the loss function consists of two parts:

Embedding loss is:
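As a sketch in the paper's notation (reconstructed here, since the original formula is a figure): the embedding loss aligns the feature encoding with the label encoding in the shared latent space, under an orthogonality constraint.

```latex
\Phi(F_x, F_e) = \left\lVert F_x(X) - F_e(Y) \right\rVert_F^2
\quad \text{s.t.} \quad
F_x(X)\,F_x(X)^{\top} = F_e(Y)\,F_e(Y)^{\top} = I
```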



The output loss is as follows:

As can be seen, this is the same as the loss function in Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization.

If the paper is still hard to follow, fortunately there is an implementation of it available online; see C2AE-multilabel-Classification.

Conclusion

This post has given a brief introduction to the multi-label problem. It reminds me that many scenarios, such as image classification and video content recognition, are multi-label problems, and I will keep deepening my understanding of them as time allows.

Your encouragement is my motivation to continue writing, and I look forward to our common progress.

References

- Learning Deep Latent Spaces for Multi-Label Classification
- Xu Zhaogui, Multi-Label Machine Learning and its Application to Semantic Scene Classification