### 1 Introduction

Dear friends, welcome to Moon Lai Inn. In a previous article [1], the author introduced the cross-entropy loss function, which measures model loss in single-label classification. Common evaluation metrics for multi-class tasks and their implementations were also introduced [2]. This article describes in detail two common loss functions for multi-label classification, together with the evaluation metrics used in the multi-label setting.

### 2 Method 1

In this method, the Softmax operation of the original output layer is replaced by a Sigmoid operation, and the Sigmoid cross entropy between the output layer and the labels is then used as the measure of error. The formula is as follows:

$$loss(y,\hat{y})=-\frac{1}{m}\sum_{i=1}^m\frac{1}{C}\sum_{j=1}^C\left[y^{(i)}_j\cdot\log\left(\frac{1}{1+\exp(-\hat{y}^{(i)}_j)}\right)+\left(1-y^{(i)}_j\right)\cdot\log\left(\frac{\exp(-\hat{y}^{(i)}_j)}{1+\exp(-\hat{y}^{(i)}_j)}\right)\right]\; \; \; \; \; (1)$$

Where $m$ is the number of samples, $C$ is the number of categories, and $y^{(i)}_j$ and $\hat{y}^{(i)}_j$ are the true label and the raw network output (without any activation function) for the $j$-th category of the $i$-th sample, respectively.

Equation $(1)$ shows that this loss is in fact the one used in logistic regression to measure the error between the predicted probability and the true label, applied to each category independently.

#### 2.1 NumPy implementation

According to Equation $(1)$, the loss value can be calculated with the following Python code:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def compute_loss_v1(y_true, y_pred):
        # Element-wise log-likelihood, shape [batch_size, num_class]
        t_loss = y_true * np.log(sigmoid(y_pred)) + \
                 (1 - y_true) * np.log(1 - sigmoid(y_pred))
        loss = t_loss.mean(axis=-1)  # value for each sample
        return -loss.mean()  # mean loss over all samples (or another reduction)

    if __name__ == '__main__':
        y_true = np.array([[1, 1, 0, 0], [0, 1, 0, 1]])
        y_pred = np.array([[0.2, 0.5, 0, 0], [0.1, 0.5, 0, 0.8]])
        print(compute_loss_v1(y_true, y_pred))  # 0.5926
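One practical caveat: `compute_loss_v1` evaluates `np.log(sigmoid(y_pred))` directly, which can produce `log(0)` when a logit is very large in magnitude. The following is a minimal sketch of a numerically safer variant, assuming the algebraically equivalent per-element form `max(x, 0) - x*y + log(1 + exp(-|x|))`. The name `stable_sigmoid_ce` is hypothetical and not part of the original code.

```python
import numpy as np

def stable_sigmoid_ce(y_true, logits):
    # Hypothetical helper: per-element loss max(x, 0) - x*y + log(1 + exp(-|x|)),
    # which is algebraically equal to the sigmoid cross entropy in Equation (1).
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(y_true, dtype=np.float64)
    per_element = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    per_sample = per_element.mean(axis=-1)  # average over categories
    return per_sample.mean()                # average over samples

y_true = np.array([[1, 1, 0, 0], [0, 1, 0, 1]])
y_pred = np.array([[0.2, 0.5, 0, 0], [0.1, 0.5, 0, 0.8]])
print(stable_sigmoid_ce(y_true, y_pred))  # 0.59265...

# Very large logits stay finite instead of producing inf or nan
extreme = stable_sigmoid_ce(np.array([[1, 0]]), np.array([[1000.0, -1000.0]]))
print(extreme)  # 0.0
```

On the example data this agrees with the direct implementation, so it can serve as a drop-in check when logits are unbounded.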

This loss is also available in both TensorFlow 1.x and PyTorch.

#### 2.2 TensorFlow implementation

In TensorFlow 1.x, it can be computed with the `sigmoid_cross_entropy_with_logits` method in the `tf.nn` module:

    import tensorflow as tf  # TensorFlow 1.x

    def sigmoid_cross_entropy_with_logits(labels, logits):
        loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
        loss = tf.reduce_mean(loss, axis=-1)  # mean over categories
        return tf.reduce_mean(loss)  # mean over samples

    if __name__ == '__main__':
        y_true = tf.constant([[1, 1, 0, 0], [0, 1, 0, 1]], dtype=tf.float16)
        y_pred = tf.constant([[0.2, 0.5, 0, 0], [0.1, 0.5, 0, 0.8]], dtype=tf.float16)
        loss = sigmoid_cross_entropy_with_logits(y_true, y_pred)
        with tf.Session() as sess:
            print(sess.run(loss))  # 0.5926

After model training is complete, the predicted labels and their corresponding scores can be obtained with the following code (pass Sigmoid-activated outputs as `logits` if probabilities are required):

    def prediction(logits, K):
        y_pred = np.argsort(-logits, axis=-1)[:, :K]  # indices of the top-K labels
        print("Predicted labels:", y_pred)
        p = np.vstack([logits[r, c] for r, c in enumerate(y_pred)])
        print("Predicted probabilities:", p)

    # y_pred is the NumPy array of model outputs from Section 2.1
    prediction(y_pred, 2)
    #####
    # Predicted labels: [[1 0]
    #  [3 1]]
    # Predicted probabilities: [[0.5 0.2]
    #  [0.8 0.5]]
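Top-K selection fixes the number of labels returned for every sample. A common alternative, shown here only as a hedged sketch, is to apply Sigmoid to the logits and keep every label whose probability exceeds a threshold. The function name `predict_by_threshold` and the threshold of 0.5 are assumptions for illustration and do not come from the original article.

```python
import numpy as np

def predict_by_threshold(logits, threshold=0.5):
    # Hypothetical helper: turn raw logits into a multi-hot prediction.
    probs = 1 / (1 + np.exp(-logits))               # Sigmoid probabilities
    return (probs > threshold).astype(int), probs   # labels strictly above the threshold

logits = np.array([[0.2, 0.5, 0, 0], [0.1, 0.5, 0, 0.8]])
labels, probs = predict_by_threshold(logits)
print(labels)
# [[1 1 0 0]
#  [1 1 0 1]]
```

A multi-hot output of this form has the same layout as the `y_pred` arrays used by the evaluation metrics in Section 4.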

#### 2.3 PyTorch implementation

In PyTorch, the loss can be calculated with the `MultiLabelSoftMarginLoss` class in the `torch.nn` module:

    import torch
    import torch.nn as nn

    if __name__ == '__main__':
        y_true = torch.tensor([[1, 1, 0, 0], [0, 1, 0, 1]], dtype=torch.int16)
        y_pred = torch.tensor([[0.2, 0.5, 0, 0], [0.1, 0.5, 0, 0.8]], dtype=torch.float32)
        loss = nn.MultiLabelSoftMarginLoss(reduction='mean')
        print(loss(y_pred, y_true))  # 0.5926

Similarly, after model training is complete, the `prediction` function above can be used for inference. Note that in TensorFlow 1.x the `tf.nn.sigmoid_cross_entropy_with_logits` method returns the element-wise loss with the same shape as `logits`, which is why the wrapper above averages it explicitly. In PyTorch, `MultiLabelSoftMarginLoss` returns the mean loss of all samples by default, and the `reduction` parameter (`'none'`, `'mean'` or `'sum'`) selects what is returned.

### 3 Method 2

Besides Method 1 above, there is another commonly used loss function for multi-label classification. It is an extension of the cross-entropy loss used in single-label classification, which can be regarded as a special case of it. The formula is as follows:

$$loss(y,\hat{y})=-\frac{1}{m}\sum_{i=1}^m\sum_{j=1}^qy^{(i)}_j\log{\hat{y}^{(i)}_j}\; \; \; \; \; \; \; \; \; \; (2)$$

Where $m$ is the number of samples, $q$ is the number of categories, $y^{(i)}_j$ denotes the true value of the $j$-th category of the $i$-th sample, and $\hat{y}^{(i)}_j$ denotes the Softmax output for the $j$-th category of the $i$-th sample.
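To see why single-label classification is a special case, note that a one-hot label leaves only one non-zero term in Equation $(2)$, so the loss reduces to the familiar negative log of the probability assigned to the true class. The sketch below checks this numerically. The helper names are hypothetical, and the max-shift inside `softmax` is an added stability measure that does not change the result.

```python
import numpy as np

def softmax(x):
    s = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for numerical stability
    return s / s.sum(axis=-1, keepdims=True)

def multilabel_softmax_ce(logits, y):
    # Equation (2): sum over categories, mean over samples
    return -(y * np.log(softmax(logits))).sum(axis=-1).mean()

logits = np.array([[0.2, 0.5, 0.1, 0.0]])
one_hot = np.array([[0, 1, 0, 0]])           # single true label: category index 1

general = multilabel_softmax_ce(logits, one_hot)
single = -np.log(softmax(logits)[0, 1])      # standard single-label cross entropy
print(np.isclose(general, single))  # True
```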

For example, for the following sample:

    y_true = np.array([[1, 1, 0, 0], [0, 1, 0, 1]])
    y_pred = np.array([[0.2, 0.5, 0.1, 0], [0.1, 0.5, 0, 0.8]])

The output value processed by Softmax is as follows:

[[0.24549354 0.33138161 0.22213174 0.20099311]
[0.18482871 0.27573204 0.16723993 0.37219932]]

Then, according to Equation $(2)$, the loss of the above two samples is:

$$\begin{aligned}loss &= -\frac{1}{2}\left(1\cdot\log(0.245)+1\cdot\log(0.331)+1\cdot\log(0.275)+1\cdot\log(0.372)\right)\\ &\approx 2.39\end{aligned}\; \; \; \; \; \; \; \; \; \; (3)$$

#### 3.1 NumPy implementation

According to Equation $(2)$, the loss value can be calculated with the following Python code:

    import numpy as np

    def softmax(x):
        s = np.exp(x)
        return s / np.sum(s, axis=-1, keepdims=True)

    def compute_loss_v2(logits, y):
        logits = softmax(logits)
        print(logits)
        c = -(y * np.log(logits)).sum(axis=-1)  # loss of each sample
        return np.mean(c)  # average loss over all samples

    y_true = np.array([[1, 1, 0, 0], [0, 1, 0, 1]])
    y_pred = np.array([[0.2, 0.5, 0.1, 0], [0.1, 0.5, 0, 0.8]])
    print(compute_loss_v2(y_pred, y_true))  # 2.392

#### 3.2 TensorFlow implementation

In TensorFlow 1.x, it can be computed with the `softmax_cross_entropy_with_logits_v2` method in the `tf.nn` module:

    import tensorflow as tf  # TensorFlow 1.x

    def softmax_cross_entropy_with_logits(labels, logits):
        loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
        return tf.reduce_mean(loss)

    y_true = tf.constant([[1, 1, 0, 0], [0, 1, 0, 1.]], dtype=tf.float16)
    y_pred = tf.constant([[0.2, 0.5, 0.1, 0], [0.1, 0.5, 0, 0.8]], dtype=tf.float16)
    loss = softmax_cross_entropy_with_logits(y_true, y_pred)
    with tf.Session() as sess:
        print(sess.run(loss))  # 2.395

#### 3.3 PyTorch implementation

In PyTorch, the author has not yet found a ready-made module for this loss, but it is easy to implement by hand:

    import torch

    def cross_entropy(logits, y):
        s = torch.exp(logits)
        logits = s / torch.sum(s, dim=1, keepdim=True)  # Softmax
        c = -(y * torch.log(logits)).sum(dim=-1)  # loss of each sample
        return torch.mean(c)

    y_true = torch.tensor([[1, 1, 0, 0], [0, 1, 0, 1]])
    y_pred = torch.tensor([[0.2, 0.5, 0.1, 0], [0.1, 0.5, 0, 0.8]])
    loss = cross_entropy(y_pred, y_true)
    print(loss)  # 2.392

Note that the final results may differ slightly in the last decimal places, because each framework handles floating-point precision differently (for instance, the TensorFlow example above uses float16).
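The effect of numerical precision can be reproduced with NumPy alone. The sketch below evaluates Equation $(2)$ on the example above once in float64 and once in float16. The helper name `softmax_ce` is hypothetical, and the exact float16 result may vary with platform and library version, so no specific value is claimed for it.

```python
import numpy as np

def softmax_ce(logits, y):
    # Equation (2) evaluated in whatever dtype the inputs carry
    s = np.exp(logits)
    p = s / s.sum(axis=-1, keepdims=True)
    return -(y * np.log(p)).sum(axis=-1).mean()

y_true = [[1, 1, 0, 0], [0, 1, 0, 1]]
y_pred = [[0.2, 0.5, 0.1, 0], [0.1, 0.5, 0, 0.8]]

loss64 = float(softmax_ce(np.array(y_pred, dtype=np.float64),
                          np.array(y_true, dtype=np.float64)))
loss16 = float(softmax_ce(np.array(y_pred, dtype=np.float16),
                          np.array(y_true, dtype=np.float16)))

print(loss64)                # 2.3928...
print(loss16)                # close to loss64, but carries far fewer significant digits
print(abs(loss16 - loss64))  # small discrepancy caused by rounding
```

Differences in the third decimal place, such as 2.392 versus 2.395, are therefore within what reduced precision alone can explain.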

### 4 Evaluation Indicators

#### 4.1 Evaluation methods that ignore partial correctness

(1) Exact Match Ratio

The exact match ratio counts a prediction as correct only when, for a given sample, the predicted labels are exactly the same as the true labels. In other words, a single wrongly predicted category makes the whole sample incorrect. It is calculated with the following formula:

$$MR=\frac{1}{m}\sum_{i=1}^mI(y^{(i)}==\hat{y}^{(i)})\; \; \; \; \; \; \; \; \; \; (4)$$

Where $m$ is the total number of samples and $I(\cdot)$ is the indicator function, which equals $1$ if $y^{(i)}$ is identical to $\hat{y}^{(i)}$ and $0$ otherwise. The larger the MR value, the higher the classification accuracy.

For example, we have the following real and predicted values:

    y_true = np.array([[0, 1, 0, 1],
                       [0, 1, 1, 0],
                       [1, 0, 1, 1]])

    y_pred = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 1, 0, 1]])

Then the corresponding MR is $0.333$, because only the second sample is predicted completely correctly. In sklearn, the `accuracy_score` method in the `sklearn.metrics` module computes this directly [3], as shown below:

    from sklearn.metrics import accuracy_score

    print(accuracy_score(y_true, y_pred))  # 0.33333333

(2) 0-1 loss

Besides the exact match ratio, there is another metric whose calculation is exactly the opposite, namely the zero-one loss (0-1 loss). The exact match ratio is the proportion of samples whose predictions are completely correct, whereas the 0-1 loss is the proportion of samples whose predictions are not completely correct, that is, samples with at least one wrongly predicted label. Therefore, for the true and predicted values above, the 0-1 loss is $0.667$. The formula is as follows:

$$L_{0-1}=\frac{1}{m}\sum_{i=1}^mI(y^{(i)}\neq\hat{y}^{(i)})\; \; \; \; \; \; \; \; \; \; (5)$$

In sklearn, the calculation can be done by using the zero_one_loss method in the sklearn.metrics module [3], as shown below:

    from sklearn.metrics import zero_one_loss

    print(zero_one_loss(y_true, y_pred))  # 0.66666
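Equations $(4)$ and $(5)$ are simple enough to verify by hand. The following is a minimal NumPy sketch (the function names are hypothetical) showing that a sample counts as correct only when every label matches, and that the two metrics always sum to 1.

```python
import numpy as np

def exact_match_ratio(y_true, y_pred):
    # A row is correct only if all of its labels match: Equation (4)
    return np.all(y_true == y_pred, axis=1).mean()

def zero_one_loss_manual(y_true, y_pred):
    # A row is wrong if at least one label differs: Equation (5)
    return np.any(y_true != y_pred, axis=1).mean()

y_true = np.array([[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]])
y_pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]])

mr = exact_match_ratio(y_true, y_pred)
l01 = zero_one_loss_manual(y_true, y_pred)
print(mr, l01)  # 0.3333... 0.6666...
```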

#### 4.2 Evaluation methods that consider partial correctness

The two metrics above show that neither the exact match ratio nor the 0-1 loss takes partially correct predictions into account, which is too coarse for evaluating a model. For example, suppose the correct label is [1,0,0,1] and the predicted label is [1,0,1,0]. Although the model did not predict all labels correctly, it did get some of them right. It is therefore advisable to take the partial correctness of predictions into account [4]. To realize this idea, definitions of Accuracy, Precision, Recall and the $F_1$ value ($F_1$-measure) for the multi-label setting were proposed in [5].

(1) Accuracy

For accuracy, its calculation formula is as follows:

$$\text{Accuracy} = \frac{1}{m} \sum_{i=1}^{m} \frac{\lvert y^{(i)} \cap \hat{y}^{(i)}\rvert}{\lvert y^{(i)} \cup \hat{y}^{(i)}\rvert}\; \; \; \; \; \; \; \; \; \; (6)$$

Equation $(6)$ shows that accuracy is the average of the per-sample accuracies. For each sample, accuracy is the number of correctly predicted labels divided by the number of labels that are either truly positive or predicted positive. For example, for a sample whose true label is [0, 1, 0, 1] and whose predicted label is [0, 1, 1, 0], the accuracy is:

$$acc = \frac{1}{1+1+1}=\frac{1}{3}\; \; \; \; \; \; \; \; \; \; (7)$$

Therefore, for the following real and predicted results:

    y_true = np.array([[0, 1, 0, 1],
                       [0, 1, 1, 0],
                       [1, 0, 1, 1]])

    y_pred = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 1, 0, 1]])

Its accuracy is:

$$\text{Accuracy} = \frac{1}{3} \times \left(\frac{1}{3} + \frac{2}{2} + \frac{1}{4}\right) \approx 0.5278\; \; \; \; \; \; \; \; \; \; (8)$$

The corresponding implementation code is [6]:

    def Accuracy(y_true, y_pred):
        count = 0
        for i in range(y_true.shape[0]):
            p = sum(np.logical_and(y_true[i], y_pred[i]))  # size of the intersection
            q = sum(np.logical_or(y_true[i], y_pred[i]))  # size of the union
            count += p / q
        return count / y_true.shape[0]

    print(Accuracy(y_true, y_pred))  # 0.52777
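The loop in `Accuracy` divides by zero when a sample has neither true nor predicted labels. A hedged, vectorized sketch that handles this case follows. Treating an empty union as a perfect score of 1 is an assumed convention, and the function name `accuracy_vectorized` is hypothetical.

```python
import numpy as np

def accuracy_vectorized(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    inter = np.logical_and(y_true, y_pred).sum(axis=1)  # |y ∩ ŷ| per sample
    union = np.logical_or(y_true, y_pred).sum(axis=1)   # |y ∪ ŷ| per sample
    # Assumed convention: an empty union (no true and no predicted labels) scores 1
    scores = np.where(union == 0, 1.0, inter / np.maximum(union, 1))
    return scores.mean()

y_true = np.array([[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]])
y_pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]])
result = accuracy_vectorized(y_true, y_pred)
print(result)  # 0.52777...
```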

(2) Precision

For precision, its calculation formula is as follows:

$$\text{Precision} = \frac{1}{m} \sum_{i=1}^{m} \frac{\lvert y^{(i)} \cap \hat{y}^{(i)}\rvert}{\lvert \hat{y}^{(i)}\rvert}\; \; \; \; \; \; \; \; \; \; (9)$$

Equation $(9)$ shows that precision is the average of the per-sample precisions. For each sample, precision is the proportion of correctly predicted labels among all labels predicted as positive. For example, for a sample whose true label is [0, 1, 0, 1] and whose predicted label is [0, 1, 1, 0], the precision is:

$$\text{pre} = \frac{1}{1+1}=\frac{1}{2}\; \; \; \; \; \; \; \; \; \; (10)$$

Therefore, for the true and predicted results above, the precision is:

$$\text{Precision} = \frac{1}{3} \times \left(\frac{1}{2} + \frac{2}{2} + \frac{1}{2}\right) \approx 0.6667\; \; \; \; \; \; \; \; \; \; (11)$$

The corresponding implementation code is:

    def Precision(y_true, y_pred):
        count = 0
        for i in range(y_true.shape[0]):
            if sum(y_pred[i]) == 0:  # no predicted labels: contributes 0
                continue
            count += sum(np.logical_and(y_true[i], y_pred[i])) / sum(y_pred[i])
        return count / y_true.shape[0]

    print(Precision(y_true, y_pred))  # 0.6666

(3) Recall

For recall rate, its calculation formula is as follows:

$$\text{Recall} = \frac{1}{m} \sum_{i=1}^{m} \frac{\lvert y^{(i)} \cap \hat{y}^{(i)}\rvert}{\lvert y^{(i)}\rvert} \; \; \; \; \; \; \; \; \; \; (12)$$

Equation $(12)$ shows that recall is the average of the per-sample recalls. For each sample, recall is the proportion of correctly predicted labels among all true labels.

Therefore, for the following real and predicted results:

    y_true = np.array([[0, 1, 0, 1],
                       [0, 1, 1, 0],
                       [1, 0, 1, 1]])

    y_pred = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 1, 0, 1]])

Its recall is:

$$\text{Recall} = \frac{1}{3} \times \left(\frac{1}{2} + \frac{2}{2} + \frac{1}{3}\right) \approx 0.6111\; \; \; \; \; \; \; \; \; \; (13)$$

The corresponding implementation code is:

    def Recall(y_true, y_pred):
        count = 0
        for i in range(y_true.shape[0]):
            if sum(y_true[i]) == 0:  # no true labels: contributes 0
                continue
            count += sum(np.logical_and(y_true[i], y_pred[i])) / sum(y_true[i])
        return count / y_true.shape[0]

    print(Recall(y_true, y_pred))  # 0.6111

(4) $F_1$ value

For the $F_1$ value, the calculation formula is:

$$F_{1} = \frac{1}{m} \sum_{i=1}^{m} \frac{2 \lvert y^{(i)} \cap \hat{y}^{(i)}\rvert}{\lvert y^{(i)}\rvert + \lvert \hat{y}^{(i)}\rvert} \; \; \; \; \; \; \; \; \; \; (14)$$

Equation $(14)$ shows that $F_1$ is likewise the average of the per-sample $F_1$ values. Therefore, for the true and predicted results above, the value of $F_1$ is:

$$F_1 = \frac{2}{3} \left(\frac{1}{4} + \frac{2}{4} + \frac{1}{5}\right) \approx 0.6333\; \; \; \; \; \; \; \; \; \; (15)$$

The corresponding implementation code is:

    def F1Measure(y_true, y_pred):
        count = 0
        for i in range(y_true.shape[0]):
            if (sum(y_true[i]) == 0) and (sum(y_pred[i]) == 0):
                continue
            p = sum(np.logical_and(y_true[i], y_pred[i]))
            q = sum(y_true[i]) + sum(y_pred[i])
            count += (2 * p) / q
        return count / y_true.shape[0]

    print(F1Measure(y_true, y_pred))  # 0.6333

For all four metrics above, a larger value indicates better classification performance. Equations $(6)$, $(9)$, $(12)$ and $(14)$ also show that, although the calculation steps in the multi-label setting differ from those in the single-label setting, the underlying ideas are similar.
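Because Equations $(6)$, $(9)$, $(12)$ and $(14)$ are written in terms of set intersection and union, they can also be sketched with Python sets of label indices. The code below is an illustrative, dependency-free version in which all function names are hypothetical. A sample with an empty denominator contributes 0, which mirrors the `continue` branches in the Precision, Recall and $F_1$ implementations above.

```python
def to_sets(rows):
    # Convert each multi-hot row into the set of its active label indices
    return [{j for j, v in enumerate(row) if v} for row in rows]

def safe_div(a, b):
    return a / b if b else 0.0

def sample_average(y_true, y_pred, score):
    pairs = list(zip(to_sets(y_true), to_sets(y_pred)))
    return sum(score(t, p) for t, p in pairs) / len(pairs)

metrics = {
    "accuracy":  lambda t, p: safe_div(len(t & p), len(t | p)),         # Equation (6)
    "precision": lambda t, p: safe_div(len(t & p), len(p)),             # Equation (9)
    "recall":    lambda t, p: safe_div(len(t & p), len(t)),             # Equation (12)
    "f1":        lambda t, p: safe_div(2 * len(t & p), len(t) + len(p)),  # Equation (14)
}

y_true = [[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]]
y_pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]]

results = {name: sample_average(y_true, y_pred, fn) for name, fn in metrics.items()}
for name, value in results.items():
    print(name, round(value, 4))
# accuracy 0.5278
# precision 0.6667
# recall 0.6111
# f1 0.6333
```

Only the denominator changes between the four metrics, which makes the shared idea behind them explicit.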

The last three metrics can also be computed directly with sklearn. The code is as follows:

    from sklearn.metrics import precision_score, recall_score, f1_score

    print(precision_score(y_true=y_true, y_pred=y_pred, average='samples'))  # 0.6666
    print(recall_score(y_true=y_true, y_pred=y_pred, average='samples'))  # 0.6111
    print(f1_score(y_true, y_pred, average='samples'))  # 0.6333

(5) Hamming Loss

In addition to the six evaluation methods introduced so far, another, more intuitive metric is Hamming Loss [3]. Its calculation formula is as follows:

$$\text{Hamming Loss} = \frac{1}{m q} \sum_{i=1}^{m}\sum_{j=1}^{q} I\left( y^{(i)}_{j} \neq \hat{y}^{(i)}_{j} \right) \; \; \; \; \; \; \; \; \; \; (16)$$

Where $y^{(i)}_j$ represents the $j$-th label of the $i$-th sample, and $q$ represents the total number of categories.

Equation $(16)$ shows that Hamming Loss measures the proportion of wrongly predicted labels among all labels of all samples. The smaller the Hamming Loss, the better the model performs. Therefore, for the following real and predicted results:

    y_true = np.array([[0, 1, 0, 1],
                       [0, 1, 1, 0],
                       [1, 0, 1, 1]])

    y_pred = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 1, 0, 1]])

Its Hamming Loss is:

$$\text{Hamming Loss} = \frac{1}{3 \times 4}(2 + 0 + 3) \approx 0.4167\; \; \; \; \; \; \; \; \; \; (17)$$

The corresponding implementation code is:

    def Hamming_Loss(y_true, y_pred):
        count = 0
        for i in range(y_true.shape[0]):
            p = np.size(y_true[i] == y_pred[i])  # number of labels in the sample
            q = np.count_nonzero(y_true[i] == y_pred[i])  # number of matching labels
            count += p - q  # number of mismatched labels
        return count / (y_true.shape[0] * y_true.shape[1])

    print(Hamming_Loss(y_true, y_pred))  # 0.4166

This can also be done using the `hamming_loss` method in `sklearn.metrics`:

    from sklearn.metrics import hamming_loss

    print(hamming_loss(y_true, y_pred))  # 0.4166
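Since Equation $(16)$ is just the fraction of mismatched entries in the label matrix, it also collapses to a single NumPy expression. A minimal sketch, using the hypothetical name `hamming_loss_vectorized`:

```python
import numpy as np

def hamming_loss_vectorized(y_true, y_pred):
    # Mean of the element-wise mismatch indicator over all m * q entries
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

y_true = np.array([[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]])
y_pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]])
hl = hamming_loss_vectorized(y_true, y_pred)
print(hl)  # 0.41666...
```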

Although seven evaluation metrics are described here, multi-label classification has other evaluation methods as well; see [4] for details. For example, the `multilabel_confusion_matrix` method in the `sklearn.metrics` module can be used to compute the precision and recall of each category separately, and the per-category values can then be averaged.
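The per-category idea can be sketched without scikit-learn as well. The code below counts true positives, false positives and false negatives column by column and then averages precision and recall over the categories. The function name is hypothetical, and scoring a category with an empty denominator as 0 is an assumed convention.

```python
import numpy as np

def per_label_precision_recall(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0)    # per-category true positives
    fp = (~y_true & y_pred).sum(axis=0)   # per-category false positives
    fn = (y_true & ~y_pred).sum(axis=0)   # per-category false negatives
    # Assumed convention: a category with an empty denominator scores 0
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    return precision, recall

y_true = np.array([[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]])
y_pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]])

precision, recall = per_label_precision_recall(y_true, y_pred)
print(precision)         # per category: 0, 2/3, 1/2, 1
print(recall)            # per category: 0, 1, 1/2, 1/2
print(precision.mean())  # average over categories: 0.54166...
print(recall.mean())     # average over categories: 0.5
```

Unlike the sample-averaged metrics of Section 4.2, this view averages over categories, so a category that is never predicted (category 0 here) pulls the score down directly.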

### 5 Conclusion

In this article, the author first introduced two common loss functions for multi-label classification: the first is essentially the objective function of the logistic regression model, and the second is an extension of the single-label cross-entropy loss. The author then introduced several metrics for evaluating multi-label classification results, including exact match ratio, accuracy, precision, recall, $F_1$ and Hamming Loss.

That is all for this article. Thank you for reading! If you found it helpful, you are welcome to follow and share this official account. If you have any questions or suggestions, please add WeChat ‘nulls8’ or leave a comment. The green mountains stay and the rivers keep flowing; see you next time at the inn!

### References

[1] To understand multiple classifications, we have to talk about logistic regression

[2] Recall rate and F value under multi-classification task

[3] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

[4] Sorower, Mohammad S.. “A Literature Survey on Algorithms for Multi-label Learning.” (2010).

[5] Godbole, S., & Sarawagi, S. “Discriminative Methods for Multi-labeled Classification.” Lecture Notes in Computer Science (2004), 22-30.

[6] https://mmuratarat.github.io/…