What is a regression algorithm?

A regression algorithm fits historical data to produce a fitting equation, which is then used to make predictions on new data. For univariate data the fit is a line; for bivariate data the fit is a plane; for higher-dimensional data the fitting equation becomes more complex.
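To make this concrete, here is a minimal sketch (using NumPy, with made-up univariate data) that fits a line and uses the resulting equation to predict a new point:

```python
import numpy as np

# Made-up historical data: y is roughly 2x + 1 plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# Fit a degree-1 polynomial, i.e. a line: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitting equation to predict new data.
x_new = 6.0
print(f"y = {slope:.2f}x + {intercept:.2f}; prediction at x=6: {slope * x_new + intercept:.2f}")
```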

What are the evaluation metrics for regression algorithms?

For regression algorithms, we judge performance by how far the predictions deviate from the true values. The most commonly used evaluation metrics are the mean absolute error, the mean squared error, the root mean squared error, the coefficient of determination, and so on.

Common regression evaluation metrics

Mean absolute error (MAE)

The mean absolute error takes the absolute value of the difference between the predicted value and the true value for each sample, then sums and averages these values. The formula is:


$$MAE(y, \hat{y}) = \frac{1}{m} \sum_{i=1}^{m} |y_i - f(x_i)|$$

where $y_i$ is the true value and $f(x_i)$ (i.e. $\hat{y}_i$) is the model's predicted value.
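As a quick check, here is a minimal sketch (with made-up values) that computes MAE both directly from the formula above and with scikit-learn's `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # made-up true values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # made-up predictions

mae_manual = np.mean(np.abs(y_true - y_pred))      # the formula above
mae_sklearn = mean_absolute_error(y_true, y_pred)  # library implementation
print(mae_manual, mae_sklearn)  # both 0.5
```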

Mean squared error (MSE)

The mean squared error squares the difference between the predicted value and the true value for each sample, then sums and averages the squares. The formula is:


$$MSE(y, \hat{y}) = \frac{1}{m} \sum_{i=1}^{m} (y_i - f(x_i))^2$$
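The same made-up values as above illustrate MSE; scikit-learn's `mean_squared_error` should agree with the formula:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mse_manual = np.mean((y_true - y_pred) ** 2)      # the formula above
mse_sklearn = mean_squared_error(y_true, y_pred)  # library implementation
print(mse_manual, mse_sklearn)  # both 0.375
```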

Root mean squared error (RMSE)

The root mean squared error is the square root of the mean squared error. The formula is:


$$RMSE(y, \hat{y}) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - f(x_i))^2}$$
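Since RMSE is just the square root of MSE, a minimal sketch only needs `np.sqrt` on top of the previous example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # square root of the MSE
print(rmse)  # ~0.612
```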

Mean squared logarithmic error (MSLE)

MSLE computes the expected value of the squared logarithmic error. The formula is:


$$MSLE(y, \hat{y}) = \frac{1}{n_{samples}} \sum_{i=0}^{n_{samples}-1} \left( \log_e(1+y_i) - \log_e(1+\hat{y}_i) \right)^2$$
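A minimal sketch (with made-up non-negative values, since the logarithm requires $1 + y > 0$), using `np.log1p`, i.e. $\log_e(1+x)$, and scikit-learn's `mean_squared_log_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # made-up non-negative values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

msle_manual = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)  # the formula above
msle_sklearn = mean_squared_log_error(y_true, y_pred)
print(msle_manual, msle_sklearn)  # both ~0.0397
```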

Mean absolute percentage error (MAPE)

MAPE computes the expected relative error, where the relative error is the absolute error expressed as a percentage of the true value. The formula is:


$$MAPE(y, \hat{y}) = \frac{1}{n_{samples}} \sum_{i=0}^{n_{samples}-1} \frac{|y_i - \hat{y}_i|}{\max(\epsilon, |y_i|)}$$

where $\epsilon$ is an arbitrarily small positive number that avoids an undefined result when $y_i$ is zero and the denominator would otherwise vanish.
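A minimal sketch of the formula, checked against scikit-learn's `mean_absolute_percentage_error` (available since scikit-learn 0.24), which uses machine epsilon as $\epsilon$:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

eps = np.finfo(np.float64).eps  # the small positive epsilon from the formula
mape_manual = np.mean(np.abs(y_true - y_pred) / np.maximum(eps, np.abs(y_true)))
mape_sklearn = mean_absolute_percentage_error(y_true, y_pred)
print(mape_manual, mape_sklearn)  # both ~0.3273
```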

Median absolute error (MedAE)

The median absolute error is interesting because it blunts the effect of outliers. The loss is computed by taking the median of all absolute differences between the targets and the predictions. The formula is:


$$MedAE(y, \hat{y}) = \text{median}(|y_1 - \hat{y}_1|, \ldots, |y_n - \hat{y}_n|)$$
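A minimal sketch; note how the single large error (the last sample) barely moves the median:

```python
import numpy as np
from sklearn.metrics import median_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

medae_manual = np.median(np.abs(y_true - y_pred))     # median of absolute errors
medae_sklearn = median_absolute_error(y_true, y_pred)
print(medae_manual, medae_sklearn)  # both 0.5
```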

Explained variance

In statistics, explained variation refers to the portion of the variation in a data set that a mathematical model can account for. Since variance is usually used to quantify variation, it is also called explained variance.

Remark:

Variation can also be understood as the degree of dispersion.

In statistics, dispersion is the degree to which a distribution is compressed or stretched.

The main measures of dispersion are the variance, the standard deviation, and the interquartile range.

Dispersion is the counterpart of location or central tendency.

The explained variance score measures how much of the samples' dispersion the model accounts for, by comparing the dispersion of the residuals (the differences between predicted values and true values) with the dispersion of the samples themselves. It is a comparison of dispersions.


$$explained\_variance(y, \hat{y}) = 1 - \frac{Var\{y - \hat{y}\}}{Var\{y\}} = 1 - \frac{\text{variance of the differences between sample and predicted values}}{\text{variance of the samples}}$$

$$explained\_variance(y, \hat{y}) = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} (z_i - \overline{z})^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^2}$$

where $z$ is the difference between the sample value and the predicted value ($z = y - \hat{y}$) and $\overline{z}$ is its mean.

The maximum value is 1, and the closer to 1, the better. A larger value means the dispersion of the predictions matches the dispersion of the samples more closely; a lower value indicates a worse fit.
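A minimal sketch of the definition, checked against scikit-learn's `explained_variance_score`:

```python
import numpy as np
from sklearn.metrics import explained_variance_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

z = y_true - y_pred                         # differences between samples and predictions
ev_manual = 1 - np.var(z) / np.var(y_true)  # 1 - Var{y - y_hat} / Var{y}
ev_sklearn = explained_variance_score(y_true, y_pred)
print(ev_manual, ev_sklearn)  # both ~0.9572
```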

R^2 (R-squared)

The coefficient of determination $R^2$ measures the proportion of the variation in the dependent variable that the independent variables can explain, and thus gauges the explanatory power of a statistical model. It divides the explained variance by the total variance, representing the fraction of the total variance that is explained or determined by the predictor variables.

$R^2$ takes values between 0 and 1. The closer $R^2$ is to 1, the better the model; the closer $R^2$ is to 0, the worse the model.

The formula is:


$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \overline{y})^2} = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^2} = 1 - \frac{MSE}{Var}$$

where $\overline{y}$ is the mean of $y$. The numerator is the sum of squared errors between the true and predicted values, which corresponds to the mean squared error (MSE); the denominator is the sum of squared differences between the true values and their mean, which corresponds to the variance (Var).
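A minimal sketch of the formula, checked against scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # numerator: squared prediction errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # denominator: total variation around the mean
r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)
print(r2_manual, r2_sklearn)  # both ~0.9486
```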

Comparison and use of the evaluation metrics

For all of these metrics except $R^2$ and explained variance, smaller is better. Depending on the goal, combining several metrics lets us analyze a model in greater depth.

Using a single metric

  • MAE and MedAE are based on absolute error. If you care about the absolute error between true and predicted values, choose MAE or MedAE; they are the mean and the median of the errors, respectively. MAE is sensitive to extreme values, while MedAE blunts them.
  • If you care about the squared difference between the true and predicted values, use MSE or RMSE; squaring penalizes large errors more heavily.
  • MAPE is the best choice when the true values of different samples differ by orders of magnitude, or when you care more about the percentage difference between the predicted and true values.
  • MSLE is appropriate when $y$ tends to grow exponentially with $x$.
  • If the goal is to measure how well the independent variables explain the variation of the target $y$, $R^2$ is more appropriate.

Using multiple metrics

  • MAE and RMSE used together reveal the dispersion of the sample errors. For example, when RMSE is much larger than MAE, the errors vary greatly from case to case (see the sketch after this list).
  • MAE and MAPE, combined with $\overline{y}$, can be used to estimate how well the model fits samples of different orders of magnitude. For example, if MAE is much larger than $MAPE \cdot \overline{y}$, the model may be more accurate on samples with small true values; consider building separate models for samples of different orders of magnitude.
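As a small illustration of the first point (with made-up error vectors): when every sample has the same error, RMSE equals MAE; when one sample carries all the error, RMSE is much larger than MAE:

```python
import numpy as np

def mae(errors):
    return np.mean(np.abs(errors))

def rmse(errors):
    return np.sqrt(np.mean(errors ** 2))

uniform = np.array([1.0, 1.0, 1.0, 1.0])  # every sample off by exactly 1
uneven = np.array([0.0, 0.0, 0.0, 2.0])   # one sample carries all the error

print(mae(uniform), rmse(uniform))  # 1.0, 1.0 -> errors are uniform
print(mae(uneven), rmse(uneven))    # 0.5, 1.0 -> RMSE >> MAE: errors vary greatly
```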