
Loss function

The loss function computes a loss value that the model uses to update its parameters through backpropagation. By reducing the loss between the actual values and the predicted values, the model's predictions are pushed closer to the actual values, which is the goal of training. A loss function must be non-negative and real-valued. The loss functions commonly used when building neural networks are described below.

Mean square error (MSE)

The error is the difference between the value predicted by the network and the actual value. We square the error because it can be positive or negative; squaring ensures that positive and negative errors do not cancel each other out. Taking the mean makes the error comparable across datasets of different sizes. The mean square error (MSE) between the predicted value (p) and the actual value (y) is calculated as follows:


mse(p,y)=\frac{1}{n}\sum_{i=1}^{n}(p_i-y_i)^2

Use Python to implement this function:

import numpy as np

def mse(p, y):
    return np.mean(np.square(p - y))

When neural networks need to predict continuous values, the mean square error is usually used.
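
As a quick check, here is a minimal usage sketch; the sample values below are made up purely for illustration:

p = np.array([1.0, 2.0, 3.0])   # predicted values (made-up example)
y = np.array([1.5, 2.0, 2.0])   # actual values (made-up example)

print(mse(p, y))  # mean of [0.25, 0.0, 1.0] ≈ 0.417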

Mean absolute error

Mean absolute error (MAE) works in a very similar way to mean square error. It ensures that positive and negative errors do not cancel each other out by averaging the absolute difference between the actual and predicted values over all data points. The mean absolute error between the predicted value (p) and the actual value (y) is calculated as follows:


mae(p,y)=\frac{1}{n}\sum_{i=1}^{n}|p_i-y_i|

Use Python to implement this function:

def mae(p, y):
    # mean of the absolute differences between predictions and actual values
    return np.mean(np.abs(p - y))

Like the mean square error, the mean absolute error is usually used to predict the values of continuous variables.
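
To see how the two behave on the same data, here is a small comparison; the values are made up, and the last point is given a deliberately large error of 3:

# uses the mse and mae functions defined above
p = np.array([1.0, 2.0, 6.0])
y = np.array([1.0, 2.0, 3.0])

print(mse(p, y))  # (0 + 0 + 9) / 3 = 3.0
print(mae(p, y))  # (0 + 0 + 3) / 3 = 1.0

Because the error is squared, the mean square error penalizes the single large error much more heavily than the mean absolute error does.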

Categorical cross entropy

Cross entropy measures the difference between two distributions (the actual and the predicted). Unlike the two loss functions above, it is widely used when the output takes discrete values. The cross entropy between the two distributions is calculated as follows:


-(y\log_2 p+(1-y)\log_2(1-p))

Here y is the actual value and p is the predicted value. The Python implementation of the categorical cross entropy between the predicted value (p) and the actual value (y) is as follows:

def categorical_cross_entropy(p, y):
    # sum of -[y*log2(p) + (1-y)*log2(1-p)] over all outputs
    return -np.sum((y*np.log2(p) + (1-y)*np.log2(1-p)))

When the predicted value is far from the actual value, the categorical cross-entropy loss is high; when it is close to the actual value, the loss is low.
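
A small numeric check illustrates this behaviour; the label vector and probabilities below are made up for illustration:

# uses the categorical_cross_entropy function defined above
y = np.array([1.0, 0.0])         # actual labels
p_close = np.array([0.9, 0.1])   # prediction close to the actual labels
p_far = np.array([0.1, 0.9])     # prediction far from the actual labels

print(categorical_cross_entropy(p_close, y))  # ≈ 0.30, low loss
print(categorical_cross_entropy(p_far, y))    # ≈ 6.64, high loss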

Smooth L1 loss

The mean absolute error is usually referred to as the L1 loss and the mean square error as the L2 loss, but both have drawbacks: the gradient of the former is not smooth, while the latter is prone to gradient explosion. Smooth L1 loss was proposed to overcome these problems. Its formula is as follows:


smooth\_L1(p,y)=\begin{cases}0.5(p-y)^2, & |p-y|<1 \\ |p-y|-0.5, & |p-y|\ge 1\end{cases}

When the difference between the predicted value and the true value is large, the expression behaves like the L1 loss, which prevents the gradient from becoming too large; when the difference is small, it behaves like the L2 loss, which keeps the gradient smooth.
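
For completeness, here is a possible NumPy implementation of the formula above. This is a minimal sketch: the function name smooth_l1 is my own choice, and, like mse and mae above, it averages the loss over all data points.

import numpy as np

def smooth_l1(p, y):
    diff = np.abs(p - y)
    # quadratic (L2-like) branch for small errors, linear (L1-like) branch for large errors
    return np.mean(np.where(diff < 1, 0.5 * np.square(diff), diff - 0.5))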