
At the end of the last article, the activation function was briefly introduced. This article explains activation functions in more detail: they will be used frequently in the networks that follow, and their role should not be underestimated.

1. The sigmoid function

The mathematical definition is as follows:

s(x) = 1 / (1 + e^(-x))

Here x ranges over (-∞, +∞) and y lies in (0, 1). The function is continuous and its value changes gradually over this range, as shown in the figure below.

I used the linear function y = wx + b as an example in my last blog post. Suppose a neuron uses sigmoid as its activation function; then its output is s(wx + b), obtained by plugging the linear part into the formula above. When wx + b approaches positive infinity, the output approaches 1; conversely, when it approaches negative infinity, the output approaches 0. Since the sigmoid function is continuous, the neuron's output also changes continuously with its input, but very slowly.
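
A minimal Python sketch of this behaviour is below; the weight w and bias b are just illustrative values, not taken from the article. Large positive values of wx + b push the output toward 1, large negative values push it toward 0, and the change in between is smooth.

```python
import math

def sigmoid(x):
    # s(x) = 1 / (1 + e^(-x)); the output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

w, b = 2.0, -1.0          # toy weight and bias (illustrative only)
for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    z = w * x + b         # linear part of the neuron
    print(f"x={x:6.1f}  wx+b={z:6.1f}  s(wx+b)={sigmoid(z):.6f}")
# Large positive wx+b gives outputs close to 1, large negative close to 0,
# and the transition in between is smooth and gradual.
```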

2. Tanh function

The mathematical definition is as follows:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

It is not hard to see from the formula that the range of tanh is (-1, 1), so its outputs vary over a wider range than sigmoid's (0, 1). Only the theory is covered here, so the practical difference may not yet be apparent; for now it is enough to know what each of these functions looks like and what values it takes. The graph of the function is as follows.
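
The range difference can be seen with a short sketch that evaluates both functions at the same points (the sample inputs are arbitrary):

```python
import math

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    s = 1.0 / (1.0 + math.exp(-x))   # sigmoid, range (0, 1)
    t = math.tanh(x)                 # tanh, range (-1, 1)
    print(f"x={x:5.1f}  sigmoid={s:.4f}  tanh={t:.4f}")
# tanh is centered at 0 and spans (-1, 1), so its outputs swing over
# a wider range than sigmoid's (0, 1) for the same inputs.
```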

3. The ReLU function

The mathematical definition is as follows

f(x)=max(0,x)

For x > 0, f(x) = x, the simplest linear function; for x < 0, f(x) = 0. The ReLU function goes a long way toward fixing the vanishing-gradient problem caused by sigmoid, but it also introduces a new problem of its own: a neuron whose input stays negative always outputs 0 and stops receiving gradient updates (the "dying ReLU" problem), which will be discussed in detail later. The graph of the function is as follows.
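
A small sketch of ReLU on a few arbitrary sample inputs:

```python
def relu(x):
    # f(x) = max(0, x): identity for positive inputs, 0 otherwise
    return max(0.0, x)

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(f"x={x:5.1f}  relu(x)={relu(x):5.1f}")
# Negative inputs are clipped to 0; positive inputs pass through unchanged,
# so for x > 0 the slope is always 1 and the function does not saturate.
```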

4. LeakyReLU function

In order to solve the dying-ReLU problem described above, LeakyReLU was proposed. Its mathematical definition is as follows:

f(x) = x,  if x > 0
f(x) = αx, if x ≤ 0

where α is a small positive constant (0.01 is a common choice).

In contrast to ReLU, the part below zero is not set directly to 0 but is multiplied by a small coefficient, so the output (and therefore the gradient) there is never exactly 0. The graph of the function is shown below.
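
A sketch of LeakyReLU, using α = 0.01 as the slope on the negative side; that value is just a common default, not something fixed by the definition:

```python
def leaky_relu(x, alpha=0.01):
    # Keep a small slope alpha for x <= 0 instead of a hard 0,
    # so the neuron still produces a (small) nonzero output and gradient there.
    return x if x > 0 else alpha * x

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(f"x={x:5.1f}  leaky_relu(x)={leaky_relu(x):8.4f}")
# Compared with ReLU, the negative side is alpha * x rather than 0.
```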

Conclusion

Four activation functions have now been introduced, so what exactly does an activation function do, and where is it applied? Both points need to be clear. First, in a gradient-descent algorithm, whether sigmoid or ReLU is used, the goal is the same: to let the algorithm gradually reduce its error through continuous learning, which is a slow process. The activation function is applied at the output end of the neuron, after the weighted sum, as can be seen intuitively in the figure below.
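
As a rough sketch of where the activation sits inside a neuron (the inputs, weights, and bias below are made-up values), the linear part wx + b is computed first and the activation is applied to its result:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    # 1) linear part: weighted sum of the inputs plus the bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # 2) activation applied to that result, at the output end of the neuron
    return activation(z)

# Toy example with two inputs and illustrative weights/bias
print(neuron([0.5, -1.2], w=[0.8, 0.3], b=0.1))
```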