We should already have an intuitive, big-picture view of Keras. Now let's systematically go through some of the Keras layer APIs. This article focuses mainly on convolution and covers the following topics:

  • Different types of convolution layers;
  • Different parameter initialization methods;
  • Different activation functions;
  • Add L1/L2 regularization;
  • Different pooling layers;
  • Different dropout layers;
  • Other common layers.

This article covers a lot of material; treat it as an API reference to read through and understand.

1 Keras convolution layers

Like PyTorch, Keras provides 1D, 2D and 3D versions of the convolution layer. 1D is typically used for sequences, 2D for images, and 3D for volumetric data. Here the most common case, 2D images, is used to explain; 1D and 3D work essentially the same way as 2D, so they are not covered separately.

1.1 Conv2D

Let’s look at all the parameters of Conv2D:

tf.keras.layers.Conv2D(
    filters,
    kernel_size,
    strides=(1, 1),
    padding="valid",
    data_format=None,
    dilation_rate=(1, 1),
    groups=1,
    activation=None,
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

Let’s start with a simple example:

import tensorflow as tf

input_shape = (4, 28, 28, 3)  # NHWC: a batch of 4 RGB images of size 28x28
x = tf.random.normal(input_shape)
conv = tf.keras.layers.Conv2D(
    filters=2, kernel_size=3,
    activation='relu', padding='same'
)
print(conv(x).shape)
>>> (4, 28, 28, 2)

Now let’s see what the parameters mean:

  • filters: an integer, the number of channels (feature maps) in the output;
  • kernel_size: an integer or tuple, the size of the convolution kernel;
  • strides: an integer or a tuple like (a, b), the stride the convolution kernel moves by at each step (a shape sketch follows this list);
  • padding: 'valid' means no padding, 'same' means the output feature map has the same spatial size as the input; these are the only two options;
  • data_format: 'channels_last' or 'channels_first'. The default is 'channels_last', where the last dimension of the feature map is the channel dimension, i.e. (batch_size, height, width, channels); with 'channels_first', the channel dimension comes right after the batch dimension, so the feature map format is the same as in PyTorch, i.e. (batch_size, channels, height, width);
  • dilation_rate: the setting for dilated (atrous) convolution; the default of 1 means an ordinary convolution. Note that dilation_rate and strides cannot both be larger than 1 at the same time; in other words, if you want a dilated convolution with dilation_rate > 1, then strides must be 1;
  • groups: the number of groups for grouped convolution;
  • activation: lets you attach an activation function directly after the convolution layer, for example 'relu'. All activation functions currently supported by Keras are explained in detail in a later section. If nothing is passed, no activation is applied;
  • use_bias: a bool; True means a bias is used. The default is True;
  • kernel_initializer: the initialization method for the convolution kernel, explained in detail in a later section;
  • bias_initializer: the initialization method for the bias, explained in detail in a later section;
  • kernel_regularizer: the regularization applied to the convolution kernel, explained in detail in a later section;
  • bias_regularizer: the regularization applied to the bias, explained in detail in a later section.
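
To get a feel for how strides, padding and dilation_rate affect the output shape, here is a minimal sketch (my own example, not from the original); the shapes in the comments follow from the standard output-size rules:

import tensorflow as tf

x = tf.random.normal((4, 28, 28, 3))  # NHWC: a batch of 4 RGB 28x28 images

# 'same' padding keeps the spatial size when strides=1
print(tf.keras.layers.Conv2D(8, 3, padding='same')(x).shape)    # (4, 28, 28, 8)

# 'valid' padding shrinks the output: 28 - 3 + 1 = 26
print(tf.keras.layers.Conv2D(8, 3, padding='valid')(x).shape)   # (4, 26, 26, 8)

# strides=2 with 'same' padding halves the spatial size (rounded up)
print(tf.keras.layers.Conv2D(8, 3, strides=2, padding='same')(x).shape)        # (4, 14, 14, 8)

# dilation_rate=2 enlarges the receptive field; strides must stay at 1
print(tf.keras.layers.Conv2D(8, 3, dilation_rate=2, padding='same')(x).shape)  # (4, 28, 28, 8)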

2 Keras parameter initialization

Let's add initializers for the convolution kernel and bias to the simple example from earlier:

import tensorflow as tf

input_shape = (4, 28, 28, 3)
initializer = tf.keras.initializers.RandomNormal(mean=0., stddev=1.)
x = tf.random.normal(input_shape)
conv = tf.keras.layers.Conv2D(
    filters=2, kernel_size=3,
    activation='relu', padding='same',
    kernel_initializer=initializer,
    bias_initializer=initializer
)
print(conv(x).shape)
>>> (4, 28, 28, 2)

Simply put, you define an initializer and pass it to the Keras layer.

2.1 Normal distribution

tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)

2.2 Uniform Distribution

tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)

2.3 Truncated normal distribution

tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=None)

It is basically the same as the normal distribution, but any sampled value that falls more than two standard deviations away from the mean is discarded and redrawn.

In other words, the initialized values are limited to within plus or minus two standard deviations of the mean.
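
As a quick sanity check (my own sketch, not from the original), you can sample from TruncatedNormal and verify that no value lands more than two standard deviations from the mean:

import tensorflow as tf

initializer = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.05)
values = initializer(shape=(10000,))

# all samples should lie within mean ± 2 * stddev, i.e. within ±0.1 here
print(float(tf.reduce_max(tf.abs(values))))  # expected to be <= 0.1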

2.4 Constant

tf.keras.initializers.Zeros()
tf.keras.initializers.Ones()

2.5 Xavier/Glorot

tf.keras.initializers.GlorotNormal(seed=None)

This is essentially a truncated normal distribution, but GlorotNormal (also known as Xavier initialization) uses mean 0 and a standard deviation given by $std = \sqrt{\frac{2}{in + out}}$

In and out represent the number of input and output neurons. For those of you who have studied or read my paper notes on Xavier initialization, you may have noticed that the paper uses a uniform distribution rather than a normal distribution.

The uniform-distribution version is initialized as follows: tf.keras.initializers.GlorotUniform(seed=None)

This uniform distribution samples from the interval $[-\sqrt{\frac{6}{in+out}}, \sqrt{\frac{6}{in+out}}]$. This Xavier method is also the default initialization method in Keras.
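
A small sketch (my own example, not from the original) that checks the GlorotUniform bound: for a kernel of shape (in, out), samples should stay within $\pm\sqrt{6 / (in + out)}$:

import math
import tensorflow as tf

fan_in, fan_out = 128, 64
limit = math.sqrt(6.0 / (fan_in + fan_out))  # theoretical bound, ~0.1768

initializer = tf.keras.initializers.GlorotUniform()
weights = initializer(shape=(fan_in, fan_out))  # fan_in/fan_out are inferred from the shape

print(limit)
print(float(tf.reduce_max(tf.abs(weights))))  # should not exceed the limit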

2.6 Custom Initialization

Of course, Keras also supports custom initialization methods.

import tensorflow as tf

class ExampleRandomNormal(tf.keras.initializers.Initializer):

    def __init__(self, mean, stddev):
        self.mean = mean
        self.stddev = stddev

    def __call__(self, shape, dtype=None):
        return tf.random.normal(
            shape, mean=self.mean, stddev=self.stddev, dtype=dtype)

    def get_config(self):  # To support serialization
        return {'mean': self.mean, 'stddev': self.stddev}

The key is that __call__ returns a TensorFlow tensor whose size matches the shape argument.
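
Usage is the same as with the built-in initializers; here is a short sketch (my own) passing the custom class defined above to a Conv2D layer:

import tensorflow as tf

# assumes the ExampleRandomNormal class defined above
layer = tf.keras.layers.Conv2D(
    filters=2, kernel_size=3, padding='same',
    kernel_initializer=ExampleRandomNormal(mean=0.0, stddev=1.0),
)
x = tf.random.normal((4, 28, 28, 3))
print(layer(x).shape)  # (4, 28, 28, 2)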

3 Keras activation functions

Basically all the common activation functions are supported. In the activation parameter of a convolution layer you can pass 'relu', 'sigmoid', 'softmax', etc. as strings, all lowercase.
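
The string form is just a shortcut. A sketch (my own, not from the original) showing that passing the string 'relu' and passing the function object behave the same:

import tensorflow as tf

x = tf.random.normal((4, 28, 28, 3))

conv_str = tf.keras.layers.Conv2D(2, 3, activation='relu')                     # string form
conv_fn  = tf.keras.layers.Conv2D(2, 3, activation=tf.keras.activations.relu)  # callable form

print(conv_str(x).shape, conv_fn(x).shape)  # both (4, 26, 26, 2)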

3.1 relu

tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0)

  • alpha is the slope of the negative part; if it is set to, say, 0.1, the function becomes LeakyReLU;
  • max_value is an upper bound on the ReLU output; if None, there is no upper bound;
  • threshold is the lower bound; anything below the threshold is set to 0. The default is 0 (see the sketch below).
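
A small sketch (my own example) showing the effect of alpha, max_value and threshold on a few sample values:

import tensorflow as tf

x = tf.constant([-3.0, -1.0, 0.5, 2.0, 10.0])

print(tf.keras.activations.relu(x).numpy())
# plain ReLU: [ 0.  0.  0.5  2. 10.]

print(tf.keras.activations.relu(x, alpha=0.1).numpy())
# negative values are scaled by 0.1: [-0.3 -0.1  0.5  2.  10.]

print(tf.keras.activations.relu(x, max_value=2.0).numpy())
# values are clipped at 2: [0.  0.  0.5 2.  2.]

print(tf.keras.activations.relu(x, threshold=1.0).numpy())
# values below the threshold become 0: [ 0.  0.  0.  2. 10.]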

3.2 sigmoid

tf.keras.activations.sigmoid(x)

Formula: $sigmoid(x) = \frac{1}{1 + e^{-x}}$

3.3 softmax

tf.keras.activations.softmax(x, axis=-1)
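
A brief sketch (my own) showing that softmax normalizes along the chosen axis, so each slice sums to 1:

import tensorflow as tf

logits = tf.random.normal((2, 5))            # a batch of 2 samples with 5 classes
probs = tf.keras.activations.softmax(logits, axis=-1)

print(probs.shape)                           # (2, 5)
print(tf.reduce_sum(probs, axis=-1).numpy()) # each row sums to 1: [1. 1.]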

3.4 softplus

tf.keras.activations.softplus(x)

Formula: $softplus(x) = \log(e^x + 1)$

3.5 softsign

tf.keras.activations.softsign(x)

Formula: $softsign(x) = \frac{x}{|x| + 1}$

3.6 tanh

tf.keras.activations.tanh(x)

Formula: $tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
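
As a quick check (my own sketch, not from the original), the built-in softplus, softsign and tanh activations match the formulas above when computed by hand:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])

# softplus(x) = log(e^x + 1)
print(tf.reduce_max(tf.abs(
    tf.keras.activations.softplus(x) - tf.math.log(tf.exp(x) + 1.0))).numpy())

# softsign(x) = x / (|x| + 1)
print(tf.reduce_max(tf.abs(
    tf.keras.activations.softsign(x) - x / (tf.abs(x) + 1.0))).numpy())

# tanh(x) = (e^x - e^-x) / (e^x + e^-x)
print(tf.reduce_max(tf.abs(
    tf.keras.activations.tanh(x)
    - (tf.exp(x) - tf.exp(-x)) / (tf.exp(x) + tf.exp(-x)))).numpy())

# all three differences should be (near) zero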

3.7 selu

tf.keras.activations.selu(x)

  • If $x > 0$, it returns $scale \times x$;
  • If $x < 0$, it returns $scale \times \alpha \times (e^x - 1)$;
  • $scale$ and $\alpha$ are preset constants: $\alpha = 1.67326324$ and $scale = 1.05070098$;
  • It is similar to the ELU activation but with a scaling coefficient: $selu = scale \times elu$ (verified in the sketch after this list);
  • SELU was proposed in a 2017 paper and ELU in 2016.
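
A sketch (my own, not from the original) verifying the relation numerically: selu(x) should equal scale times elu(x, alpha) with the constants above:

import tensorflow as tf

scale = 1.05070098
alpha = 1.67326324

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])

selu_builtin = tf.keras.activations.selu(x)
selu_manual = scale * tf.keras.activations.elu(x, alpha=alpha)

# the two results should agree up to floating-point precision
print(tf.reduce_max(tf.abs(selu_builtin - selu_manual)).numpy())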

4 Keras L1/L2 regularization

Regularization is easy: you can use L1, L2, or both.

4.1 L1/L2 regularization

from tensorflow.keras import layers
from tensorflow.keras import regularizers

layer = layers.Dense(
    units=64,
    kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
)

Here regularization can be used:

  • tf.keras.regularizers.l1_l2(l1=1e-5, l2=1e-4)
  • tf.keras.regularizers.l2(1e-4)
  • tf.keras.regularizers.l1(1e-5)

Details about how the L1 and L2 penalties are calculated (verified numerically in the sketch below):

  • L1: $loss = L1 \times sum(abs(x))$
  • L2: $loss = L2 \times sum(x^2)$
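
These formulas are easy to verify by hand (a sketch of my own, not from the original): a regularizer object is simply a callable that maps a weight tensor to a scalar penalty:

import tensorflow as tf

w = tf.constant([[1.0, -2.0], [3.0, -4.0]])

l1 = tf.keras.regularizers.l1(1e-5)
l2 = tf.keras.regularizers.l2(1e-4)

# sum(abs(w)) = 10, sum(w^2) = 30
print(l1(w).numpy(), (1e-5 * tf.reduce_sum(tf.abs(w))).numpy())     # both 1e-4
print(l2(w).numpy(), (1e-4 * tf.reduce_sum(tf.square(w))).numpy())  # both 3e-3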

4.2 Custom regularization

import tensorflow as tf

class MyRegularizer(tf.keras.regularizers.Regularizer):

    def __init__(self, strength):
        self.strength = strength

    def __call__(self, x):
        return self.strength * tf.reduce_sum(tf.square(x))

    def get_config(self):
        return {'strength': self.strength}

This implementation is an L2 regularizer. get_config is used when saving the model; you can leave it out, but then the model cannot be serialized (i.e. you cannot save it via its config or JSON).
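
Usage sketch (my own): pass the custom regularizer to a layer just like the built-in ones; once the layer has been called, the penalty shows up in layer.losses:

import tensorflow as tf

# assumes the MyRegularizer class defined above
layer = tf.keras.layers.Dense(
    units=64,
    kernel_regularizer=MyRegularizer(strength=1e-4),
)
x = tf.random.normal((8, 32))
_ = layer(x)

print(layer.losses)  # a list with one scalar tensor: the L2-style penalty on the kernel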