The love-hate relationship between PyTorch and TensorFlow: basic data types

The love-hate relationship between PyTorch and TensorFlow: tensors

The love-hate relationship between PyTorch and TensorFlow: trainable parameters

PyTorch version: 1.6.0

TensorFlow version: 1.15.0

Parameter initialization mainly relies on mathematical distributions, such as the normal distribution, the uniform distribution, and so on.

1. PyTorch

(1) User-defined trainable parameters

torch.bernoulli(input, out=None) -> Tensor    Draws binary random numbers (0 or 1) from a Bernoulli distribution
torch.multinomial(input, num_samples, replacement=False, out=None) -> LongTensor    Returns a tensor where each row contains num_samples indices sampled from the multinomial distribution defined by the corresponding row of the input tensor
torch.normal(means, std, out=None)    Returns a tensor of random numbers drawn from separate normal distributions with the given means and standard deviations
torch.normal(mean=0.0, std, out=None)    Similar to the above, but all drawn elements share the same mean
torch.normal(means, std=1.0, out=None)    Similar to the above, but all drawn elements share the same standard deviation
torch.rand(*sizes, out=None) -> Tensor    Returns a tensor filled with random numbers uniformly distributed on the interval [0, 1); its shape is defined by the variable argument sizes
torch.randn(*sizes, out=None) -> Tensor    Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1; its shape is defined by the variable argument sizes
torch.randperm(n, out=None) -> LongTensor    Returns a random permutation of the integers from 0 to n-1
In-place random sampling
torch.Tensor.bernoulli_()    In-place version of torch.bernoulli()
torch.Tensor.cauchy_()    Draws numbers from a Cauchy distribution
torch.Tensor.exponential_()    Draws numbers from an exponential distribution
torch.Tensor.geometric_()    Draws elements from a geometric distribution
torch.Tensor.log_normal_()    Samples from a log-normal distribution
torch.Tensor.normal_()    In-place version of torch.normal()
torch.Tensor.random_()    Samples from a discrete uniform distribution
torch.Tensor.uniform_()    Samples from a continuous uniform distribution

Note: functions like normal_(), with a trailing underscore, operate in place on the original data.

There are also functions such as torch.zeros() and torch.ones(), along with their in-place and *_like variants.
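For reference, here is a quick sketch of a few of the functions listed above (the sampled values will of course differ from run to run):

import torch

x = torch.rand(2, 3)       # uniform samples in [0, 1)
y = torch.randn(2, 3)      # standard normal samples (mean 0, variance 1)
z = torch.normal(mean=torch.zeros(2, 3), std=torch.ones(2, 3))  # per-element mean/std
p = torch.randperm(5)      # e.g. tensor([3, 0, 4, 1, 2])
zeros = torch.zeros(2, 3)  # all-zero tensor
ones = torch.ones(2, 3)    # all-one tensor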

The following is an example of parameter initialization using these distributions:

a = torch.Tensor(3, 3).bernoulli_()

tensor([[1., 1., 1.],
        [0., 1., 0.],
        [0., 1., 0.]])

a = torch.Tensor(3, 3).normal_(0, 1)

tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]])

Then we pass it to torch.tensor() and enable gradient computation:

b = torch.tensor(a, requires_grad=True)

E:\anaconda2\envs\python36\lib\site-packages\ipykernel_launcher.py:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  """Entry point for launching an IPython kernel.
Out[7]:
tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]], requires_grad=True)

The warning above is raised here; following its suggestion, we can rewrite it as:

c = a.clone().detach().requires_grad_(True)

The result is the same:

tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]], requires_grad=True)

(2) Initialize layer parameters in the network

In PyTorch, the default initialization of parameters is implemented in each layer's reset_parameters() method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input, hidden, classes):
        super(Net, self).__init__()
        self.input = input
        self.hidden = hidden
        self.classes = classes
        self.w0 = nn.Parameter(torch.Tensor(self.input, self.hidden))
        self.b0 = nn.Parameter(torch.Tensor(self.hidden))
        self.w1 = nn.Parameter(torch.Tensor(self.hidden, self.classes))
        self.b1 = nn.Parameter(torch.Tensor(self.classes))
        self.reset_parameters()

    def reset_parameters(self):
        # weights drawn from a normal distribution, biases set to a constant
        nn.init.normal_(self.w0)
        nn.init.constant_(self.b0, 0)
        nn.init.normal_(self.w1)
        nn.init.constant_(self.b1, 0)

    def forward(self, x):
        out = torch.matmul(x, self.w0) + self.b0
        out = F.relu(out)
        out = torch.matmul(out, self.w1) + self.b1
        return out

The nn.Parameter() function: wrapping a tensor in nn.Parameter registers it as a trainable parameter of the module, so that its values are continually updated during training to optimize the model.
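As a minimal sketch of what nn.Parameter does (the module and attribute names here are made up for illustration), wrapping a tensor registers it with the module so the optimizer can update it:

import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super(Scale, self).__init__()
        # registered as a trainable parameter; it shows up in parameters()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.scale

m = Scale()
print(list(m.parameters()))  # [Parameter containing: tensor([1.], requires_grad=True)]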

You can also use the initialization methods in torch.nn.init:

import torch
import torch.nn as nn

w = torch.empty(2, 3)

# 1. Uniform distribution - U(a, b)
# torch.nn.init.uniform_(tensor, a=0, b=1)
nn.init.uniform_(w)
# tensor([[0.0578, 0.3402, 0.5034],
#         [0.7865, 0.7280, 0.6269]])

# 2. Normal distribution - N(mean, std)
# torch.nn.init.normal_(tensor, mean=0, std=1)
nn.init.normal_(w)
# tensor([[0.3326, 0.0171, ...],
#         [0.1669, 0.1747, 0.0472]])

# 3. Constant - val
# torch.nn.init.constant_(tensor, val)
nn.init.constant_(w, 0.3)
# tensor([[0.3000, 0.3000, 0.3000],
#         [0.3000, 0.3000, 0.3000]])

# 4. Identity: ones on the diagonal, zeros elsewhere
# torch.nn.init.eye_(tensor)
nn.init.eye_(w)
# tensor([[1., 0., 0.],
#         [0., 1., 0.]])

# 5. Dirac delta function (for 3-, 4- or 5-dimensional tensors)
# torch.nn.init.dirac_(tensor)
w1 = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w1)

# 6. xavier_uniform initialization
# torch.nn.init.xavier_uniform_(tensor, gain=1)
# From - Understanding the difficulty of training deep feedforward neural networks - Bengio 2010
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
# tensor([[ 1.3374,  0.7932, -0.0891],
#         [-1.3363, -0.0206, -0.9346]])

# 7. xavier_normal initialization
# torch.nn.init.xavier_normal_(tensor, gain=1)
nn.init.xavier_normal_(w)
# tensor([[-0.1777,  0.6740,  0.1139],
#         [ 0.3018, -0.2443,  ...]])

# 8. kaiming_uniform initialization
# From - Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He Kaiming 2015
# torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
# tensor([[0.6426, 0.9582, 1.1783],
#         [0.0515, 0.4975, ...]])

# 9. kaiming_normal initialization
# torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
# tensor([[ 0.2530, -0.4382,  ...],
#         [ 0.0544,  1.6392, -2.0752]])

# 10. Orthogonal matrix - (semi-)orthogonal matrix
# From - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
# torch.nn.init.orthogonal_(tensor, gain=1)
nn.init.orthogonal_(w)
# tensor([[ 0.5786, -0.5642,  ...],
#         [-0.7517, -0.0886, -0.6536]])

# 11. Sparse matrix: non-zero elements follow a normal distribution N(0, 0.01)
# From - Deep learning via Hessian-free optimization - Martens 2010
# torch.nn.init.sparse_(tensor, sparsity, std=0.01)
nn.init.sparse_(w, sparsity=0.1)
# tensor(1.00000e-03 *
#        [[-0.3382,  1.9501, -1.7761],
#         [ 0.0000,  0.0000,  0.0000]])

For the parameters of PyTorch's built-in layers, we can initialize them as follows:

for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)

This iterates over the model's modules and checks whether each one is an nn.Conv2d or nn.Linear layer; if so, its weight parameter m.weight is given xavier_uniform_ initialization. Similarly, m.bias can be used to get the bias term. Below is the parameter-initialization code from the PyTorch version of the residual network (ResNet):

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

This code block is used in __init__, where self refers to the current model.
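As a side note (this is not from the original ResNet code), the same kind of initialization can also be applied after the model is built via nn.Module.apply(); the sketch below assumes a model composed of built-in layers:

import torch.nn as nn

def init_weights(m):
    # apply() calls this function on every submodule, so filter by layer type
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # recursively applies init_weights to every submodule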

Reference:

blog.csdn.net/ys1305/arti…

2. TensorFlow

(1) User-defined parameter initialization

Create a 2 by 3 matrix with all elements having a value of 0 (type tf.float32).

a = tf.zeros([2, 3], dtype=tf.float32)

Create a 3 by 4 matrix with all elements having a value of 1.

b = tf.ones([3, 4])

Create a 1 by 10 matrix and fill it with 2 (type tf.int32; the dtype can be omitted).

c = tf.constant(2, dtype=tf.int32, shape=[1, 10])

Create a 1 by 10 matrix whose elements follow a normal distribution with a mean of 20 and a standard deviation of 3.

d = tf.random_normal([1, 10], mean=20, stddev=3)

All of the above values can be used to initialize variables. For example, populate a 1*2 matrix with 0.01 to initialize a variable called bias.

bias = tf.Variable(tf.zeros([1, 2]) + 0.01)
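Putting it together, here is a minimal runnable sketch in the TensorFlow 1.x style, assuming the bias variable defined above:

import tensorflow as tf

bias = tf.Variable(tf.zeros([1, 2]) + 0.01)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # variables must be initialized before use
    print(sess.run(bias))  # [[0.01 0.01]]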

(2) Initialization with the tf.xxx_initializer() classes

Initialize to a constant

import tensorflow as tf

value = [0, 1, 2, 3, 4, 5, 6, 7]
init = tf.constant_initializer(value)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[8], initializer=init)
    x.initializer.run()
    print(x.eval())

# output:
# [0. 1. 2. 3. 4. 5. 6. 7.]

The tf.zeros_initializer() and tf.ones_initializer() classes are used to initialize tensors to all zeros and all ones, respectively.

import tensorflow as tf

init_zeros = tf.zeros_initializer()
init_ones = tf.ones_initializer()

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[8], initializer=init_zeros)
    y = tf.get_variable('y', shape=[8], initializer=init_ones)
    x.initializer.run()
    y.initializer.run()
    print(x.eval())
    print(y.eval())

# output:
# [0. 0. 0. 0. 0. 0. 0. 0.]
# [1. 1. 1. 1. 1. 1. 1. 1.]

Initialize to a normal distribution

Normal-distribution initialization is the most widely used in neural networks; parameters can be initialized from a standard normal distribution or from a truncated normal distribution.

The tf.random_normal_initializer() class is used in TF to generate tensors that follow a normal distribution.

The tf.truncated_normal_initializer() class is used in TF to generate tensors that follow a truncated normal distribution.

  • mean: mean of the normal distribution; the default value is 0
  • stddev: standard deviation of the normal distribution; the default value is 1
  • seed: random number seed; specifying the same seed value generates the same data each time
  • dtype: data type
import tensorflow as tf

init_random = tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)
init_truncated = tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[10], initializer=init_random)
    y = tf.get_variable('y', shape=[10], initializer=init_truncated)
    x.initializer.run()
    y.initializer.run()
    print(x.eval())
    print(y.eval())

# output:
# [-0.40236568 -0.35864913 -0.94253045 -0.40153521  0.1552504   1.16989613
#   0.43091929 -0.31410623  0.70080078 -0.9620409 ]
# [ 0.18356581 -0.06860946 -0.55245203  1.08850253 -1.13627422 -0.1006074
#   0.65564936  0.03948414  0.86558545 -0.4964745 ]

Initialize to a uniform distribution

The tf.random_uniform_initializer() class is used in TF to generate tensors that follow a uniform distribution.

  • minval: minimum value
  • maxval: maximum value
  • seed: random number seed
  • dtype: data type
import tensorflow as tf

init_uniform = tf.random_uniform_initializer(minval=0, maxval=10, seed=None, dtype=tf.float32)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[10], initializer=init_uniform)
    x.initializer.run()
    print(x.eval())

# output:
# [6.93343639 9.41196823 5.54009819 1.38017178 1.78720832 5.38881063
#  3.39674473 8.12443542 0.62157512 8.36026382]

Others:

tf.orthogonal_initializer() initializes to a random (semi-)orthogonal matrix; the tensor to be initialized must be at least two-dimensional.

tf.glorot_uniform_initializer() initializes to uniformly distributed random numbers scaled by the number of input and output nodes.

tf.glorot_normal_initializer() initializes to truncated-normal random numbers scaled by the number of input and output nodes. A short usage sketch follows.
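A brief sketch of how these three initializers might be used with tf.get_variable() (the variable names here are arbitrary):

import tensorflow as tf

init_ortho = tf.orthogonal_initializer()
init_glorot_u = tf.glorot_uniform_initializer()
init_glorot_n = tf.glorot_normal_initializer()

with tf.Session() as sess:
    a = tf.get_variable('a', shape=[3, 3], initializer=init_ortho)
    b = tf.get_variable('b', shape=[4, 4], initializer=init_glorot_u)
    c = tf.get_variable('c', shape=[4, 4], initializer=init_glorot_n)
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))  # a random 3x3 (semi-)orthogonal matrix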

When using these initializers:

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

Use the above method to initialize the parameters.

Note that creating variables with tf.get_variable() differs from creating them with tf.Variable().

For the specific differences, see: blog.csdn.net/kevindree/a…
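As a rough illustration of one of those differences (a sketch assuming TF 1.x graph mode): tf.Variable() always creates a new variable, while tf.get_variable() works with variable scopes and can reuse an existing variable:

import tensorflow as tf

# tf.Variable() always creates a new variable; the name is uniquified if it clashes
v1 = tf.Variable(0.0, name='v')
v2 = tf.Variable(0.0, name='v')
print(v1.name, v2.name)  # v:0 v_1:0

# tf.get_variable() respects variable scopes and can reuse an existing variable
with tf.variable_scope('scope', reuse=tf.AUTO_REUSE):
    g1 = tf.get_variable('g', shape=[1], initializer=tf.zeros_initializer())
    g2 = tf.get_variable('g', shape=[1], initializer=tf.zeros_initializer())
print(g1 is g2)  # True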

Reference:

blog.csdn.net/dcrmg/artic…