The love-hate relationship between PyTorch and TensorFlow: basic data types

The love-hate relationship between PyTorch and TensorFlow: tensors

The love-hate relationship between PyTorch and TensorFlow: trainable parameters

PyTorch version: 1.6.0

TensorFlow version: 1.15.0

Parameter initialization mainly relies on mathematical distributions, such as the normal distribution, the uniform distribution, and so on.

1. PyTorch

(1) User-defined trainable parameters

torch.bernoulli(input, out=None) -> Tensor    Draws binary random numbers (0 or 1) from a Bernoulli distribution
torch.multinomial(input, num_samples, replacement=False, out=None) -> LongTensor    Returns a tensor where each row contains num_samples indices sampled from the multinomial distribution defined by the corresponding row of the input tensor
torch.normal(means, std, out=None)    Returns a tensor of random numbers drawn from separate normal distributions with the given means and standard deviations
torch.normal(mean=0.0, std, out=None)    Similar to the above, but all drawn elements share the same mean
torch.normal(means, std=1.0, out=None)    Similar to the above, but all drawn elements share the same standard deviation
torch.rand(*sizes, out=None) -> Tensor    Returns a tensor filled with random numbers uniformly distributed on the interval [0, 1); its shape is defined by the variable argument sizes
torch.randn(*sizes, out=None) -> Tensor    Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1; its shape is defined by the variable argument sizes
torch.randperm(n, out=None) -> LongTensor    Returns a random permutation of the integers from 0 to n-1
In-place random sampling
torch.Tensor.bernoulli_()    In-place version of torch.bernoulli()
torch.Tensor.cauchy_()    Draws numbers from a Cauchy distribution
torch.Tensor.exponential_()    Draws numbers from an exponential distribution
torch.Tensor.geometric_()    Draws elements from a geometric distribution
torch.Tensor.log_normal_()    Samples from a log-normal distribution
torch.Tensor.normal_()    In-place version of torch.normal()
torch.Tensor.random_()    Samples from a discrete uniform distribution
torch.Tensor.uniform_()    Samples from a continuous uniform distribution

Note: functions like normal_(), with a trailing underscore, operate in place on the original data.

There are also functions such as torch.zeros() and torch.ones(), along with their in-place and *_like variants.
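For reference, here is a quick sketch of a few of the functions listed above (the sampled values will of course differ from run to run):

import torch

x = torch.rand(2, 3)       # uniform samples in [0, 1)
y = torch.randn(2, 3)      # standard normal samples (mean 0, variance 1)
z = torch.normal(mean=torch.zeros(2, 3), std=torch.ones(2, 3))  # per-element mean/std
p = torch.randperm(5)      # e.g. tensor([3, 0, 4, 1, 2])
zeros = torch.zeros(2, 3)  # all-zero tensor
ones = torch.ones(2, 3)    # all-one tensor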

The following is an example of parameter initialization using these distributions:

a = torch.Tensor(3, 3).bernoulli_()

tensor([[1., 1., 1.],
        [0., 1., 0.],
        [0., 1., 0.]])

a = torch.Tensor(3, 3).normal_(0, 1)

tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]])

Then we pass it to torch.tensor() and enable gradient computation:

b = torch.tensor(a, requires_grad=True)

E:\anaconda2\envs\python36\lib\site-packages\ipykernel_launcher.py:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  """Entry point for launching an IPython kernel.
Out[7]:
tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]], requires_grad=True)

The warning above is raised here; following its suggestion, we can rewrite it as:

c = a.clone().detach().requires_grad_(True)

The result is the same:

tensor([[0.7777, 0.9153, 0.1495],
        [0.0533, 1.6500, 1.2531],
        [0.5321, 0.1954, 1.3835]], requires_grad=True)

(2) Initialize layer parameters in the network

In PyTorch, the default initialization of parameters is implemented in each layer's reset_parameters() method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input, hidden, classes):
        super(Net, self).__init__()
        self.input = input
        self.hidden = hidden
        self.classes = classes
        self.w0 = nn.Parameter(torch.Tensor(self.input, self.hidden))
        self.b0 = nn.Parameter(torch.Tensor(self.hidden))
        self.w1 = nn.Parameter(torch.Tensor(self.hidden, self.classes))
        self.b1 = nn.Parameter(torch.Tensor(self.classes))
        self.reset_parameters()

    def reset_parameters(self):
        # weights drawn from a normal distribution, biases set to a constant
        nn.init.normal_(self.w0)
        nn.init.constant_(self.b0, 0)
        nn.init.normal_(self.w1)
        nn.init.constant_(self.b1, 0)

    def forward(self, x):
        out = torch.matmul(x, self.w0) + self.b0
        out = F.relu(out)
        out = torch.matmul(out, self.w1) + self.b1
        return out

The nn.Parameter() function: wrapping a tensor in nn.Parameter registers it as a trainable parameter of the module, so that its values are continually updated during training to optimize the model.
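As a minimal sketch of what nn.Parameter does (the module and attribute names here are made up for illustration), wrapping a tensor registers it with the module so the optimizer can update it:

import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super(Scale, self).__init__()
        # registered as a trainable parameter; it shows up in parameters()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.scale

m = Scale()
print(list(m.parameters()))  # [Parameter containing: tensor([1.], requires_grad=True)]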

You can also use the initialization methods in torch.nn.init:

import torch
import torch.nn as nn

w = torch.empty(2, 3)

# 1. Uniform distribution - U(a, b)
# torch.nn.init.uniform_(tensor, a=0, b=1)
nn.init.uniform_(w)
# tensor([[0.0578, 0.3402, 0.5034],
#         [0.7865, 0.7280, 0.6269]])

# 2. Normal distribution - N(mean, std)
# torch.nn.init.normal_(tensor, mean=0, std=1)
nn.init.normal_(w)
# tensor([[0.3326, 0.0171, ...],
#         [0.1669, 0.1747, 0.0472]])

# 3. Constant - val
# torch.nn.init.constant_(tensor, val)
nn.init.constant_(w, 0.3)
# tensor([[0.3000, 0.3000, 0.3000],
#         [0.3000, 0.3000, 0.3000]])

# 4. Identity: ones on the diagonal, zeros elsewhere
# torch.nn.init.eye_(tensor)
nn.init.eye_(w)
# tensor([[1., 0., 0.],
#         [0., 1., 0.]])

# 5. Dirac delta function (for 3-, 4- or 5-dimensional tensors)
# torch.nn.init.dirac_(tensor)
w1 = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w1)

# 6. xavier_uniform initialization
# torch.nn.init.xavier_uniform_(tensor, gain=1)
# From - Understanding the difficulty of training deep feedforward neural networks - Bengio 2010
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
# tensor([[ 1.3374,  0.7932, -0.0891],
#         [-1.3363, -0.0206, -0.9346]])

# 7. xavier_normal initialization
# torch.nn.init.xavier_normal_(tensor, gain=1)
nn.init.xavier_normal_(w)
# tensor([[-0.1777,  0.6740,  0.1139],
#         [ 0.3018, -0.2443,  ...]])

# 8. kaiming_uniform initialization
# From - Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He Kaiming 2015
# torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
# tensor([[0.6426, 0.9582, 1.1783],
#         [0.0515, 0.4975, ...]])

# 9. kaiming_normal initialization
# torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
# tensor([[ 0.2530, -0.4382,  ...],
#         [ 0.0544,  1.6392, -2.0752]])

# 10. Orthogonal matrix - (semi-)orthogonal matrix
# From - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
# torch.nn.init.orthogonal_(tensor, gain=1)
nn.init.orthogonal_(w)
# tensor([[ 0.5786, -0.5642,  ...],
#         [-0.7517, -0.0886, -0.6536]])

# 11. Sparse matrix: non-zero elements follow a normal distribution N(0, 0.01)
# From - Deep learning via Hessian-free optimization - Martens 2010
# torch.nn.init.sparse_(tensor, sparsity, std=0.01)
nn.init.sparse_(w, sparsity=0.1)
# tensor(1.00000e-03 *
#        [[-0.3382,  1.9501, -1.7761],
#         [ 0.0000,  0.0000,  0.0000]])

For the parameters of PyTorch's built-in layers, we can initialize them as follows:

for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)

This iterates over the model's modules and checks whether each one is an nn.Conv2d or nn.Linear layer; if so, its weight parameter m.weight is given xavier_uniform_ initialization. Similarly, m.bias can be used to get the bias term. Below is the parameter-initialization code from the PyTorch version of the residual network (ResNet):

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

This code block is used in __init__, where self refers to the current model.
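As a side note (this is not from the original ResNet code), the same kind of initialization can also be applied after the model is built via nn.Module.apply(); the sketch below assumes a model composed of built-in layers:

import torch.nn as nn

def init_weights(m):
    # apply() calls this function on every submodule, so filter by layer type
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # recursively applies init_weights to every submodule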

Reference:

blog.csdn.net/ys1305/arti…

2. TensorFlow

(1) User-defined parameter initialization

Create a 2 by 3 matrix with all elements having a value of 0 (type tf.float32).

a = tf.zeros([2, 3], dtype=tf.float32)

Create a 3 by 4 matrix with all elements having a value of 1.

b = tf.ones([3, 4])

Create a 1 by 10 matrix and fill it with 2 (type tf.int32; the dtype can be omitted).

c = tf.constant(2, dtype=tf.int32, shape=[1, 10])

Create a 1 by 10 matrix whose elements follow a normal distribution with a mean of 20 and a standard deviation of 3.

d = tf.random_normal([1, 10], mean=20, stddev=3)

All of the above values can be used to initialize variables. For example, populate a 1*2 matrix with 0.01 to initialize a variable called bias.

bias = tf.Variable(tf.zeros([1, 2]) + 0.01)
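Putting it together, here is a minimal runnable sketch in the TensorFlow 1.x style, assuming the bias variable defined above:

import tensorflow as tf

bias = tf.Variable(tf.zeros([1, 2]) + 0.01)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # variables must be initialized before use
    print(sess.run(bias))  # [[0.01 0.01]]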

(2) Initialization with the tf.xxx_initializer() classes

Initialize to a constant

import tensorflow as tf

value = [0, 1, 2, 3, 4, 5, 6, 7]
init = tf.constant_initializer(value)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[8], initializer=init)
    x.initializer.run()
    print(x.eval())

# output:
# [0. 1. 2. 3. 4. 5. 6. 7.]

The tf.zeros_initializer() and tf.ones_initializer() classes are used to initialize tensors to all zeros and all ones, respectively.

import tensorflow as tf

init_zeros = tf.zeros_initializer()
init_ones = tf.ones_initializer()

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[8], initializer=init_zeros)
    y = tf.get_variable('y', shape=[8], initializer=init_ones)
    x.initializer.run()
    y.initializer.run()
    print(x.eval())
    print(y.eval())

# output:
# [0. 0. 0. 0. 0. 0. 0. 0.]
# [1. 1. 1. 1. 1. 1. 1. 1.]

Initialize to a normal distribution

Normal-distribution initialization is the most widely used in neural networks; parameters can be initialized from a standard normal distribution or from a truncated normal distribution.

The tf.random_normal_initializer() class is used in TF to generate tensors that follow a normal distribution.

The tf.truncated_normal_initializer() class is used in TF to generate tensors that follow a truncated normal distribution.

  • mean: mean of the normal distribution; the default value is 0
  • stddev: standard deviation of the normal distribution; the default value is 1
  • seed: random number seed; specifying the same seed value generates the same data each time
  • dtype: data type
import tensorflow as tf

init_random = tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)
init_truncated = tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None, dtype=tf.float32)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[10], initializer=init_random)
    y = tf.get_variable('y', shape=[10], initializer=init_truncated)
    x.initializer.run()
    y.initializer.run()
    print(x.eval())
    print(y.eval())

# output:
# [-0.40236568 -0.35864913 -0.94253045 -0.40153521  0.1552504   1.16989613
#   0.43091929 -0.31410623  0.70080078 -0.9620409 ]
# [ 0.18356581 -0.06860946 -0.55245203  1.08850253 -1.13627422 -0.1006074
#   0.65564936  0.03948414  0.86558545 -0.4964745 ]

Initialize to a uniform distribution

The tf.random_uniform_initializer() class is used in TF to generate tensors that follow a uniform distribution.

  • minval: minimum value
  • maxval: maximum value
  • seed: random number seed
  • dtype: data type
import tensorflow as tf

init_uniform = tf.random_uniform_initializer(minval=0, maxval=10, seed=None, dtype=tf.float32)

with tf.Session() as sess:
    x = tf.get_variable('x', shape=[10], initializer=init_uniform)
    x.initializer.run()
    print(x.eval())

# output:
# [6.93343639 9.41196823 5.54009819 1.38017178 1.78720832 5.38881063
#  3.39674473 8.12443542 0.62157512 8.36026382]

Others:

tf.orthogonal_initializer() initializes to a random (semi-)orthogonal matrix; the tensor to be initialized must be at least two-dimensional.

tf.glorot_uniform_initializer() initializes to uniformly distributed random numbers scaled by the number of input and output nodes.

tf.glorot_normal_initializer() initializes to truncated-normal random numbers scaled by the number of input and output nodes. A short usage sketch follows.
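A brief sketch of how these three initializers might be used with tf.get_variable() (the variable names here are arbitrary):

import tensorflow as tf

init_ortho = tf.orthogonal_initializer()
init_glorot_u = tf.glorot_uniform_initializer()
init_glorot_n = tf.glorot_normal_initializer()

with tf.Session() as sess:
    a = tf.get_variable('a', shape=[3, 3], initializer=init_ortho)
    b = tf.get_variable('b', shape=[4, 4], initializer=init_glorot_u)
    c = tf.get_variable('c', shape=[4, 4], initializer=init_glorot_n)
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))  # a random 3x3 (semi-)orthogonal matrix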

When using these initializers:

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

Use the above method to initialize the parameters.

Note that creating variables with tf.get_variable() differs from creating them with tf.Variable().

For the specific differences, see: blog.csdn.net/kevindree/a…
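As a rough illustration of one of those differences (a sketch assuming TF 1.x graph mode): tf.Variable() always creates a new variable, while tf.get_variable() works with variable scopes and can reuse an existing variable:

import tensorflow as tf

# tf.Variable() always creates a new variable; the name is uniquified if it clashes
v1 = tf.Variable(0.0, name='v')
v2 = tf.Variable(0.0, name='v')
print(v1.name, v2.name)  # v:0 v_1:0

# tf.get_variable() respects variable scopes and can reuse an existing variable
with tf.variable_scope('scope', reuse=tf.AUTO_REUSE):
    g1 = tf.get_variable('g', shape=[1], initializer=tf.zeros_initializer())
    g2 = tf.get_variable('g', shape=[1], initializer=tf.zeros_initializer())
print(g1 is g2)  # True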

Reference:

blog.csdn.net/dcrmg/artic…