Translator: GeneZC

torch.nn.init.calculate_gain(nonlinearity, param=None)

Returns the recommended gain value for a given nonlinear function. The corresponding relationship is shown in the following table:

Nonlinear function      Gain
Linear / Identity       1
Conv{1, 2, 3}D          1
Sigmoid                 1
Tanh                    5/3
ReLU                    sqrt(2)
Leaky ReLU              sqrt(2 / (1 + negative_slope^2))

Parameters:

  • nonlinearity – the non-linear function (nn.functional name)
  • param – optional parameter for the non-linear function

example

>>> gain = nn.init.calculate_gain('leaky_relu')
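
The optional param argument matters for parameterized nonlinearities such as leaky_relu, where it is the negative slope. A small illustrative check (assuming torch.nn is imported as nn, as in the examples):

>>> nn.init.calculate_gain('leaky_relu', 0.2)  # sqrt(2 / (1 + 0.2**2)) ≈ 1.3867
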
torch.nn.init.uniform_(tensor, a=0, b=1)

Initializes the input Tensor with values drawn from the uniform distribution U(a, b).

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • a – the lower bound of the uniform distribution
  • b – the upper bound of the uniform distribution

example

>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)
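
The trailing underscore follows the usual PyTorch convention: the tensor is modified in place and also returned. As a short sketch (the bounds below are illustrative, not from the original text), custom bounds are passed via a and b:

>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w, a=-0.1, b=0.1)   # values drawn from U(-0.1, 0.1)
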
torch.nn.init.normal_(tensor, mean=0, std=1)

Initializes the input Tensor with values drawn from the normal distribution N(mean, std^2).

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • mean – the mean of the normal distribution
  • std – the standard deviation of the normal distribution

example

>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)
torch.nn.init.constant_(tensor, val)

Initializes the input Tensor with the constant value val.

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • val – the constant used to fill the tensor

example

>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)
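
In practice these initializers are applied directly to a module's parameters; a common (illustrative) pattern is zeroing a layer's bias:

>>> layer = nn.Linear(5, 3)
>>> nn.init.constant_(layer.bias, 0.0)
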
torch.nn.init.eye_(tensor)

Initializes the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs as possible are preserved.

Parameters:

  • tensor – a 2-dimensional torch.Tensor

example

>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
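
For a non-square tensor, eye_ writes the identity into the leading square block and zeros everywhere else; a quick illustrative check:

>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
>>> torch.equal(w[:, :3], torch.eye(3))
True
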
torch.nn.init.dirac_(tensor)

Initializes the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels as possible are preserved.

Parameters:

  • tensor – a {3, 4, 5}-dimensional torch.Tensor

example

>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
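
With matching input and output channels, zero bias, and "same"-style padding, a Dirac-initialized convolution passes its input through unchanged. A small sanity check (the shapes are illustrative):

>>> conv = nn.Conv2d(16, 16, kernel_size=3, padding=1, bias=False)
>>> nn.init.dirac_(conv.weight)
>>> x = torch.randn(1, 16, 8, 8)
>>> torch.allclose(conv(x), x)
True
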
torch.nn.init.xavier_uniform_(tensor, gain=1)

Initializes the input Tensor according to the method described in "Understanding the difficulty of training deep feedforward neural networks" – Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The values in the initialized tensor are sampled from U(-a, a), where a = gain * sqrt(6 / (fan_in + fan_out)).

Also known as Glorot initialization.

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • gain – an optional scaling factor

example

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
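
To make the bound concrete: for a 3 x 5 tensor, fan_in = 5 and fan_out = 3, so with the default gain of 1 every sampled value lies within ±sqrt(6 / 8) ≈ ±0.866 (an illustrative check, not from the original text):

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w)
>>> bound = (6.0 / (5 + 3)) ** 0.5       # gain * sqrt(6 / (fan_in + fan_out))
>>> bool(w.abs().max() <= bound)
True
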
torch.nn.init.xavier_normal_(tensor, gain=1)

Initializes the input Tensor according to the method described in "Understanding the difficulty of training deep feedforward neural networks" – Glorot, X. & Bengio, Y. (2010), using a normal distribution. The values in the initialized tensor are sampled from N(0, std^2), where std = gain * sqrt(2 / (fan_in + fan_out)).

Also known as Glorot Initialization.

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • gain – an optional scaling factor

example

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)
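
For a large tensor the empirical standard deviation comes out close to gain * sqrt(2 / (fan_in + fan_out)); a rough illustrative check (the sizes are arbitrary):

>>> w = torch.empty(200, 300)
>>> nn.init.xavier_normal_(w)
>>> w.std()                              # ≈ sqrt(2 / (300 + 200)) ≈ 0.063
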
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Initializes the input Tensor according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" – He, K. et al. (2015), using a uniform distribution. The values in the initialized tensor are sampled from U(-bound, bound), where bound = sqrt(6 / ((1 + a^2) * fan_in)).

Also known as He Initialization.

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • a – the negative slope of the rectifier used after this layer (default 0, i.e. ReLU)
  • mode – either 'fan_in' (default) or 'fan_out'. 'fan_in' keeps the variance of the weights constant in the forward pass; 'fan_out' keeps it constant in the backward pass.
  • nonlinearity – the non-linear function (nn.functional name); only 'relu' or 'leaky_relu' (default) are recommended.

example

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Initializes the input Tensor according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" – He, K. et al. (2015), using a normal distribution. The values in the initialized tensor are sampled from N(0, std^2), where std = sqrt(2 / ((1 + a^2) * fan_in)).

Also known as He Initialization.

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • a – the negative slope of the rectifier used after this layer (default 0, i.e. ReLU)
  • mode – either 'fan_in' (default) or 'fan_out'. 'fan_in' keeps the variance of the weights constant in the forward pass; 'fan_out' keeps it constant in the backward pass.
  • nonlinearity – the non-linear function (nn.functional name); only 'relu' or 'leaky_relu' (default) are recommended.

example

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
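
These functions operate on plain tensors, so a whole network is usually initialized by combining them with Module.apply. A sketch of that pattern (illustrative, not from the original text):

>>> def init_weights(m):
...     if isinstance(m, nn.Conv2d):
...         # He initialization for conv layers followed by ReLU, zero bias
...         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
...         if m.bias is not None:
...             nn.init.constant_(m.bias, 0)
...
>>> model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
>>> model.apply(init_weights)
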
torch.nn.init.orthogonal_(tensor, gain=1)

Initializes the input Tensor with a (semi) orthogonal matrix, as described in "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" – Saxe, A. et al. (2013). The input tensor must have at least two dimensions; if it has more than two, the trailing dimensions are flattened.

Parameters:

  • tensor – an n-dimensional torch.Tensor, where n >= 2
  • gain – an optional scaling factor

example

>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
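
A quick way to see the effect: for a 3 x 5 tensor the rows come out orthonormal, so w @ w.T is (numerically) the 3 x 3 identity (illustrative check):

>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
>>> torch.allclose(w @ w.t(), torch.eye(3), atol=1e-6)
True
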
torch.nn.init.sparse_(tensor, sparsity, std=0.01)

Initializes the 2-dimensional input Tensor as a sparse matrix, as described in "Deep learning via Hessian-free optimization" – Martens, J. (2010), and initializes the non-zero elements from the normal distribution N(0, std^2).

Parameters:

  • tensor – an n-dimensional torch.Tensor
  • sparsity – the fraction of elements in each column to be set to zero
  • std – the standard deviation of the normal distribution used to initialize the non-zero elements

example

>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)
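
For instance, with 10 rows and sparsity=0.1, each column ends up with one zero entry; a small illustrative check (assuming, as in the parameter description above, that the zeros are placed per column):

>>> w = torch.empty(10, 5)
>>> nn.init.sparse_(w, sparsity=0.1)
>>> (w == 0).float().mean(dim=0)         # fraction of zeros in each column
tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000])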