preface

This series of articles is a set of reading notes for Deep Learning (Goodfellow, Bengio, and Courville). The book is an excellent reference for deep learning, so these notes are best read alongside the original text.

Linear algebra in deep learning

Basic concepts

  • Scalar: a single number
  • Vector: a one-dimensional array of numbers (a row or a column)
  • Matrix: a two-dimensional array
  • Tensor: a general multidimensional array (a 0-dimensional tensor is a scalar, a 1-dimensional tensor is a vector, a 2-dimensional tensor is a matrix)
  • Transpose: mirror the matrix across its main diagonal, so that $(A^\top)_{i,j} = A_{j,i}$

Defining a matrix in NumPy, and taking its transpose:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
a = a.T  # transpose; note that reshape(3, 2) would only reorder the elements, not transpose
print(a)

[[1 4]
 [2 5]
 [3 6]]

Basic operations

In NumPy, * is the element-wise product, while dot performs the matrix multiplication familiar from linear algebra, and np.linalg.inv computes the matrix inverse:

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a * b)
print(a.dot(b))
print(np.dot(a, b))
print(np.linalg.inv(a))

# star (*): element-wise product
[[ 5 12]
 [21 32]]

# dot: matrix product
[[19 22]
 [43 50]]

# dot
[[19 22]
 [43 50]]

# inverse
[[-2.   1. ]
 [ 1.5 -0.5]]

norm

A norm is a function that measures length. Mathematically, there are both vector norms and matrix norms.

The vector norm

Let’s talk about the norm of vectors. A vector has a direction and a magnitude, and that magnitude is expressed in terms of norms.

Strictly speaking, a norm is any function $f$ that satisfies the following properties: $f(x) = 0 \Rightarrow x = 0$; the triangle inequality $f(x + y) \le f(x) + f(y)$; and homogeneity $f(\alpha x) = |\alpha| f(x)$ for any scalar $\alpha$. The family used most often is the $L^p$ norm, $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$ with $p \ge 1$; a NumPy sketch of the common cases follows the list below:

  • When p = 2, the norm $\|x\|_2$ (often abbreviated to $\|x\|$) is called the Euclidean norm and measures ordinary distance. It contains a square root, however, so in practice the squared $L^2$ norm is often preferred: it removes the root, provides the same optimization target when used in a loss function (which comes up later), and is simpler and faster to compute, e.g. as $x^\top x$.
  • When p = 1, the norm $\|x\|_1$ is the sum of the absolute values of the elements of the vector. In machine learning, the $L^1$ norm is better than the $L^2$ norm at distinguishing zero elements from small non-zero ones.
  • When p = 0, the "$L^0$ norm" is not actually a norm (most references that mention it stress this); it simply counts how many non-zero elements the vector has. It is nonetheless very useful, with applications in regularization and sparse coding. For example, treat a username/password check as a two-element error vector: an $L^0$ norm of 0 means the login succeeds, 1 means one of the two is wrong, and 2 means both are wrong.
  • When p = ∞, the norm $\|x\|_\infty$ is also called the infinity norm or max norm. It is the largest absolute value among the elements of the vector: $\|x\|_\infty = \max_i |x_i|$.
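
A minimal NumPy sketch of these cases (the example vector is arbitrary; the $L^0$ count is computed directly, since it is not a true norm):

import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x))              # L2 (Euclidean) norm: sqrt(9 + 16) = 5.0
print(x.dot(x))                       # squared L2 norm, x^T x = 25.0, no square root needed
print(np.linalg.norm(x, ord=1))       # L1 norm: |3| + |-4| + |0| = 7.0
print(np.count_nonzero(x))            # "L0 norm": number of non-zero elements = 2
print(np.linalg.norm(x, ord=np.inf))  # max norm: largest absolute value = 4.0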

Matrix norm

For matrix norms we will only mention the Frobenius norm, which is simply the square root of the sum of the squares of all the elements of the matrix: $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$. There are equivalent definitions: $\|A\|_F = \sqrt{\mathrm{tr}(A^\dagger A)}$, where $A^\dagger$ is the conjugate transpose of $A$ and $\mathrm{tr}$ is the trace, and $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$, where $\sigma_i$ are the singular values of $A$.
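
As a quick check, here is a sketch verifying in NumPy that the three definitions above agree (the matrix is an arbitrary example; for a real matrix the conjugate transpose is just the transpose):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, ord='fro'))  # square root of the sum of squared entries
print(np.sqrt(np.trace(A.T @ A)))    # sqrt(tr(A^T A))
print(np.sqrt(np.sum(np.linalg.svd(A, compute_uv=False) ** 2)))  # sqrt of the sum of squared singular values

All three print the same value.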

Singular value decomposition

We are familiar with the eigendecomposition of a matrix: $A = V \mathrm{diag}(\lambda) V^{-1}$. Singular value decomposition is similar: $A = U D V^\top$, where, for an $m \times n$ matrix $A$, $U$ is an $m \times m$ orthogonal matrix, $D$ is an $m \times n$ diagonal matrix, and $V$ is an $n \times n$ orthogonal matrix. The elements on the diagonal of $D$ are called the singular values of $A$; the non-zero singular values are the square roots of the eigenvalues of $A^\top A$ (or, equivalently, of $A A^\top$). The columns of $U$ are called the left singular vectors of $A$ and are the eigenvectors of $A A^\top$; the columns of $V$ are called the right singular vectors of $A$ and are the eigenvectors of $A^\top A$.

Ordinary inversion is undefined for non-square or singular matrices, and no fully satisfactory substitute exists, so we settle for the next best thing: the Moore-Penrose pseudoinverse, $A^+ = V D^+ U^\top$, which is the closest we can get to an inverse. Here $D^+$ is obtained by taking the reciprocal of each non-zero diagonal element of $D$ and transposing the result. This is also a common tool in statistics, and it is very useful in machine learning.
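
A short NumPy sketch of both ideas (the matrix is an arbitrary example; np.linalg.svd returns the singular values as a 1-D array s rather than the full matrix D, so we rebuild D to check the factorization):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # 2 x 3: not square, so it has no ordinary inverse

U, s, Vt = np.linalg.svd(A)          # A = U @ D @ Vt
print(U.shape, s.shape, Vt.shape)    # (2, 2) (2,) (3, 3)

D = np.zeros_like(A)                 # rebuild the 2 x 3 diagonal matrix D from s
D[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ D @ Vt))    # True: the factorization reconstructs A

D_plus = np.zeros((3, 2))            # D+: reciprocals of the non-zero singular values, transposed
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)  # both singular values are non-zero in this example
print(np.allclose(np.linalg.pinv(A), Vt.T @ D_plus @ U.T))  # True: matches NumPy's pseudoinverse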

definitions

  • Diagonal matrix: only the main diagonal contains non-zero elements;
  • Unit vector: a vector with unit norm, $\|x\|_2 = 1$;
  • Vector orthogonality: $x^\top y = 0$; if both vectors are non-zero, this means the angle between them is 90 degrees;
  • Orthonormal: mutually orthogonal, each with norm 1;
  • Orthogonal matrix: the rows are mutually orthonormal and so are the columns, hence $A^\top A = A A^\top = I$;
  • Eigendecomposition: decomposing a matrix into its eigenvectors and eigenvalues;
  • Eigenvalues and eigenvectors: the scalars $\lambda$ and non-zero vectors $v$ satisfying $A v = \lambda v$ (see the sketch after this list);
  • Positive definite, positive semi-definite, negative definite: all eigenvalues are positive, non-negative, or negative, respectively.
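
A brief sketch checking a few of these definitions with NumPy (an arbitrary symmetric example matrix, so the eigenvectors come out orthonormal):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric example matrix

w, V = np.linalg.eigh(A)                # eigenvalues w, eigenvectors as the columns of V
print(w)                                # [1. 3.]: all positive, so A is positive definite

for lam, v in zip(w, V.T):              # check A v = lambda v for each eigenpair
    print(np.allclose(A @ v, lam * v))  # True, True

print(np.allclose(V.T @ V, np.eye(2)))  # True: the eigenvectors are orthonormal, so V is orthogonal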

conclusion

One of the great features of linear algebra is that it forms one continuous thread: a unified body of knowledge whose concepts are closely interrelated. It is very beautiful, it has important applications in deep learning, and it is well worth learning thoroughly.

If necessary, it is highly recommended that you go through the lesson again; check it out here, and have fun!

  • This article was first published on the official account RAIS. Looking forward to your follow.