About the author: Bai Shuotian, former Didi algorithm expert. This article is excerpted from the Lagou Education column “Introduction to Machine Learning”, Lecture 21.

Hello, I am Bai Shuotian. Today we are going to learn “linear algebra”. In this class we will cover the linear algebra that is relevant to machine learning, including vector and matrix multiplication, norms, differentiation, and other basic operations, as well as their applications in machine learning.

Linear algebra is a branch of mathematics. You must have studied it in college, and you may even have stayed up all night cramming for the exam. In my experience, linear algebra is not an easy course, but it is easier than advanced mathematics. From a machine learning perspective, linear algebra is required, but you do not need all of it: to keep linear algebra from getting in the way of machine learning, you only need the basics of vectors and matrices. It is worth mentioning that once you master its logic and routines, linear algebra turns out to be a paper tiger.


Since it is called linear algebra, it is naturally a prerequisite for linear models. The most basic objects of linear algebra are vectors, and vectors of vectors form matrices. With vectors and matrices, you can represent many numbers in a single vector, and even large amounts of higher-dimensional data in a matrix. So you can think of linear algebra as a foundation for dealing with big data.

Basic operations on vectors

Let’s first look at vectors. Here you only need what machine learning requires. We were introduced to vectors in high school, and the basic operations should not trouble us. A vector is a quantity with direction, written as a bold italic lowercase letter or as an italic lowercase letter with a rightward arrow above it. Recall that when we computed the gradient of a function last time, the gradient was also a vector: a quantity with direction, namely the direction in which the value of the function changes fastest.

Dot product

In addition to ordinary addition, another important operation on vectors is the dot product. Taking the dot product of two vectors of the same dimension yields a scalar: the sum of the products of the corresponding components of the two vectors. For example, the dot product of the vectors [1, 1] and [-1, 2] is 1 × (-1) + 1 × 2, which equals 1.
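If you want to check this on a computer, here is a quick sketch in NumPy (an illustration, not part of the original lesson):

```python
import numpy as np

# Dot product of two same-dimension vectors yields a scalar.
a = np.array([1, 1])
b = np.array([-1, 2])
print(np.dot(a, b))  # 1*(-1) + 1*2 = 1
print(a @ b)         # the @ operator computes the same dot product
```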

Basic operations on matrices

Now let’s look at matrices. You probably did not encounter matrices until college. A matrix can be pictured as a vector of vectors and is usually written as a bold capital letter. In machine learning, a large dataset is usually represented by a matrix: each row is one dimension (feature) of the samples, and each column is one data sample in the set. For example, consider the test scores of 3 students in 3 courses.

Using a matrix to represent each person’s grade in each course, we get

$$\boldsymbol{X} = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix},$$

where $x_{ij}$ is student $j$’s score in course $i$.

In particular, when the dataset contains only one sample, the matrix reduces to a vector. So you can also think of a vector as a special matrix.

Transpose

Matrices have a very important operation: the transpose, written as a capital T at the upper-right corner of the matrix. It interchanges the rows and columns of the matrix, so that an m × n matrix becomes a new n × m matrix. For example, the original matrix

$$\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$$

becomes, after transposition,

$$\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}^T = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$

The transpose has very important significance in machine learning, and when you study machine learning you may well be confused by all the transposes. This is because machine learning has default conventions for vectors and matrices: all vectors are column vectors by default, and when you need a row vector, you must write it as a transpose. Take the grades of the 3 students above as an example. Each person’s grades in the 3 courses form a vector; by the column-vector convention,

$$\boldsymbol{x}_j = \begin{pmatrix} x_{1j} \\ x_{2j} \\ x_{3j} \end{pmatrix}, \quad j = 1, 2, 3.$$

The full table of scores is a matrix, a vector of vectors, so it must also satisfy the column-vector convention:

$$\boldsymbol{X} = \begin{pmatrix} \boldsymbol{x}_1 & \boldsymbol{x}_2 & \boldsymbol{x}_3 \end{pmatrix}.$$

To represent one person’s grades as a row, you therefore need the transpose symbol to turn the column into a row: $\boldsymbol{x}_j^T = \begin{pmatrix} x_{1j} & x_{2j} & x_{3j} \end{pmatrix}$. Once again: in machine learning, all vectors are column vectors by default.
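Here is a small NumPy sketch of this convention, with made-up scores standing in for the real ones:

```python
import numpy as np

# Made-up scores: each column is a student, each row is a course.
X = np.array([[90, 80, 70],   # course 1
              [85, 75, 65],   # course 2
              [95, 88, 60]])  # course 3

x1 = X[:, [0]]  # student 1's grades as a 3 x 1 column vector
print(x1.T)     # transposing turns the column into a 1 x 3 row
```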

Matrix multiplication

Matrix multiplication is used frequently in machine learning. Let’s compute an example. Suppose

$$\boldsymbol{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad \boldsymbol{B} = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}.$$

Then

$$\boldsymbol{A}\boldsymbol{B} = \begin{pmatrix} 1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8 \\ 3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.$$

In other words, the (i, j) entry of the product is the dot product of row i of the first matrix with column j of the second.

It is important to note here that matrix multiplication has a strict dimension requirement: the number of columns of the first matrix must equal the number of rows of the second matrix. Matrices whose dimensions do not match cannot be multiplied.
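The same product in NumPy, along with the dimension rule (a sketch):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)  # [[19 22], [43 50]]

C = np.ones((3, 2))
# A @ C would raise a ValueError: A has 2 columns but C has 3 rows,
# which is exactly the strict dimension requirement above.
```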

Hadamard product

Besides matrix multiplication, another basic operation on matrices is the Hadamard product. It requires the two matrices to have exactly the same dimensions and is computed by multiplying corresponding elements. For example, for

$$\boldsymbol{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad \boldsymbol{B} = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix},$$

the result of their Hadamard product is

$$\boldsymbol{A} \circ \boldsymbol{B} = \begin{pmatrix} 1 \times 5 & 2 \times 6 \\ 3 \times 7 & 4 \times 8 \end{pmatrix} = \begin{pmatrix} 5 & 12 \\ 21 & 32 \end{pmatrix}.$$
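In NumPy, the * operator on arrays is element-wise, which is exactly the Hadamard product (a sketch):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)  # [[ 5 12], [21 32]]
```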

Inverse

For matrices, you also need to know the inverse. Only a square matrix, one whose number of rows equals its number of columns, can be inverted; the inverse is denoted by a -1 at the upper-right corner of the matrix. The inverse satisfies the property that multiplying it by the original matrix yields the identity matrix: the square matrix whose main-diagonal elements are 1 and whose other elements are all 0. For example,

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^{-1} = \begin{pmatrix} -2 & 1 \\ 1.5 & -0.5 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} -2 & 1 \\ 1.5 & -0.5 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

You don’t need to know how to compute the inverse by hand; a Python package or MATLAB can do it for you.
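For example, with NumPy it takes one call (a sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
A_inv = np.linalg.inv(A)
print(A_inv)                              # [[-2.   1. ], [ 1.5 -0.5]]
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A times its inverse is I
```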


Linear algebra and machine learning

Norm

The material so far should not have given you much trouble; now it gets a little more challenging. Let’s look at the norm first. Norms can be computed for both matrices and vectors, and they are an important topic in functional analysis. From a machine learning perspective, however, we do not need to master that much. Here we only need the L1 norm and the L2 norm of a vector; for anything more complicated, consult a math book if you are interested.

The L1 norm of a vector is computed as the sum of the absolute values of its elements; the L2 norm is computed as the square root of the sum of the squares of its elements:

$$\|\boldsymbol{x}\|_1 = \sum_i |x_i|, \quad \|\boldsymbol{x}\|_2 = \sqrt{\sum_i x_i^2}.$$

For example, for $\boldsymbol{x} = (1, -2, 2)^T$,

$$\|\boldsymbol{x}\|_1 = |1| + |-2| + |2| = 5, \quad \|\boldsymbol{x}\|_2 = \sqrt{1^2 + (-2)^2 + 2^2} = 3.$$

Since the L2 norm is the most common, you can also write it without the subscript, simply as $\|\boldsymbol{x}\|$.
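Both norms are a single call in NumPy (a sketch):

```python
import numpy as np

x = np.array([1, -2, 2])
print(np.linalg.norm(x, 1))  # L1 norm: 5.0
print(np.linalg.norm(x, 2))  # L2 norm: 3.0 (also the default for vectors)
```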

Derivatives

Taking derivatives with respect to matrices and vectors may be the only real difficulty in this class. You need to understand this material well enough to work through the derivations on your own. In machine learning, derivatives with respect to matrices are used less often, so they are not essential to master; but the derivative of a vector with respect to a vector you must know. The reason is that the unknowns in machine learning are usually the coefficients, or sets of coefficients, of a model, while the learning labels are the true values; both usually exist in vector form, as in linear regression, logistic regression, and other models. Other, more complicated derivatives you can look up if you are interested.

Now let’s define the derivative of a vector with respect to a vector. The derivative of $\boldsymbol{y}$ with respect to $\boldsymbol{w}$ is the matrix of the derivatives of each component of $\boldsymbol{y}$ with respect to each component of $\boldsymbol{w}$. If the vector $\boldsymbol{w}$ has dimension n × 1 and the vector $\boldsymbol{y}$ has dimension m × 1, then the matrix of derivatives has dimension n × m:

$$\left( \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{w}} \right)_{ij} = \frac{\partial y_j}{\partial w_i}.$$

In particular, when m equals 1, the vector $\boldsymbol{y}$ is a scalar, and the definition still holds.

Now that we know the definition of the derivative, we can use it to compute derivatives. Here we work out the results that will be used in the subsequent machine learning modeling. First, the matrix-vector product: if

$$\boldsymbol{y} = \boldsymbol{A}\boldsymbol{w},$$

then

$$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{w}} = \boldsymbol{A}^T;$$

and if

$$y = \boldsymbol{a}^T \boldsymbol{w},$$

then

$$\frac{\partial y}{\partial \boldsymbol{w}} = \boldsymbol{a}.$$

These two results can be obtained by a short derivation from the definition. I want you to remember them.
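If you would rather let a computer check the first result, here is a sketch that compares it against a finite-difference approximation; the random test data is my own choice, with the n × m layout from this lesson:

```python
import numpy as np

# Verify numerically that for y = A w, dy/dw = A^T under the n x m layout.
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))
w = rng.normal(size=n)

eps = 1e-6
J = np.zeros((n, m))  # entry (i, j) holds d y_j / d w_i
for i in range(n):
    dw = np.zeros(n)
    dw[i] = eps
    J[i] = (A @ (w + dw) - A @ (w - dw)) / (2 * eps)  # central difference

print(np.allclose(J, A.T))  # True
```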


Case study

Let’s do a more complicated example. This example is also a prerequisite for the linear regression model in machine learning. Suppose

$$y = \boldsymbol{w}^T \boldsymbol{X} \boldsymbol{w},$$

where $\boldsymbol{X}$ is a symmetric matrix, and let’s take the derivative of y with respect to $\boldsymbol{w}$. Because $\boldsymbol{X}$ is symmetric, $x_{ij}$ equals $x_{ji}$. The result y of this computation is a 1 × 1 matrix, i.e. a scalar. Expanding according to the multiplication rule gives

$$y = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i x_{ij} w_j.$$

By the definition of the derivative, the result is an n × 1 vector, whose k-th entry is

$$\frac{\partial y}{\partial w_k}.$$

Computing these entries one by one, we have

$$\frac{\partial y}{\partial w_k} = \sum_{j=1}^{n} x_{kj} w_j + \sum_{i=1}^{n} w_i x_{ik} = 2 \sum_{j=1}^{n} x_{kj} w_j,$$

where the last step uses the symmetry $x_{ik} = x_{ki}$. Substituting this back into the definition, we obtain the derivative

$$\frac{\partial y}{\partial \boldsymbol{w}} = 2 \boldsymbol{X} \boldsymbol{w}.$$

The derivation involves a fair amount of algebra. If you are interested, take out pen and paper and work through it yourself to see whether you get the same result as I did. You can also simply memorize the formula and the definition of the derivative for future use.
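As a sanity check, here is a sketch that compares the formula $2\boldsymbol{X}\boldsymbol{w}$ against a finite-difference gradient; the random symmetric test matrix is my own construction:

```python
import numpy as np

# For y = w^T X w with symmetric X, the gradient should be 2 X w.
rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))
X = (M + M.T) / 2        # symmetrize so that x_ij = x_ji
w = rng.normal(size=n)

y = lambda v: v @ X @ v  # the scalar-valued quadratic form

eps = 1e-6
grad = np.zeros(n)
for k in range(n):
    dv = np.zeros(n)
    dv[k] = eps
    grad[k] = (y(w + dv) - y(w - dv)) / (2 * eps)  # central difference

print(np.allclose(grad, 2 * X @ w))  # True
```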

In this class, our review of linear algebra is just the tip of the iceberg, but it is enough for us to break through in machine learning. We began by reviewing the basic operations on matrices and vectors precisely to prepare for norms and derivatives. Once we know the norms and the rules for taking derivatives, we can handle the linear algebra that appears in machine learning.

Among these topics, the norm is an important tool for overcoming model overfitting in machine learning. Whether L1 or L2, a norm can be used as the penalty term of a loss function, also known as a regularization term, to guide the learning and training of the model; a minimal sketch follows below. A larger norm means the model parameters are larger in overall absolute value and the model is more complex, so naturally the risk of overfitting is higher. I will devote a whole class to overfitting later. In addition, linear algebra is applied mostly to linear models. Linear regression is the entry-level algorithm for regression, and optimizing a linear regression model requires a great deal of vector differentiation.
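To give a taste of the penalty term, here is a minimal sketch of an L2-regularized squared loss; the function name and the lam hyperparameter are illustrative, not from the lesson:

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    """Squared error plus an L2 (ridge) penalty on the coefficients."""
    residual = X @ w - y
    data_term = residual @ residual            # sum of squared errors
    penalty = lam * np.linalg.norm(w, 2) ** 2  # L2 norm as regularizer
    return data_term + penalty
```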

Without this basic knowledge, you might get stuck on a very simple entry-level algorithm, which would not be worth it. But now you have broken through all the essentials. If you are interested, you can study the rest of linear algebra in a specialized, systematic way over some time; if you are a bit resistant to math, mastering this much is enough to get you into machine learning.

Ok, that’s all for this time. Next time, we will learn about “probability theory”. Be sure to come to class on time.


Copyright notice: The copyright of this article belongs to Lagou Education and the columnist. Without authorization, no media, website, or individual may reproduce, link to, repost, or otherwise copy and publish this article; violators will be held accountable.