Mathematical knowledge that programmers need to master

Compared with developing apps, backend services, and front ends, artificial intelligence requires a great deal of mathematical knowledge. So which parts of it do you actually use?

A special note on textbooks for calculus, linear algebra, probability, and optimization: unless you have forgotten most of your mathematics, never learned undergraduate mathematics, or have a strong personal interest in the subject, I don't recommend working through whole books; it wastes a lot of time and energy. The mathematical knowledge we need for artificial intelligence is only a part of what those books cover, and it is enough to listen carefully and master the specific knowledge points explained here. We are doing applied mathematics, not mathematical research, and we are not learning math to grind through problem sets.

First comes calculus (higher mathematics): derivatives and differentiation formulas, the first-order derivative and function monotonicity, higher-order derivatives, the second-order derivative and the convexity/concavity of a function, extremum criteria for univariate functions, and the Taylor expansion of univariate functions. In machine learning, calculus is used mainly through its differential part; its role is to find the extrema of functions, which is exactly what the solvers in many machine learning libraries do. The following points from calculus are used in machine learning:

The definition and calculation of derivatives and partial derivatives

The definition of the gradient vector

The extremum theorem: the derivative or gradient of a differentiable function must be 0 at an extremum point

The Jacobian matrix, the matrix of partial derivatives of a vector-to-vector mapping function, which appears in derivations

The Hessian matrix, the matrix of second-order partial derivatives; it extends the second derivative to multivariate functions and is closely tied to extremum tests

The definition of convex functions and how to test for convexity

The Taylor expansion formula

The Lagrange multiplier method, used to solve extremum problems with equality constraints

The most important item to remember is the Taylor expansion formula for multivariate functions. From it we can derive gradient descent, Newton's method, quasi-Newton methods, and a whole series of optimization methods commonly used in machine learning. Taylor's formula:

f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^{\mathrm{T}} (\mathbf{x}-\mathbf{x}_0) + \frac{1}{2} (\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}} \mathbf{H}(\mathbf{x}_0) (\mathbf{x}-\mathbf{x}_0) + o\left(\lVert \mathbf{x}-\mathbf{x}_0 \rVert^2\right)

where \nabla f is the gradient and \mathbf{H} is the Hessian matrix.
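As a quick numerical sanity check of this expansion (a minimal sketch; the function f, the point x0, and the step dx below are made-up examples, and NumPy is assumed):

```python
import numpy as np

# Toy function chosen only for illustration: f(x) = x0^2 + 2*x1^2 + sin(x0*x1)
def f(x):
    return x[0]**2 + 2*x[1]**2 + np.sin(x[0]*x[1])

def grad(x):
    return np.array([2*x[0] + x[1]*np.cos(x[0]*x[1]),
                     4*x[1] + x[0]*np.cos(x[0]*x[1])])

def hessian(x):
    s, c = np.sin(x[0]*x[1]), np.cos(x[0]*x[1])
    return np.array([[2 - x[1]**2*s,      c - x[0]*x[1]*s],
                     [c - x[0]*x[1]*s,    4 - x[0]**2*s]])

x0 = np.array([0.5, -0.3])
dx = np.array([1e-2, 2e-2])   # small step around x0

# Second-order Taylor approximation of f(x0 + dx)
taylor2 = f(x0) + grad(x0) @ dx + 0.5 * dx @ hessian(x0) @ dx
print(f(x0 + dx), taylor2)    # the two values agree to high precision
```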

Calculus and linear algebra are intertwined: calculus uses a great deal of linear algebra, linear algebra in turn uses calculus, and the two go hand in hand.


Linear algebra

Vectors and their operations

Matrices and their operations

Tensors

Determinants

Quadratic forms

Eigenvalues and eigenvectors

By comparison, linear algebra is used even more heavily; it appears almost everywhere in machine learning.

Vectors and their operations, including addition, subtraction, scalar multiplication, transpose, and inner product

Norms of vectors and matrices, in particular the L1 norm and the L2 norm

Matrices and their operations, including addition, subtraction, multiplication, and scalar multiplication

The definition and properties of the inverse matrix

The definition and calculation of determinants

The definition of quadratic forms

Positive definiteness of matrices

Eigenvalues and eigenvectors of matrices

Singular value decomposition (SVD) of matrices

Numerical solution of systems of linear equations, especially the conjugate gradient method

Matrix and vector formulas appear constantly in machine learning algorithms. The data processed by machine learning is generally a vector, a matrix, or a tensor: classical machine learning algorithms take feature vectors as input, while deep learning algorithms take 2-dimensional matrices or 3-dimensional tensors when processing images. Mastering this knowledge will give you an edge.
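A minimal NumPy sketch of these operations on made-up data (the vectors and matrices here are illustrative, not from the article):

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite example
b = np.array([1.0, 2.0])

print(np.linalg.norm(v, 1), np.linalg.norm(v, 2))   # L1 and L2 norms
print(np.linalg.det(A))                              # determinant
print(np.linalg.inv(A))                              # inverse matrix
print(np.linalg.eig(A))                              # eigenvalues and eigenvectors
U, s, Vt = np.linalg.svd(A)                          # singular value decomposition
x = np.linalg.solve(A, b)                            # solve the linear system A x = b
# (for large sparse systems, scipy.sparse.linalg.cg implements the conjugate gradient method)
```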


Probability theory: random events and probability, conditional probability and the Bayes formula, the expectation and variance of random variables, commonly used probability distributions (the normal distribution, the uniform distribution, the Bernoulli and binomial distributions), random vectors (probability density functions, etc.), covariance and the covariance matrix, and maximum likelihood estimation. If we treat the sample data handled by machine learning as random variables/vectors, we can model problems from a probabilistic point of view, and this represents a large class of machine learning methods. The probability theory used in machine learning includes:

Random variables and probability distributions, especially the probability density function and distribution function of continuous random variables

Conditional probability and the Bayes formula

Commonly used probability distributions, including the normal distribution, the Bernoulli and binomial distributions, and the uniform distribution

The mean, variance, and covariance of random variables

Independence of random variables

Maximum likelihood estimation
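As a small illustration of maximum likelihood estimation (a sketch on synthetic data; the true parameters 2.0 and 1.5 are made up for the example): for a normal distribution, maximizing the log-likelihood yields the sample mean and the biased sample variance in closed form.

```python
import numpy as np

np.random.seed(0)
samples = np.random.normal(loc=2.0, scale=1.5, size=10000)   # synthetic data

# For a normal distribution, maximizing the log-likelihood gives closed-form estimates:
mu_mle = samples.mean()                     # MLE of the mean
var_mle = ((samples - mu_mle)**2).mean()    # MLE of the variance (divides by n, not n-1)
print(mu_mle, var_mle)                      # close to the true values 2.0 and 1.5**2 = 2.25
```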

Finally comes optimization, because almost all machine learning algorithms ultimately come down to solving an optimization problem. The guiding idea for solving optimization problems is that the derivative/gradient of the function must be 0 at an extremum point. So you must understand gradient descent and Newton's method, the two most commonly used algorithms; their iteration formulas can both be derived from the Taylor expansion formula. It is even better if you also know coordinate descent and quasi-Newton methods.
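A minimal sketch of the two update rules on a toy quadratic objective (the matrix A, vector b, step size, and iteration counts are illustrative assumptions, not from the article):

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x - b^T x, with gradient A x - b and Hessian A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

# Gradient descent: x <- x - eta * grad(x), derived from the 1st-order Taylor expansion
x = np.zeros(2)
for _ in range(200):
    x = x - 0.1 * grad(x)
print("gradient descent:", x, "f =", f(x))

# Newton's method: x <- x - H^{-1} grad(x), derived from the 2nd-order Taylor expansion
x = np.zeros(2)
for _ in range(10):
    x = x - np.linalg.solve(A, grad(x))   # the Hessian of this f is A
print("Newton:", x, "exact solution:", np.linalg.solve(A, b))
```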

Convex optimization is a concept often mentioned in machine learning. It is a special class of optimization problems in which the feasible region of the optimization variables is a convex set and the objective function is a convex function. Its best property is that every local optimum is also a global optimum, so the solution cannot get trapped in a spurious local optimum. Once a problem is proved to be a convex optimization problem, it is essentially declared solved. In machine learning, linear regression, ridge regression, support vector machines, logistic regression, and many other algorithms are solving convex optimization problems.
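For example, ridge regression minimizes the convex objective ||Xw - y||^2 + lambda*||w||^2, so any local minimum is the global one; for this particular problem the minimizer even has a closed form. A minimal sketch on synthetic data (the data and lambda are made up for illustration):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
lam = 0.1

# The objective ||Xw - y||^2 + lam*||w||^2 is convex; setting its gradient to zero
# gives the unique global minimizer in closed form:
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w)   # close to the true coefficients [1.0, -2.0, 0.5]
```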

The Lagrange multiplier method turns an optimization problem with constraints (equality and inequality) into a Lagrangian function. Through this transformation, the constrained problem becomes an unconstrained one. By swapping the optimization order of the original variables and the Lagrange multipliers, it is further transformed into a dual problem; if certain conditions are satisfied, the original problem and the dual problem are equivalent. The significance of this method is that a difficult problem can be converted into one that is easier to solve. Lagrangian duality is applied in support vector machines.
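In symbols (notation introduced here only for illustration), for the generic problem of minimizing f(x) subject to g_i(x) \le 0 and h_j(x) = 0, the construction looks like this:

L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x), \qquad \lambda_i \ge 0

\text{primal: } \min_x \max_{\lambda \ge 0,\, \nu} L(x, \lambda, \nu) \qquad\qquad \text{dual: } \max_{\lambda \ge 0,\, \nu} \min_x L(x, \lambda, \nu)

Swapping the order of the min and the max is exactly the step that produces the dual problem; when strong duality holds, the two optimal values coincide, which is the "certain conditions" mentioned above.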

The KKT conditions extend the Lagrange multiplier method to problems with inequality constraints. They give the conditions that an optimization problem with both equality and inequality constraints must satisfy at an extremum point, and they are also used in support vector machines. Don't worry if you haven't taken a course on optimization methods; these results are easy to derive from the basics of calculus and linear algebra. If you want to learn this material systematically, you can read the classic textbook Convex Optimization.
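Concretely, for the same problem with inequality constraints g_i(x) \le 0 and equality constraints h_j(x) = 0, the KKT conditions at a candidate extremum point x^* read:

\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) + \sum_j \nu_j \nabla h_j(x^*) = 0

g_i(x^*) \le 0, \qquad h_j(x^*) = 0

\lambda_i \ge 0, \qquad \lambda_i \, g_i(x^*) = 0 \quad \text{(complementary slackness)}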


The most frequently used knowledge is optimization methods: the Lagrange multiplier method, gradient descent, Newton's method, and convex optimization. Next comes probability theory: random variables, the Bayes formula, independence of random variables, the normal distribution, and maximum likelihood estimation. Third is linear algebra: almost everything involves calculations with vectors, matrices, and tensors, including eigenvalues and eigenvectors; many algorithms ultimately reduce to solving for eigenvalues and eigenvectors.

Fourth is calculus knowledge, such as the chain rule. Beyond these core subjects, differential geometry appears in machine learning through concepts such as manifolds, geodesics, and geodesic distance. Support vector machines use the Mercer condition and kernel functions, which involve functional analysis and real analysis. Another example is the proof of the universal approximation theorem for artificial neural networks, which uses functional analysis and real analysis to show that such networks can approximate functions of essentially arbitrary form. Discrete mathematics such as graph theory and trees also appears in machine learning, but it is comparatively simple.
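As a small illustration of the kernel idea mentioned above (a sketch; the Gaussian kernel, its parameter gamma, and the sample points are assumptions for the example), the Mercer condition in practice means that a valid kernel yields a symmetric positive semi-definite kernel matrix on any set of points:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 2)      # 5 made-up sample points
gamma = 0.5

# Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2)
sq_dists = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
K = np.exp(-gamma * sq_dists)

# Mercer condition in practice: the kernel matrix is symmetric positive semi-definite
print(np.linalg.eigvalsh(K) >= -1e-10)   # all True
```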

So if we master calculus, linear algebra, probability theory, and some optimization algorithms, we will be able to understand essentially all machine learning algorithms. The more advanced topics just mentioned, such as differential geometry, functional analysis, and real analysis, are mainly used in foundational theoretical proofs that establish the soundness of certain algorithms; even if you do not follow those proofs, that does not affect your understanding of the derivation, the idea, and the use of the algorithms.