The math required:

Math book

With a basic knowledge of probability/statistics, linear algebra, and calculus, you'll be ready to learn the algorithms and practices of deep learning. But after a period of engineering practice, I gradually realized that most of my time was spent on model selection, hyperparameter tuning, or trying permutations and combinations of network structures. The black-box nature of deep learning was becoming more and more obvious. Are deep learning engineers really just data alchemists? Machine learning is grounded in mathematical theory. To learn machine learning well, we must be prepared for hard work and persist in the pursuit of mathematical knowledge. Machine learning requires at least a basic knowledge of calculus, linear algebra, probability theory, statistics, and optimization theory.

Advanced Mathematics (Tongji University, 7th Edition), two volumes

Linear Algebra (Tongji University, 6th Edition)

Probability Theory and Mathematical Statistics (Zhejiang University, 4th Edition)

Optimization Theory and Methods, Yuan Yaxiang and Sun Wenyu, Science Press

Optimization Theory and Algorithms (2nd Ed.), Baolin Chen, Tsinghua University Press

Optimization Methods and Their MATLAB Implementation, Xu Guogen, Beijing University of Aeronautics and Astronautics Press (supporting resources available for download)

Neural Networks and Deep Learning, Qiu Xipeng, Fudan University, github.com/nndl/nndl.g…

 

List of essential advanced mathematics knowledge points for artificial intelligence

The advanced mathematical knowledge required by AI technical positions can be roughly divided into four areas: calculus, probability and statistics, linear algebra, and optimization theory.

Each subfield deserves at least one book (and possibly a stack of them). Here we extract only the most basic parts related to machine learning and deep learning, and point out where to focus:

1. Calculus

Basic concepts (limits, continuity and differentiability, total and partial derivatives): these must be understood as soon as you start learning calculus; otherwise you will not be able to continue learning anything.

Derivatives of functions: the derivative is the basis of the gradient, and the gradient is the basis of AI algorithms, so derivatives are very important! You have to understand the concept, and you have to learn the derivatives of common functions.

Chain rule: the rule for differentiating composite functions, and the theoretical basis of the backpropagation algorithm.
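For intuition, here is a minimal Python sketch (the function sin(x²) is just a made-up example) that checks the chain rule against a numerical derivative:

```python
# Minimal sketch: checking the chain rule numerically for h(x) = sin(x**2),
# whose derivative by the chain rule is cos(x**2) * 2*x.
import math

def h(x):
    return math.sin(x ** 2)

def h_prime_chain_rule(x):
    return math.cos(x ** 2) * 2 * x  # f'(g(x)) * g'(x)

x, eps = 1.3, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)  # central difference
print(h_prime_chain_rule(x), numeric)  # the two values agree closely
```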

Taylor’s formula and Fermat’s lemma: These are also fundamental components of gradient descent and are as important as derivatives.

Differential equations and their solutions: very important; necessary knowledge for solving some machine learning models.

Lagrange multipliers and duality theory: the theoretical foundation for understanding SVM/SVR. SVM/SVR is a common "backbone" of machine learning models, so its importance is self-evident.
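As a toy illustration (not the SVM derivation itself; the objective and constraint here are made up), the Lagrange multiplier method turns "optimize f subject to g = 0" into solving ∇f = λ∇g together with the constraint. A minimal sketch, assuming sympy is available:

```python
# Minimal sketch: maximize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0.
import sympy as sp

x, y, lam = sp.symbols("x y lam")
f = x * y
g = x + y - 1
L = f - lam * g  # the Lagrangian
stationary = sp.solve(
    [sp.diff(L, x), sp.diff(L, y), g], [x, y, lam], dict=True
)
print(stationary)  # [{x: 1/2, y: 1/2, lam: 1/2}] -> the maximum is at (1/2, 1/2)
```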

2. Probability and statistics

Simple statistics (count, maximum, minimum, median, mean, variance) and their physical significance: the conceptual basis of probability and statistics.

Randomness and sampling: randomness is the basis of probability and statistics; sampling is the basic statistical method.

Frequency and probability, and the basic concepts of probability: understand what probability is, and the difference and connection between probability and frequency.

Several common probability distributions and their formulas (uniform distribution, binomial distribution, normal distribution, ...)

Parameter estimation: what do you do when you only know the rough shape of a distribution but not its specific parameters? It doesn't matter: we can estimate them. The most important technique here is maximum likelihood estimation.
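A minimal sketch of maximum likelihood estimation, assuming the data come from a normal distribution; in that case the MLE has a closed form (the sample mean and the biased sample variance):

```python
# Minimal sketch: maximum likelihood estimation for a normal distribution.
# For N(mu, sigma^2), the MLE is the sample mean and the biased sample variance.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=10_000)  # "unknown" parameters 2 and 3

mu_hat = data.mean()                                # MLE of the mean
sigma_hat = np.sqrt(((data - mu_hat) ** 2).mean())  # MLE of the std (divides by n)
print(mu_hat, sigma_hat)                            # close to 2.0 and 3.0
```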

Central limit theorem: what if you don't know the probability distribution of something? It doesn't matter: just treat it as a normal distribution. But why is that so often close to the truth? Because of the central limit theorem.
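A quick numerical illustration (the sample sizes are made up): means of samples drawn from a uniform distribution, which is not normal at all, are nevertheless approximately normally distributed:

```python
# Minimal sketch of the central limit theorem: means of samples from a
# uniform distribution are approximately normally distributed.
import numpy as np

rng = np.random.default_rng(0)
sample_means = rng.uniform(0, 1, size=(100_000, 30)).mean(axis=1)

# Theory: mean ~= 0.5, std ~= sqrt(1/12) / sqrt(30) ~= 0.0527
print(sample_means.mean(), sample_means.std())
```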

Hypothesis testing: Are we right? Let’s verify this with samples.

Bayes' formula: too important! It allows us to compute posterior probabilities from prior probabilities. And the naive Bayes formula is itself the naive Bayes model.
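A minimal arithmetic sketch of Bayes' formula (the numbers are hypothetical): a test with 99% sensitivity and 99% specificity for a condition with 1% prevalence:

```python
# Minimal sketch of Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B).
# Hypothetical numbers: prevalence 1%, sensitivity 99%, specificity 99%.
p_disease = 0.01
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.01

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # 0.5 -- a positive test means only a 50% chance
```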

Regression analysis: Think of all the models that have “regression” in their names!

State transition networks: Probability chains, Hidden Markov models, and conditional random fields.

3. Linear algebra

Vectors and scalars: what is the difference between vectors and scalars in terms of how they characterize things?

Vector spaces, vector properties, and the geometric meaning of vectors: what do we mean by high and low dimensions? Can the same vector exist in different vector spaces? How do vectors move, rotate, and stretch?

Linear function: what is a linear function and what properties does it have?

Matrices and matrix operations: what are matrices for? Master the basic operations on matrices (addition, and multiplication by constants/vectors/matrices).

Special matrices (square matrices, real symmetric matrices, (semi-)positive definite/negative definite matrices, etc.) and their properties: what special classes of matrices can we distinguish by their properties, and what special properties do they have?

Eigenvalues and eigenvectors: their definition, their properties, and how to solve for eigenvalues.

Solving differential equations with matrices.

Orthogonality: what is orthogonality? How is orthogonality formalized for functions, vectors, and hyperplanes, and what is its physical significance?

4. Optimization theory

Convex functions and extreme values: understand what a convex function is, the relationship between convex functions and extreme values, the relationship between extreme values and maxima/minima, and so on.

Note: there are some differences in how "convex" is defined across Chinese textbooks. Some books call the "convex function" of other books a "concave function".

Intuitively, the convex functions we usually talk about look like a U in one dimension and a bowl in two.

Optimization: what is an optimization problem? What does it mean to optimize? What are the basic principles of optimization with and without constraints?

Gradient descent: the most basic and most commonly used optimization method, and the basis of several other optimization methods; it must be fully mastered.
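A minimal sketch of gradient descent on a toy quadratic objective (the function and learning rate are made up for illustration):

```python
# Minimal sketch of gradient descent on f(x, y) = x**2 + 10*y**2,
# whose gradient is (2x, 20y); the minimum is at the origin.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, 20 * y])

p = np.array([5.0, 2.0])   # starting point
lr = 0.05                  # learning rate (step size)
for _ in range(200):
    p = p - lr * grad(p)   # step against the gradient
print(p)                   # very close to [0, 0]
```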

Other optimization algorithms: understand other common optimization methods, such as Newton's method, the conjugate gradient method, line search algorithms, simulated annealing, genetic algorithms, and so on.

Fundamentals of AI Mathematics

I. Advanced mathematics

Derivatives. First derivative: in actual computation, in the discrete case, two points participate in the difference calculation, divided into the forward difference and the backward difference;

Second derivative: in actual computation, in the discrete case, three or four points participate in the difference calculation, divided into the central, forward, and backward differences;

Gradient: the concept of the gradient is built on partial derivatives and directional derivatives. Partial derivative: for a function of several variables, select one independent variable, hold the others fixed, and examine only the relationship between the dependent variable and that selected variable.
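A minimal Python sketch of these difference formulas (the step sizes are typical choices, not prescriptions):

```python
# Minimal sketch: forward/backward differences for the first derivative,
# a central difference for the second derivative, and a numerical gradient
# built from partial derivatives.
import math

def d1_forward(f, x, h=1e-5):
    return (f(x + h) - f(x)) / h          # forward difference, two points

def d1_backward(f, x, h=1e-5):
    return (f(x) - f(x - h)) / h          # backward difference, two points

def d2_central(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2  # three points

def grad(f, p, h=1e-6):
    # Partial derivative in each coordinate, holding the others fixed.
    return [
        (f(*(p[:i] + [p[i] + h] + p[i+1:])) - f(*p)) / h
        for i in range(len(p))
    ]

print(d1_forward(math.sin, 0.0))                     # ~1.0 (cos 0)
print(d2_central(math.sin, 0.0))                     # ~0.0 (-sin 0)
print(grad(lambda x, y: x * x + 3 * y, [2.0, 1.0]))  # ~[4.0, 3.0]
```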

Taylor series (Taylor's formula): the Taylor series underlies some practical explanations of machine learning and deep learning algorithms. Taylor's formula, used in mathematics and physics, describes a function's values near a point using information about the function at that point. If the function is smooth enough, then given the values of its derivatives at a point, Taylor's formula uses those values as coefficients to construct a polynomial that approximates the function in a neighborhood of that point. Taylor's formula also gives the deviation between this polynomial and the actual function values.
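A minimal numeric sketch: the degree-3 Taylor polynomial of exp(x) around 0 is 1 + x + x²/2 + x³/6, and its error grows as we move away from the expansion point:

```python
# Minimal sketch: Taylor polynomial of exp(x) around 0, degree 3.
import math

def taylor_exp(x, n=3):
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

for x in (0.1, 0.5, 1.0):
    approx, exact = taylor_exp(x), math.exp(x)
    print(x, approx, exact, abs(approx - exact))  # error grows away from 0
```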

Convex optimization. Optimization problem: finding the maximum or minimum value of a function.

Unconstrained optimization: with no constraints, the function itself is convex and has a minimum over the whole space; it is handled by taking derivatives (setting the gradient to zero).

Constrained optimization: equality constraints (the Lagrange multiplier method); inequality constraints (the KKT conditions and the Lagrange multiplier method).
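A minimal numeric sketch, assuming scipy is available: SLSQP handles both equality and inequality constraints, which is exactly the Lagrange/KKT setting above (the toy problem is made up for illustration):

```python
# Minimal sketch: minimize (x - 1)**2 + (y - 2)**2
# subject to the equality constraint x + y = 1 and the inequality x >= 0.
import numpy as np
from scipy.optimize import minimize

objective = lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2
constraints = [
    {"type": "eq", "fun": lambda p: p[0] + p[1] - 1},  # x + y - 1 = 0
    {"type": "ineq", "fun": lambda p: p[0]},           # x >= 0
]
res = minimize(objective, x0=np.array([0.0, 0.0]),
               method="SLSQP", constraints=constraints)
print(res.x)  # ~[0, 1]: the constrained minimum
```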

II. Linear algebra (matrix calculations)

Scalar: a tensor of order 0, having only magnitude and no direction

Vector: a tensor of order 1, having both magnitude and direction

Matrix:

Matrix multiplication [dot product, cross product, convolution (an operation mainly used in convolutional neural networks)]

Matrix transpose

Special matrix [diagonal matrix, symmetric matrix, orthogonal matrix, positive definite matrix]

Take the derivative of a matrix

Eigenvalue decomposition and singular value decomposition (rotation, stretching, mapping); a numpy sketch follows this list

Trace of a matrix [sum of elements on the main diagonal of a matrix]

Tensors: learning about tensors covers the tensor transformation operations used in deep learning
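As promised in the list above, a minimal numpy sketch of eigendecomposition and singular value decomposition:

```python
# Minimal sketch: eigendecomposition of a symmetric matrix and the SVD.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # real symmetric matrix

vals, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
print(vals)                         # [1. 3.]
# A @ v = lambda * v for each eigenpair:
print(np.allclose(A @ vecs[:, 0], vals[0] * vecs[:, 0]))  # True

U, S, Vt = np.linalg.svd(A)         # rotation * stretching * rotation
print(np.allclose(U @ np.diag(S) @ Vt, A))                # True
```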

III. Probability and statistics

Random variables and their distributions:

Random events: in a random experiment, events that may or may not occur, but that exhibit some regularity over repeated trials, are called random events.

Random variable: continuous random variable, discrete random variable

Distribution law: the probability that a random variable takes each of its possible values

Probability distribution: Bernoulli distribution, binomial distribution, Poisson distribution, normal distribution

Numerical characteristics of random variables:

Expectation (mean)

Variance: a large variance indicates that the data fluctuate a lot, the distribution is unstable, the amount of information is large, and the samples are well differentiated. In practice there are two main applications: one is feature selection, the other is model evaluation. If the variance of the model's output is large while its deviation from the true output is small (low bias), the model may be overfitting.

Covariance: covariance, the correlation coefficient, and the covariance matrix are mainly used to measure the correlation between attributes. The correlation coefficient lies in the range [-1, 1] and can serve as a basis for feature selection during feature processing. (A short numpy sketch follows this list.)

The correlation coefficient

Covariance matrix

Chi-square
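The numpy sketch promised above (the toy data are made up for illustration):

```python
# Minimal sketch: variance, covariance matrix, and correlation coefficients.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)   # strongly correlated with x

print(x.var())                  # variance of x, ~1
print(np.cov(x, y))             # 2x2 covariance matrix
print(np.corrcoef(x, y)[0, 1])  # correlation coefficient, close to 1
```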

Law of large numbers and the central limit theorem:

Law of large numbers

Central limit theorem

Parameter estimation and hypothesis testing:

Parameter estimation, hypothesis testing, moment estimation, the least squares method, maximum likelihood estimation

IV. Information theory

Information entropy: the uncertainty of an event's occurrence

Information gain: The change in information entropy

Gini index: the purity of a sample set
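A minimal sketch of these three quantities for a discrete class distribution (the split below is hypothetical):

```python
# Minimal sketch: information entropy, information gain, and the Gini index.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - (p ** 2).sum()

parent = [0.5, 0.5]                   # 50/50 class mix
left, right = [0.9, 0.1], [0.1, 0.9]  # a hypothetical split, half the data each
gain = entropy(parent) - 0.5 * entropy(left) - 0.5 * entropy(right)

print(entropy(parent))           # 1.0 bit: maximal uncertainty
print(gain)                      # ~0.531: the split reduces entropy
print(gini(parent), gini(left))  # 0.5 vs 0.18: purer node, lower Gini
```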

Mainstream Development frameworks

TensorFlow and PyTorch are still "industry versus academia". But with PyTorch on the march, that may soon change. TensorFlow isn't exactly beginner-friendly, although it has its own high-level APIs, such as Estimator and tf.data, that are relatively easy to use. Keras was originally a stand-alone deep learning framework, but it has gradually been integrated into TensorFlow.

TensorFlow:

tensorflow.google.cn/ (official site)

www.tensorfly.cn/ (Chinese community)

github.com/tensorflow/…

 

PyTorch

pytorch.org/

github.com/pytorch/pyt…

pytorch-cn.readthedocs.io/zh/latest/ (Chinese documentation)

 

Caffe (created by Jia Yangqing)

Convolutional neural network framework

caffe.berkeleyvision.org/

github.com/BVLC/caffe

 

Keras

Keras is a library that wraps TensorFlow and other deep learning frameworks, and it is very easy to use. TensorFlow introduced the Keras API to address its much-criticized steep learning curve. However, given TensorFlow's positioning and functionality, the combination with Keras has not been entirely successful so far.

keras.io/

 

PaddlePaddle

PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible, and scalable deep learning framework.

www.paddlepaddle.org.cn/

github.com/PaddlePaddl…

gitee.com/paddlepaddl…

blog.csdn.net/paddlepaddl…

 

Megvii Brain++

Megvii Face++

www.faceplusplus.com.cn/

megengine.org.cn/

github.com/MegEngine/M… MegEngine grew out of Megvii's internal research deep learning framework. Its Chinese name is Tianyuan, the name of the center point of a Go board.

 

Tsinghua Jittor

Jittor is developed by the graphics laboratory of the Department of Computer Science at Tsinghua University, led by Prof. Hu Shimin, who has long been engaged in research on intelligent visual media processing.

github.com/Jittor/Jitt…

 

The Chinese University of Hong Kong – SenseTime OpenMMLab

OpenMMLab is an open source algorithm platform from MMLab, the joint laboratory of the Chinese University of Hong Kong and SenseTime.

open-mmlab.github.io/

github.com/open-mmlab

MMCV is a foundational Python library for computer vision research that supports the other open source libraries under OpenMMLab.

github.com/open-mmlab/…

Its main features are I/O, image and video processing, annotation visualization, various CNN architectures, and various CUDA operators.

MMDetection is an open source object detection toolkit based on PyTorch. It is OpenMMLab's best-known open source library, and almost a must for object detection research!

github.com/open-mmlab/…

MMDetection3D is an open source library dedicated to 3D object detection.

MMSegmentation is an open source semantic segmentation toolkit based on PyTorch.

MMClassification is an open source image classification toolkit based on PyTorch.

MMPose is an open source pose estimation toolkit based on PyTorch.

MMAction is a PyTorch based open source toolkit for action understanding.

MMAction2 is a PyTorch based open source toolkit for action understanding.

MMSkeleton is used for human pose estimation, skeleton-based action recognition, and motion synthesis.

MMFashion is an open source visual fashion analysis toolkit based on PyTorch.

MMEditing is an open source image and video editing toolkit based on PyTorch.

OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection.

OpenUnReID is an open source library based on PyTorch for unsupervised learning and unsupervised domain adaptation in object re-identification (re-ID).

OpenSelfSup is an unsupervised representation learning toolkit based on PyTorch.

 

Tencent YouTu

github.com/TencentYout…

github.com/jantic/DeOl…

 

Andrew Ng

github.com/deeplearnin…

github.com/fengdu78/Co… (Huang Haiguang, PhD in computer science: notes on Andrew Ng's deep learning courses, v5.7)

github.com/fengdu78/de…

www.ai-start.com/

 

Microsoft

github.com/microsoft/c…

 

JD.com (Jingdong)

github.com/JDAI-CV/Fac…

 

Yousan AI (Long Peng)

Core Technology and Case Study of Image Recognition in Deep Learning

Core Algorithms and Case Studies for Model Design in Deep Learning

github.com/longpeng200…

github.com/longpeng200…

www.zhihu.com/people/long…

 

Several open source applications

1. Industrial defect detection based on the TensorFlow framework

github.com/sundyCoder/…

Industrial defect detection based on deep learning (DEye). Defect Eye is an open source software library based on TensorFlow 1.4 that focuses on surface defect detection. Its application areas cover yield applications in manufacturing environments, including process tool certification, wafer certification, glass surface certification, and mask certification, as well as R&D and tool, process, and line monitoring. In addition, it can be used for detection in medical images, including lung PET/CT, breast MRI, CT colonography, and digital chest X-ray images.

2. Silent-face-anti-spoofing

github.com/minivision-…

 

A few blogs

blog.csdn.net/qq_29462849

The blood-and-tears history of deep learning in automated visual inspection

Comparison of major deep learning frameworks (TensorFlow, Keras, MXNet, PyTorch)

 

Several open source tutorials

Dive into Deep Learning (Hands-on Deep Learning)

github.com/d2l-ai/d2l-…

github.com/d2l-ai/d2l-…