Original address: www.ctocio.com/hotnews/159…

I happened to come across this article and found it very clear, so I am reprinting it here to add to my own knowledge base. The body of the article follows.

Machine learning is undoubtedly a hot topic in the field of data analysis, and many of us use machine learning algorithms at some point in our daily work. Here, IT Manager summarizes common machine learning algorithms for reference in your work and study.

There are many machine learning algorithms. Much of the confusion arises because many of them are really families of algorithms, and some are extensions of others. Here we introduce them from two perspectives: first by learning style, then by algorithmic similarity.

Learning style

Depending on the type of data, a problem can be modeled in different ways. In machine learning and artificial intelligence, the first consideration is usually how an algorithm learns, and there are several main learning styles. Classifying algorithms by learning style is useful because, when modeling and selecting an algorithm, it lets you pick the one best suited to the input data and so obtain the best results.

Supervised learning:

In supervised learning, the input data is called “training data”, and each training example has a known label or result, such as “spam” or “non-spam” in an anti-spam system, or “1”, “2”, “3”, “4” in handwritten digit recognition. To build a prediction model, supervised learning sets up a learning process that compares the model's predictions with the actual results in the training data and keeps adjusting the model until its predictions reach the desired accuracy. Typical applications are classification and regression problems. Common algorithms include Logistic Regression and the Back Propagation Neural Network.
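The "predict, compare with the label, adjust" loop described above can be sketched with a one-feature logistic regression trained by batch gradient descent. This is a minimal illustration, not a production implementation: the toy data, the feature meaning (a count of suspicious words), the learning rate and the epoch count are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Compare current predictions with the known labels ...
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # ... and adjust the model in the direction that reduces the error.
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical labeled training data: x = number of suspicious words in a message,
# y = 1 for "spam", 0 for "non-spam".
xs = [0, 1, 2, 3, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(xs, ys)
predictions = [predict(w, b, x) for x in xs]
print(predictions)
```

Training stops here after a fixed number of epochs; a real system would more likely stop once accuracy on held-out data reaches the desired level.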

Unsupervised learning:

In unsupervised learning, the data is not labeled, and the model is trained to infer some internal structure of the data. Common application scenarios include association rule learning and clustering. Common algorithms include the Apriori algorithm and the K-means algorithm.

Semi-supervised learning:

In this learning style, part of the input data is labeled and part is not. The model can be used for prediction, but it first needs to learn the internal structure of the data in order to organize the data sensibly for prediction. Application scenarios include classification and regression, and the algorithms include extensions of commonly used supervised learning algorithms. These algorithms first attempt to model the unlabeled data and then, on that basis, make predictions for the labeled data. Examples include Graph Inference and the Laplacian SVM.

Reinforcement learning:

In this learning style, the input data serves as feedback to the model. In supervised learning the input data is only used to check whether the model is right or wrong, whereas in reinforcement learning the input is fed directly back to the model, which must adjust immediately in response. Common application scenarios include dynamic systems and robot control. Common algorithms include Q-learning and Temporal Difference Learning.
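To make the "feedback adjusts the model immediately" idea concrete, here is a small sketch of tabular Q-learning. The environment (a five-cell corridor with a reward at the right end), the hyperparameters and all names are assumptions invented for this illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor with states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Reaching the right-most state
    ends the episode with reward 1; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The reward is fed straight back into the value estimate.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy policy for every non-terminal state (1 means "move right").
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

The update line is a temporal-difference update: the estimate for the current state and action is moved toward the observed reward plus the discounted value of the next state.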

 

In enterprise data applications, supervised and unsupervised learning are probably the most commonly used styles. In image recognition, semi-supervised learning is a hot topic because there is a large amount of unlabeled data and only a small amount of labeled data. Reinforcement learning is used more in robot control and other control-systems fields.

Algorithmic similarity

Algorithms can also be grouped by their similarity in function and form, for example tree-based algorithms, neural network algorithms and so on. Of course, machine learning is such a vast field that some algorithms are hard to place in a single category, and within some categories the same family of algorithms can be applied to different types of problems. Here we try to group the commonly used algorithms in the way that is easiest to understand.

Regression algorithm:

Regression algorithms explore the relationship between variables by measuring and minimizing prediction error. They are a powerful tool in statistical machine learning. In machine learning, the word "regression" sometimes refers to a class of problems and sometimes to a class of algorithms, which often confuses beginners. Common regression algorithms include Ordinary Least Squares, Logistic Regression, Stepwise Regression, Multivariate Adaptive Regression Splines (MARS) and Locally Estimated Scatterplot Smoothing (LOESS).
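As a small worked example of fitting a relationship by minimizing error, the sketch below computes an ordinary least squares line for one input variable using the closed-form solution. The data points are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = ols_fit(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.97 1.09
```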

Instance-based algorithms

Instance-based algorithms are often used to model decision problems. They keep a set of sample data and compare new data with the stored samples using some similarity measure in order to find the best match. For this reason, instance-based algorithms are often referred to as winner-takes-all learning or memory-based learning. Common algorithms include K-Nearest Neighbor (KNN), Learning Vector Quantization (LVQ), and Self-Organizing Map (SOM).
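The comparison of new data against stored samples can be sketched with a tiny K-nearest-neighbor classifier. The points, labels and the choice of Euclidean distance with k = 3 are assumptions for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; return the majority label of the k nearest points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored samples: two well-separated groups in the plane.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(train, (1.2, 1.4)))  # A
print(knn_predict(train, (6.4, 6.2)))  # B
```

Note that there is no training step at all: the "model" is simply the stored samples, which is why this family is also called memory-based learning.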

 

Regularization method

Regularization methods are extensions of other algorithms (usually regression algorithms) that adjust the algorithm according to the complexity of the model. They generally reward simple models and penalize complex ones. Common algorithms include Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic Net.
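The penalty on complexity can be seen in the simplest possible case: ridge regression with a single feature and no intercept, where minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x^2) + lam). The data and the penalty values below are assumptions chosen only to show the shrinking effect.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for one feature, no intercept.

    Minimizes sum((y - w*x)**2) + lam * w**2; lam = 0 gives plain least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 10.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

As the penalty lam grows, the fitted coefficient is pulled toward zero, which is the sense in which regularization favors simpler models.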

 

Decision tree learning

Decision tree algorithms build a tree-structured decision model from the attributes of the data. Decision tree models are often used to solve classification and regression problems. Common algorithms include Classification And Regression Tree (CART), Iterative Dichotomiser 3 (ID3), C4.5, Chi-squared Automatic Interaction Detection (CHAID), Decision Stump, Random Forest, Multivariate Adaptive Regression Splines (MARS) and Gradient Boosting Machine (GBM).
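The core step in growing a tree is choosing which attribute value to split on. The sketch below picks the best threshold on one numeric attribute by Gini impurity, the criterion commonly associated with CART; a full tree would apply this recursively to each side of the split. The attribute (age) and the labels are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer age and whether the customer bought a product.
ages = [22, 25, 30, 35, 40, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(ages, buys)
print(threshold, impurity)  # 32.5 0.0
```

A single split like this is exactly a Decision Stump, one of the algorithms listed above.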

 

Bayesian methods

Bayesian algorithms are a family of algorithms based on Bayes' theorem, mainly used to solve classification and regression problems. Common algorithms include Naive Bayes, Averaged One-Dependence Estimators (AODE), and Bayesian Belief Network (BBN).
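As an illustration of applying Bayes' theorem to classification, here is a minimal Naive Bayes text classifier with add-one (Laplace) smoothing. The four training messages and their labels are made up, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (words, label) pairs; return the counts needed for prediction."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return label_counts, word_counts, vocab

def predict_naive_bayes(model, words):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        log_prob = math.log(count / total_docs)  # prior
        for w in words:
            # Add-one smoothing so unseen words do not zero out the probability.
            log_prob += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Hypothetical labeled messages.
docs = [
    (["win", "money", "now"], "spam"),
    (["cheap", "money", "offer"], "spam"),
    (["meeting", "schedule", "today"], "non-spam"),
    (["project", "meeting", "notes"], "non-spam"),
]

model = train_naive_bayes(docs)
print(predict_naive_bayes(model, ["cheap", "money"]))    # spam
print(predict_naive_bayes(model, ["meeting", "notes"]))  # non-spam
```

The "naive" part is the assumption that words are independent given the label, which is what allows the per-word probabilities to be multiplied (added, in log space).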

 

Kernel-based algorithms

The best-known kernel-based algorithm is the Support Vector Machine (SVM). Kernel-based algorithms map the input data into a higher-dimensional vector space in which some classification or regression problems become easier to solve. Common kernel-based algorithms include Support Vector Machine (SVM), Radial Basis Function (RBF), Linear Discriminant Analysis (LDA), etc.
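To show how a kernel makes a hard problem easier, the sketch below trains a kernel perceptron with an RBF kernel on the XOR pattern, which no straight line can separate in the original two-dimensional space. The kernel perceptron is used here only because it fits in a few lines; it is a simpler relative of the SVM, not the SVM itself, and the gamma value is an assumption.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision(points, labels, alphas, kernel, x):
    """Kernelized decision value: weighted sum of kernel similarities to training points."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))

def train_kernel_perceptron(points, labels, kernel, epochs=10):
    """alphas[i] counts how often training point i was misclassified."""
    alphas = [0] * len(points)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision(points, labels, alphas, kernel, x) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alphas

# XOR pattern with labels in {-1, +1}.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

alphas = train_kernel_perceptron(points, labels, rbf_kernel)
predicted = [1 if decision(points, labels, alphas, rbf_kernel, p) > 0 else -1 for p in points]
print(predicted)  # [-1, 1, 1, -1]
```

The code never computes the high-dimensional mapping explicitly; it only evaluates the kernel between pairs of points, which is the trick that all kernel-based methods rely on.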

 

Clustering algorithm

Like regression, the word "clustering" sometimes describes a class of problems and sometimes a class of algorithms. Clustering algorithms usually group the input data using a centroid-based or hierarchical approach. All clustering algorithms try to find the internal structure of the data so that the data can be grouped by what its members have most in common. Common clustering algorithms include the K-means algorithm and the Expectation Maximization (EM) algorithm.
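The centroid-based approach can be sketched with K-means (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The points and the initial centroids are assumptions for this example; real implementations usually pick the initial centroids at random or with a smarter seeding scheme.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Returns (centroids, clusters) where clusters[i] holds the points assigned to centroids[i].
    """
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```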

 

Association rule learning

Association rule learning looks for rules that best explain the relationships between variables in the data, in order to discover useful association rules in large multivariate data sets. Common algorithms include the Apriori algorithm and the Eclat algorithm.
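The sketch below shows the level-wise idea behind Apriori in simplified form: only itemsets that are already frequent are extended to larger candidates. It omits the subset-pruning step of the full algorithm, and the shopping-basket transactions and the 0.6 support threshold are assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Simplified Apriori-style search; returns {itemset: support} for frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result = {}
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        # Join step: combine frequent k-itemsets into candidate (k+1)-itemsets.
        size = len(current[0]) + 1
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        current = [c for c in candidates if support(c) >= min_support]
    return result

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

frequent = frequent_itemsets(transactions, min_support=0.6)
pair = frozenset({"bread", "milk"})
# Confidence of the rule bread -> milk: support(bread and milk) / support(bread).
confidence = frequent[pair] / frequent[frozenset({"bread"})]
print(sorted(tuple(sorted(s)) for s in frequent))
print(round(confidence, 2))  # 0.75
```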

 

Artificial neural network

Artificial neural network algorithms are loosely modeled on biological neural networks and are a kind of pattern matching algorithm. They are usually used to solve classification and regression problems. Artificial neural networks are a vast branch of machine learning, with hundreds of different algorithms. (Deep learning is one of them and is discussed separately.) Important artificial neural network algorithms include the Perceptron Neural Network, Back Propagation, Hopfield Network, Self-Organizing Map (SOM), and Learning Vector Quantization (LVQ).
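The simplest member of this family, the perceptron, is a single artificial neuron. The sketch below trains one to compute logical AND using the classic perceptron learning rule; the learning rate and epoch count are assumptions, and larger networks would be trained with back propagation instead.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single neuron with a step activation using the perceptron rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output
            # Weights change only when the neuron's output is wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def perceptron_output(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Truth table of logical AND.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train_perceptron(samples)
outputs = [perceptron_output(w, b, x1, x2) for (x1, x2), _ in samples]
print(outputs)  # [0, 0, 0, 1]
```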

 

Deep learning

Deep learning algorithms are a development of artificial neural networks. They have attracted a great deal of attention recently, especially in China since Baidu began investing in deep learning. Now that computing power is becoming ever cheaper, deep learning seeks to build much larger and more complex neural networks. Many deep learning algorithms are semi-supervised and are used to handle large data sets that are only partially labeled. Common deep learning algorithms include Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), Convolutional Networks and Stacked Auto-encoders.

Dimensionality reduction algorithm

Like clustering algorithms, dimensionality reduction algorithms try to analyze the internal structure of the data, but they do so in an unsupervised way with the aim of summarizing or explaining the data using less information. Such algorithms can be used to visualize high-dimensional data or to simplify data for use in supervised learning. Common algorithms include Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Sammon Mapping, Multi-Dimensional Scaling (MDS), Projection Pursuit, etc.
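As a sketch of summarizing data with less information, the code below finds the first principal component of a small data set by power iteration on the covariance matrix and projects each point onto it, turning two numbers per point into one. Power iteration is just one simple way to obtain the component; the data and the iteration count are assumptions.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (direction, projections) for the first principal component of the data."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - means[d] for d in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * dim
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # One coordinate per point: its position along the principal direction.
    projections = [sum(c * vi for c, vi in zip(row, v)) for row in centered]
    return v, projections

# Hypothetical 2-D points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

direction, projections = first_principal_component(points)
print(direction)
print(projections)
```

Because the points lie near the line y = x, the direction found is close to (0.707, 0.707), and the one-dimensional projections preserve the ordering of the points along that line.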

 

Ensemble algorithms:

Ensemble algorithms train several relatively weak learning models independently on the same samples and then combine their results to make an overall prediction. The main difficulty lies in choosing which independent weak models to combine and how to combine their results. This is a very powerful class of algorithms, and also a very popular one. Common algorithms include Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization, Blending, Gradient Boosting Machine (GBM), and Random Forest.
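A minimal sketch of bagging follows, under the assumption that "the same samples" means bootstrap resamples drawn from one training set. Each weak model is a one-threshold decision stump, and the ensemble predicts by majority vote. The data, the number of models and the seed are all invented for the example.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Weak learner: choose the threshold t that minimizes errors of the rule 'x > t means class 1'."""
    xs = sorted({x for x, _ in sample})
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
    # Fallbacks for one-class samples: always predict 1 (-inf) or always predict 0 (+inf).
    candidates += [float("-inf"), float("inf")]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in sample)

    return min(candidates, key=errors)

def bagging_fit(data, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Combine the weak models by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training set: class 0 below 5, class 1 from 5 upward.
data = [(x, 0 if x < 5 else 1) for x in range(10)]

models = bagging_fit(data)
print(bagging_predict(models, 1), bagging_predict(models, 8))
```

The two design questions raised above show up directly in the code: which weak models to use (here, stumps on resampled data) and how to combine them (here, an unweighted vote). Boosting methods such as AdaBoost answer both differently, training the models in sequence and weighting their votes.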