Thank you for your attention. After my last article was published, many kind readers suggested that I could improve the classification: one scheme groups algorithms by how they learn, and another groups them by similarity of form or function. I have been thinking about which grouping works best, so over the past few days I collected material to classify the algorithms both ways. I welcome ongoing discussion.

I hope that after reading this article, you will have a good understanding of the most popular machine learning algorithms and how they relate to each other.

1: Classification according to learning mode

Supervised learning: The input data is training data, and every sample carries a label, such as “advertisement/non-advertisement” or the stock price at a given time. A model is built through a training process in which it makes predictions and is corrected when those predictions are wrong; training continues until the model reaches the required accuracy on the training data. Typical problems are classification and regression. Common algorithms include logistic regression and back-propagation (BP) neural networks.

Unsupervised learning: The input data has no labels, and there is no standard answer for the output — just a set of samples. An unsupervised learning model infers structure in the input data. This may mean extracting general rules, mathematically reducing redundancy, or organizing the data by similarity. Typical problems are clustering, dimensionality reduction, and association rule learning. Common algorithms include the Apriori algorithm and the K-means algorithm.

Semi-supervised learning: The input data includes both labeled and unlabeled samples. There is a desired prediction, but the model must also learn the structure of the data in order to make it. Typical problems are classification and regression. Common algorithms are extensions of other flexible methods that make assumptions about how to model the unlabeled data (they can be seen as extensions of unsupervised learning).

2: Classification from the functional perspective

1. Regression algorithms: Regression analysis is a predictive modeling technique that studies the relationship between independent and dependent variables. These techniques are used to predict time series and to find relationships between variables. Regression analysis is also a standard statistical method, and it entered machine learning by way of statistical learning. “Regression” can refer both to a class of algorithms and to a class of problems, so the term can be confusing; I actually think of it as a process.

Commonly used regression algorithms include:

Ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)
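As a minimal sketch of the simplest of these, ordinary least squares with a single feature has a closed-form solution — the slope is the covariance of x and y divided by the variance of x (this is an illustration, not how production libraries implement it):

```python
def fit_ols(xs, ys):
    """Ordinary least squares for one feature: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Fitting points that lie exactly on y = 2x + 1 recovers slope 2 and intercept 1.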

2. Instance-based learning algorithms: Instance-based learning builds a model from the training samples, or instances, that are deemed necessary for modeling. Such methods typically build a database of samples, compare new data against the stored data, find the best matches, and make predictions that way. In other words, at prediction time these algorithms use a similarity measure to compare the sample to be predicted against the original samples. For this reason, instance-based methods are also known as winner-take-all methods or memory-based learning.

Commonly used instance-based learning algorithms include:

K-nearest neighbors (KNN), learning vector quantization (LVQ), self-organizing maps (SOM), locally weighted learning (LWL)
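The “compare against stored samples, let the best matches vote” idea can be sketched as a tiny pure-Python KNN classifier (a teaching sketch; real implementations use spatial indexes rather than a full sort):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; returns predicted label."""
    # sort stored instances by distance to the query point
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # winner-take-all: majority vote among the k nearest labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A query near one group of points is labeled by that group's majority.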

3. Regularization algorithms: The idea I have taken away from regularization is that when the parameters are kept small, the model is simpler. A regularization algorithm extends another method (typically a regression method) by penalizing model complexity, favoring simpler models that generalize better. I have singled out regularization algorithms because they are popular, powerful, and easy to bolt onto other methods.

Common regularization algorithms include:

Ridge regression, LASSO algorithm, Elastic Net, Least Angle regression Algorithm (LARS)
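The shrinkage effect is easiest to see in one dimension. For a single centered feature, ridge regression also has a closed form — the penalty λ simply inflates the denominator, pulling the slope toward zero (a sketch under the assumption of centered data, for intuition only):

```python
def fit_ridge_1d(xs, ys, lam):
    """Ridge slope for one centered feature: argmin Σ(y - w·x)² + λ·w²."""
    # closed form: w = Σ x·y / (Σ x² + λ); λ = 0 recovers plain OLS
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

With λ = 0 the OLS slope is recovered; any λ > 0 yields a smaller (simpler) coefficient.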

4. Decision tree algorithms: The goal of a decision tree algorithm is to create a model that predicts a sample's target value from the actual values of its attributes. During training, the tree branches on attribute tests until a final decision is reached at a leaf. Decision tree algorithms are often used for classification and regression. Decision trees are generally fast and accurate, which is why they are among the most popular machine learning algorithms.

Common decision tree algorithms include:

Classification and regression trees (CART), ID3, C4.5 and C5.0 (two versions of the same algorithm), CHAID, decision stumps (single-level decision trees), M5, conditional decision trees
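The core training step — choosing the branch — can be sketched as a decision stump: try each threshold on one attribute and keep the one that minimizes weighted Gini impurity (CART's criterion). This is an illustrative fragment, not a full tree builder:

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Find the threshold on one attribute minimizing weighted impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

On cleanly separable data the best split drives the impurity to zero; a full tree algorithm applies this recursively to each child.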

5. Bayesian algorithms: Bayesian methods are those that explicitly apply Bayes' theorem to solve classification and regression problems.

Common Bayesian algorithms include:

Naive Bayes, Gaussian naive Bayes, multinomial naive Bayes, averaged one-dependence estimators (AODE), Bayesian belief networks (BBN), Bayesian networks (BN)
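The “explicitly apply Bayes' theorem” step can be sketched as a tiny categorical naive Bayes: multiply the class prior by per-feature likelihoods and pick the class with the highest score. The add-one (Laplace) smoothing here is deliberately crude — a sketch, not a library-grade implementation:

```python
from collections import Counter, defaultdict

def nb_train(samples):
    """samples: list of (feature_tuple, label); returns priors and counts."""
    priors = Counter(label for _, label in samples)
    cond = defaultdict(Counter)  # (position, label) -> counts of feature values
    for feats, label in samples:
        for i, value in enumerate(feats):
            cond[(i, label)][value] += 1
    return priors, cond

def nb_predict(priors, cond, feats):
    """Pick argmax over labels of P(label) * Π P(feature | label)."""
    total = sum(priors.values())
    best_label, best_p = None, -1.0
    for label, count in priors.items():
        p = count / total  # prior P(label)
        for i, value in enumerate(feats):
            counts = cond[(i, label)]
            # crude add-one smoothing so unseen values do not zero out p
            p *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        if p > best_p:
            best_label, best_p = label, p
    return best_label
```

Trained on a few weather-style samples, it predicts the label whose feature statistics best match the query.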

6. Clustering algorithms: Like regression, “clustering” can describe a class of problems and can also refer to a family of methods. Clustering methods are usually organized by their modeling approach, such as centroid-based or hierarchical; all of them exploit the inherent structure of the data and aim to group it into maximally coherent clusters. In other words, the algorithm finds patterns in the structure of the data distribution by aggregating the input samples into clusters around centers.

Common clustering algorithms include: K-means, K-medians, the EM (expectation-maximization) algorithm, hierarchical clustering
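The centroid-based idea — assign each sample to its nearest center, then move each center to the mean of its cluster — can be sketched in a few lines of pure Python (a teaching version of Lloyd's algorithm; real implementations add smarter initialization and convergence checks):

```python
import math
import random

def kmeans(points, k, iters=10, seed=0):
    """points: list of coordinate tuples; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # naive init: k random data points
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        for j, cluster in enumerate(clusters):
            if cluster:
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers
```

On two well-separated blobs, the centers migrate to the middle of each blob.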

7. Association rule learning: Association rule learning looks for associations between different variables in the data. The algorithm's job is to find the rules that best describe these associations, that is, to learn the dependencies between one event and other events.

Commonly used association rule algorithms are:

Apriori algorithm, Eclat algorithm
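The first stage shared by these methods — finding itemsets whose support (fraction of transactions containing them) exceeds a threshold — can be sketched for pairs of items. This is only the counting step, not the full candidate-pruning machinery of Apriori:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support meets min_support (a fraction)."""
    counts = Counter()
    for t in transactions:
        # count every unordered pair of items in the transaction
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Pairs that co-occur in most baskets survive the threshold; rare pairs are dropped.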

8. Artificial neural networks: Artificial neural networks are models inspired by the structure and function of biological neural networks. They are a class of pattern-matching methods commonly used for regression and classification problems, though the field really comprises hundreds of algorithms and variants. I have separated deep learning from the classical artificial neural network algorithms, because deep learning has become so popular; this section leans toward the classical, perceptron-style methods.

Commonly used artificial neural network algorithms include: the perceptron, back-propagation, Hopfield networks, radial basis function networks (RBFN)
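The classical perceptron learning rule is short enough to sketch in full: whenever a sample is misclassified, nudge the weights toward it. On linearly separable data this provably converges (a minimal sketch with labels in {-1, +1}):

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (features, label) with label in {-1, +1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified: move boundary toward x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b
```

After training on separable data, every sample ends up on the correct side of the learned boundary.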

9. Deep learning algorithms: Deep learning algorithms are a modern update of artificial neural networks that make full use of cheap computing power. In recent years, deep learning has been deployed at scale, especially in speech recognition and image recognition. Deep learning algorithms build larger, more complex neural networks. As I mentioned in an earlier article, many deep learning methods address semi-supervised problems, where there is a large amount of data of which only a small portion is labeled.

Commonly used deep learning algorithms include:

Deep Boltzmann machines (DBM), deep belief networks (DBN), convolutional neural networks (CNN), stacked auto-encoders
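The property all of these exploit — stacking nonlinear layers — can be illustrated with a hand-wired two-layer ReLU network that computes XOR, a function no single perceptron can represent. The weights below are set by hand purely for illustration; deep learning, of course, learns them from data:

```python
def relu(z):
    """Rectified linear unit, the standard deep-learning nonlinearity."""
    return max(0.0, z)

def xor_net(x1, x2):
    """Two-layer network computing XOR of binary inputs via hand-set weights."""
    # hidden layer: two ReLU units
    h1 = relu(x1 + x2)          # fires when either input is active
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are active
    # output layer: linear combination of hidden activations
    return h1 - 2.0 * h2
```

The hidden layer re-represents the inputs so that the output layer's simple linear readout suffices — the same division of labor, repeated over many learned layers, is what deep networks rely on.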

10. Dimensionality reduction algorithms: I actually think dimensionality reduction is somewhat similar to clustering, in that it also tries to discover the inherent structure of the data. However, dimensionality reduction does so in an unsupervised way, summarizing and describing the data with less information. This can be useful for visualizing high-dimensional data, or for simplifying data that a supervised learner can then use. Many dimensionality reduction algorithms can also be adapted for classification and regression problems.

Common dimensionality reduction algorithms include:

Principal component analysis (PCA), principal component regression (PCR), partial least squares regression (PLSR), Sammon mapping, multidimensional scaling (MDS), projection pursuit, linear discriminant analysis (LDA), mixture discriminant analysis (MDA), quadratic discriminant analysis (QDA), flexible discriminant analysis (FDA)
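For PCA, the "direction of most structure" is the leading eigenvector of the covariance matrix. A minimal 2-D sketch using power iteration makes the idea concrete (illustrative only; real libraries use robust matrix decompositions):

```python
def first_principal_component(points, iters=100):
    """Leading eigenvector of the 2x2 covariance matrix, via power iteration."""
    n = len(points)
    # center the data
    means = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [(p[0] - means[0], p[1] - means[1]) for p in points]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # power iteration: repeatedly multiply by the matrix and renormalize
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v
```

For points lying along the line y = x, the component converges to the diagonal direction (1/√2, 1/√2), the axis that captures all the variance.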

— — — — — — — — — — — — — —

In fact, many algorithms have not been mentioned — for example, which group should SVM belong to, or should it be a group of its own? These points still puzzle me, so I ask you to share your advice so we can learn and improve together.