This article lists the top 10 machine learning algorithms used by data scientists and introduces the key features of each so that newcomers to machine learning can better understand and apply them.

Text: James Le


There is a saying in machine learning that "there is no such thing as a free lunch." In short, it means that no single algorithm does the best job on every problem, and this principle is especially relevant to supervised learning.

For example, you can’t say that a neural network is always better than a decision tree, or vice versa. A model’s performance is influenced by many factors, such as the size and structure of the data set.

Therefore, you should try many different algorithms on your problem, using a held-out test set to evaluate performance and pick the best one.

Of course, the algorithm you try has to fit your problem, and that’s the main task of machine learning. For example, if you want to clean your house, you might use a vacuum cleaner, broom or mop, but you certainly don’t get a shovel and start digging holes.

For those new to machine learning who are eager to understand the basics of machine learning, here is a list of ten machine learning algorithms used by data scientists.

1- Linear regression

Linear regression is probably one of the best known and best understood algorithms in statistics and machine learning.

Predictive modeling is mainly concerned with minimizing model error, or making the most accurate prediction possible at the expense of interpretability, so we borrow, reuse, and steal algorithms from many different fields, including statistics.

Linear regression is represented by an equation that describes the linear relationship between the input variable (x) and the output variable (y) by finding specific weights (coefficients, B) for the input variables.


Linear Regression

Example: y = B0 + B1 * x

Given the input x, we will predict y. The goal of the linear regression learning algorithm is to find values for the coefficients B0 and B1.

Linear regression models can be learned from data using different techniques, such as linear algebraic solutions for ordinary least squares and gradient descent optimization.
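As a rough illustration of the ordinary least squares route for the single-variable case above, the two coefficients can be computed in closed form (a minimal sketch with made-up toy data; the names x, y, b0, b1 are just placeholders):

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Ordinary least squares for simple linear regression:
# B1 = cov(x, y) / var(x), B0 = mean(y) - B1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(f"y = {b0:.2f} + {b1:.2f} * x")
print("prediction at x = 6:", b0 + b1 * 6)
```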

Linear regression has existed for over 200 years and has been extensively studied. Some good rules of thumb when using this technique are to remove highly correlated (very similar) variables and, if possible, remove noise from the data. It is a quick and simple technique and a good first algorithm to try.

2- Logistic regression

Logistic regression is another technique machine learning borrows from the field of statistics. It is the method of choice for binary classification problems (problems with two class values).

Logistic regression is similar to linear regression in that the goal of both is to find the weight value for each input variable. Unlike linear regression, the output prediction is transformed using a nonlinear function called the logistic function.

The logistic function looks like a big S and maps any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to class 0 or class 1 (for example, if the output is less than 0.5, predict class 0, otherwise class 1) and so predict a class value.
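To make the S-shaped transformation and the 0.5 cut-off concrete, here is a minimal sketch; the coefficients b0 and b1 are assumed to have been learned already and are made-up numbers here:

```python
import numpy as np

def logistic(z):
    # The S-shaped logistic (sigmoid) function squashes any value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Assumed, made-up coefficients for a single input variable
b0, b1 = -4.0, 1.5

x_new = 3.2
p = logistic(b0 + b1 * x_new)      # probability of belonging to class 1
label = 1 if p >= 0.5 else 0       # threshold rule: below 0.5 -> class 0
print(f"p(class=1) = {p:.3f}, predicted class = {label}")
```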


Logistic Regression

Because of the way the model is learned, the predictions made by logistic regression can also be used as the probability of a data instance belonging to class 0 or class 1. This is useful for problems where you need to give more rationale for a prediction.

Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable and attributes that are highly correlated (very similar) to each other. It is a fast model to learn and is effective on binary classification problems.

3- Linear discriminant analysis

Traditional logistic regression is limited to binary classification problems. If you have more than two classes, Linear Discriminant Analysis (LDA) is the preferred linear classification technique.

The representation of LDA is very simple. It consists of statistical properties of your data, calculated for each class. For a single input variable, this includes:

1. The mean value for each class.

2. The variance calculated across all classes.


Linear Discriminant Analysis

Predictions are made by calculating a discriminant value for each class and predicting the class with the largest value. The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from the data first. It is a simple and powerful method for classification predictive modeling problems.
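As a sketch of that idea for a single input variable, using per-class means, a shared (pooled) variance, and class priors (the tiny data set below is made up, and this is only one simplified form of the discriminant):

```python
import numpy as np

# Made-up 1-D training data with two classes
X = np.array([1.0, 1.2, 0.8, 3.9, 4.1, 4.3])
y = np.array([0,   0,   0,   1,   1,   1])

classes = np.unique(y)
means = {c: X[y == c].mean() for c in classes}                       # mean per class
sq_dev = np.concatenate([(X[y == c] - means[c]) ** 2 for c in classes])
var = sq_dev.mean()                                                  # pooled variance
priors = {c: np.mean(y == c) for c in classes}                       # class priors

def discriminant(x, c):
    # Linear discriminant value; the class with the largest value wins
    return x * means[c] / var - means[c] ** 2 / (2 * var) + np.log(priors[c])

x_new = 2.0
print("predicted class:", max(classes, key=lambda c: discriminant(x_new, c)))
```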

4- Classification and regression trees

Decision trees are an important type of algorithm for predictive modeling in machine learning.

The decision tree model can be represented by a binary tree. Yes, it is the binary tree from algorithms and data structures, nothing special. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).


Decision Tree

The leaves of the tree contain the output variable (y) used to make a prediction. Predictions are made by traversing the tree, stopping when a leaf node is reached and outputting the class value at that leaf.
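A minimal sketch of how such a tree is walked at prediction time; the tree below is a tiny hand-built example rather than a learned one:

```python
# Each internal node holds an input variable index, a split value, and
# left/right children; leaves hold the class value to output.
tree = {
    "index": 0, "value": 5.0,
    "left":  {"leaf": True, "class": 0},
    "right": {"index": 0, "value": 8.0,
              "left":  {"leaf": True, "class": 1},
              "right": {"leaf": True, "class": 0}},
}

def predict(node, row):
    # Walk down the tree until a leaf is reached, then output its class value
    if node.get("leaf"):
        return node["class"]
    branch = "left" if row[node["index"]] < node["value"] else "right"
    return predict(node[branch], row)

print(predict(tree, [6.5]))   # -> 1
print(predict(tree, [2.0]))   # -> 0
```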

Decision trees are fast in learning and prediction. Predictions are often accurate for many problems, and you don’t need to do anything special with the data.

5- Naive Bayes

Naive Bayes is a simple but extremely powerful predictive modeling algorithm.

The model consists of two types of probabilities that can be calculated directly from your training data: 1) the probability of each class; and 2) the conditional probability of each x value given each class. Once calculated, the probability model can be used to make predictions for new data using Bayes’ theorem. When your data is numerical, you usually assume a Gaussian distribution (bell curve) so that these probabilities are easy to estimate.


Bayes Theorem

Naive Bayes is called naive because it assumes that each input variable is independent. This is a strong assumption and unrealistic for real data, yet the technique is still very effective on a wide range of complex problems.
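A minimal Gaussian naive Bayes sketch that estimates both kinds of probabilities from a tiny made-up data set and scores a new point with Bayes’ theorem (ignoring the shared denominator, which does not change the winning class):

```python
import numpy as np

# Made-up training data: two numeric features, two classes
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
priors = {c: np.mean(y == c) for c in classes}               # P(class)
means  = {c: X[y == c].mean(axis=0) for c in classes}
stds   = {c: X[y == c].std(axis=0) + 1e-9 for c in classes}  # avoid division by zero

def gaussian_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def predict(x_new):
    # "Naive" step: multiply per-feature likelihoods as if they were independent
    scores = {c: priors[c] * np.prod(gaussian_pdf(x_new, means[c], stds[c]))
              for c in classes}
    return max(scores, key=scores.get)

print(predict(np.array([1.1, 2.0])))   # expected: class 0
```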

6- K-nearest neighbors

The KNN algorithm is very simple and very effective. The model for KNN is the entire training data set. Simple, right?

Predictions are made for a new data point by searching the entire training set for the K most similar instances (the neighbors) and summarizing the output variable of those K instances. For regression problems this may be the mean output value; for classification problems it may be the most common (mode) class value.

The trick to success lies in how you determine the similarity between data instances. If your attributes are all on the same scale, the simplest approach is to use the Euclidean distance, which can be calculated directly from the differences between each input variable.
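A minimal sketch of the whole procedure for a classification problem, using Euclidean distance and the mode of the k nearest labels (the tiny data set is made up):

```python
import numpy as np
from collections import Counter

# Made-up training set: two features per row, with a class label each
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [4.0, 4.2], [3.9, 4.0], [4.1, 3.8]])
y_train = np.array([0, 0, 1, 1, 1])

def knn_predict(x_new, k=3):
    # Euclidean distance from the new point to every training instance
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]        # indices of the k closest rows
    # Classification: most common class among the neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([3.8, 4.1])))   # expected: class 1
```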


K-Nearest Neighbors

KNN may need a lot of memory or space to store all the data, but only performs calculations (or learning) when prediction is needed. You can also update and manage your training set at any time to keep your predictions accurate.

The idea of distance or closeness can break down in high-dimensional settings (with a large number of input variables), which can hurt the algorithm’s performance. This is known as the curse of dimensionality. It suggests you should use only the input variables that are most relevant to predicting the output variable.

7- Learning vector quantization

A disadvantage of K-nearest neighbors is that you need to keep the entire training data set. The Learning Vector Quantization algorithm (LVQ) is an artificial neural network algorithm that lets you choose how many training instances to keep and learns exactly what those instances should look like.


Learning Vector Quantization

LVQ is represented by a collection of codebook vectors. These are selected randomly at the start and then adapted over many iterations to best summarize the training data set. After learning, the codebook vectors can be used to make predictions just like K-nearest neighbors: find the most similar neighbor (the best matching unit) by calculating the distance between each codebook vector and the new data instance, then return the class value of the best matching unit (or the real value in the case of regression) as the prediction. You will get the best results if you rescale your data to the same range (say, between 0 and 1).
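A rough sketch of one common form of the training loop; the data set, the linearly decaying learning rate, and the choice to start from one randomly chosen training row per class are all illustration-level assumptions:

```python
import numpy as np

# Made-up 2-feature training data with two classes
X = np.array([[1.0, 1.1], [0.9, 1.0], [4.0, 3.9], [4.2, 4.1]])
y = np.array([0, 0, 1, 1])

rng = np.random.default_rng(0)
classes = np.unique(y)
# Start from one randomly chosen training row per class (a simplifying choice)
codebook = np.array([X[y == c][rng.integers((y == c).sum())] for c in classes], dtype=float)
codebook_y = classes.copy()

n_epochs, lr0 = 20, 0.3
for epoch in range(n_epochs):
    lr = lr0 * (1 - epoch / n_epochs)                 # decaying learning rate
    for xi, yi in zip(X, y):
        bmu = np.argmin(((codebook - xi) ** 2).sum(axis=1))   # best matching unit
        step = lr * (xi - codebook[bmu])
        # Pull the BMU towards the instance if the classes agree, push it away otherwise
        codebook[bmu] += step if codebook_y[bmu] == yi else -step

# Prediction works like 1-nearest-neighbour against the learned codebook vectors
x_new = np.array([3.8, 4.0])
print(codebook_y[np.argmin(((codebook - x_new) ** 2).sum(axis=1))])
```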

If you find that KNN gives good results on your dataset, try using LVQ to reduce the memory requirements for storing the entire training dataset.

8- Support vector machines

Support vector machines are perhaps one of the most popular and discussed machine learning algorithms.

A hyperplane is a line that splits the input variable space. In SVM, a hyperplane is selected to best separate the points in the input variable space by their class (class 0 or class 1). In two dimensions you can visualize this as a line, and we assume all of the input points can be completely separated by this line. The SVM learning algorithm finds the coefficients that give the best separation of the classes by the hyperplane.


Support Vector Machine

The distance between the hyperplane and the closest data points is called the margin, and the hyperplane with the largest margin is the best choice. Only these nearby points are relevant to defining the hyperplane and building the classifier. They are called support vectors because they support, or define, the hyperplane. In practice, an optimization algorithm is used to find the coefficient values that maximize the margin.
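In practice most people reach for an existing optimizer rather than writing one. A minimal sketch using scikit-learn’s SVC with a linear kernel (the tiny data set is made up, and C=1.0 is just a common default choice):

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D data with classes 0 and 1
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [4.0, 4.2], [4.5, 4.0], [3.8, 4.4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # find the maximum-margin linear hyperplane
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)   # the points defining the hyperplane
print("prediction for [2.0, 2.0]:", clf.predict([[2.0, 2.0]]))
```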

SVM is probably one of the most powerful ready-to-use classifiers and is worth a try on your data set.

9- Bagging and Random Forests

Random forest is one of the most popular and powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation, or Bagging.

Bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample. You take many resamples of your data, calculate the mean of each resample, and then average all of those means to get a better estimate of the true mean.
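A minimal sketch of the bootstrap idea for the mean (the data sample below is made up):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=50)   # made-up data sample

# Draw many resamples with replacement, take the mean of each, then average them
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(1000)]
print("bootstrap estimate of the mean:", np.mean(boot_means))
```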

The same approach is used in Bagging, but instead of estimating a single quantity, we estimate entire statistical models, most commonly decision trees. Multiple samples of your training data are taken, and a model is built for each sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output.


Random Forest

Random forest is a tweak on this approach: rather than selecting the optimal split point when building each decision tree, suboptimal splits are made by introducing randomness.

As a result, the models created for each data sample are more different from one another, but still accurate in their own ways. Combining their predictions gives a better estimate of the true underlying output value.
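A minimal sketch using scikit-learn’s off-the-shelf implementation; the synthetic data set and the number of trees are arbitrary illustration choices:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Made-up data: class 1 when the sum of the two features is positive
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(int)

# Each tree sees a bootstrap sample of the rows and a random subset of
# features at each split, and the trees' predictions are combined
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print("prediction for [1.0, 1.0]:", forest.predict([[1.0, 1.0]]))
```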

If you get good results with a high-variance algorithm such as decision trees, you can often get even better results by bagging it.

10- Boosting and AdaBoost

Boosting is an ensemble technique for creating a strong classifier from a number of weak classifiers. It builds a model from the training data, then creates a second model that tries to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been added.

AdaBoost was the first truly successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting.


AdaBoost

AdaBoost is often used with short decision trees. After the first tree is created, its performance on each training instance determines how much attention the next tree must pay to that instance. Training data that is hard to predict is given more weight, while instances that are easy to predict are given less weight. Models are created sequentially, each one updating the weights that influence the learning of the next tree in the sequence. After all the trees are built, predictions are made for new data, and each tree’s contribution is weighted by how accurate it was on the training data.
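A minimal sketch using scikit-learn’s AdaBoostClassifier, whose default base learner is a one-level decision tree (a stump); the synthetic data set and parameter values are arbitrary illustration choices:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))              # made-up 2-feature data
y = (X[:, 0] * X[:, 1] > 0).astype(int)    # a target a single stump cannot solve

# Stumps are added one at a time, each paying more attention to the
# training rows the previous stumps got wrong
model = AdaBoostClassifier(n_estimators=50)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```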

Because the algorithm is so focused on error correction, it is important to have clean data with no outliers.

Closing thoughts

A typical question from beginners faced with such a wide variety of machine learning algorithms is “Which algorithm should I use?” The answer depends on many factors, including: (1) the size, quality, and nature of the data; (2) the available computing time; (3) the urgency of the task; and (4) what you want to do with the data.

Even an experienced data scientist has no way of knowing which algorithm will perform best until they try several. While there are many other machine learning algorithms, these are among the most popular. If you’re new to machine learning, this is a great place to start.