• Ten Machine Learning Algorithms You Should Know to Become a Data Scientist
  • Original article by Shashank Gupta
  • Translation from: The Gold Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: JohnJiang
  • Proofread by: Daltan He, Kasheem Lew

Top 10 machine learning algorithms in data science

Machine learning practitioners come in different types. Some believe in “one sword for everything”: they are experts in a single algorithm and trust it to handle any type of data. Others believe in “the right tool for the right job.” Many also follow a “jack of all trades, master of one” strategy, with deep expertise in one area and some knowledge of the other areas of machine learning. Either way, there is no denying that a practicing data scientist must know the basics of the common machine learning algorithms, because that knowledge suggests ways to approach a new problem. This tutorial takes you on a quick tour of the common machine learning algorithms, with resources to help you get started on each.

1. Principal component analysis (PCA)/SVD

PCA is an unsupervised method for understanding the global properties of a dataset consisting of vectors. It analyzes the covariance matrix of the data points to find which dimensions (most often) or data points (sometimes) matter more, that is, which have high variance among themselves but low covariance with the others. One way to think of the top principal components (PCs) is as the eigenvectors of the covariance matrix with the largest eigenvalues. SVD is essentially another way to compute the ordered components, but it does not require computing the covariance matrix of the data points.

This kind of algorithm tackles the problems of high-dimensional data analysis by reducing the dimensionality of the data.
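The relationship between the two routes can be sketched in a few lines of NumPy. This is only an illustration on made-up toy data (the array `X` and its shape are assumptions, not something from the article): it computes the components once from the covariance matrix and once from the SVD of the centred data, then projects onto the top component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): 200 points in 3-D that vary mostly along one direction.
direction = np.array([[2.0, 1.0, 0.5]])
X = rng.normal(size=(200, 1)) @ direction + 0.1 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)  # PCA works on mean-centred data

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centred data, no covariance matrix required.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S**2 / (len(Xc) - 1)          # same numbers as eigvals

# Dimensionality reduction: keep only the top principal component.
X_reduced = Xc @ Vt[:1].T
```

The rows of `Vt` match the covariance eigenvectors up to sign, which is why either route can be used to pick the top components.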

Tool library:

Docs.scipy.org/doc/scipy/r…

Scikit-learn.org/stable/modu…

Introduction Tutorial:

Arxiv.org/pdf/1404.11…

2A. Least squares and polynomial fitting

Remember the numerical analysis code you wrote in college to fit lines and curves to points and get an equation? You can use the same methods to fit curves in machine learning on very small, low-dimensional datasets. (On large datasets or datasets with many dimensions you are likely to end up overfitting badly, so don’t bother.) Ordinary least squares (OLS) has a closed-form solution, so you don’t need complex optimization techniques.

As is obvious, use this algorithm only to fit simple curves or regressions.
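As a small illustration of the closed-form solution, the sketch below fits a quadratic to five made-up points in three equivalent ways: the normal equations, `np.linalg.lstsq`, and `np.polyfit`. The data and variable names are assumptions chosen for the example.

```python
import numpy as np

# Made-up noiseless data from y = 1 + 2x - 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x - 0.5 * x**2

# Design matrix with columns [1, x, x^2].
A = np.vander(x, N=3, increasing=True)

# Closed-form OLS via the normal equations: w = (A^T A)^{-1} A^T y.
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# The same fit through NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# polyfit returns the highest degree first, so reverse it to compare.
w_polyfit = np.polyfit(x, y, deg=2)[::-1]
```

In practice `lstsq` or `polyfit` is usually preferable to forming `A.T @ A` explicitly, since the normal equations become numerically fragile for high polynomial degrees.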

Tool library:

Docs.scipy.org/doc/numpy/r…

Docs.scipy.org/doc/numpy-1…

Introduction Tutorial:

Lagunita.stanford.edu/c4x/Humanit…

2B. Constrained linear regression

Least squares can get confused by outliers, spurious fields, and noise in the data. We therefore need constraints that reduce the variance of the line we fit to a dataset. The right way to do this is to fit a linear regression model with a penalty that keeps the weights from misbehaving. The penalty can be the L1 norm (LASSO), the L2 norm (ridge regression), or both (elastic net). The mean squared loss is what gets optimized.

Use these algorithms to fit regression lines with constraints, which avoids overfitting and masks noisy dimensions from the model.
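Of the three penalties, only the L2 (ridge) case has a simple closed form, so that is the one sketched here. The helper `ridge_fit`, the synthetic data, and the choice `lam=10.0` are illustrative assumptions; the sketch also omits an intercept for brevity.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])   # only two features matter
y = X @ true_w + 0.5 * rng.normal(size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # the penalty shrinks the weights

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weights always have a smaller norm than the unpenalized ones, which is the "constraint" at work. LASSO and elastic net need an iterative solver instead of a single linear solve.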

Tool library:

Scikit-learn.org/stable/modu…

Introduction Tutorial:

www.youtube.com/watch?v=5as…

www.youtube.com/watch?v=jbw…

3. K-means clustering

This is everyone’s favorite unsupervised clustering algorithm. Given a set of data points in vector form, it groups them into clusters according to the distances between them. It is an expectation-maximization (EM) style algorithm that alternates between assigning each point to its nearest cluster center and moving each center to the mean of the points assigned to it. The inputs to the algorithm are the number of clusters to generate and the number of iterations to run.

As the name suggests, you can use this algorithm to divide the dataset into K clusters.
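The alternating assign-and-move loop is short enough to write out directly. The following is a minimal sketch, not a replacement for a library implementation: the function name `kmeans`, the random initialisation from data points, and the two toy blobs are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    """Plain k-means: returns the final centers and the last label assignment."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points (fancy indexing copies).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated toy blobs.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
X = np.vstack([blob_a, blob_b])

centers, labels = kmeans(X, k=2)
```

The two inputs mentioned above appear as the parameters `k` and `n_iter`. Real implementations add smarter initialisation and a convergence check.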

Tool library:

Scikit-learn.org/stable/modu…

Introduction Tutorial:

www.youtube.com/watch?v=hDm…

www.datascience.com/blog/k-mean…

4. Logistic regression

Logistic regression is constrained linear regression with a nonlinearity applied after the weights (usually the sigmoid function, though tanh also works), so the output is squashed toward the +/- classes (1 and 0 in the case of sigmoid). The cross-entropy loss function is optimized with gradient descent. Note to beginners: logistic regression is used for classification, not regression. You can also think of logistic regression as a single-layer neural network. It is trained with optimization methods such as gradient descent or L-BFGS. In NLP it is often called the maximum entropy classifier.

Figure: graph of the sigmoid function.

Use LR to train simple but very robust classifiers.
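To make the training procedure concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent on the mean cross-entropy loss. The function names, learning rate, step count, and the six-point dataset are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Append a constant column so the last weight acts as the bias.
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, lr=0.1, n_steps=1000):
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = sigmoid(Xb @ w)               # predicted probability of class 1
        grad = Xb.T @ (p - y) / len(y)    # gradient of the mean cross-entropy
        w -= lr * grad
    return w

def predict(X, w):
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(int)

# Toy 1-D data: negative inputs are class 0, positive inputs are class 1.
X = np.array([[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fit_logistic(X, y)
predictions = predict(X, w)
```

The model outputs a probability, and the classification comes from thresholding it at 0.5, which is why this counts as a classifier despite the name.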

Tool library:

Scikit-learn.org/stable/modu…

Introduction Tutorial:

www.youtube.com/watch?v=-la…

5. SVM (Support Vector Machine)

Support vector machines are linear models like linear and logistic regression; the difference is that they use a different, margin-based loss function. (The derivation of support vectors is one of the most elegant mathematical results I have seen, along with eigenvalue calculation.) You can minimize the loss with optimization methods such as L-BFGS or even SGD.

Another innovation in SVMs is the use of kernels on the data for feature engineering. If you have good domain insight, you can profit by replacing the good old RBF kernel with a smarter one.

One unique thing SVMs can do is learn one-class classifiers.

Support vector machines can be used to train classifiers (even regressors).
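The margin-based loss can be illustrated with a linear SVM trained by plain subgradient descent on the L2-regularised hinge loss. This is a sketch under assumptions (full-batch updates, the hyperparameters `lam` and `lr`, and a tiny separable dataset); it does not cover kernels or the dual formulation.

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, lr=0.1, n_steps=200):
    """Subgradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * (Xw + b)))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        margins = y * (X @ w + b)
        active = margins < 1              # points inside or beyond the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [1.5, 1.5],
              [-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = fit_linear_svm(X, y)
predictions = np.sign(X @ w + b)
```

Only the points with margin below 1 contribute to the gradient, which is the hinge-loss counterpart of the idea that the solution depends on the support vectors.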

Tool library:

Scikit-learn.org/stable/modu…

Introduction Tutorial:

www.youtube.com/watch?v=eHs…

Note: SGD-based training of both logistic regression and SVMs is available in scikit-learn (scikit-learn.org/stable/modu…). I use it often because it lets me try both logistic regression and SVM through a common interface. You can also train on datasets larger than RAM by using mini-batches.

6. Feedforward neural network

These are essentially multi-layer logistic regression classifiers: many layers of weights separated by nonlinearities (sigmoid, tanh, ReLU + softmax, and the cool new SELU). They are also called multilayer perceptrons. Feedforward neural networks can be used for classification and, as autoencoders, for unsupervised feature learning.

Multilayer perceptron

Feedforward neural networks as autoencoders

Feedforward neural networks can be used to train classifiers or, as autoencoders, to extract features.
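The "layers of weights separated by nonlinearities" idea can be shown with the forward pass of a two-layer network. The sketch below uses randomly initialised weights purely as placeholders (an assumption; there is no training here) with a ReLU hidden layer and a softmax output.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(X, W1, b1, W2, b2):
    hidden = relu(X @ W1 + b1)             # first layer of weights + nonlinearity
    return softmax(hidden @ W2 + b2)       # second layer + class probabilities

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                       # 4 samples, 3 features
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)     # 3 inputs -> 5 hidden units
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)     # 5 hidden units -> 2 classes

probs = mlp_forward(X, W1, b1, W2, b2)
```

Dropping the hidden layer and using a two-class output recovers the logistic regression model from section 4, which is the sense in which these networks stack logistic-regression-like units.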

Tool library:

Scikit-learn.org/stable/modu…

Scikit-learn.org/stable/modu…

Github.com/keras-team/…

Introduction Tutorial:

www.deeplearningbook.org/contents/ml…

www.deeplearningbook.org/contents/au…

www.deeplearningbook.org/contents/re…

7. Convolutional neural networks (ConvNets)

Almost every state-of-the-art vision-based machine learning result today has been achieved with convolutional neural networks. They can be used for image classification, object detection, and even image segmentation. Invented by Yann LeCun in the late 1980s and early 1990s, ConvNets are built around convolutional layers that act as hierarchical feature extractors. You can also use them on text (and even on graphs).

Use ConvNets for state-of-the-art image and text classification, object detection, and image segmentation.
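What a single convolutional filter computes can be shown with a naive loop. The sketch below implements the "valid" sliding-window operation used in ConvNet layers (strictly a cross-correlation, since the kernel is not flipped) and applies a hand-picked edge-detecting kernel to a tiny image; both the image and the kernel are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x4 image with a vertical edge between columns 1 and 2.
image = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])

# A 1x2 kernel that responds to a dark-to-bright step from left to right.
kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# Each row is [0, 1, 0]: the filter fires only at the edge.
```

In a trained network the kernel values are learned rather than chosen by hand, and stacking such layers is what yields the hierarchy of features.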

Tool library:

developer.nvidia.com/digits

Github.com/kuangliu/to…

Github.com/chainer/cha…

keras.io/application…

Introduction Tutorial:

cs231n.github.io/

adeshpande3.github.io/A-Beginner%…

8. Recurrent Neural Network (RNN)

RNNs model sequence data by recursively applying the same set of weights to the aggregator state at time t and the input at time t. (Given a sequence with inputs at times 0..t..T, there is a hidden state at each time t that is the output of step t-1 of the RNN.) Pure RNNs are rarely used today, but their counterparts such as LSTMs and GRUs are state of the art for most sequence modeling tasks.

RNN (here f is a densely connected unit with a nonlinearity; nowadays f is usually an LSTM or GRU). LSTM units are often used in place of the plain dense layer in a pure RNN.

Use RNNs for any sequence modeling task, especially text classification, machine translation, and language modeling.
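The recurrence itself is only a few lines. The sketch below runs a plain (non-LSTM) RNN forward over a short random sequence, reusing the same weights at every time step; the dimensions, weight scales, and the tanh nonlinearity are assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_h, W_x, b):
    """Apply h_t = tanh(W_h h_{t-1} + W_x x_t + b) along the sequence."""
    h = np.zeros(W_h.shape[0])          # initial hidden state
    states = []
    for x_t in inputs:                  # the same weights are reused at every step
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 3, 4
inputs = rng.normal(size=(T, input_dim))
W_h = 0.5 * rng.normal(size=(hidden_dim, hidden_dim))
W_x = 0.5 * rng.normal(size=(hidden_dim, input_dim))
b = np.zeros(hidden_dim)

states = rnn_forward(inputs, W_h, W_x, b)   # one hidden state per time step
```

An LSTM or GRU replaces the single `tanh` line with a gated update, but the outer loop over time steps stays the same.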

Tool library:

Github.com/tensorflow/… (Many cool NLP research papers from Google are here)

Github.com/wabyking/Te…

opennmt.net/

Introduction Tutorial:

cs224d.stanford.edu/

www.wildml.com/category/ne…

colah.github.io/posts/2015-…

9. Conditional Random Field (CRF)

CRFs are probably the most commonly used models in the family of probabilistic graphical models (PGMs). They can be used for sequence modeling just like RNNs, or in combination with RNNs. CRFs were state of the art before neural machine translation systems arrived, and on many sequence tagging tasks with small datasets they still learn better than RNNs, which need larger amounts of data to generalize. They can also be used for other structured prediction tasks, such as image segmentation. A CRF models each element of a sequence (a sentence, say) so that neighboring elements affect the label of a component in the sequence, rather than all labels being independent of each other.

Use CRFs to tag sequences (text, images, time series, DNA, etc.).
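How neighboring labels influence each other can be seen in the decoding step of a linear-chain CRF. The sketch below covers decoding only, not training: it assumes the per-position tag scores (`emissions`) and tag-to-tag scores (`transitions`) are already given, with hand-picked numbers, and finds the best-scoring tag sequence with the Viterbi algorithm.

```python
import numpy as np

def sequence_score(emissions, transitions, tags):
    """Total score of one tag sequence: emission scores plus transition scores."""
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence by dynamic programming."""
    T, K = emissions.shape
    best = emissions[0].copy()                 # best score of a path ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] + transitions     # cand[i, j]: come from tag i, move to tag j
        backptr[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + emissions[t]
    tags = [int(best.argmax())]
    for t in range(T - 1, 0, -1):              # follow the back-pointers
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]

# Position 1 slightly prefers tag 1 on its own, but switching tags is penalised.
emissions = np.array([[2.0, 0.0],
                      [0.0, 0.5],
                      [2.0, 0.0]])
transitions = np.array([[0.0, -2.0],
                        [-2.0, 0.0]])

independent = emissions.argmax(axis=1).tolist()   # [0, 1, 0]
joint = viterbi(emissions, transitions)           # [0, 0, 0]
```

Labelling each position independently gives `[0, 1, 0]`, while the joint decoding gives `[0, 0, 0]` because the transition scores let the neighbors overrule the weak preference in the middle.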

Tool library:

sklearn-crfsuite.readthedocs.io/en/latest/

Introduction Tutorial:

blog.echen.me/2012/01/03/…

Hugo Larochelle’s 7-part lecture series on YouTube: www.youtube.com/watch?v=GF3…

10. Decision trees

Let’s say I receive an Excel spreadsheet with data on various fruits, and I have to say which ones look like apples. What I do is ask the question “Which fruits are red and round?” and divide the fruits into those that answer “yes” and those that answer “no.” Now, not every red and round fruit is an apple, and not every apple is red and round. So I ask a follow-up question of the red and round fruits: “Which fruits have hints of red or yellow on them?” And of the fruits that are not red and round: “Which fruits are green and round?” Based on these questions, I can say with considerable accuracy which ones are apples. This cascade of questions is a decision tree.

However, this is a decision tree based on my intuition, and intuition does not work on complex, high-dimensional data. We have to come up with the cascade of questions automatically by looking at labeled data. That is what machine-learning-based decision trees do. Earlier versions such as CART trees were used for simple data, but as datasets grow larger, the bias-variance trade-off needs to be handled by better algorithms. The two decision tree algorithms in common use today are random forests (which build different classifiers on random subsets of the attributes and combine them for the output) and boosted trees (which train a cascade of trees, each one on top of the others and correcting the mistakes of the ones below it).

Decision trees can be used to classify data points (and even for regression).
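Choosing a question automatically from labeled data comes down to scoring candidate splits. The sketch below picks the single best threshold question by weighted Gini impurity, which is one common criterion; the function names, the two made-up features (redness and diameter), and the six fruits are assumptions for illustration. A full tree would apply the same search recursively to each side of the split.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of integer class labels (0 means pure)."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p**2)

def best_split(X, y):
    """Find the (feature, threshold) question with the lowest weighted impurity."""
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        # Skip the largest value so both sides of the split stay non-empty.
        for thr in np.unique(X[:, f])[:-1]:
            left = y[X[:, f] <= thr]
            right = y[X[:, f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, thr, score)
    return best

# Hypothetical fruit table: columns are [redness (0-1), diameter in cm].
X = np.array([[0.90, 7.0], [0.80, 7.5], [0.85, 8.0],
              [0.20, 7.2], [0.30, 7.8], [0.10, 2.0]])
y = np.array([1, 1, 1, 0, 0, 0])   # 1 = apple, 0 = not an apple

feature, threshold, impurity = best_split(X, y)
# Picks feature 0 (redness) with threshold 0.3, which separates the classes exactly.
```

Random forests and boosted trees both reuse this kind of split search; they differ in how many trees they build and how the trees are combined.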

Tool library:

Scikit-learn.org/stable/modu…

Scikit-learn.org/stable/modu…

xgboost.readthedocs.io/en/latest/

catboost.yandex/

Introduction Tutorial:

xgboost.readthedocs.io/en/latest/m…

Arxiv.org/abs/1511.05…

Arxiv.org/abs/1407.75…

Education.parrotprediction.teachable.com/p/practical…

More algorithms (you should learn)

If you’re still wondering whether any of the methods above can solve a task like defeating the world Go champion, as DeepMind did, they cannot. The 10 types of algorithms discussed above all do pattern recognition, not strategy learning. To learn a strategy for a multi-step problem, such as winning a board game or playing an Atari game, we need to set an agent free in the world and let it learn from the rewards and penalties it encounters. This type of machine learning is called reinforcement learning. Much (but not all) of the recent success in the field comes from combining the perception abilities of a ConvNet or an LSTM with a set of algorithms called temporal difference learning. These include Q-learning, SARSA, and other variants. These algorithms make clever use of the Bellman equations to obtain a loss function that can be trained with the rewards an agent receives from its environment.

These algorithms are mostly used to play games automatically :D, but they also have applications in language generation and object detection.
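The temporal difference idea can be shown with tabular Q-learning on a toy problem. Everything about the environment below is an assumption made for the example: a five-state corridor where the agent starts on the left, moves left or right, and earns a reward of 1 on reaching the rightmost state. The agent explores with purely random actions, which is allowed because Q-learning is off-policy.

```python
import numpy as np

n_states, n_actions = 5, 2       # actions: 0 = left, 1 = right
goal = n_states - 1              # reaching the rightmost state ends the episode
alpha, gamma = 0.5, 0.9          # learning rate and discount factor

def step(state, action):
    """Deterministic corridor: move one cell, reward 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

for _ in range(500):                       # episodes
    state = 0
    while state != goal:
        action = int(rng.integers(n_actions))          # random exploration
        next_state, reward = step(state, action)
        # Bellman-style target: reward plus discounted best future value.
        target = reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

policy = Q[:goal].argmax(axis=1)           # greedy action in each non-terminal state
```

The learned greedy policy is "move right" in every state. Deep reinforcement learning replaces the table `Q` with a network such as a ConvNet and turns the gap between `target` and the current estimate into a loss.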

Tool library:

Github.com/keras-rl/ke…

Github.com/tensorflow/…

Introduction Tutorial:

Get the free book by Sutton and Barto: web2.qatar.cmu.edu/~gdicaro/15…

Check out David Silver’s course: www.youtube.com/watch?v=2pW…

These are the 10 machine learning algorithms you must learn to be a data scientist.

You can read about machine learning tools here.


See the original article here.
