Machine learning: The basics of machine learning

The previous article on machine learning briefly introduced the concept of machine learning. This article continues with the basics. Machine learning relies on large amounts of data, and data is its foundation. Let's start by looking at some concepts related to data.

Understanding data in machine learning

Let's take the iris data set as an example to understand the basic concepts of data.

The length and width of the petals and sepals in the figure are all features of the data.

Each sample is a point in the space spanned by its features, which is called the feature space. The essence of classification is to partition the feature space: for example, the red region corresponds to the red iris and the blue region to the blue iris. The diagram shows data in a two-dimensional space, and the same holds in higher dimensions. When it is inconvenient to reason about a problem in a high-dimensional space, we usually analyze it in a low-dimensional space and then generalize the result to the high-dimensional space.
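As a minimal sketch of what partitioning the feature space means, the snippet below treats each sample as a point (petal length, petal width) and separates two classes with a single threshold on one feature. The measurements, the threshold of 2.5 and the function name classify are made up for illustration and are not taken from the real iris data set.

```python
# Each sample is a point in a 2-D feature space: (petal_length, petal_width).
# The numbers are invented for illustration only.
samples = [
    (1.4, 0.2),
    (1.5, 0.3),
    (4.7, 1.4),
    (5.1, 1.8),
]

def classify(sample, threshold=2.5):
    """Split the feature space with the line petal_length = threshold."""
    petal_length, _petal_width = sample
    return "red" if petal_length < threshold else "blue"

predictions = [classify(s) for s in samples]
print(predictions)
```

A real classifier learns where to place such a boundary from the data instead of having it written by hand.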

In the example above, the iris features have very clear semantics, but features in machine learning can be very abstract. In image recognition, for example, each pixel is a feature, so a 28*28 grayscale image has 784 features, and a color image has even more.
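The pixel count can be checked with a short sketch. The image here is a hypothetical all-zero 28*28 grid, and the assumption of 3 channels for a color image (for example RGB) is mine, not the article's.

```python
# A hypothetical 28x28 grayscale image: 28 rows of 28 pixel intensities.
height, width = 28, 28
image = [[0 for _ in range(width)] for _ in range(height)]

# Flatten row by row so that every pixel becomes one feature.
features = [pixel for row in image for pixel in row]
print(len(features))

# Assuming 3 channels per pixel, a color image has three times as many features.
channels = 3
color_feature_count = height * width * channels
print(color_feature_count)
```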

Machine learning means giving the machine as much information as possible and asking it to find the relationship between that information and the result we ultimately want. The features we give the machine largely determine the accuracy and reliability of the result the algorithm eventually computes. There is a dedicated area of research for this, called feature engineering. Deep learning, for example, can be understood as the algorithm carrying out feature engineering for us automatically.

Machine learning tasks

The basic tasks addressed by supervised learning fall into two categories: classification tasks and regression tasks.

Classification task

Binary classification

Common binary classification tasks include:

Determining whether a message is spam or not
Determining whether a patient's tumor is benign or malignant
Predicting whether a stock will rise or fall
Determining whether a credit card user is a risk or not

Multi-class classification

Common multi-class classification tasks include:

Digit recognition
Image recognition
Determine the risk rating of credit cards issued to customers

Many complex problems can also be converted into multi-class classification problems, such as the game 2048, Go, and driverless cars. Note, however, that being able to solve a problem this way does not mean it is the best way.

Some algorithms only support binary classification, but a multi-class task can be converted into several binary tasks. Other algorithms support multi-class classification natively.
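One common way to convert a multi-class task into binary tasks is the one-vs-rest scheme: train one binary model per class ("this class" versus "everything else") and pick the class whose model is most confident. The sketch below is only an illustration of that idea. The binary learner is a deliberately simple centroid-based toy, and all function names and data points are invented.

```python
import math

def centroid(points):
    """Mean of a list of 2-D points."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def train_binary(points, is_positive):
    """Toy binary learner: remember the centroid of the positives and of the negatives."""
    pos = [p for p, flag in zip(points, is_positive) if flag]
    neg = [p for p, flag in zip(points, is_positive) if not flag]
    return centroid(pos), centroid(neg)

def binary_score(model, point):
    """Positive when the point is closer to the positive centroid than to the negative one."""
    pos_centroid, neg_centroid = model
    return math.dist(point, neg_centroid) - math.dist(point, pos_centroid)

def train_one_vs_rest(points, labels):
    """One binary model per class: this class versus all the others."""
    return {c: train_binary(points, [label == c for label in labels])
            for c in sorted(set(labels))}

def predict_one_vs_rest(models, point):
    """Choose the class whose binary model gives the highest score."""
    return max(models, key=lambda c: binary_score(models[c], point))

# Invented training data with three classes.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (2.0, 4.0), (2.0, 5.0)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(points, labels)
predictions = [predict_one_vs_rest(models, p) for p in [(0.2, 0.4), (3.9, 0.6), (2.1, 4.2)]]
print(predictions)
```

Any binary classifier could take the place of the toy learner; only the wrapping logic is specific to one-vs-rest.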

Multi-label task

For example, a single picture can be assigned to several categories at once.
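A multi-label target is often written as one 0/1 entry per possible label. The following sketch shows that encoding; the label vocabulary and the picture names are hypothetical.

```python
# Hypothetical label vocabulary and per-picture tags, for illustration only.
all_labels = ["cat", "dog", "person", "car"]
picture_tags = {"photo_1": {"cat", "person"}, "photo_2": {"car"}}

def to_indicator(tags):
    """Encode a set of tags as one 0/1 entry per possible label."""
    return [1 if label in tags else 0 for label in all_labels]

targets = {name: to_indicator(tags) for name, tags in picture_tags.items()}
print(targets)
```

Unlike multi-class classification, more than one entry of the vector can be 1 for the same picture.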

Regression task

The result of a classification task is a category, whereas the result of a regression task is a continuous numerical value rather than a category, such as a house price.

Examples include:

House prices
Sales forecasts from market analysis
Student grades
Stock prices
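To make "predicting a continuous value" concrete, here is a minimal sketch of a regression model: a straight line fitted by ordinary least squares to one feature. The areas and prices are invented numbers chosen so that price is exactly 3 times the area, and fit_line is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: floor area and price in arbitrary units (price = 3 * area).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90.0 + intercept  # a continuous number, not a category
print(round(predicted, 2))
```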

Some algorithms can only solve regression problems; some can only solve classification tasks; and some can solve both.

In some cases a regression task can be simplified into a classification task, for example predicting the steering wheel angle in driverless cars.
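One way to picture this simplification is to bin the continuous angle into a few discrete commands, so that a classifier can predict the bin instead of the exact value. The three classes and the 5-degree dead zone below are arbitrary choices for this sketch.

```python
def angle_to_class(angle_degrees):
    """Turn a continuous steering angle into one of three discrete commands.

    The 5-degree dead zone is an arbitrary choice for this illustration.
    """
    if angle_degrees < -5.0:
        return "left"
    if angle_degrees > 5.0:
        return "right"
    return "straight"

commands = [angle_to_class(a) for a in (-30.0, 0.5, 12.0)]
print(commands)
```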

The classification and regression tasks described above are categories of problems that machine learning can solve, not categories of machine learning algorithms. We mentioned supervised learning above, so what is supervised learning? And if we look at machine learning itself, into what categories can its algorithms be divided?

Categories of machine learning

Machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.

Supervised learning

Supervised learning means that the training data we give the machine has labels, or answers. For example, the iris data set and the house price data mentioned above are labeled.

There are many examples of supervised learning in real life, such as:

Images that already have label information
Banks have accumulated a certain amount of customer information together with the customers' credit card credit status
Hospitals have accumulated a certain amount of information about patients and whether they were eventually diagnosed with a disease
The market has accumulated basic information about homes and the prices they finally sold for

Many algorithms in machine learning are supervised learning algorithms, such as

k-nearest neighbors
Linear regression and polynomial regression
Logistic regression
SVM
Decision trees and random forests
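As one concrete instance of learning from labeled data, the sketch below implements a bare-bones version of the first algorithm in the list, k-nearest neighbors: a new point receives the majority label of its k closest labeled examples. The data, the labels "A" and "B" and the choice k=3 are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(zip(train_points, train_labels),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _point, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Invented labeled data: (feature_1, feature_2) with a known answer for each point.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

first = knn_predict(train_points, train_labels, (1.1, 1.0))
second = knn_predict(train_points, train_labels, (5.1, 5.0))
print(first, second)
```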

Unsupervised learning

The training data given to the machine has no labels or answers.

The point of unsupervised learning is to group unlabeled data, for example through cluster analysis.
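A small sketch of clustering without labels follows. It uses k-means, one widely used clustering algorithm that the article itself does not name, on one-dimensional data. The measurements and the initial centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]  # invented, unlabeled measurements
centers, clusters = kmeans_1d(values, centers=[0.0, 10.0])
print(centers)
print(clusters)
```

No answer was ever provided for any value; the two groups emerge from the distances alone.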

Semi-supervised learning

Part of the training data given to the machine has labels or answers, and part does not.

This is quite common, for example when labels are missing for various reasons.

In this case, unsupervised learning is usually used to process the data first, and supervised learning is then used to train the model and make predictions.

Reinforcement learning

The machine takes actions based on the surrounding environment and learns from the results of those actions. In the figure, the final behavior is determined according to the environment. Common examples include Go, driverless cars and robots.
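The act-then-learn loop can be sketched with a very small example: an epsilon-greedy learner on a two-armed bandit. This is a simplified stand-in for the setting described above, not the method used for Go or driving. The reward values, the function name run_bandit and all parameters are assumptions made for the illustration.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.2, seed=0):
    """Epsilon-greedy learner on a bandit whose action i always pays rewards[i]."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current value estimate for each action
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore: try a random action
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]  # the environment responds to the action
        counts[action] += 1
        # Learn from the result: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit(rewards=[0.0, 1.0])
print(estimates, counts)
```

Once the learner has tried the better action, its estimate for that action rises and the greedy choice switches to it.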

Supervised learning and semi-supervised learning are the basis of reinforcement learning

Other categories of machine learning

Online learning and batch learning (offline learning)
Parametric learning and nonparametric learning

Batch learning

The batch learning process is shown below.

Advantage: simple.

Question: how can it adapt to a changing environment? Solution: re-learn in batches at regular intervals.

Disadvantages: every round of batch re-learning requires a huge amount of computation, and in some cases the environment changes so quickly that even regular re-learning is impossible.
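The cost of regular re-learning can be illustrated with a toy "model" that simply predicts the mean of everything seen so far. Each period it is retrained from scratch on the full history, so the number of samples processed keeps growing. The data and the helper fit_mean are invented for this sketch.

```python
def fit_mean(history):
    """Train a trivial model from scratch: predict the mean of all data seen so far."""
    return sum(history) / len(history)

# Invented observations arriving in three periods.
periods = [[10.0, 12.0], [11.0, 13.0], [30.0, 32.0]]

history = []
samples_processed = 0
models = []
for new_batch in periods:
    history.extend(new_batch)          # keep every sample ever collected
    models.append(fit_mean(history))   # re-learn on the full history
    samples_processed += len(history)  # cost grows with the size of the history

print(models)
print(samples_processed)
```

Only 6 samples arrived in total, yet 12 were processed, and the gap widens with every additional period.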

Online learning

The online learning process is shown below.

Advantage: responds promptly to changes in the environment.

Question: can new data bring bad changes? Solution: strengthen the monitoring of the data.

Other: it is also suitable when the amount of data is so large that batch learning is not feasible at all.
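A minimal sketch of an online update follows: the model is adjusted as each sample arrives and no history is stored. The "model" is again just a running mean, and the class name OnlineMean and the data are hypothetical.

```python
class OnlineMean:
    """A trivial model that is updated one sample at a time and stores no history."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, sample):
        # Incremental mean: nudge the current estimate toward the new sample.
        self.count += 1
        self.value += (sample - self.value) / self.count
        return self.value

model = OnlineMean()
stream = [10.0, 12.0, 11.0, 13.0]  # invented samples arriving one by one
for sample in stream:
    model.update(sample)
print(model.value)
```

Because each update touches only the new sample, a single bad sample also shifts the model immediately, which is why the data needs monitoring.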

Parametric learning

Once the parameters have been learned, the original data set is no longer needed.

Nonparametric learning

Don’t make too many assumptions about the model
Nonparametric does not mean that there are no parameters
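The contrast can be sketched as follows. The parametric model assumes the form y = w * x and keeps only the number w after training. The nonparametric one, a nearest-neighbor predictor, assumes no fixed form and must keep the training data to make a prediction. Both functions and the data are illustrative assumptions.

```python
def fit_slope(xs, ys):
    """Parametric: assume y = w * x and learn the single parameter w by least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def nearest_neighbor_predict(xs, ys, query):
    """Nonparametric: no fixed functional form, so the training data must be kept."""
    nearest_index = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[nearest_index]

xs = [1.0, 2.0, 3.0, 4.0]  # invented training data
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)  # after this, xs and ys could be discarded
parametric_prediction = w * 2.6  # uses only w
nonparametric_prediction = nearest_neighbor_predict(xs, ys, 2.6)  # still needs xs and ys
print(parametric_prediction, nonparametric_prediction)
```

The nearest-neighbor predictor still has a setting, namely how many neighbors to consult (one here), which is the sense in which nonparametric does not mean parameter-free.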

Machine learning thinking

Machine learning focuses on solving uncertain problems. Classical algorithms usually have standard, deterministic and unique answers, whereas machine learning gives answers in a probabilistic, statistical sense, which carry uncertainty. Faced with such an answer, we often ask how reliable it is, how much we can trust it, and what the machine has actually learned.

In fact, when the data is plentiful and of good enough quality, there is even the idea that the data is the algorithm.

The data itself is important
Being data-driven: data cleaning, data processing, feature engineering, etc.

Of course, there are other views, such as the idea that the algorithm is king.

How to choose a machine learning algorithm?

As mentioned above, machine learning mainly solves two kinds of problems: classification and regression. When choosing an algorithm, we can apply Occam's razor: simple is good.

There is no such thing as a free lunch

It can be rigorously proved that, averaged over all possible problems, the expected performance of any two algorithms is the same.
For a particular problem, however, one algorithm may be better than another.
No algorithm is absolutely better than every other algorithm.
It does not make sense to discuss which algorithm is good without reference to a specific problem.
For a specific problem, it is necessary to try several algorithms and compare them experimentally.
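A comparative experiment can be as simple as scoring every candidate on the same held-out data and keeping the best one for this particular problem. In the sketch below the two candidate models, the data and the helper accuracy are all hypothetical.

```python
def accuracy(predict, samples, labels):
    """Fraction of held-out samples that a model labels correctly."""
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(samples)

# Two hypothetical candidate models for the same binary problem.
def always_zero(x):
    return 0

def threshold_rule(x):
    return 1 if x > 5.0 else 0

# Invented held-out data, not used to design either model.
test_samples = [1.0, 2.0, 6.0, 7.0, 8.0, 4.0]
test_labels = [0, 0, 1, 1, 1, 0]

candidates = {"always_zero": always_zero, "threshold_rule": threshold_rule}
scores = {name: accuracy(model, test_samples, test_labels)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The ranking holds only for this data set; on a different problem the order could be reversed.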

That concludes this general overview of machine learning. I hope it has given you a clearer understanding of the field. Later articles will continue with specific machine learning algorithms.