Instance or sample

Each record in a data set that describes an event or object.

Attribute or feature

Something that reflects the nature or behavior of an event or object in a particular aspect, such as the color or sound of a watermelon.

Attribute value or feature value

The value of an attribute or feature, such as “green” for the color of a watermelon.

Attribute space, sample space, or input space

For example, if the three attributes “color”, “root”, and “knock” are used as coordinate axes, they span a three-dimensional space for describing watermelons, in which every watermelon can find its own coordinate position.
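
As a toy illustration (the numeric encodings below are assumptions, not from the text), a watermelon can be placed at a coordinate position in this three-dimensional attribute space:

```python
# Illustrative numeric encodings for the three attributes (assumed values).
color = {"green": 0, "black": 1, "white": 2}
root  = {"curled": 0, "slightly curled": 1, "stiff": 2}
knock = {"dull": 0, "muffled": 1, "crisp": 2}

# One watermelon = one point (a coordinate position) in the attribute space.
melon = (color["green"], root["curled"], knock["dull"])
print(melon)  # (0, 0, 0)
```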

Feature vector

Each sample point in the space spanned by the attributes corresponds to a coordinate vector, so an instance can also be called a feature vector.

Dimensionality

For a data set $D$ with $m$ instances, each instance $\pmb{x}_i=(x_{i1},\ldots,x_{id})$ is a vector in the $d$-dimensional sample space $\mathcal{X}$, i.e. $\pmb{x}_i \in \mathcal{X}$; $d$ is called the dimensionality of the sample $\pmb{x}_i$.
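
A minimal NumPy sketch of this definition (the numbers are made up): the data set is an $m \times d$ array, and each row is one instance, i.e. one $d$-dimensional feature vector:

```python
import numpy as np

# Data set D with m = 4 instances, each a vector in a d = 3 dimensional
# sample space (values are illustrative).
D = np.array([
    [0, 0, 0],   # x_1
    [1, 0, 1],   # x_2
    [0, 1, 0],   # x_3
    [2, 2, 2],   # x_4
])
m, d = D.shape    # m instances, dimensionality d
x_1 = D[0]        # a single instance: a d-dimensional feature vector
print(m, d, x_1)  # 4 3 [0 0 0]
```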

Hypothesis

The learned model corresponds to some underlying pattern of the data, so it is also called a hypothesis.

Ground truth or truth

The underlying pattern itself is the ground truth; the process of learning is to find or approximate it.

Label

Information about the outcome of an instance is called a label; for example, the label of a watermelon can be “good melon” or “bad melon”.

Label space or output space

If $(\pmb{x}_i,y_i)$, with $y_i \in \mathcal{Y}$, denotes the $i$-th labeled sample, then $\mathcal{Y}$, the set of all labels, is called the label space or output space.

Regression and classification

Classification predicts discrete values (e.g., good melon or bad melon), while regression predicts continuous values (e.g., temperature, humidity). Given a training set $\{(\pmb{x}_1,y_1),\ldots,(\pmb{x}_m,y_m)\}$, the goal of learning is to find a mapping $f:\mathcal{X} \mapsto \mathcal{Y}$. For binary classification tasks, $\mathcal{Y}=\{-1,+1\}$ or $\{0,1\}$; for multi-class tasks, $|\mathcal{Y}|>2$; for regression tasks, $\mathcal{Y}=\mathbb{R}$, where $\mathbb{R}$ denotes the set of real numbers.
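
The three kinds of label space can be pictured with plain Python lists (all label values below are illustrative):

```python
# Binary classification: Y = {-1, +1} (or {0, 1}).
y_binary = [-1, +1, +1, -1]

# Multi-class classification: |Y| > 2.
y_multiclass = [0, 2, 1, 3]

# Regression: Y = R, i.e. real-valued targets such as temperatures.
y_regression = [23.5, 19.2, 30.1, 25.0]
```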

Testing

The mapping $f$ is learned through training; for a test sample $\pmb{x}$, the predicted label $y=f(\pmb{x})$ is then obtained.
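
A minimal sketch of training and testing, assuming scikit-learn and a decision tree as the learner (the text does not prescribe any particular algorithm; the data are made up):

```python
from sklearn.tree import DecisionTreeClassifier

# Training set: instances x_i with labels y_i (illustrative values).
X_train = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [2, 2, 2]]
y_train = [1, 1, 0, 0]

# Training learns the mapping f; testing applies y = f(x) to a new sample.
f = DecisionTreeClassifier().fit(X_train, y_train)
x_test = [[0, 0, 1]]
print(f.predict(x_test))  # the predicted label y = f(x)
```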

Generalization ability

The ability of a model learned from the training samples to apply to other samples in the sample space is called generalization ability.

Independent and identically distributed (i.i.d.)

Assuming that all samples in the sample space obey an unknown distribution $\mathcal{D}$ and that each sample is drawn independently from that distribution, the samples are said to be independent and identically distributed.
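
A minimal sketch with NumPy: here the distribution is assumed Gaussian purely for illustration (in the i.i.d. assumption, $\mathcal{D}$ is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each call draws samples independently from the same distribution D,
# so the samples are i.i.d.
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.mean(), samples.std())  # close to the true mean 0 and std 1
```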

Inductive learning

In the broad sense, it refers to learning from samples, and in the narrow sense, it refers to learning concepts from training data, also known as concept learning.

Inductive bias

A machine learning algorithm’s preference for certain types of hypotheses during learning. For example, given a training set of several sample points, learning a model consistent with the training set amounts to finding a curve that passes through all of the sample points and a function describing that curve. Since infinitely many curves can pass through the same few points, the learning algorithm must hold some “preference” in order to select the most “correct” curve (see the sketch after the Occam’s razor entry below).

Occam’s razor

A common principle for setting such a preference is: “if multiple hypotheses are consistent with the observations, choose the simplest one.” In regression problems, a “smoother” curve is generally assumed to be simpler (because it is easier to express as a function).
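
A minimal sketch of this preference (the data are made up): a degree-1 and a degree-4 polynomial both fit five roughly linear points, but the degree-4 curve wiggles through every point exactly, while Occam’s razor prefers the smoother line:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])  # roughly linear with noise

# Smooth (simple) hypothesis: a straight line.
line = np.polyfit(x, y, deg=1)

# Wiggly (complex) hypothesis: degree 4 interpolates all five points exactly.
wiggly = np.polyfit(x, y, deg=4)

print(np.poly1d(line))
print(np.poly1d(wiggly))
```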