Definition 2.4 Agnostic PAC learning

Let $H$ be a hypothesis set. $A$ is an agnostic PAC-learning algorithm if there exists a polynomial function $\mathrm{poly}(\cdot,\cdot,\cdot,\cdot)$ such that for any $\epsilon > 0$ and $\delta > 0$, and for all distributions $D$ over $X \times Y$, the following holds for any sample size $m \ge \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$:


$$\underset{S \sim D^m}{\Pr}\left[R(h_S) - \underset{h \in H}{\min} R(h) \le \epsilon\right] \ge 1 - \delta. \qquad (2.21)$$

If $A$ further runs in time $\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$, then it is said to be an efficient agnostic PAC-learning algorithm.
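To make the guarantee concrete, here is a minimal simulation sketch (not from the text): $X = [0,1]$ with a finite set of threshold classifiers as $H$, labels generated by a threshold concept and flipped with probability $0.2$, and empirical risk minimization playing the role of $A$. The thresholds, flip rate, and closed-form risk are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: X = [0, 1] uniform, H = threshold classifiers
# h_t(x) = 1[x >= t]; labels of the concept 1[x >= 0.5] are flipped w.p. 0.2.
thresholds = np.linspace(0, 1, 21)   # finite hypothesis set H
true_t, flip = 0.5, 0.2

def true_risk(t):
    # Closed form for this distribution: R(h_t) = flip + (1 - 2*flip)*|t - true_t|.
    return flip + (1 - 2 * flip) * abs(t - true_t)

best_risk = min(true_risk(t) for t in thresholds)   # min_{h in H} R(h)

def erm(m):
    # Draw a sample S ~ D^m and return the empirical risk minimizer h_S.
    x = rng.uniform(0, 1, m)
    y = (x >= true_t).astype(int) ^ (rng.random(m) < flip)
    emp_risks = [np.mean((x >= t).astype(int) != y) for t in thresholds]
    return thresholds[int(np.argmin(emp_risks))]

eps, trials = 0.05, 200
for m in (50, 200, 1000):
    hits = sum(true_risk(erm(m)) - best_risk <= eps for _ in range(trials))
    print(f"m = {m:4d}: Pr[R(h_S) - min_h R(h) <= eps] ~ {hits / trials:.2f}")
```

As $m$ grows, the estimated probability approaches $1$ for the fixed $\epsilon$, which is exactly the behavior that (2.21) asserts.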

A scenario is said to be deterministic when the label of a point can be uniquely determined by some measurable function $f: X \to Y$ (with probability 1). In this case, it suffices to consider a distribution $D$ over the input space. The training sample is obtained by drawing $(x_1, \ldots, x_m)$ according to $D$ and labeling each point with $f$: $y_i = f(x_i)$ for all $i \in [m]$. Many learning problems can be formulated within this deterministic scenario, as illustrated below. In the previous sections, as in most of the material presented in this book, we have restricted the presentation to the deterministic scenario for the sake of simplicity. However, for all of this material, the extension to the stochastic scenario should be straightforward for the reader.
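As a tiny illustration of the deterministic scenario (the distribution and target function below are arbitrary choices, not from the text), the only randomness lies in the draw of the points; the labels are fixed by $f$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical choices: D is the standard normal on X = R, and the
# target function f labels a point by its sign.
f = lambda x: int(x >= 0)

m = 5
xs = rng.normal(size=m)    # draw (x_1, ..., x_m) i.i.d. according to D
ys = [f(x) for x in xs]    # y_i = f(x_i): the labels carry no randomness
print(list(zip(np.round(xs, 2), ys)))
```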

2.4.2 Bayes error and noise

In the deterministic case, there exists by definition a target function $f$ with no generalization error: $R(f) = 0$. In the stochastic case, any hypothesis has a minimal non-zero error.

Definition 2.5 Bayes error

Given a distribution $D$ over $X \times Y$, the Bayes error $R^*$ is defined as the infimum of the errors achieved by measurable functions $h: X \to Y$:


$$R^* = \underset{h\ \text{measurable}}{\inf} R(h). \qquad (2.22)$$

By definition, in the deterministic case we have $R^* = 0$, while in the stochastic case $R^* \neq 0$. A hypothesis $h$ with $R(h) = R^*$ is called a Bayes classifier $h_{Bayes}$; it can be defined in terms of the conditional probabilities as:


$$\forall x \in X,\quad h_{Bayes}(x) = \underset{y \in \{0,1\}}{\operatorname{argmax}} \Pr[y \mid x]. \qquad (2.23)$$

The average error made by $h_{Bayes}$ on $x \in X$ is $\min\{\Pr[0 \mid x], \Pr[1 \mid x]\}$, which is the minimum possible error. This leads to the following definition of noise.
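A short sketch of the Bayes classifier for a hypothetical conditional distribution $\eta(x) = \Pr[1 \mid x]$ (the specific $\eta$ below is an assumption made purely for illustration):

```python
import numpy as np

# Hypothetical conditional probability on X = [0, 1]: eta(x) = Pr[1 | x].
eta = lambda x: np.clip(0.5 + 0.8 * (x - 0.5), 0.0, 1.0)

def h_bayes(x):
    # Eq. (2.23): predict the label y in {0, 1} maximizing Pr[y | x].
    return int(eta(x) >= 0.5)

def conditional_error(x):
    # Error of h_Bayes at x: min{Pr[0 | x], Pr[1 | x]}.
    return min(eta(x), 1.0 - eta(x))

for x in (0.1, 0.5, 0.9):
    print(f"x = {x}: h_Bayes(x) = {h_bayes(x)}, error = {conditional_error(x):.2f}")
```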

Definition 2.6 Noise

Given a distribution $D$ over $X \times Y$, the noise at point $x \in X$ is defined as


$$\mathrm{noise}(x) = \min\{\Pr[1 \mid x], \Pr[0 \mid x]\}. \qquad (2.24)$$

The average noise, or the noise associated with $D$, is $E[\mathrm{noise}(x)]$. The average noise is thus precisely the Bayes error: $\mathrm{noise} = E[\mathrm{noise}(x)] = R^*$. Noise is a characteristic of the learning task indicative of its difficulty. A point $x \in X$ with $\mathrm{noise}(x)$ close to $1/2$ is sometimes said to be noisy, and such points are of course challenging for accurate prediction.
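Continuing with the same hypothetical $\eta$, one can check numerically that the average noise coincides with the Bayes error; taking the marginal of $D$ on $X$ to be uniform over $[0, 1]$ is again an assumption for illustration:

```python
import numpy as np

# Same hypothetical eta; marginal of D on X taken uniform over [0, 1].
eta = lambda x: np.clip(0.5 + 0.8 * (x - 0.5), 0.0, 1.0)
noise = lambda x: np.minimum(eta(x), 1.0 - eta(x))   # noise(x), eq. (2.24)

xs = np.linspace(0.0, 1.0, 100_001)
avg_noise = noise(xs).mean()   # E[noise(x)] approximated on a fine uniform grid
print(f"average noise = Bayes error R* ~ {avg_noise:.4f}")   # ~ 0.3 here
```

In this example no classifier, however complex, can achieve error below $R^* \approx 0.3$: the noise of the task directly bounds the achievable accuracy.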