This is the fourth day of my November challenge


Introduction to kernel methods

You can begin to understand kernel methods with the concept of functional bases.

The kernel method is widely used in data analysis. Its idea is to map a vector in the space $\mathcal{R}^n$ to a vector in a feature space. As shown in the figure below, red and blue dots that are difficult to separate in $\mathcal{R}^n$ may become easier to separate once they are mapped to a higher-dimensional feature space.
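As a minimal numeric sketch of this idea (the points and the feature map $\phi(x)=(x,x^2)$ are my own toy choices, not taken from the post): points that cannot be separated by a threshold in one dimension become linearly separable after mapping to two dimensions.

```python
# Toy illustration: two "red" points surround a "blue" point on the line,
# so no single threshold on x separates the classes in 1-D.
def phi(x):
    """Map a point from R^1 into a 2-D feature space."""
    return (x, x * x)

red = [-1.0, 1.0]
blue = [0.0]

# In the feature space the second coordinate x^2 separates the classes:
red_mapped = [phi(x) for x in red]    # second coordinate is 1.0
blue_mapped = [phi(x) for x in blue]  # second coordinate is 0.0

separable = all(p[1] > 0.5 for p in red_mapped) and all(p[1] < 0.5 for p in blue_mapped)
print(separable)  # True
```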

Eigendecomposition

For a symmetric matrix $\mathbf{A}$ ($\mathbf{A}^T=\mathbf{A}$), there exist a real number $\lambda$ and a vector $\mathbf{x}$ such that:


$$\mathbf{A}\mathbf{x}=\lambda\mathbf{x}$$

$\lambda$ is an eigenvalue of $\mathbf{A}$, and $\mathbf{x}$ is the corresponding eigenvector. If $\mathbf{A}$ has two eigenvalues $\lambda_1,\lambda_2$ with eigenvectors $\mathbf{x}_1,\mathbf{x}_2$, it follows that:


$$\lambda_1\mathbf{x}_1^T\mathbf{x}_2=\mathbf{x}_1^T\mathbf{A}^T\mathbf{x}_2=\mathbf{x}_1^T\mathbf{A}\mathbf{x}_2=\lambda_2\mathbf{x}_1^T\mathbf{x}_2$$

If $\lambda_1 \neq \lambda_2$, then $\mathbf{x}_1^T\mathbf{x}_2=0$; therefore $\mathbf{x}_1$ and $\mathbf{x}_2$ are orthogonal.

For $\mathbf{A} \in \mathcal{R}^{n \times n}$, $n$ eigenvalues and their corresponding eigenvectors can be found. Thus, $\mathbf{A}$ can be expressed as:


$$\mathbf{A}=\mathbf{Q}\mathbf{D}\mathbf{Q}^T$$

Here $\mathbf{Q}=(\mathbf{q}_1,\ldots,\mathbf{q}_n)$ is an orthogonal matrix (i.e. $\mathbf{Q}\mathbf{Q}^T=\mathbf{E}$), and $\mathbf{D}=diag(\lambda_1,\ldots,\lambda_n)$ is a diagonal matrix. The formula above can be expanded as:

$$\mathbf{A}=\sum^n_{i=1}\lambda_i\mathbf{q}_i\mathbf{q}_i^T$$

$\{\mathbf{q}_i\}^n_{i=1}$ is a set of orthogonal bases of the space $\mathcal{R}^n$.
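This decomposition is easy to verify numerically; a small sketch with NumPy (the example matrix is an arbitrary choice of mine):

```python
import numpy as np

# A small symmetric matrix; np.linalg.eigh is designed for symmetric input.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)   # columns of Q are the eigenvectors q_i
D = np.diag(eigvals)

# Q is orthogonal: Q Q^T = E
assert np.allclose(Q @ Q.T, np.eye(2))

# A = Q D Q^T
assert np.allclose(Q @ D @ Q.T, A)

# Equivalently, A = sum_i lambda_i q_i q_i^T
A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(eigvals, Q.T))
assert np.allclose(A_rebuilt, A)
```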

Kernel function

A function $f(x)$ can be thought of as an infinite vector, and a function $K(x,y)$ with two independent variables can be thought of as an infinite matrix. If $K(x,y)=K(y,x)$, and:


$$\int \int f(x)K(x,y)f(y)dxdy \geq 0$$

holds for any function $f$, then $K(x,y)$ is symmetric and positive definite, and $K(x,y)$ is a kernel function.

As an analogy: let $A\in \mathcal{R}^{n \times n}$; if $A=A^T$ and $\mathbf{x}^TA\mathbf{x}>0$ for any $0 \neq \mathbf{x} \in \mathcal{R}^n$, then $A$ is called a symmetric positive definite matrix.
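A quick numerical check of this matrix-side definition (the matrix is an arbitrary example of mine; positive definiteness is tested via the eigenvalues, which is equivalent to the quadratic-form condition):

```python
import numpy as np

# A candidate matrix: check symmetry, then check that all eigenvalues
# are strictly positive (equivalent to x^T A x > 0 for every nonzero x).
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

is_symmetric = np.allclose(A, A.T)
is_positive_definite = is_symmetric and bool(np.all(np.linalg.eigvalsh(A) > 0))
print(is_positive_definite)  # True (eigenvalues are 1 and 3)
```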

The eigenvalue $\lambda$ and the eigenfunction $\psi(x)$ of a kernel are such that:


$$\int K(x,y)\psi(x)dx=\lambda \psi(y)$$

For two different eigenvalues $\lambda_1$ and $\lambda_2$ with corresponding eigenfunctions $\psi_1(x)$ and $\psi_2(x)$, it is easy to obtain:

$$\lambda_1\int\psi_1(x)\psi_2(x)dx=\int\int K(x,y)\psi_1(x)\psi_2(y)dxdy=\lambda_2\int\psi_1(x)\psi_2(x)dx$$

Therefore, it can be concluded that:


$$\langle\psi_1,\psi_2\rangle=\int \psi_1(x) \psi_2(x)dx = 0$$

That is, the eigenfunctions are orthogonal. Here $\psi$ denotes the function (an infinite vector) itself.

For a kernel function, if there are infinitely many eigenvalues $\{\lambda_i\}^\infty_{i=1}$ and infinitely many eigenfunctions $\{\psi_i\}^\infty_{i=1}$, then, as in the matrix case:


$$K(x,y)=\sum^\infty_{i=1}\lambda_i\psi_i(x)\psi_i(y)$$

This is also known as Mercer's theorem: any positive semidefinite symmetric function is a kernel function. Here $\{\psi_i\}^\infty_{i=1}$ constitutes a set of orthogonal bases of a function space.
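A finite-sample analogue of Mercer's condition can be checked numerically: the Gram matrix $K_{ij}=K(x_i,x_j)$ built from any set of points with a valid kernel is symmetric positive semidefinite. A sketch using the Gaussian kernel (the sample points and $\gamma$ are arbitrary choices of mine):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Sample a few points and build the Gram matrix K_ij = K(x_i, x_j).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

# The Gram matrix is symmetric ...
assert np.allclose(K, K.T)
# ... and positive semidefinite: all eigenvalues >= 0 (up to round-off).
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10)
```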

Common kernel functions are:

  • Polynomial kernel: $K(x,y)=(\gamma x^Ty+C)^d$, where $d=1,2,\ldots,N$.
  • Gaussian radial basis function (RBF) kernel: $K(x,y)=\exp(-\gamma ||x-y||^2)$.
  • Sigmoid kernel: $K(x,y)=\tanh(\gamma x^Ty+C)$, where $\tanh$ is the hyperbolic tangent function.
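Sketch implementations of these three kernels (the parameter defaults below are placeholders, not recommended values):

```python
import numpy as np

def polynomial_kernel(x, y, gamma=1.0, C=1.0, d=2):
    """K(x, y) = (gamma * x^T y + C)^d"""
    return (gamma * np.dot(x, y) + C) ** d

def gaussian_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, gamma=1.0, C=0.0):
    """K(x, y) = tanh(gamma * x^T y + C)"""
    return np.tanh(gamma * np.dot(x, y) + C)

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(polynomial_kernel(x, y))  # (1*(-1.5) + 1)^2 = 0.25
```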

Reproducing Kernel Hilbert Space

Take $\{\sqrt{\lambda_i}\psi_i\}^\infty_{i=1}$ as a set of orthogonal bases to construct a Hilbert space $\mathcal{H}$. Any function or vector in this space can be represented as a linear combination of these bases.

Suppose:


$$f=\sum^\infty_{i=1}f_i\sqrt{\lambda_i}\psi_i$$

Then $f$ can be represented as an infinite vector in $\mathcal{H}$:


$$f=(f_1,f_2,\ldots)^T_{\mathcal{H}}$$

For another function $g=(g_1,g_2,\ldots)^T_{\mathcal{H}}$:


$$\langle f,g\rangle_{\mathcal{H}}=\sum^\infty_{i=1}f_ig_i$$

For a kernel function $K$, use $K(x,y)$ to denote the evaluation of $K$ at the point $(x,y)$, which is a scalar; use $K(\cdot,\cdot)$ to denote the function (infinite matrix) itself; and use $K(x,\cdot)$ to denote the row of the matrix indexed by $x$. Fixing one argument of the kernel at $x$, we can regard it as a function of one variable (an infinite vector), obtaining:


$$K(x,\cdot)=\sum^\infty_{i=1}\lambda_i\psi_i(x)\psi_i$$

In space H\mathcal{H}H we can define:


$$K(x,\cdot)=(\sqrt{\lambda_1}\psi_1(x),\sqrt{\lambda_2}\psi_2(x),\ldots)^T_{\mathcal{H}}$$

Therefore, it can be obtained:


$$\langle K(x,\cdot),K(y,\cdot)\rangle_{\mathcal{H}}=\sum^\infty_{i=1}\lambda_i\psi_i(x)\psi_i(y)=K(x,y)$$

This is the reproducing property: the kernel function reproduces the inner product of two functions. It allows us to compute only the kernel function instead of the inner product in the high-dimensional feature space, which greatly reduces computation. Therefore, $\mathcal{H}$ is called a reproducing kernel Hilbert space (RKHS).
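For a finite-dimensional feature map this can be verified directly. A sketch (my own worked example, not from the post) using the homogeneous degree-2 polynomial kernel $K(x,y)=(x^Ty)^2$ on $\mathcal{R}^2$, whose explicit feature map is $\Phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)^T$:

```python
import numpy as np

# The kernel computes the feature-space inner product without ever
# forming the feature vectors Phi(x) and Phi(y).
def K(x, y):
    return np.dot(x, y) ** 2

def Phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

assert np.isclose(K(x, y), np.dot(Phi(x), Phi(y)))  # both equal 1.0 here
```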

Back to the original question: how do you map points to the feature space using kernels?

Define a mapping:


$$\Phi(x)=K(x,\cdot)=(\sqrt{\lambda_1}\psi_1(x),\sqrt{\lambda_2}\psi_2(x),\ldots)^T$$

This maps the point $x$ to $\mathcal{H}$, where $\Phi$ does not denote an explicit function but a vector (or function) in the feature space $\mathcal{H}$. Then we get:


$$\langle\Phi(x),\Phi(y)\rangle_{\mathcal{H}}=\langle K(x,\cdot),K(y,\cdot)\rangle_{\mathcal{H}}=K(x,y)$$

So you don’t need to know what the mapping is, what the feature space is, or what the basis of the feature space is: for a symmetric positive definite function $K$, there must exist a mapping $\Phi$ and a feature space $\mathcal{H}$ such that:


$$\langle\Phi(x),\Phi(y)\rangle=K(x,y)$$

This is the kernel trick.

A simple case

Define the kernel function:


$$K(x,y)=(x_1,x_2,x_1x_2)(y_1,y_2,y_1y_2)^T=x_1y_1+x_2y_2+x_1x_2y_1y_2$$

Define $\mathbf{x}=(x_1,x_2)^T$, $\mathbf{y}=(y_1,y_2)^T$. Take $\lambda_1=\lambda_2=\lambda_3=1$, $\psi_1(\mathbf{x})=x_1$, $\psi_2(\mathbf{x})=x_2$, $\psi_3(\mathbf{x})=x_1x_2$. The mapping can then be defined as:

$$\Phi(\mathbf{x})=(x_1,x_2,x_1x_2)^T$$
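A quick numerical check that this kernel equals the inner product of the mapped vectors $\Phi(\mathbf{x})=(x_1,x_2,x_1x_2)^T$ (the sample points are arbitrary choices of mine):

```python
import numpy as np

# The kernel from the simple case above ...
def K(x, y):
    return x[0]*y[0] + x[1]*y[1] + x[0]*x[1]*y[0]*y[1]

# ... and the corresponding feature map Phi(x) = (x1, x2, x1*x2).
def Phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

x = np.array([2.0, 3.0])
y = np.array([-1.0, 0.5])

assert np.isclose(K(x, y), np.dot(Phi(x), Phi(y)))  # both equal -3.5 here
```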