Preface

What is principal component analysis? Consider the points of an ellipse: if you had to choose a single line such that projecting the points onto it keeps them as spread out as possible, retaining the most information, which line would you pick? For the ellipse in the figure below, it is the horizontal line through the long axis; that line represents the two-dimensional data as faithfully as possible in one dimension. What about higher-dimensional data, then? Can it likewise be represented as faithfully as possible in fewer dimensions?

For example: how would you best represent an ellipsoid using a two-dimensional plane?

Thoughts

Principal component analysis is a statistical method for simplifying data. It is a linear transformation that maps the data into a new coordinate system such that the direction of greatest variance in the data lies on the first coordinate (the first principal component), the direction of second-greatest variance on the second coordinate, and so on. If we discard the higher-order components, the features that contribute most to the variance are retained, so the main characteristics of the data are preserved. In addition, to make the data easier to work with, we move the origin of the coordinate system to the center (the mean) of the data, which makes the subsequent processing more convenient.
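To make this concrete, here is a minimal NumPy sketch (not from the original article; the toy data and variable names are illustrative) that centers a 2-D dataset and projects it onto its first principal component, i.e. the direction along which the projected points are most spread out:

```python
import numpy as np

# Hypothetical 2-D data roughly shaped like an ellipse,
# stretched along the x-axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])

# Move the origin to the center of the data, as described above.
X_centered = X - X.mean(axis=0)

# The principal components are the right singular vectors of the
# centered data; the first one is the direction of greatest variance.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
first_pc = Vt[0]

# Projecting onto the first principal component gives the
# highest-variance 1-D view of the 2-D data.
projected = X_centered @ first_pc
print(first_pc, projected.var())
```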

In math

Mathematically, we minimize the square of the $L^2$ norm (the square of the $L^2$ norm attains its minimum at the same point as the norm itself, since squaring is monotonically increasing on nonnegative values, and it has nicer analytic properties). Here $x$ is the input and $c^*$ is the optimal code:


$$
\begin{aligned}
c^* &= \arg\min_c \lVert x - g(c) \rVert_2^2 \\
    &= \arg\min_c \,(x - g(c))^T (x - g(c)) \\
    &= \arg\min_c \, x^T x - 2x^T g(c) + g(c)^T g(c) \\
    &= \arg\min_c \, -2x^T D c + c^T I_l c \qquad (c = f(x),\; g(c) = Dc,\; D^T D = I_l)
\end{aligned}
$$

In the last step, the $x^T x$ term is dropped because it does not depend on $c$, and $g(c)^T g(c) = c^T D^T D c = c^T I_l c$. Setting the gradient with respect to $c$ to zero gives

$$
\nabla_c \left( -2x^T D c + c^T c \right) = 0 \quad\Longrightarrow\quad c = f(x) = D^T x
$$
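As a sanity check on this derivation, the short sketch below (an illustration I am adding, not part of the original article) builds a random decoder $D$ with orthonormal columns via a QR decomposition and verifies numerically that the code $c = D^T x$ reconstructs $x$ at least as well as nearby codes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, l = 5, 2

# Random decoder D with orthonormal columns (D^T D = I_l),
# obtained from a QR decomposition.
D, _ = np.linalg.qr(rng.normal(size=(n, l)))

x = rng.normal(size=n)

# Optimal code from the derivation: c = f(x) = D^T x.
c = D.T @ x

# Squared reconstruction error ||x - Dc||^2 for a given code.
def loss(code):
    return np.sum((x - D @ code) ** 2)

# No randomly perturbed code should do better than c.
for _ in range(1000):
    assert loss(c) <= loss(c + 0.01 * rng.normal(size=l)) + 1e-12
print("c = D^T x minimizes the reconstruction error")
```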

From the above we see that encoding $x$ requires nothing more than a matrix multiplication. Now define the reconstruction operation:


$$
r(x) = g(f(x)) = DD^T x
$$

$$
D^* = \arg\min_D \sqrt{\sum_{i,j} \left( x_j^{(i)} - r(x^{(i)})_j \right)^2} \qquad \text{subject to } D^T D = I_l
$$

After a fairly involved derivation, it can be proved by mathematical induction that the optimal matrix $D$ consists of the eigenvectors of $X^T X$ corresponding to its $l$ largest eigenvalues.
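The sketch below (again an illustration under my own assumptions, with made-up data) computes $D$ from the top-$l$ eigenvectors of $X^T X$, forms the reconstruction $r(x) = DD^T x$, and cross-checks the result against the SVD, whose top-$l$ right singular vectors span the same subspace:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, l = 100, 5, 2

# Centered data matrix X (rows are examples).
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))
X = X - X.mean(axis=0)

# Eigendecomposition of X^T X; eigh returns eigenvalues in
# ascending order, so the last l columns are the l largest.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
D = eigvecs[:, -l:]

# Reconstruction r(x) = D D^T x applied to every example (row).
X_rec = X @ D @ D.T
err = np.sum((X - X_rec) ** 2)

# Cross-check with the SVD: the top-l right singular vectors span
# the same subspace, so the reconstruction error is identical.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_rec_svd = X @ Vt[:l].T @ Vt[:l]
assert np.allclose(err, np.sum((X - X_rec_svd) ** 2))
print("reconstruction error:", err)
```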

Conclusion

Principal component analysis (PCA) is mainly used for dimensionality reduction: it aims to shrink the data as much as possible while losing as little of the original information as possible.

  • This article was originally posted by RAIS