By walking through the principle of the PCA algorithm, we can understand how dimensionality reduction algorithms work in general and what they can be used for. Applying a dimensionality reduction algorithm as a preprocessing step before a classification algorithm then shows what the algorithm does in practice.

0 Related source code

1 PCA algorithm and principle overview

1.1 What is Dimensionality reduction?

Dimensionality reduction is the process of mapping data from a higher-dimensional space to a lower-dimensional one.

◆ Taking a photograph, for example, maps a person or an object in three-dimensional space onto a two-dimensional picture.

There are both linear and nonlinear methods for dimensionality reduction. In machine learning, dimensionality reduction simplifies computation by reducing the number of features.
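As a minimal sketch of the photograph analogy, the snippet below reduces 3-D points to 2-D by simply dropping one coordinate. The point values and the function name `project_to_xy` are made up for illustration, and discarding a fixed axis is only the simplest possible linear reduction, not what PCA itself does.

```python
# Hypothetical 3-D points (x, y, z); the values are made up for illustration.
points_3d = [(1.0, 2.0, 5.0), (3.0, 4.0, 6.0), (0.5, -1.0, 2.0)]


def project_to_xy(point):
    """Drop the z coordinate, keeping only x and y (like a flat picture)."""
    x, y, _z = point
    return (x, y)


# Each 3-feature sample becomes a 2-feature sample.
points_2d = [project_to_xy(p) for p in points_3d]
print(points_2d)
```

Each sample keeps two of its three features, so the depth information is gone and cannot be recovered from the result.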

1.2 PCA algorithm introduction

PCA (principal component analysis) is a common linear dimensionality reduction algorithm. It works much like a projection.

Dimensionality reduction simplifies the data set, so it can be regarded as a compression process, and some information may be lost during compression.

PCA can be used to simplify features, and it can also be used in image processing. The eigenface method, for example, is based on PCA and can be used for face recognition.
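To make the "projection as lossy compression" idea concrete, here is a small sketch that stores each 2-D point as a single number (its coordinate along one direction) and then measures how much is lost when the point is restored. The data and the hand-picked direction `u` are assumptions for illustration; PCA would choose the direction from the data rather than by hand.

```python
import math

# Hypothetical 2-D points lying close to the diagonal; values are made up.
points = [(2.0, 1.9), (-1.0, -1.1), (0.5, 0.6), (-1.5, -1.4)]

# Unit vector along the 45-degree diagonal, chosen by hand for this sketch.
u = (1 / math.sqrt(2), 1 / math.sqrt(2))


def compress(p):
    """Keep only the coordinate of p along u (2 numbers -> 1 number)."""
    return p[0] * u[0] + p[1] * u[1]


def decompress(t):
    """Rebuild a 2-D point from its coordinate along u."""
    return (t * u[0], t * u[1])


codes = [compress(p) for p in points]
restored = [decompress(t) for t in codes]

# Squared reconstruction error: the information lost by the projection.
loss = sum((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2
           for p, r in zip(points, restored))
# Total squared length of the original points, for comparison.
total = sum(p[0] ** 2 + p[1] ** 2 for p in points)
print(loss, total)
```

Because the points lie close to the chosen direction, the loss is small compared with the total. A poorly chosen direction would lose far more, which is why the choice of projection matters.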

1.3 Introduction to PCA algorithm principle

◆ PCA is an algorithm based on the Karhunen-Loève (K-L) transform

◆ Implementing PCA relies on the covariance matrix and on matrix eigendecomposition

◆ The core of the algorithm is to compute the covariance matrix and then find its eigenvalues and eigenvectors
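The two building blocks named above can be sketched for the smallest interesting case, two features. The data values and helper names are made up for illustration. The sketch assumes the covariance is normalized by 1/m (some texts use 1/(m-1) instead), and it uses the closed-form eigendecomposition that exists only for symmetric 2x2 matrices.

```python
import math

# Two features (rows) and five samples (columns); the values are made up.
X = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]


def center(row):
    """Subtract the row mean so the feature has zero mean."""
    mean = sum(row) / len(row)
    return [v - mean for v in row]


def covariance_matrix(rows):
    """Covariance of the features, using the 1/m convention (an assumption)."""
    centered = [center(r) for r in rows]
    m = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in centered]
            for ri in centered]


def eig_symmetric_2x2(c):
    """Eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (mid + radius, mid - radius):
        # (b, lam - a) solves (C - lam*I) v = 0 when b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


C = covariance_matrix(X)
eigenpairs = eig_symmetric_2x2(C)
for lam, vec in eigenpairs:
    print(lam, vec)
```

The eigenvector paired with the larger eigenvalue points along the direction in which the data varies most, and the eigenvalue itself is the variance along that direction.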

1.4 PCA algorithm steps

◆ Input: a matrix X with n rows and m columns, representing m pieces of n-dimensional data (one sample per column)

◆ Zero-center each row of X by subtracting that row's mean

◆ Find the covariance matrix C of X

◆ Find the eigenvalues and eigenvectors of the covariance matrix C

◆ Arrange the eigenvectors as rows from top to bottom in order of decreasing eigenvalue, and take the first k rows to form the matrix P

◆ The matrix product of P and X gives the m pieces of data reduced to k dimensions
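The steps above can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the Spark implementation: the function names `jacobi_eigen` and `pca` and the sample data are made up, the covariance is normalized by 1/m, and the eigendecomposition uses cyclic Jacobi rotations, which is one of several possible methods for symmetric matrices.

```python
import math


def jacobi_eigen(sym, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # a <- a J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate rotations: v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca(X, k):
    """X has n rows (features) and m columns (samples). Returns (P, Y)."""
    n, m = len(X), len(X[0])
    # Zero-center each row.
    centered = []
    for row in X:
        mean = sum(row) / m
        centered.append([value - mean for value in row])
    # Covariance matrix C = (1/m) * X * X^T, an n x n matrix.
    C = [[sum(x * y for x, y in zip(ri, rj)) / m for rj in centered]
         for ri in centered]
    # Eigenvalues and eigenvectors of C.
    values, vectors = jacobi_eigen(C)
    # Eigenvectors as rows, sorted by decreasing eigenvalue; keep the first k.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    P = [[vectors[r][i] for r in range(n)] for i in order[:k]]
    # Y = P * X (using the centered X), a k x m matrix.
    Y = [[sum(P[i][r] * centered[r][j] for r in range(n)) for j in range(m)]
         for i in range(k)]
    return P, Y


# Hypothetical data: 3 features (rows), 5 samples (columns).
X = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [0.5, 0.1, 0.4, 0.2, 0.3],
]
P, Y = pca(X, 2)
for row in Y:
    print(row)
```

Here `Y` has 2 rows and 5 columns: the same five samples, each now described by two values instead of three. The rows of `P` are the directions onto which the data is projected.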

2 Dimensionality reduction with PCA in practice

  • code
  • The feature columns are reduced to 3 dimensions

Spark machine learning practice series

  • Spark-based machine learning practice (I) – Introduction to machine learning
  • Spark-based machine learning practice (II) – Introduction to MLlib
  • Spark-based machine learning practice (III) – Actual environment construction
  • Spark-based machine learning practice (IV) – Data visualization
  • Spark-based machine learning practice (VI) – Basic statistics module
  • Spark-based machine learning practice (VII) – Regression algorithm
  • Spark-based machine learning practice (VIII) – Classification algorithm
  • Spark-based machine learning practice (IX) – Clustering algorithm
  • Spark-based machine learning practice (X) – Dimensionality reduction algorithm
