Selecting the hyperparameter automatically with maximum likelihood estimation

What options does n_components accept besides an integer? As we mentioned before, the theoretical development of matrix decomposition stands out on its own in this field. Minka, T.P., a diligent and brilliant mathematician at the MIT Media Lab, found in his research a method that lets PCA select its own hyperparameter by maximum likelihood estimation (MLE). This method is invoked by passing the string "mle" as the value of n_components.
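
The snippets that follow assume the iris data has already been loaded as the feature matrix X, as in the earlier parts of this series. A minimal setup sketch (these variable names are assumptions, not part of the original code):

```python
# Minimal setup assumed by the snippets below: load the iris data as X.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data  # shape (150, 4): 150 samples, 4 features
```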

```python
pca_mle = PCA(n_components="mle")
pca_mle = pca_mle.fit(X)
X_mle = pca_mle.transform(X)

X_mle
# mle automatically selected 3 features for us

pca_mle.explained_variance_ratio_.sum()
# 0.9947878161267247
# More information is retained than when we kept only 2 features. For a small
# dataset like iris, 3 features already carry this much information, so there is
# no need to insist on keeping only 2; after all, 3 features can still be visualized.
```

Selecting the hyperparameter by the proportion of information retained

Enter a floating-point number in [0, 1] and set the argument svd_solver='full' to indicate that you want the total explained variance after dimensionality reduction to be greater than the proportion specified by n_components, that is, how much of the information you want to retain. For example, if we want to keep at least 97% of the information, we can set n_components = 0.97, and PCA will automatically select the number of features that lets us retain more than 97% of the information.

```python
pca_f = PCA(n_components=0.97, svd_solver="full")
pca_f = pca_f.fit(X)
X_f = pca_f.transform(X)

X_f
pca_f.explained_variance_ratio_
# array([0.92461872, 0.05306648])
```
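
As a quick check (a sketch added here, not part of the original snippet), the fitted object also reports how many components it kept and their cumulative explained variance ratio, which should exceed the 0.97 we asked for:

```python
# How many components did PCA keep, and how much information do they carry in total?
print(pca_f.n_components_)                    # 2 components kept
print(pca_f.explained_variance_ratio_.sum())  # roughly 0.9777, above the 0.97 threshold
```
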
  • Singular value decomposition can compute the new feature space and the dimension-reduced feature matrix directly, without ever forming the covariance matrix or other complicated, tedious intermediate matrices.
  • In short, the matrix-decomposition step of SVD is simpler and faster than that of PCA: although both algorithms go through the same kind of decomposition process, SVD can take a shortcut and compute V directly. Unfortunately, SVD's measure of information is rather involved, and a "singular value" is far less intuitive than a "variance". sklearn therefore splits the dimensionality-reduction process into two parts: one part computes the feature space V and is done by singular value decomposition; the other maps the data and produces the new feature matrix and is done by principal component analysis. This exploits the properties of SVD to reduce the amount of computation, while the measure of information remains the variance used by PCA.
  • In this way, sklearn implements a kind of "cooperative dimensionality reduction" that is fast and easy to compute, yet still works well. Many people therefore describe SVD as a solver for PCA; in fact, sklearn uses SVD to cut down the computation, replacing the eigenvalue decomposition that PCA itself would require. Such a method does exist, but in sklearn, although U and Σ are also computed along the way (this process is likewise mathematically far simpler than PCA's, since no covariance matrix is ever produced), they are never fully used and cannot be accessed afterwards, so we can regard U and Σ as being discarded once fit is finished. The singular value decomposition only pursues V, and once V is available, the dimension-reduced feature matrix can be computed. After transform, everything produced by the SVD during fit is discarded except V(k,n), which is stored in the attribute components_ and can be called up for inspection, as the sketch below shows.
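
A minimal sketch of this idea, assuming the iris setup above (illustrative code, not sklearn's internals): run an SVD on the centered data, keep only V, and check that it matches what PCA stores in components_ and how transform() uses it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
X_centered = X - X.mean(axis=0)

# Full SVD: U and the singular values are computed here too, but we only keep Vt.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
V_k = Vt[:2]                                   # V(k, n) with k = 2

pca = PCA(n_components=2, svd_solver="full").fit(X)

# Same V(k, n) as components_ (the sign of each component is arbitrary).
print(np.allclose(np.abs(V_k), np.abs(pca.components_)))   # True

# Once V(k, n) is known, the reduced feature matrix is just a projection.
X_reduced = X_centered @ pca.components_.T
print(np.allclose(X_reduced, pca.transform(X)))            # True
```

Once V(k,n) is stored, U and the singular values play no further role in transform, which matches the description above of why they are dropped after fit.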