I. Introduction

1 PCA

Principal Component Analysis (PCA) is a common data analysis method. It applies a linear transformation to re-express the original data along a set of linearly uncorrelated dimensions. PCA can be used to extract the main feature components of the data and is often used to reduce the dimensionality of high-dimensional data.

1.1 Dimension reduction

In data mining and machine learning, data is usually represented as vectors. For example, the traffic and transaction data of a Taobao store for the whole of 2012 can be regarded as a set of records, where each day's data is one record in the following format:

(date, number of views, number of visitors, number of orders, number of transactions, amount of transactions)

Here “date” is a record label rather than a metric value, and data mining mostly cares about metric values. If we ignore the date field, each record can be represented as a five-dimensional vector, written as a column:

$$(\text{number of views},\ \text{number of visitors},\ \text{number of orders},\ \text{number of transactions},\ \text{amount of transactions})^{\mathsf T}$$

It is customary to use column vectors to represent a record, and this convention is followed in the rest of this article.

The complexity of many machine learning algorithms is closely related to the dimension of the data, in some cases exponentially. Five dimensions may not matter here, but in practice it is not uncommon to deal with data that has thousands or even tens of thousands of dimensions. At that scale the resource consumption becomes unacceptable, so the data has to be reduced in dimension. Dimensionality reduction inevitably loses some information, but because real data is usually correlated, the loss can be kept small.

For example, in the Taobao store data above, we know from experience that “number of views” and “number of visitors” are strongly correlated, and so are “number of orders” and “number of transactions”. When the number of views is high (or low) on a particular day, we can largely assume that the number of visitors is also high (or low) on that day. Therefore, if we delete either the views field or the visitors field, we do not lose much information, and the dimension of the data is reduced. This is a dimensionality reduction operation. PCA is what this idea becomes when it is analyzed mathematically: a dimensionality reduction method with a rigorous mathematical basis that has been widely adopted.

1.2 Vectors and basis transformations

1.2.1 Inner product and projection

The inner product of two vectors of the same size is defined as follows:
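Assuming the standard definition is the one intended here, the inner product of two n-dimensional vectors, and its geometric reading as a projection, can be written as follows. The symbols A, B, and the angle \alpha between them are notation introduced for this sketch.

```latex
\[
(a_1, a_2, \ldots, a_n)^{\mathsf T} \cdot (b_1, b_2, \ldots, b_n)^{\mathsf T}
  = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n
\]
\[
A \cdot B = |A|\,|B|\cos\alpha
\quad\Longrightarrow\quad
A \cdot B = |A|\cos\alpha \quad \text{when } |B| = 1
\]
```

So when B has unit length, the inner product of A and B equals the signed length of the projection of A onto the line along B.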







1.2.2 Basis

In algebra, a vector is often represented by the coordinates of the end point of a directed line segment that starts at the origin. Suppose a vector has coordinates (3, 2). The 3 means that the projection of the vector on the x-axis is 3, and the 2 means that its projection on the y-axis is 2. This implicitly introduces a definition: the unit-length vectors in the positive directions of the x-axis and y-axis are taken as the reference. Note that a projection is a signed quantity, so it can be negative. More generally, a vector (x, y) actually represents the linear combination:

$$x\,(1,0)^{\mathsf T} + y\,(0,1)^{\mathsf T}$$



It follows that every two-dimensional vector can be represented as a linear combination of this kind. Here (1,0) and (0,1) are called a basis of the two-dimensional space.



The default basis (1,0) and (0,1) is chosen for convenience, because these are the unit vectors in the positive directions of the x-axis and y-axis, so that point coordinates and vectors on the two-dimensional plane correspond one to one. In fact, any two linearly independent two-dimensional vectors can form a basis. Intuitively, two vectors in the plane are linearly independent when they do not lie on the same straight line.
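As a minimal sketch of changing basis, the coordinates of a vector in a new orthonormal basis are its inner products with the basis vectors. The vector (3, 2) is the one used in the text; the basis (1/√2, 1/√2), (-1/√2, 1/√2) and the helper name `dot` are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
basis = [(s, s), (-s, s)]   # assumed orthonormal basis, rotated 45 degrees
vector = (3, 2)

# Each new coordinate is the projection of the vector onto a unit basis vector.
new_coords = [dot(b, vector) for b in basis]
print(new_coords)  # approximately [3.5355, -0.7071], i.e. (5/sqrt(2), -1/sqrt(2))
```

This shortcut relies on the basis vectors having unit length and being mutually orthogonal; for a general basis the coordinates would have to be found by solving a linear system instead.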



A basis does not have to be orthogonal: the only requirement is that its vectors are linearly independent, so a non-orthogonal basis is also valid. However, because orthogonal bases have convenient properties, the bases used in practice are generally orthogonal.

1.2.3 Matrix of basis transformation

The basis transformation in the above example can be represented concisely by matrix multiplication, that is:
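Assuming the example is the vector (3, 2) expressed in the orthonormal basis (1/√2, 1/√2), (-1/√2, 1/√2), with the basis vectors placed in the rows of the matrix, the transformation would read:

```latex
\[
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
=
\begin{pmatrix} 5/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}
\]
```

Each entry of the result is the inner product of one basis vector (a row of the matrix) with the original vector.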



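More generally, suppose there are M N-dimensional vectors to be transformed into a new space spanned by R N-dimensional basis vectors. First arrange the R basis vectors as the rows of a matrix A, then arrange the M vectors as the columns of a matrix B. The product AB is the result of the transformation, where the m-th column of AB is the transformed m-th column of B. Expressed as matrix multiplication:

$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_R \end{pmatrix} \begin{pmatrix} a_1 & a_2 & \cdots & a_M \end{pmatrix} = \begin{pmatrix} p_1 a_1 & p_1 a_2 & \cdots & p_1 a_M \\ p_2 a_1 & p_2 a_2 & \cdots & p_2 a_M \\ \vdots & \vdots & \ddots & \vdots \\ p_R a_1 & p_R a_2 & \cdots & p_R a_M \end{pmatrix}$$

where each p_i is a row vector (the i-th basis vector) and each a_j is a column vector (the j-th original record).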
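A minimal sketch of this rule in plain Python, assuming a hypothetical `matmul` helper and three made-up data vectors; the rows of `A` are basis vectors and the columns of `B` are the vectors to transform:

```python
import math

def matmul(A, B):
    """Multiply matrix A (R x N) by matrix B (N x M), both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

s = 1 / math.sqrt(2)
A = [[s, s],
     [-s, s]]          # R = 2 basis vectors as rows (assumed basis)
B = [[1, 2, 3],
     [1, 2, 3]]        # M = 3 vectors as columns: (1,1), (2,2), (3,3)

AB = matmul(A, B)      # column j of AB holds the new coordinates of column j of B
for row in AB:
    print([round(x, 4) for x in row])
```

When R is smaller than N, the same product maps the data into a lower-dimensional space, which is exactly the operation dimensionality reduction needs.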





1.3 Covariance matrix and optimization objectives

In dimensionality reduction, the key question is how to decide whether a chosen basis is optimal, that is, which basis best preserves the characteristics of the original data. Suppose we have five records:



Compute the mean of each row and subtract it from every entry of that row, which gives:
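As a sketch of this centering step, assuming a hypothetical 2 x 5 data matrix in which each row is a field and each column is a record (the values are illustrative only):

```python
# Hypothetical data: 2 fields (rows) x 5 records (columns).
data = [[1, 1, 2, 4, 2],
        [1, 3, 3, 4, 4]]

def center_rows(matrix):
    """Subtract each row's mean from every entry of that row."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered

centered = center_rows(data)
print(centered)  # [[-1.0, -1.0, 0.0, 2.0, 0.0], [-2.0, 0.0, 0.0, 1.0, 1.0]]
```

After centering, every row sums to zero, which simplifies the variance and covariance formulas used later.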



Plotted as points in the coordinate plane, the data looks as follows:



The question now is: if these data must be represented in one dimension, how do we keep as much of the original information as possible? This amounts to choosing a direction in the two-dimensional plane, projecting all data points onto the line in that direction, and representing each original record by its projected value, that is, reducing two dimensions to one. So how should this direction (or basis) be chosen to retain as much of the original information as possible? An intuitive answer is that the projected values should be as spread out as possible.

1.3.1 Variance

In other words, we want the projected values to be spread out as much as possible along the chosen direction, and this degree of dispersion can be expressed mathematically by the variance, namely:
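Assuming the usual definition is meant, the variance of a field a with m values a_1, ..., a_m and mean \mu is given below; the second form holds once the field has been centered so that \mu = 0.

```latex
\[
\operatorname{Var}(a) = \frac{1}{m} \sum_{i=1}^{m} (a_i - \mu)^2
\qquad\text{and, for } \mu = 0,\qquad
\operatorname{Var}(a) = \frac{1}{m} \sum_{i=1}^{m} a_i^2
\]
```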



The problem can therefore be formalized as: find a one-dimensional basis such that the variance of the data is maximized after all the data is transformed into coordinates on that basis.
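To make this objective concrete, the sketch below projects a small centered dataset onto several unit directions and compares the variance of the projected values. The data points and the function name `projected_variance` are assumptions made for illustration.

```python
import math

# Hypothetical centered data: each tuple is one record (x, y).
points = [(-1, -2), (-1, 0), (0, 0), (2, 1), (0, 1)]

def projected_variance(points, angle_deg):
    """Variance of the points projected onto the unit vector at angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    projections = [x * ux + y * uy for x, y in points]
    # The data is centered, so the projections also have zero mean.
    return sum(p * p for p in projections) / len(projections)

for angle in (0, 45, 90, 135):
    print(angle, round(projected_variance(points, angle), 4))
```

Among these four directions, the 45-degree one gives the largest variance (2.0) and the 135-degree one the smallest (0.4), so for this dataset the 45-degree direction is the better one-dimensional basis.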

1.3.2 Covariance

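When reducing to more than one dimension, we want the fields to carry as little redundant information as possible, so they should not be linearly correlated with each other. Mathematically, this correlation is expressed by the covariance of two fields, namely: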
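Assuming both fields a and b have already been centered to zero mean, the covariance takes the following form, where m is the number of records:

```latex
\[
\operatorname{Cov}(a, b) = \frac{1}{m} \sum_{i=1}^{m} a_i b_i
\]
```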



When the covariance is 0, the two fields are linearly uncorrelated. To make the covariance 0, the second basis vector can only be chosen in a direction orthogonal to the first. The two directions finally chosen are therefore necessarily orthogonal.

This gives the optimization goal of the dimensionality reduction problem: to reduce a set of N-dimensional vectors to K dimensions (0 < K < N), choose K orthonormal basis vectors (unit length, mutually orthogonal) such that, after the original data is transformed onto this basis, the covariance between every pair of fields is 0 and the variance of each field is as large as possible (under the orthogonality constraint, take the K directions with the largest variances).

1.3.3 Covariance matrix

Assume that there are only two fields, x and y. Arrange them by row into a matrix X, where X is the centered matrix, that is, each field has had its mean subtracted:
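Under the assumption that the matrix in question is X, with the m centered values of x in the first row and those of y in the second, multiplying X by its transpose and dividing by m gives:

```latex
\[
X = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \\ y_1 & y_2 & \cdots & y_m \end{pmatrix},
\qquad
\frac{1}{m} X X^{\mathsf T} =
\begin{pmatrix}
\frac{1}{m}\sum_{i=1}^{m} x_i^2 & \frac{1}{m}\sum_{i=1}^{m} x_i y_i \\[4pt]
\frac{1}{m}\sum_{i=1}^{m} x_i y_i & \frac{1}{m}\sum_{i=1}^{m} y_i^2
\end{pmatrix}
\]
```

The diagonal entries are the variances of the two fields and the off-diagonal entries are their covariance, so a single matrix holds both quantities.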





1.3.4 Diagonalization of the covariance matrix
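A hedged sketch of the standard argument for this step: let X be the centered data matrix with one record per column, C = (1/m) X X^T its covariance matrix, P a matrix whose rows are the new basis vectors, and Y = PX the transformed data. The names P, Y, and D are notation assumed for this sketch. The covariance matrix D of Y is then:

```latex
\[
D = \frac{1}{m} Y Y^{\mathsf T}
  = \frac{1}{m} (P X)(P X)^{\mathsf T}
  = P \left( \frac{1}{m} X X^{\mathsf T} \right) P^{\mathsf T}
  = P C P^{\mathsf T}
\]
```

The optimization goal (zero covariance between fields, largest variances first) therefore amounts to finding a P for which P C P^T is diagonal with its diagonal entries in descending order. Because C is real and symmetric, it has an orthonormal set of eigenvectors. Taking them as the rows of P, ordered by decreasing eigenvalue, gives such a matrix, and its first K rows form the basis for reduction to K dimensions.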





1.4 Algorithm and examples

1.4.1 PCA algorithm
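A commonly used formulation of the procedure, assumed here rather than taken from this article, is: (1) arrange the data as a matrix X with one field per row, (2) center each row, (3) compute the covariance matrix C = (1/m) X X^T, (4) find the eigenvalues and eigenvectors of C, (5) take the eigenvectors with the K largest eigenvalues as the rows of P, and (6) compute Y = PX. The sketch below follows these steps for the special case of two fields reduced to one, where the eigen-decomposition of the 2 x 2 symmetric matrix can be written in closed form. The dataset and the function name `pca_2d_to_1d` are illustrative assumptions.

```python
import math

def pca_2d_to_1d(data):
    """Reduce 2 x m data (one field per row) to 1 x m along the top principal axis."""
    m = len(data[0])
    # Step 2: center each field.
    centered = []
    for row in data:
        mean = sum(row) / m
        centered.append([v - mean for v in row])
    xs, ys = centered
    # Step 3: entries of the covariance matrix C = (1/m) X X^T.
    cxx = sum(x * x for x in xs) / m
    cyy = sum(y * y for y in ys) / m
    cxy = sum(x * y for x, y in zip(xs, ys)) / m
    # Step 4: largest eigenvalue of the symmetric matrix [[cxx, cxy], [cxy, cyy]].
    half_trace = (cxx + cyy) / 2
    radius = math.hypot((cxx - cyy) / 2, cxy)
    top_eigenvalue = half_trace + radius
    # A matching eigenvector; fall back to an axis when C is already diagonal.
    if abs(cxy) > 1e-12:
        vx, vy = cxy, top_eigenvalue - cxx
    elif cxx >= cyy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Steps 5-6: project every record onto the unit eigenvector.
    projected = [vx * x + vy * y for x, y in zip(xs, ys)]
    return (vx, vy), top_eigenvalue, projected

data = [[1, 1, 2, 4, 2],
        [1, 3, 3, 4, 4]]   # hypothetical 2 x 5 dataset
axis, eigenvalue, reduced = pca_2d_to_1d(data)
print(axis, eigenvalue)
print(reduced)
```

On this dataset the top axis comes out as roughly (0.7071, 0.7071) with an eigenvalue of about 2.0, which is also the variance of the reduced data. For more than two fields, a numerical eigen-solver would replace the closed-form step.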



1.4.2 Example





1.5 Discussion

Based on the above explanation of the mathematical principles of PCA, we can understand some of PCA's capabilities and limitations. In essence, PCA takes the directions of largest variance as the main features and “de-correlates” the data along orthogonal directions, that is, makes the data uncorrelated across those directions.
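The “de-correlation” can be checked numerically. The sketch below, which assumes a hypothetical centered dataset whose principal axes are the orthonormal directions (1/√2, 1/√2) and (-1/√2, 1/√2), computes the covariance of the two fields before and after the change of basis.

```python
import math

# Hypothetical centered records (x, y).
points = [(-1, -2), (-1, 0), (0, 0), (2, 1), (0, 1)]

def covariance(us, vs):
    """Covariance of two zero-mean sequences of equal length."""
    return sum(u * v for u, v in zip(us, vs)) / len(us)

s = 1 / math.sqrt(2)
first = [s * x + s * y for x, y in points]     # coordinates along (s, s)
second = [-s * x + s * y for x, y in points]   # coordinates along (-s, s)

before = covariance([x for x, _ in points], [y for _, y in points])
after = covariance(first, second)
print(round(before, 4), round(after, 4))
```

The covariance is 0.8 in the original coordinates and vanishes, up to floating-point rounding, in the rotated ones.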

PCA therefore has some limitations. For example, it removes linear correlation well but cannot deal with higher-order correlation. For data with higher-order correlation, Kernel PCA can be considered, which uses a kernel function to convert non-linear correlation into linear correlation. In addition, PCA assumes that the main features of the data lie along orthogonal directions. If there are several directions of large variance that are not orthogonal to each other, the effectiveness of PCA is greatly reduced.

Finally, note that PCA is a parameter-free technique. Given the same data, and leaving data cleaning aside, anyone will get the same result. No subjective parameters are involved, so PCA is easy to implement generically, but it cannot be tuned for a particular case.

II. Source code

function varargout = MainForm(varargin)
% MAINFORM MATLAB code for MainForm.fig
%      MAINFORM, by itself, creates a new MAINFORM or raises the existing
%      singleton*.
%
%      H = MAINFORM returns the handle to a new MAINFORM or the handle to
%      the existing singleton*.
%
%      MAINFORM('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in MAINFORM.M with the given input arguments.
%
%      MAINFORM('Property','Value',...) creates a new MAINFORM or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before MainForm_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to MainForm_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help MainForm

% Last Modified by GUIDE v2.5 17-Mar-2014 21:27:08

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
    'gui_Singleton',  gui_Singleton, ...
    'gui_OpeningFcn', @MainForm_OpeningFcn, ...
    'gui_OutputFcn',  @MainForm_OutputFcn, ...
    'gui_LayoutFcn',  [], ...
    'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT


% --- Executes just before MainForm is made visible.
function MainForm_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to MainForm (see VARARGIN)

% Choose default command line output for MainForm
handles.output = hObject;
clc;
set(handles.axes1, 'XTick', [], 'YTick', [], 'XTickLabel', 'a', 'YTickLabel', 'a', ...
    'Color', [0.7020 0.7804 1.0000], 'Box', 'On', 'xlim', [-1 1], 'ylim', [-1 1]);
set(handles.axes2, 'XTick', [], 'YTick', [], 'XTickLabel', 'a', 'YTickLabel', 'a', ...
    'Color', [0.7020 0.7804 1.0000], 'Box', 'On', 'xlim', [-1 1], 'ylim', [-1 1]);
set(handles.axes3, 'XTick', [], 'YTick', [], 'XTickLabel', 'a', 'YTickLabel', 'a', ...
    'Color', [0.7020 0.7804 1.0000], 'Box', 'On', 'xlim', [-1 1], 'ylim', [-1 1]);
handles.Ims = 0;
handles.c = 0;
handles.Im = 0;
handles.f = 0;
handles.Img = 0;
% Update handles structure
guidata(hObject, handles);

% UIWAIT makes MainForm wait for user response (see UIRESUME)
% uiwait(handles.figure1);


% --- Outputs from this function are returned to the command line.
function varargout = MainForm_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;



function edit1_Callback(hObject, eventdata, handles)
% hObject    handle to edit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'String') returns contents of edit1 as text
%        str2double(get(hObject,'String')) returns contents of edit1 as a double


% --- Executes during object creation, after setting all properties.
function edit1_CreateFcn(hObject, eventdata, handles)
% hObject    handle to edit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end


% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
filePath = OpenImageFile();
if filePath == 0
    return;
end
Img = imread(filePath);
if ndims(Img) == 3
    % Convert color images to grayscale
    Img = rgb2gray(Img);
end
sz = size(Img);
sz0 = [112 92];
if ~isequal(sz, sz0)
    % Resize to the fixed working size of 112 x 92 pixels
    Img = imresize(Img, sz0, 'bilinear');
end
% wh = 600;
% if sz(1) > wh
%     rate = wh/sz(1);
%     Img = imresize(Img, rate, 'bilinear');
% end
% Display the input image
imshow(Img, [], 'Parent', handles.axes1);
handles.Img = Img;
handles.sz = size(Img);
guidata(hObject, handles);


% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Img, 0)
    return;
end
f = GetFaceVector(handles.Img);
% Keep the first 90% of the entries of the feature vector
f = f(1:round(length(f)*0.9));
handles.f = f;
guidata(hObject, handles);
msgbox('Dimension reduction succeeded!', 'Prompt message');


% --- Executes on button press in pushbutton3.
function pushbutton3_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton3 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.f, 0)
    return;
end
Im = QrGen(handles.f);
% Display the generated image
imshow(Im, [], 'Parent', handles.axes2);
handles.Im = Im;
guidata(hObject, handles);


% --- Executes on button press in pushbutton4.
function pushbutton4_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton4 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Im, 0)
    return;
end
c = QrDen(handles.Im);
set(handles.edit1, 'String', c);
handles.c = c;
guidata(hObject, handles);


% --- Executes on button press in pushbutton5.
function pushbutton5_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.c, 0)
    return;
end
Ims = FaceRec(handles.c, handles.sz);
% Display the reconstructed image
imshow(Ims, [], 'Parent', handles.axes3);
handles.Ims = Ims;
guidata(hObject, handles);

III. Operation results

IV. Notes

MATLAB version: 2014a