1. Overview

Vector quantization is the process of encoding the points of a vector space with a finite subset of that space. The key to any vector quantization coding scheme is the construction of the codebook and the codeword search algorithm; the familiar clustering algorithms are themselves vector quantization methods. In ANN (approximate nearest neighbor) search, the most representative vector quantization method is Product Quantization (PQ). At the end of my previous post on content-based image retrieval, I briefly outlined the PQ product quantization approach. In this section, Xiaocabbage combines his own reading of the paper with the code to give a more intuitive walk-through of PQ product quantization and inverted product quantization (IVFPQ).

1 PQ product quantization

The core idea of PQ product quantization is still clustering, applied specifically to ANN approximate nearest neighbor search; K-means is simply the special case of PQ in which the number of subspaces is 1. The codebook-training and quantization process of PQ product quantization can be illustrated by the following diagram:



In the training phase, given N training samples, assume each sample is 128-dimensional. We split the vector space into 4 subspaces, each 32-dimensional, and run K-means independently in each subspace (say, with 256 clusters), so that each subspace obtains its own codebook. Each sub-segment of a training sample can then be approximated by the nearest cluster center of its subspace, and the corresponding code is simply the ID of that center. As shown in the figure, with this encoding a training sample can be represented by a very short code, which is exactly the purpose of quantization. A sample to be encoded is segmented in the same way, the nearest class center is found in each subspace one by one, and the sample's code is the list of those center IDs; this completes the encoding of the sample.
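To make the step above concrete, here is a minimal MATLAB sketch of codebook training and encoding under the stated setup (128-d vectors, 4 subspaces, 256 centers per subspace). The random matrix X stands in for the N training samples, kmeans comes from the Statistics and Machine Learning Toolbox, and all variable names are mine rather than from any reference implementation.

d = 128; M = 4; ds = d/M; ksub = 256;     % dimensions, subspaces, centers per subspace
X = randn(10000, d);                      % stand-in for the N training samples

codebooks = zeros(ksub, ds, M);           % one 256 x 32 codebook per subspace
codes = zeros(size(X,1), M, 'uint8');     % each sample becomes 4 one-byte center IDs
for m = 1:M
    sub = X(:, (m-1)*ds+1 : m*ds);        % the m-th 32-d sub-segment of every sample
    [idx, C] = kmeans(sub, ksub, 'MaxIter', 100, 'EmptyAction', 'singleton');
    codebooks(:,:,m) = C;                 % the class centers form the codebook
    codes(:,m) = uint8(idx - 1);          % the ID of the nearest center is the code
end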

As mentioned above, the key to vector quantization coding lies in the codebook and in the codeword search algorithm. We now have the codebook and the quantization (encoding) method; the remaining question is how to compute the distance between a query sample and the samples in the dataset.

In the query stage, PQ also computes the distance between the query sample and every sample in the dataset, but this distance is obtained by an indirect approximation. PQ product quantization offers two ways to compute it: symmetric distance and asymmetric distance. The asymmetric distance has a smaller loss (it is closer to the true distance), so it is the one usually adopted in practice. The following diagram shows how, when a query sample arrives, its distances to the dataset samples are computed in asymmetric form (the part marked by the red box):



Specifically, when the query vector arrives, it is split into the same sub-segments as the encoded training samples. In each subspace, the distance from the query sub-segment to every cluster center of that subspace is computed, which, as shown in the figure, yields 4*256 distances; for ease of understanding, Xiaocabbage will call this set of precomputed distances the distance pool. To compute the distance between a sample in the library and the query vector, for example the sample encoded as (124, 56, 132, 222), we simply look up the corresponding entry of each sub-segment in the distance pool: for the sub-segment coded 124, we take the distance numbered 124 out of the first group of 256 distances, and likewise for the other sub-segments. After the distances of all sub-segments are looked up and summed, we obtain the asymmetric distance between that sample and the query sample. Once all such distances are computed and sorted, we have the result we want.
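Continuing the sketch above, the asymmetric-distance step described here could look as follows; it reuses the hypothetical codebooks, codes, d, ds, M and ksub variables, and pdist2 is also from the Statistics and Machine Learning Toolbox.

q = randn(1, d);                          % query vector, split the same way as the samples
dtable = zeros(ksub, M);                  % the 256 x 4 distance pool, one column per subspace
for m = 1:M
    qsub = q((m-1)*ds+1 : m*ds);
    dtable(:,m) = pdist2(codebooks(:,:,m), qsub, 'squaredeuclidean');
end

n = size(codes, 1);                       % asymmetric distance = sum of 4 table look-ups
adist = zeros(n, 1);
for m = 1:M
    adist = adist + dtable(double(codes(:,m)) + 1, m);
end
[~, order] = sort(adist);                 % ranked candidate list for this query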

From the above process it is clear why PQ product quantization accelerates indexing: the distance computation over the full vectors is replaced by distance computations to the subspace cluster centers. With brute-force search, the number of distance computations grows linearly with the number of samples N; after PQ coding, only 4*256 sub-vector distances need to be computed, and the remaining per-sample work is just table look-ups and additions, whose cost is almost negligible. In addition, as the figure shows, once the features are encoded each sample is represented by a fairly short code, so memory consumption is naturally much lower than with brute-force search.
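To put rough numbers behind this claim, here is a small back-of-the-envelope snippet under my own illustrative assumptions (one million database vectors of 128 float32 dimensions, and the 4-subspace / 256-center layout above):

N = 1e6;
mem_raw = N * 128 * 4 / 2^20;     % ~488 MB to store the raw float32 vectors
mem_pq  = N * 4 / 2^20;           % ~3.8 MB to store the 4-byte PQ codes
ops_raw = N * 128;                % multiply-adds for exact brute-force distances
ops_pq  = 4 * 256 * 32;           % multiply-adds to build the 4*256 distance pool
% after the pool is built, each database sample costs only 4 look-ups and additions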

In some special scenarios, we want exact distances rather than approximate ones, or we want the cosine similarity between vectors (cosine similarity lies in [-1, 1], which makes it convenient to set a fixed threshold). For such scenarios, the top@K candidates returned by the PQ search can simply be re-ranked with a brute-force (exact) computation.

2 Inverted product quantization (IVFPQ)

Inverted product quantization (IVFPQ) is a further accelerated version of PQ product quantization. The essence of its acceleration is still the principle emphasized at the very beginning: brute-force search scans the whole space, so to speed up the search almost all ANN methods partition the space into many small subspaces, quickly locate one (or a few) of them in some way at search time, and then traverse only those subspaces. As the previous section showed, although PQ precomputes the distance pool in advance, the distance from every sample in the library to the query still has to be obtained by dutifully summing the table entries one by one. In fact, we only care about the samples that are similar to the query (Xiaocabbage calls this the region of interest), so summing over everything does a lot of wasted work. If some mechanism lets us quickly narrow the global traversal down to the region of interest, the unnecessary global computation and sorting can be dropped. The "inverted" in inverted product quantization is exactly the embodiment of this idea. In terms of implementation, it uses clustering to locate the region of interest quickly; in IVFPQ, clustering is put to full use.

The whole process of inverted product quantization (IVFPQ) is shown in the figure below:



A coarse quantization step is added before the PQ product quantization. Specifically, the N training samples are first clustered with K-means; the number of clusters should not be too large (1024 is a typical setting), so this clustering finishes fairly quickly. After the cluster centers are obtained, for each sample x_i we find its nearest class center c_i and subtract it, giving the residual vector x_i - c_i. The remaining process is simply the PQ product quantization applied to the residuals x_i - c_i, which will not be repeated here.
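A minimal sketch of this coarse-quantization step, reusing the hypothetical X from the earlier PQ sketch (1024 coarse centers as in the text, kmeans again from the Statistics and Machine Learning Toolbox):

kc = 1024;
[coarse_id, coarse_C] = kmeans(X, kc, 'MaxIter', 50, 'EmptyAction', 'singleton');
R = X - coarse_C(coarse_id, :);           % residual x_i - c_i for every sample
% the PQ training/encoding sketch above is then applied to R instead of X, and each
% sample's PQ code is stored in the inverted list of its coarse center coarse_id(i)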

At query time, the same coarse quantizer is used to quickly locate which c_i (that is, which region of interest) the query vector belongs to, and the distances are then computed within that region of interest using the PQ product quantization distance computation described above.
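Continuing with the same hypothetical variables (coarse_C, coarse_id, R and the query q), the query path could be sketched as:

[~, c] = min(pdist2(coarse_C, q, 'squaredeuclidean'));   % locate the region of interest
rq = q - coarse_C(c, :);                  % residual of the query w.r.t. that coarse center
list = find(coarse_id == c);              % inverted list: samples assigned to this center
% build the 256 x 4 distance pool from rq and the residual codebooks, then sum the
% table look-ups only for the samples in 'list', exactly as in the PQ sketch above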

2. Source code

function mfc = my_mfcc(x, fs)
% my_mfcc: compute speaker-recognition parameters
%   x   : input speech signal
%   fs  : sampling rate
%   mfc : twelve MFCC coefficients and one energy term, plus their first- and second-order differences
% clc, clear
% [x, fs] = audioread('wo6.wav');
N=256; p=24;
bank = mel_banks(p, N, fs, 0, fs/2);   % Mel filter-bank coefficients
bank = full(bank);
bank = bank / max(bank(:));            % normalize the filter bank
w = 1 + 6 * sin(pi * [1:12] ./ 12);    % cepstrum liftering window
w = w / max(w);
x = double(x);
x = filter([1 -0.9375], 1, x);         % pre-emphasis filter
sf = check_ter(x, N, 128, 10);         % voiced-frame (endpoint) detection
x = div_frame(x, N, 128);              % split the speech signal into frames
%sf=sp_ter(x,4,16);
%m=zeros(size(x,1),13);
m = zeros(1, 13);
% compute the MFCC parameters for each voiced frame
for i = 1:size(x,1)
    if sf(i)==1
        % j=j+1;
         y = x(i,:);
         y = y' .* hamming(N);
         energy = log(sum(y.^2) + eps);   % log energy of the frame
         y = abs(fft(y));
         y = y.^2+eps;
        c1=dct(log(bank * y));
        c2 = c1(2:13) .* w';             % keep coefficients 2~13 and lifter them
        % m(i,:) = [c2; energy]';
        m1 = [c2; energy]';
        % m1 = c2';
        m = [m; m1];
    end
end
dm  = zeros(size(m));
dmm = zeros(size(m));
for i = 2:size(m,1)-1
    dm(i,:) = m(i,:) - m(i-1,:);         % first-order difference
end
for i = 3:size(m,1)-2
    dmm(i,:) = dm(i,:) - dm(i-1,:);      % second-order difference
end
% dm = dm / 3;
mfc = [m, dm, dmm];                      % assemble the output (assumed from the function header)

function v = lbg(x, k)
% lbg: LBG mean (vector-quantization) clustering algorithm
%   lbg(x,k) splits the input samples x into k classes.
%   x is a row*col matrix: each column is one sample with row elements.
%   v(i).num : number of elements in class i
%   v(i).ele : the elements (columns) assigned to class i
%   v(i).mea : the mean (center) of class i
[row, col] = size(x);
% u = zeros(row, k);          % each column is a center value
epision = 0.03;               % splitting perturbation
delta = 0.01;                 % convergence threshold
% u2 = zeros(row, k);
% the LBG algorithm generates k centers by repeated splitting
u = mean(x, 2);               % first cluster center: the overall mean
for i3 = 1:log2(k)
    u = [u*(1-epision), u*(1+epision)];   % split every center into two
    % time = 0;
    D=0;
    DD=1;
    while abs(D-DD)/DD>delta   %sum(abs(u2(:).^2-u(:).^2))> 0.5 &&(time<=80)   %u2~=u
        DD=D;
        for i = 1:2^i3                        % initialize the classes
            v(i).num = 0;
            v(i).ele=zeros(row,1);
        end
        for i = 1:col
            distance = dis(u, x(:,i));        % distance of the i-th sample to each center
            [val, pos] = min(distance);
            v(pos).num = v(pos).num + 1;      % element count of the nearest class plus 1
            if v(pos).num == 1                % ele is still empty
                v(pos).ele = x(:,i);
            else
                v(pos).ele=[v(pos).ele,x(:,i)];
            end
        end
        D = 0;                                % reset total distortion for this pass
        for i = 1:2^i3
            u(:,i) = mean(v(i).ele, 2);       % new mean center of class i
            for m = 1:size(v(i).ele, 2)
                D = D + sum((v(i).ele(:,m) - u(:,i)).^2);   % accumulate distortion
            end
        end
    end
end
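A hypothetical usage sketch for the two functions above, assuming my_mfcc and lbg are saved as separate files and that the helper functions mel_banks, check_ter, div_frame and dis are on the MATLAB path (the file name comes from the commented-out line in my_mfcc; the codebook size 16 is my own choice):

[x, fs] = audioread('wo6.wav');   % one speaker's enrollment utterance
mfc = my_mfcc(x, fs);             % one row of parameters per voiced frame
v = lbg(mfc', 16);                % train a 16-word VQ codebook; lbg expects one sample per column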

3. Operation results

4. Note

Version: 2014a