
This paper designs and implements a text-dependent voiceprint recognition system based on MATLAB, which can determine the speaker's identity.

1 System Principle

A. Voiceprint recognition

In recent years, with the development of artificial intelligence, many mobile phone apps have introduced a voiceprint lock function, which is built mainly on voiceprint recognition technology. Voiceprint recognition, also known as speaker recognition, is slightly different from speech recognition: speech recognition determines what was said, while speaker recognition determines who said it.



B. Mel frequency cepstrum coefficient (MFCC)

The Mel Frequency Cepstrum Coefficient (MFCC) is one of the most commonly used features in speech signal processing.

Experimental observation shows that the human ear acts like a filter bank, attending only to certain frequency components of the spectrum. The human ear's perception of frequency is not linear on the ordinary frequency axis, but is approximately linear on the Mel frequency scale.

The Mel frequency cepstrum coefficient takes human auditory characteristics into account: it first maps the linear spectrum onto the Mel nonlinear spectrum based on auditory perception, and then converts it to the cepstrum. The relation between the ordinary frequency f (in Hz) and the Mel frequency is:

Mel(f) = 2595 * log10(1 + f / 700)
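As a quick sanity check of this mapping (a small Python sketch, not part of the MATLAB system), note that the Mel scale is roughly linear below about 1000 Hz and logarithmic above it:

```python
import math

def hz_to_mel(f_hz):
    """Map an ordinary frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping: Mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 1000 Hz maps to (approximately) 1000 Mel by construction,
# and the two functions are exact inverses of each other.
print(round(hz_to_mel(1000)))               # -> 1000
print(round(mel_to_hz(hz_to_mel(4000))))    # -> 4000
```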



C. Vector Quantization (VQ)

The system uses vector quantization to compress the extracted speech MFCC features.

Vector Quantization (VQ) is a lossy data compression method based on block coding rules; VQ steps appear in multimedia compression formats such as JPEG and MPEG-4. Its basic idea is to group several scalar values into a vector and then quantize the vector as a whole in vector space, compressing the data while losing little information.
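The quantization step itself can be sketched in a few lines (an illustrative Python toy with made-up 2-D codewords; the paper's system applies the same idea to MFCC vectors):

```python
def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest codeword
    (squared Euclidean distance) - the core step of VQ."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            for v in vectors]

codebook = [(0.0, 0.0), (1.0, 1.0)]            # two codewords
data = [(0.1, -0.2), (0.9, 1.1), (0.6, 0.7)]   # three input vectors
print(quantize(data, codebook))                # -> [0, 1, 1]
```

After quantization, only the codeword indices (and the codebook) need to be stored, which is what makes VQ a compression method.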

2 System Structure

The structure of the whole system in this paper is shown as follows:

— Training process

First, the speech signal is preprocessed; then the MFCC feature parameters are extracted and compressed by vector quantization to obtain the speaker's pronunciation codebook. The same speaker repeats the same content several times, and the training process is run on each repetition to build up a codebook library.
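The codebook-building step can be illustrated with a minimal Lloyd/k-means-style trainer (a Python toy on made-up one-dimensional "frames"; the actual system trains on MFCC vectors, and classic VQ systems often use the closely related LBG algorithm):

```python
def train_codebook(frames, k, iters=20):
    """Train a k-codeword codebook with plain Lloyd iterations
    (a simple stand-in for the LBG algorithm often used in VQ)."""
    codebook = frames[:k]  # naive initialization: first k frames
    for _ in range(iters):
        # Assignment step: each frame joins its nearest codeword's cluster.
        clusters = [[] for _ in range(k)]
        for f in frames:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(f, codebook[j])))
            clusters[i].append(f)
        # Update step: move each codeword to the mean of its cluster.
        for j, c in enumerate(clusters):
            if c:
                codebook[j] = tuple(sum(v) / len(c) for v in zip(*c))
    return codebook

# Two clearly separated groups of "frames" around 0.1 and 5.0.
frames = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (4.8,)]
print([round(c[0], 3) for c in sorted(train_codebook(frames, 2))])  # -> [0.1, 5.0]
```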

— Identification process

During recognition, the speech signal is preprocessed in the same way and its MFCC features are extracted; the Euclidean distance between these features and each codebook in the training library is then computed. When the distance falls below a certain threshold, the speaker and the spoken content are taken to match that training codebook, and the pairing succeeds.
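The decision rule can be sketched as follows (a Python toy with hypothetical speaker codebooks and an assumed threshold value; the real system compares MFCC feature frames against the trained codebooks):

```python
def match_score(features, codebook):
    """Average Euclidean distance from each test frame to its nearest
    codeword; smaller means a better match to that speaker's codebook."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(min(dist(f, c) for c in codebook) for f in features) / len(features)

# Hypothetical codebooks and test frames, just to show the decision rule.
codebooks = {"spk1": [(0.0, 0.0), (1.0, 1.0)],
             "spk2": [(5.0, 5.0), (6.0, 6.0)]}
test = [(0.1, 0.0), (0.9, 1.0)]

scores = {name: match_score(test, cb) for name, cb in codebooks.items()}
best = min(scores, key=scores.get)
THRESHOLD = 1.0   # assumed acceptance threshold for this sketch
print(best if scores[best] < THRESHOLD else "no match")   # -> spk1
```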

II. Source code

function varargout = test4(varargin)
% TEST4 MATLAB code for test4.fig
%      TEST4, by itself, creates a new TEST4 or raises the existing
%      singleton*.
%
%      H = TEST4 returns the handle to a new TEST4 or the handle to
%      the existing singleton*.
%
%      TEST4('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in TEST4.M with the given input arguments.
%
%      TEST4('Property','Value',...) creates a new TEST4 or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before test4_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to test4_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help test4

% Last Modified by GUIDE v2.5 17-Mar-2019 09:58:00

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @test4_OpeningFcn, ...
                   'gui_OutputFcn',  @test4_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT


% --- Executes just before test4 is made visible.
function test4_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to test4 (see VARARGIN)

% Choose default command line output for test4
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes test4 wait for user response (see UIRESUME)
% uiwait(handles.figure1);


% --- Outputs from this function are returned to the command line.
function varargout = test4_OutputFcn(hObject, eventdata, handles) 
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;


% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global thk1 thk2 thk3 
global tlc1 tlc2 tlc3
global tlyy1 tlyy2 tlyy3 
global tqs1 tqs2 tqs3
global tyqc1 tyqc2 tyqc3
global startpos len

startpos=601;
len=399;
[s,fs]=audioread('Training Sample HK1.wav');
thk1= MFCC2par(s,fs);
thk1=thk1(startpos:startpos+len,1:12);

[s,fs]=audioread('Training Sample HK2.wav');
thk2= MFCC2par(s,fs);
thk2=thk2(startpos:startpos+len,1:12);

[s,fs]=audioread('Training Sample HK3.wav');
thk3= MFCC2par(s,fs);
thk3=thk3(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample Lc1.wav');
tlc1= MFCC2par(s,fs);
tlc1=tlc1(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample lc2.wav');
tlc2= MFCC2par(s,fs);
tlc2=tlc2(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample LC3.wav');
tlc3= MFCC2par(s,fs);
tlc3=tlc3(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample Lyy1.wav');
tlyy1= MFCC2par(s,fs);
tlyy1=tlyy1(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample lyy2.wav');
tlyy2= MFCC2par(s,fs);
tlyy2=tlyy2(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample lyy3.wav');
tlyy3= MFCC2par(s,fs);
tlyy3=tlyy3(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample Qs1.wav');
tqs1= MFCC2par(s,fs);
tqs1=tqs1(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample qs2.wav');
tqs2= MFCC2par(s,fs);
tqs2=tqs2(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample qs3.wav');
tqs3= MFCC2par(s,fs);
tqs3=tqs3(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample');
tyqc1= MFCC2par(s,fs);
tyqc1=tyqc1(startpos:startpos+len,1:12);

[s,fs]=audioread('Training sample yqc2.wav');
tyqc2= MFCC2par(s,fs);
tyqc2=tyqc2(startpos:startpos+len,1:12);

function getmfcc = MFCC2par(x,fs)
% ================================================================
% Denoising and endpoint detection
% Input:  audio data x, sampling rate fs
% Output: feature parameter matrix of size (N,M), where N is the
%         number of frames and M is the feature dimension
% Feature parameters: M = 24 -> 12 cepstrum coefficients plus
%         their 12-dimensional first-order differences
% ================================================================
% [x,fs] = wavread(sound);

% Keep only the first channel of a stereo signal
[~,etmp] = size(x);
if (etmp == 2)
    x = x(:,1);
end

% Normalized Mel filter bank coefficients: 24 Mel filters,
% FFT length 256, sampling frequency 8000 Hz
bank = melbankm(24,256,fs,0,0.5,'m');
bank = full(bank);
bank = bank/max(bank(:));            % [24*129]

% DCT coefficients
for k = 1:12
    n = 0:23;
    dctcoef(k,:) = cos((2*n+1)*k*pi/(2*24));
end

% Normalized cepstrum lifting window
w = 1 + 6*sin(pi*(1:12)./12);
w = w/max(w);

% Pre-emphasis filter
xx = double(x);
xx = filter([1 -0.9375],1,xx);       % pre-emphasis
xx = enframe(xx,256,80);             % frames of 256 points, frame shift 80

% Compute the MFCC parameters of each frame
for i = 1:size(xx,1)
    y = xx(i,:);                     % take one frame
    s = y'.*hamming(256);
    t = abs(fft(s));                 % FFT amplitude spectrum
    t = t.^2;                        % energy spectrum
    % Mel-filter the energy spectrum, take the logarithm, then DCT
    c1 = dctcoef*log(bank*t(1:129)); % t(1:129): first 129 spectrum points
    c2 = c1.*w';                     % lifted (normalized) cepstrum
    m(i,:) = c2';                    % MFCC parameters of this frame
end

% First-order differences (five-frame linear regression)
dtm = zeros(size(m));
for i = 3:size(m,1)-2
    dtm(i,:) = -2*m(i-2,:) - m(i-1,:) + m(i+1,:) + 2*m(i+2,:);
end
dtm = dtm/3;

% Concatenate the cepstrum and its first-order difference
getmfcc = [m dtm];

III. Operation results



IV. Note

MATLAB version: 2014a