Speech enhancement in MATLAB based on the masking effect of the human ear

I. Introduction

In a quiet environment the human ear can pick up very faint sounds, but in a noisy environment those same sounds are drowned out by the noise. This rise in the hearing threshold for one sound caused by the presence of another is called the masking effect. The first sound is called the masker (masking sound), the second is called the masked sound, and the amount by which the threshold of the masked sound rises is called the amount of masking.

Masking is generally studied with different kinds of sound as the masker, such as pure tones, complex tones, and noise. Masking also occurs when the masker and the masked sound do not arrive at the same time; this is called non-simultaneous masking. Masking of a sound that arrives before the masker is called pre-masking; masking of a sound that arrives after the masker is called post-masking.

The masking effect is usually expressed as the new hearing-threshold curve measured in the presence of the masker, so the masked sound referred to here is generally a pure tone. The hearing threshold in the presence of a masker is called the masking threshold.

1. Masking of pure tones

A pure tone is the simplest kind of sound. The following figure shows how the measured threshold for a pure tone varies with frequency when a 1 kHz, 80 dB pure tone serves as the masker. In the figure, the dotted line is the hearing-threshold curve, the solid line is the masking-threshold curve, and the labels indicate which sounds are audible in each region.

Below 700 Hz and above 9 kHz, the pure-tone threshold is almost unaffected by the masker.

Between 700 Hz and 9 kHz, the pure-tone threshold rises significantly; the closer the frequency is to that of the masker, the greater the amount of masking.

Masking of pure tones broadly follows these rules: low-frequency tones readily mask high-frequency tones, whereas high-frequency tones mask low-frequency tones only weakly; pure tones of similar frequency readily mask each other; and when the sound pressure level (SPL) of the masker increases, the masking threshold rises and the range of masked frequencies widens.
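The hearing-threshold curve described above (the threshold in quiet) is often approximated by a closed-form expression. The sketch below uses Terhardt's widely quoted approximation, which is an assumption brought in for illustration and not something the text derives; the name `threshQuiet` and the sample frequencies are hypothetical.

```matlab
% Approximate threshold in quiet (dB SPL) as a function of frequency in Hz.
% Terhardt's approximation; assumed here for illustration only.
threshQuiet = @(f) 3.64.*(f./1000).^(-0.8) ...
                 - 6.5.*exp(-0.6.*(f./1000 - 3.3).^2) ...
                 + 1e-3.*(f./1000).^4;

f  = [100 1000 3300 10000];   % example frequencies in Hz (hypothetical)
Tq = threshQuiet(f);          % threshold in quiet at each frequency

% A masked tone is audible only if its level exceeds the larger of the
% threshold in quiet and the masking threshold at its frequency.
```

The curve is high at low frequencies, dips to its minimum around 3 to 4 kHz where hearing is most sensitive, and rises again toward high frequencies.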

2. Masking by complex tones

Most sounds are complex tones. A musical sound generally consists of a fundamental frequency and several harmonics, and its timbre depends mainly on its harmonic structure. The masking range of a complex tone is determined mainly by the frequency components it contains. The amount of masking peaks near each component frequency; below the lowest component and above the highest component the masking weakens gradually, and the masking threshold converges to the threshold measured without a masker.
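The behaviour described above, a masking peak near each component that falls away on either side, can be sketched with a spreading function on the Bark scale. The block below assumes Schroeder's spreading function and a simple power sum over components; both are modelling assumptions, and the component positions and levels are made-up values.

```matlab
% Schroeder spreading function in dB.
% dz = (Bark position of masked sound) - (Bark position of masker).
spread_dB = @(dz) 15.81 + 7.5.*(dz + 0.474) - 17.5.*sqrt(1 + (dz + 0.474).^2);

zComp = [5 8 11];     % hypothetical Bark positions of three components
Lcomp = [70 65 60];   % hypothetical component levels in dB
z     = 1:18;         % Bark bands in which to evaluate the excitation

% Power-sum the spread contribution of every component in each band.
E = zeros(size(z));
for m = 1:numel(zComp)
    E = E + 10.^((Lcomp(m) + spread_dB(z - zComp(m)))./10);
end
E_dB = 10.*log10(E);  % spread excitation level per band, in dB
```

The function falls off more slowly above the masker than below it, which matches the rule that low tones mask high tones more easily than the reverse. A masking threshold is then usually obtained by subtracting an offset from this excitation.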

3. Masking by narrowband noise

Narrowband noise usually refers to noise whose bandwidth is equal to or less than the critical band. When a pure tone is used as the masker, the masking threshold is difficult to measure because of beats and difference tones. If narrowband white noise is used as the masker instead, the measurement is easier and the result is more reliable. The masking characteristic of narrowband noise is very similar to that of a pure tone, but the asymmetry between the low- and high-frequency sides of the curve is less pronounced. The following figure shows the hearing-threshold curves when narrowband noises with different center frequencies are used as maskers. The center frequencies are 0.25 kHz, 1 kHz, and 4 kHz.
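Critical bands are commonly indexed on the Bark scale. The sketch below converts frequency in Hz to Bark with a standard arctangent approximation, the same expression the source code later in this post uses to assign frequencies to bands. The name `hz2bark` and the sample frequencies are illustrative.

```matlab
% Frequency in Hz -> critical-band rate in Bark (arctangent approximation).
hz2bark = @(f) 13.*atan(0.76.*f./1000) + 3.5.*atan((f./7500).^2);

f    = [100 1000 4000];   % example frequencies in Hz (hypothetical)
z    = hz2bark(f);        % continuous Bark values
band = ceil(z);           % index of the critical band containing each frequency
```

With an 8 kHz sampling rate the Nyquist frequency is 4 kHz, which falls in band 18; this is consistent with `BarkNum=18` in the listing.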

4. Non-simultaneous masking

Most of the time a sound signal is transient and non-stationary, and its sound pressure level changes rapidly with time: a strong sound may be followed by a weak one, and a weak sound by a strong one. Stronger sounds tend to drown out the weaker ones that follow.

Generally speaking, simultaneous masking is the strongest and produces the largest amount of masking. Post-masking is stronger than pre-masking and lasts much longer. Post-masking, in which the masker precedes the test signal, is easy to understand because hearing has a memory function. Pre-masking arises because an auditory sensation takes time to build up, so there is a certain delay; the sensation of a strong sound is established faster than that of a weak one, so a strong sound can mask a weak sound that arrives slightly earlier.

The masking effect also underlies electro-acoustic specifications such as signal-to-noise ratio (SNR) and total harmonic distortion (THD): as long as noise or distortion stays within a certain range, it is masked and has no audible effect.
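As a small illustration of the SNR figure mentioned above, the sketch below computes SNR in dB from a clean signal and its noisy version. The signals are synthetic stand-ins (two sinusoids), chosen only so that the result is easy to check; all variable names are hypothetical.

```matlab
fs    = 8000;                      % sampling rate in Hz (assumed)
t     = (0:fs-1)'./fs;             % one second of samples
clean = sin(2*pi*440*t);           % stand-in for clean speech
noise = 0.1.*sin(2*pi*1234*t);     % stand-in for additive noise
noisy = clean + noise;

% SNR in dB: ratio of signal energy to the energy of the residual noise.
snr_dB = 10*log10(sum(clean.^2)/sum((noisy - clean).^2));
```

The noise amplitude is one tenth of the signal amplitude, so the power ratio is 100 and `snr_dB` comes out at 20 dB.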

II. Source code

clc;
clear all;
[x,fs,nbits]=wavread('beijing.wav');
y=awgn(x,0,'measured'); % add white Gaussian noise at 0 dB SNR relative to the measured signal power

function output=sub_rener(Signal,fs)
L=size(Signal,1);

W=1024;
SP=0.5;
OverLapNum=W*SP;
Window=hamming(W);
y=segment(Signal,W,SP,Window);

FrameNum=size(y,2);
ffty=fft(y,W);

Yabs=abs(ffty);
Yangle=angle(ffty);

%%%% the 24 Bark critical bands %%%%
k=0;
for f=21:fs
    k=k+1;
    z(k)=ceil(13*atan(0.76*f/1000)+3.5*atan((f/7500)^2)); % Bark band index at f Hz
end

BarkNum=18; % the audio signal used in the experiment is sampled at 8 kHz

va=20; % z(k) corresponds to f=k+va Hz
dz=diff(z);
fz=find(dz==1)+va; % frequency (Hz) at which each Bark band ends

%%%% spreading function %%%%
i=1:1:BarkNum;
delta=abs(repmat(i',1,BarkNum)-repmat(i,BarkNum,1));
S=10.^((15.81+7.5.*(delta+0.474)-17.5.*(1+(delta+0.474).^2).^(1/2))./10); %~dB
DC_Gain=S*ones(BarkNum,1); %~dB

%%%% the energy in each Bark band %%%%
Nz=fix(fz.*W/fs);
Nz(BarkNum)=W/2+1;
start=1;
B=zeros(BarkNum,FrameNum);
for i=1:BarkNum
    B(i,:)=sum(Yabs(start:Nz(i),:).^2,1); % sum over the bins of band i
    start=Nz(i)+1;
end %~dB

%%%% spectral flatness measure (SFM) %%%%
Py=Yabs(1:W/2+1,:).^2;

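Am=sum(Py)./(W/2+1); %~dB, arithmetic mean of the power spectrum
half=fix(W/4)+1;
% geometric mean, computed in two halves to reduce overflow in prod
Gm=prod(Py(1:half,:)).^(1/(W/2+1)).*prod(Py(half+1:end,:)).^(1/(W/2+1)); %~dB
SFM=10.*log10(Gm./Am); %dB
SFMmax=-60; %dB

%%%% spread Bark spectrum %%%%
C=zeros(BarkNum,FrameNum);
for i=1:BarkNum
    C(i,:)=S(i,:)*B; % energy spread into band i from all bands, ~dB
end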

Alpha=max(min(SFM./SFMmax,1),0); % tonality coefficient in [0,1]

%%%% offset O(i) between excitation and masking threshold %%%%
i=1:1:BarkNum;
O=Alpha'*(i+9)+5.5;
O=O'; %dB

%%%% actual masking threshold T(i) %%%%
TT=10.^(log10(C)-O./10); %~dB
T=TT./DC_Gain(:,ones(1,FrameNum)); %~dB

%%%% absolute threshold of hearing %%%%
w=fz(1:BarkNum);
Tqq=10.^((3.64.*(w./1000).^(-0.8)-6.5.*exp(-0.6.*(w./1000-3.3).^2)+(10^(-3)).*(w./1000).^4)./10);
Tq=repmat(Tqq',1,FrameNum);
Tfinal=max(Tq,T);

%%%% subtraction parameters alpha_w and beta_w %%%%
AlphaMin=1; AlphaMax=6;
BetaMin=0; BetaMax=0.02;
Alphaw=zeros(BarkNum,FrameNum);
Betaw=zeros(BarkNum,FrameNum);

Tmin=repmat(min(Tfinal),BarkNum,1);
Tmax=repmat(max(Tfinal),BarkNum,1);

Alphaw(:,:)=AlphaMin;
Betaw(:,:)=BetaMin;

    
index=Tfinal<=Tmin; % lowest threshold means least masking, so subtract the most
Alphaw(index)=AlphaMax;
Betaw(index)=BetaMax;
    
index=(Tfinal>Tmin)&(Tfinal<Tmax);
Alphaw(index)=(AlphaMax.*(Tmax(index)-Tfinal(index))+AlphaMin.*(Tfinal(index)-Tmin(index)))...
        ./(Tmax(index)-Tmin(index));
Betaw(index)=(BetaMax.*(Tmax(index)-Tfinal(index))+BetaMin.*(Tfinal(index)-Tmin(index)))...
        ./(Tmax(index)-Tmin(index));


start=1;
Alpha=zeros(W/2+1,FrameNum);
Beta=zeros(W/2+1,FrameNum);


for i=1:BarkNum
    Alpha(start:Nz(i),:)=repmat(Alphaw(i,:),Nz(i)-start+1,1);
    Beta(start:Nz(i),:)=repmat(Betaw(i,:),Nz(i)-start+1,1);
    start=Nz(i)+1;
end

    

Gamma=3;

Yabs=Yabs(1:W/2+1,:);
G=zeros(W/2+1,FrameNum);

NoiseLength=9;
NIS=20;
N=mean(Yabs(:,1:NIS),2);
NoiseCounter=0;
for i=1:FrameNum
    [~, SpeechFlag, NoiseCounter, Dist]=vad(Yabs(:,i),N,NoiseCounter); %Magnitude Spectrum Distance VAD
    if SpeechFlag==0
        N=(NoiseLength*N+Yabs(:,i))/(NoiseLength+1); %Update and smooth noise
    end
    for k=1:W/2+1
        if N(k)^2>Yabs(k,i)^2 % noise estimate exceeds the observed bin
            G(k,i)=0;
        else
            if ((N(k)/Yabs(k,i))^2) < (1/(Gamma.*Alpha(k,i)+Beta(k,i)))
                G(k,i)=(1-Gamma.*Alpha(k,i).*(N(k)./Yabs(k,i)).^2)^(1/2);
            else
                G(k,i)=(Beta(k,i).*(N(k)./Yabs(k,i)).^2)^(1/2); % spectral floor
            end
        end
    end
end


S=Yabs.*G;

Sfinal=[S;S(W/2:-1:2,:)]; % mirror the half spectrum to rebuild all W bins
j=sqrt(-1);

Yfinal=Sfinal.*exp(j.*Yangle);


output=zeros(1,L);
hamwin=zeros(1,L);
for i=1:FrameNum
    idx=1+(i-1)*OverLapNum:W+(i-1)*OverLapNum; % samples covered by frame i
    output(idx)=output(idx)+(real(ifft(Yfinal(:,i))))';
    hamwin(idx)=hamwin(idx)+Window';
end
for i=1:L
    if hamwin(i)==0
        output(i)=0;
    else
        output(i)=output(i)/hamwin(i);
    end
end
output=output';

function [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)

%[NOISEFLAG, SPEECHFLAG, NOISECOUNTER, DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
%Spectral Distance Voice Activity Detector
%SIGNAL is the current frame's magnitude spectrum, which is to be labeled as
%noise or speech. NOISE is the noise magnitude spectrum template (estimate).
%NOISECOUNTER is the number of immediately preceding noise frames. NOISEMARGIN
%(default 3) is the spectral distance threshold. HANGOVER (default 8) is
%the number of noise segments after which the SPEECHFLAG is reset (goes to
%zero). NOISEFLAG is set to one if the segment is labeled as noise.
%NOISECOUNTER returns the number of previous noise segments; this value is
%reset (to zero) whenever a speech segment is detected. DIST is the
%spectral distance.
%Saeed Vaseghi
%edited by Esfandiar Zavarehei
%Sep-04

if nargin<4
    NoiseMargin=3;
end
if nargin<5
    Hangover=8;
end
if nargin<3
    NoiseCounter=0;
end
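The listing of vad stops after the default-argument handling. The block below is only a sketch of the decision logic that the header comment describes, assuming the spectral distance is a log-magnitude difference in dB averaged over frequency. The original function body is not shown, so this is an assumption, and the input spectra are made-up values.

```matlab
signal = [2 4 8 4]';   % hypothetical magnitude spectrum of the current frame
noise  = [1 1 1 1]';   % hypothetical noise magnitude template
NoiseCounter = 0;      % number of immediately preceding noise frames
NoiseMargin  = 3;      % spectral distance threshold (documented default)
Hangover     = 8;      % noise frames before SpeechFlag resets (documented default)

% Assumed distance: per-bin log-spectral difference in dB, floored at zero.
d = 20.*(log10(signal) - log10(noise));
d(d < 0) = 0;
Dist = mean(d);

if Dist < NoiseMargin          % close to the noise template: noise frame
    NoiseFlag = 1;
    NoiseCounter = NoiseCounter + 1;
else                           % far from the template: speech frame
    NoiseFlag = 0;
    NoiseCounter = 0;
end
SpeechFlag = double(NoiseCounter <= Hangover);
```

With these inputs the frame lies well above the noise template, so it is labeled as speech and the noise counter stays at zero.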

III. Results

IV. Notes

Version: MATLAB R2014a
