This article was published by Tencent Game Cloud in its Tencent Cloud+ Community column.

Anyone who frequents gaming forums will have seen players complain that teaming up over voice chat from an Internet cafe is a poor experience because of the noise. When a team is on voice chat and one teammate is in an Internet cafe, everyone else's headphones fill with background noise. This is unpleasant and can even hurt the whole team's performance. In this scenario, noise reduction becomes a basic requirement for a good game experience.

Noise reduction in an Internet cafe is usually harder than in an ordinary noisy environment, because the noise itself is very different. It comes from many sources: people chatting and shouting, loud mouse and keyboard clicks, desks and chairs being moved, and so on. Some Internet cafes also play background music and voice announcements, much like barbershops. In addition, the seats are close together and almost everyone has neighbors right beside them, so these nearby sounds interfere with each other and are even more annoying.

Eliminating such complex noise is not simple. Noise in an Internet cafe is almost entirely non-stationary, so traditional noise suppression methods are hard to apply. Tencent Cloud Gaming Multimedia Engine (GME) has proposed a noise reduction solution for Internet cafes that minimizes the impact of noise on speech in complex environments.

How to achieve noise reduction in the complex Internet cafe environment?

In a noisy Internet cafe, the requirements for noise reduction are as follows: when a teammate is not speaking, nothing else should be heard; when a teammate speaks, the voice should come through clearly; and when the teammate stops speaking, the channel should fall silent immediately.

These requirements can be abstracted as call processing for a single speaker in a noisy environment. For an acceptable experience, a voice activity detection (VAD) algorithm that excludes every voice other than the speaker's is required. This VAD differs from conventional speech detection: it must reject not only non-speech sounds but also speech from anyone other than the speaker. Otherwise, the voices of people nearby, and even distant chatter, would still be sent to your headset.
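As a rough illustration of these requirements, the sketch below gates outgoing audio with per-frame decisions from such a detector: frames flagged as the target speaker pass through, everything else becomes silence. The function `gate_frames` and its inputs are hypothetical names used for illustration only; they are not part of GME.

```python
def gate_frames(frames, is_target_speech):
    """Keep frames flagged as the target speaker; replace the rest with silence."""
    if len(frames) != len(is_target_speech):
        raise ValueError("need exactly one decision per frame")
    gated = []
    for frame, keep in zip(frames, is_target_speech):
        # Pass the frame unchanged, or emit zeros of the same length.
        gated.append(list(frame) if keep else [0.0] * len(frame))
    return gated


# Three toy frames; only the middle one is flagged as the speaker's voice.
frames = [[0.2, -0.1], [0.5, 0.4], [0.05, 0.02]]
decisions = [False, True, False]
gated = gate_frames(frames, decisions)
```

The hard part is producing the per-frame decisions, which is what the rest of the article addresses.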

To address this, GME designed such a “VAD” algorithm around these requirements. The process is as follows:

To judge the nature of a sound, one step is to compute its correlation. The correlation measure is defined as follows:

$$E(\tau) = \sum_{n=0}^{N-1} \left[ s(n) - \beta\, s(n-\tau) \right]^2$$

where β is the gain factor and N is the analysis frame length. Setting ∂E(τ)/∂β = 0 gives:

$$\beta = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sum_{n=0}^{N-1} s^2(n-\tau)}$$
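Written out, the intermediate step is a standard least-squares minimization over β:

$$
\frac{\partial E(\tau)}{\partial \beta}
= -2 \sum_{n=0}^{N-1} s(n-\tau)\left[ s(n) - \beta\, s(n-\tau) \right] = 0
\quad\Longrightarrow\quad
\sum_{n=0}^{N-1} s(n)\, s(n-\tau) = \beta \sum_{n=0}^{N-1} s^2(n-\tau)
$$

Dividing both sides by the energy of the delayed frame yields the gain factor β.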

Substituting this β back gives:

$$E(\tau) = \sum_{n=0}^{N-1} s^2(n) - \frac{\left[ \sum_{n=0}^{N-1} s(n)\, s(n-\tau) \right]^2}{\sum_{n=0}^{N-1} s^2(n-\tau)}$$

The relative error energy is

$$R_E(\tau) = \frac{E(\tau)}{\sum_{n=0}^{N-1} s^2(n)} = 1 - \rho^2(\tau)$$

where

$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^2(n) \, \sum_{n=0}^{N-1} s^2(n-\tau)}}$$
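A minimal Python sketch of the normalized correlation ρ(τ) may help make the definition concrete. The function name, its arguments, and the 200 Hz test tone are assumptions chosen for illustration; the delayed samples s(n − τ) are read from the part of the signal just before the frame.

```python
import math


def normalized_correlation(signal, start, frame_len, lag):
    """Return rho(lag) for the frame signal[start:start + frame_len].

    The delayed samples s(n - lag) come from before the frame,
    so start must be at least lag.
    """
    if start < lag or start + frame_len > len(signal):
        raise ValueError("frame or delayed frame falls outside the signal")
    cross = 0.0       # sum of s(n) * s(n - lag)
    energy = 0.0      # sum of s(n)^2
    energy_lag = 0.0  # sum of s(n - lag)^2
    for n in range(start, start + frame_len):
        cur = signal[n]
        past = signal[n - lag]
        cross += cur * past
        energy += cur * cur
        energy_lag += past * past
    denom = math.sqrt(energy * energy_lag)
    return cross / denom if denom > 0.0 else 0.0


# A 200 Hz tone sampled at 8 kHz repeats every 40 samples.
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(400)]
rho_at_period = normalized_correlation(tone, 160, 160, 40)
rho_at_half_period = normalized_correlation(tone, 160, 160, 20)
```

For this tone, ρ is close to 1 at a lag of one full period and close to −1 at half a period, which is the behavior a pitch detector looks for.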

To obtain a reliable result, some preprocessing is needed:

1. Mean removal: when the analysis window contains a non-zero mean or very low-frequency noise, ρ(τ) is large for all τ. This is especially troublesome for low-level speech segments, which rely on ρ(τ) for voiced/unvoiced classification. The solution is to remove the mean:

$$s'(n) = s(n) - \frac{1}{N} \sum_{m=0}^{N-1} s(m)$$

2. Low-pass filtering: to reduce the influence of high-frequency formants and high-frequency noise, an 800 Hz low-pass filter is applied. It removes most of the formant influence while still retaining the first and second harmonics when the pitch frequency is as high as 500 Hz. The specifications are as follows:

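$$\frac{1}{T} = 8000\ \text{Hz},\quad \frac{\omega_c}{2\pi} = 800\ \text{Hz},\quad \frac{\omega_r}{2\pi} = 1200\ \text{Hz},\quad 1-\delta_1 = -0.25\ \text{dB},\quad \delta_2 = -50\ \text{dB}$$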

Accordingly, a fifth-order filter is designed with the bilinear transformation method. Its amplitude-frequency response is shown below:

3. Numerical filtering:

The low-pass filter above effectively removes the influence of the third and fourth formants, but the influence of the first two formants remains and blurs the periodicity of voiced speech. Numerical filtering is applied to remove it. Numerical filtering preserves the trend of the signal, such as a rising edge:

$$y(n) = \frac{1}{2K+1} \sum_{i=-K}^{K} x(n+i)$$

However, this is a non-causal system. Rewritten in causal form, it becomes:

$$y(n) = \frac{1}{2K+1} \sum_{i=0}^{2K} x(n-i)$$

Note that this process introduces algorithmic delay. Some parametric speech coders estimate the pitch period from the LPC residual, because the formant effect has been “whitened” out of the residual. However, pitch estimation is not the main purpose of LPC analysis, and LPC requires overlapped windowing plus solving the Yule-Walker equations or running the Burg iteration, so it is not used here.
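Steps 1 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the smoothing window is taken to be 2K + 1 samples wide, and samples before the start of the input are treated as zero. The 800 Hz low-pass filter of step 2 is left out because the article does not name the filter prototype.

```python
def remove_mean(frame):
    """Subtract the frame average from every sample."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def causal_moving_average(samples, k):
    """Average each sample with the 2k samples before it.

    Samples before the start of the input count as zero, so the first
    2k outputs are a warm-up. The result equals the centered
    (2k + 1)-point average delayed by k samples.
    """
    width = 2 * k + 1
    out = []
    running = 0.0
    for n, x in enumerate(samples):
        running += x
        if n >= width:
            # Drop the sample that just left the window.
            running -= samples[n - width]
        out.append(running / width)
    return out


frame = [5.0, 6.0, 7.0, 6.0, 5.0, 4.0, 3.0, 4.0]
centered = remove_mean(frame)

ramp = [float(n) for n in range(10)]
smoothed = causal_moving_average(ramp, 1)
```

On the ramp input, each output after the warm-up equals the input from one sample earlier, which makes the K-sample delay of the causal form visible.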

What we are ultimately interested in is a measure of the periodicity level, defined as follows:

$$Z_{\text{period}} = \frac{\rho_{\max 1} + \rho_{\max 2} + \rho_{\max 3}}{3} + \rho_{\max}$$

Even when the periodicity level meets the condition, the period must also fall within the pitch range of a voice signal. The pitch frequency of voice ranges from 60 Hz to 500 Hz; at 8 kHz sampling, the period, expressed in samples, is examined over the intervals [80, 147], [40, 79], and [20, 39]. When both the periodicity level and the period range are satisfied, we consider the sound to have voice characteristics.
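The two-part check can be sketched as follows. This is a simplified stand-in, not the article's exact rule: it uses the single largest correlation value instead of the periodicity measure defined above, and the threshold `min_rho`, the function names, and the toy correlation values are all hypothetical.

```python
SAMPLE_RATE_HZ = 8000
LAG_INTERVALS = [(20, 39), (40, 79), (80, 147)]  # lag intervals from the text, in samples


def strongest_lag(rho_by_lag):
    """Return (lag, rho) for the largest correlation among the given lags."""
    lag = max(rho_by_lag, key=rho_by_lag.get)
    return lag, rho_by_lag[lag]


def lag_interval(lag):
    """Return the index of the interval containing lag, or None."""
    for index, (low, high) in enumerate(LAG_INTERVALS):
        if low <= lag <= high:
            return index
    return None


def looks_like_voice(rho_by_lag, min_rho=0.6):
    """Hypothetical rule: a strong peak at a lag inside the pitch range.

    min_rho is an illustrative threshold, not a value from the article.
    """
    lag, rho = strongest_lag(rho_by_lag)
    return rho >= min_rho and lag_interval(lag) is not None


# Toy correlation values keyed by lag (made up for illustration).
voiced_like = {10: 0.1, 40: 0.9, 80: 0.85, 200: 0.2}
noise_like = {10: 0.2, 40: 0.25, 80: 0.1, 200: 0.3}

voiced_lag, _ = strongest_lag(voiced_like)
voiced_pitch_hz = SAMPLE_RATE_HZ / voiced_lag  # lag in samples -> frequency in Hz
```

A lag of 40 samples at 8 kHz corresponds to a 200 Hz pitch, which falls inside the voice range, while the noise-like example fails both the strength and the range test.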

Other stages, such as noise-floor envelope tracking and speaker loudness tracking, are not detailed here.

With this scheme, most of the noisy segments are removed in the time domain, as shown in the figure below:

The figure clearly shows that the noise is greatly suppressed while normal voice conversation between players is unaffected, so the requirements of the noisy Internet cafe environment are met.

After continued refinement, GME can now accurately detect the target speaker's voice in the complex Internet cafe environment and effectively remove ambient noise and other people's voices, so that voice chat among teammates is free of noise. GME is now officially available on Tencent Cloud, providing services to game vendors and developers.


This article has been authorized by the author for publication on Tencent Cloud+ Community.
