Disclaimer: This is an original article; do not reproduce without permission.

Audio visualization is a very “beautiful” topic. Its complexity depends heavily on the visual design (some examples), which in turn determines your technical stack, such as three.js or pixi.js.

Whichever rendering scheme you choose, the audio-signal processing part is the same. This article elaborates on that processing, hoping to cover the basics of audio (mistakes are hard to avoid given my limited expertise; corrections are welcome).

The first five parts are mainly theory and basic concepts; skip them if you are not interested.

  • GitHub address: sound-processor
  • Three examples (the audio loads slowly from GitHub, so give it a moment):
  1. Demo1;
  2. Demo2;
  3. Demo3.

1. What is sound?

Sound comes from vibration and propagates as sound waves. Countless hair cells in the human ear convert the vibration into electrical signals, which travel along the auditory nerve to the brain and form “sound” in our subjective consciousness. Once a sound wave enters the ear, different parts of the cochlea respond to it differently because of the cochlea’s special structure:

High-frequency sounds are perceived near the root of the cochlea, while low-frequency sounds are perceived near its apex. Our perception of different frequencies is therefore non-linear, which is the basis of the acoustic weighting discussed below.

2. Acoustic weighting

Acoustic weighting commonly includes frequency weighting and time weighting. Their purpose is to model the human ear’s nonlinear response to sound at different frequencies:

  • insensitive to low-frequency sound;
  • most sensitive between 1 and 5 kHz;
  • upper hearing limit between 15 and 20 kHz.

The auditory range of the human ear is shown in the figure below:

2.1 Frequency weighting

Frequency weighting acts on the frequency spectrum of the audio signal; the four common schemes are A, B, C, and D:

Among them, A-weighting is closest to subjective human perception: it attenuates the low- and high-frequency regions where the ear is insensitive. A-weighting is therefore the right choice for audio visualization; see the Wikipedia article for details. A sketch of the standard curve is shown below.
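For illustration, here is a minimal sketch of the standard A-weighting curve (the closed-form approximation from IEC 61672; libraries such as audiojs/a-weighting implement the same formula). It is illustrative, not sound-processor’s internal code:

```js
// A-weighting gain in dB for a frequency f in Hz (standard closed form).
// The result is ~0 dB at 1 kHz and strongly negative at low frequencies.
function aWeighting(f) {
  const f2 = f * f;
  const c1 = 20.6 ** 2;
  const c2 = 107.7 ** 2;
  const c3 = 737.9 ** 2;
  const c4 = 12194 ** 2;
  const r =
    (c4 * f2 * f2) /
    ((f2 + c1) * Math.sqrt((f2 + c2) * (f2 + c3)) * (f2 + c4));
  return 20 * Math.log10(r) + 2.0;
}

aWeighting(1000); // ≈ 0
aWeighting(100);  // ≈ -19 (the ear is insensitive here, so the signal is attenuated)
```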

2.2 Time weighting

In reality sound is continuous, and our perception of it is an accumulated result over time (imagine: the eardrum is still vibrating from the first wave when the second arrives, so the actual motion of the eardrum is the accumulation of sound waves over time). Time weighting averages the continuous sound over a time window: for quickly changing signals an interval of 125 ms (“fast”) is used for the average, and for slowly changing signals an interval of 1000 ms (“slow”). A sketch follows.
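Time weighting is commonly implemented as exponential averaging with the chosen time constant. A minimal sketch (an assumption about the implementation, not sound-processor’s internal code), where `tau` is 0.125 for “fast” or 1.0 for “slow” and `dt` is the seconds elapsed per frame:

```js
// Returns a function that blends each new spectrum frame into a running
// average; the weight of the history decays with time constant `tau`.
function makeTimeWeighting(tau, dt) {
  const alpha = Math.exp(-dt / tau); // closer to 1 = smoother
  let avg = null;
  return (frame) => {
    if (avg === null) avg = Array.from(frame);
    for (let i = 0; i < frame.length; i++) {
      avg[i] = alpha * avg[i] + (1 - alpha) * frame[i];
    }
    return avg;
  };
}

// e.g. a 60 fps visualization using "fast" (125 ms) weighting:
const smooth = makeTimeWeighting(0.125, 1 / 60);
```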

3. Sound measurement

Sound pressure level (SPL) is the most commonly used physical quantity in sound measurement. The audible sound pressure range of the human ear is 2×10⁻⁵ Pa to 20 Pa, corresponding to a sound pressure level range of 0 to 120 dB.

Figure: sound pressures of common sounds

Sound pressure is usually measured in decibels. Note that the decibel is only a relative measure: it expresses the logarithmic ratio of a measured value to a reference value.

Definition of sound pressure level:

SPL = 20 · log₁₀(P / Pref) dB

where P is the measured sound pressure amplitude and Pref is the minimum sound pressure the ear can hear at 1000 Hz: 20 µPa.
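As a quick sanity check, the formula can be evaluated directly:

```js
// Sound pressure level (dB) from sound pressure in pascals.
const P_REF = 2e-5; // 20 µPa, the hearing threshold at 1 kHz

const spl = (p) => 20 * Math.log10(p / P_REF);

spl(2e-5); // 0 dB   -> threshold of hearing
spl(20);   // 120 dB -> roughly the threshold of pain
```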

4. Octave bands

First of all, a continuous signal contains a huge amount of data, and we do not need to process all of it. We therefore sample the signal and divide the continuous frequency axis into intervals for analysis. A frequency band is a range of frequencies, and octave banding is one scheme for dividing them: the ratio of the upper frequency fu to the lower frequency fl of each band is constant:

fu = 2^(1/N) · fl

See this article, “What is octave?”

When N equals 1 the band is one octave (an “octave band”); when N equals 2 it is a 1/2 octave. Once the bands are divided, taking the mean square of the spectrum values that fall within each band yields the octave power spectrum, as sketched below.
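In code this amounts to averaging the squared spectrum values whose bin frequencies fall inside each band. A sketch (illustrative, not sound-processor’s internal code):

```js
// `spectrum` holds linear amplitudes per FFT bin;
// binWidth = sampleRate / fftSize (Hz per bin).
function bandPower(spectrum, binWidth, lowerFrequency, upperFrequency) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < spectrum.length; i++) {
    const f = i * binWidth;
    if (f >= lowerFrequency && f < upperFrequency) {
      sum += spectrum[i] * spectrum[i]; // mean square = power
      count++;
    }
  }
  return count > 0 ? sum / count : 0;
}
```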

5. WebAudio audio processing

Audio visualization on the Web cannot be done without the Web Audio API, the most important method being getByteFrequencyData (documentation), which returns the frequency-domain signal converted from the time-domain signal. The detailed process is as follows:

  1. Obtain the original time-domain signal;
  2. Apply a Blackman window to it, which compensates for the signal distortion and energy leakage caused by the DFT;
  3. Transform the time domain into the frequency domain with a Fast Fourier Transform;
  4. Smooth over time, i.e. take a weighted average of the signal across frames (WebAudio uses only the last 2 frames, controlled by smoothingTimeConstant);
  5. Convert to dB using the sound pressure formula above;
  6. Normalize. WebAudio normalizes each dB value into a byte as 255 · (dB − minDecibels) / (maxDecibels − minDecibels), clamped to [0, 255]. A usage sketch follows the list.
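A minimal setup looks like this (a sketch assuming an `<audio>` element on the page; all six steps above happen inside getByteFrequencyData):

```js
const audioCtx = new AudioContext();
const audioEl = document.querySelector("audio");
const source = audioCtx.createMediaElementSource(audioEl);

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 1024;
analyser.smoothingTimeConstant = 0.8; // step 4: weight given to the previous frame

source.connect(analyser);
analyser.connect(audioCtx.destination);

// frequencyBinCount is fftSize / 2; each byte is a normalized dB value
const data = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(data); // steps 1-6 happen inside this call
```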

6. Signal processing scheme in audio visualization

Combining the above, a reasonable processing pipeline is as follows:

6.1 Filtering

One might ask: getByteFrequencyData already applies a window function internally, so why filter again?

Because the window function inside WebAudio mainly compensates for signal distortion and energy leakage, and its parameters are fixed. In audio visualization, visual perception often takes precedence over data accuracy, so we add a Gaussian filter to remove spikes and smooth the signal; the degree of smoothing can be freely controlled through its parameters, as sketched below.
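A minimal sketch of such a 1D Gaussian smoother (the `sigma` and `radius` parameters mirror the filterParams option described later, but this is not sound-processor’s internal code):

```js
// Smooth an array by convolving it with a normalized Gaussian kernel.
function gaussianSmooth(data, sigma = 1, radius = 2) {
  // Build the kernel and normalize it so that it sums to 1.
  const kernel = [];
  let sum = 0;
  for (let x = -radius; x <= radius; x++) {
    const w = Math.exp(-(x * x) / (2 * sigma * sigma));
    kernel.push(w);
    sum += w;
  }
  const k = kernel.map((w) => w / sum);

  // Convolve, clamping indices at the array edges.
  return Array.from(data, (_, i) => {
    let acc = 0;
    for (let j = -radius; j <= radius; j++) {
      const idx = Math.min(data.length - 1, Math.max(0, i + j));
      acc += data[idx] * k[j + radius];
    }
    return acc;
  });
}
```

A larger sigma spreads the kernel’s weight toward neighboring bins, which is why the smoothing effect grows with sigma.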

6.2 Weighting

The visual presentation should correspond to subjective human hearing, so weighting is necessary; for a JavaScript A-weighting implementation see audiojs/a-weighting. We also provide optional time weighting, which internally averages the last 5 frames of data.

6.3 Frequency division

We divide the bands automatically from the given lower and upper frequency bounds and the desired number of output bands. The core code:

```js
// Inputs: startFrequency, endFrequency, outBandsQty.
// Determine the octave fraction from the band count:
// fu = 2^(1/N) * fl  =>  1/N = log2(endFrequency / startFrequency) / outBandsQty
const bands = [];
let n = Math.log2(endFrequency / startFrequency) / outBandsQty;
n = Math.pow(2, n); // n = 2^(1/N)

const nextBand = {
    lowerFrequency: Math.max(startFrequency, 0),
    upperFrequency: 0
};

for (let i = 0; i < outBandsQty; i++) {
    // The upper frequency of each band is 2^(1/N) times its lower frequency
    const upperFrequency = nextBand.lowerFrequency * n;
    nextBand.upperFrequency = Math.min(upperFrequency, endFrequency);

    bands.push({
        lowerFrequency: nextBand.lowerFrequency,
        upperFrequency: nextBand.upperFrequency
    });
    nextBand.lowerFrequency = upperFrequency;
}
```

7. sound-processor

sound-processor is a small (< 3 KB gzipped) library for processing audio signals. As the bottom layer of an audio visualization, it processes the raw audio signal in a relatively principled way and outputs a signal consistent with subjective human hearing. The internal processing flow is as follows:

7.1 Installation

```bash
npm install sound-processor
```

7.2 Usage

```js
import { SoundProcessor } from "sound-processor";

const processor = new SoundProcessor(options);

// `analyser` is an AnalyserNode; `input` is the original signal
// (note: `in` is a reserved word in JavaScript, so don't use it as a name)
const input = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(input);
const out = processor.process(input);
```

7.3 Options

  • filterParams: filter parameters, an object; default undefined, meaning no filtering:
    • sigma: sigma of the Gaussian distribution; default 1 (the standard normal distribution). The larger sigma is, the stronger the smoothing; valid range is 0.1 to 250;
    • radius: filter radius; default 2;
  • sampleRate: sampling rate, which can be taken from the WebAudio context (audioContext.sampleRate), typically 48000;
  • fftSize: Fourier transform parameter; default 1024;
  • startFrequency: start frequency; default 0;
  • endFrequency: cut-off frequency; default 10000. Together with startFrequency it selects an arbitrary frequency band;
  • outBandsQty: number of output bands, corresponding to the number of visual objects; default is half of fftSize;
  • tWeight: whether to enable time weighting; default false;
  • aWeight: whether to enable A-weighting; default true. A combined example is shown below.
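Putting the options together (the values here are illustrative, not recommendations):

```js
// `audioContext` is assumed to be an existing AudioContext.
const processor = new SoundProcessor({
  filterParams: { sigma: 1, radius: 2 },
  sampleRate: audioContext.sampleRate, // typically 48000
  fftSize: 1024,
  startFrequency: 100,
  endFrequency: 7000,
  outBandsQty: 64, // e.g. 64 bars in the visualization
  tWeight: true,
  aWeight: true,
});
```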

7.4 Frequency truncation

Generally the frequency range of music lies between 50 and 10000 Hz, but in practice it can be narrower, say 100 to 7000 Hz. It is hard to find a single perfect range for all styles and instruments, and different visual styles may also call for different frequency ranges.


Reference material

  • Why acoustic weighting
  • A-weighting
  • What is sound pressure level?
  • What is octave?
  • AnalyserNode.getByteFrequencyData
  • Implementing iOS audio spectrum animation step by step
  • One-dimensional Gaussian distribution