
(The outline of this article: what is the definition of sound → what are the characteristics of sound → how to describe sound mathematically → how to digitize sound → what is digital audio data)

“Sound” is a physical phenomenon we are all intimately familiar with: we produce it when we sing, perceive it with our ears, and record and share it with our phones. As audio and video developers, we also handle a great deal of sound data in our work. But how well do you really understand sound?

If you’re feeling confident and thinking “Sure,” try answering this question: what happens on the way from the “sound” we hear to the “audio data” we process on our phones and computers? If you’re curious, read on to see how the audio data handled in everyday development work evolved from a physical phenomenon. This discussion may not be immediately practical, but it should be interesting.

There are at least two major cognitive processes involved in this question: 1) defining an everyday phenomenon physically, exploring its features, discovering its laws, and describing it mathematically through scientific research; 2) digitizing that physical phenomenon through information processing.

Viewed from this perspective, the problem can be broken down into the following sub-questions:

  • What is the definition of sound? We need to delimit the scope of a physical phenomenon before we can study it.
  • What are the characteristics of sound? Identifying its features helps us describe it accurately and study it systematically.
  • How do we describe sound mathematically? Mathematics is the best language for describing physical phenomena and exploring physical laws, and the mathematical description of a phenomenon is also the basis for digitizing it.
  • How do we digitize sound? Digitization is the bridge from the physical world to the information world.
  • What is digital audio data? Once sound is digitized, we obtain digital audio data that can be processed, stored, or transmitted.

1. What is the definition of sound?

“Sound” is a wave phenomenon: sound waves produced by vibration, transmitted through a medium (gas, liquid, or solid), that can be perceived by the auditory organs of humans or animals.

This definition frames sound as a wave phenomenon, so we can study it within the scope of the physical concept of a “wave”. Of course, if new discoveries emerge from research, the old understanding can be overturned and the definition revised.

2. What are the features of sound?

To extract the features of sound, we first need to perceive it, and the human auditory system is a complex one, as shown in the figure below. How does it perceive sound? In short, sound is a mechanical wave that travels through the air to the ear, where it is converted into neural action potentials; nerve impulses then travel to the brain, where sound is perceived. We won’t go into the details here.

The features of sound were recognized and extracted gradually as we perceived it and studied its phenomena. For example, we can easily tell whether a sound is loud or soft, sharp or deep, and we can recognize different speakers even when their voices are similar. By summarizing these perceptions, we extract the characteristics of sound.

We now know these characteristics as the “three elements of sound”:

  • Loudness: how strong or weak a sound is, i.e. its volume.
  • Pitch: how high or low a sound is, determined by its frequency.
  • Timbre: the quality that distinguishes two sounds even when they have the same loudness and pitch.

Based on these characteristics, we can also discover related laws and causal relationships, and visualize them through various means.

The sound we usually hear, for example, is the result of vibrations causing air molecules to form alternating regions of compression and rarefaction at a certain frequency.

If we pick a single point and measure how the pressure at that point changes over time, with time on the horizontal axis and pressure on the vertical axis, we get a waveform like the following:

The further the pressure deviates from the ambient value, the more intense the vibration, so a waveform with larger amplitude means a louder sound, i.e. greater loudness. The tighter the waveform, the more vibrations per unit time, the higher the frequency, and thus the higher the pitch.
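This mapping (amplitude → loudness, frequency → pitch) can be sketched numerically. Below is a minimal NumPy illustration; the sample rate, frequencies, and helper names are assumptions chosen for demonstration, not from the original article:

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
DURATION = 1.0      # seconds

def sine_wave(freq_hz, amplitude, sample_rate=SAMPLE_RATE, duration=DURATION):
    """A sampled sine wave: amplitude maps to loudness, freq_hz to pitch."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

quiet_low = sine_wave(220.0, 0.2)   # quieter, lower pitch (A3)
loud_high = sine_wave(880.0, 0.8)   # louder, higher pitch (A5)

# Larger amplitude -> larger peak pressure deviation -> greater loudness.
print(quiet_low.max(), loud_high.max())

def zero_crossings(x):
    """Count sign changes: more crossings per second means higher frequency."""
    signs = np.signbit(x).astype(int)
    return int(np.abs(np.diff(signs)).sum())

# The 880 Hz wave crosses zero about four times as often as the 220 Hz wave.
print(zero_crossings(quiet_low), zero_crossings(loud_high))
```

A “tighter” waveform in the text corresponds here to more zero crossings per unit time.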

For a single-frequency vibration like the one above, it is easy to read information about the sound from the waveform. In reality, however, the sound we hear is usually a combination of complex vibrations, like this one:

From this waveform it’s hard to read any useful information about the sound, because the waveforms of the individual frequencies are stacked on top of each other. This is where a spectrum diagram comes in.

Where does the spectrum come from? Take a look at the following image:

A waveform can be obtained by superimposing simple sine waves of multiple frequencies, each with its own amplitude and phase. The horizontal axis of the waveform is time and the vertical axis is amplitude; it represents how the total amplitude of all the superimposed sine waves changes over time.
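The superposition just described can be sketched in a few lines of NumPy; the component frequencies, amplitudes, and phases below are hypothetical values chosen purely for illustration:

```python
import numpy as np

sample_rate = 8000                        # assumed sample rate
t = np.arange(sample_rate) / sample_rate  # 1 second of time points

# Three sine components: (frequency in Hz, amplitude, phase) — illustrative values.
components = [
    (440.0, 1.0, 0.0),
    (880.0, 0.5, np.pi / 4),
    (1320.0, 0.25, np.pi / 2),
]

# Superimpose the components into one composite waveform.
composite = sum(a * np.sin(2 * np.pi * f * t + p) for f, a, p in components)

# The composite is a single time-domain signal: total amplitude vs. time.
print(composite.shape)  # → (8000,)
```

The composite’s instantaneous value at each moment is simply the sum of the component sines at that moment, which is why the individual frequencies are no longer visible by eye.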

Applying the Fourier transform to the composite wave decomposes it back into individual sine waves. Imagine stretching the two-dimensional waveform out of the page into a three-dimensional model, where the new axis pointing out of the page is frequency: each point along the frequency axis now corresponds to a sine wave with its own amplitude and phase.

The spectrum diagram is the graph formed by slicing this three-dimensional model perpendicular to the time axis, with frequency on the horizontal axis and amplitude on the vertical axis. It represents the amplitude distribution of the sine waves at each frequency at a single instant in time.
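The decomposition described above can be demonstrated with NumPy’s real FFT. The test signal (two sine components) and the peak-detection threshold below are assumptions for illustration:

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # exactly 1 second

# Composite of two sine waves (hypothetical frequencies and amplitudes).
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Real FFT: decompose the time-domain waveform into frequency components.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
amplitudes = np.abs(spectrum) * 2 / len(signal)  # rescale to sine amplitudes

# The spectrum peaks sit exactly at the component frequencies.
peaks = freqs[amplitudes > 0.1]
print(peaks)  # → [ 440. 1000.]
```

Because the signal is exactly one second long, the FFT bins fall on integer frequencies, so the two components are recovered with their original amplitudes (1.0 and 0.5).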

The waveform helps us check the overall level of the music; in mixing it often reveals dynamics and loudness issues and can guide the adjustment of compressors and limiters. The spectrum diagram helps us locate how musical detail is distributed across frequency bands, and can assist in adjusting filters and equalizers during mixing.

Below is an example of a sound waveform (top) and spectrum (bottom):

The waveform plot is relatively simple: the horizontal axis is time, the vertical axis is loudness, and the left and right channels are shown separately.

The spectrum plot here, however, is a bit more complicated than what we described above: it is a three-dimensional spectrogram, in which the horizontal axis is time, the vertical axis is frequency (labeled by tone, e.g. A5(880) corresponds to 880 Hz), and color brightness represents loudness. A spectrogram therefore carries more information than a waveform. Its only drawback is that it cannot show the overall volume, so it is generally used together with the waveform to identify sound features.
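A spectrogram of this kind can be sketched with a minimal short-time Fourier transform in NumPy: slice the signal into windows, take the spectrum of each slice, and stack the slices along the time axis. The test signal, window size, and hop size below are illustrative assumptions, not from the original article:

```python
import numpy as np

sample_rate = 8000
duration = 2.0
t = np.arange(int(sample_rate * duration)) / sample_rate

# A tone that jumps from 440 Hz to 880 Hz halfway through (hypothetical signal).
signal = np.where(t < 1.0,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))

# Minimal short-time Fourier transform: windowed slices, one spectrum each.
win = 1024
hop = 512
frames = [signal[i:i + win] * np.hanning(win)
          for i in range(0, len(signal) - win, hop)]
spectrogram = np.abs(np.fft.rfft(frames, axis=1))  # shape: (time, frequency)

freqs = np.fft.rfftfreq(win, d=1 / sample_rate)
# Dominant frequency of the first and last time slices:
print(freqs[spectrogram[0].argmax()], freqs[spectrogram[-1].argmax()])
```

Reading the result along the time axis shows the pitch change that a plain waveform would hide, which is exactly why a spectrogram pairs time with frequency.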

(Through the discussion above, we know that sound is a wave phenomenon with characteristics such as loudness, pitch, and timbre, and we have had a first look at two auxiliary tools for studying sound: the waveform diagram and the spectrum diagram. We’ll cover the mathematical description of sound later, so stay tuned.)