Today I learned the basics of audio. These are my notes for future reference.

I. The nature of sound

Sound is, in essence, how the human ear perceives the vibration of objects transmitted through the air. When an object vibrates, it sets the surrounding air molecules vibrating, and those molecules in turn push on their neighbors, so the disturbance spreads outward in the form of a wave. That’s probably where the term “sound wave” came from.

The greater the amplitude of the object’s vibration, the louder the sound and the farther it carries before fading; the faster the object vibrates, the higher the frequency of the sound wave. Recall the three elements of sound from junior high school physics: loudness, pitch, and timbre. A loud sound has a large amplitude, so it can travel a long way and still have decayed only a little when it reaches our ears. A high-pitched sound is one whose source vibrates at a high frequency. The wave itself does not travel any faster, because the speed of sound in air is essentially the same for all pitches; what changes is how many vibrations reach the ear per second.
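As a rough illustration of how frequency relates to the wave itself, here is a minimal sketch. It assumes sound travels at about 343 m/s in air (an approximate value for dry air near 20 °C) regardless of pitch; the function names are made up for this note.

```python
# Approximate speed of sound in dry air at about 20 degrees C (assumption).
SPEED_OF_SOUND_M_S = 343.0

def period_s(freq_hz):
    """Time taken by one full vibration, in seconds."""
    return 1.0 / freq_hz

def wavelength_m(freq_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance the wave travels during one vibration, in meters."""
    return speed_m_s / freq_hz

# A higher frequency means a shorter period and a shorter wavelength,
# while the propagation speed stays the same.
for freq in (110.0, 440.0, 1760.0):
    print(freq, period_s(freq), wavelength_m(freq))
```

A higher pitch therefore shows up as more vibrations per second, not as a faster-moving wave.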

II. The nature of audio and related terminology

Audio is, in essence, a recorded representation of sounds in nature.

A sound in nature can be represented as an analog signal: a smooth, continuous curve of amplitude over time.

But when we want to store this sound in a computer, we can’t record every point in time. We can only take a finite number of samples, that is, convert the analog signal into a digital signal. This process is called A/D (analog-to-digital) conversion.

For example, we divide one second of sound into 44100 pieces, so that each time slice is 1/44100 s long. We record one amplitude value for each time slice, generally the value at the midpoint of the slice.

The more time slices each second is divided into, the more faithfully the analog signal can be reconstructed. The number of time slices per second is called the sampling rate. The 44100 time slices mentioned above correspond to a sampling rate commonly used in audio and video: 44100 Hz.
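To make the idea of sampling concrete, here is a minimal sketch that samples a pure tone at 44100 Hz. The "analog" signal is stood in for by a sine function, and the name `sample_sine` and its parameters are my own for illustration. For simplicity it takes the value at the start of each time slice rather than the midpoint.

```python
import math

SAMPLE_RATE = 44100  # samples per second (Hz)

def sample_sine(freq_hz, duration_s, amplitude=1.0, sample_rate=SAMPLE_RATE):
    """Sample a sine wave at evenly spaced instants.

    Returns a list of floats in [-amplitude, amplitude], one per time slice.
    """
    n = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]

samples = sample_sine(440.0, 1.0)
print(len(samples))     # number of samples taken in one second
print(1 / SAMPLE_RATE)  # length of one time slice, in seconds
```

Raising `sample_rate` gives more points per second, so the list follows the original curve more closely.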

The blue line below is a digital signal.

Since we need to record a value for each sample, what data type should we use to store it? Audio samples used to be stored as 8-bit integers, but most are now stored as 16-bit signed integers. There are also 24-bit and 32-bit formats. The number of bits used to store each sample is called the sample size (also known as bit depth).
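Here is a minimal sketch of storing samples as 16-bit signed integers. It assumes the input samples are floats normalized to the range [-1.0, 1.0], and it scales by 32767, which is one common convention rather than the only one. The function names are hypothetical.

```python
import struct

def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))         # clip anything out of range
        out.append(int(round(s * 32767)))  # scale to the 16-bit range
    return out

def pack_16bit_le(ints):
    """Pack integers as little-endian signed 16-bit values (2 bytes each)."""
    return struct.pack("<%dh" % len(ints), *ints)

pcm = quantize_16bit([0.0, 1.0, -1.0, 2.0])
data = pack_16bit_le(pcm)
print(pcm)
print(len(data))  # two bytes per sample
```

A larger sample size gives finer amplitude steps at the cost of more bytes per sample.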

Another audio term is the number of channels: mono (one channel), stereo (two channels), and so on. My understanding of two-channel audio is that playback simulates two sound sources, so that the listener perceives the sound as more three-dimensional.
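Sampling rate, sample size, and number of channels together determine how much raw data the audio takes up. The sketch below assumes uncompressed PCM and also shows one common way of laying out two channels, alternating left and right samples. The function names are made up for this note.

```python
def pcm_bytes_per_second(sample_rate, bits_per_sample, channels):
    """Raw data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate * (bits_per_sample // 8) * channels

def interleave(left, right):
    """Merge two channels into one sequence: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    frames = []
    for l, r in zip(left, right):
        frames.extend((l, r))
    return frames

print(pcm_bytes_per_second(44100, 16, 2))  # 176400
print(interleave([1, 2, 3], [10, 20, 30]))  # [1, 10, 2, 20, 3, 30]
```

So 44100 Hz, 16-bit, two-channel audio takes 176400 bytes for every second of sound before any compression.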

Image credit: developer.mozilla.org