As the Internet has developed, the way data is transmitted keeps evolving. The popularity of Douyin in particular has put audio and video in the spotlight, and audio/video technology keeps expanding into industry after industry: e-commerce, remote teaching, face recognition in transportation, telemedicine, and so on. Audio and video have become a very important direction, and I hope to record my own learning process by writing this blog series.

This is the first in a series on iOS Audio Playback, which gives an overview of how to implement audio playback on iOS.

Basics

Let’s take a quick look at some audio basics.

Audio playback on a computer depends on audio files. An audio file is produced by sampling, quantizing, and encoding sound into a digital signal. The human ear can hear sounds from roughly 20 Hz up to 20 kHz, so 20 kHz is the maximum bandwidth an audio file format needs to cover. According to the Nyquist theorem, the original sound can only be faithfully reconstructed from a digital signal when the sampling frequency is more than twice the highest frequency in the sound signal. Audio files therefore generally use sampling rates in the 40–50 kHz range; the most common example is CD quality at 44.1 kHz.

Representing sound by sampling and quantizing it is called Pulse Code Modulation (PCM). PCM data is the raw, completely lossless audio data, so it has excellent sound quality but takes up an enormous amount of space. To solve this problem a whole series of audio formats has been developed, each compressing the audio data in a different way; they fall into two categories, lossless compression (ALAC, APE, FLAC) and lossy compression (MP3, AAC, OGG, WMA).
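
To get a feel for how large raw PCM really is, here is a quick back-of-the-envelope calculation, assuming CD-quality parameters (44.1 kHz sampling rate, 16 bits per sample, 2 channels):

```swift
// Back-of-the-envelope size of raw PCM at CD quality
// (44.1 kHz sampling rate, 16 bits per sample, 2 channels).
let sampleRate = 44_100.0       // samples per second, per channel
let bitsPerSample = 16.0
let channels = 2.0

let bitsPerSecond = sampleRate * bitsPerSample * channels  // ≈ 1,411,200 bit/s ≈ 1.4 Mbit/s
let bytesPerMinute = bitsPerSecond / 8 * 60                // ≈ 10.6 MB per minute of audio

print("PCM bit rate: \(bitsPerSecond / 1000) kbit/s")
print("PCM data per minute: \(bytesPerMinute / 1_000_000) MB")
```

At a typical MP3 bit rate of 128 kbit/s, the same minute of audio takes roughly 1 MB, which is exactly why compressed formats exist.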

The most widely used audio format today is MP3, a lossy format designed to drastically reduce the amount of audio data by discarding the parts of the PCM data that human hearing is least sensitive to. The comparison chart below makes it clear that the MP3 data is much smaller than the corresponding PCM data (image from the imp3 forum).

The bit rate (BitRate) of an MP3 file indicates its compression quality. Commonly used bit rates are 128 kbit/s, 160 kbit/s, 320 kbit/s, and so on; the higher the value, the better the sound quality. There are two common MP3 encoding modes: constant bit rate (CBR) and variable bit rate (VBR).

MP3 data usually consists of two parts: the ID3 tags, which store information such as the song title, artist, album, and track number, and the audio data itself. The audio data is stored frame by frame, and each frame has its own header; the figure shows the frame structure of an MP3 file (picture also from the Internet). The frame header stores the sampling rate and the other information needed for decoding, so every frame can exist and be played independently of the rest of the file. This property, combined with the high compression ratio, is what made MP3 the mainstream format for streaming audio. After the header comes the audio data, which is the result of compressing several PCM data frames with the MP3 compression algorithm. For CBR MP3 data the number of PCM frames contained in each MP3 frame is fixed, while for VBR it varies.
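
To make the frame structure a little more concrete, here is a small sketch using the commonly quoted numbers for MPEG-1 Layer III: each frame decodes to 1152 PCM samples, and for CBR data the frame length in bytes is roughly 144 × bit rate ÷ sampling rate, plus an optional padding byte. The 128 kbit/s and 44.1 kHz values below are assumed purely for illustration:

```swift
// Rough MP3 (MPEG-1 Layer III) frame figures, assuming CBR data
// at 128 kbit/s with a 44.1 kHz sampling rate.
let samplesPerFrame = 1152.0   // PCM samples produced by decoding one MP3 frame
let sampleRate = 44_100.0
let bitRate = 128_000.0

let frameDuration = samplesPerFrame / sampleRate                 // ≈ 0.026 s of audio per frame
let frameLength = (144.0 * bitRate / sampleRate).rounded(.down)  // ≈ 417 bytes (+1 when padded)

print("One frame ≈ \(frameDuration * 1000) ms of audio, ≈ \(frameLength) bytes")
```

For VBR data the bit rate, and therefore the frame length, changes from frame to frame, which is one more reason every frame needs to carry its own header.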

Overview of iOS Audio Playback

Now that we have covered the basics, we can lay out a classic audio playback flow (using MP3 as an example):

  • 1. Read the MP3 file;
  • 2. Parse the sampling rate, bit rate, duration, and other information, and separate the audio frames in the MP3 data;
  • 3. Decode the separated audio frames to obtain PCM data;
  • 4. Apply audio effects to the PCM data (equalizer, reverb, etc.; this step is optional);
  • 5. Convert the PCM data into an audio signal;
  • 6. Hand the audio signal to the hardware for playback;
  • 7. Repeat steps 1–6 until playback is complete.

On iOS, Apple encapsulates the above process and provides interfaces at different levels (image from the official documentation).

The roles of the mid- and high-level interfaces are as follows:

  • Audio File Services: reads and writes audio data; can complete step 2 of the playback flow;
  • Audio File Stream Services: parses streamed audio data; can complete step 2 of the playback flow;
  • Audio Converter Services: converts audio data (for example, decoding it to PCM); can complete step 3 of the playback flow;
  • Audio Unit Services: plays audio data; can complete steps 5 and 6 of the playback flow;
  • Extended Audio File Services: a combination of Audio File Services and Audio Converter Services;
  • AVAudioPlayer (AVFoundation): a high-level interface that covers the entire playback process (for both local files and network streams), except for step 4;
  • Audio Queue Services: a high-level interface for recording and playback; can complete steps 3, 5, and 6 of the playback flow;
  • OpenAL: used for game audio playback; not covered in this series.

As you can see, Apple provides a very rich set of interfaces that can satisfy all kinds of needs:

  • If you simply want to play audio and have no other requirements, AVFoundation (AVAudioPlayer) will meet your needs. Its interface is easy to use and you don’t have to worry about the details (see the short AVAudioPlayer sketch after this list);
  • If your app needs to stream audio and store it at the same time, AudioFileStreamer plus AudioQueue can help you. You can download the audio data to local storage and, while it downloads, read the local file with NSFileHandler and hand the data to AudioFileStreamer or AudioFile to parse and separate the audio frames; the separated frames can then be sent to AudioQueue for decoding and playback (see the parsing sketch after this list). If it is a local file, simply read and parse it directly. (Both of these approaches are fairly direct. You can also cover this kind of requirement with AVFoundation plus a local server: AVAudioPlayer sends its request to the local server, the local server forwards it, and the data that comes back is stored on the local server and passed on to AVAudioPlayer.);
  • If you are developing professional music-player software and need to apply effects to the audio (equalizer, reverb), then in addition to reading and parsing the data you also need AudioConverter to convert the audio data into PCM data, which is then processed and played with AudioUnit + AUGraph. (In practice, most apps with audio effects develop their own effects module to process the PCM data, since a self-developed module offers more room for customization and extension. Once the PCM data has been processed by the effects module, it can be played with AudioUnit, and of course AudioQueue can also play PCM data directly.) The following figure describes the steps of audio playback using AudioFile + AudioConverter + AudioUnit (picture taken from the official documentation).
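
For the first scenario above, plain local-file playback with AVFoundation really does come down to a few lines. Here is a minimal sketch (the file name "song.mp3" is only a placeholder, and error handling is reduced to a print):

```swift
import AVFoundation

// Minimal local-file playback with AVAudioPlayer.
// Keep the player in a property so it is not deallocated while playing.
final class SimplePlayer {
    private var player: AVAudioPlayer?

    func play(fileURL: URL) {
        do {
            // AVAudioPlayer handles reading, parsing, decoding, and playback internally.
            player = try AVAudioPlayer(contentsOf: fileURL)
            player?.prepareToPlay()
            player?.play()
        } catch {
            print("Failed to start playback: \(error)")
        }
    }
}

// Usage (assuming a "song.mp3" bundled with the app):
// let simplePlayer = SimplePlayer()
// if let url = Bundle.main.url(forResource: "song", withExtension: "mp3") {
//     simplePlayer.play(fileURL: url)
// }
```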
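
For the streaming scenario, the parsing half of the AudioFileStreamer + AudioQueue combination looks roughly like the sketch below: raw bytes (read with NSFileHandler or received from the network) are pushed into Audio File Stream Services, which calls back with the parsed properties and the separated audio packets. This is only a skeleton of the parsing step; creating the Audio Queue, enqueueing the packets, and all error handling are left out.

```swift
import AudioToolbox
import Foundation

// Skeleton of the parsing step: feed raw MP3 bytes to Audio File Stream Services
// and receive separated packets through its C callbacks. Handing the packets to
// an Audio Queue for decoding/playback is omitted here.
var streamID: AudioFileStreamID?

// Called whenever a property (data format, bit rate, ...) has been parsed.
let propertyListener: AudioFileStream_PropertyListenerProc = { _, stream, propertyID, _ in
    if propertyID == kAudioFileStreamProperty_DataFormat {
        var format = AudioStreamBasicDescription()
        var size = UInt32(MemoryLayout<AudioStreamBasicDescription>.size)
        AudioFileStreamGetProperty(stream, propertyID, &size, &format)
        print("Parsed sample rate: \(format.mSampleRate)")
    }
}

// Called whenever complete audio packets (frames) have been separated.
let packetsProc: AudioFileStream_PacketsProc = { _, numberBytes, numberPackets, _, _ in
    print("Separated \(numberPackets) packets (\(numberBytes) bytes); hand these to an AudioQueue")
}

AudioFileStreamOpen(nil, propertyListener, packetsProc, kAudioFileMP3Type, &streamID)

// Feed chunks of data (from NSFileHandler or the network) as they arrive.
func feed(_ data: Data, into stream: AudioFileStreamID) {
    _ = data.withUnsafeBytes { buffer in
        AudioFileStreamParseBytes(stream, UInt32(buffer.count), buffer.baseAddress, [])
    }
}
```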