This is the sixth installment of the FFmpeg player development learning notes: "FFmpeg Decodes Audio, Plays with AudioQueue".

The audio data FFmpeg decodes is PCM (Pulse Code Modulation): a raw stream of uncompressed audio samples, produced by sampling, quantizing, and encoding an analog signal into standard digital audio. What is commonly called "lossless audio" usually refers to 16-bit/44.1 kHz PCM, the traditional CD format. It is considered lossless because its frequency response of 20 Hz–22.05 kHz completely covers the range of human hearing. Of course, PCM files with ever higher rates keep appearing, but as discussed below, a higher sampling rate cannot meaningfully extend the frequency response; it only adds sampling points, yielding a smoother waveform closer to an analog recording. PCM audio can be played on macOS/iOS using AudioQueue or AudioUnit; this section uses AudioQueue.

  • ✅ Section 1 – Hello FFmpeg
  • ✅ Section 2 – Soft decode video stream, render RGB24
  • ✅ Section 3 – Understand YUV
  • ✅ Section 4 – Hard decode, OpenGL render YUV
  • ✅ Section 5 – Metal render YUV
  • ✅ Section 6 – Decode audio, play with AudioQueue (this article)
  • 📗 Section 7 – Audio and video synchronization
  • 📗 Section 8 – Perfect playback control
  • 📗 Section 9 – Double-speed playback
  • 📗 Section 10 – Add video filter effects
  • 📗 Section 11 – Audio effects

The demo for this section is at github.com/czqasngit/f… The example code provides both Objective-C and Swift implementations. This article quotes the Objective-C code for ease of illustration, since pointer-heavy code reads less cleanly in Swift. The final effect of this section is shown below:

Goals

  • Understand PCM
  • Understand the flow of decoding with FFmpeg while playing audio and video simultaneously
  • Understand the AudioQueue playback process
  • Play audio and video simultaneously after FFmpeg decoding

Understand PCM

Sample rate and sample size

Sound is a wave of energy, so it has frequency and amplitude: frequency corresponds to the time axis, amplitude to the level axis. The waveform is infinitely smooth and can be regarded as consisting of countless points. Because storage space is finite, those points must be sampled during digital encoding. Sampling extracts the signal's value at certain points in time; obviously, the more samples taken per second, the richer the frequency information captured. To recover a waveform, each vibration must be sampled at least twice. Since the highest frequency the human ear can hear is 20 kHz, satisfying hearing requires at least 40,000 samples per second, expressed as 40 kHz; this 40 kHz is the sampling rate. The familiar CD uses a sampling rate of 44.1 kHz.

Frequency information alone is not enough; we must also obtain and quantize the energy value at each sample to record signal strength. The number of quantization levels is an integer power of 2; the common CD sample size is 16 bit, i.e. 2 to the power of 16 levels. Sample size is harder to grasp than sampling rate because it is more abstract. As a simple example, suppose a wave is sampled eight times, with energy values a1–a8 at the sampling points. With a sample size of only 2 bits we can distinguish just 4 levels, so only 4 of the distinct values in a1–a8 survive and the other 4 are collapsed into them. With a 3-bit sample size we get 8 levels, exactly enough to record all 8 points. The larger the sampling rate and sample size, the closer the recorded waveform is to the original signal.
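To make the sample-size point concrete, here is a minimal sketch (plain C, not from the sample project; the function name and values are illustrative) showing that n bits of sample size can only distinguish 2^n amplitude levels:

#include <math.h>
#include <stdio.h>

// Quantize a normalized sample in [-1.0, 1.0] to a signed integer of `bits` bits.
static long quantize(double sample, int bits) {
    long half = 1L << (bits - 1);          // 2^bits levels, half for each polarity
    return lround(sample * (half - 1));    // round to the nearest representable level
}

int main(void) {
    double sample = 0.3337;                          // one sampled amplitude
    printf("16-bit: %ld\n", quantize(sample, 16));   // 10934: fine-grained detail kept
    printf(" 2-bit: %ld\n", quantize(sample, 2));    // 0: the detail collapses
    return 0;
}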

Lossy and lossless

Given finite sampling rates and sample sizes, digital audio encoding can at best approach the natural signal infinitely closely; at least that is all current technology can do. Relative to the natural signal, every digital audio encoding scheme is lossy, because none can restore it completely. The highest fidelity achievable in computing is PCM encoding, widely used for preserving source material and for music appreciation; CDs, DVDs, and our common WAV files all use it. PCM is therefore generally accepted as lossless encoding, but this only means PCM represents the best fidelity level in digital audio; it does not mean PCM guarantees absolute fidelity, only that it approaches the signal as closely as possible. When we classify MP3 as lossy audio encoding, that is relative to PCM. Emphasizing that lossy and lossless are relative to the encoding underlines that true losslessness is hard to achieve, just as any numeric expression of pi, however precise, only approaches it and never truly equals it.

Reasons for using audio compression technology

Calculating the bit rate of a PCM audio stream is easy: sample rate × sample size × number of channels, in bits per second (bps). A two-channel PCM-encoded WAV file with a 44.1 kHz sampling rate and 16-bit sample size has a bit rate of 44.1K × 16 × 2 = 1411.2 kbps. The "128K" we often quote for MP3 corresponds to this 1411.2 kbps of the source WAV. This figure is also known as the data bandwidth, the same concept as ADSL bandwidth. Dividing the bit rate by 8 gives the data rate of this WAV: 176.4 KB/s. In other words, storing one second of 44.1 kHz, 16-bit, two-channel PCM audio takes 176.4 KB, and one minute takes about 10.34 MB. That is unacceptable to most users, especially those who like to listen to music on a computer. To reduce disk usage there are only two ways: lower the sampling parameters or compress. Lowering the sampling parameters is undesirable, so experts developed various compression schemes. Because of their different uses and target markets, the various audio compression encodings achieve different sound quality and compression ratios, which later articles will mention. One thing is certain: they have all been compressed.
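The arithmetic above can be checked with a few lines of C (illustrative only; the unit conventions mirror the text, where KB = 1000 bytes and MB = 1024 KB):

#include <stdio.h>

int main(void) {
    double sampleRate = 44100.0;   // Hz
    int bitsPerSample = 16;
    int channels = 2;

    double bps  = sampleRate * bitsPerSample * channels;         // 1,411,200 bits per second
    double kbps = bps / 1000.0;                                  // 1411.2 kbps
    double bytesPerSec = bps / 8.0;                              // 176,400 B/s = 176.4 KB/s
    double mbPerMinute = bytesPerSec * 60.0 / 1000.0 / 1024.0;   // ≈ 10.34 MB per minute

    printf("%.1f kbps, %.1f KB/s, %.2f MB per minute\n",
           kbps, bytesPerSec / 1000.0, mbPerMinute);
    return 0;
}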

The relationship between frequency and sampling rate

The sampling rate is the number of times the original signal is sampled per second; most common audio files use 44.1 kHz. What does this mean? Suppose we have two one-second sinusoidal signals at 20 Hz and 20 kHz, the lowest and highest frequencies we can hear, and we sample both at 40 kHz. The result: the 20 Hz signal is sampled 40K/20 = 2000 times per vibration, while the 20 kHz signal is sampled only 2 times per vibration. Clearly, at the same sampling rate, low-frequency information is recorded in far more detail than high-frequency information. This is why some audiophiles complain that CD's digital sound is not realistic enough: 44.1 kHz sampling cannot guarantee that high-frequency signals are recorded well. Recording high frequencies better seems to demand a higher sampling rate, so some people rip CD tracks at 48 kHz, which is not advisable! It does nothing for sound quality; for ripping software, keeping the same 44.1 kHz sample rate the CD provides is one guarantee of optimal sound quality, not a way to improve it. A higher sampling rate is only useful for analog signals; if the signal being sampled is already digital, do not try to raise the sampling rate.

Stream characteristics

With the development of the network, people wanted to listen to music online, which requires that an audio file can be played while still being read, without downloading the whole file first; it can also be broadcast while being encoded. It is this characteristic that makes online live streaming possible, and running your own digital radio station has become a reality.

Encoding formats

PCM coding

PCM is short for Pulse Code Modulation. The general PCM workflow was covered above; we don't need to care about how the final PCM encoding is calculated, only about the strengths and weaknesses of a PCM-encoded audio stream. PCM's biggest advantage is good sound quality; its biggest disadvantage is large size. The common Audio CD uses PCM encoding, and one disc can hold only about 72 minutes of music.

WAV format

This is an old audio file format developed by Microsoft. WAV is a file format that complies with the Resource Interchange File Format (RIFF) specification. Every WAV has a file header containing the encoding parameters of its audio stream, but WAV imposes no hard rules on how the audio stream is encoded: besides PCM, almost any codec that supports the ACM specification can encode a WAV audio stream.

Many people don't have this concept, so take AVI as an example, because AVI and WAV are very similar in file structure, except that AVI also carries a video stream. We have encountered many kinds of AVI, and so we often need to install decoders to watch some of them. DivX, a video encoding we meet a lot, can be used to compress an AVI's video stream, as can other encodings. Likewise, WAV can compress its audio stream with a variety of audio encodings. We usually use PCM for the audio stream in a WAV, but that does not mean WAV can only use PCM; MP3 encoding can also be used in a WAV, and such WAVs play fine.

On the Windows platform, PCM-encoded WAV is the best-supported audio format; all audio software supports it perfectly. Because of its high sound quality, WAV is also the preferred format for music editing and creation, and is well suited to saving source material. PCM-encoded WAV is therefore used as an intermediary format, often in conversions between other encodings, such as MP3 to WMA.

MP3 encoding

As the most popular audio compression format, MP3 is widely accepted. All kinds of MP3-related software products keep emerging, and ever more hardware supports it: the VCD/DVD players we can buy all support MP3, as do the more portable MP3 players, and so on. Although the major music companies are extremely averse to this open format, they cannot prevent its survival and spread. MP3, in development for ten years, is MPEG (Moving Picture Experts Group) Audio Layer-3, a derivative encoding scheme of MPEG-1, developed in 1993 by the Fraunhofer IIS research institute and Thomson. MP3 can achieve an astonishing 12:1 compression ratio while maintaining basically listenable sound quality. In the days when hard disk prices were sky-high, MP3 was quickly accepted by users, and with the spread of the network it reached hundreds of millions of them. When first released, MP3 encoding technology was actually quite imperfect; lacking research into sound and human hearing, early MP3 encoders coded almost everything crudely, damaging sound quality. As new techniques were introduced, MP3 encoding improved, including two major technical advances; interested readers can research the details themselves.

Understand the process of decoding with FFmpeg while playing audio and video simultaneously

This section runs decoding, audio playback, and video rendering each on a separate thread, sharing data buffers. The decoding thread uses FFmpeg to read data from the stream, decodes and transcodes the audio and video frames, and stores them in the audio and video buffers respectively.

The audio playback and video rendering threads request data from the decoding thread's buffers. The flow is as follows:

  • The decoding thread maintains two buffers: one for audio and one for video
  • When a buffer is not yet full, it informs the decoding thread to keep decoding
  • When the buffers fill up, decoding pauses and the audio and video threads are notified that they can resume playing (if they are waiting at this point)
  • The audio playback thread takes data from the audio buffer; if the buffer does not hold enough data, the thread waits and tells the decoding thread to continue decoding (if the decoding thread is paused)
  • The video rendering thread takes data from the video buffer; if the buffer does not hold enough data, the thread waits and tells the decoding thread to continue decoding (if the decoding thread is paused)
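The wait/notify behavior in the list above maps naturally onto a condition variable. Below is a minimal sketch of such a shared frame buffer (the class name, kMaxCachedFrames, and the boxing of AVFrame pointers are illustrative assumptions, not the sample project's actual implementation; this sketch also blocks the producer per queue, whereas the project coordinates the pause across both buffers as described above):

#import <Foundation/Foundation.h>
#include <libavutil/frame.h>

static const NSUInteger kMaxCachedFrames = 30;  // hypothetical "buffer full" threshold

@interface FrameQueue : NSObject
- (void)enqueue:(AVFrame *)frame;   // called by the decoding thread (producer)
- (AVFrame *)dequeue;               // called by the play/render thread (consumer)
@end

@implementation FrameQueue {
    NSMutableArray<NSValue *> *_frames;  // boxed AVFrame pointers
    NSCondition *_condition;
}

- (instancetype)init {
    if ((self = [super init])) {
        _frames = [NSMutableArray array];
        _condition = [[NSCondition alloc] init];
    }
    return self;
}

- (void)enqueue:(AVFrame *)frame {
    [_condition lock];
    while (_frames.count >= kMaxCachedFrames) {
        [_condition wait];                            // buffer full: decoding pauses
    }
    [_frames addObject:[NSValue valueWithPointer:frame]];
    [_condition signal];                              // data available: wake a waiting consumer
    [_condition unlock];
}

- (AVFrame *)dequeue {
    [_condition lock];
    while (_frames.count == 0) {
        [_condition wait];                            // buffer empty: consumer waits
    }
    AVFrame *frame = (AVFrame *)[_frames.firstObject pointerValue];
    [_frames removeObjectAtIndex:0];
    [_condition signal];                              // room again: decoding can resume
    [_condition unlock];
    return frame;
}
@end

With one such queue per stream (audio and video), each consumer blocks only on its own queue, and the decoding thread is woken whenever a queue drains below its threshold.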

Understand the AudioQueue playback process

AudioQueue is an audio playback API on macOS/iOS that can play PCM data directly. It offers a C-language interface, and playing PCM data with AudioQueue is straightforward.

1. Initialize the AudioQueue

When initializing the AudioQueue, we need to provide a description of the target PCM data. The structure of this description looks like this:

AudioStreamBasicDescription asbd;
asbd.mSampleRate = audioInformation.rate;
asbd.mFormatID = kAudioFormatLinearPCM;
asbd.mChannelsPerFrame = 2;
asbd.mFramesPerPacket = 1;
asbd.mBitsPerChannel = 16;
asbd.mBytesPerFrame = 4;
asbd.mBytesPerPacket = 4;
/// kLinearPCMFormatFlagIsSignedInteger: samples are stored as signed integers
/// kAudioFormatFlagIsPacked: sample data is packed tightly (interleaved, no padding)
asbd.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
asbd.mReserved = 0;
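Note that the sizing fields are related rather than independent. A quick sketch of how they derive from the channel count and bit depth (same assumptions as above: 2 channels, 16-bit interleaved integer PCM):

UInt32 channels = 2;
UInt32 bitsPerChannel = 16;

asbd.mChannelsPerFrame = channels;
asbd.mBitsPerChannel   = bitsPerChannel;
asbd.mBytesPerFrame    = channels * (bitsPerChannel / 8);              // 2 × 2 = 4 bytes per frame
asbd.mFramesPerPacket  = 1;                                            // always 1 for linear PCM
asbd.mBytesPerPacket   = asbd.mBytesPerFrame * asbd.mFramesPerPacket;  // 4 bytes per packet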

  • mSampleRate: the sampling rate, i.e. how many times per second the audio wave is sampled
  • mFormatID: the format of the data to be played; linear PCM here
  • mChannelsPerFrame: the number of channels in each frame
  • mFramesPerPacket: how many audio frames one packet contains; for PCM data the value is 1
  • mBitsPerChannel: how many bits one channel of a frame occupies
  • mBytesPerFrame: how many bytes one frame occupies
  • mBytesPerPacket: how many bytes one packet occupies
  • mFormatFlags: flags describing the details of the data format
  • mReserved: pads the structure to an 8-byte boundary; must be set to 0

This structure describes how the PCM data should be read. With it, the following interface can be called to initialize the AudioQueue:

OSStatus status = AudioQueueNewOutput(&asbd, _AudioQueueOutputCallback, (__bridge void *)self, NULL, NULL, 0, &audioQueue);
NSAssert(status == errSecSuccess, @"Initialize audioQueue Failed");
  • The first parameter is a pointer to the structure describing the PCM data
  • The second parameter is a callback function, invoked when a buffer finishes playing, so the AudioQueueBuffer can be reused rather than reallocated, reducing overhead
  • The third parameter is the context passed into the callback. This design is common with C function pointers: the callback needs context to associate the call with the right object
  • The last parameter receives the initialized AudioQueue object

The function signature for the second argument looks like this:

static void _AudioQueueOutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) {
    FFAudioQueuePlayer *player = (__bridge FFAudioQueuePlayer *)inUserData;
    [player reuseAudioQueueBuffer:inBuffer];
}

2. Initialize the AudioQueueBuffer

AudioQueueBuffer is the carrier of the audio data actually played; one AudioQueueBuffer can carry multiple audio frames. After the AudioQueue has been initialized, the following code initializes enough reusable AudioQueueBuffers; MAX_BUFFER_COUNT is set to 3 here.

for(NSInteger i = 0; i < MAX_BUFFER_COUNT; i ++) {
    AudioQueueBufferRef audioQueueBuffer = NULL;
    status = AudioQueueAllocateBuffer(self->audioQueue, audioInformation.buffer_size, &audioQueueBuffer);
    NSAssert(status == errSecSuccess, @"Initialize AudioQueueBuffer Failed");
    CFArrayAppendValue(buffers, audioQueueBuffer);
}

The AudioQueueBuffer objects need to be saved and referenced to prevent them from being freed. They also need to be reused later in playback control, as sketched below.
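One way to track which buffers are free for reuse is a small pool guarded by a condition (a sketch: -dequeueFreeBuffer is a hypothetical helper, while reuseAudioQueueBuffer: matches the callback shown earlier):

// Instance variables (in the class extension):
//   NSMutableArray<NSValue *> *_freeBuffers;  // pool of idle AudioQueueBufferRefs
//   NSCondition *_bufferCondition;

// Called from _AudioQueueOutputCallback when a buffer has finished playing.
- (void)reuseAudioQueueBuffer:(AudioQueueBufferRef)buffer {
    [_bufferCondition lock];
    [_freeBuffers addObject:[NSValue valueWithPointer:buffer]];
    [_bufferCondition signal];   // a buffer is available again
    [_bufferCondition unlock];
}

// Blocks until a buffer is free, then hands it out for the next chunk of PCM.
- (AudioQueueBufferRef)dequeueFreeBuffer {
    [_bufferCondition lock];
    while (_freeBuffers.count == 0) {
        [_bufferCondition wait];
    }
    AudioQueueBufferRef buffer = (AudioQueueBufferRef)[_freeBuffers.firstObject pointerValue];
    [_freeBuffers removeObjectAtIndex:0];
    [_bufferCondition unlock];
    return buffer;
}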

3. Start and play data

AudioQueueStart(audioQueue, NULL);

Calling the code above starts the playback queue. Next, decoded PCM data is copied into a buffer and enqueued:

// aqBuffer is a free buffer taken from the reusable pool (see above)
aqBuffer->mAudioDataByteSize = (UInt32)length;
memcpy(aqBuffer->mAudioData, data, length);
AudioQueueEnqueueBuffer(self->audioQueue, aqBuffer, 0, NULL);

After setting the size of the data to be played and the specific data to be played, the AudioQueueBuffer is put into the AudioQueue to play the sound.
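Putting the pieces together: a common pattern is to fill and enqueue the initial buffers before starting the queue, so playback begins with data ready rather than an immediate underrun. A sketch (error handling omitted; -nextDecodedPCMChunk is a hypothetical helper that pulls converted PCM from the audio buffer):

for (NSInteger i = 0; i < MAX_BUFFER_COUNT; i++) {
    AudioQueueBufferRef aqBuffer = [self dequeueFreeBuffer];  // from the reuse pool above
    NSData *pcm = [self nextDecodedPCMChunk];                 // decoded, resampled PCM data
    aqBuffer->mAudioDataByteSize = (UInt32)pcm.length;
    memcpy(aqBuffer->mAudioData, pcm.bytes, pcm.length);
    AudioQueueEnqueueBuffer(self->audioQueue, aqBuffer, 0, NULL);
}
AudioQueueStart(self->audioQueue, NULL);  // start once the queue is primed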

Play audio and video simultaneously after FFmpeg decoding

The three steps above are enough to play audio data, but a player built on FFmpeg needs to play video as well as audio.

1. Improve decoding

Decoding happens on a separate thread; pseudo-code is used here to illustrate the flow logic (see the sample project for the complete code). The general logic looks like this:

dispatch_async(decode_dispatch_queue, ^{
    while (true) {
        if (buffers_are_full()) {
            sleep_for_wait();   // both audio and video buffers are full: pause decoding
        }
        AVFrame *frame = decode();
        if (is_audio(frame)) {
            audio_enqueue(frame);
            notify_audio_play_thread_play_if_wait();
        } else if (is_video(frame)) {
            video_enqueue(frame);
            notify_video_render_thread_render_if_wait();
        } else if (decode_complete) {
            // end of stream: break out of the while loop
            break;
        }
    }
    NSLog(@"Decode completed, read end of file.");
});

2. Improve video rendering

dispatch_async(video_render_dispatch_queue, ^{
    if(!has_enough_video_frame() && !isDecodeComplete) {
        notify_decode_thread_keep_decode();
        sleep_video_render_thread_for_wait();
    }
    AVFrame* frame = video_dequeue();
    if(frame) {
        video_render(frame);
        if(low_max_cache_video_frame()) {
          notify_decode_thread_keep_decode();
        }
    } else {
        if(isDecodeComplete) {
            stop_video_render();
        }
    }
});

3. Improve audio playback

dispatch_async(audio_play_dispatch_queue, ^{
    if(!has_enough_audio_frame() && !isDecodeComplete) {
        notify_decode_thread_keep_decode();
        sleep_audio_play_thread_for_wait();
    }
    AVFrame* frame = audio_dequeue();
    if(frame) {
        audio_play(frame);
        if(low_max_cache_audio_frame()) {
          notify_decode_thread_keep_decode();
        }
    } else {
        if(isDecodeComplete) {
            stop_audio_play();
        }
    }
});

The decoding thread acts as the producer, producing data when it runs low. Audio playback and video rendering act as consumers, consuming data whenever enough is available. The data buffers thus maintain a reasonable dynamic balance.

At this point, the player can render video and play audio. However, this is not enough: with audio and video running on separate threads, playback will not stay in sync. The next section addresses synchronized playback.

Conclusion

  • Know what PCM is, how an analog audio signal is converted to a digital one, and the common encoding formats
  • Understand the flow of decoding with FFmpeg while playing audio and video, and gain a general understanding of the player
  • Understand the AudioQueue playback process
  • Play audio and video simultaneously after FFmpeg decoding