Requirement

Parse the audio and video streams in a file, decode them, synchronize them, render the video to the screen, and output the audio through the speaker. If you only need to play a video file, you can simply use AVFoundation's high-level player; here we take the lowest-level approach, which gives access to the raw audio and video frame data.


Implementation principle

This article is divided into three parts: the parsing module uses FFmpeg to parse the audio and video streams in the file; the decoding module uses FFmpeg or Apple's native decoders to decode the audio and video; the rendering module uses OpenGL to render the video stream to the screen and an Audio Queue Player to output the audio through the speaker.


Prerequisites

Note: the detailed implementations of all the modules involved in this article are in the links below; refer to whichever sections you need.

  • Fundamentals of Audio and Video
  • Set up the iOS FFmpeg environment
  • FFmpeg parses video data
  • VideoToolbox implements hard decoding of video
  • Audio Converter Audio decoding
  • FFmpeg audio decoding
  • FFmpeg video decoding
  • OpenGL renders video data
  • H.264,H.265 bit stream structure
  • Transmission of audio data queue implementation
  • Audio Queue Player

Code address: iOS File Player

Juejin address: iOS File Player

Jianshu address: iOS File Player

Blog address: iOS File Player


The overall architecture

This article takes a .MOV media file as an example. The file contains H.264-encoded video data and AAC-encoded audio data. FFmpeg first parses the audio and video stream information in the file and saves the parsed results in AVPacket structures. The audio and video frame data are then extracted separately: the audio frames can be decoded either with FFmpeg's decoder or with Audio Converter from Apple's native framework, and the video can be decoded either with FFmpeg or with a decoder from Apple's native VideoToolbox framework. The decoded audio is PCM data and the decoded video is raw YUV data. The audio and video are synchronized according to their timestamps. Finally, the PCM data is handed to the Audio Queue for audio playback, while the raw YUV video data is wrapped in a CMSampleBufferRef and passed to OpenGL to render the video on the screen. This completes one full pass of pulling and playing the audio and video streams of a file.

Note: the process of pulling an RTMP stream, decoding it, and playing it is basically the same as pulling a file stream, except that the audio and video data are first received over a socket, after which decoding and the subsequent steps are completed in the same way.

Simplified workflow

Parse
  • Create the AVFormatContext context object: AVFormatContext *avformat_alloc_context(void);
  • Open the file and fill the context object with its format information: int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options);
  • Read the stream information from the file: int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);
  • Get the audio and video streams in the file: m_formatContext->streams
  • Start parsing and read the audio and video frames in the file: int av_read_frame(AVFormatContext *s, AVPacket *pkt);
  • If it is a video frame, pass it through av_bitstream_filter_filter to generate SPS, PPS, and other key information.
  • The AVPacket read out contains the compressed audio or video data from the file (a minimal parse-loop sketch follows this list).
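
The following is a minimal parse-loop sketch of these steps, assuming an FFmpeg 3.x build like the one used in this project; path is the file path, error handling is reduced to comments, and the full version lives in XDXAVParseHandler.

#include <libavformat/avformat.h>

AVFormatContext *formatContext = avformat_alloc_context();
if (avformat_open_input(&formatContext, path, NULL, NULL) != 0) {
    // open failed
}
if (avformat_find_stream_info(formatContext, NULL) < 0) {
    // no stream info found
}

// Locate the audio and video stream indexes.
int videoIndex = -1, audioIndex = -1;
for (int i = 0; i < formatContext->nb_streams; i++) {
    enum AVMediaType type = formatContext->streams[i]->codec->codec_type;
    if (type == AVMEDIA_TYPE_VIDEO) videoIndex = i;
    if (type == AVMEDIA_TYPE_AUDIO) audioIndex = i;
}

// Read the compressed data one AVPacket at a time and hand it to the decoder.
AVPacket packet;
while (av_read_frame(formatContext, &packet) >= 0) {
    if (packet.stream_index == videoIndex) {
        // video packet -> video decoder
    } else if (packet.stream_index == audioIndex) {
        // audio packet -> audio decoder
    }
    av_packet_unref(&packet);
}
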
Decoding

Decoding with FFmpeg

  • Get the decoder context of the file stream: formatContext->streams[a/v index]->codec;
  • Find the decoder through the decoder context: AVCodec *avcodec_find_decoder(enum AVCodecID id);
  • Open the decoder: int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
  • Send the audio and video data read from the file to the decoder: int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
  • Receive the decoded audio and video data in a loop: int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);
  • If it is audio, the data may need to be resampled (via SwrContext) into a format the device can play (see the sketch below).
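
A minimal decode-loop sketch for a single stream, under the assumption that formatContext, videoIndex, and the parsed packet come from the parse step above; error handling is simplified.

#include <libavcodec/avcodec.h>

AVCodecContext *codecContext = formatContext->streams[videoIndex]->codec;
AVCodec *codec = avcodec_find_decoder(codecContext->codec_id);
if (avcodec_open2(codecContext, codec, NULL) < 0) {
    // failed to open the decoder
}

AVFrame *frame = av_frame_alloc();

// Feed one parsed AVPacket to the decoder ...
if (avcodec_send_packet(codecContext, &packet) == 0) {
    // ... and drain every frame the decoder can produce from it.
    while (avcodec_receive_frame(codecContext, frame) == 0) {
        // video: frame holds YUV planes (frame->data / frame->linesize)
        // audio: frame holds PCM samples; resample with SwrContext if needed
    }
}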

Decoding video with VideoToolbox

  • Extract SPS, PPS, and other key NALU header information from the extradata parsed by FFmpeg.
  • Create the video format description from the key information extracted above: CMVideoFormatDescriptionRef, via CMVideoFormatDescriptionCreateFromH264ParameterSets / CMVideoFormatDescriptionCreateFromHEVCParameterSets.
  • Create the decoder with VTDecompressionSessionCreate, specifying the related parameters.
  • Put the compressed data into a CMBlockBufferRef: CMBlockBufferCreateWithMemoryBlock.
  • Start decoding: VTDecompressionSessionDecodeFrame.
  • Receive the decoded video data in the callback (a sketch of these steps follows this list).
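
A sketch of the VideoToolbox steps above for H.264. It assumes that sps/pps and their sizes were already extracted from the FFmpeg extradata and that frameData/frameSize hold one AVCC-formatted compressed frame; the callback name is illustrative.

#import <VideoToolbox/VideoToolbox.h>

static void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon,
                          OSStatus status, VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef pixelBuffer,
                          CMTime presentationTimeStamp, CMTime presentationDuration) {
    // Decoded YUV frames arrive here.
}

// 1. Video format description from SPS / PPS.
CMVideoFormatDescriptionRef formatDescription = NULL;
const uint8_t *parameterSets[2]     = { sps, pps };
const size_t   parameterSetSizes[2] = { spsSize, ppsSize };
CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2,
                                                    parameterSets, parameterSetSizes,
                                                    4, &formatDescription);

// 2. Decompression session with an output callback.
VTDecompressionOutputCallbackRecord callbackRecord = { didDecompress, NULL };
VTDecompressionSessionRef session = NULL;
VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription,
                             NULL, NULL, &callbackRecord, &session);

// 3. Wrap one compressed frame and decode it.
CMBlockBufferRef blockBuffer = NULL;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, frameData, frameSize,
                                   kCFAllocatorNull, NULL, 0, frameSize, 0, &blockBuffer);
const size_t sampleSizes[] = { frameSize };
CMSampleBufferRef sampleBuffer = NULL;
CMSampleBufferCreateReady(kCFAllocatorDefault, blockBuffer, formatDescription,
                          1, 0, NULL, 1, sampleSizes, &sampleBuffer);
VTDecompressionSessionDecodeFrame(session, sampleBuffer, 0, NULL, NULL);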

Decoding audio with AudioConverter

  • Create the decoder from the ASBD structures describing the compressed source data and the decoded destination format: AudioConverterNewSpecific.
  • Specify the decoder type with an AudioClassDescription.
  • Start decoding: AudioConverterFillComplexBuffer.
  • Note: decoding requires 1024 sample points (one full AAC packet) to complete one decoding operation (see the sketch below).
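
A minimal sketch of creating the AAC-to-PCM converter, assuming inFormat and outFormat are the ASBDs describing the compressed source and the decoded destination formats.

#import <AudioToolbox/AudioToolbox.h>

// Ask for a specific AAC decoder (hardware or software).
AudioClassDescription classDescription = {
    .mType         = kAudioDecoderComponentType,
    .mSubType      = kAudioFormatMPEG4AAC,
    .mManufacturer = kAppleHardwareAudioCodecManufacturer, // or kAppleSoftwareAudioCodecManufacturer
};

AudioConverterRef converter = NULL;
OSStatus status = AudioConverterNewSpecific(&inFormat,
                                            &outFormat,
                                            1,
                                            &classDescription,
                                            &converter);

// Later, each decode call consumes one AAC packet (1024 sample frames):
// AudioConverterFillComplexBuffer(converter, inputDataProc, userInfo,
//                                 &ioOutputDataPacketSize, &outBufferList,
//                                 outPacketDescriptions);
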
Synchronization

Because we are decoding the audio and video from a local file, as long as the timestamps in the file are correct, the decoded data can be played back directly and will stay in sync. All we have to do is make sure the audio and video start rendering at the same time.

Note: a stream pulled from an RTMP address, for example, may lose data for some period of time due to network conditions, which puts the audio and video out of sync, so a mechanism is needed to correct the timestamps. The usual mechanism is to have the video catch up with the audio; a dedicated article will cover this, so it is not discussed further here.

Rendering

The raw video data obtained through the steps above can be rendered to the screen directly by the encapsulated OpenGL ES module; Apple's native GLKViewController can also be used to render to the screen. The audio frame data is fed to the Audio Queue to complete playback.
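
As mentioned in the architecture section, the decoded CVPixelBufferRef is wrapped in a CMSampleBufferRef before being handed to the render view. A minimal sketch of that wrapping, assuming pixelBuffer comes from the decoder and pts is its presentation timestamp as a CMTime:

#import <CoreMedia/CoreMedia.h>

CMVideoFormatDescriptionRef videoInfo = NULL;
CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, pixelBuffer, &videoInfo);

CMSampleTimingInfo timingInfo = { kCMTimeInvalid, pts, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
CMSampleBufferCreateForImageBuffer(kCFAllocatorDefault,
                                   pixelBuffer,
                                   true,      // data is ready
                                   NULL, NULL,
                                   videoInfo,
                                   &timingInfo,
                                   &sampleBuffer);
CFRelease(videoInfo);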

File structure

Quick start

Decode using FFmpeg

FFmpeg is initialized with the file path and parses the audio and video streams; FFmpeg's decoders are then used to decode the audio and video data. Note that decoding only starts from the first I-frame that is read, so that audio and video stay in sync. Decoded audio is first loaded into the transfer queue, because the Audio Queue Player is designed to continuously fetch data from that queue for playback. Video data can be rendered directly.

- (void)startRenderAVByFFmpegWithFileName:(NSString *)fileName {
    NSString *path = [[NSBundle mainBundle] pathForResource:fileName ofType:@"MOV"];
    
    XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    
    XDXFFmpegVideoDecoder *videoDecoder = [[XDXFFmpegVideoDecoder alloc] initWithFormatContext:[parseHandler getFormatContext] videoStreamIndex:[parseHandler getVideoStreamIndex]];
    videoDecoder.delegate = self;
    
    XDXFFmpegAudioDecoder *audioDecoder = [[XDXFFmpegAudioDecoder alloc] initWithFormatContext:[parseHandler getFormatContext] audioStreamIndex:[parseHandler getAudioStreamIndex]];
    audioDecoder.delegate = self;
    
    static BOOL isFindIDR = NO;
    
    [parseHandler startParseGetAVPackeWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, AVPacket packet) {
        if (isFinish) {
            isFindIDR = NO;
            [videoDecoder stopDecoder];
            [audioDecoder stopDecoder];
            dispatch_async(dispatch_get_main_queue(), ^{
                self.startWorkBtn.hidden = NO;
            });
            return;
        }
        
        if (isVideoFrame) { // Video
            if (packet.flags == 1 && isFindIDR == NO) {
                isFindIDR = YES;
            }
            
            if (!isFindIDR) {
                return;
            }
            
            [videoDecoder startDecodeVideoDataWithAVPacket:packet];
        } else { // Audio
            [audioDecoder startDecodeAudioDataWithAVPacket:packet];
        }
    }];
}

- (void)getDecodeVideoDataByFFmpeg:(CMSampleBufferRef)sampleBuffer {
    CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
    [self.previewView displayPixelBuffer:pix];
}

- (void)getDecodeAudioDataByFFmpeg:(void *)data size:(int)size pts:(int64_t)pts isFirstFrame:(BOOL)isFirstFrame {
    // NSLog(@"demon test - %d",size);
    // Put audio data from the file into the audio data queue
    [self addBufferToWorkQueueWithAudioData:data size:size pts:pts];
    
    // Control the rate
    usleep(14.5 * 1000);
}
Decode using the native frameworks

FFmpeg is initialized with the file path and parses the audio and video streams. Here, the ASBD structure is first built from the actual audio stream parameters in the file to initialize the audio decoder, and the decoded audio and video data are then rendered separately. Note that if the video in the file is H.265-encoded, the decoded frames carry out-of-order timestamps because the stream contains B-frames; we sort them with a linked list and then render the sorted frames to the screen (a simplified sketch of such a PTS-ordered insert follows the code below).

- (void)startRenderAVByOriginWithFileName:(NSString *)fileName {
    NSString *path = [[NSBundle mainBundle] pathForResource:fileName ofType:@"MOV"];
    XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    
    XDXVideoDecoder *videoDecoder = [[XDXVideoDecoder alloc] init];
    videoDecoder.delegate = self;

    // Origin file aac format
    AudioStreamBasicDescription audioFormat = {
        .mSampleRate         = 48000,
        .mFormatID           = kAudioFormatMPEG4AAC,
        .mChannelsPerFrame   = 2,
        .mFramesPerPacket    = 1024,
    };
    
    XDXAduioDecoder *audioDecoder = [[XDXAduioDecoder alloc] initWithSourceFormat:audioFormat
                                                                     destFormatID:kAudioFormatLinearPCM
                                                                       sampleRate:48000
                                                              isUseHardwareDecode:YES];
    
    [parseHandler startParseWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, struct XDXParseVideoDataInfo *videoInfo, struct XDXParseAudioDataInfo *audioInfo) {
        if (isFinish) {
            [videoDecoder stopDecoder];
            [audioDecoder freeDecoder];
            
            dispatch_async(dispatch_get_main_queue(), ^{
                self.startWorkBtn.hidden = NO;
            });
            return;
        }
        
        if (isVideoFrame) {
            [videoDecoder startDecodeVideoData:videoInfo];
        }else {
            [audioDecoder decodeAudioWithSourceBuffer:audioInfo->data
                                     sourceBufferSize:audioInfo->dataSize
                                      completeHandler:^(AudioBufferList * _Nonnull destBufferList, UInt32 outputPackets, AudioStreamPacketDescription * _Nonnull outputPacketDescriptions) {
                // Put audio data from the file into the audio data queue
                [self addBufferToWorkQueueWithAudioData:destBufferList->mBuffers->mData
                                                   size:destBufferList->mBuffers->mDataByteSize
                                                    pts:audioInfo->pts];
                // Control the rate
                usleep(16.8 * 1000);
            }];
        }
    }];
}

- (void)getVideoDecodeDataCallback:(CMSampleBufferRef)sampleBuffer isFirstFrame:(BOOL)isFirstFrame {
    if (self.hasBFrame) {
        // Note : the first frame not need to sort.
        if (isFirstFrame) {
            CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
            [self.previewView displayPixelBuffer:pix];
            return;
        }
        
        [self.sortHandler addDataToLinkList:sampleBuffer];
    } else {
        CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
        [self.previewView displayPixelBuffer:pix];
    }
}

#pragma mark - Sort Callback
- (void)getSortedVideoNode:(CMSampleBufferRef)sampleBuffer {
    int64_t pts = (int64_t)(CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer)) * 1000);
    static int64_t lastpts = 0;
//    NSLog(@"Test marigin - %lld",pts - lastpts);
    lastpts = pts;
    
    [self.previewView displayPixelBuffer:CMSampleBufferGetImageBuffer(sampleBuffer)];
}
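
For reference, a simplified, illustrative stand-in for the sort handler used above: each decoded frame is inserted into a linked list kept ordered by presentation timestamp, so that once a few frames are buffered the head of the list is safe to display. The node and function names here are hypothetical, not the actual classes in the project.

#import <CoreMedia/CoreMedia.h>
#include <stdlib.h>

typedef struct SortNode {
    CMSampleBufferRef sampleBuffer;
    struct SortNode *next;
} SortNode;

// Insert a decoded frame into a list ordered by presentation timestamp.
static SortNode *insertSortedByPTS(SortNode *head, CMSampleBufferRef sampleBuffer) {
    SortNode *node = malloc(sizeof(SortNode));
    node->sampleBuffer = (CMSampleBufferRef)CFRetain(sampleBuffer);
    node->next = NULL;
    
    Float64 pts = CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer));
    if (head == NULL ||
        pts < CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(head->sampleBuffer))) {
        node->next = head;
        return node;
    }
    SortNode *cur = head;
    while (cur->next &&
           CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(cur->next->sampleBuffer)) <= pts) {
        cur = cur->next;
    }
    node->next = cur->next;
    cur->next = node;
    return head;
}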



Specific implementation

The detailed implementation of each part of this article is described in the linked articles; please refer to the links in the Prerequisites section.

Notes

Because the compressed audio and video data in different files come in different formats, this article is only compatible with some of them; the code can be extended to support other formats as needed.