Requirements

In this article, encoded H.264 and H.265 video streams are decoded into raw video data, which can then be rendered to the screen or used for other purposes.


Implementation principle

As we know, encoded data is meant for transmission and cannot be rendered to the screen directly, so here we use Apple's native VideoToolbox framework to parse the encoded video stream in a file and decode the compressed video data (H264/H265) into raw video data in a specified format (YUV, RGB) for rendering to the screen.

Note: this example focuses on the decoding module. It relies on an FFmpeg build module, a video parsing module, and a rendering module; with the prerequisites linked below in place, the code can be used directly.


Prerequisites

  • Fundamentals of Audio and Video
  • Set up the iOS FFmpeg environment
  • FFmpeg parses video data
  • OpenGL renders video data
  • H.264 / H.265 bitstream structure

Code address: Video Decoder

Juejin address: Video Decoder

Jianshu address: Video Decoder

Blog address: Video Decoder


The overall architecture

The general idea is to load the data parsed by FFmpeg into a CMBlockBuffer, load the VPS, SPS, and PPS separated from the extra data into a CMVideoFormatDescription, load the calculated timestamp into a CMTime, and finally assemble the completed CMSampleBuffer to feed to the decoder.

Simplified flow

FFmpeg parse process

  • Create the format context: avformat_alloc_context
  • Open the file stream: avformat_open_input
  • Find stream information: avformat_find_stream_info
  • Get the index of the audio/video stream: formatContext->streams[i]->codecpar->codec_type == (isVideoStream ? AVMEDIA_TYPE_VIDEO : AVMEDIA_TYPE_AUDIO)
  • Get the audio/video stream: m_formatContext->streams[m_audioStreamIndex]
  • Parse audio/video data frames: av_read_frame
  • Get the extra data: av_bitstream_filter_filter

VideoToolbox decode process

  • Compare the new extra data with the last one and re-create the decoder if it has changed
  • Separate and save the VPS, SPS, PPS and other key information from the extra data produced by the FFmpeg parse (by comparing NALU headers)
  • Load the VPS, SPS, and PPS via CMVideoFormatDescriptionCreateFromH264ParameterSets / CMVideoFormatDescriptionCreateFromHEVCParameterSets
  • Specify the decoder callback function and the decoded video data type (YUV, RGB…)
  • Create the decoder with VTDecompressionSessionCreate
  • Generate a CMBlockBufferRef holding the pre-decoding data and convert it to a CMSampleBufferRef to feed the decoder
  • Decode with VTDecompressionSessionDecodeFrame
  • In the callback function, the CVImageBufferRef is the decoded data, which can be converted to a CMSampleBufferRef and passed out

File structure

Quick start

  • Initialize the preview

    The decoded video data will be rendered to this preview layer

- (void)viewDidLoad {
    [super viewDidLoad];
    [self setupUI];
}

- (void)setupUI {
    self.previewView = [[XDXPreviewView alloc] initWithFrame:self.view.frame];
    [self.view addSubview:self.previewView];
    [self.view bringSubviewToFront:self.startBtn];
}
  • Parse and decode the video data in the file
- (void)startDecodeByVTSessionWithIsH265Data:(BOOL)isH265 {
    NSString *path = [[NSBundle mainBundle] pathForResource:isH265 ? @"testh265" : @"testh264"  ofType:@"MOV"];
    XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    XDXVideoDecoder *decoder = [[XDXVideoDecoder alloc] init];
    decoder.delegate = self;
    [parseHandler startParseWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, struct XDXParseVideoDataInfo *videoInfo, struct XDXParseAudioDataInfo *audioInfo) {
        if (isFinish) {
            [decoder stopDecoder];
            return;
        }
        
        if (isVideoFrame) {
            [decoder startDecodeVideoData:videoInfo];
        }
    }];
}
  • Render the decoded data onto the screen

Note: if the data contains B frames, the frames must be reordered before rendering. This example provides two files: an h264 file without B frames and an H265 file with B frames.

- (void)getVideoDecodeDataCallback:(CMSampleBufferRef)sampleBuffer {
    if (self.isH265File) {
        // Note : the first frame not need to sort.
        if (self.isDecodeFirstFrame) {
            self.isDecodeFirstFrame = NO;
            CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
            [self.previewView displayPixelBuffer:pix];
        }
        
        XDXSortFrameHandler *sortHandler = [[XDXSortFrameHandler alloc] init];
        sortHandler.delegate = self;
        [sortHandler addDataToLinkList:sampleBuffer];
    }else {
        CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
        [self.previewView displayPixelBuffer:pix];
    }
}

- (void)getSortedVideoNode:(CMSampleBufferRef)sampleBuffer {
    int64_t pts = (int64_t)(CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer)) * 1000);
    static int64_t lastpts = 0;
    NSLog(@"Test margin - %lld",pts - lastpts);
    lastpts = pts;
    
    [self.previewView displayPixelBuffer:CMSampleBufferGetImageBuffer(sampleBuffer)];
}


The specific implementation

1. Check whether the extra data from the parse step needs to be updated.

The data produced by the FFmpeg parse is stored in the XDXParseVideoDataInfo structure, defined below. The parse module is covered in the links above; this section only covers the decode module.

struct XDXParseVideoDataInfo {
    uint8_t                 *data;
    int                     dataSize;
    uint8_t                 *extraData;
    int                     extraDataSize;
    Float64                 pts;
    Float64                 time_base;
    int                     videoRotate;
    int                     fps;
    CMSampleTimingInfo      timingInfo;
    XDXVideoEncodeFormat    videoFormat;
};

By caching the current extra data, each newly obtained extra data can be compared with the previous one. If it has changed, the decoder must be re-created; if not, the decoder can be reused. (This check is especially useful for network video streams, whose parameters may change mid-stream.)
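The comparison itself boils down to three checks; the following plain-C restatement (helper name hypothetical, not from the project) makes them explicit:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Returns true when the new extra data differs from the cached copy --
// the condition under which the decoder must be torn down and re-created.
static bool extradata_changed(const uint8_t *new_data, int new_size,
                              const uint8_t *last_data, int last_size) {
    if (last_size == 0)        return true;   // nothing cached yet
    if (last_size != new_size) return true;   // size changed
    return memcmp(new_data, last_data, new_size) != 0;  // bytes changed
}
```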

uint8_t *extraData = videoInfo->extraData;
int     size       = videoInfo->extraDataSize;

BOOL isNeedUpdate = [self isNeedUpdateExtraDataWithNewExtraData:extraData
                                                        newSize:size
                                                       lastData:&_lastExtraData
                                                       lastSize:&_lastExtraDataSize];
...

- (BOOL)isNeedUpdateExtraDataWithNewExtraData:(uint8_t *)newData
                                      newSize:(int)newSize
                                     lastData:(uint8_t **)lastData
                                     lastSize:(int *)lastSize {
    BOOL isNeedUpdate = NO;
    if (*lastSize == 0) {
        isNeedUpdate = YES;
    } else {
        if (*lastSize != newSize) {
            isNeedUpdate = YES;
        } else {
            if (memcmp(newData, *lastData, newSize) != 0) {
                isNeedUpdate = YES;
            }
        }
    }
    
    if (isNeedUpdate) {
        [self destoryDecoder];
        
        // Free the previously cached copy before replacing it to avoid a leak.
        if (*lastData) {
            free(*lastData);
        }
        *lastData = (uint8_t *)malloc(newSize);
        memcpy(*lastData, newData, newSize);
        *lastSize = newSize;
    }
    
    return isNeedUpdate;
}


2. Separate the key information (VPS for H265, SPS, PPS) from the extra data.

The decoder needs key information from the NALU headers, such as the VPS, SPS, and PPS, to create the CMVideoFormatDescription data structure that describes the video, as shown in the figure above.

Note: H264 bitstreams need the SPS and PPS, while H265 bitstreams need the VPS, SPS, and PPS.

  • Locate the NALU headers

First determine the positions of the start codes by checking whether each four-byte window equals 00 00 00 01. For H264 data, the start codes are followed by the SPS and PPS; for H265 data, by the VPS, SPS, and PPS.
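The scan can be sketched in plain C (hypothetical helper, mirroring the loop used later in getNALUInfoWithVideoFormat:). Each returned index points at the trailing 0x01 byte of a start code, which is the convention the arithmetic in the next step relies on:

```c
#include <stdint.h>

// Find 4-byte Annex-B start codes (00 00 00 01) in `data`. Each stored
// index points at the trailing 0x01 byte, matching the article's loop.
// Returns the number of start codes found (at most `max`).
static int find_start_codes(const uint8_t *data, int size, int *out, int max) {
    int count = 0;
    for (int i = 3; i < size && count < max; i++) {
        if (data[i] == 0x01 && data[i - 1] == 0x00 &&
            data[i - 2] == 0x00 && data[i - 3] == 0x00) {
            out[count++] = i;
        }
    }
    return count;
}
```

For extra data laid out as [start code][SPS][start code][PPS], the SPS length is `idx[1] - idx[0] - 4`: the payload starts one byte after the first index and the 4-byte start code of the next NALU is excluded.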

  • Determine the NALU Header length

The SPS length can be computed from the SPS index and the PPS index; the other lengths are computed similarly. Note that the 4-byte start code serves as the delimiter in the bitstream, so its length must be subtracted.

  • Separate NALU Header data

For h264 data, the NALU header type can be determined with & 0x1F; for H265 data, with & 0x4F. This follows from the H264 and H265 bitstream structures; if this is unclear, first read the bitstream-structure article linked at the top.

Once the data and size for each type are known, store them in member variables for later use.
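The masks can be sanity-checked in plain C. Note that the spec-defined H.265 extraction is `(byte >> 1) & 0x3F` (nal_unit_type occupies bits 1..6 of the first header byte); the article's `& 0x4F` check against 0x40/0x42/0x44 reaches the same verdict for VPS/SPS/PPS header bytes:

```c
#include <stdint.h>

// H.264: nal_unit_type is the low 5 bits of the NALU header byte.
static int h264_nalu_type(uint8_t header) { return header & 0x1F; }

// H.265: nal_unit_type is bits 1..6 of the first NALU header byte.
static int h265_nalu_type(uint8_t header) { return (header >> 1) & 0x3F; }
```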

        if (isNeedUpdate) {
            log4cplus_error(kModuleName, "%s: update extra data",__func__);
            
            [self getNALUInfoWithVideoFormat:videoInfo->videoFormat
                                   extraData:extraData
                               extraDataSize:size
                                 decoderInfo:&_decoderInfo];
        }
...

- (void)getNALUInfoWithVideoFormat:(XDXVideoEncodeFormat)videoFormat
                         extraData:(uint8_t *)extraData
                     extraDataSize:(int)extraDataSize
                       decoderInfo:(XDXDecoderInfo *)decoderInfo {
    uint8_t *data = extraData;
    int      size = extraDataSize;
    
    int startCodeVPSIndex  = 0;
    int startCodeSPSIndex  = 0;
    int startCodeFPPSIndex = 0;
    int startCodeRPPSIndex = 0;
    int nalu_type = 0;
    
    for (int i = 0; i < size; i ++) {
        if (i >= 3) {
            if (data[i] == 0x01 && data[i - 1] == 0x00 && data[i - 2] == 0x00 && data[i - 3] == 0x00) {
                if (videoFormat == XDXH264EncodeFormat) {
                    if (startCodeSPSIndex == 0) {
                        startCodeSPSIndex = i;
                    }
                    if (i > startCodeSPSIndex) {
                        startCodeFPPSIndex = i;
                    }
                } else if (videoFormat == XDXH265EncodeFormat) {
                    if (startCodeVPSIndex == 0) {
                        startCodeVPSIndex = i;
                        continue;
                    }
                    if (i > startCodeVPSIndex && startCodeSPSIndex == 0) {
                        startCodeSPSIndex = i;
                        continue;
                    }
                    if (i > startCodeSPSIndex && startCodeFPPSIndex == 0) {
                        startCodeFPPSIndex = i;
                        continue;
                    }
                    if (i > startCodeFPPSIndex && startCodeRPPSIndex == 0) {
                        startCodeRPPSIndex = i;
                    }
                }
            }
        }
    }
    
    int spsSize = startCodeFPPSIndex - startCodeSPSIndex - 4;
    decoderInfo->sps_size = spsSize;
    
    if (videoFormat == XDXH264EncodeFormat) {
        int f_ppsSize = size - (startCodeFPPSIndex + 1);
        decoderInfo->f_pps_size = f_ppsSize;
        
        nalu_type = ((uint8_t)data[startCodeSPSIndex + 1] & 0x1F);
        if (nalu_type == 0x07) {
            uint8_t *sps = &data[startCodeSPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->sps newData:sps size:spsSize];
        }
        
        nalu_type = ((uint8_t)data[startCodeFPPSIndex + 1] & 0x1F);
        if (nalu_type == 0x08) {
            uint8_t *pps = &data[startCodeFPPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->f_pps newData:pps size:f_ppsSize];
        }
    } else {
        int vpsSize = startCodeSPSIndex - startCodeVPSIndex - 4;
        decoderInfo->vps_size = vpsSize;
        
        int f_ppsSize = startCodeRPPSIndex - startCodeFPPSIndex - 4;
        decoderInfo->f_pps_size = f_ppsSize;
        
        nalu_type = ((uint8_t) data[startCodeVPSIndex + 1] & 0x4F);
        if (nalu_type == 0x40) {
            uint8_t *vps = &data[startCodeVPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->vps newData:vps size:vpsSize];
        }
        
        nalu_type = ((uint8_t) data[startCodeSPSIndex + 1] & 0x4F);
        if (nalu_type == 0x42) {
            uint8_t *sps = &data[startCodeSPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->sps newData:sps size:spsSize];
        }
        
        nalu_type = ((uint8_t) data[startCodeFPPSIndex + 1] & 0x4F);
        if (nalu_type == 0x44) {
            uint8_t *pps = &data[startCodeFPPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->f_pps newData:pps size:f_ppsSize];
        }
        
        if (startCodeRPPSIndex == 0) {
            return;
        }
        
        int r_ppsSize = size - (startCodeRPPSIndex + 1);
        decoderInfo->r_pps_size = r_ppsSize;
        
        nalu_type = ((uint8_t) data[startCodeRPPSIndex + 1] & 0x4F);
        if (nalu_type == 0x44) {
            uint8_t *pps = &data[startCodeRPPSIndex + 1];
            [self copyDataWithOriginDataRef:&decoderInfo->r_pps newData:pps size:r_ppsSize];
        }
    }
}

- (void)copyDataWithOriginDataRef:(uint8_t **)originDataRef newData:(uint8_t *)newData size:(int)size {
    if (*originDataRef) {
        free(*originDataRef);
        *originDataRef = NULL;
    }
    *originDataRef = (uint8_t *)malloc(size);
    memcpy(*originDataRef, newData, size);
}



3. Create a decoder

Whether to create an h264 decoder or an H265 decoder is determined by the type of the encoded data. As shown in the figure above, the data must be assembled into a CMSampleBuffer to be passed to the decoder.

  • Generate the CMVideoFormatDescriptionRef

The CMVideoFormatDescriptionRef is created from the SPS and PPS (plus the VPS for H265). Note that some H265 bitstreams carry two PPS NALUs, so the number of parameter sets must be checked during assembly.

  • Determine the video data type

Specifying kCVPixelFormatType_420YpCbCr8BiPlanarFullRange sets the decoded video data type to YUV 420 semi-planar; if another format is needed, adapt it yourself.
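Pixel format constants like this are FourCC codes. Assuming kCVPixelFormatType_420YpCbCr8BiPlanarFullRange corresponds to the four characters '420f' (as declared in CVPixelBuffer.h; the video-range variant is '420v'), the numeric value can be derived by hand:

```c
#include <stdint.h>

// Pack four ASCII characters into a big-endian FourCC, the encoding
// CoreVideo uses for its OSType pixel-format constants.
static uint32_t fourcc(const char s[4]) {
    return ((uint32_t)(uint8_t)s[0] << 24) | ((uint32_t)(uint8_t)s[1] << 16) |
           ((uint32_t)(uint8_t)s[2] << 8)  |  (uint32_t)(uint8_t)s[3];
}
```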

  • Specify the callback function
  • Create the decoder

With all the information above, VTDecompressionSessionCreate can be called to create the decoder session object.

    // create decoder
    if (!_decoderSession) {
        _decoderSession = [self createDecoderWithVideoInfo:videoInfo
                                              videoDescRef:&_decoderFormatDescription
                                               videoFormat:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
                                                      lock:_decoder_lock
                                                  callback:VideoDecoderCallback
                                               decoderInfo:_decoderInfo];
    }
...

- (VTDecompressionSessionRef)createDecoderWithVideoInfo:(XDXParseVideoDataInfo *)videoInfo
                                           videoDescRef:(CMVideoFormatDescriptionRef *)videoDescRef
                                            videoFormat:(OSType)videoFormat
                                                   lock:(pthread_mutex_t)lock
                                               callback:(VTDecompressionOutputCallback)callback
                                            decoderInfo:(XDXDecoderInfo)decoderInfo {
    pthread_mutex_lock(&lock);
    OSStatus status;
    if (videoInfo->videoFormat == XDXH264EncodeFormat) {
        const uint8_t *const parameterSetPointers[2] = {decoderInfo.sps, decoderInfo.f_pps};
        const size_t parameterSetSizes[2] = {static_cast<size_t>(decoderInfo.sps_size), static_cast<size_t>(decoderInfo.f_pps_size)};
        status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                     2,
                                                                     parameterSetPointers,
                                                                     parameterSetSizes,
                                                                     4,
                                                                     videoDescRef);
    }else if (videoInfo->videoFormat == XDXH265EncodeFormat) {
        if (decoderInfo.r_pps_size == 0) {
            const uint8_t *const parameterSetPointers[3] = {decoderInfo.vps, decoderInfo.sps, decoderInfo.f_pps};
            const size_t parameterSetSizes[3] = {static_cast<size_t>(decoderInfo.vps_size), static_cast<size_t>(decoderInfo.sps_size), static_cast<size_t>(decoderInfo.f_pps_size)};
            if (@available(iOS 11.0, *)) {
                status = CMVideoFormatDescriptionCreateFromHEVCParameterSets(kCFAllocatorDefault,
                                                                             3,
                                                                             parameterSetPointers,
                                                                             parameterSetSizes,
                                                                             4,
                                                                             NULL,
                                                                             videoDescRef);
            } else {
                status = -1;
                log4cplus_error(kModuleName, "%s: System version is too low!",__func__);
            }
        } else {
            const uint8_t *const parameterSetPointers[4] = {decoderInfo.vps, decoderInfo.sps, decoderInfo.f_pps, decoderInfo.r_pps};
            const size_t parameterSetSizes[4] = {static_cast<size_t>(decoderInfo.vps_size), static_cast<size_t>(decoderInfo.sps_size), static_cast<size_t>(decoderInfo.f_pps_size), static_cast<size_t>(decoderInfo.r_pps_size)};
            if (@available(iOS 11.0, *)) {
                status = CMVideoFormatDescriptionCreateFromHEVCParameterSets(kCFAllocatorDefault,
                                                                             4,
                                                                             parameterSetPointers,
                                                                             parameterSetSizes,
                                                                             4,
                                                                             NULL,
                                                                             videoDescRef);
            } else {
                status = -1;
                log4cplus_error(kModuleName, "%s: System version is too low!",__func__);
            }
        }
    } else {
        status = -1;
    }
    
    if (status != noErr) {
        log4cplus_error(kModuleName, "%s: NALU header error !",__func__);
        pthread_mutex_unlock(&lock);
        [self destoryDecoder];
        return NULL;
    }
    
    uint32_t pixelFormatType = videoFormat;
    const void *keys[]       = {kCVPixelBufferPixelFormatTypeKey};
    const void *values[]     = {CFNumberCreate(NULL, kCFNumberSInt32Type, &pixelFormatType)};
    CFDictionaryRef attrs    = CFDictionaryCreate(NULL, keys, values, 1, NULL, NULL);
    
    VTDecompressionOutputCallbackRecord callBackRecord;
    callBackRecord.decompressionOutputCallback = callback;
    callBackRecord.decompressionOutputRefCon   = (__bridge void *)self;
    
    VTDecompressionSessionRef session;
    status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                          *videoDescRef,
                                          NULL,
                                          attrs,
                                          &callBackRecord,
                                          &session);
    
    CFRelease(attrs);
    pthread_mutex_unlock(&lock);
    if (status != noErr) {
        log4cplus_error(kModuleName, "%s: Create decoder failed",__func__);
        [self destoryDecoder];
        return NULL;
    }
    
    return session;
}


4. Start decoding

  • Store the raw data from the parse step in an XDXDecodeVideoInfo structure for later extension.
typedef struct {
    CVPixelBufferRef outputPixelbuffer;
    int              rotate;
    Float64          pts;
    int              fps;
    int              source_index;
} XDXDecodeVideoInfo;

  • Load the encoded data into a CMBlockBufferRef
  • Generate a CMSampleBufferRef from the CMBlockBufferRef
  • Decode the data

A frame of video data is decoded by calling VTDecompressionSessionDecodeFrame. The third parameter specifies whether decoding is synchronous or asynchronous.

// start decode
[self startDecode:videoInfo
          session:_decoderSession
             lock:_decoder_lock];
...

- (void)startDecode:(XDXParseVideoDataInfo *)videoInfo
            session:(VTDecompressionSessionRef)session
               lock:(pthread_mutex_t)lock {
    pthread_mutex_lock(&lock);
    uint8_t *data   = videoInfo->data;
    int      size   = videoInfo->dataSize;
    int      rotate = videoInfo->videoRotate;
    CMSampleTimingInfo timingInfo = videoInfo->timingInfo;
    
    uint8_t *tempData = (uint8_t *)malloc(size);
    memcpy(tempData, data, size);
    
    // Note: allocate sizeof(XDXDecodeVideoInfo) here, not sizeof(XDXParseVideoDataInfo).
    XDXDecodeVideoInfo *sourceRef = (XDXDecodeVideoInfo *)malloc(sizeof(XDXDecodeVideoInfo));
    sourceRef->outputPixelbuffer  = NULL;
    sourceRef->rotate             = rotate;
    sourceRef->pts                = videoInfo->pts;
    sourceRef->fps                = videoInfo->fps;
    
    CMBlockBufferRef blockBuffer;
    OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                         (void *)tempData,
                                                         size,
                                                         kCFAllocatorNull,
                                                         NULL,
                                                         0,
                                                         size,
                                                         0,
                                                         &blockBuffer);
    
    if (status == kCMBlockBufferNoErr) {
        CMSampleBufferRef sampleBuffer = NULL;
        const size_t sampleSizeArray[] = { static_cast<size_t>(size) };
        
        status = CMSampleBufferCreateReady(kCFAllocatorDefault,
                                           blockBuffer,
                                           _decoderFormatDescription,
                                           1,
                                           1,
                                           &timingInfo,
                                           1,
                                           sampleSizeArray,
                                           &sampleBuffer);
        
        if (status == kCMBlockBufferNoErr && sampleBuffer) {
            VTDecodeFrameFlags flags   = kVTDecodeFrame_EnableAsynchronousDecompression;
            VTDecodeInfoFlags  flagOut = 0;
            OSStatus decodeStatus      = VTDecompressionSessionDecodeFrame(session,
                                                                           sampleBuffer,
                                                                           flags,
                                                                           sourceRef,
                                                                           &flagOut);
            if(decodeStatus == kVTInvalidSessionErr) {
                pthread_mutex_unlock(&lock);
                [self destoryDecoder];
                if (blockBuffer)
                    CFRelease(blockBuffer);
                free(tempData);
                tempData = NULL;
                CFRelease(sampleBuffer);
                return;
            }
            CFRelease(sampleBuffer);
        }
    }
    
    if (blockBuffer) {
        CFRelease(blockBuffer);
    }
    
    free(tempData);
    tempData = NULL;
    pthread_mutex_unlock(&lock);
}


5. Decoded data

The decoded data is delivered in the callback function. There we convert the decoded CVImageBufferRef into a CMSampleBufferRef and pass it out through the delegate.

#pragma mark - Callback
static void VideoDecoderCallback(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration) {
    XDXDecodeVideoInfo *sourceRef = (XDXDecodeVideoInfo *)sourceFrameRefCon;
    
    if (pixelBuffer == NULL) {
        log4cplus_error(kModuleName, "%s: pixelbuffer is NULL status = %d",__func__,status);
        if (sourceRef) {
            free(sourceRef);
        }
        return;
    }
    
    XDXVideoDecoder *decoder = (__bridge XDXVideoDecoder *)decompressionOutputRefCon;
    
    CMSampleTimingInfo sampleTime = {
        .presentationTimeStamp  = presentationTimeStamp,
        .decodeTimeStamp        = presentationTimeStamp
    };
    
    CMSampleBufferRef samplebuffer = [decoder createSampleBufferFromPixelbuffer:pixelBuffer
                                                                    videoRotate:sourceRef->rotate
                                                                     timingInfo:sampleTime];
    
    if (samplebuffer) {
        if ([decoder.delegate respondsToSelector:@selector(getVideoDecodeDataCallback:)]) {
            [decoder.delegate getVideoDecodeDataCallback:samplebuffer];
        }
        CFRelease(samplebuffer);
    }
    
    if (sourceRef) {
        free(sourceRef);
    }
}

- (CMSampleBufferRef)createSampleBufferFromPixelbuffer:(CVImageBufferRef)pixelBuffer videoRotate:(int)videoRotate timingInfo:(CMSampleTimingInfo)timingInfo {
    if (!pixelBuffer) {
        return NULL;
    }
    
    CVPixelBufferRef final_pixelbuffer = pixelBuffer;
    CMSampleBufferRef samplebuffer = NULL;
    CMVideoFormatDescriptionRef videoInfo = NULL;
    OSStatus status = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, final_pixelbuffer, &videoInfo);
    status = CMSampleBufferCreateForImageBuffer(kCFAllocatorDefault, final_pixelbuffer, true, NULL, NULL, videoInfo, &timingInfo, &samplebuffer);
    
    if (videoInfo != NULL) {
        CFRelease(videoInfo);
    }
    
    if (samplebuffer == NULL || status != noErr) {
        return NULL;
    }
    
    return samplebuffer;
}

6. Destroy the decoder

Destroy the decoder after use so that a fresh one can be created the next time it is needed.

    if (_decoderSession) {
        VTDecompressionSessionWaitForAsynchronousFrames(_decoderSession);
        VTDecompressionSessionInvalidate(_decoderSession);
        CFRelease(_decoderSession);
        _decoderSession = NULL;
    }
    
    if (_decoderFormatDescription) {
        CFRelease(_decoderFormatDescription);
        _decoderFormatDescription = NULL;
    }

7. Supplement: About reordering of data with B frames

Note: if the video file or video stream contains B frames, the video frames must be reordered before rendering. This article focuses on decoding; the sorting implementation will be covered in a follow-up article. If you need it now, please download the Demo.
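As a rough illustration (not the project's XDXSortFrameHandler linked-list implementation), one simple strategy is to buffer a small window of decoded frames and emit them sorted by presentation timestamp:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t pts;  // presentation timestamp, e.g. in milliseconds
} Frame;

static int cmp_pts(const void *a, const void *b) {
    int64_t d = ((const Frame *)a)->pts - ((const Frame *)b)->pts;
    return (d > 0) - (d < 0);
}

// Reorder a window of frames from decode order into presentation order.
// Real players bound the window size (e.g. one GOP) to limit latency.
static void reorder_by_pts(Frame *frames, int n) {
    qsort(frames, n, sizeof(Frame), cmp_pts);
}
```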