Preface

A brief summary of my recent research on iOS hardware decoding.

Note: This article covers decoding raw H264 elementary streams (Annex B format) only.

H264

Composition

  • An H264 bit stream consists of NALUs (Network Abstraction Layer Units), which carry either video image data or H264 parameter information.
  • In VideoToolbox terms, the video image data becomes a CMBlockBuffer, and the H264 parameter information is combined into a format description (FormatDesc). Specifically, the parameter information consists of the SPS (Sequence Parameter Set) and the PPS (Picture Parameter Set).

Each NALU is preceded by a fixed start code, followed by a type byte and then the actual payload. From the NALU type we can tell whether the unit is an SPS, a PPS, an IDR slice, or some other frame.
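As a rough sketch (the helper function here is mine; the 0x1F mask and the type values 5/7/8 reappear in the decode code later in this article):

    // Annex B layout: [00 00 00 01][NAL header: 1 byte][payload ...]
    // The NAL unit type is the low 5 bits of the header byte.
    static int NALUnitType(const uint8_t *nalu) {
        // nalu points at the 4-byte start code; byte 4 is the NAL header
        return nalu[4] & 0x1F; // 7 = SPS, 8 = PPS, 5 = IDR slice
    }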

Splitting the data by type

Decoding H264 mainly requires first obtaining the SPS and PPS parameters, which are used to create the format description:

    const uint8_t* const parameterSetPointers[2] = { _sps, _pps };
    const size_t parameterSetSizes[2] = { _spsSize, _ppsSize };
    OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                          2, // parameter set count
                                                                          parameterSetPointers,
                                                                          parameterSetSizes,
                                                                          4, // NAL start code size
                                                                          &_decoderFormatDescription);

Because every NALU begins with the same start code, we can simply split the stream on the start code to obtain each unit.

1. Read the video stream through NSInputStream

    self.fileStream = [NSInputStream inputStreamWithFileAtPath:fileName];
    [self.fileStream open];
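One assumption worth stating: the parsing code below relies on a read buffer and a start-code constant that the original post never shows. A plausible setup (my sketch) might be:

    static const uint8_t KStartCode[4] = {0x00, 0x00, 0x00, 0x01}; // Annex B start code

    // In init: a 512 * 1024 byte read buffer, filled incrementally from the stream
    _bufferCap  = 512 * 1024;
    _buffer     = malloc(_bufferCap);
    _bufferSize = 0;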

2. Split the buffer into NALUs and wrap each one in a VideoPacket object

// Fetch the next NALU (packet) from the buffered stream
-(VideoPacket*)nextPacket
{
    if(_bufferSize < _bufferCap && self.fileStream.hasBytesAvailable) {
        // Top the buffer up to its 512 * 1024 capacity
        NSInteger readBytes = [self.fileStream read:_buffer + _bufferSize maxLength:_bufferCap - _bufferSize];
        _bufferSize += readBytes; // _bufferSize marks the end of the valid data
    }

    if(memcmp(_buffer, KStartCode, 4) != 0) { // The buffer must begin with a start code
        return nil;
    }

    if(_bufferSize >= 5) { // At least 5 bytes: 4 for the start code plus the NAL header
        uint8_t *bufferBegin = _buffer + 4;         // First byte after the start code
        uint8_t *bufferEnd = _buffer + _bufferSize; // End of the valid data
        while(bufferBegin != bufferEnd) {
            if(*bufferBegin == 0x01) { // Candidate last byte of the next start code
                if(memcmp(bufferBegin - 3, KStartCode, 4) == 0) { // Found the next start code: the current NALU ends here
                    NSInteger packetSize = bufferBegin - _buffer - 3;
                    VideoPacket *vp = [[VideoPacket alloc] initWithSize:packetSize];
                    memcpy(vp.buffer, _buffer, packetSize); // Copy out the NALU, start code included

                    // Shift the remaining data to the front of the buffer
                    memmove(_buffer, _buffer + packetSize, _bufferSize - packetSize);
                    _bufferSize -= packetSize;

                    return vp;
                }
            }
            ++bufferBegin;
        }
    }
    return nil;
}

3. The fifth byte (the one right after the start code) carries the NALU type in its low 5 bits: 7 is an SPS, 8 is a PPS, and 5 is an IDR (I frame). Once the SPS and PPS have been captured, I, B and P frames can be decoded directly

- (void)decodeVideoPacket:(VideoPacket *)vp {
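    // VideoToolbox consumes AVCC-framed NALUs: overwrite the 4-byte Annex B
    // start code with the NAL unit length, written in big-endian byte order.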
    uint32_t nalSize = (uint32_t)(vp.size - 4);
    uint8_t *pNalSize = (uint8_t*)(&nalSize);
    vp.buffer[0] = *(pNalSize + 3);
    vp.buffer[1] = *(pNalSize + 2);
    vp.buffer[2] = *(pNalSize + 1);
    vp.buffer[3] = *(pNalSize);
    
    CVPixelBufferRef pixelBuffer = NULL;
    int nalType = vp.buffer[4] & 0x1F;
    switch (nalType) {
        case 0x05:
            NSLog(@"Nal type is IDR frame");
            if([self initH264Decoder]) {
                pixelBuffer = [self decode:vp];
            }
            break;
        case 0x07:
            NSLog(@"Nal type is SPS");
            _spsSize = vp.size - 4;
            _sps = malloc(_spsSize);
            memcpy(_sps, vp.buffer + 4, _spsSize);
            break;
        case 0x08:
            NSLog(@"Nal type is PPS");
            _ppsSize = vp.size - 4;
            _pps = malloc(_ppsSize);
            memcpy(_pps, vp.buffer + 4, _ppsSize);
            break;
            
        default:
            NSLog(@"Nal type is B/P frame");
            pixelBuffer = [self decode:vp];
            break;
    }
    
    if(pixelBuffer) {
        dispatch_sync(dispatch_get_main_queue(), ^{
            if (self.delegate && [self.delegate respondsToSelector:@selector(WJVideoToolBoxDecoderPixelBuffer:)]) {
                [self.delegate WJVideoToolBoxDecoderPixelBuffer:pixelBuffer];
            }
        });
        CVPixelBufferRelease(pixelBuffer);
    }
    
    NSLog(@"Read Nalu size %ld", vp.size);
}
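The delegate callback used above is not declared in the original post; a minimal protocol (my assumption, matching the selector name) would be:

    @protocol WJVideoToolBoxDecoderDelegate <NSObject>
    // Called on the main queue with each decoded frame
    - (void)WJVideoToolBoxDecoderPixelBuffer:(CVPixelBufferRef)pixelBuffer;
    @end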

VideoToolBox decoding

Once the SPS and PPS have been obtained, the decoding code itself is quite simple.

1. Initialize the decoder

-(BOOL)initH264Decoder {
    if(_decoderSession) {
        return YES;
    }
    
    const uint8_t* const parameterSetPointers[2] = { _sps, _pps };
    const size_t parameterSetSizes[2] = { _spsSize, _ppsSize };
    OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                          2, // parameter set count
                                                                          parameterSetPointers,
                                                                          parameterSetSizes,
                                                                          4, // NAL start code size
                                                                          &_decoderFormatDescription);
    
    if(status == noErr) {
        CFDictionaryRef attrs = NULL;
        const void *keys[] = { kCVPixelBufferPixelFormatTypeKey };
        // kCVPixelFormatType_420YpCbCr8Planar is YUV420
        // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange is NV12
        uint32_t v = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange;
        const void *values[] = { CFNumberCreate(NULL, kCFNumberSInt32Type, &v) };
        attrs = CFDictionaryCreate(NULL, keys, values, 1, NULL, NULL);
        
        VTDecompressionOutputCallbackRecord callBackRecord;
        callBackRecord.decompressionOutputCallback = didDecompress;
        callBackRecord.decompressionOutputRefCon = NULL;
        
        status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                              _decoderFormatDescription,
                                              NULL, attrs,
                                              &callBackRecord,
                                              &_deocderSession);
        CFRelease(attrs);
    } else {
        NSLog(@"IOS8VT: reset decoder session failed status=%d", status);
    }

    return status == noErr;
}

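The decode step below checks for kVTInvalidSessionErr but the original post never shows the actual reset; a minimal teardown sketch (the method name is mine) that lets initH264Decoder rebuild the session:

    // Sketch: tear down the session so initH264Decoder can recreate it
    // (useful when VTDecompressionSessionDecodeFrame returns kVTInvalidSessionErr).
    - (void)resetH264Decoder {
        if (_decoderSession) {
            VTDecompressionSessionInvalidate(_decoderSession);
            CFRelease(_decoderSession);
            _decoderSession = NULL;
        }
        if (_decoderFormatDescription) {
            CFRelease(_decoderFormatDescription);
            _decoderFormatDescription = NULL;
        }
    }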
2. Create the decoder callback. The sourceFrameRefCon argument is the &outputPixelBuffer pointer passed to VTDecompressionSessionDecodeFrame below, so the callback hands the decoded frame back to the caller.
static void didDecompress(void *decompressionOutputRefCon,
                          void *sourceFrameRefCon,
                          OSStatus status,
                          VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef pixelBuffer,
                          CMTime presentationTimeStamp,
                          CMTime presentationDuration) {
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    // Retain the buffer so it survives beyond this callback; the caller releases it
    *outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
3. Perform the decoding

-(CVPixelBufferRef)decode:(VideoPacket*)vp {
    CVPixelBufferRef outputPixelBuffer = NULL;
    
    CMBlockBufferRef blockBuffer = NULL;
    OSStatus status  = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                          (void*)vp.buffer, vp.size,
                                                          kCFAllocatorNull,
                                                          NULL, 0, vp.size,
                                                          0, &blockBuffer);
    if(status == kCMBlockBufferNoErr) {
        CMSampleBufferRef sampleBuffer = NULL;
        const size_t sampleSizeArray[] = {vp.size};
        status = CMSampleBufferCreateReady(kCFAllocatorDefault,
                                           blockBuffer,
                                           _decoderFormatDescription ,
                                           1, 0, NULL, 1, sampleSizeArray,
                                           &sampleBuffer);
        if (status == kCMBlockBufferNoErr && sampleBuffer) {
            VTDecodeFrameFlags flags = 0;
            VTDecodeInfoFlags flagOut = 0;
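            // &outputPixelBuffer is passed as sourceFrameRefCon; the
            // didDecompress callback stores the decoded frame through it.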
            OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_decoderSession,
                                                                      sampleBuffer,
                                                                      flags,
                                                                      &outputPixelBuffer,
                                                                      &flagOut);
            
            if(decodeStatus == kVTInvalidSessionErr) {
                NSLog(@"IOS8VT: Invalid session, reset decoder session");
            } else if(decodeStatus == kVTVideoDecoderBadDataErr) {
                NSLog(@"IOS8VT: decode failed status=%d(Bad data)", decodeStatus);
            } else if(decodeStatus != noErr) {
                NSLog(@"IOS8VT: decode failed status=%d", decodeStatus);
            }
            
            CFRelease(sampleBuffer);
        }
        CFRelease(blockBuffer);
    }
    
    return outputPixelBuffer;
}
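Putting it all together, a minimal driving loop might look like this (a sketch; the method name decodeFile: is mine, the rest uses the methods defined above):

    - (void)decodeFile:(NSString *)fileName {
        self.fileStream = [NSInputStream inputStreamWithFileAtPath:fileName];
        [self.fileStream open];

        // Pull NALUs one by one and feed them to the decoder
        VideoPacket *vp = nil;
        while ((vp = [self nextPacket]) != nil) {
            [self decodeVideoPacket:vp];
        }

        [self.fileStream close];
    }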
