Preface

VideoToolbox is a framework provided by Apple since iOS 8 for hardware-accelerated encoding and decoding of H264, and of H265 since iOS 11.

If you are not familiar with H264, please take a look at this introduction first:

H.264 Basic Introduction

Encoding process

We implement a simple demo that takes video data from the camera, encodes it into a raw H264 stream, and saves the result to the app sandbox.

1. Create and initialize the VideoToolbox session

- (void)initVideoToolBox {
    dispatch_sync(encodeQueue, ^{
        frameNO = 0;
        int width = 480, height = 640;
        OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH264, (__bridge void *)(self), &encodingSession);
        NSLog(@"H264: VTCompressionSessionCreate %d", (int)status);
        if (status != 0) {
            NSLog(@"H264: Unable to create an H264 session");
            return;
        }
        
        // Enable real-time encoding output (avoids delay)
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Baseline_AutoLevel);
        
        // Set the GOP size (maximum keyframe interval)
        int frameInterval = 24;
        CFNumberRef frameIntervalRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRef);
        CFRelease(frameIntervalRef);
        
        // Set the expected frame rate
        int fps = 24;
        CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);
        CFRelease(fpsRef);
        
        // Set the average bit rate (bits per second)
        int bitRate = width * height * 3 * 4 * 8;
        CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);
        CFRelease(bitRateRef);
        
        // Set the hard data-rate limit; this property takes a CFArray of [bytes, seconds] pairs
        int bitRateLimit = width * height * 3 * 4; // treated as bits here, converted to bytes below
        NSArray *dataRateLimits = @[@(bitRateLimit / 8), @1];
        VTSessionSetProperty(encodingSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)dataRateLimits);
        
        // Start encoding
        VTCompressionSessionPrepareToEncodeFrames(encodingSession);
    });
}

The initialization sets the codec type (kCMVideoCodecType_H264), the resolution (480 * 640, the 640 * 480 capture rotated to portrait), the expected FPS, the GOP size, and the bit rate.
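
The demo only writes these properties; if you want to confirm that a value actually took effect, VideoToolbox lets you read it back with VTSessionCopyProperty. A minimal sketch (not part of the original demo, error handling elided):

// Read back a property to confirm the encoder accepted it (a sketch).
// VTSessionCopyProperty follows the Copy rule, so release the result.
CFNumberRef actualBitRate = NULL;
OSStatus st = VTSessionCopyProperty(encodingSession,
                                    kVTCompressionPropertyKey_AverageBitRate,
                                    kCFAllocatorDefault,
                                    &actualBitRate);
if (st == noErr && actualBitRate != NULL) {
    int value = 0;
    CFNumberGetValue(actualBitRate, kCFNumberIntType, &value);
    NSLog(@"H264: average bit rate in effect: %d bps", value);
    CFRelease(actualBitRate);
}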

2. Get video data from the camera and send it to VideoToolbox to be encoded as H264

The core code for initializing the capture side is as follows:

// Initialize the camera capture side
- (void)initCapture {
    self.captureSession = [[AVCaptureSession alloc] init];
    // Capture at 640 * 480
    self.captureSession.sessionPreset = AVCaptureSessionPreset640x480;
    AVCaptureDevice *inputCamera = [self cameraWithPostion:AVCaptureDevicePositionBack];
    self.captureDeviceInput = [[AVCaptureDeviceInput alloc] initWithDevice:inputCamera error:nil];
    if ([self.captureSession canAddInput:self.captureDeviceInput]) {
        [self.captureSession addInput:self.captureDeviceInput];
    }
    
    self.captureDeviceOutput = [[AVCaptureVideoDataOutput alloc] init];
    [self.captureDeviceOutput setAlwaysDiscardsLateVideoFrames:NO];
    // Request biplanar YUV 4:2:0 (NV12) output
    [self.captureDeviceOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];
    
    [self.captureDeviceOutput setSampleBufferDelegate:self queue:captureQueue];
    
    if ([self.captureSession canAddOutput:self.captureDeviceOutput]) {
        [self.captureSession addOutput:self.captureDeviceOutput];
    }
    
    // Configure the video connection for portrait orientation
    AVCaptureConnection *connection = [self.captureDeviceOutput connectionWithMediaType:AVMediaTypeVideo];
    [connection setVideoOrientation:AVCaptureVideoOrientationPortrait];
}

Note that the capture preset matches the encoder resolution (640 * 480, rotated to portrait), and the AVCaptureVideoDataOutput pixel format is biplanar YUV 4:2:0 (NV12).
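
If you want to verify what the camera actually delivers, you can inspect the first few buffers in the delegate callback. A quick sketch (illustration only, not in the demo); with the portrait connection above it should report 480 x 640 and the '420f' format:

// Inspect an incoming buffer: pixel format and dimensions (a sketch)
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
OSType format = CVPixelBufferGetPixelFormatType(pixelBuffer); // '420f' for kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
NSLog(@"capture format %c%c%c%c, %zu x %zu",
      (char)(format >> 24), (char)(format >> 16), (char)(format >> 8), (char)format,
      CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer));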

Camera data callback

- (void)captureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    dispatch_sync(encodeQueue, ^{
        [self encode:sampleBuffer];
    });
}

// Encode one sampleBuffer
- (void)encode:(CMSampleBufferRef)sampleBuffer {
    CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
    // Presentation timestamp for this frame; without an increasing PTS the timeline is wrong
    CMTime presentationTimeStamp = CMTimeMake(frameNO++, 1000);
    VTEncodeInfoFlags flags;
    OSStatus statusCode = VTCompressionSessionEncodeFrame(encodingSession, imageBuffer, presentationTimeStamp, kCMTimeInvalid, NULL, NULL, &flags);
    if (statusCode != noErr) {
        NSLog(@"H264: VTCompressionSessionEncodeFrame failed with %d", (int)statusCode);
        VTCompressionSessionInvalidate(encodingSession);
        CFRelease(encodingSession);
        encodingSession = NULL;
        return;
    }
    NSLog(@"H264: VTCompressionSessionEncodeFrame Success");
}
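
A note on the timestamp: CMTimeMake(frameNO++, 1000) stamps frame N at N/1000 seconds, so the file's internal timeline advances 1 ms per frame regardless of the real frame rate. An alternative (a sketch, not what the demo does) is to reuse the timestamp AVFoundation already put on the capture buffer:

// Reuse the capture buffer's own PTS and duration (sketch of an alternative)
CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetDuration(sampleBuffer);
VTEncodeInfoFlags flags;
OSStatus statusCode = VTCompressionSessionEncodeFrame(encodingSession, imageBuffer, pts, duration, NULL, NULL, &flags);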

3. The data structure CMSampleBufferRef in the framework stores one or more compressed or uncompressed media samples; before encoding, a CMSampleBuffer wraps a CVPixelBuffer of raw pixels, and after encoding it wraps a CMBlockBuffer of compressed data.

CMTime: a 64-bit value over a 32-bit timescale, the media time format (a small example follows this list);

CMBlockBuffer: the wrapper around the raw encoded data;

CVPixelBuffer: holds uncompressed pixel data, along with the image width, height, etc.;

pixelBufferAttributes: a CFDictionary describing width and height, pixel format (RGBA, YUV), usage scenario (OpenGL ES, Core Animation), and so on;

CVPixelBufferPool: a buffer pool for CVPixelBuffers, used because creating and destroying CVPixelBuffers is expensive;

CMVideoFormatDescription: the video format, including width and height, color space, encoding format, etc.; for H264 it also carries the SPS and PPS data.
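
To make CMTime concrete, a tiny example (illustration only):

// CMTime represents value/timescale seconds: a 64-bit value over a 32-bit timescale
CMTime oneFrame = CMTimeMake(1, 24);        // one frame at 24 fps
CMTime threeSec = CMTimeMake(3000, 1000);   // 3000/1000 = 3 seconds
CMTime sum = CMTimeAdd(oneFrame, threeSec);
NSLog(@"%.4f + %.4f = %.4f seconds",
      CMTimeGetSeconds(oneFrame), CMTimeGetSeconds(threeSec), CMTimeGetSeconds(sum));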

4. Write the encoded data out as an H264 file

When the encoding callback fires, we first determine whether the frame is an I frame; if it is, we read the SPS and PPS parameter sets. Why do we need to do this?

Let's look at the NALU layout of a raw H264 stream (the Elementary Stream).

In a raw H.264 stream there is no standalone SPS or PPS packet or frame; they are attached in front of the I frame, generally stored in the form

00 00 00 01 SPS 00 00 00 01 PPS 00 00 00 01 I-frame data

The leading 00 00 00 01 bytes are the Start Code; they are not part of the SPS or PPS content.

SPS (Sequence Parameter Set) and PPS (Picture Parameter Set): they carry the parameters needed to initialize the H.264 decoder, including the profile, level, image width and height, deblocking filter settings, etc.
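
Each NALU declares its type in the low five bits of the first byte after the start code (nal_unit_type in the H.264 spec): 7 means SPS, 8 means PPS, and 5 means an IDR slice. A tiny helper for eyeballing a dumped stream (illustration, not part of the demo):

// Map the first byte after a start code to a readable NALU type name
static NSString *NALUTypeName(uint8_t firstByte) {
    switch (firstByte & 0x1F) { // nal_unit_type = low 5 bits
        case 1:  return @"non-IDR slice";
        case 5:  return @"IDR slice (keyframe)";
        case 6:  return @"SEI";
        case 7:  return @"SPS";
        case 8:  return @"PPS";
        default: return @"other";
    }
}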

The SPS and PPS are encapsulated in the CMFormatDescriptionRef, so we need to fetch them from the CMFormatDescriptionRef and write them into the H264 raw stream ourselves.

With that, the process of writing the H264 file is easy to understand.

The code is as follows:

void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
{
    NSLog(@"didCompressH264 called with status %d infoFlags %d", (int)status, (int)infoFlags);
    if (status != 0) {
        return;
    }
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        NSLog(@"didCompressH264 data is not ready");
        return;
    }
    ViewController *encoder = (__bridge ViewController *)outputCallbackRefCon;
    
    // Determine whether the current frame is a keyframe
    bool keyframe = !CFDictionaryContainsKey((CFDictionaryRef)CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0), kCMSampleAttachmentKey_NotSync);
    
    // For keyframes, get the SPS & PPS data
    if (keyframe) {
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, NULL);
        if (statusCode == noErr) {
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, NULL);
            if (statusCode == noErr) {
                NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
                if (encoder) {
                    [encoder gotSpsPps:sps pps:pps];
                }
            }
        }
    }
    
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    // Get the data pointer and the total length of the encoded data
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4;
        // The first four bytes of each NALU are not the 00 00 00 01 start code but the big-endian NALU length (AVCC format)
        while (bufferOffset + AVCCHeaderLength < totalLength) {
            uint32_t NALUnitLength = 0;
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
            NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            [encoder gotEncodedData:data];
            // Move to the next NALU
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

// Write the SPS and PPS data
- (void)gotSpsPps:(NSData *)sps pps:(NSData *)pps
{
    NSLog(@"gotSpsPps %d %d", (int)[sps length], (int)[pps length]);
    const char bytes[] = "\x00\x00\x00\x01";
    size_t length = (sizeof bytes) - 1; // string literals have an implicit trailing '\0'
    NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
    // Write start code, then SPS
    [self.h264FileHandle writeData:ByteHeader];
    [self.h264FileHandle writeData:sps];
    // Write start code, then PPS
    [self.h264FileHandle writeData:ByteHeader];
    [self.h264FileHandle writeData:pps];
}

// Write NALU data
- (void)gotEncodedData:(NSData *)data
{
    NSLog(@"gotEncodedData %d", (int)[data length]);
    if (self.h264FileHandle != NULL) {
        const char bytes[] = "\x00\x00\x00\x01";
        size_t length = (sizeof bytes) - 1; // string literals have an implicit trailing '\0'
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
        // Write start code
        [self.h264FileHandle writeData:ByteHeader];
        // Write NALU data
        [self.h264FileHandle writeData:data];
    }
}
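
The write methods above assume self.h264FileHandle has already been opened. A minimal sketch of creating it in the sandbox's Documents directory (the file name test.h264 is an assumption, not taken from the original code):

// Create the output file in the sandbox and open a handle for writing (a sketch)
NSString *documents = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject];
NSString *h264Path = [documents stringByAppendingPathComponent:@"test.h264"]; // assumed name
[[NSFileManager defaultManager] removeItemAtPath:h264Path error:nil];         // start fresh
[[NSFileManager defaultManager] createFileAtPath:h264Path contents:nil attributes:nil];
self.h264FileHandle = [NSFileHandle fileHandleForWritingAtPath:h264Path];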

Destroy the session when encoding is finished

- (void)EndVideoToolBox
{
    VTCompressionSessionCompleteFrames(encodingSession, kCMTimeInvalid);
    VTCompressionSessionInvalidate(encodingSession);
    CFRelease(encodingSession);
    encodingSession = NULL;
}

This completes H264 encoding with VideoToolbox. The encoded H264 file can be retrieved from the app sandbox.

Conclusion

Just reading through the process without working through the code is no way to learn the framework, so try it yourself! iOS-VideoToolboxer-Demo