Cropping part of the camera image in iOS (Crop sampleBuffer)

Requirement of this example: in features such as live streaming, QR code scanning, face recognition, and similar interfaces, we often need to extract part of the picture captured by the camera.


Principle: The camera captures the whole picture, but only part of it is needed, so we have to pull out the data of that sub-region. The sampleBuffer delivered by AVCaptureVideoDataOutputSampleBufferDelegate is a private system data structure that cannot be manipulated directly, so it must first be converted into something that can be cropped. One idea circulating online is to convert the sampleBuffer into a UIImage and then crop the image; this is cumbersome and performs poorly. In this example the sampleBuffer is converted into a CoreImage CIImage instead, which performs better and keeps the code simpler.
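A minimal sketch of this idea (not the project's actual method; the crop rectangle, output pixel buffer and CIContext are assumed to be created elsewhere, and the full implementations are walked through below):

```objc
// Condensed sketch of the CIImage route.
- (void)renderFrame:(CMSampleBufferRef)sampleBuffer
           cropRect:(CGRect)cropRect
            context:(CIContext *)ciContext
       outputBuffer:(CVPixelBufferRef)pixbuffer {
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage *ciImage = [CIImage imageWithCVImageBuffer:imageBuffer];
    // Crop, then translate back to the origin (CoreImage keeps the cropped extent at its original offset).
    ciImage = [ciImage imageByCroppingToRect:cropRect];
    ciImage = [ciImage imageByApplyingTransform:CGAffineTransformMakeTranslation(-cropRect.origin.x, -cropRect.origin.y)];
    [ciContext render:ciImage toCVPixelBuffer:pixbuffer];
}
```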


The final effect is shown below: the cropped area is inside the green box, which can be moved by long-pressing.

GitHub address (with code): Crop sample buffer

Jianshu address: Crop sample buffer

Blog address: Crop sample buffer

Juejin address: Crop sample buffer


Note: the code differs under ARC and MRC (this is marked in the project), mainly for managing the global CIContext object. Under MRC the CIContext created in the initialization method is not retained automatically, so using it later will report an error unless it is retained manually.


Usage scenarios

  • The capture resolution in this project defaults to 2K (1920*1080) and can be switched to 4K, so an iPhone 6s or later is required.
  • Cropping can be done on either the CPU or the GPU. Set the isOpenGPU flag in the view controller before cropView is initialized; when it is enabled, the GPU is used.
  • Only cropping in landscape is implemented; this example always stays in landscape and does no portrait handling.

The basic configuration

1. Configure the basic camera environment (initialize the AVCaptureSession, set the delegate, start the session), as in the sample code; it will not be repeated here, but a minimal, illustrative setup sketch follows this list.

2. Obtain the raw frame data (CMSampleBufferRef) through the AVCaptureVideoDataOutputSampleBufferDelegate callback and process it.
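For reference, a minimal, illustrative session setup might look like the sketch below; the `session` property, queue name and pixel format are assumptions, not the project's exact configuration, and the delegate set on the output is the one described in step 2.

```objc
#import <AVFoundation/AVFoundation.h>

- (void)setupCaptureSession {
    self.session = [[AVCaptureSession alloc] init];
    self.session.sessionPreset = AVCaptureSessionPreset1920x1080;   // 2K default in this project

    AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:nil];
    if ([self.session canAddInput:input]) [self.session addInput:input];

    AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
    // Pixel format is illustrative; pick one that matches the crop path you use.
    output.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
    dispatch_queue_t videoQueue = dispatch_queue_create("videoQueue", DISPATCH_QUEUE_SERIAL);
    [output setSampleBufferDelegate:self queue:videoQueue];
    if ([self.session canAddOutput:output]) [self.session addOutput:output];

    [self.session startRunning];
}
```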

Implementation

1. Software cropping on the CPU (the CPU does the calculation and copying, which is costly)

  • `- (CMSampleBufferRef)cropSampleBufferBySoftware:(CMSampleBufferRef)sampleBuffer;`

2. Hardware cropping on the GPU (uses Apple's public APIs to crop with the hardware; performance is good, but there are still unresolved issues)

  • `- (CMSampleBufferRef)cropSampleBufferByHardware:(CMSampleBufferRef)buffer;`

Analysis

```objc
// Called whenever an AVCaptureVideoDataOutput instance outputs a new video frame.
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    CMSampleBufferRef cropSampleBuffer;
#warning Either cropping method may be used. GPU cropping performs better; CPU cropping depends on the device and generally drops frames over long sessions.
    if (self.isOpenGPU) {
        cropSampleBuffer = [self.cropView cropSampleBufferByHardware:sampleBuffer];
    } else {
        cropSampleBuffer = [self.cropView cropSampleBufferBySoftware:sampleBuffer];
    }
    CFRelease(cropSampleBuffer);
}
```
  • The method above is the camera delegate callback invoked for every video frame; sampleBuffer is the raw data of each frame, and it must be cropped to meet the requirements of this example. Note that cropSampleBuffer must be released at the end, otherwise memory keeps growing until the app crashes.

Cropping with the CPU (software)

```objc
- (CMSampleBufferRef)cropSampleBufferBySoftware:(CMSampleBufferRef)sampleBuffer {
    OSStatus status;
    
    //    CVPixelBufferRef pixelBuffer = [self modifyImage:buffer];
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    // Lock the image buffer
    CVPixelBufferLockBaseAddress(imageBuffer,0);
    // Get information about the image
    uint8_t *baseAddress     = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
    size_t  bytesPerRow      = CVPixelBufferGetBytesPerRow(imageBuffer);
    size_t  width            = CVPixelBufferGetWidth(imageBuffer);
    // size_t  height           = CVPixelBufferGetHeight(imageBuffer);
    NSInteger bytesPerPixel  =  bytesPerRow/width;
    
    // YUV 420 rule: the crop X must be even
    if (_cropX % 2 != 0) _cropX += 1;
    
    NSInteger baseAddressStart = _cropY * bytesPerRow + bytesPerPixel * _cropX;
    static NSInteger lastAddressStart = 0;
    
    // pixbuffer and videoInfo only need to be updated when the camera is restarted or the crop position changes
    // NSLog(@"demon pix first : %zu - %zu - %@ - %d - %d - %d - %d", width, height, self.currentResolution, _cropX, _cropY, self.currentResolutionW, self.currentResolutionH);
    static CVPixelBufferRef            pixbuffer = NULL;
    static CMVideoFormatDescriptionRef videoInfo = NULL;
    
    // x,y changed need to reset pixbuffer and videoinfo
    if (lastAddressStart != baseAddressStart) {
        if (pixbuffer != NULL) {
            CVPixelBufferRelease(pixbuffer);
            pixbuffer = NULL;
        }
        if (videoInfo != NULL) {
            CFRelease(videoInfo);
            videoInfo = NULL;
        }
    }
    
    if (pixbuffer == NULL) {
        NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                 [NSNumber numberWithBool : YES],           kCVPixelBufferCGImageCompatibilityKey,
                                 [NSNumber numberWithBool : YES],           kCVPixelBufferCGBitmapContextCompatibilityKey,
                                 [NSNumber numberWithInt  : g_width_size],  kCVPixelBufferWidthKey,
                                 [NSNumber numberWithInt  : g_height_size], kCVPixelBufferHeightKey,
                                 nil];
        
        status = CVPixelBufferCreateWithBytes(kCFAllocatorDefault, g_width_size, g_height_size, kCVPixelFormatType_32BGRA, &baseAddress[baseAddressStart], bytesPerRow, NULL, NULL, (__bridge CFDictionaryRef)options, &pixbuffer);
        if (status != 0) {
            NSLog(@"Crop CVPixelBufferCreateWithBytes error %d", (int)status);
            return NULL;
        }
    }
    
    CVPixelBufferUnlockBaseAddress(imageBuffer,0);
    
    CMSampleTimingInfo sampleTime = {
        .duration               = CMSampleBufferGetDuration(sampleBuffer),
        .presentationTimeStamp  = CMSampleBufferGetPresentationTimeStamp(sampleBuffer),
        .decodeTimeStamp        = CMSampleBufferGetDecodeTimeStamp(sampleBuffer)
    };
    
    if (videoInfo == NULL) {
        status = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, pixbuffer, &videoInfo);
        if (status != 0) NSLog(@"Crop CMVideoFormatDescriptionCreateForImageBuffer error %d", (int)status);
    }
    
    CMSampleBufferRef cropBuffer = NULL;
    status = CMSampleBufferCreateForImageBuffer(kCFAllocatorDefault, pixbuffer, true, NULL, NULL, videoInfo, &sampleTime, &cropBuffer);
    if (status != 0) NSLog(@"Crop CMSampleBufferCreateForImageBuffer error %d", (int)status);
    
    lastAddressStart = baseAddressStart;
    
    return cropBuffer;
}
```

  • A CVImageBufferRef is extracted from the CMSampleBufferRef and then locked. To render it, the image must be compatible with the OpenGL buffer; images created by the camera API are already compatible and can be mapped for input immediately. But if you create a new image from an existing one and use it for other purposes, you must create it with special properties: the attribute dictionary must contain the crop width and height as keys, so the step of creating that dictionary cannot be omitted.

Calculating the position

In the software crop we get the raw data of a frame and locate the real crop position inside it. That position can be computed with the formula below; the underlying principle is covered in the YUV introduction, and every variable used in the calculation is available in the code above.

```objc
NSInteger baseAddressStart = _cropY * bytesPerRow + bytesPerPixel * _cropX;
```
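A quick worked example with assumed numbers (a 1920 * 1080 BGRA frame, ignoring row padding, with the crop origin at pixel (100, 50)):

```objc
size_t    bytesPerRow      = 1920 * 4;   // 7680 bytes per row for 32BGRA
NSInteger bytesPerPixel    = 4;
NSInteger cropX            = 100;
NSInteger cropY            = 50;
NSInteger baseAddressStart = cropY * bytesPerRow + bytesPerPixel * cropX;   // 50 * 7680 + 4 * 100 = 384400
```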

Note:

  • 1. Correct the x and y coordinates. CVPixelBufferCreateWithBytes crops by pixel, so the point coordinates must be converted to pixels by scaling with the current ratio. In code: `int cropX = (int)(currentResolutionW / kScreenWidth * self.cropView.frame.origin.x);`, where currentResolutionW is the width of the current resolution and kScreenWidth is the screen width in points.
  • 2. By the YUV 420 rule, every four Y values share one pair of UV values, two Ys per row, so the crop origin must be chosen on even coordinates. The CPU crop works on the separated YUV planes; for details see the YUV introduction.
  • 3. In this example pixbuffer and videoInfo are declared as static variables so they are not recreated for every frame. They must be reset in three cases: the crop position changes, the resolution changes, or the camera restarts. See the detailed note at the end of the article.
Cropping with the GPU (hardware)

```objc
// hardware crop
- (CMSampleBufferRef)cropSampleBufferByHardware:(CMSampleBufferRef)buffer {
    // a CMSampleBuffer CVImageBuffer of media data.
    
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(buffer);
    CGRect           cropRect    = CGRectMake(_cropX, _cropY, g_width_size, g_height_size);
    //        log4cplus_debug("Crop", "dropRect x: %f - y : %f - width : %zu - height : %zu", cropViewX, cropViewY, width, height);
    
    /*
     First, to render to a texture, you need an image that is compatible with the OpenGL
     texture cache. Images that were created with the camera API are already compatible
     and you can immediately map them for inputs. Suppose you want to create an image to
     render on and later read out for some other processing though. You have to create
     the image with a special property: the attributes for the image must have
     kCVPixelBufferIOSurfacePropertiesKey as one of the keys of the dictionary,
     so do not omit the steps below.
     */
    OSStatus status;
    
    /* Only when the resolution has changed do we need to reset pixbuffer and videoInfo, which reduces computation */
    static CVPixelBufferRef            pixbuffer = NULL;
    static CMVideoFormatDescriptionRef videoInfo = NULL;
    
    if (pixbuffer == NULL) {
        NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                 [NSNumber numberWithInt:g_width_size],     kCVPixelBufferWidthKey,
                                 [NSNumber numberWithInt:g_height_size],    kCVPixelBufferHeightKey, nil];
        status = CVPixelBufferCreate(kCFAllocatorSystemDefault, g_width_size, g_height_size, kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, (__bridge CFDictionaryRef)options, &pixbuffer);
        // ensures that the CVPixelBuffer is accessible in system memory. This should only be called if the base address is going to be used and the pixel data will be accessed by the CPU
        if (status != noErr) {
            NSLog(@"Crop CVPixelBufferCreate error %d", (int)status);
            return NULL;
        }
    }
    
    CIImage *ciImage = [CIImage imageWithCVImageBuffer:imageBuffer];
    ciImage = [ciImage imageByCroppingToRect:cropRect];
    // Ciimage get real image is not in the original point  after excute crop. So we need to pan.
    ciImage = [ciImage imageByApplyingTransform:CGAffineTransformMakeTranslation(-_cropX, -_cropY)];
    
    static CIContext *ciContext = nil;
    if (ciContext == nil) {
        //        NSMutableDictionary *options = [[NSMutableDictionary alloc] init];
        //        [options setObject:[NSNull null] forKey:kCIContextWorkingColorSpace];
        //        [options setObject:@0            forKey:kCIContextUseSoftwareRenderer];
        EAGLContext *eaglContext = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES3];
        ciContext = [CIContext contextWithEAGLContext:eaglContext options:nil];
    }
    [ciContext render:ciImage toCVPixelBuffer:pixbuffer];
    //    [ciContext render:ciImage toCVPixelBuffer:pixbuffer bounds:cropRect colorSpace:nil];
    
    CMSampleTimingInfo sampleTime = {
        .duration               = CMSampleBufferGetDuration(buffer),
        .presentationTimeStamp  = CMSampleBufferGetPresentationTimeStamp(buffer),
        .decodeTimeStamp        = CMSampleBufferGetDecodeTimeStamp(buffer)
    };
    
    if (videoInfo == NULL) {
        status = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, pixbuffer, &videoInfo);
        if (status != 0) NSLog(@"Crop CMVideoFormatDescriptionCreateForImageBuffer error %d", (int)status);
    }
    
    CMSampleBufferRef cropBuffer;
    status = CMSampleBufferCreateForImageBuffer(kCFAllocatorDefault, pixbuffer, true, NULL, NULL, videoInfo, &sampleTime, &cropBuffer);
    if (status != 0) NSLog(@"Crop CMSampleBufferCreateForImageBuffer error %d", (int)status);
    
    return cropBuffer;
}
```

  • The above is the hardware cropping method. It crops on the GPU, mainly by using CoreImage's CIContext to render the cropped CIImage into a pixel buffer.

  • CoreImage vs. UIKit coordinates. At first I cropped using the configured position directly, but the cropped region came out in the wrong place. After some searching I found the interesting cause: CoreImage and UIKit use different coordinate systems, as shown below. The normal UIKit coordinate system has its origin in the upper-left corner:

The CoreImage coordinate system has its origin in the lower-left corner (and in CoreImage, each image's coordinate system is independent of the device):

So when cropping, the Y coordinate must be flipped; X stays the same, while Y is mirrored.

  • To render, the image must be compatible with the OpenGL buffer; images created with the camera API are already compatible and can be mapped for input immediately. If you create a new image from an existing one and use it for other purposes, it must be created with special properties: the attribute dictionary must contain the width and height keys, so the step of creating that dictionary cannot be omitted.
  • CoreImage offers two ways to perform the crop:
  1. `ciImage = [ciImage imageByCroppingToRect:cropRect];` and then render it with `[ciContext render:ciImage toCVPixelBuffer:pixelBuffer];`
  2. Or do it in one call: `[ciContext render:ciImage toCVPixelBuffer:pixelBuffer bounds:cropRect colorSpace:nil];`
  • Note: CIContext carries a lot of context information about the image and should not be created more than once, certainly not in every callback. Also note the difference between ARC and MRC here.

Note:

1. The code differs under ARC and MRC (this is marked in the project), mainly for managing the global CIContext object. Under MRC it is not retained automatically in the initialization method, so using it later will report an error unless it is retained manually.

2. Switching between front and rear cameras. The front and rear cameras differ a lot between models. One solution is to record, in a plist keyed by iPhone model, the resolutions supported by each camera, and look them up in code by model. Another is automatic degradation: for example, if the rear camera supports 2K but the front only supports 720P, then when switching we detect that 2K is unsupported and step the front camera down one level at a time until a supported resolution is found (see the sketch below). This logic is more involved and not obvious at first glance, and front-camera cropping has limited use cases, so for now only rear-camera cropping is supported.
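A hedged sketch of that degradation idea (the preset list and method name are illustrative, not the project's actual API):

```objc
// Step down through session presets until the session accepts one.
- (NSString *)bestPresetForSession:(AVCaptureSession *)session preferred:(NSString *)preferred {
    NSArray<NSString *> *presets = @[ AVCaptureSessionPreset3840x2160,
                                      AVCaptureSessionPreset1920x1080,
                                      AVCaptureSessionPreset1280x720,
                                      AVCaptureSessionPreset640x480 ];
    NSUInteger start = [presets indexOfObject:preferred];
    if (start == NSNotFound) start = 0;
    for (NSUInteger i = start; i < presets.count; i++) {
        if ([session canSetSessionPreset:presets[i]]) {
            return presets[i];   // first level the currently attached camera supports
        }
    }
    return AVCaptureSessionPresetMedium;   // last-resort fallback
}
```

Note that -canSetSessionPreset: reflects the inputs currently attached to the session, so this check should run after the front camera's input has been added.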

Additional notes

  • Logical screen resolution vs. video resolution
  1. Point: `[UIScreen mainScreen].bounds.size.width` and `[UIScreen mainScreen].bounds.size.height` are measured in points. A point can simply be understood as the coordinate unit used in iOS development to describe interface elements.

  2. Pixel: a pixel is a finer unit than a point; 1 point equals 1 pixel on a non-Retina screen and 2 pixels on a Retina display.

  3. Resolution: the video resolution depends on the maximum supported by the device; for example, iPhone 6s and later support 4K (3840 x 2160) video. The crop APIs we call work in pixels, so we operate in pixels rather than points. More on this below.

  • ARC and MRC behave differently

Initialization of CIContext

CIContext should be declared as a global or static variable, because initializing a CIContext is expensive and it is only used for rendering, so there is no need to create one every time. Under MRC it must be retained manually; under ARC this happens automatically.

ARC:

```objc
static CIContext *ciContext = NULL;
ciContext = [CIContext contextWithOptions:nil];
```
MRC:

```objc
static CIContext *ciContext = NULL;
ciContext = [CIContext contextWithOptions:nil];
[ciContext retain];
```
  • Coordinate issues

##### 1. Understand the correspondence between points and pixels. CropView has to be displayed on the phone, so its coordinate system is still UIKit's: the origin is in the upper-left corner and the width and height are those of the particular phone (e.g. iPhone 8: 375 * 667, iPhone 8 Plus: 414 * 736, iPhone X: 375 * 812). But we need CropView's coordinates at the actual video resolution, i.e. we convert the x and y of CropView from points into the corresponding pixel positions.

```objc
// Note: this converts x into pixel coordinates. Taking iPhone 8 (375 * 667 points) with a 1920 * 1080 resolution as an example:
_cropX = (int)(_currentResolutionW / _screenWidth * cropView.frame.origin.x);
// i.e. _cropX = (int)(1920 / 375 * <x coordinate of the current cropView>)
```

##### 2. The CPU and GPU crops use different coordinate origins

The origin position:

CPU: the UIKit coordinate system, with the origin in the upper-left corner

GPU: the CoreImage coordinate system, with the origin in the lower-left corner

Therefore, when the GPU is used, the Y coordinate is inverted, and we convert it back to the normal coordinate system (origin in the upper-left corner) with the formula below.

```objc
_cropY = (int)(_currentResolutionH / _screenHeight * (_screenHeight - self.frame.origin.y - self.frame.size.height));
```
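For example, with assumed numbers (iPhone 8 in landscape, so the screen height is 375 points; 1920 * 1080 video; cropView at y = 50 with height 200):

```objc
CGFloat screenHeight   = 375.0;    // points, landscape iPhone 8
CGFloat resolutionH    = 1080.0;   // pixels
CGFloat cropViewY      = 50.0;     // points, UIKit (top-left origin)
CGFloat cropViewHeight = 200.0;    // points
int cropY = (int)(resolutionH / screenHeight * (screenHeight - cropViewY - cropViewHeight));
// (1080 / 375) * (375 - 50 - 200) = 2.88 * 125 = 360
```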

##### 3. When the screen is not 16:9, letting the video fill the screen introduces an offset

Note that some phones and iPads do not have 16:9 screens (iPhone X is wider; all iPads are 4:3). If we display 2K (1920 * 1080) or 4K (3840 * 2160) video in a view configured with `captureVideoPreviewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;`, part of the video is sacrificed so that it fills the view, i.e. the data captured by the camera is not shown completely. If we then use the crop function, the UIKit origin (0, 0) of the view is no longer the real pixel (0, 0) of the frame, and correcting for that requires a lot of extra calculation. So under the crop feature we set `captureVideoPreviewLayer.videoGravity = AVLayerVideoGravityResizeAspect;` instead. The video view is then adjusted to show the full video: on iPhone X (ratio greater than 16:9) the video shrinks along the X axis and black bars fill the sides; on iPad (ratio less than 16:9) it shrinks along the Y axis and black bars fill the top and bottom.

Given the analysis above, the points we computed earlier would now be off, because one of the axes has shrunk by some factor while we still read cropView's coordinates relative to the whole parent view.

Constantly adjusting cropView for this would mean a lot of extra code, so I defined a videoRect property to record the actual rect of the video. At runtime we know the screen's aspect ratio, so the real video rect can be derived from it (a sketch follows below). From then on the subsequent code only needs the size of videoRect, and nothing downstream changes for the ordinary 16:9 phones.
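A hedged sketch of how such a videoRect could be derived under AVLayerVideoGravityResizeAspect, assuming a 16:9 video and a landscape view (the method name is illustrative):

```objc
// Rect actually covered by a 16:9 video inside a view of the given size (aspect-fit).
- (CGRect)videoRectForViewSize:(CGSize)viewSize {
    CGFloat videoAspect = 16.0 / 9.0;                        // 1920x1080 or 3840x2160 input
    CGFloat viewAspect  = viewSize.width / viewSize.height;  // landscape view
    if (viewAspect > videoAspect) {
        // View is wider than 16:9 (e.g. iPhone X): the video width shrinks, black bars left and right.
        CGFloat videoWidth = viewSize.height * videoAspect;
        return CGRectMake((viewSize.width - videoWidth) / 2.0, 0, videoWidth, viewSize.height);
    } else {
        // View is narrower than 16:9 (e.g. iPad, 4:3): the video height shrinks, black bars top and bottom.
        CGFloat videoHeight = viewSize.width / videoAspect;
        return CGRectMake(0, (viewSize.height - videoHeight) / 2.0, viewSize.width, videoHeight);
    }
}
```

Subsequent point-to-pixel conversions then use videoRect's origin and size instead of the full screen's.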

##### 4. Why the software crop uses int when creating the pixel buffer

```objc
CV_EXPORT CVReturn CVPixelBufferCreateWithBytes(
   CFAllocatorRef CV_NULLABLE allocator,
   size_t width,
   size_t height,
   OSType pixelFormatType,
   void * CV_NONNULL baseAddress,
   size_t bytesPerRow,
   CVPixelBufferReleaseBytesCallback CV_NULLABLE releaseCallback,
   void * CV_NULLABLE releaseRefCon,
   CFDictionaryRef CV_NULLABLE pixelBufferAttributes,
   CV_RETURNS_RETAINED_PARAMETER CVPixelBufferRef CV_NULLABLE * CV_NONNULL pixelBufferOut)
```

In this API we offset baseAddress by the x and y position, which requires the formula `NSInteger baseAddressStart = _cropY*bytesPerRow + bytesPerPixel*_cropX;`. But under YUV 420 we cannot pass an odd x, so we need `if (_cropX % 2 != 0) _cropX += 1;`, and only integers support the modulo operator, so all the points here are cast to int; the sub-pixel error this introduces in the on-screen view is ignored.

TODO:

During hardware (GPU) cropping I found that `[ciContext render:ciImage toCVPixelBuffer:pixelBuffer];` (with ciContext initialized only once, so there is no memory leak) performs well when a 2K input is cropped to 720P on the iPhone 7 Plus, but other models and sizes drop frames badly. Software cropping (CPU), on the other hand, is relatively stable even though its CPU usage is about 15% higher than the GPU path; frames only drop occasionally after a long live-streaming session, at the cost of that higher CPU load.