Vision image recognition framework for Swift

  • In 2017, Apple released the iPhone 8 and iPhone 8 Plus, but those weren't the real story; the story was the iPhone X, priced around 9,000 RMB. The online ridicule has never stopped, but I still think it's a good device!
  • On the software side, Apple also released iOS 11. Having tested it on my iPhone 7: the battery drains quickly and the notification bar has no shortage of bugs
  • Of course, iOS 11 also brought some new APIs, such as ARKit, Core ML, FileProvider, IdentityLookup, Core NFC, Vision, and so on
  • This article looks at Vision, the image recognition framework Apple introduced at WWDC 2017 (see the official documentation)
  • Demo address

I. Vision Application Scenarios

  • Face Detection and Recognition: face detection and recognition
    • Supports detecting smiling faces, profile faces, partially occluded faces, and faces wearing glasses or hats, and can mark the face's bounding rectangle
    • Can mark the contours of the face, eyes, eyebrows, nose, mouth, and teeth, as well as the axis of the face
  • Image Alignment Analysis: image registration and comparison
  • Barcode Detection: QR code / barcode detection
    • Finds and recognizes barcodes in an image
    • Decodes the barcode's payload
  • Text Detection: text detection
    • Finds the regions of an image where text is visible
    • Detects the information in a text region
  • Object Detection and Tracking: object tracking
    • Faces, rectangles, and generic templates

II. Image types supported by Vision

1. Objective-C

  • CVPixelBufferRef
  • CGImageRef
  • CIImage
  • NSURL
  • NSData

2. Swift

  • CVPixelBuffer
  • CGImage
  • CIImage
  • URL
  • Data

You can view the details in the VNImageRequestHandler.h header of Vision.framework.
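  • As a quick illustration, here is a hedged sketch of a few of those initializers (the cgImage, imageURL, and imageData values are placeholders assumed to exist):

import Vision

// VNImageRequestHandler accepts several image sources.
let fromCGImage = VNImageRequestHandler(cgImage: cgImage, options: [:])
let fromURL = VNImageRequestHandler(url: imageURL, options: [:])
let fromData = VNImageRequestHandler(data: imageData, options: [:])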

III. Vision API introduction

  • To use Vision, first decide what result you want, then pick the request class that produces it
  • Each kind of functional Request is submitted to a RequestHandler
  • The handler holds the image to be analyzed and dispatches the results to each Request's completion block
  • The results property provides an array of Observation objects
  • The concrete observation type in the array differs depending on the request
  • Each Observation has properties such as boundingBox and landmarks, which store the coordinates and point locations of the recognized object
  • So once we have the coordinates, we can do some UI drawing; a minimal end-to-end sketch follows below.
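  • Putting those pieces together, a hedged end-to-end sketch (assuming an input UIImage; face rectangle detection is used as the example):

import UIKit
import Vision

// Minimal flow: handler -> request -> observations.
func detectFaces(in image: UIImage) {
    //1. Wrap the image for the handler
    guard let ciImage = CIImage(image: image) else { return }
    let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
    //2. Create the request with a completion block
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        // Each boundingBox is normalized to 0...1
        faces.forEach { print($0.boundingBox) }
    }
    //3. Perform off the main thread
    DispatchQueue.global().async {
        try? handler.perform([request])
    }
}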

1. RequestHandler: request-processing objects

  • VNImageRequestHandler: an object that processes one or more image analysis requests for a single image
    • This class is typically used for recognition requests on still images
    • Its initializers accept CVPixelBuffer, CGImage, CIImage, URL, and Data
  • VNSequenceRequestHandler: an object that handles image analysis requests across a sequence of images
    • I currently use this class for object tracking
    • It supports the same image types as above, supplied through its perform methods

2. VNRequest introduction

  • VNRequest: the abstract base class for image analysis requests, inheriting from NSObject
  • VNImageBasedRequest: the abstract base class for analysis requests that focus on a specific part of an image
  • Concrete analysis request types include VNDetectTextRectanglesRequest, VNDetectRectanglesRequest, VNDetectFaceRectanglesRequest, VNDetectFaceLandmarksRequest, VNDetectBarcodesRequest, and VNTrackObjectRequest, most of which appear below

3. VNObservation: detection results

  • VNObservation: the abstract base class for image analysis results, inheriting from NSObject
  • The corresponding result classes used below include VNTextObservation, VNRectangleObservation, VNFaceObservation, VNBarcodeObservation, and VNDetectedObjectObservation

IV. Hands-on practice

1. Text detection

  • Method 1: recognize the position of each individual character
  • Method 2: recognize the position of each line of text

1.1 First convert the image to a CIImage, which VNImageRequestHandler accepts at initialization

//1. Convert to CIImage
guard let ciImage = CIImage(image: image) else { return }

1.2 Creating a handler to process requests

  • Parameter 1: the image
  • Parameter 2: an options dictionary; the default is [:]
let requestHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])

1.3 Creating a callback closure

  • Two arguments, no return value
  • VNRequest is the parent class of all requests
public typealias VNRequestCompletionHandler = (VNRequest, Error?) -> Swift.Void

  • The specific code is as follows:
//4. Set the callback
let completionHandle: VNRequestCompletionHandler = { request, error in
    let observations = request.results
    // An array of recognized objects
}


1.4 Creating an Identification Request

  • There are two initializers
// No parameters
public convenience init()
    
// Completion-handler argument
public init(completionHandler: Vision.VNRequestCompletionHandler? = nil)

  • Initialization with a closure is used here
let baseRequest = VNDetectTextRectanglesRequest(completionHandler: completionHandle)
  • Property setting (whether to recognize each individual character)
baseRequest.setValue(true, forKey: "reportCharacterBoxes")
  • If this property is not set, only whole lines of text are recognized
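  • A side note: the KVC call works, but VNDetectTextRectanglesRequest also exposes this switch as a typed property, which avoids the string key:

// Equivalent to the setValue(_:forKey:) call above
baseRequest.reportCharacterBoxes = true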

1.5 Sending a Request

    open func perform(_ requests: [VNRequest]) throws
  • This method can throw an error
  • For continuous requests (e.g. camera scanning), this method must be executed on a background thread, otherwise it will block the main thread
//6. Send the request
DispatchQueue.global().async {
    do {
        try requestHandle.perform([baseRequest])
    } catch {
        print("Throws: \(error)")
    }
}

1.6 Processing the recognized Observations objects

  • The recognized results property has type [Any]?
  • Use the boundingBox property to get the frame of the corresponding text area
  • Note that:
    • The boundingBox values are normalized relative to the image, so they are all between 0 and 1
    • The y-axis is flipped relative to the UIView coordinate system
//1. Get the recognized VNTextObservation
guard let boxArr = observations as? [VNTextObservation] else { return }
        
//2. Create rect arrays
var bigRects = [CGRect](), smallRects = [CGRect]()

//3. Iterate through the recognition results
for boxObj in boxArr {
    // 3.1 Size conversion
    // Get the area of a whole line of text
    bigRects.append(convertRect(boxObj.boundingBox, image))
    
    // 3.2 Get the boxes of individual characters
    guard let rectangleArr = boxObj.characterBoxes else { continue }
    for rectangle in rectangleArr {
        // Get the rect of each character
        let boundBox = rectangle.boundingBox
        smallRects.append(convertRect(boundBox, image))
    }
}


Coordinate transformation

/// Image coordinate transform
fileprivate func convertRect(_ rectangleRect: CGRect, _ image: UIImage) -> CGRect {
    // Convert from normalized coordinates to the size the image is displayed at
    let imageSize = image.scaleImage()
    let w = rectangleRect.width * imageSize.width
    let h = rectangleRect.height * imageSize.height
    let x = rectangleRect.minX * imageSize.width
    // The y-axis is flipped relative to UIView's coordinate system
    let y = (1 - rectangleRect.minY) * imageSize.height - h
    return CGRect(x: x, y: y, width: w, height: h)
}

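  • The scaleImage() call above is a helper from the demo, not a UIKit API. A hypothetical sketch of what it computes, assuming the image is displayed aspect-fit at screen width (the demo's actual implementation may differ):

extension UIImage {
    // Hypothetical: the size the image occupies when rendered
    // aspect-fit at the full screen width.
    func scaleImage() -> CGSize {
        let displayWidth = UIScreen.main.bounds.width
        let displayHeight = size.height / size.width * displayWidth
        return CGSize(width: displayWidth, height: displayHeight)
    }
}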

2. Rectangle recognition and static face recognition

  • Identify the rectangles in the image

  • Static face recognition

  • Key core code
//1. Convert to CIImage
guard let ciImage = CIImage(image: image) else { return }

//2. Create a processing handler
let requestHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])

//3. Declare the base request
var baseRequest = VNImageBasedRequest()

//4. Set the callback
let completionHandle: VNRequestCompletionHandler = { request, error in
    let observations = request.results
    self.handleImageObservable(type: type, image: image, observations, completeBack)
}

//5. Create the recognition request
switch type {
case .rectangle:
    baseRequest = VNDetectRectanglesRequest(completionHandler: completionHandle)
case .staticFace:
    baseRequest = VNDetectFaceRectanglesRequest(completionHandler: completionHandle)
default:
    break
}

  • Handling the recognition observations
    /// Rectangle detection
    fileprivate func rectangleDectect(_ observations: [Any]?, image: UIImage, _ complecHandle: JunDetectHandle) {
        //1. Obtain the recognized VNRectangleObservation objects
        guard let boxArr = observations as? [VNRectangleObservation] else { return }
        //2. Create a rect array
        var bigRects = [CGRect]()
        //3. Iterate through the recognition results
        for boxObj in boxArr {
            //3.1 Size conversion
            bigRects.append(convertRect(boxObj.boundingBox, image))
        }
        //4. Call back with the result
        complecHandle(bigRects, [])
    }

  • For static face recognition, cast the observations to VNFaceObservation instead
guard let boxArr = observations as? [VNFaceObservation] else { return }

3. Barcode identification

  • The request steps are the same as for rectangle recognition, so they are not repeated here
  • Note that the request needs to know which barcode symbologies to recognize when it is initialized
  • So let's look at two properties of VNDetectBarcodesRequest
// All symbologies the framework can detect
open class var supportedSymbologies: [VNBarcodeSymbology] { get }

// The symbologies this request will look for
open var symbologies: [VNBarcodeSymbology]
  • In this step the request is set to recognize every supported symbology, as follows
  • Pay attention to the supportedSymbologies class property
let request = VNDetectBarcodesRequest(completionHandler: completionHandle)
request.symbologies = VNDetectBarcodesRequest.supportedSymbologies
  • Barcode recognition returns not only the barcode's location but also the information it carries; here a QR code is used as the example
  • For this we cast the recognized observations to [VNBarcodeObservation]
  • VNBarcodeObservation has three properties
// Barcode symbology: QR, Code 128, etc.
open var symbology: VNBarcodeSymbology { get }

// Barcode descriptor information
open var barcodeDescriptor: CIBarcodeDescriptor? { get }

// For a QR code, the encoded string (here, a URL)
open var payloadStringValue: String? { get }
  • For the image above, the recognized payloadStringValue is the author's Jianshu homepage URL
  • The following takes the QR code in that image as an example and parses its CIBarcodeDescriptor object
  • Those who are interested can study it further
    /// Parse the QR code descriptor
    fileprivate func qrCodeHandle(barCode: CIBarcodeDescriptor?) {
        //1. Cast to the concrete barcode descriptor
        guard let code = barCode as? CIQRCodeDescriptor else { return }
        
        //2. Read the barcode information
        let level = code.errorCorrectionLevel.hashValue
        let version = code.symbolVersion
        let mask = code.maskPattern
        let data = code.errorCorrectedPayload
        let dataStr = String(data: data, encoding: .utf8)
        print("QR code --", level, "-", version, "--", mask, "-", dataStr ?? "")
    }

4. Face feature recognition

  • Can recognize the specific locations of the face contour, eyes, nose, mouth, and so on

  • VNFaceLandmarks2D introduction
    /// Facial contour
    var faceContour: VNFaceLandmarkRegion2D?
    
    /// Left eye, right eye
    var leftEye: VNFaceLandmarkRegion2D?
    var rightEye: VNFaceLandmarkRegion2D?
    
    /// Left eyebrow, right eyebrow
    var leftEyebrow: VNFaceLandmarkRegion2D?
    var rightEyebrow: VNFaceLandmarkRegion2D?
    
    /// Left pupil, right pupil
    var leftPupil: VNFaceLandmarkRegion2D?
    var rightPupil: VNFaceLandmarkRegion2D?
    
    /// Nose, nose crest, median line
    var nose: VNFaceLandmarkRegion2D?
    var noseCrest: VNFaceLandmarkRegion2D?
    var medianLine: VNFaceLandmarkRegion2D?
    
    /// Outer lips, inner lips
    var outerLips: VNFaceLandmarkRegion2D?
    var innerLips: VNFaceLandmarkRegion2D?
  • Each landmark region is a VNFaceLandmarkRegion2D, which exposes the number of points and the points themselves in normalized coordinates:
open var pointCount: Int { get }
@nonobjc public var normalizedPoints: [CGPoint] { get }
  • Convert all the normalized point coordinates to coordinates matching the displayed image
  • Using the graphics context, draw lines connecting the points of each region
  • I do this by overriding UIView's func draw(_ rect: CGRect) method
//5.1 Get the current context
let content = UIGraphicsGetCurrentContext()

//5.2 Set the fill color (setStroke sets the stroke color)
UIColor.green.set()

//5.3 Set the line width
content?.setLineWidth(2)

//5.4 Set the line join and cap style
content?.setLineJoin(.round)
content?.setLineCap(.round)

//5.5 Set anti-aliasing
content?.setShouldAntialias(true)
content?.setAllowsAntialiasing(true)

//5.6 Draw lines between the points and stroke the path
content?.addLines(between: pointArr)
content?.drawPath(using: .stroke)
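  • The pointArr above holds the already-converted points. A hedged sketch of how such an array could be built from one landmark region (the helper name and the flipping logic are my assumptions; the demo's own conversion is shown in section 5):

// Hypothetical helper: map a region's normalized points into view
// coordinates, flipping the y-axis for UIKit.
fileprivate func landmarkPoints(_ region: VNFaceLandmarkRegion2D,
                                faceRect: CGRect,
                                viewSize: CGSize) -> [CGPoint] {
    return region.normalizedPoints.map { p in
        // Landmark points are normalized within the face bounding box,
        // which is itself normalized within the image.
        let x = (faceRect.minX + p.x * faceRect.width) * viewSize.width
        let y = (1 - (faceRect.minY + p.y * faceRect.height)) * viewSize.height
        return CGPoint(x: x, y: y)
    }
}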

5. Dynamic face recognition and real-time dynamic addition

Recording a GIF from a real device didn't work well (I tried; the result wasn't great, so I gave up). If you want to see the effect, download the source and run it on a device.

  • Here is a picture available for scanning

  • I won't repeat the Request initialization; what differs is the handler's initialization

    • CVPixelBuffer: the real-time output from the camera scan
//1. Create a processing handler
let faceHandle = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

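  • For context, a hedged sketch of where pixelBuffer comes from: the AVCaptureVideoDataOutput sample buffer delegate (the capture-session setup is omitted):

import AVFoundation

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Extract the camera frame's pixel buffer for Vision
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // ...create the VNImageRequestHandler and perform the request here
}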
  • The main point: performing requests on frames scanned in real time by the camera must happen on a background thread; otherwise the main thread gets blocked and the whole app becomes unresponsive. I stepped into this pit personally
DispatchQueue.global().async {
    do {
        try faceHandle.perform([baseRequest])
    } catch {
        print("Throws: \(error)")
    }
}

Scanning result processing

  • Dynamic face recognition differs from static face recognition in that it refreshes in real time and keeps updating the UI, but the result handling is the same
  • Dynamic addition: demonstrated here by adding a glasses effect
  • You need the position and width of both eyes
    • First obtain all the landmark points (and point counts) of the left and right eyes
    • Iterate over the points and convert them to the appropriate coordinates
    • Put the x and y values of the left- and right-eye points into separate arrays
    • Sort the arrays in ascending order to get the min-to-max span of x and of y
    • The specific code is as follows
    /// Convert the eye landmarks into view-coordinate values
    fileprivate func getEyePoint(faceModel: FaceFeatureModel, position: AVCaptureDevice.Position) -> CGRect {
        //1. Obtain left and right eyes
        guard let leftEye = faceModel.leftEye else { return CGRect.zero }
        guard let rightEye = faceModel.rightEye else { return CGRect.zero }

        //2. Position array
        let leftPoint = conventPoint(landmark: leftEye, faceRect: faceModel.faceObservation.boundingBox, position: position)
        let rightPoint = conventPoint(landmark: rightEye, faceRect: faceModel.faceObservation.boundingBox, position: position)

        //3. Sort
        let pointXs = (leftPoint.0 + rightPoint.0).sorted()
        let pointYs = (leftPoint.1 + rightPoint.1).sorted()
        
        //4. Add eyes
        let image = UIImage(named: "eyes")!
        let imageWidth = (pointXs.last ?? 0.0) - (pointXs.first ?? 0) + 40
        let imageHeight = image.size.height / image.size.width * imageWidth
        
        return CGRect(x: (pointXs.first ?? 0) - 20, y: (pointYs.first ?? 0) - 5, width: imageWidth, height: imageHeight)
    }

  • Coordinate processing for each eye
    /// Coordinate conversion
    fileprivate func conventPoint(landmark: VNFaceLandmarkRegion2D, faceRect: CGRect, position: AVCaptureDevice.Position) -> ([CGFloat], [CGFloat]) {
        //1. Definitions
        var XArray = [CGFloat](), YArray = [CGFloat]()
        let viewRect = previewLayer.frame
        
        //2. Iterate over the landmark points
        for i in 0..<landmark.pointCount {
            //2.1 Get the current point and convert it to view coordinates
            let point = landmark.normalizedPoints[i]
            let rectWidth = viewRect.width * faceRect.width
            let rectHeight = viewRect.height * faceRect.height
            let rectY = viewRect.height - (point.y * rectHeight + faceRect.minY * viewRect.height)
            var rectX = point.x * rectWidth + faceRect.minX * viewRect.width
            // Mirror the x-coordinate for the front camera
            if position == .front {
                rectX = viewRect.width + (point.x - 1) * rectWidth
            }
            XArray.append(rectX)
            YArray.append(rectY)
        }
        
        return (XArray, YArray)
    }

  • Finally, use this CGRect to add the glasses effect
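  • A minimal usage sketch (the glassesImageView name is an assumption; the demo may wire this up differently):

// Place the glasses image over the computed eye rect
let eyeRect = getEyePoint(faceModel: faceModel, position: .front)
DispatchQueue.main.async {
    self.glassesImageView.frame = eyeRect
}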

6. Object tracking

  • Introduction
    • Tap something on the screen, and Vision tracks that object in real time based on where you tapped
    • When the phone or the object moves, the red box keeps following the identified object
  • The observation type used here is VNDetectedObjectObservation
  • Define an observation property
fileprivate var lastObservation: VNDetectedObjectObservation?
  • Create a handler that processes a sequence of images
// Handles requests across a sequence of images
let sequenceHandle = VNSequenceRequestHandler()
  • Create a tracking request
//4. Create a tracking request
let trackRequest = VNTrackObjectRequest(detectedObjectObservation: lastObservation, completionHandler: completionHandle)
// Set the tracking accuracy to high
trackRequest.trackingLevel = .accurate

  • When the user taps the screen, we work out where the tap landed
  • and create a new observation object based on the tap position
//2. Convert coordinates
let convertRect = visionTool.convertRect(viewRect: redView.frame, layerRect: previewLayer.frame)

//3. Create a new observation based on the tap position
let newObservation = VNDetectedObjectObservation(boundingBox: convertRect)
lastObservation = newObservation
  • Get the scan result; if it is a VNDetectedObjectObservation object, reassign it
//1. Get the actual result
guard let newObservation = observations?.first as? VNDetectedObjectObservation else { return }

//2. Reassign
self.lastObservation = newObservation
  • Get the object's coordinates from the new observation
  • Convert the coordinates and update the red box's position
//4. Coordinate conversion
let newRect = newObservation.boundingBox
let convertRect = visionTool.convertRect(newRect, self.previewLayer.frame)
self.redView.frame = convertRect
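  • Tying the pieces together, a hedged sketch of one per-frame tracking step (method and property names are assumptions; VNSequenceRequestHandler's perform(_:on:) accepts the camera's CVPixelBuffer directly):

func track(on pixelBuffer: CVPixelBuffer) {
    guard let last = lastObservation else { return }
    
    //1. Seed a tracking request with the last observation
    let trackRequest = VNTrackObjectRequest(detectedObjectObservation: last) { [weak self] request, _ in
        guard let new = request.results?.first as? VNDetectedObjectObservation else { return }
        DispatchQueue.main.async {
            guard let strongSelf = self else { return }
            //2. Remember the result and move the red box
            strongSelf.lastObservation = new
            strongSelf.redView.frame = visionTool.convertRect(new.boundingBox, strongSelf.previewLayer.frame)
        }
    }
    trackRequest.trackingLevel = .accurate
    
    //3. Perform on a background thread to avoid blocking
    DispatchQueue.global().async {
        try? self.sequenceHandle.perform([trackRequest], on: pixelBuffer)
    }
}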

That's all for Vision in iOS 11 with Swift

  • The write-up may be a little thin and a little scattered in places
  • I have only just started working with Vision; if anything in the article is incomplete or wrong, please don't hesitate to point it out in the comments

GitHub – Demo address

  • Note:
  • Only the main core code is listed here; for the complete logic, please refer to the demo
  • If some parts of the article are not detailed enough, or you have better suggestions, feel free to contact me
  • If it's convenient, please also leave a star

Other related articles

  • Generation, recognition, and scanning of QR codes in Swift
  • iOS Black Technology (CoreImage): Static Face Recognition (1)
  • iOS Black Technology (AVFoundation): Dynamic Face Recognition