iOS Black Technology (AVFoundation) Dynamic Face Recognition

The previous article introduced static face recognition implemented with Core Image. Here we introduce dynamic face recognition, one of the powerful features of AVFoundation.

I. Some approaches to face recognition

1. Core Image: static face recognition; it can recognize faces in photos, images, and so on

  • See the previous blog post for details

2. Face++

  • A vision service platform from Beijing Megvii Technology Co., Ltd., aiming to provide easy-to-use, powerful, and platform-independent vision services
  • Face++ is a new-generation cloud vision service platform, providing a set of world-leading visual technology services: face detection, face recognition, and face analysis
  • Face++ introduction on Baidu Encyclopedia
  • Face++ official website

3. OpenCV

  • It consists of a series of C functions and a small number of C++ classes, and implements many general-purpose algorithms in image processing and computer vision; the rest I am not very familiar with
  • This comes from the Baidu Encyclopedia entry

4. Vision

  • Vision is an image recognition framework based on Core ML that Apple introduced with iOS 11 at WWDC 2017
  • According to the official Vision documentation, Vision itself provides Face Detection and Recognition, Machine Learning Image Analysis, Barcode Detection, Text Detection, and so on
  • Interested readers can consult the relevant documentation to learn more; it will not be introduced here

5. AVFoundation

  • A framework for playing and creating time-based audiovisual media
  • The face recognition approach used here is also based on the AVFoundation framework

II. A brief introduction to the key classes

1. AVCaptureDevice: represents a hardware device

  • From this class we can get the phone's hardware devices: the camera, the microphone, and so on.
  • When the app needs to change some property of a hardware device (switching the camera, changing the flash mode, changing the focus), it must lock the device first and unlock it after the modification; a device-level sketch follows the session example below.
  • Example: switching the camera

//4. Remove the old input and add the new input
//4.1 Begin the configuration change
session.beginConfiguration()
//4.2 Remove the old input
session.removeInput(deviceIn)
//4.3 Add the new input
session.addInput(newVideoInput)
//4.4 Commit the configuration change
session.commitConfiguration()

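The example above batches changes at the session level (beginConfiguration / commitConfiguration). Changing properties on the hardware device itself additionally requires locking the device. A minimal sketch, using a hypothetical torch toggle as the example:

//Sketch: device-level locking (lockForConfiguration can throw)
if let device = AVCaptureDevice.default(for: .video), device.hasTorch {
    do {
        try device.lockForConfiguration()   //1. Lock the device before modifying it
        device.torchMode = .on              //2. Change a hardware property
        device.unlockForConfiguration()     //3. Unlock the device when done
    } catch {
        print("Failed to lock the device: \(error)")
    }
}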

2. AVCaptureDeviceInput: Device input data management object

  • Create the corresponding AVCaptureDeviceInput object from an AVCaptureDevice
  • This object is added to the AVCaptureSession for management; it represents an input device and configures the ports of the abstract hardware device. Common input devices include the microphone and the camera
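A minimal sketch of wrapping a device in an input object, assuming the session used throughout this article (a microphone here, to complement the camera in the full listing below):

//Sketch: wrap a hardware device in an AVCaptureDeviceInput
if let mic = AVCaptureDevice.default(for: .audio),
   let micInput = try? AVCaptureDeviceInput(device: mic),
   session.canAddInput(micInput) {
    session.addInput(micInput)
}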

3. AVCaptureOutput: represents the output data

  • The output can be a still image (AVCaptureStillImageOutput) or a video file (AVCaptureMovieFileOutput)
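A short sketch of creating such outputs; note that AVCaptureStillImageOutput has been deprecated since iOS 10 in favor of AVCapturePhotoOutput (assumes the session from this article):

//Sketch: common output objects
let movieOutput = AVCaptureMovieFileOutput()   //Video file output
let photoOutput = AVCapturePhotoOutput()       //Still photo output (replaces the deprecated AVCaptureStillImageOutput)
if session.canAddOutput(movieOutput) {
    session.addOutput(movieOutput)
}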

4. AVCaptureSession: Media (audio and video) capture sessions

  • Responsible for coordinating the capture of audio and video data and delivering it to the output objects.
  • An AVCaptureSession can have multiple inputs and outputs.
  • It is the bridge connecting AVCaptureInput and AVCaptureOutput, coordinating the transfer of data between input and output.
  • It has two methods, startRunning and stopRunning, to start and stop a session.
  • If you need to change some of the session's configuration while the app is running (for example, switching the camera), you must begin the configuration first and commit it once the changes are complete.
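A minimal lifecycle sketch; note that startRunning() blocks the calling thread, which is why the full listing below starts the session on a background queue:

//Sketch: the basic AVCaptureSession lifecycle
let session = AVCaptureSession()
//... add inputs and outputs here (see the full listing below) ...
session.startRunning()   //Data begins flowing from inputs to outputs
//... later, when capture is no longer needed ...
session.stopRunning()    //Ends the session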

5. AVCaptureVideoPreviewLayer: Image preview layer

  • How do the captured photos and videos show up on the phone? By adding this object to a UIView's layer, as sketched below and shown in step 8 of the full listing
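A short sketch, mirroring step 8 of the full listing below (session is assumed):

//Sketch: attach the preview layer to a view's layer
let previewLayer = AVCaptureVideoPreviewLayer(session: session)
previewLayer.frame = view.bounds
view.layer.addSublayer(previewLayer)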

Well, that's enough background. So how is our face recognition actually implemented? Here comes the good stuff.

III. Add the scanning device

  • Get the device (camera)
  • Create the input object
  • Create the scan output
  • Create the capture callback

1. Output device

  • Here we use AVCaptureMetadataOutput, which can scan faces, QR codes, barcodes, and other information
  • The delegate must be set, otherwise the scan results cannot be obtained
  • You must tell it what kind of data to output: face, QR code, and so on
//3. Create an output object for the metadata
let metadataOutput = AVCaptureMetadataOutput()

//4. Set the delegate to listen for the output data, with callbacks on the main queue
metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)

//7. Tell the output object what kind of data to output: face recognition (up to 10 faces can be recognized)
metadataOutput.metadataObjectTypes = [.face]

The main code is as follows:

fileprivate func addScaningVideo(){
    //1. Get input device (camera)
    guard let device = AVCaptureDevice.default(for: .video) else { return }
    
    //2. Create an input object based on the input device
    guard let deviceIn = try? AVCaptureDeviceInput(device: device) else { return }
    deviceInput = deviceIn
    
    //3. Create an output object for the metadata
    let metadataOutput = AVCaptureMetadataOutput()
    
    //4. Set the delegate to listen for the output data, with callbacks on the main queue
    metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
    //4.2 Set the output delegate
    faceDelegate = previewView
    
    //5. Set output quality (high pixel output)
    session.sessionPreset = .high
    
    //6. Add input and output to the session
    if session.canAddInput(deviceInput!) {
        session.addInput(deviceInput!)
    }
    if session.canAddOutput(metadataOutput) {
        session.addOutput(metadataOutput)
    }
    
    //7. Tell the output object what kind of data to output: face recognition (up to 10 faces can be recognized)
    metadataOutput.metadataObjectTypes = [.face]
    
    //8. Create a preview layer
    previewLayer = AVCaptureVideoPreviewLayer(session: session)
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.frame = view.bounds
    previewView.layer.insertSublayer(previewLayer, at: 0)
    
    //9. Set the valid scan area (the whole area by default; rectOfInterest takes normalized coordinates, each value 0 to 1)
    metadataOutput.rectOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
    
    //10. Start scanning
    if !session.isRunning {
        DispatchQueue.global().async {
            self.session.startRunning()
        }
    }
}

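One practical note: since iOS 10 the app must declare camera usage in Info.plist via the NSCameraUsageDescription key, otherwise it is terminated on first camera access. A minimal sketch of kicking things off, assuming the view controller that owns addScaningVideo above:

//Sketch: start scanning when the controller loads
override func viewDidLoad() {
    super.viewDidLoad()
    //Requires NSCameraUsageDescription in Info.plist
    addScaningVideo()
}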

2. Switch the camera

  • Get the current camera position
  • Create the new input object
  • Remove the old input from the capture session and add the new input
  • The specific code is as follows:
@IBAction func switchCameraAction(_ sender: Any) {
    //1. Execute the transition animation
    let anima = CATransition()
    anima.type = CATransitionType(rawValue: "oglFlip")   //"oglFlip" is an undocumented transition type
    anima.subtype = .fromLeft
    anima.duration = 0.5
    view.layer.add(anima, forKey: nil)
    
    //2. Obtain the current camera
    guard let deviceIn = deviceInput else { return }
    let position: AVCaptureDevice.Position = deviceIn.device.position == .back ? .front : .back
    
    //3. Create new input
    let deviceSession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: position)
    guard let newDevice = deviceSession.devices.filter({ $0.position == position }).first else { return }
    guard let newVideoInput = try? AVCaptureDeviceInput(device: newDevice) else { return }
    
    //4. Remove the old input and add the new input
    //4.1 Begin the configuration change
    session.beginConfiguration()
    //4.2 Remove the old input
    session.removeInput(deviceIn)
    //4.3 Add the new input
    session.addInput(newVideoInput)
    //4.4 Commit the configuration change
    session.commitConfiguration()
    
    //5. Save the latest input
    deviceInput = newVideoInput
}


3. Process the scan result

Implement the AVCaptureMetadataOutputObjectsDelegate protocol method (there is only one):

//`metadataObjects` is the array of scanned metadata objects
public func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)
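A minimal sketch of implementing it, assuming the view controller that was passed to setMetadataObjectsDelegate above and the transformedFaces helper from section 5.1 below:

//Sketch: a minimal delegate implementation (hypothetical conforming controller)
extension ViewController: AVCaptureMetadataOutputObjectsDelegate {
    func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
        //Convert the scanned faces into preview layer coordinates (see 5.1)
        let faces = transformedFaces(faceObjs: metadataObjects)
        print("Scanned \(faces.count) face(s)")
        //... update the red boxes from `faces` here (see 5.2) ...
    }
}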

4. Introduction to AVMetadataFaceObject

  • faceID: the unique identifier of a face
    • Each face found in a scan gets a different faceID
    • The same person in different states (shaking the head, tilting it, raising it, and so on) will also get a different faceID
  • hasRollAngle: whether a roll angle is available (head tilted left or right) (Bool)
  • rollAngle: the roll angle (CGFloat)
  • hasYawAngle: whether a yaw angle is available (head turned left or right)
  • yawAngle: the yaw angle
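A short sketch of reading these properties off a scanned face, assuming the metadataObjects array delivered to the delegate method above:

//Sketch: inspecting an AVMetadataFaceObject
if let face = metadataObjects.first as? AVMetadataFaceObject {
    print("faceID: \(face.faceID)")
    if face.hasRollAngle { print("roll angle: \(face.rollAngle)") }
    if face.hasYawAngle { print("yaw angle: \(face.yawAngle)") }
}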

5. Process the scan result

5.1 Get the face array for the preview layer

  • Traverse the scanned face array and convert it into an array of faces in the preview layer's coordinate space
  • This is mainly a coordinate conversion of each face into the layer
  • Return the converted array
fileprivate func transformedFaces(faceObjs: [AVMetadataObject]) -> [AVMetadataObject] {
    var faceArr = [AVMetadataObject]()
    for face in faceObjs {
        // Convert the scanned face object into a face object in the preview layer (mainly a coordinate conversion)
        if let transFace = previewLayer.transformedMetadataObject(for: face) {
            faceArr.append(transFace)
        }
    }
    return faceArr
}


5.2 Add a red box according to the position of the face

  • Set the frame of the red box (the creation of faceLayer itself is sketched at the end of this section)

faceLayer?.frame = face.bounds
  • Compute the CATransform3D from the yaw angle and the roll angle
    // Handle the yaw angle
    fileprivate func transformDegress(yawAngle: CGFloat) -> CATransform3D {
        let yaw = degreesToRadians(degress: yawAngle)
        // Rotate around the y-axis
        let yawTran = CATransform3DMakeRotation(yaw, 0, -1, 0)
        // Concatenate to fix the red box rotation
        return CATransform3DConcat(yawTran, CATransform3DIdentity)
    }
    
    // Handle the roll angle
    fileprivate func transformDegress(rollAngle: CGFloat) -> CATransform3D {
        let roll = degreesToRadians(degress: rollAngle)
        // Rotate around the z-axis
        return CATransform3DMakeRotation(roll, 0, 0, 1)
    }
    
    // Degree-to-radian conversion
    fileprivate func degreesToRadians(degress: CGFloat) -> CGFloat {
        return degress * CGFloat(Double.pi) / 180
    }

  • Rotate the red box according to the yaw angle and the roll angle

//3.4 Apply the yaw angle (head turned left or right)
if face.hasYawAngle {
    let tranform3D = transformDegress(yawAngle: face.yawAngle)
    // Matrix processing
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, tranform3D)
}

//3.5 Apply the roll angle (head tilted left or right)
if face.hasRollAngle {
    let tranform3D = transformDegress(rollAngle: face.rollAngle)
    // Matrix processing
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, tranform3D)
}
  • At this point dynamic face recognition is complete: a red box is shown at the position of each face, and it adjusts dynamically in real time as the face moves
  • Grab your camera and try it out
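The snippets above assume a faceLayer already exists. Its creation is not shown in this article, so here is a minimal hypothetical setup for a single red box, added on top of the preview layer from the main listing:

//Sketch: a hypothetical red box layer
let faceLayer = CALayer()
faceLayer.borderWidth = 2
faceLayer.borderColor = UIColor.red.cgColor
previewLayer.addSublayer(faceLayer)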

GitHub: the Demo address

  • Note:
  • Only the core code is listed here; for the complete logic please refer to the demo
  • If any part of this article is not explained in enough detail, or you have better suggestions, feel free to contact the author

Other related articles

  • Generation, recognition, and scanning of QR codes in Swift
  • iOS Black Technology (CoreImage) Static Face Recognition (1)
  • The Vision image recognition framework in Swift