The Vision framework was launched in 2017 to make it easy for mobile app developers to leverage computer vision algorithms. Specifically, the Vision framework ships with a number of pre-trained deep learning models and also acts as a wrapper for quickly running your own custom Core ML models.

After introducing Text Recognition and VisionKit in iOS 13 to enhance OCR, Apple is now focusing on motion estimation and classification in the iOS 14 Vision framework.

As mentioned in the previous article, the Vision framework now offers Contour Detection, Optical Flow requests, and a range of offline video processing tools. But more importantly, hand and body pose estimation is now possible, which opens up even more possibilities for augmented reality and computer vision.

In this article, we will use hand pose estimation to build an iOS app that can sense hand gestures without any touch.

I’ve already published a post showing how ML Kit’s Face Detection API can be used to build contactless iOS apps. I think that prototype works great and could be integrated into dating apps like Tinder or Bumble. However, constantly blinking and turning your head can cause eye strain and headaches.

So let’s extend that idea by using hand gestures instead of touch to swipe left and right. After all, these days it makes sense to use your phone lazily, or while practicing social distancing. Before we go further, let’s take a look at how to create a Vision hand pose request in iOS 14.

Vision hand pose estimation

The new VNDetectHumanHandPoseRequest is an image-based Vision request that detects a human hand pose. It returns 21 landmark points for each hand in an instance of type VNHumanHandPoseObservation. We can set the maximumHandCount value to control the maximum number of hands detected per frame during Vision processing.
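For instance, a minimal sketch of creating the request and limiting it to a single hand might look like this (limiting detection to one hand is just an illustrative choice):

let handPoseRequest = VNDetectHumanHandPoseRequest()
// Only detect one hand per frame.
handPoseRequest.maximumHandCount = 1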

In our example, we can simply pass the corresponding enum case to get the group of landmarks for each finger:

let thumbPoints = try observation.recognizedPoints(.thumb)
let indexFingerPoints = try observation.recognizedPoints(.indexFinger)
let middleFingerPoints = try observation.recognizedPoints(.middleFinger)
let ringFingerPoints = try observation.recognizedPoints(.ringFinger)
let littleFingerPoints = try observation.recognizedPoints(.littleFinger)

There is also a wrist landmark, located at the center of the wrist. It does not belong to any of the finger groups, but is included in the all group. You can obtain it like this:

let wristPoints = try observation.recognizedPoints(.all)

After we get the above groups of landmark points, we can extract each individual point as follows:

guard let thumbTipPoint = thumbPoints[.thumbTip],
      let indexTipPoint = indexFingerPoints[.indexTip],
      let middleTipPoint = middleFingerPoints[.middleTip],
      let ringTipPoint = ringFingerPoints[.ringTip],
      let littleTipPoint = littleFingerPoints[.littleTip],
      let wristPoint = wristPoints[.wrist] else { return }

thumbIP, thumbMP, and thumbCMC are the other landmarks available in the thumb group; the same pattern applies to the other finger groups.

Each individual landmark is a VNRecognizedPoint that contains its location (in Vision’s normalized coordinate space) and a confidence value.
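For example, a small sketch of reading one of these landmarks, reusing the thumbPoints dictionary retrieved above:

if let thumbIPPoint = thumbPoints[.thumbIP] {
    // location is a normalized CGPoint; confidence ranges from 0 to 1.
    print("Thumb IP joint at \(thumbIPPoint.location), confidence: \(thumbIPPoint.confidence)")
}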

Next, we can compute distances or angles between these points to build a gesture processor. For example, in Apple’s demo app, they calculate the distance between the tips of the thumb and index finger to detect a pinch gesture.
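A rough sketch of that idea, assuming the thumbTipPoint and indexTipPoint we unwrapped above (the 0.05 threshold is just an illustrative value in normalized coordinates):

let dx = thumbTipPoint.location.x - indexTipPoint.location.x
let dy = thumbTipPoint.location.y - indexTipPoint.location.y
let pinchDistance = (dx * dx + dy * dy).squareRoot()

if pinchDistance < 0.05 {
    // The thumb and index fingertips are close together, so treat it as a pinch.
    print("Pinch detected")
}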

Getting started

Now that we know the basics of Vision hand pose requests, we can dive into how to implement them!

Open Xcode and create a new UIKit app. Make sure iOS 14 is your deployment target, and set the NSCameraUsageDescription string in Info.plist.

In a previous article, I showed how to build Tinder-style cards with animations; here we will reuse the final code from that article directly.

Similarly, you can refer to the code for the StackContainerView.swift class, which holds the stack of Tinder cards.

Use AVFoundation to set up the camera

Next, let’s use Apple’s AVFoundation framework to build a custom camera.

Here is the code for the ViewController.swift file:

import UIKit
import AVFoundation
import Vision

class ViewController: UIViewController, HandSwiperDelegate {

    //MARK: - Properties
    var modelData = [DataModel(bgColor: .systemYellow),
                     DataModel(bgColor: .systemBlue),
                     DataModel(bgColor: .systemRed),
                     DataModel(bgColor: .systemTeal),
                     DataModel(bgColor: .systemOrange),
                     DataModel(bgColor: .brown)]

    var stackContainer: StackContainerView!
    var buttonStackView: UIStackView!
    var leftButton: UIButton!, rightButton: UIButton!
    var cameraView: CameraView!

    //MARK: - Init
    override func loadView() {
        view = UIView()
        stackContainer = StackContainerView()
        view.addSubview(stackContainer)
        configureStackContainer()
        stackContainer.translatesAutoresizingMaskIntoConstraints = false
        addButtons()
        configureNavigationBarButtonItem()
        addCameraView()
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        title = "HandPoseSwipe"
        stackContainer.dataSource = self
    }

    private let videoDataOutputQueue = DispatchQueue(label: "CameraFeedDataOutput", qos: .userInteractive)
    private var cameraFeedSession: AVCaptureSession?
    private var handPoseRequest = VNDetectHumanHandPoseRequest()

    let message = UILabel()
    var handDelegate: HandSwiperDelegate?
    // Tracks whether the hand has returned to a neutral position between swipes
    // (used in processPoints later in this article).
    var restingHand = true

    func addCameraView() {
        cameraView = CameraView()
        self.handDelegate = self
        view.addSubview(cameraView)
        cameraView.translatesAutoresizingMaskIntoConstraints = false
        cameraView.bottomAnchor.constraint(equalTo: view.bottomAnchor).isActive = true
        cameraView.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
        cameraView.widthAnchor.constraint(equalToConstant: 150).isActive = true
        cameraView.heightAnchor.constraint(equalToConstant: 150).isActive = true
    }

    //MARK: - Configurations
    func configureStackContainer() {
        stackContainer.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
        stackContainer.centerYAnchor.constraint(equalTo: view.centerYAnchor, constant: -60).isActive = true
        stackContainer.widthAnchor.constraint(equalToConstant: 300).isActive = true
        stackContainer.heightAnchor.constraint(equalToConstant: 400).isActive = true
    }

    func addButtons() {
        // Full source of the UI setup is at the end of this article.
    }

    @objc func onButtonPress(sender: UIButton) {
        UIView.animate(withDuration: 2.0,
                       delay: 0,
                       usingSpringWithDamping: CGFloat(0.20),
                       initialSpringVelocity: CGFloat(6.0),
                       options: UIView.AnimationOptions.allowUserInteraction,
                       animations: {
                           sender.transform = CGAffineTransform.identity
                       },
                       completion: { _ in })

        if let firstView = stackContainer.subviews.last as? TinderCardView {
            if sender.tag == 0 {
                firstView.leftSwipeClicked(stackContainerView: stackContainer)
            } else {
                firstView.rightSwipeClicked(stackContainerView: stackContainer)
            }
        }
    }

    func configureNavigationBarButtonItem() {
        navigationItem.rightBarButtonItem = UIBarButtonItem(title: "Reset", style: .plain, target: self, action: #selector(resetTapped))
    }

    @objc func resetTapped() {
        stackContainer.reloadData()
    }

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        do {
            if cameraFeedSession == nil {
                cameraView.previewLayer.videoGravity = .resizeAspectFill
                try setupAVSession()
                cameraView.previewLayer.session = cameraFeedSession
            }
            cameraFeedSession?.startRunning()
        } catch {
            AppError.display(error, inViewController: self)
        }
    }

    override func viewWillDisappear(_ animated: Bool) {
        cameraFeedSession?.stopRunning()
        super.viewWillDisappear(animated)
    }

    func setupAVSession() throws {
        // Select a front-facing camera and make an input.
        guard let videoDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) else {
            throw AppError.captureSessionSetup(reason: "Could not find a front facing camera.")
        }
        guard let deviceInput = try? AVCaptureDeviceInput(device: videoDevice) else {
            throw AppError.captureSessionSetup(reason: "Could not create video device input.")
        }

        let session = AVCaptureSession()
        session.beginConfiguration()
        session.sessionPreset = AVCaptureSession.Preset.high

        // Add a video input.
        guard session.canAddInput(deviceInput) else {
            throw AppError.captureSessionSetup(reason: "Could not add video device input to the session")
        }
        session.addInput(deviceInput)

        // Add a video data output.
        let dataOutput = AVCaptureVideoDataOutput()
        if session.canAddOutput(dataOutput) {
            session.addOutput(dataOutput)
            dataOutput.alwaysDiscardsLateVideoFrames = true
            dataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
            dataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
        } else {
            throw AppError.captureSessionSetup(reason: "Could not add video data output to the session")
        }
        session.commitConfiguration()
        cameraFeedSession = session
    }
}

There are a number of steps in the above code, so let’s go through them one by one:

  • CameraView is a custom UIView class that renders the camera feed on the screen. We’ll look at this class in more detail later.
  • In setupAVSession(), we select the front-facing camera and set it as the input of the AVCaptureSession.
  • Next, we call setSampleBufferDelegate on the AVCaptureVideoDataOutput.

The ViewController class conforms to the HandSwiperDelegate protocol:

protocol HandSwiperDelegate {
  func thumbsDown()
  func thumbsUp()
}
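The implementations of these two methods are not shown here, but a minimal sketch in the ViewController could reuse the same TinderCardView swipe helpers that the buttons call (this mirrors onButtonPress above and is an assumption, not the repository’s exact code):

extension ViewController {
    func thumbsUp() {
        // Swipe the top card to the right, just like tapping the right button.
        if let firstView = stackContainer.subviews.last as? TinderCardView {
            firstView.rightSwipeClicked(stackContainerView: stackContainer)
        }
    }

    func thumbsDown() {
        // Swipe the top card to the left, just like tapping the left button.
        if let firstView = stackContainer.subviews.last as? TinderCardView {
            firstView.leftSwipeClicked(stackContainerView: stackContainer)
        }
    }
}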

When a gesture is detected, we trigger the corresponding delegate method. Now, let’s look at how to perform the Vision request on the captured frames.

Perform the hand pose request on captured frames

In the following code, we create an extension of the ViewController defined above that conforms to the AVCaptureVideoDataOutputSampleBufferDelegate protocol:

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {

    public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        var thumbTip: CGPoint?
        var wrist: CGPoint?

        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
        do {
            // Perform VNDetectHumanHandPoseRequest.
            try handler.perform([handPoseRequest])

            guard let observation = handPoseRequest.results?.first else {
                cameraView.showPoints([])
                return
            }

            // Get points for all fingers.
            let thumbPoints = try observation.recognizedPoints(.thumb)
            let wristPoints = try observation.recognizedPoints(.all)
            let indexFingerPoints = try observation.recognizedPoints(.indexFinger)
            let middleFingerPoints = try observation.recognizedPoints(.middleFinger)
            let ringFingerPoints = try observation.recognizedPoints(.ringFinger)
            let littleFingerPoints = try observation.recognizedPoints(.littleFinger)

            // Extract individual points from the point groups.
            guard let thumbTipPoint = thumbPoints[.thumbTip],
                  let indexTipPoint = indexFingerPoints[.indexTip],
                  let middleTipPoint = middleFingerPoints[.middleTip],
                  let ringTipPoint = ringFingerPoints[.ringTip],
                  let littleTipPoint = littleFingerPoints[.littleTip],
                  let wristPoint = wristPoints[.wrist] else {
                cameraView.showPoints([])
                return
            }

            // Ignore low-confidence detections.
            let confidenceThreshold: Float = 0.3
            guard thumbTipPoint.confidence > confidenceThreshold &&
                  indexTipPoint.confidence > confidenceThreshold &&
                  middleTipPoint.confidence > confidenceThreshold &&
                  ringTipPoint.confidence > confidenceThreshold &&
                  littleTipPoint.confidence > confidenceThreshold &&
                  wristPoint.confidence > confidenceThreshold else {
                cameraView.showPoints([])
                return
            }

            // Convert points from Vision coordinates to AVFoundation coordinates.
            thumbTip = CGPoint(x: thumbTipPoint.location.x, y: 1 - thumbTipPoint.location.y)
            wrist = CGPoint(x: wristPoint.location.x, y: 1 - wristPoint.location.y)

            DispatchQueue.main.async {
                self.processPoints([thumbTip, wrist])
            }
        } catch {
            cameraFeedSession?.stopRunning()
            let error = AppError.visionError(error: error)
            DispatchQueue.main.async {
                error.displayInViewController(self)
            }
        }
    }
}

It is worth noting that the landmarks returned by the observation are in Vision’s coordinate system. We have to convert them to UIKit coordinates before we can draw them on the screen.

Therefore, we convert them to AVFoundation coordinates in the following way:

wrist = CGPoint(x: wristPoint.location.x, y: 1 - wristPoint.location.y)

Next, we will pass these marker points to the processPoints function. To simplify the process, we use only the fingertip of the thumb and the wrist to detect gestures.

Here is the code for the processPoints function:

func processPoints(_ points: [CGPoint?]) {
    let previewLayer = cameraView.previewLayer
    var pointsConverted: [CGPoint] = []

    // Convert AVFoundation coordinates to UIKit coordinates.
    for point in points {
        pointsConverted.append(previewLayer.layerPointConverted(fromCaptureDevicePoint: point!))
    }

    let thumbTip = pointsConverted[0]
    let wrist = pointsConverted[pointsConverted.count - 1]

    let yDistance = thumbTip.y - wrist.y

    if yDistance > 50 {
        if self.restingHand {
            self.restingHand = false
            self.handDelegate?.thumbsDown()
        }
    } else if yDistance < -50 {
        if self.restingHand {
            self.restingHand = false
            self.handDelegate?.thumbsUp()
        }
    } else {
        self.restingHand = true
    }

    cameraView.showPoints(pointsConverted)
}

We can convert the AVFoundation coordinates to UIKit coordinates with the following line of code:

previewLayer.layerPointConverted(fromCaptureDevicePoint: point!)

Finally, based on the threshold distance between the two landmarks, we trigger a left or right swipe on the card at the top of the stack.

The cameraView’s showPoints(_:) method draws the two landmark points on a sublayer of the cameraView and connects them with a line.

Here is the complete code for the CameraView class:

import UIKit
import AVFoundation
class CameraView: UIView {
    private var overlayThumbLayer = CAShapeLayer()
    var previewLayer: AVCaptureVideoPreviewLayer {
        return layer as! AVCaptureVideoPreviewLayer
    }
    override class var layerClass: AnyClass {
        return AVCaptureVideoPreviewLayer.self
    }
    override init(frame: CGRect) {
        super.init(frame: frame)
        setupOverlay()
    }
    required init?(coder: NSCoder) {
        super.init(coder: coder)
        setupOverlay()
    }
    override func layoutSublayers(of layer: CALayer) {
        super.layoutSublayers(of: layer)
        if layer == previewLayer {
            overlayThumbLayer.frame = layer.bounds
        }
    }
    private func setupOverlay() {
        previewLayer.addSublayer(overlayThumbLayer)
    }
    func showPoints(_ points: [CGPoint]) {
        guard let wrist: CGPoint = points.last else {
            // Clear all CALayers
            clearLayers()
            return
        }
        let thumbColor = UIColor.green
        drawFinger(overlayThumbLayer, Array(points[0...1]), thumbColor, wrist)
    }
    func drawFinger(_ layer: CAShapeLayer, _ points: [CGPoint], _ color: UIColor, _ wrist: CGPoint) {
        let fingerPath = UIBezierPath()
        for point in points {
            fingerPath.move(to: point)
            fingerPath.addArc(withCenter: point, radius: 5, startAngle: 0, endAngle: 2 * .pi, clockwise: true)
        }
        fingerPath.move(to: points[0])
        fingerPath.addLine(to: points[points.count - 1])
        layer.fillColor = color.cgColor
        layer.strokeColor = color.cgColor
        layer.lineWidth = 5.0
        layer.lineCap = .round
        CATransaction.begin()
        CATransaction.setDisableActions(true)
        layer.path = fingerPath.cgPath
        CATransaction.commit()
    }
    func clearLayers() {
        let emptyPath = UIBezierPath()
        CATransaction.begin()
        CATransaction.setDisableActions(true)
        overlayThumbLayer.path = emptyPath.cgPath
        CATransaction.commit()
    }
}

The final results

The finished app ends up looking like this:

Conclusion

Vision’s new hand pose estimation request can be used in a variety of situations, such as taking selfies with a gesture, drawing signatures in the air, or even identifying Trump’s different hand gestures during speeches.

You can also chain hand pose requests together with body pose requests to detect more complex poses.
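As a rough sketch (assuming the same sample-buffer setup as above), both requests can be run on the same frame with one VNImageRequestHandler, and their observations combined, for example to check whether a hand is raised above the head:

import AVFoundation
import Vision

let handPoseRequest = VNDetectHumanHandPoseRequest()
let bodyPoseRequest = VNDetectHumanBodyPoseRequest()

func detectCombinedPose(in sampleBuffer: CMSampleBuffer) {
    let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
    do {
        // Run both requests on the same frame.
        try handler.perform([handPoseRequest, bodyPoseRequest])
        guard let handObservation = handPoseRequest.results?.first,
              let bodyObservation = bodyPoseRequest.results?.first else { return }

        let wrist = try handObservation.recognizedPoint(.wrist)
        let nose = try bodyObservation.recognizedPoint(.nose)

        // Vision's normalized coordinates have the origin at the bottom left,
        // so a larger y value means higher up in the image.
        if wrist.confidence > 0.3, nose.confidence > 0.3, wrist.location.y > nose.location.y {
            print("Hand raised above the head")
        }
    } catch {
        print("Vision request failed: \(error)")
    }
}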

You can find the project’s full code in the GitHub repository.

That’s it for this article. Thank you for reading!
