ARKit introduction

AR (Augmented Reality) is a technology that visually blends virtual objects with real-world scenes. Apple officially launched ARKit in June 2017, letting iOS developers build AR applications with a simple, convenient API. The full functionality of ARKit requires an A9 chip or later, which covers most devices running iOS 11, including the iPhone 6s.

While researching ARKit I built a tape-measure demo; below I walk through the technical points used in the project.

Project practice

AR applications on iOS are usually composed of ARKit plus a rendering engine:

The following sections cover two aspects: ARKit's functionality and the SceneKit rendering side.

ARKit

ARKit’s ARSession is responsible for managing the information of each frame. ARSession does two things: it captures images and sensor data, then analyzes and processes that data and outputs it frame by frame, as shown in the diagram below:

Device tracking

Device tracking ensures that the position of virtual objects is not affected by device movement. ARSession is started with a subclass of ARSessionConfiguration, which determines which of the three tracking modes is used:

  • ARFaceTrackingConfiguration
  • ARWorldTrackingConfiguration
  • AROrientationTrackingConfiguration

ARFaceTrackingConfiguration can recognize the position and orientation of a face and obtain its topology. It can also detect 52 predefined facial movements, such as blinking, smiling, and frowning. ARFaceTrackingConfiguration requires a device with a TrueDepth front-facing camera. This project mainly uses ARWorldTrackingConfiguration for tracking and obtaining feature points.
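
Before starting a session, it can be worth checking which configuration the current device supports. The snippet below is a hedged sketch, not from the original project; it uses the released class names (ARConfiguration is the released name of what this article calls ARSessionConfiguration).

import ARKit

// A minimal sketch: choose a configuration based on device support.
let configuration: ARConfiguration
if ARWorldTrackingConfiguration.isSupported {
    // A9 or later: full 6-DoF world tracking.
    configuration = ARWorldTrackingConfiguration()
} else {
    // Older devices: fall back to 3-DoF orientation-only tracking.
    configuration = AROrientationTrackingConfiguration()
}
// session.run(configuration)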

Tracking steps
// Create an ARSessionConfiguration.
// For now, don't worry about what ARWorldTrackingSessionConfiguration is.
let configuration = ARWorldTrackingSessionConfiguration()
// Create a session.
let session = ARSession()
// Run.
session.run(configuration)

As the code above shows, running an ARSession is very simple. But how does ARSession perform world tracking under the hood?

  • First, ARSession uses AVCaptureSession underneath to obtain the video captured by the camera (a frame-by-frame sequence of images).
  • Second, ARSession uses CMMotionManager underneath to obtain the device's motion information (such as rotation angle and movement distance).
  • Finally, ARSession analyzes the image sequence together with the device's motion information and outputs an ARFrame, which contains all the information required for rendering the virtual world (a small sketch of receiving these frames follows).
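
As a minimal hedged sketch (the class name is mine), this frame-by-frame output can be observed through ARSessionDelegate:

import ARKit

// Receives the ARFrames that ARSession outputs frame by frame.
class FrameObserver: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Each frame carries the captured image, the camera pose and the anchors.
        print(frame.camera.transform)
    }
}

// Usage: session.delegate = FrameObserver() -- keep a strong reference,
// since the session holds its delegate weakly.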

Tracked information

The AR world coordinate system is shown below. When we run an ARSession, the device's location at that moment becomes the origin of the AR world coordinate system.

In this AR-World coordinate system, ARKit tracks the following information:

  • The position and rotation of the device, both relative to the device's starting pose.
  • Physical distances (in meters); for example, when ARKit detects a plane, we want to know how big that plane is.
  • Points that we manually add and want to track, such as a virtual object placed in the scene (see the sketch after this list).
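
Because distances are tracked in meters, the distance between the device and a tracked anchor can be computed directly from their transforms. A hedged sketch (the helper name is hypothetical):

import ARKit
import simd

// Distance in meters between the device (camera) and an anchor.
func distance(from camera: ARCamera, to anchor: ARAnchor) -> Float {
    let c = camera.transform.columns.3      // camera translation
    let a = anchor.transform.columns.3      // anchor translation
    return simd_distance(simd_float3(c.x, c.y, c.z), simd_float3(a.x, a.y, a.z))
}
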
How tracking works

Apple’s documentation explains how world tracking works: ARKit uses visual-inertial odometry, performing computer-vision analysis on the image sequence captured by the camera and combining it with information from the device’s motion sensors. ARKit identifies feature points in each image frame, tracks how their positions change between successive frames, and compares that with the motion-sensor data to obtain high-precision device position and orientation information.

  • In the figure above, the moving point marked by the curve represents the device. A coordinate system moves and rotates along with it, meaning the device is constantly moving and rotating. This information is acquired through the device's motion sensors.
  • The yellow dots on the right of the GIF are 3D feature points: points obtained by processing the captured images that represent features of objects. For example, the texture of the floor or the edges and corners of objects can become feature points. As the device moves, ARKit continuously tracks the feature points captured in the image.
  • ARKit combines these two sources of information to obtain highly accurate device position and orientation information.
ARWorldTrackingConfiguration

ARWorldTrackingConfiguration provides six-degrees-of-freedom (6DoF) device tracking: yaw, pitch, and roll, plus the offsets along the X, Y, and Z axes of the Cartesian coordinate system:
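
As a hedged illustration (the helper name is mine), all six values can be read from the current frame's camera: the three rotation angles come from eulerAngles and the three offsets from the last column of the transform.

import ARKit

// Log the 6-DoF pose of the device for the current frame.
func logPose(of frame: ARFrame) {
    let angles = frame.camera.eulerAngles          // (pitch, yaw, roll) in radians
    let t = frame.camera.transform.columns.3       // translation along X, Y, Z in meters
    print("pitch \(angles.x), yaw \(angles.y), roll \(angles.z)")
    print("x \(t.x), y \(t.y), z \(t.z)")
}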

In addition, ARKit uses VIO (Visual-Inertial Odometry) to improve the accuracy of device motion tracking. An inertial measurement unit (IMU) detects the motion trajectory, while the images taken by the camera during the motion are processed; the trajectory of feature points in the images is compared against the sensor results, and a final high-precision result is output. In terms of the dimensions and accuracy of tracking, ARWorldTrackingConfiguration is very capable. But according to the official documentation it has two significant drawbacks:

  • It is affected by the quality of the ambient light.
  • It is affected by violent device motion.

During tracking, feature points are extracted from the captured images, so image quality affects the tracking results. In poor lighting conditions (such as at night or under harsh glare), the images do not provide a proper reference and tracking quality deteriorates.

During the tracking process, the image and sensor results will be compared frame by frame. If the device moves violently in a short period of time, the tracking results will be greatly disturbed.

Tracking state

World tracking has three states. We can obtain the current state through camera.trackingState.

From the figure above, we can see that there are three tracking states:

  • notAvailable: World tracking is being initialized and is not yet working.
  • normal: Tracking is working normally.
  • limited: A restricted state; tracking may change to limited when tracking quality is affected.

Associated with the tracking state is ARCamera.TrackingState.Reason, an enumeration type:

  • case excessiveMotion: The device is moving too fast to track properly.
  • case initializing: Tracking is initializing.
  • case insufficientFeatures: There are too few feature points to track properly.
  • case none: Tracking is working normally.

We can use the ARSessionObserver protocol to be notified when the tracking state changes; the details are straightforward to find in the API documentation. A sketch follows.
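
A hedged sketch of how such a handler might look; this is the ARSessionObserver callback, which ARSessionDelegate inherits.

// React to tracking-state changes reported by the session.
func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
    switch camera.trackingState {
    case .notAvailable:
        print("World tracking is still initializing")
    case .normal:
        print("Tracking is working normally")
    case .limited(let reason):
        print("Tracking is limited: \(reason)")
    }
}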

ARFrame

ARFrame contains all the information captured by the world-tracking process. Its main world-tracking information consists of the camera and the anchors:

  • camera: Contains the position, rotation, and imaging parameters of the camera.
var camera: ARCamera
  • anchors: Points or surfaces that are being tracked.
var anchors: [ARAnchor]
ARAnchor

  • An ARAnchor represents a position and orientation in real-world space.
  • ARAnchors can be added to or removed from a scene. Basically, they are used to anchor virtual content in the physical environment. If you want to add a custom anchor, just add it to the session; it persists throughout the session lifecycle. If you are running something like plane detection, anchors are added to the session automatically.
  • To respond to added anchors, you can get the full list of anchors the session is tracking from the current ARFrame.
  • Alternatively, you can respond to the delegate methods add, update, and remove, which notify you when anchors are added to, updated in, or removed from the session.
ARCamera

Each ARFrame contains an ARCamera. The ARCamera object represents the virtual camera, which corresponds to the position and orientation of the device.

  • ARCamera provides a transform, a 4×4 matrix describing the transformation of the physical device relative to its initial position.
  • ARCamera provides the tracking state, which tells you how to interpret the transform; this was covered above.
  • ARCamera provides the camera intrinsics, including the focal length and principal point, used to find the projection matrix. The projection matrix is exposed as a convenience method on ARCamera and can be used to render virtual geometry (see the sketch below).
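
A hedged sketch (the function name is mine) of reading these three pieces of information from a frame's camera; the viewport size and orientation below are placeholder values:

import ARKit
import UIKit

// Inspect the camera information carried by an ARFrame.
func inspectCamera(of frame: ARFrame) {
    let camera = frame.camera
    let transform = camera.transform        // 4x4 pose relative to the initial position
    let intrinsics = camera.intrinsics      // 3x3 matrix: focal lengths and principal point
    // Convenience method: a projection matrix suitable for rendering virtual geometry.
    let projection = camera.projectionMatrix(for: .portrait,
                                             viewportSize: CGSize(width: 375, height: 667),
                                             zNear: 0.001,
                                             zFar: 1000)
    print(transform, intrinsics, projection)
}
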
Scene analysis

The main purpose of scene analysis is to analyze the real-world scene and extract information such as real-world planes, so that we can place virtual objects onto real-world surfaces. The scene analysis provided by ARKit mainly includes plane detection, scene interaction (hit-testing), and light estimation, discussed one by one below.

Plane Detection
  • ARKit’s plane detection is used to detect horizontal planes in the real world.

As can be seen from the figure above, ARKit has detected two planes. The two three-dimensional coordinate systems in the figure are the local coordinate systems of the detected planes, and each detected plane has a size range (extent).

  • Plane detection is a dynamic process. When the camera moves continuously, the detected plane changes constantly. As can be seen in the following figure, the coordinate origin and range of the detected plane are constantly changing when the camera is moved.

  • In addition, with the dynamic detection of planes, different planes may be merged into a new plane. In the image below, you can see that the detected planes are merged into one plane as the camera moves.

  • Enable Plane Detection

To enable plane detection, set the planeDetection property of the configuration (to .horizontal in the code below) before running the ARSession.

// Create a world tracking session configuration.
let configuration = ARWorldTrackingSessionConfiguration()
configuration.planeDetection = .horizontal
// Create a session.
let session = ARSession()
// Run.
session.run(configuration)
  • The representation of the plane

When ARKit detects a plane, ARKit will automatically add an ARPlaneAnchor for the plane, and this ARPlaneAnchor represents a plane.

  • When ARKit detects a new plane, it automatically adds an ARPlaneAnchor to the ARSession. We can use ARSessionDelegate to be notified of changes to the ARAnchors of the current ARSession in the following three cases:

ARAnchor has been added

func session(_ session: ARSession, didAdd anchors: [ARAnchor])

For plane detection, we receive this notification when a new plane is detected; the ARAnchor array in the notification contains the newly added plane, whose type is ARPlaneAnchor. It can be used as follows:

func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for anchor in anchors {
        if let anchor = anchor as? ARPlaneAnchor {
            print(anchor.center)
            print(anchor.extent)
        }
    }
}

ARAnchor update

func session(_ session: ARSession, didUpdate anchors: [ARAnchor])

As noted above, detected planes are constantly updated as the device moves, and this callback is invoked whenever a plane is updated.

ARAnchor removed

func session(_ session: ARSession, didRemove anchors: [ARAnchor])

This method is called back when an Anchor is manually removed. In addition, for detected planes, if two planes merge, one of them is deleted, at which point this method is also called back.

Scene interaction (hit-testing)

Hit-testing obtains information about a tapped position in the currently captured image (planes, feature points, ARAnchors, and so on).

Schematic diagram is as follows

When the screen is tapped, ARKit emits a ray. Assuming the screen plane is the XY plane of the three-dimensional coordinate system, the ray is shot into the scene along the Z axis. This is the hit-testing process. All useful information encountered by the ray is returned, sorted by distance from the screen, with the closest results first.

ARFrame provides an interface for hit-testing:

func hitTest(_ point: CGPoint, types: ARHitTestResult.ResultType) -> [ARHitTestResult]

The types parameter in the interface above specifies which kinds of information the hit test should return. ResultType has four options:

  • featurePoint

Indicates that the hit test should return the 3D feature points in the current image that the ray passes through, as shown in the diagram below:

  • estimatedHorizontalPlane

Indicates that the hit test should return the estimated planes in the current image that the ray passes through. An estimated plane is something ARKit has detected that may be a plane but has not yet been confirmed as one, so ARKit has not added an ARPlaneAnchor for it. See the diagram below:

  • existingPlaneUsingExtent

Indicates that the hit test should return detected planes, within their size range (extent), that the ray passes through in the current image.

In the figure above, the hit test returns a result if the ray passes through the green plane (which has a size range) and returns nothing if the ray falls outside the green plane.

  • existingPlane

Indicates that the hit test should return detected planes, treated as infinite planes, that the ray passes through in the current image.

In the figure above, the plane's size is the area shown in green, but the existingPlane option means the plane is returned even if the hit-testing ray falls outside the green area. In other words, every plane is treated as if it extended infinitely, and the hit-testing ray returns any of these infinitely extended planes that it crosses.

Sample code is as follows

// Adding an ARAnchor based on hit-test
// The center of the image.
let point = CGPoint(x: 0.5, y: 0.5)
// Perform a hit-test on the frame.
let results = frame.hitTest(point, types: [.featurePoint, .estimatedHorizontalPlane])

// Use the first result.
if let closestResult = results.first {
    // Create an anchor for it.
    let anchor = ARAnchor(transform: closestResult.worldTransform)
    // Add it to the session.
    session.add(anchor: anchor)
}

In the code above, the hit-testing point (0.5, 0.5) represents the center of the screen; the upper-left corner is (0, 0) and the lower-right corner is (1, 1). ARKit does not add ARAnchors for featurePoint and estimatedHorizontalPlane results on its own, but we can use the hit-test information to add an ARAnchor to the ARSession ourselves, as the code above shows.
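
In the spirit of the tape-measure demo, two such hit-test results are enough to measure a real-world length. A hedged sketch (the helper name is hypothetical):

import ARKit
import simd

// Distance in meters between two hit-test results (e.g. the two ends of a measurement).
func measuredDistance(from start: ARHitTestResult, to end: ARHitTestResult) -> Float {
    let s = start.worldTransform.columns.3
    let e = end.worldTransform.columns.3
    return simd_distance(simd_float3(s.x, s.y, s.z), simd_float3(e.x, e.y, e.z))
}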

Light Estimation

Above, a virtual teacup has been placed on a real-world table.

When the ambient light is good, the camera captures good light intensity and the teacup placed on the table looks realistic, as in the image on the far left. But when the ambient light is low, so is the captured image; in the middle image above, the teacup's brightness looks out of place in the real world.

In this case, ARKit provides illumination estimation, which, when turned on, gives us the illumination intensity of the current image, allowing us to render virtual objects with more natural illumination intensity, such as the image on the far right.

Light estimation gives an estimated light intensity (in lumens) based on information such as the exposure of the currently captured image. The neutral value is 1000 lumens: when the real-world scene is bright we get a value above 1000 lumens, and when it is dark we get a value below 1000 lumens.

ARKit lighting estimation is enabled by default and can also be manually configured in the following ways:

configuration.isLightEstimationEnabled = true

It’s also easy to get the estimated light intensity, just get the current ARFrame and get the estimated light intensity with the following code:

let intensity = frame.lightEstimate?.ambientIntensity
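
How the estimate is applied depends on the rendering setup. As a hedged sketch, with SceneKit one might scale the scene's lighting environment by the estimate, using the 1000-lumen neutral value as the baseline (the sceneView name is assumed):

if let lightEstimate = frame.lightEstimate {
    // 1000 lumens is the neutral value, so this yields 1.0 under "normal" lighting.
    sceneView.scene.lightingEnvironment.intensity = lightEstimate.ambientIntensity / 1000.0
}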

SceneKit

Rendering is the final step in presenting the AR world. It combines the virtual world we created, the captured real world, the information tracked by ARKit, and the results of ARKit's scene analysis to render an AR world. To render the AR world correctly, the rendering process needs to:

  • Use the real-world video captured by the camera as the background.
  • Update the camera in the AR world in real time with the camera state tracked by world tracking.
  • Apply the estimated light intensity to the lighting.
  • Render the positions of virtual objects on the screen in real time.

Handling this process ourselves would be quite complicated. ARKit simplifies rendering by providing easy-to-use views backed by SceneKit (a 3D engine) and SpriteKit (a 2D engine): ARSCNView and ARSKView. Of course, developers can also render with other engines; they just need to process the information above themselves.
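
As a hedged end-to-end sketch (assumed to run inside a view controller, and using ARWorldTrackingConfiguration, the released name of the world-tracking configuration class), ARSCNView wires all of this together: the camera background, the tracked camera, lighting, and virtual content.

import ARKit
import SceneKit

// Inside a view controller: set up an ARSCNView and run world tracking.
let sceneView = ARSCNView(frame: view.bounds)
sceneView.scene = SCNScene()
sceneView.automaticallyUpdatesLighting = true   // apply ARKit's light estimate automatically
view.addSubview(sceneView)

let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = .horizontal
sceneView.session.run(configuration)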

SceneKit coordinate system

We know that UIKit uses a CGPoint, containing x and y values, to represent the position of a point, but in a 3D system a z value is also needed to describe an object's depth in space. SceneKit's coordinate system is shown in the following figure:

In this three-dimensional coordinate system, a point's position is represented by its (x, y, z) coordinates. The red square sits on the x axis, the green square on the y axis, the blue square on the z axis, and the gray square at the origin. In SceneKit we can create a 3D coordinate like this:

let position = SCNVector3(x: 0, y: 5, z: 10)
Scenes and nodes in SceneKit

We can think of SceneKit's SCNScene as a virtual 3D space to which we add SCNNodes. An SCNScene has a single root node at coordinates (x: 0, y: 0, z: 0), and every other node added to the SCNScene needs a parent node.

The root node is in the center of the coordinate system, and there are two additional nodes, NodeA and NodeB, where NodeA’s parent node is the root node and NodeB’s parent node is NodeA:

When a node is added to an SCNScene, it can be given a three-dimensional position (default (x: 0, y: 0, z: 0)) relative to its parent node. Two concepts are worth distinguishing here:

  • Local coordinate system: A three-dimensional coordinate system based on a node (not the root node) in the scene
  • World coordinate system: The three-dimensional coordinate system created with the root node as the origin is called world coordinate system.

In the figure above, NodeA's coordinates are relative to the world coordinate system (since NodeA's parent is the root node), while NodeB's coordinates are given in NodeA's local coordinate system (since NodeB's parent is NodeA).
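
A hedged sketch of the NodeA/NodeB relationship described above (positions are example values):

import SceneKit

let scene = SCNScene()

// NodeA is a child of the root node, so its position is in world coordinates.
let nodeA = SCNNode()
nodeA.position = SCNVector3(x: 0, y: 5, z: 0)
scene.rootNode.addChildNode(nodeA)

// NodeB is a child of NodeA, so its position is in NodeA's local coordinate system.
let nodeB = SCNNode()
nodeB.position = SCNVector3(x: 2, y: 0, z: 0)
nodeA.addChildNode(nodeB)

// In world coordinates NodeB ends up at (2, 5, 0).
print(nodeB.worldPosition)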

Cameras in SceneKit

With an SCNScene and SCNNodes in place, we also need an SCNCamera to determine which areas of the scene we can see (just as, in the real world, objects exist all around us but we need eyes to see them). How the camera works in an SCNScene is shown below:

The figure above contains the following information:

  • In SceneKit, an SCNCamera always points in the negative direction of its z axis.
  • The field of view is the angle of the camera's visible area. The smaller the angle, the narrower the field of view; the larger the angle, the wider the field of view.
  • The viewing frustum determines the depth of the camera's visible area (depth along the z axis). Objects outside this area (too close to or too far from the camera) are clipped and will not appear in the final image.

In SceneKit we can create a camera as follows:

let scene = SCNScene()
let cameraNode = SCNNode()
let camera = SCNCamera()
cameraNode.camera = camera
cameraNode.position = SCNVector3(x: 0, y: 0, z: 0)
scene.rootNode.addChildNode(cameraNode)
SCNView

Finally, we need a view to render the contents of the SCNScene onto the screen; this is the job of SCNView. The step is simple: create an SCNView instance, set its scene property to the scene we just created, and add the SCNView to a UIKit view or window. Example code is as follows:

let scnView = SCNView()
scnView.scene = scene
vc.view.addSubview(scnView)
scnView.frame = vc.view.bounds

Follow me

Welcome to follow my WeChat public account, Jackyshan, where practical technical articles are pushed first.