## Introduction

ARKit article series index

At WWDC 2017, Apple showed an ARKit demo called AR Interaction, which demonstrated not only ARKit's capabilities but also the design principles and interaction logic of AR applications. Apple titled the accompanying sample project *Handling 3D Interaction and UI Controls in Augmented Reality*.

Let's walk through this project in three parts:

• The basic structure
• The logic of the main classes
• A few interesting methods

## The basic structure

As shown in the figure below, the project is divided into the following parts: the view controller, its extensions, a class that handles virtual-object interaction, custom gestures, a custom AR view, the virtual objects and their loader, the focus square, the top status sub-controller, and the bottom object-list sub-controller.

• ViewController+ARSCNViewDelegate: AR scene updates, node addition, error messages
• ViewController+Actions: UI actions such as button taps and touches
• ViewController+ObjectSelection: loading and moving virtual objects

## A few interesting methods

#### VirtualObjectARView class

This class contains the HitTestRay structure, whose intersectionWithHorizontalPlane(atY planeY: Float) method finds the distance from the ray's origin to its intersection with a horizontal plane. It uses the dot product from linear algebra: the scalar multiple of the normalized direction vector that reaches from the ray's origin to the intersection is exactly the distance. But since the intersection's coordinates are not fully known (only its y value is determined, because it must lie in the plane), both vectors are dotted with the plane's normal vector, which cleverly cancels the x and z components and leaves just the distance.

In fact, we can also understand it with middle-school knowledge of similar triangles (in the figure, red is the normalized direction vector):

This method changed after ARKit later gained the ability to recognize vertical planes, and the corresponding logic in the official demo changed as well; see the annotated version of the updated code for details.
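The dot-product trick above can be sketched in a minimal, stand-alone form. This is an illustration using only the Swift standard library, not the demo's actual class (which lives in VirtualObjectARView and uses ARKit types):

```swift
import Foundation

// A minimal sketch of the ray/horizontal-plane intersection described above.
// The names mirror the demo, but this version is self-contained.
struct HitTestRay {
    var origin: SIMD3<Float>
    var direction: SIMD3<Float>

    func intersectionWithHorizontalPlane(atY planeY: Float) -> SIMD3<Float>? {
        // Normalize the direction so the multiple below is a true distance.
        let length = (direction.x * direction.x + direction.y * direction.y + direction.z * direction.z).squareRoot()
        let d = direction / length

        // A ray parallel to the plane either lies in it or never meets it.
        if d.y == 0 {
            return origin.y == planeY ? origin : nil
        }

        // Dotting both sides with the plane normal (0, 1, 0) cancels the
        // unknown x and z components, leaving dist = (planeY - origin.y) / d.y.
        let dist = (planeY - origin.y) / d.y
        if dist < 0 { return nil }   // The plane is behind the ray.
        return origin + d * dist
    }
}
```

For example, a ray starting at (0, 2, 0) pointing straight down hits the plane y = 0 at the origin, and misses the plane y = 3 entirely.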

In addition, the worldPosition(fromScreenPosition position: CGPoint, objectPosition: float3?, infinitePlane: Bool = false) method finds the position of the anchor or feature point hit by a ray cast from the tapped screen location. There are 5 steps:

1. First use `hitTest(position, types: .existingPlaneUsingExtent)` to test against existing planes, and return the hit if there is one;
2. Use `hitTestWithFeatures(position, coneOpeningAngleInDegrees: 18, minDistance: 0.2, maxDistance: 2.0).first` to find feature points within a cone around the ray, but do not return the result yet;
3. If searching an infinite plane is allowed, or the cone search in the previous step found nothing, return the intersection with the infinite plane;
4. If the infinite-plane search is not allowed and a feature point was found in step 2, return that point;
5. Finally, as a last resort, find the nearest feature point along the ray and build a position from it.
```swift
/**
 Hit tests from the provided screen position to return the most accurate result possible.
 Returns the new world position, an anchor if one was hit, and whether the hit test is considered to be on a plane.
 */
func worldPosition(fromScreenPosition position: CGPoint, objectPosition: float3?, infinitePlane: Bool = false) -> (position: float3, planeAnchor: ARPlaneAnchor?, isOnPlane: Bool)? {
    /*
     1. Always do a hit test against existing plane anchors first.
        (If any such anchors exist & only within their extents.)
     */
    let planeHitTestResults = hitTest(position, types: .existingPlaneUsingExtent)

    if let result = planeHitTestResults.first {
        let planeHitTestPosition = result.worldTransform.translation
        let planeAnchor = result.anchor

        // Return immediately - this is the best possible outcome.
        return (planeHitTestPosition, planeAnchor as? ARPlaneAnchor, true)
    }

    /*
     2. Collect more information about the environment by hit testing against
        the feature point cloud, but do not return the result yet.
     */
    let featureHitTestResult = hitTestWithFeatures(position, coneOpeningAngleInDegrees: 18, minDistance: 0.2, maxDistance: 2.0).first
    let featurePosition = featureHitTestResult?.position

    /*
     3. If desired or necessary (no good feature hit test result): Hit test
        against an infinite, horizontal plane (ignoring the real world).
     */
    if infinitePlane || featurePosition == nil {
        if let objectPosition = objectPosition,
            let pointOnInfinitePlane = hitTestWithInfiniteHorizontalPlane(position, objectPosition) {
            return (pointOnInfinitePlane, nil, true)
        }
    }

    /*
     4. If available, return the result of the hit test against high quality
        features if the hit tests against infinite planes were skipped or
        no infinite plane was hit.
     */
    if let featurePosition = featurePosition {
        return (featurePosition, nil, false)
    }

    /*
     5. As a last resort, perform a second, unfiltered hit test against features.
        If there are no features in the scene, the result returned here will be nil.
     */
    let unfilteredFeatureHitTestResults = hitTestWithFeatures(position)
    if let result = unfilteredFeatureHitTestResults.first {
        return (result.position, nil, false)
    }

    return nil
}
```

Apple gives an almost perfect fallback strategy here (the logic changed in ARKit 1.5; see the updated code):

1. If an existing plane has been recognized, return the position on that plane;
2. If searching an infinite plane is allowed, return the intersection with the infinite plane;
3. Otherwise, return a high-quality feature point near the ray;
4. If no such feature point was found, fall back to the intersection with the infinite plane;
5. Finally, when nothing else is available, build a position from the nearest unfiltered feature point;
6. If there are no feature points at all, return nil.

In this way, the feature point cloud data is fully used: even when AR tracking is unstable and no plane has been recognized yet, the feature points can keep the AR experience going. Of course, some accuracy is inevitably sacrificed.

In addition, once a plane is recognized, objects placed on nearby feature points are slowly moved onto the newly discovered plane. This makes the experience more complete and prevents temporary accuracy problems from degrading the AR experience indefinitely.

The move is implemented in ViewController+ARSCNViewDelegate by code along these lines:

```swift
updateQueue.async {
    for object in self.virtualObjectLoader.loadedObjects {
        // Snap objects that were placed on feature points onto the
        // newly detected plane.
        object.adjustOntoPlaneAnchor(planeAnchor, using: node)
    }
}
```

#### FocusSquare class

The updateTransform(for position: float3, camera: ARCamera?) method is designed specifically to deal with this problem, in three steps:

1. Average the latest 10 positions to avoid jitter;
2. Scale the square according to its distance from the camera;
3. Correct the rotation about the y-axis.
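Step 1 can be sketched as a rolling average over the last 10 positions. This is a stand-alone illustration, not the demo's actual code (FocusSquare does something similar with a `recentFocusSquarePositions` array):

```swift
import Foundation

// Minimal sketch of position smoothing: keep only the latest 10
// positions and return their average.
struct PositionSmoother {
    private var recent: [SIMD3<Float>] = []

    mutating func smoothed(adding position: SIMD3<Float>) -> SIMD3<Float> {
        recent.append(position)
        recent = Array(recent.suffix(10))   // keep only the latest 10
        let sum = recent.reduce(SIMD3<Float>(repeating: 0), +)
        return sum / Float(recent.count)
    }
}
```

Averaging trades a little responsiveness for stability: a single noisy hit-test result shifts the focus square by at most a tenth of the error.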

The y-axis rotation is corrected so that when the user holds the phone and turns left or right, the focus square not only stays in the middle of the screen but also rotates in sync, staying parallel to the bottom edge of the screen.

First, `let tilt = abs(camera.eulerAngles.x)` gets the phone's tilt (held upright vs. flat), and then there are three cases:

• 0..<threshold1: nearly upright; use the camera's y-axis Euler angle directly;
• threshold1..<threshold2: intermediate state; compute the linear interpolation coefficient `relativeInRange`, use `normalize()` to find the shortest rotation angle (after all, turning 270° to the right is the same as turning 90° to the left), then blend the two angles by linear interpolation;
• default (>= threshold2): nearly flat; use the phone's `yaw` value (the left/right turn angle), i.e. the azimuth angle.
```swift
// Correct y rotation of camera square.
guard let camera = camera else { return }
let tilt = abs(camera.eulerAngles.x)
let threshold1: Float = .pi / 2 * 0.65
let threshold2: Float = .pi / 2 * 0.75
let yaw = atan2f(camera.transform.columns.0.x, camera.transform.columns.1.x)
var angle: Float = 0

switch tilt {
case 0..<threshold1:
    angle = camera.eulerAngles.y

case threshold1..<threshold2:
    let relativeInRange = abs((tilt - threshold1) / (threshold2 - threshold1))
    let normalizedY = normalize(camera.eulerAngles.y, forMinimalRotationTo: yaw)
    angle = normalizedY * (1 - relativeInRange) + yaw * relativeInRange

default:
    angle = yaw
}

eulerAngles.y = angle
```
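The `normalize()` helper deserves a closer look. A hypothetical reconstruction of its logic (an assumption, not the demo's verbatim code): since the focus square has 90° rotational symmetry, the angle can be shifted in 90° steps until the remaining rotation to the reference angle is minimal.

```swift
import Foundation

// Hypothetical sketch of normalize(_:forMinimalRotationTo:): shift `angle`
// by multiples of 90 degrees (the square's symmetry) until the rotation
// needed to reach `ref` is as small as possible.
func normalize(_ angle: Float, forMinimalRotationTo ref: Float) -> Float {
    var normalized = angle
    while abs(normalized - ref) > .pi / 4 {
        if normalized > ref {
            normalized -= .pi / 2
        } else {
            normalized += .pi / 2
        }
    }
    return normalized
}
```

For a square, an orientation of 270° is indistinguishable from 0°, so `normalize(3 * .pi / 2, forMinimalRotationTo: 0)` comes back to roughly 0 rather than forcing a three-quarter turn.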

The yaw value is computed with the azimuth function atan2f(y, x): given a y value and an x value, it returns the angle of the point (x, y) as seen from the origin.

```swift
let yaw = atan2f(camera.transform.columns.0.x, camera.transform.columns.1.x)
```

#### Matrix fundamentals

This relies on some matrix knowledge: math textbooks and Microsoft's Direct3D use left-handed, row-major conventions, while OpenGL and Apple's SceneKit use right-handed, column-major conventions. The layout is as follows:

(Xx, Xy, Xz) is the x-axis of the local coordinate system, (Yx, Yy, Yz) is its y-axis, and (Zx, Zy, Zz) is its z-axis. The 1 in the lower-right corner corresponds to the global scale, which is generally left untouched.

So the atan2f() call above actually uses the pair (Yx, Xx) to compute the azimuth:

• Upright state: facing forward, the pair is (0, 1); both left and right give (0, 0), so when the phone is upright you cannot really tell left from right, which is why the y-axis Euler angle is used in that case;
• Flat state: facing forward, the pair is (0, 1); to the left it is (-1, 0) and to the right it is (1, 0).
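A quick stand-alone check of those flat-state pairs (a sketch; angles are in radians, and the pairs are fed to atan2f as described above):

```swift
import Foundation

// Plug the (x, y) pairs from the bullets above into atan2f to see how
// the azimuth distinguishes the three directions.
let forward = atan2f(1, 0)    // pair (0, 1)  -> pi / 2
let right   = atan2f(0, 1)    // pair (1, 0)  -> 0
let left    = atan2f(0, -1)   // pair (-1, 0) -> pi
```

Note how (1, 0) and (-1, 0) produce different angles, while the upright-state pair (0, 0) is ambiguous, which is exactly why the demo falls back to the Euler angle there.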

## Conclusion

Apple's demo is a best-practice example of AR application development: it fully displays ARKit's potential (as of early 2018) at the technical level, while also ensuring good user experience and interaction logic.

If you are developing your own AR application, it is worth imitating the interaction logic of this demo to improve your app's user experience.