Preface

Augmented Reality (AR) is a technology that visually blends virtual objects with real-world scenes. Apple officially launched ARKit in June 2017, allowing iOS developers to build AR applications with simple, convenient APIs.

This article introduces an AR application based on location-based services (LBS) in the context of Meituan’s in-store dining business. Using AR to show merchants’ locations relative to the user brings an immersive experience. The implementation follows.

Project implementation

AR applications on iOS are usually composed of ARKit plus a rendering engine:

ARKit is the bridge between the real world and the virtual world, while the rendering engine renders the content of the virtual world onto the screen. This section will focus on these two aspects.

ARKit

ARKit’s ARSession is responsible for managing the information of each frame. ARSession does two things: capture images and sensor data, then output results frame by frame after analyzing and processing that data, as shown in the diagram below:

Device tracking

Device tracking ensures that the positions of virtual objects are not affected by device movement. A subclass of ARSessionConfiguration is passed in when starting the ARSession to select one of three tracking modes:

  • ARFaceTrackingConfiguration
  • ARWorldTrackingConfiguration
  • AROrientationTrackingConfiguration

ARFaceTrackingConfiguration can recognize the position, orientation, and topology of the user’s face. It can also detect 52 predefined facial movements, such as blinking, smiling, and frowning. ARFaceTrackingConfiguration requires the TrueDepth front-facing camera for tracking, so it obviously cannot meet our requirements and will not be covered further. The following compares the other two configurations, both of which use the rear camera.

ARWorldTrackingConfiguration

ARWorldTrackingConfiguration provides six-degree-of-freedom (6DOF) device tracking: the three attitude angles, yaw, pitch, and roll, plus translation along the X, Y, and Z axes of a Cartesian coordinate system:

In addition, ARKit uses VIO (Visual-Inertial Odometry) to improve the accuracy of device motion tracking. An inertial measurement unit (IMU) detects the motion trajectory while the images taken by the camera during the motion are processed; the trajectories of feature points in the images are compared against the sensor results, and a final high-precision result is output.

In terms of tracking dimensions and accuracy, ARWorldTrackingConfiguration is very powerful. But according to the official documentation, it also has two fatal drawbacks:

  • It is affected by ambient light quality
  • It is affected by violent device motion

During tracking, feature points are extracted from the captured images, so image quality affects the tracking results. In low-light environments (such as at night) or under harsh glare, the images cannot provide a proper reference, and tracking quality deteriorates.

During tracking, the images and sensor results are compared frame by frame. If the device moves violently within a short period, the tracking results are greatly disturbed; if they deviate from the actual movement, the merchants’ displayed locations become inaccurate.

AROrientationTrackingConfiguration

AROrientationTrackingConfiguration only tracks the three attitude angles (3DOF) and does not enable VIO.

Because 3DOF tracking creates limited AR experiences, you should generally not use the AROrientationTrackingConfiguration class directly. Instead, use the subclass ARWorldTrackingConfiguration for tracking with six degrees of freedom (6DOF), plane detection, and hit testing. Use 3DOF tracking only as a fallback in situations where 6DOF tracking is temporarily unavailable.

In short, because of AROrientationTrackingConfiguration’s limited tracking ability, the official documentation recommends against using it directly. But consider the following:

  1. Tracking the three attitude angles is sufficient to correctly represent a merchant’s position relative to the user.
  2. ARWorldTrackingConfiguration’s high-precision tracking is better suited to close-range scenarios, such as tracking the device’s displacement relative to a table or the floor. Merchants, however, are often hundreds of meters away from the user, so highly accurate displacement tracking is of little value here.
  3. ARWorldTrackingConfiguration requires users to operate the device in a prescribed way and to keep the ambient light good, which is not user friendly.

In the end, we decided to use AROrientationTrackingConfiguration. This way, merchant locations are displayed correctly even at night, and even if the camera is covered. The impact of violent shaking is also small: the merchant positions deviate in angle only temporarily, and are recalibrated once the sensor values stabilize.
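A minimal sketch of starting the session with this configuration (self.sceneView is an assumed ARSCNView property, not from the original article):

    // Minimal sketch: run the ARSession with 3DOF orientation tracking.
    AROrientationTrackingConfiguration *configuration = [[AROrientationTrackingConfiguration alloc] init];
    [self.sceneView.session runWithConfiguration:configuration];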

Coordinate system

ARKit measures the real world using Cartesian coordinates: the device’s position when the ARSession starts is the origin of the axes. The worldAlignment property of ARSessionConfiguration determines the orientation of the three axes and has three enumerated values:

  • ARWorldAlignmentCamera
  • ARWorldAlignmentGravity
  • ARWorldAlignmentGravityAndHeading

The corresponding coordinate axes of the three enumerated values are shown in the figure below:


For ARWorldAlignmentCamera, the device’s attitude determines the orientation of the three axes. This setting suits coordinate calculations that take the device as the reference frame and are independent of the real geographic environment, for example using AR to measure the size of real-world objects.

For ARWorldAlignmentGravity, the Y axis is always parallel to the direction of gravity, while the X and Z directions are still determined by the device’s attitude. This suits calculating the coordinates of objects with gravity-related behavior, such as placing a row of hydrogen balloons or animating a falling basketball.

For ARWorldAlignmentGravityAndHeading, the X, Y, and Z axes point east, up, and south respectively. In this mode, ARKit makes internal adjustments based on the angle between the device’s yaw and true north (not magnetic north), ensuring that the -Z direction of the ARKit coordinate system matches true north in the real world. With this guarantee, real-world coordinates can be correctly mapped into the virtual world. Clearly, ARWorldAlignmentGravityAndHeading is what we need.
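Continuing the sketch above, the alignment can be set on the configuration before running the session (property names assumed as before):

    // Align the axes with gravity and true north: -Z will point true north.
    configuration.worldAlignment = ARWorldAlignmentGravityAndHeading;
    [self.sceneView.session runWithConfiguration:configuration];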

Merchant coordinates

Determining a merchant’s coordinates involves a horizontal part and a vertical part:

Horizontal coordinates

A merchant’s horizontal location is just a latitude/longitude pair, so how does it map into ARKit? The figure below illustrates:

With the help of CLLocation’s distanceFromLocation: method, we can calculate the distance in meters between two latitude/longitude coordinates. We take the user’s longitude lng1 and the merchant’s latitude lat2 to form an auxiliary point (lng1, lat2), then calculate the distance x between the auxiliary point and the merchant, and the distance z between the auxiliary point and the user. Since the ARKit coordinate system is also in meters, the merchant’s horizontal coordinates (x, -z) can be used directly.
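A hedged sketch of this calculation (userLocation and merchantLocation are assumed CLLocation inputs; the sign handling is simplified and ignores edge cases such as crossing the antimeridian):

    // Auxiliary point: the user's longitude combined with the merchant's latitude.
    CLLocation *auxiliary =
        [[CLLocation alloc] initWithLatitude:merchantLocation.coordinate.latitude
                                   longitude:userLocation.coordinate.longitude];
    // East-west distance (meters) from the auxiliary point to the merchant.
    CLLocationDistance x = [auxiliary distanceFromLocation:merchantLocation];
    // North-south distance (meters) from the auxiliary point to the user.
    CLLocationDistance z = [auxiliary distanceFromLocation:userLocation];
    // Restore signs: +X points east and -Z points north under gravityAndHeading.
    if (merchantLocation.coordinate.longitude < userLocation.coordinate.longitude) x = -x;
    if (merchantLocation.coordinate.latitude < userLocation.coordinate.latitude) z = -z;
    SCNVector3 horizontalPosition = SCNVector3Make(x, 0, -z);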

Vertical coordinates

Chinese word segmentation of the merchant’s address can extract the floor number the merchant is on; multiplying it by the approximate height of one floor gives the merchant’s vertical coordinate y:
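For illustration only (the floor extraction itself happens in word segmentation and is not shown; the 3-meter floor height is an assumed value, not taken from the article):

    // Sketch: vertical coordinate from an extracted floor number.
    static const CGFloat kApproxFloorHeight = 3.0; // meters per floor, assumed
    CGFloat y = floorNumber * kApproxFloorHeight;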

Card rendering

Usually, the information we want to display is drawn with UIView and its subclasses. But ARKit only bridges the real world and the virtual world, leaving rendering to a rendering engine. Apple gives us three engines to choose from:

  • Metal
  • SpriteKit
  • SceneKit

The powerful Metal engine includes tools such as MetalKit, the Metal shading language, and its standard library, making highly efficient use of the GPU for deeply customized rendering requirements. But Metal is overkill for our current needs.

SpriteKit is a 2D rendering engine that provides interfaces for animation, event handling, physics collisions, and more, and is often used to make 2D games. SceneKit is a 3D rendering engine built on OpenGL that supports multi-pass rendering. In addition to handling physics collisions and animations of 3D objects, it can render realistic textures and particle effects. SceneKit can be used to make 3D games or to add 3D content to apps.

Although we could use SpriteKit to place 2D cards into the 3D AR world, for extensibility and the convenience of adding future features to the AR page, we chose the 3D rendering engine SceneKit.

We can use SceneKit directly by creating an ARSCNView. ARSCNView is a subclass of SCNView, and it does three things:

  • It uses each frame’s image captured by the device camera as the background of the 3D scene
  • It uses the device camera’s position as the camera (viewpoint) position of the 3D scene
  • It aligns the real-world axes tracked by ARKit with the 3D scene’s axes

Card information

SceneKit manages 3D objects with SCNNode. Setting an SCNNode’s geometry property changes the object’s appearance. The system provides common shapes such as SCNBox, SCNPlane, and SCNSphere; SCNPlane is the card shape we need. UIGraphics provides methods to render a drawn UIView into a UIImage object, and that image can serve as the contents of an SCNPlane, giving the SCNNode its appearance.
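A sketch of rendering a laid-out card view into a UIImage (cardView is an assumed UIView; these are standard UIKit calls):

    // Render the card view into an image at the screen's scale.
    UIGraphicsBeginImageContextWithOptions(cardView.bounds.size, NO, 0);
    [cardView drawViewHierarchyInRect:cardView.bounds afterScreenUpdates:YES];
    UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();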

Card size

Objects in ARKit appear larger when near and smaller when far. Once the width and height of the SCNPlane are fixed, ARKit automatically scales its on-screen size according to distance. A rough empirical formula relating on-screen points to distance, from the author’s development experience, is \( \text{size on screen (pt)} \approx \frac{530}{\text{distance (m)}} \times \text{plane size} \).

That is, if the width of the SCNPlane is 30 and it is 100 meters away from the user, then the width seen on the screen is approximately \(530/100 \times 30 = 159\) pt.

Card position

For merchant cards that are too close to the user, two problems occur:

  • Since ARKit automatically renders near cards large and far cards small, nearby cards become so large that they block the view
  • As mentioned above, the ARSession uses the AROrientationTrackingConfiguration tracking mode, which does not track horizontal displacement, so when the user walks toward a merchant, the card does not appear to get any closer

Our solution is to map cards that are too close to the user slightly further away. As shown in the figure below, cards whose distance from the user is less than D are mapped into the range from D - k to D.

Assume the real distance between a merchant and the user is x and the mapped distance is y. The mapping is sketched below:
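The article’s original formula is not reproduced here; as one illustration, a monotone linear compression with the stated properties (it maps [0, D) into [D - k, D) and leaves farther distances unchanged) might look like:

    // Sketch of one possible mapping (assumed form, not necessarily the
    // original formula): distances below D are compressed into [D - k, D),
    // preserving their ordering; distances of D or more pass through.
    static CGFloat MappedDistance(CGFloat x, CGFloat D, CGFloat k) {
        if (x >= D) {
            return x;
        }
        return (D - k) + k * (x / D);
    }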

This not only solves the too-close problem but also preserves the relative distances between cards. When the user’s location shifts beyond a certain threshold, a new network request is triggered and merchant positions are recalculated from the new user location, so card positions keep updating as the user moves.

Card orientation

Before rendering each frame, SceneKit adjusts each card’s behaviors, such as collision, position, velocity, and orientation, according to the constraints on its SCNNode. The SCNConstraint subclasses SCNLookAtConstraint and SCNBillboardConstraint can constrain a card’s orientation.

SCNLookAtConstraint keeps a card always pointing toward a point in space. With it, adjacent cards intersect each other, and the user will likely see incomplete card information. SCNBillboardConstraint solves this problem by keeping the card always facing the camera.

Here is sample code to create a card:

    // Position
    SCNVector3 nodePosition = SCNVector3Make(-200, 0.5, -80);

    // Appearance
    SCNPlane *plane = [SCNPlane planeWithWidth:image.size.width
                                        height:image.size.height];
    plane.firstMaterial.diffuse.contents = image;

    // Constraint: always face the camera, rotating around the Y axis only
    SCNBillboardConstraint *constraint = [SCNBillboardConstraint billboardConstraint];
    constraint.freeAxes = SCNBillboardAxisY;

    SCNNode *node = [SCNNode nodeWithGeometry:plane];
    node.position = nodePosition;
    node.constraints = @[constraint];


Optimization

Occlusion problem

If many merchants lie in the same direction, their cards overlap, and users only see the nearest ones. This is a tricky problem: simply laying the cards out flat on the screen would sacrifice the perception of merchant height and of each merchant’s distance from the user.

Tap-to-spread interaction

After much discussion, we finally decided to solve the overlap problem with an interaction that spreads the cards apart when the user taps the overlapping area. The effect is shown below:

The following explains how this effect is implemented, in two parts: hit-testing and projection.

Hit-testing

Those familiar with Cocoa Touch know that the UIView hierarchy uses hit-testing to determine which view responds to an event, and ARKit is no exception.

ARSCNView can use two types of hit-testing:

  • ARSCNView’s hitTest:types: method finds the object or position in the real world corresponding to the tap position
  • The SCNSceneRenderer protocol’s hitTest:options: method finds the content in the virtual world corresponding to the tap position

Obviously, hitTest:options: is what we need. Hit-testing in the 3D world is like a laser beam fired in the direction of the tap: the return value of hitTest:options: is an array of all the cards pierced by the beam. This lets us detect which cards overlap at the position the user tapped.
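A sketch of handling a tap (tap is an assumed UITapGestureRecognizer attached to self.sceneView, an assumed ARSCNView property):

    // Find all virtual cards under the tap point, nearest first.
    CGPoint point = [tap locationInView:self.sceneView];
    NSArray<SCNHitTestResult *> *results = [self.sceneView hitTest:point options:nil];
    for (SCNHitTestResult *result in results) {
        SCNNode *cardNode = result.node; // a card pierced by the "laser"
        // ... collect cardNode into the overlapping group ...
    }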

Projection

Here is a brief look at how the fan-out effect is implemented. The SCNSceneRenderer protocol has two methods for projecting coordinates:

  • projectPoint: projects a point from the 3D scene coordinate system to the screen coordinate system
  • unprojectPoint: projects a point from the screen coordinate system to the 3D scene coordinate system

A point in the screen coordinate system is also an SCNVector3, whose z-coordinate represents depth, ranging from 0.0 (the near clipping plane) to 1.0 (the far clipping plane). The overall fan-out flow: project each overlapping card’s position onto the screen, push it away from the tap point while keeping its depth unchanged, then unproject the new screen point back into the scene.
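A hedged sketch of one spread step for a single card (offsetX and offsetY are assumed illustrative screen offsets, not the article’s exact layout logic):

    // Project the card's scene position into screen space (z is depth).
    SCNVector3 screenPoint = [self.sceneView projectPoint:cardNode.position];
    // Offset the card on screen while keeping its depth unchanged.
    screenPoint.x += offsetX;
    screenPoint.y += offsetY;
    // Unproject the new screen point back into the 3D scene.
    SCNVector3 newPosition = [self.sceneView unprojectPoint:screenPoint];
    // Animate the card to its spread-out position.
    [cardNode runAction:[SCNAction moveTo:newPosition duration:0.25]];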


After the cards spread out, tapping a blank area restores them to their original positions. Cards that do not take part in the spread are dimmed, to highlight the focus and reduce visual noise.

Back-end clustering

Where merchants are densely packed, card overlap becomes severe, and a tap that spreads out a large number of cards is not user friendly. So when the back end returns merchant data near the user, it runs a k-means clustering algorithm on the merchants’ latitude/longitude coordinates, aggregating merchants that are close together into a single card. Since these merchants share roughly the same location, one card with a count can represent several merchants:
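The clustering runs on the back end, whose code the article does not show; as a rough illustration of the technique only, a minimal 2D k-means sketch (fixed iteration count, caller-seeded centers, all names assumed):

    #import <Foundation/Foundation.h>

    typedef struct { double x, y; } Point2D;

    static double SquaredDistance(Point2D a, Point2D b) {
        double dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    // points: n merchant coordinates; centers: k pre-seeded initial guesses;
    // labels: out-parameter assigning each merchant to a cluster.
    static void KMeans(const Point2D *points, NSInteger n,
                       Point2D *centers, NSInteger k, NSInteger *labels) {
        for (NSInteger iteration = 0; iteration < 50; iteration++) {
            // Assignment step: attach each point to its nearest center.
            for (NSInteger i = 0; i < n; i++) {
                NSInteger best = 0;
                for (NSInteger c = 1; c < k; c++) {
                    if (SquaredDistance(points[i], centers[c]) <
                        SquaredDistance(points[i], centers[best])) {
                        best = c;
                    }
                }
                labels[i] = best;
            }
            // Update step: move each center to the mean of its members.
            for (NSInteger c = 0; c < k; c++) {
                Point2D sum = {0, 0};
                NSInteger count = 0;
                for (NSInteger i = 0; i < n; i++) {
                    if (labels[i] == c) { sum.x += points[i].x; sum.y += points[i].y; count++; }
                }
                if (count > 0) {
                    centers[c].x = sum.x / count;
                    centers[c].y = sum.y / count;
                }
            }
        }
    }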

Flicker problem

In real-world testing, we found that cards close to each other flicker in their overlapping region:

This is a common problem faced by 3D rendering engines: visibility. Simply put, which objects on the screen should be shown and which should be occluded; the GPU must ultimately render exactly the pixels that should be visible.

A classic solution to the visibility problem is the painter’s algorithm. Like a simple-minded painter, it draws the farthest object first and then paints layer by layer toward the nearest one. As you can imagine, the painter’s algorithm is very inefficient, and drawing a detailed scene with it consumes considerable resources.

The depth buffer

Depth buffering makes up for the shortcomings of the painter’s algorithm. It uses a two-dimensional array to store the depth of each pixel currently on screen. As shown below, a pixel with a depth of 0.5 has been rendered, and its depth stored:

When a pixel of another object is later rendered at the same position, the GPU compares the new pixel’s depth with the depth stored in the buffer: the smaller (nearer) depth wins, it is written to the buffer, and that pixel is rendered; the deeper pixel is discarded. As shown below, the incoming pixel has a depth of 0.2, less than the stored 0.5, so its depth replaces the stored value and it is drawn on the screen:

Compared with the painter’s algorithm, depth buffering clearly brings a great improvement in rendering efficiency. But it also introduces the problem of depth conflict.

Depth conflict (Z-fighting)

When pixels have the same depth, depth buffering runs into Z-fighting: only one of the pixels at the same depth “wins” the competition and appears on the screen. As shown below:

If two such pixels take turns “winning”, we see flickering. Because every card carries an SCNBillboardConstraint, it always faces the camera, and the slightest change in camera angle causes some overlap between cards. Unlike objects with thickness, the depth relationships between the flat cards change rapidly, so multiple cards easily get rendered at the same spot on the screen, producing frequent flicker:

To eliminate this buggy-feeling experience, we decided to sacrifice the rendering efficiency of depth buffering. SceneKit exposes interfaces for writing to and reading from the depth buffer, which we can disable:

    plane.firstMaterial.writesToDepthBuffer = NO;

    plane.firstMaterial.readsFromDepthBuffer = NO;


Since the card contents are relatively simple, disabling the buffers has little effect on the frame rate.
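With the depth buffer disabled, occlusion between cards is determined by draw order. One way to keep nearer cards on top (an assumption on our part, not necessarily the article’s approach; cardNodes is an assumed collection) is to sort with SCNNode’s renderingOrder, where larger values render later:

    // Give nearer cards a larger renderingOrder so they draw on top.
    SCNVector3 cam = self.sceneView.pointOfView.position;
    for (SCNNode *cardNode in cardNodes) {
        float dx = cardNode.position.x - cam.x;
        float dy = cardNode.position.y - cam.y;
        float dz = cardNode.position.z - cam.z;
        float distance = sqrtf(dx * dx + dy * dy + dz * dz);
        cardNode.renderingOrder = (NSInteger)(100000 - distance * 10);
    }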

Conclusion

In the catering merchant scenario, presenting merchant information with AR+LBS can bring users an immersive experience. This article introduced some details of using ARKit and summarized the problems we encountered during development along with their solutions, hoping to offer some reference value to other developers.

About the author

Cao Yu is an iOS development engineer at Meituan. He joined Meituan’s in-store catering business group in 2017 and works on the food channel of the Meituan client.

Recruitment information

The Catering Technology Department of Jiudian is responsible for the food channel on the Meituan and Dianping platforms, serving hundreds of millions of users. Through better listings, authentic reviews, and complete information, it provides users with better decision support and is committed to improving the user experience. At the same time, we carry the traffic of all catering merchants, providing them with a variety of marketing tools to improve their marketing efficiency, and ultimately realizing the vision of “Eat Better, Live Better” for people in China! Our team is looking for senior engineers and technical experts with rich experience in the front-end (FE) field. Please send your resume to wangying49#meituan.com.