Overview

On current iPhones, the camera can be used to obtain depth information in several different ways, either while taking photos and videos or in ARKit. However, each approach has its own hardware requirements and limitations:

Depth source              | Hardware required          | Usage scenarios
Front depth sensor        | Front TrueDepth camera     | Photo/video capture, AR
Rear monocular + ML       | Rear main camera + ARKit   | Human body depth in AR
Rear binocular parallax   | Rear dual cameras          | Photo/video capture
Rear LiDAR                | Rear LiDAR                 | Photo/video capture, AR

What are the detailed differences? Let’s go through them one by one.

AVDepthData in normal cameras

The ordinary camera, that is, when ARKit is not running and the camera is started directly to take photos or record video, can deliver depth information directly. The depth comes either from the front TrueDepth camera or from rear binocular (dual-camera) parallax. If LiDAR is available, depth quality improves automatically without any change to the code.

Note that the binocular pair can be either wide angle (i.e. the main camera) + ultra-wide, or wide angle (main camera) + telephoto. According to Apple's documentation, the triple-camera configuration does not support depth. Also, in the binocular case, manually adjusting camera parameters such as exposure, ISO, or aperture may make depth unavailable.

Tip: how can you tell whether a device supports rear binocular depth without writing code or installing an app?

  • Open the built-in Camera app and switch to Portrait mode. If the background can be blurred when pointing at any object, and the depth of field can be adjusted, rear binocular depth is supported.
  • If the camera switches to the front lens as soon as Portrait mode is selected, rear binocular depth is not supported.
  • If Portrait mode only blurs the background when aimed at a person and cannot blur arbitrary objects, rear binocular depth is not supported; in that case the blur comes from person segmentation (matting).

Note: According to feedback, rear binocular depth is not supported on the iPad Pro, even on models with LiDAR.

The code for selecting a depth-capable camera device is as follows:

import AVFoundation

// Front TrueDepth camera
let trueDepthCameraDevice = AVCaptureDevice.default(.builtInTrueDepthCamera, for: .video, position: .front)

// Rear dual cameras (wide + telephoto, or wide + ultra-wide)
let dualCameraDevice = AVCaptureDevice.default(.builtInDualCamera, for: .video, position: .back)
let dualWideCameraDevice = AVCaptureDevice.default(.builtInDualWideCamera, for: .video, position: .back)

// Check whether depth is available and enable it. isDepthDataDeliverySupported only becomes
// true after the photo output is attached to a session whose input is a depth-capable camera.
let photoOutput = AVCapturePhotoOutput()
photoOutput.isDepthDataDeliveryEnabled = photoOutput.isDepthDataDeliverySupported
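For reference, here is a minimal sketch of how such a device is wired into a capture session so that depth delivery can actually be enabled. The function name makeDepthCaptureSession and the dual-camera-first fallback order are illustrative, not from Apple's documentation:

import AVFoundation

func makeDepthCaptureSession() -> (AVCaptureSession, AVCapturePhotoOutput)? {
    let session = AVCaptureSession()
    session.beginConfiguration()
    session.sessionPreset = .photo

    // Prefer the rear dual camera, fall back to the front TrueDepth camera.
    guard let device = AVCaptureDevice.default(.builtInDualCamera, for: .video, position: .back)
            ?? AVCaptureDevice.default(.builtInTrueDepthCamera, for: .video, position: .front),
          let input = try? AVCaptureDeviceInput(device: device),
          session.canAddInput(input) else { return nil }
    session.addInput(input)

    let photoOutput = AVCapturePhotoOutput()
    guard session.canAddOutput(photoOutput) else { return nil }
    session.addOutput(photoOutput)

    // Depth delivery is only reported as supported once a depth-capable input
    // and this output are both attached to the session.
    photoOutput.isDepthDataDeliveryEnabled = photoOutput.isDepthDataDeliverySupported

    session.commitConfiguration()
    return (session, photoOutput)
}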

The depth information obtained is delivered as AVDepthData. Note that the resolution and frame rate of the depth map do not match those of the RGB image. For details, see WWDC17: Capturing Depth in iPhone Photography and Capturing Photos with Depth.

/**
 @property depthData
 @abstract
    An AVDepthData object wrapping a disparity/depth map associated with this photo.
 @discussion
    If you requested depth data delivery by calling -[AVCapturePhotoSettings setDepthDataDeliveryEnabled:YES],
    this property offers access to the resulting AVDepthData object. Nil is returned if you did not request
    depth data delivery. Note that the depth data is only embedded in the photo's internal file format
    container if you set -[AVCapturePhotoSettings setEmbedsDepthDataInPhoto:YES].
 */
open var depthData: AVDepthData? { get }

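A minimal sketch of the capture side, assuming the session and photo output configured above (the class name DepthPhotoDelegate is illustrative): request depth in the per-capture settings, then read the AVDepthData back in the delegate callback.

import AVFoundation
import CoreVideo

final class DepthPhotoDelegate: NSObject, AVCapturePhotoCaptureDelegate {

    func capturePhoto(with photoOutput: AVCapturePhotoOutput) {
        let settings = AVCapturePhotoSettings()
        // Only request depth if the output has depth delivery enabled (a mismatch raises an exception).
        settings.isDepthDataDeliveryEnabled = photoOutput.isDepthDataDeliveryEnabled
        settings.embedsDepthDataInPhoto = true   // keep the depth map inside the photo file
        photoOutput.capturePhoto(with: settings, delegate: self)
    }

    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard error == nil, let depthData = photo.depthData else { return }
        // The depth map is usually much smaller than the RGB image.
        let map = depthData.depthDataMap
        print("Depth map: \(CVPixelBufferGetWidth(map)) x \(CVPixelBufferGetHeight(map))")
    }
}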

In WWDC17: Capturing Depth in iPhone Photography, Apple explained in detail how binocular disparity yields depth in a pinhole-camera model, and why the depth values obtained from disparity do not match depth in a true 3D scene. In other words, the depth obtained from binocular parallax is only relative depth; it is currently used only for photo and video capture and cannot be used directly in 3D or AR.
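If a float depth buffer is preferred over the disparity format that the dual camera typically delivers, AVDepthData can be converted. A small sketch (keeping in mind that, as explained above, the dual-camera values remain only relative):

import AVFoundation
import CoreVideo

// Convert an AVDepthData buffer to 32-bit float depth if it arrived as disparity.
func depthFloat32Map(from depthData: AVDepthData) -> CVPixelBuffer {
    if depthData.depthDataType == kCVPixelFormatType_DepthFloat32 {
        return depthData.depthDataMap
    }
    return depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32).depthDataMap
}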

Depth in ARKit

When we turn on AR, the screen displays the image from the phone's front camera or rear wide-angle (main) camera. At the same time, several kinds of depth information can be obtained from the ARFrame, such as the face depth map captured by the front camera, the scene depth map captured by the rear LiDAR, and the human body depth map produced by person segmentation.

The corresponding attributes in ARFrame are as follows:

/** Depth map captured by the front TrueDepth camera.
    Requires running ARFaceTrackingConfiguration; its frame rate differs from the RGB camera. */
@available(iOS 11.0, *)
open var capturedDepthData: AVDepthData? { get }

/** Depth map of the segmented human body, estimated by machine learning; low resolution.
    Requires the personSegmentationWithDepth frame semantic.
    @see -[ARConfiguration setFrameSemantics:]
    @see -[ARFrame segmentationBuffer] */
@available(iOS 13.0, *)
open var estimatedDepthData: CVPixelBuffer? { get }

/** Depth map of the scene. Requires LiDAR and the ARFrameSemanticSceneDepth frame semantic.
    @see -[ARConfiguration setFrameSemantics:] */
@available(iOS 14.0, *)
open var sceneDepth: ARDepthData? { get }

/** Smoothed depth map of the scene. Requires LiDAR and the ARFrameSemanticSmoothedSceneDepth frame semantic.
    @see -[ARConfiguration setFrameSemantics:] */
@available(iOS 14.0, *)
open var smoothedSceneDepth: ARDepthData? { get }
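A minimal sketch of how these properties are requested and read (the class name DepthSessionHandler is illustrative; capturedDepthData instead requires running ARFaceTrackingConfiguration):

import ARKit

final class DepthSessionHandler: NSObject, ARSessionDelegate {

    // Opt in to the depth-related frame semantics before running the session.
    func makeConfiguration() -> ARWorldTrackingConfiguration {
        let config = ARWorldTrackingConfiguration()
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)                    // requires LiDAR
        }
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
            config.frameSemantics.insert(.personSegmentationWithDepth)   // ML-based human body depth
        }
        return config
    }

    // Read the corresponding ARFrame properties on every frame update.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        if let sceneDepth = frame.sceneDepth {
            _ = sceneDepth.depthMap         // CVPixelBuffer of Float32 depth in meters
            _ = sceneDepth.confidenceMap    // optional per-pixel confidence
        }
        if let humanDepth = frame.estimatedDepthData {
            _ = humanDepth                  // low-resolution depth of the segmented person
        }
    }
}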

capturedDepthData is the depth map captured by the front TrueDepth camera, and sceneDepth is the depth map obtained by LiDAR; both are fairly easy to understand.

The most confusing one is estimatedDepthData, the depth map of the recognized human body. Why does it need neither LiDAR nor binocular lenses to obtain depth? And why is the depth obtained without LiDAR so inaccurate?

Actually, Apple made this clear in WWDC19: Bringing People into AR. The segmentation and depth maps of the human body are derived from the camera image alone, using the A12 processor and machine learning: "Now, the amazing thing about this feature is that the way we're generating these buffers is by leveraging the power of the A12 chip and using machine learning in order to generate these buffers, using only the camera image."

In other words, the human body depth map is generated from camera imagery plus machine learning, which means it works even on a single-camera device such as the iPhone SE (2nd generation) or the 2018 iPad Pro, and is more accurate on devices with LiDAR. Apple did not say whether binocular parallax is involved. My personal guess is that monocular VIO data may be used but binocular parallax is not: depth obtained from parallax is not accurate in 3D, and turning on an extra camera consumes more power while significantly increasing the amount of computation, so the cost would outweigh the gain. Thus, as of 2021, Apple does not appear to use binocular parallax at all to obtain depth in AR.

Here is a test comparison by Zhong Wenze from Bilibili. It shows that in AR the single-camera 2018 iPad Pro can also obtain human body depth, but less accurately than the 2020 model with LiDAR:

Finally, although binocular parallax is not used directly in AR, that does not mean the technology is irrelevant to AR. At WWDC21 this year, Apple's new RealityKit Object Capture feature can use captured binocular-disparity depth maps to help reconstruct 3D models. With depth information, the reconstructed model is closer in size to the real object and of higher quality.
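For completeness, a minimal Object Capture sketch (reconstruction runs on macOS 12+; the folder and output paths and the detail level are illustrative). Depth data embedded in the input photos is used, when present, to help recover real-world scale:

import Foundation
import RealityKit

// Reconstruct a 3D model from a folder of photos captured with depth.
func reconstruct(imagesFolder: URL, outputModel: URL) throws {
    let session = try PhotogrammetrySession(input: imagesFolder)

    // Start listening for results before kicking off processing.
    Task {
        for try await output in session.outputs {
            if case .processingComplete = output {
                print("Reconstruction finished: \(outputModel.path)")
            }
        }
    }

    try session.process(requests: [.modelFile(url: outputModel, detail: .medium)])
}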

References

  • WWDC17: Capturing Depth in iPhone Photography
  • Capturing Photos with Depth
  • Capturing Photographs for RealityKit Object Capture