When you photograph a document with a phone that is not held parallel to it, the document appears distorted by perspective. ARKit can be used to correct the captured image so that it looks as if the camera had been pointing straight at the document.

This article describes the approach and the steps used to implement it.

## Assumptions

We assume the document being photographed lies on a horizontal or vertical plane, because ARKit can currently only detect horizontal and vertical planes.

## Approach

1. Use ARKit to find the plane the document lies on, and derive the equation of that plane.
2. Using the camera intrinsics and the camera's current pose, take the four corners of the image (top-left, bottom-left, top-right and bottom-right) and compute the line in the 3D world that each corner sees.
3. Intersect the four lines from step 2 with the plane from step 1. The region of the plane bounded by these four intersection points is what the camera captured.
4. Apply a projective (perspective) transformation that maps the four corners of the image to the four intersection points from step 3. The result is the corrected picture.

# The calculation process

## Get the equation of the plane to be photographed

A plane's equation can be computed from any three points on it that do not lie on one line.

ARKit provides two capabilities that help here:

1. It detects horizontal and vertical planes in world space.
2. Hit testing (ray casting): it casts a ray through a point on the camera image and reports where that ray intersects detected feature points or planes.

A plane in world space can be written as $Ax + By + Cz + D = 0$.

Pick three nearby points near the centre of the phone screen that do not lie on one line, and hit-test each of them. If all three hits fall on the same plane, their world coordinates can be used to compute the plane equation.

Let the three intersection points be $P_1(x_1, y_1, z_1)$, $P_2(x_2, y_2, z_2)$ and $P_3(x_3, y_3, z_3)$. The normal $(A, B, C)$ is the cross product $(P_2 - P_1) \times (P_3 - P_1)$, and $D$ follows from requiring $P_1$ to lie on the plane:

$A = (y_2 - y_1)(z_3 - z_1) - (z_2 - z_1)(y_3 - y_1) \\ B = (x_3 - x_1)(z_2 - z_1) - (x_2 - x_1)(z_3 - z_1) \\ C = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1) \\ D = -(Ax_1 + By_1 + Cz_1)$
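The formulas above translate directly into a small helper. This is a Python sketch; the name `plane_from_points` is hypothetical. In ARKit, the three inputs would presumably be the translation parts of the three hit results' `worldTransform`. A zero normal signals the degenerate case where the three points lie on one line.

```python
def plane_from_points(p1, p2, p3, eps=1e-9):
    """Return (A, B, C, D) of the plane Ax + By + Cz + D = 0 through p1, p2, p3.

    (A, B, C) is the cross product (p2 - p1) x (p3 - p1), term by term as above.
    """
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    x3, y3, z3 = p3
    a = (y2 - y1) * (z3 - z1) - (z2 - z1) * (y3 - y1)
    b = (x3 - x1) * (z2 - z1) - (x2 - x1) * (z3 - z1)
    c = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(a) + abs(b) + abs(c) < eps:
        # A (near) zero normal means the three points are (nearly) collinear.
        raise ValueError("points are collinear; pick three points not on one line")
    d = -(a * x1 + b * y1 + c * z1)
    return a, b, c, d
```

For example, three hits on a floor at height $y = -1$ (ARKit's world $y$ axis points up), `plane_from_points((0, -1, 0), (1, -1, 0), (0, -1, 1))`, give `(0, -1, 0, -1)`, that is, the plane $-y - 1 = 0$.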

## The mapping between camera picture coordinates and world coordinates

### Relating image points to camera coordinates through the camera intrinsics

The camera intrinsics give the correspondence between a point on the image and points in the camera's coordinate space.

Suppose a point on the image is $P(x_{photo}, y_{photo})$, a corresponding point in camera space is $P(x_{camera}, y_{camera}, z_{camera})$, and the intrinsic matrix is $\begin{bmatrix}f_x&0&o_x\\0&f_y&o_y\\0&0&1\end{bmatrix}$. Then:

$\begin{bmatrix}f_x&0&o_x\\0&f_y&o_y\\0&0&1\end{bmatrix}\begin{bmatrix}x_{camera}\\y_{camera}\\z_{camera}\end{bmatrix}=\begin{bmatrix}x'_{photo}\\y'_{photo}\\z'_{photo}\end{bmatrix} \to \begin{bmatrix}x'_{photo}\\y'_{photo}\\z'_{photo}\end{bmatrix} / z'_{photo} = \begin{bmatrix}x_{photo}\\y_{photo}\\1\end{bmatrix}$

So $P(x_{photo}, y_{photo})$ and $P(x_{camera}, y_{camera}, z_{camera})$ are related by:

$x_{photo} = f_xx_{camera}/z_{camera} + o_x \\ y_{photo} = f_yy_{camera}/z_{camera} + o_y$
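Here is a minimal Python sketch of these two formulas, plus the inverse direction (a pixel back-projected to a camera-space ray). The function names are hypothetical. The sketch assumes the textbook pinhole convention, in which the camera looks along $+z$ and image $y$ points down. ARKit's camera space has $y$ pointing up and $z$ pointing out of the screen, so with ARKit data you may need to negate $y$ and $z$ before applying these formulas. In ARKit the values $f_x, f_y, o_x, o_y$ come from `ARCamera.intrinsics`, in pixels of `ARCamera.imageResolution`.

```python
def project_to_image(p_camera, fx, fy, ox, oy):
    """Pinhole projection: camera-space point -> pixel (x_photo, y_photo)."""
    x, y, z = p_camera
    if z == 0:
        raise ValueError("point lies in the camera's z = 0 plane and has no image")
    return fx * x / z + ox, fy * y / z + oy


def back_project(x_photo, y_photo, fx, fy, ox, oy):
    """Inverse direction: a pixel only fixes the camera-space ray s * (dx, dy, 1)."""
    return (x_photo - ox) / fx, (y_photo - oy) / fy, 1.0
```

With $f_x = f_y = 1000$, $o_x = 960$, $o_y = 720$, the point $(0.5, -0.25, 2)$ projects to pixel $(1210, 595)$. Back-projecting that pixel gives the direction $(0.25, -0.125, 1)$, which must be scaled by the unknown depth (here 2) to recover the point. This is why a single pixel determines only a ray, and the plane equation is needed to pick a point on it.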

### Calculate the correspondence between the world coordinates and the camera coordinates

The relationship between camera coordinates and world coordinates comes from the camera's `transform`, that is, its pose in world space. Namely: $(P_{camera}, 1)^T = T_{camera\,pose}^{-1}\,(P_{world}, w)^T$, where $w = 1$ for a point.

Assume a point in camera coordinates is $P(x_{camera}, y_{camera}, z_{camera})$, its world coordinates are $P(x_{world}, y_{world}, z_{world})$, and the camera transform is $T_{camera\,pose}$. Write the entries of its inverse $T^{-1}$ as $t'_{ij}$, where $i$ is the column index and $j$ the row index. Then:

$\begin{bmatrix}x_{camera}\\y_{camera}\\z_{camera}\\1\end{bmatrix}=\begin{bmatrix} t'_{00}&t'_{10}&t'_{20}&t'_{30}\\ t'_{01}&t'_{11}&t'_{21}&t'_{31}\\ t'_{02}&t'_{12}&t'_{22}&t'_{32}\\ t'_{03}&t'_{13}&t'_{23}&t'_{33} \end{bmatrix}\begin{bmatrix}x_{world}\\y_{world}\\z_{world}\\w\end{bmatrix}$
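The inverse of a camera pose can be written down directly. Assuming the pose is a rigid transform $[R \mid t]$ (rotation plus translation, no scale), its inverse is $[R^T \mid -R^T t]$. The Python sketch below uses hypothetical names. It stores 4×4 matrices column-major as a list of four columns, so `inv[i][j]` is $t'_{ij}$ exactly as in the matrix above. In Swift one would more likely just call `simd_inverse(frame.camera.transform)`.

```python
def invert_rigid(cols):
    """Invert a rigid 4x4 pose [R | t] stored column-major (cols[c][r] = row r, column c)."""
    inv = [[0.0] * 4 for _ in range(4)]
    for c in range(3):
        for r in range(3):
            inv[c][r] = cols[r][c]  # rotation block transposed: R^T
    for r in range(3):
        # translation: -(R^T t)_r = -sum_k R[k][r] * t[k], where R[k][r] == cols[r][k]
        inv[3][r] = -sum(cols[r][k] * cols[3][k] for k in range(3))
    inv[3][3] = 1.0
    return inv


def world_to_camera(inv, p_world):
    """x_camera = t'_00 x + t'_10 y + t'_20 z + t'_30 * w (w = 1), and likewise for y, z."""
    h = (p_world[0], p_world[1], p_world[2], 1.0)
    return tuple(sum(inv[i][j] * h[i] for i in range(4)) for j in range(3))
```

As an example, take `cols = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, 0], [5, 6, 7, 1]]`. This is a 90° rotation about $z$ followed by a translation of $(5, 6, 7)$, and it sends the camera-space point $(1, 2, 3)$ to the world point $(3, 7, 10)$. Then `world_to_camera(invert_rigid(cols), (3, 7, 10))` maps it back to `(1.0, 2.0, 3.0)`.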

To sum up, the image coordinates $P(x_{photo}, y_{photo})$ and the world coordinates $P(x_{world}, y_{world}, z_{world})$ are related by:

$\left\{ \begin{array}{c} \begin{bmatrix}x_{camera}\\y_{camera}\\z_{camera}\\1\end{bmatrix}= \begin{bmatrix} t'_{00}&t'_{10}&t'_{20}&t'_{30}\\ t'_{01}&t'_{11}&t'_{21}&t'_{31}\\ t'_{02}&t'_{12}&t'_{22}&t'_{32}\\ t'_{03}&t'_{13}&t'_{23}&t'_{33} \end{bmatrix} \begin{bmatrix}x_{world}\\y_{world}\\z_{world}\\w\end{bmatrix} \\ x_{photo} = f_x x_{camera}/z_{camera} + o_x \\ y_{photo} = f_y y_{camera}/z_{camera} + o_y \end{array} \right.$

Multiply the last two equations by $z_{camera}$, substitute the camera coordinates from the matrix equation, and set $w = 1$. This gives:

$\left\{ \begin{array}{c} (f_x t'_{00} + o_x t'_{02} - x_{photo} t'_{02}) x_{world} + (f_x t'_{10} + o_x t'_{12} - x_{photo} t'_{12}) y_{world} + (f_x t'_{20} + o_x t'_{22} - x_{photo} t'_{22}) z_{world} + f_x t'_{30} + o_x t'_{32} - x_{photo} t'_{32} = 0 \\ (f_y t'_{01} + o_y t'_{02} - y_{photo} t'_{02}) x_{world} + (f_y t'_{11} + o_y t'_{12} - y_{photo} t'_{12}) y_{world} + (f_y t'_{21} + o_y t'_{22} - y_{photo} t'_{22}) z_{world} + f_y t'_{31} + o_y t'_{32} - y_{photo} t'_{32} = 0 \end{array} \right.$
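These two equations can be assembled mechanically from a pixel, the intrinsics and $T^{-1}$. In this Python sketch (hypothetical names), `inv` is $T^{-1}$ stored column-major, so `inv[i][j]` is $t'_{ij}$. Each returned tuple holds the coefficients of $x_{world}$, $y_{world}$ and $z_{world}$ followed by the constant term.

```python
def pixel_constraints(x_photo, y_photo, fx, fy, ox, oy, inv):
    """Coefficients (A_k, B_k, C_k, D_k) of the two linear equations for one pixel.

    Coefficient i of the first equation is f_x t'_{i0} + (o_x - x_photo) t'_{i2};
    the second uses f_y, o_y, y_photo and t'_{i1} in the same way.
    """
    row1 = tuple(fx * inv[i][0] + (ox - x_photo) * inv[i][2] for i in range(4))
    row2 = tuple(fy * inv[i][1] + (oy - y_photo) * inv[i][2] for i in range(4))
    return row1, row2
```

Consider an identity pose, where camera and world coordinates coincide, with $f_x = f_y = 1000$, $o_x = 960$ and $o_y = 720$. The point $(0.5, -0.25, 2)$ projects to pixel $(1210, 595)$, and it satisfies both equations returned for that pixel, as does every other point on the same ray.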

## Intersect the rays through the four image corners with the plane

Given the relationship between image coordinates and world coordinates, together with the plane equation, we can compute where any point of the camera image lands on the plane.

Each point on the camera image corresponds to a ray in world space, and the world point we want is the intersection of that ray with the plane.
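Geometrically, the same intersection can be computed straight from the ray. Write the ray as $O + s\,\vec{d}$, where $O$ is the camera position in world space (in ARKit, the translation column of the camera transform) and $\vec{d}$ is the viewing direction in world space. With the plane $(A, B, C, D)$, the parameter is $s = -(A O_x + B O_y + C O_z + D) / (A d_x + B d_y + C d_z)$. This is the standard ray-plane intersection. It is shown here only as a cross-check on the linear-system approach this article uses; the function name is hypothetical.

```python
def intersect_ray_plane(origin, direction, plane, eps=1e-9):
    """Point where origin + s * direction (s > 0) meets Ax + By + Cz + D = 0, else None."""
    a, b, c, d = plane
    denom = a * direction[0] + b * direction[1] + c * direction[2]
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    s = -(a * origin[0] + b * origin[1] + c * origin[2] + d) / denom
    if s <= 0:
        return None  # the plane is behind the camera along this ray
    return tuple(o + s * k for o, k in zip(origin, direction))
```

For a camera 1.5 m above a floor at $y = 0$, looking down at 45° along $(0, -1, -1)$, `intersect_ray_plane((0, 1.5, 0), (0, -1, -1), (0, 1, 0, 0))` returns `(0.0, 0.0, -1.5)`.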

If the plane is $Ax + By + Cz + D = 0$, adding its equation gives a system of three equations in three unknowns:

$\left\{ \begin{array}{c} (f_x t'_{00} + o_x t'_{02} - x_{photo} t'_{02}) x_{world} + (f_x t'_{10} + o_x t'_{12} - x_{photo} t'_{12}) y_{world} + (f_x t'_{20} + o_x t'_{22} - x_{photo} t'_{22}) z_{world} + f_x t'_{30} + o_x t'_{32} - x_{photo} t'_{32} = 0 \\ (f_y t'_{01} + o_y t'_{02} - y_{photo} t'_{02}) x_{world} + (f_y t'_{11} + o_y t'_{12} - y_{photo} t'_{12}) y_{world} + (f_y t'_{21} + o_y t'_{22} - y_{photo} t'_{22}) z_{world} + f_y t'_{31} + o_y t'_{32} - y_{photo} t'_{32} = 0 \\ Ax_{world} + By_{world} + Cz_{world} + D = 0 \end{array} \right.$

Once a point $(x_{photo}, y_{photo})$ on the camera image is fixed, every coefficient is a known number. Write the coefficients of the first two equations as $A_1, B_1, C_1, D_1$ and $A_2, B_2, C_2, D_2$. The system for $P(x_{world}, y_{world}, z_{world})$ then becomes:

$\left\{ \begin{array}{c} A_1x_{world} + B_1y_{world} + C_1z_{world} + D_1 = 0\\A_2x_{world} + B_2y_{world} + C_2z_{world} + D_2 = 0\\Ax_{world} + By_{world} + Cz_{world} + D = 0\end{array}\right.$

By Cramer's rule, $P(x_{world}, y_{world}, z_{world})$ can then be calculated.
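Cramer's rule for this 3×3 system fits in a short Python helper (hypothetical names). Each unknown is a ratio of two determinants. The numerator is the determinant of the coefficient matrix with that unknown's column replaced by the right-hand side $(-D_1, -D_2, -D)$, and the denominator is the determinant of the coefficient matrix itself. A determinant near zero means the system has no unique solution, which in practice happens when the ray is (nearly) parallel to the plane.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_cramer(rows, eps=1e-12):
    """Solve A_k x + B_k y + C_k z + D_k = 0 (k = 1..3) with Cramer's rule."""
    m = [list(r[:3]) for r in rows]
    rhs = [-r[3] for r in rows]
    det = det3(m)
    if abs(det) < eps:
        raise ValueError("singular system (for example, the ray is parallel to the plane)")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]  # replace column k with the right-hand side
        solution.append(det3(mk) / det)
    return tuple(solution)
```

For instance, the rows `(1000, 0, -250, 0)` and `(0, 1000, 125, 0)` are the pixel equations for pixel $(1210, 595)$ with an identity pose, $f_x = f_y = 1000$, $o_x = 960$ and $o_y = 720$. Together with the plane $z = 2$, written `(0, 0, 1, -2)`, they solve to $(0.5, -0.25, 2)$.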

Repeating this for the four corners of the image (top-left, bottom-left, top-right and bottom-right) gives the world coordinates of all four intersection points.
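The following Python sketch (hypothetical names) puts the pieces together for all four corners. It takes the image size in pixels (in ARKit, presumably `ARCamera.imageResolution`), the intrinsics, $T^{-1}$ stored column-major (`inv[i][j]` is $t'_{ij}$) and the plane $(A, B, C, D)$. It returns the corners in the order top-left, bottom-left, top-right, bottom-right. The two pixel equations hold for the whole line through the camera centre, including the part behind the camera, so it is worth checking that each result lies in front of the camera.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(rows):
    """Cramer's rule for A_k x + B_k y + C_k z + D_k = 0, k = 1..3."""
    m = [list(r[:3]) for r in rows]
    rhs = [-r[3] for r in rows]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("singular system: the ray is (nearly) parallel to the plane")
    out = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]  # replace column k with the right-hand side
        out.append(det3(mk) / det)
    return tuple(out)


def corner_world_points(width, height, fx, fy, ox, oy, inv, plane):
    """World points hit by the rays through the four image corners (TL, BL, TR, BR)."""
    corners = [(0, 0), (0, height), (width, 0), (width, height)]
    points = []
    for xp, yp in corners:
        # The two pixel equations derived above, then the plane equation.
        row1 = tuple(fx * inv[i][0] + (ox - xp) * inv[i][2] for i in range(4))
        row2 = tuple(fy * inv[i][1] + (oy - yp) * inv[i][2] for i in range(4))
        points.append(solve3((row1, row2, tuple(plane))))
    return points
```

Take an identity pose (camera and world coordinates coincide, camera looking along $+z$), the plane $z = 2$, and a 1920×1440 image with $f_x = f_y = 1000$, $o_x = 960$ and $o_y = 720$. The four corners then land at $(\pm 1.92, \pm 1.44, 2)$.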

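## Projective transformation

After obtaining the world coordinates of the four intersection points for the corners of the picture, express them as 2D coordinates within the plane. The projective (perspective) transformation that maps the four image corners to these points can then be applied to correct the picture.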
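A homography needs 2D coordinates on both sides, so the four world points are first expressed in a 2D coordinate system lying in the plane. The Python sketch below uses hypothetical names. It puts the origin at the top-left intersection, the $u$ axis towards the top-right one and the $v$ axis towards the bottom edge. It then solves the standard 8×8 linear system for a 3×3 homography with $h_{33} = 1$, which is the direct linear transform for exactly four point pairs. It only computes the matrix. Resampling the pixels could then be done with a library routine such as OpenCV's `cv2.warpPerspective` or Core Image's `CIPerspectiveTransform` filter.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def _unit(p):
    n = math.sqrt(_dot(p, p))
    return tuple(a / n for a in p)


def plane_coords(tl, bl, tr, br, normal):
    """2D in-plane coordinates of four world points lying on the plane with this normal."""
    u = _unit(_sub(tr, tl))          # along the top edge
    v = _unit(_cross(normal, u))     # in the plane, perpendicular to u
    if _dot(_sub(bl, tl), v) < 0:
        v = tuple(-a for a in v)     # make v point from the top edge towards the bottom
    return [(_dot(_sub(p, tl), u), _dot(_sub(p, tl), v)) for p in (tl, bl, tr, br)]


def _solve(a, b):
    """Gaussian elimination with partial pivoting for an n x n system a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate configuration (three points collinear?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (row-major, H[2][2] = 1) mapping each of four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = _solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, pt):
    """Map a 2D point through H, dividing by the homogeneous coordinate."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

To build the warp, one would first compute `plane_coords(tl, bl, tr, br, (A, B, C))`, which is in metres. Those coordinates would then be scaled by a chosen output resolution (a hypothetical scale `s` in pixels per metre) and shifted so the smallest coordinates are zero. Finally, one would call `homography([(0, 0), (0, h), (w, 0), (w, h)], dst)` with the image width `w` and height `h`.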