An overview of the principle of stereo vision

When two cameras photograph an object from different angles at the same time, we can recover the object's length and width, its distance from the cameras, and so on. This article explains the principle behind this.

Principle of camera imaging

The figure below illustrates the imaging principle of a camera. The photographed point $(x, y)$ in the image can correspond to any real-world position on the line joining that point and the camera.

To determine the real-world position of the photographed point, two cameras need to shoot simultaneously from different angles, as shown in the figure below. The two straight lines $(C_1, P_1)$ and $(C_2, P_2)$ intersect at the point $P$, which is the position of the photographed point in reality.

Coordinate system

Camera coordinate system & world coordinate system

The two coordinate systems drawn in the figure above are called camera coordinate systems; their origins are at the actual positions of the cameras $C_1$ and $C_2$.

The coordinate system that $P$ lies in is called the world coordinate system, and its origin can be chosen arbitrarily. For example, the left camera coordinate system can be used as the world coordinate system, with $C_1$ as the origin.

Projection coordinate system & image coordinate system

The coordinate system on the plane where $P_1$ and $P_2$ are drawn in the figure above is called the projection coordinate system and is measured in physical units. The coordinate system on the resulting image is called the image coordinate system and is measured in pixels.

An image is actually an array whose elements are pixels. A pixel in a color image is described by three integer values, and a pixel in a grayscale image by a single integer value.
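
As a minimal illustration of this layout (a sketch assuming NumPy; the image size is arbitrary):

```python
import numpy as np

# A 480x640 grayscale image: one integer value per pixel.
gray = np.zeros((480, 640), dtype=np.uint8)

# A 480x640 color image: three integer values (channels) per pixel.
color = np.zeros((480, 640, 3), dtype=np.uint8)

# The pixel at row v, column u is addressed as image[v, u].
print(gray[0, 0], color[0, 0])
```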

In the figure, $O_0$ is the origin of the image coordinate system, located at the top-left pixel of the image. $O_1$ is the origin of the projection coordinate system, located at the principal point $(u_0, v_0)$.

The principal point is the foot of the perpendicular dropped from the origin of the camera coordinate system onto the imaging plane, that is, the intersection of the $Z$ axis with the imaging plane in Figure 2.

Coordinate transformation

Stereo vision obtains the world coordinates of the photographed point from its two image coordinates. To do this, we need a chain of conversions between the coordinate systems: image ⇌ projection ⇌ camera ⇌ world.

Image ⇌ projection

Suppose the physical size of a single pixel on the camera's imaging plane is $(d_x, d_y)$, and the coordinates of the principal point in the image coordinate system are $(u_0, v_0)$. Then the conversion between projection coordinates $(x, y)$ and image coordinates $(u, v)$ is:


$$u - u_0 = \frac{x}{d_x}$$

$$v - v_0 = \frac{y}{d_y}$$

The mathematical operations in the algorithm are generally matrix operations, so the formulas above are equivalent to:


$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Due to manufacturing deviations, $d_x \neq d_y$, and a pixel is not a square but a parallelogram. For now we assume the pixel is a rectangle; the skew factor will be discussed later.
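
As a quick sketch of this conversion (assuming NumPy; the pixel size and principal point below are made-up values, not real calibration results):

```python
import numpy as np

# Hypothetical intrinsics: pixel size (mm per pixel) and principal point (pixels).
dx, dy = 0.01, 0.01
u0, v0 = 320.0, 240.0

# Matrix converting projection coordinates (x, y) into image coordinates (u, v).
proj_to_image = np.array([
    [1 / dx, 0,      u0],
    [0,      1 / dy, v0],
    [0,      0,      1.0],
])

x, y = 1.5, -0.8                       # projection coordinates (mm)
u, v, _ = proj_to_image @ np.array([x, y, 1.0])
print(u, v)                            # image coordinates (pixels)
```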

Projection ⇌ camera

As shown in the figure below, $O_1$ is the origin of the projection coordinate system and $O$ is the origin of the camera coordinate system. Suppose the projection of point $P$ has coordinates $(x, y)$ in the projection coordinate system; then its coordinates in the camera coordinate system are $(x, y, f)$, where $f$ is the focal length, i.e., the distance from $O$ to $O_1$.

Based on similar triangles, the conversion between the projection coordinates $(x, y)$ and the camera coordinates $(X_c, Y_c, Z_c)$ of $P$ is:


$$\frac{x}{X_c} = \frac{y}{Y_c} = \frac{f}{Z_c}$$

Again, the above formula is equivalent to:


$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{f}{Z_c} & 0 & 0 \\ 0 & \frac{f}{Z_c} & 0 \\ 0 & 0 & \frac{1}{Z_c} \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$

Here $x = \frac{f X_c}{Z_c}$ and $y = \frac{f Y_c}{Z_c}$, but $f$ is deliberately not cancelled out of the formula; the reason will be explained later.
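
As a small numeric sketch of this projection (NumPy; the focal length and camera coordinates are made-up values):

```python
import numpy as np

f = 8.0                                 # focal length in mm (assumed value)
Xc, Yc, Zc = 100.0, 50.0, 2000.0        # camera coordinates of P (mm)

# Projection onto the imaging plane by similar triangles.
proj = np.array([
    [f / Zc, 0,      0     ],
    [0,      f / Zc, 0     ],
    [0,      0,      1 / Zc],
])
x, y, _ = proj @ np.array([Xc, Yc, Zc])
print(x, y)                             # equals f*Xc/Zc and f*Yc/Zc
```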

Camera ⇌ world

So far we have assumed that the world coordinate system coincides with the left camera coordinate system, but during camera calibration a corner point of the calibration board is taken as the origin of the world coordinate system. The conversion between the two 3-dimensional coordinate systems is achieved by a rotation and a translation.

Let $R_{3\times3}$ be the rotation matrix and $T_{3\times1}$ the translation matrix. The conversion between camera coordinates $(X_c, Y_c, Z_c)$ and world coordinates $(X_w, Y_w, Z_w)$ is:


$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R_{3\times3} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_{3\times1} = \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
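
A sketch of this rotation-plus-translation step (NumPy; the $R$ and $T$ below are placeholder values, in practice they come from calibration):

```python
import numpy as np

# Placeholder extrinsics: identity rotation and a small translation.
R = np.eye(3)
T = np.array([[10.0], [0.0], [-5.0]])

Pw = np.array([[1.0], [2.0], [3.0]])    # world coordinates (column vector)

# Camera coordinates: Pc = R @ Pw + T.
Pc = R @ Pw + T

# Equivalent form using the 3x4 matrix [R | T] and homogeneous coordinates.
RT = np.hstack([R, T])
Pc2 = RT @ np.vstack([Pw, [[1.0]]])
assert np.allclose(Pc, Pc2)
```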

The camera parameters

$(d_x, d_y)$, $(u_0, v_0)$, $f$, $R$, $T$ are all camera parameters, and they can be obtained through camera calibration.

Internal parameters

Combining the image ⇌ projection and projection ⇌ camera conversions gives the conversion from camera coordinates to image coordinates:


$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c} \begin{bmatrix} \frac{f}{d_x} & 0 & u_0 \\ 0 & \frac{f}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$

Here $\frac{f}{d_x}$ and $\frac{f}{d_y}$ convert the focal length $f$, measured in physical units, into focal length values measured in pixels: $f_x = \frac{f}{d_x}$ and $f_y = \frac{f}{d_y}$.

This also answers the earlier question of why $f$ is not cancelled: camera calibration yields $f_x$ and $f_y$ directly, so it is more convenient to compute without cancelling.


$$K = \begin{bmatrix} f_x & s & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

The matrix $K$ is called the internal parameter matrix of the camera; the internal parameters are fixed for a given camera.

To return to the earlier point that a pixel is actually a parallelogram: $s$ here is the skew factor of the pixel's vertical edge relative to the $Y$ axis, and it is used as an auxiliary quantity in the calculation.
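
As a sketch, the intrinsic matrix can be assembled from the calibrated values (all numbers below are hypothetical, not real calibration output):

```python
import numpy as np

fx, fy = 800.0, 800.0     # focal length in pixels (f/dx, f/dy), assumed
u0, v0 = 320.0, 240.0     # principal point, assumed
s = 0.0                   # skew factor; 0 for rectangular pixels

K = np.array([
    [fx, s,  u0],
    [0,  fy, v0],
    [0,  0,  1.0],
])

# Project a point given in camera coordinates onto the image.
Pc = np.array([100.0, 50.0, 2000.0])
uv = K @ Pc / Pc[2]
print(uv[:2])             # image coordinates (u, v)
```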

External parameters

The $R$ and $T$ matrices in the camera ⇌ world conversion are the external parameters of the camera; they describe the rotation and translation from the world coordinate system to the camera coordinate system. In 3D reconstruction we usually do not care about the transformation from the world coordinate system to each of the two camera coordinate systems, but about the transformation between the two camera coordinate systems.

Camera calibration yields $R_1, T_1$ for the left camera and $R_2, T_2$ for the right camera. From these, the rotation and translation matrices $R_c, T_c$ from the right camera coordinate system to the left camera coordinate system (or vice versa) can be calculated, as sketched below.
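
A sketch of that computation (NumPy; the extrinsics below are placeholders, in practice they come from calibration). Since $P_{c1} = R_1 P_w + T_1$ and $P_{c2} = R_2 P_w + T_2$, eliminating $P_w$ gives $R_c = R_1 R_2^{\mathsf{T}}$ and $T_c = T_1 - R_c T_2$:

```python
import numpy as np

def relative_pose(R1, T1, R2, T2):
    # Pc1 = R1 @ Pw + T1 and Pc2 = R2 @ Pw + T2  =>  Pc1 = Rc @ Pc2 + Tc
    Rc = R1 @ R2.T
    Tc = T1 - Rc @ T2
    return Rc, Tc

# Placeholder extrinsics (identity rotations, 120 mm baseline along X).
R1, T1 = np.eye(3), np.zeros(3)
R2, T2 = np.eye(3), np.array([-120.0, 0.0, 0.0])
Rc, Tc = relative_pose(R1, T1, R2, T2)
print(Rc, Tc)
```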

Stereo vision

The world coordinates of the photographed point are obtained from its two image coordinates; this is stereo vision.

Combining the three conversion relations above and setting $\lambda = Z_c$, we obtain the conversion from world coordinates to image coordinates:


$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K_{3\times3} \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

Let $P_{3\times4} = K_{3\times3} \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix}$. Then:


$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P_{3\times4} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

The equation above is for a single camera; as mentioned at the beginning, its solution set is a straight line. Two cameras give two such equations, and their common solution is a point, which can be found by solving the stacked linear system, as in the sketch below.
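
A minimal linear triangulation sketch (NumPy). Each camera contributes a projection matrix $P = K \begin{bmatrix} R & T \end{bmatrix}$; eliminating $\lambda$ leaves two linear equations per camera, and the stacked $4\times4$ system is solved in the least-squares sense with an SVD. The function name and inputs are only illustrative:

```python
import numpy as np

def triangulate(P1, uv1, P2, uv2):
    """Estimate world coordinates from two 3x4 projection matrices and the
    matching pixel coordinates (u, v) observed in each image."""
    u1, v1 = uv1
    u2, v2 = uv2
    # Eliminating the scale factor lambda gives two linear equations per view.
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Solve A @ X = 0 in the least-squares sense via SVD.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]        # world coordinates (Xw, Yw, Zw)
```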

Ending

Once the world coordinates of the photographed points are known, the distance from a photographed point to the camera, as well as the distance between two photographed points, can be computed. I wrote this article because I need it for my graduation project; I had no background in machine vision before, so please point out anything unprofessional or incorrect.

This article does not cover camera calibration or the code implementation; those will be covered in a later article.