Shiyuan (CVTE) ENOW Large Front End

Company official website: CVTE (Guangzhou Shiyuan)

Team: the ENOW team of the CVTE Software Platform Center for Future Education


Preface

This article assumes the reader has some understanding of vectors and matrices. If you don’t, take a look at this article first: “Riding the Waves of WebGL – The Mathematical Foundations of Affine Transformations.”

In the previous article, WebGL Coordinate System Basics (Part 1), we introduced several common coordinate systems in WebGL and the relationships between them. This installment is a bit more hardcore: we will mathematically derive the transformation matrices we talked about last time.

Basic concepts

Column vector

In the derivations below, we uniformly use a column vector to represent a coordinate; a column vector is an N×1 matrix. For example, the coordinate (x, y, z) can be expressed as:
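In the usual notation, that column vector is:

```latex
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
```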

Why use an N×1 matrix to represent coordinates? Because we represent all kinds of transformations as matrices, and if coordinates are N×1 column vectors, we can left-multiply them by a transformation matrix and thereby transform them.

Homogeneous coordinates

We call the N×1 matrix above an N-dimensional coordinate. If we add one more dimension, making it (N+1)×1, we call it a homogeneous coordinate, and the corresponding (N+1)-dimensional matrix is a homogeneous matrix.

Why introduce homogeneous coordinates at all? The usual answer is that the newly added dimension distinguishes points from vectors: if its value is 0, the coordinate represents a vector; if it is nonzero, it represents a point.

For an explanation of why the new dimension can be used to distinguish points from vectors, see this article: Understanding Homogeneous Coordinates.

But in my opinion, the neat thing about homogeneous coordinates is that they turn translation into matrix multiplication.

We will see below that rotation, scaling, and the other transformations are performed by matrix multiplication, and that combinations of transformations (such as scaling followed by rotation) are also obtained by matrix multiplication. Translation alone is done by matrix addition. If a coordinate is shifted by Tx, Ty, and Tz along the x, y, and z directions, this can be expressed with matrix addition:
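Written out, the addition form is the standard one:

```latex
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
+
\begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix}
=
\begin{pmatrix} x + T_x \\ y + T_y \\ z + T_z \end{pmatrix}
```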

The right-hand side of the equation is the coordinate we want after the translation. Because Tx, Ty, and Tz are constants, we obviously cannot obtain it through a plain 3×3 matrix multiplication. So how can we get the same coordinate through multiplication alone? We just have to add one dimension.
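With the extra dimension, the same translation becomes a single matrix multiplication; this is the standard homogeneous form:

```latex
\begin{pmatrix}
1 & 0 & 0 & T_x \\
0 & 1 & 0 & T_y \\
0 & 0 & 1 & T_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + T_x \\ y + T_y \\ z + T_z \\ 1 \end{pmatrix}
```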

And remember the w component we talked about last time? It is exactly the extra dimension introduced by homogeneous coordinates, so introducing homogeneous coordinates also makes it convenient to simulate the perspective effect.

In general, the introduction of homogeneous coordinates has three functions:

  1. Distinguishing between vectors and points
  2. Allowing the translation transformation to be expressed as matrix multiplication
  3. Making it convenient to simulate the perspective effect

The basic transformations

In homogeneous coordinates, an affine transformation can be written in the following form:
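The original template is not reproduced here; judging from the coefficients a, b, c and Tx referenced below, the standard homogeneous form it describes is presumably:

```latex
\begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
a_1 & b_1 & c_1 & T_x \\
a_2 & b_2 & c_2 & T_y \\
a_3 & b_3 & c_3 & T_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
```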

Each component of this form fits the definition of an affine transformation.

Translation transformation

According to the definition of the translation transformation, translating a point (x, y, z) means adding a constant to each of its three components:

Substituting into the matrix template above (a = 1, b = 0, c = 0), the translation matrix can obviously be written as:
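In the standard form consistent with the template:

```latex
T =
\begin{pmatrix}
1 & 0 & 0 & T_x \\
0 & 1 & 0 & T_y \\
0 & 0 & 1 & T_z \\
0 & 0 & 0 & 1
\end{pmatrix}
```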

Scaling transformation

The scaling transformation multiplies each component of a point by a factor S. Taking the x component scaled by Sx as an example:

Substituting into the matrix template above (a = Sx, b = 0, c = 0, Tx = 0), the scaling matrix can obviously be written as:
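In the standard form consistent with the template:

```latex
S =
\begin{pmatrix}
S_x & 0 & 0 & 0 \\
0 & S_y & 0 & 0 \\
0 & 0 & S_z & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
```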

Rotation transformation

The derivation of the two matrices above is very intuitive, but the derivation of the rotation matrix is a little more involved. First, let us define rotation as rotating a point P counterclockwise by θ degrees about the origin.

Let’s use polar coordinates for the sake of the derivation. In polar coordinates, P has coordinates (r, α); after rotating by θ degrees, its coordinates become (r, α + θ). Then convert the polar coordinates back to Cartesian coordinates:

Substituting the original relation between the polar and Cartesian coordinates of point P:
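Expanding cos(α + θ) and sin(α + θ) with the angle-addition formulas and substituting x = r cos α, y = r sin α gives the standard result:

```latex
x' = r\cos(\alpha + \theta) = r\cos\alpha\cos\theta - r\sin\alpha\sin\theta = x\cos\theta - y\sin\theta
```

```latex
y' = r\sin(\alpha + \theta) = r\sin\alpha\cos\theta + r\cos\alpha\sin\theta = x\sin\theta + y\cos\theta
```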

The derivation above is two-dimensional, but we can easily identify it with a rotation about the z-axis in the xyz coordinate system: because the rotation is about the z-axis, the z component of P stays the same before and after, while the x and y components change exactly as derived above. The result of rotating about the z-axis is:

Substituting into our matrix template, we obtain the rotation matrix about the z-axis:
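As a quick sanity check of the z-axis rotation, here is a minimal JavaScript sketch (the helper names rotationZ and apply are ours, not from the article). Rotating (1, 0, 0) by 90° counterclockwise should land on (0, 1, 0):

```javascript
// Build the 4x4 homogeneous rotation matrix about the z-axis
// (stored row-major, as an array of rows, for readability).
function rotationZ(theta) {
  const c = Math.cos(theta), s = Math.sin(theta);
  return [
    [c, -s, 0, 0],
    [s,  c, 0, 0],
    [0,  0, 1, 0],
    [0,  0, 0, 1],
  ];
}

// Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w].
function apply(m, v) {
  return m.map(row => row.reduce((sum, a, i) => sum + a * v[i], 0));
}

const p = apply(rotationZ(Math.PI / 2), [1, 0, 0, 1]); // ~ [0, 1, 0, 1]
```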

Similarly, the rotation matrices about the x-axis and y-axis can be obtained; they are not listed here. Readers can derive them on their own and check the answers in:

LearnOpenGL

It is important to note that more often we need rotation about an arbitrary axis. Although such a rotation can be realized by combining the three axis rotations above, that combination suffers from the gimbal lock problem. A better way is to do it in one step, solving for the rotation matrix about an arbitrary axis directly; but that matrix is complicated, and it still cannot completely avoid gimbal lock.

I will simply post this matrix here; since it is not the focus of this installment, the derivation is omitted. Interested readers can look at this article: Derivation of the rotation matrix about an arbitrary axis in three-dimensional space.

Where (Rx, Ry, Rz) is the vector along the rotation axis.

The complete solution to this problem requires quaternions; interested readers can consult other sources (see Quaternions and Three-Dimensional Rotation), which we will not expand on here.

Model transformation matrix

A quick review of the model transformation: it converts the model coordinate system to the world coordinate system, that is, it places our car model somewhere in the world coordinate system.

To realize the model transformation, a good approach is obviously to use matrices, a powerful mathematical tool. The coordinates of a vertex of the model are expressed as the column vector mentioned above; the transformed coordinates of the vertex are then obtained by left-multiplying the column vector by our model transformation matrix.

The model transformation is the combination of the three basic transformations mentioned above. We also know that matrix multiplication does not obey the commutative law, so the order in which the basic transformations are combined is very important. Specifically, the model transformation is given by the formula M = T · R · S, where T is the translation matrix, R is the rotation matrix, and S is the scaling matrix. There are two ways to understand this order.
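To make the ordering concrete, here is a minimal JavaScript sketch (the helper names multiply and apply are ours, not from the article). It builds T, R, and S for a translation of +2 along x, a 90° rotation about z, and a uniform scale of 3, then compares M = T · R · S against the reversed order:

```javascript
// 4x4 row-major matrix product a * b.
function multiply(a, b) {
  return a.map((row, i) =>
    row.map((_, j) => row.reduce((sum, _cur, k) => sum + a[i][k] * b[k][j], 0))
  );
}

// Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w].
function apply(m, v) {
  return m.map(row => row.reduce((sum, a, i) => sum + a * v[i], 0));
}

const c = Math.cos(Math.PI / 2), s = Math.sin(Math.PI / 2);
const T = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]; // translate +2 in x
const R = [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]; // rotate 90° about z
const S = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0], [0, 0, 0, 1]]; // scale by 3

// M = T * R * S: scale first, then rotate, then translate.
const M = multiply(T, multiply(R, S));
const p = apply(M, [1, 0, 0, 1]);  // (1,0,0) -> (3,0,0) -> (0,3,0) -> (2,3,0)

// Reversing the order gives a different point: the product does not commute.
const M2 = multiply(S, multiply(R, T));
const p2 = apply(M2, [1, 0, 0, 1]); // (1,0,0) -> (3,0,0) -> (0,3,0) -> (0,9,0)
```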

Qualitative understanding

When we derived the rotation matrix, we assumed that the vertex rotates about the origin of the coordinate system. If the origin of the model coincides with the origin of the world coordinates, then rotating about the model origin and rotating about the coordinate origin are the same thing. But if we translate first, so that the two origins no longer coincide, then applying the rotation matrix still rotates the vertices about the coordinate origin, whereas we generally expect them to rotate about the origin of the model. So we must rotate first and then translate. Similarly, the scaling transformation implicitly assumes it is centered at the coordinate origin, so we must also scale first and then translate.

As for the order of scaling and rotation: when we define the scaling matrix, the scaling factor of each component is defined with respect to the current coordinates. If the coordinates have already changed due to rotation, applying the earlier scaling factors becomes problematic. Rotation has no such issue, because it is defined as rotating any point by an angle about the coordinate origin; this definition applies to all coordinates, so there is no problem of the coordinates having changed in a way that invalidates the original definition.

Mathematical understanding

I will use translation-then-rotation and rotation-then-scaling as examples.

Suppose we translate first and then rotate:

Erratum: The matrix multiplication here should be the dot product

Notice the second half of the coordinates: from the result, the translation is no longer along the original direction; the translation direction has itself been rotated.

Suppose we rotate first and then scale:

From the transformed coordinates we can see that Sx acts not only on the original x component but also on the original y component, and the same happens with Sy. We also find that if Sx, Sy, and Sz are equal, the order of rotation and scaling no longer matters: the results are the same.
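Written out for the 2D case (a standard expansion matching the text), rotating first and then applying a non-uniform scale gives:

```latex
S R \begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} S_x & 0 \\ 0 & S_y \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} S_x (x\cos\theta - y\sin\theta) \\ S_y (x\sin\theta + y\cos\theta) \end{pmatrix}
```

Here Sx multiplies a mix of the original x and y components; if Sx = Sy, the scale factors out as a scalar and commutes with R.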

View transformation matrix

This is the matrix that transforms points from the world coordinate system to the observation (camera) coordinate system. As we learned in the last installment, we place a camera (viewpoint) in the world coordinate system, and the camera has an orientation. This is very much like what we did in the model transformation when placing objects in world coordinates: we put the camera somewhere by translation, then turn it toward a certain direction by rotation. Let P1 be the coordinates of a vertex in the observation coordinate system and P0 its coordinates in the world coordinate system; then we have:

Meanwhile, according to the properties of the two matrices:

  1. A matrix times its inverse equals the identity matrix
  2. The identity matrix times a matrix equals that matrix

We can easily derive the following process:

Therefore, to obtain the view transformation matrix we only need to find the inverse of the camera's translation matrix and the inverse of its rotation matrix and combine them; the result converts the coordinates P0 of a vertex in the world coordinate system into its coordinates P1 in the camera coordinate system.

Given that the coordinates of the camera are (ex, ey, ez), it is easy to write down the translation matrix based on the earlier material, and its inverse is easy to see, so I will not spell it out.
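For completeness, the standard pair is:

```latex
T =
\begin{pmatrix}
1 & 0 & 0 & e_x \\
0 & 1 & 0 & e_y \\
0 & 0 & 1 & e_z \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
T^{-1} =
\begin{pmatrix}
1 & 0 & 0 & -e_x \\
0 & 1 & 0 & -e_y \\
0 & 0 & 1 & -e_z \\
0 & 0 & 0 & 1
\end{pmatrix}
```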

Next, let's derive the rotation matrix.

First, define the three basis vectors Ux, Uy, and Uz of the camera coordinate system. The x, y, and z components of Ux in the world coordinate system are Uxx, Uxy, and Uxz respectively, and similarly for Uy and Uz. Then the transformation matrix R that takes a vertex from its coordinates P1 in camera space to its coordinates P0 in world space can be expressed in terms of the components of Ux, Uy, and Uz:

Recall how we obtained the rotation matrix earlier; but this time, instead of rotating the vertices, we rotate the coordinate frame, and then transform the vertices from the post-rotation frame back into the pre-rotation frame. For simplicity, start with a two-dimensional rotation.

What we need is the inverse of this transformation matrix. Since the three axes of the rotated camera coordinate system are mutually perpendicular, the matrix is orthogonal, and the inverse of an orthogonal matrix equals its transpose. The inverse matrix R⁻¹ is then:
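With the basis vectors written as the columns of R (in the component notation defined above), the inverse, i.e. the transpose, has them as rows:

```latex
R =
\begin{pmatrix}
U_{xx} & U_{yx} & U_{zx} \\
U_{xy} & U_{yy} & U_{zy} \\
U_{xz} & U_{yz} & U_{zz}
\end{pmatrix},
\qquad
R^{-1} = R^{T} =
\begin{pmatrix}
U_{xx} & U_{xy} & U_{xz} \\
U_{yx} & U_{yy} & U_{yz} \\
U_{zx} & U_{zy} & U_{zz}
\end{pmatrix}
```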

Plugging this into the matrix:

The last question is how to compute the components of these basis vectors in world coordinates.

First, we define the camera's position and orientation through two points (both in the world coordinate system) and the camera's up direction vector:

  1. The position of the camera: e (ex, ey, ez)
  2. The point the camera is looking at: T (Tx, Ty, Tz)
  3. The camera's up direction vector: u (ux, uy, uz)

It fits our real world, doesn’t it?

Then we define the observation direction via Uz; by vector subtraction and normalization, Uz is obviously:
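In symbols, the standard form is:

```latex
U_z' = e - T, \qquad U_z = \frac{U_z'}{\lVert U_z' \rVert}
```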

The second equation represents normalization. With Uz in hand, take its cross product with the up direction vector u and normalize to get Ux. Similarly, Uy is obtained from the cross product of Uz and Ux. We now have the entire view transformation matrix.
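Putting the pieces together, here is a minimal JavaScript lookAt sketch (the helper names viewMatrix, sub, cross, normalize, and apply are ours, not from the article). It computes Uz from e and T, Ux and Uy via cross products, places them as the rows of the transposed rotation, and folds in the inverse translation:

```javascript
function sub(a, b) { return a.map((x, i) => x - b[i]); }
function cross(a, b) {
  return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]];
}
function normalize(v) {
  const len = Math.hypot(...v);
  return v.map(x => x / len);
}

// e = camera position, t = look-at point, u = up vector, all in world space.
function viewMatrix(e, t, u) {
  const uz = normalize(sub(e, t));   // camera looks down -z, so Uz = e - t, normalized
  const ux = normalize(cross(u, uz));
  const uy = cross(uz, ux);          // already unit length
  // R^-1 = R^T has the basis vectors as ROWS; fold in the inverse translation.
  return [
    [...ux, -(ux[0]*e[0] + ux[1]*e[1] + ux[2]*e[2])],
    [...uy, -(uy[0]*e[0] + uy[1]*e[1] + uy[2]*e[2])],
    [...uz, -(uz[0]*e[0] + uz[1]*e[1] + uz[2]*e[2])],
    [0, 0, 0, 1],
  ];
}

// Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w].
function apply(m, v) { return m.map(row => row.reduce((s, a, i) => s + a * v[i], 0)); }

// A camera at (0,0,5) looking at the origin: the world origin should land
// 5 units in front of the camera, i.e. at z = -5 in camera space.
const V = viewMatrix([0, 0, 5], [0, 0, 0], [0, 1, 0]);
const p = apply(V, [0, 0, 0, 1]);
```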

Projection transformation matrix

As mentioned last time, there are orthographic projections and perspective projections; we will derive each in turn.

Orthographic projection

In orthographic projection, we use a cuboid to define the visible range:

We need to project a point P in our model onto the near plane as a point P′. The coordinates of P′ (x′, y′, z′) have the following characteristics:

  • x′ lies between −1 and 1
  • y′ lies between −1 and 1
  • z′ lies between −1 and 1

This means the transformation goes directly to the normalized device coordinate (NDC) system in one step.

Based on this property, we take x as an example and derive the relationship between x and x′:
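The standard derivation maps x from the interval [l, r] to [−1, 1]:

```latex
x' = \frac{x - l}{r - l} \cdot 2 - 1 = \frac{2x}{r - l} - \frac{r + l}{r - l}
```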

Where l is the position of the left plane and r that of the right plane. In the same way, we can find the relationships between y′ and y and between z′ and z. Substituting into the matrix yields the orthographic projection matrix:
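The standard orthographic matrix consistent with this derivation (following the usual OpenGL convention, in which the camera looks down −z, hence the sign flip in the z row) is:

```latex
M_{ortho} =
\begin{pmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{pmatrix}
```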

Where t, b, n, and f correspond to the positions of the top, bottom, near, and far planes respectively.

Perspective projection

Perspective projection is a little more complicated. Since the perspective projection's visible range is a frustum, we will need the principle of similar triangles. Recall:

Assuming the coordinate relations above, the x coordinate x1 of the intersection point P′ on the near plane n can be obtained as follows:

Once again, x′, y′, and z′ all lie between −1 and 1. Let us first derive x′ and y′. Taking x as an example, the normalized value of x1 on the near plane is:

We saw this formula above when deriving the orthographic projection. But unlike the orthographic case, x is not equal to x1; we can solve for x1 using the principle of similar triangles:

Where n is the position of the near plane. The minus sign appears here because the z-axis of the projection coordinate system is opposite to that of the camera coordinate system. The same applies to y′. So we have:

Next we solve for the relationship between z and z′. We know that when z is on the near plane, z in clip space is −1, and when z is on the far plane it is 1. Therefore:

So:

As you can see from the relations above, x′ and y′ each carry a z-dependent divisor. Remember the w component from last time? That divisor is the w component: the farther a point is from the near plane, the smaller x′ and y′ become.

Putting it all together, our matrix can be written as:
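A standard form consistent with this derivation (the usual OpenGL convention, with the camera looking down −z and n, f the positive near/far distances) is:

```latex
M_{persp} =
\begin{pmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{pmatrix}
```

Note the last row (0, 0, −1, 0): it is what makes the output w component equal to −z.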

You can also see from the matrix that, after the transformation, the homogeneous coordinate that was 1 becomes −z.

Conclusion

In this installment we derived a variety of commonly used matrices. I am also still learning this material as I write, so there are bound to be mistakes and omissions; please point them out in the comments section.

References

  1. Introduction and Practice of WebGL
  2. LearnOpenGL
  3. Interactive Computer Graphics: A Top-Down Approach with WebGL (7th edition)
  4. DirectX: why is a 4×4 matrix used to represent 3D coordinates?
  5. Mathematical derivation of OpenGL matrix transformations
  6. The first hurdle of WebGL development: matrices and coordinate transformations
  7. Derivation of the computer graphics view matrix