I remember being in a daze over my first native WebGL project. With no graphics background, I just copied examples to piece together the data the vertex shader needed, never understanding why the console printed decimals between 0 and 1 or why the shader wanted a transformation matrix. Still, I was quite happy once the feature worked.

As I gradually learned more, I found that I had lacked the most basic knowledge of graphics rendering at that time.

For front-end developers, graphics rendering is a relatively niche direction with a somewhat higher barrier to entry than the rapidly iterating world of web frameworks: besides solid programming skills, it also calls on advanced mathematics and computer graphics. However, as browser rendering engines become more capable and the bar for interactive effects keeps rising, and with the help of libraries such as Three.js, 3D rendering has gradually become an important skill for front-end development.

So, since we want to do graphics rendering, let's find out what it is and why it works the way it does.

Main text 📚

3D rendering, as the name implies, means drawing a three-dimensional object that could be observed from many angles onto a fixed two-dimensional rectangular screen. So how does 3D get mapped to 2D? There must be some coordinate transformation involved. This is the most basic piece of graphics-rendering knowledge I mentioned above: coordinate transformation.

First, here is a WebGL/OpenGL coordinate-transformation flowchart I drew, just to get a feel for the whole pipeline:

Isn’t that too complicated?

Imagine a common scene in life — taking photos.

  1. You spot a good place for a photo, stand there, and strike a pose. (Model Transform)

  2. The photographer finds a good angle and points the camera at you. (View Transform)

  3. The photographer adjusts the focus and frames out distractions. (Projection Transform)

This is the so-called MVP transformation, the heart of the whole coordinate-transformation pipeline.

Now that we have a rough idea, let's take a look at what these transformations are all about.

1. Object coordinate system -> World coordinate system (model transformation)

Suppose we want to render a cat and a dog on the screen. The cat model and the dog model each have their own coordinate system, which is the object coordinate system.

The world coordinate system can be understood as the virtual scene space shared by all the models. The cat and dog models need to be imported into this big scene and given their respective positions, which means translating and rotating each object to place it where it belongs.

The operation applied to the object here is the Model Transform: multiplying the object's coordinates by the Model matrix gives its coordinates in the world coordinate system.

Remark:

In actual shader code, you will notice that the vertex coordinates received by the shader have an extra dimension with value 1 appended to the original (x, y, z), making them homogeneous coordinates. The code looks like this:

attribute vec3 a_position; // vertex position in object space (x, y, z)

void main() {
    // Append w = 1.0 to form a homogeneous coordinate before any matrix math.
    gl_Position = vec4(a_position, 1.0);
}

The reason homogeneous coordinates are used is that translation, unlike scaling and rotation, is not a linear transformation. To handle translation with the same matrix multiplication, one extra dimension is added to carry it; linear transformations combined with translation are collectively called affine transformations.
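
To make this concrete, here is the standard 4x4 translation matrix acting on a homogeneous point; the extra 1 in the last component is exactly what lets the offsets t_x, t_y, t_z enter through an ordinary matrix multiplication:

\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{pmatrix}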

2. World coordinate system -> View space (view transformation)

Just like taking a photo, a 3D scene can be observed from many angles, but the screen cannot show every viewpoint at once; what ends up being presented is the specific view captured after a camera, simulating the human eye, frames the scene. So coordinates need to be transformed into the view space whose origin is the Camera. This process is called the View/Camera Transformation.

One thing is easy to see: if the relative position of the object and the camera stays unchanged, moving the camera and the object together produces exactly the same picture.

Therefore, for convenience of calculation, we first agree on the camera's three defining elements: its position at the origin (0, 0, 0), its viewing direction along -z, and its up direction along +y.

As shown in the picture above, the camera is moved to this agreed position by a translation plus a rotation, with the objects moved along so that everything stays relative. Multiplying the rotation matrix by the translation matrix gives the view transformation matrix M_view = R_view · T_view.

Remark:

1. View space uses a right-handed coordinate system; the z axis runs straight out from the camera, so a point's z value encodes its distance from the camera, i.e. its depth.

2. How do we get the rotation matrix R_view? Writing down the rotation that takes an arbitrary gaze direction g onto -z is not easy, but the other way around is: the inverse rotation simply maps the standard axes onto the camera's axes. So we write down that inverse first, and then use the fact that the inverse of an orthogonal matrix equals its transpose to obtain R_view.

3. With the camera position agreed upon this way, and combined with the previous step, the transformation is in practice applied to the objects. The two matrices can be merged into one, commonly known as the ModelView Transform; see the sketch below.
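
Here is a minimal sketch of building and combining these matrices on the JavaScript side, assuming the popular gl-matrix library (any 4x4 matrix utility would do; the numbers are made up for illustration):

import { mat4 } from 'gl-matrix';

// Model matrix: place the cat model in the world (translate, then rotate around y).
const model = mat4.create();
mat4.translate(model, model, [2, 0, -5]); // move the cat to its spot in the scene
mat4.rotateY(model, model, Math.PI / 4);  // and turn it 45 degrees

// View matrix: a camera at eye, looking at center, with +y as the up direction.
// lookAt builds exactly the R_view * T_view matrix described above.
const view = mat4.create();
mat4.lookAt(view, [0, 2, 8], [0, 0, 0], [0, 1, 0]);

// ModelView: merge the two so each vertex only needs one matrix multiplication.
const modelView = mat4.create();
mat4.multiply(modelView, view, model); // note the order: view * model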

3. View space -> Clip space (projection transformation)

In view space, we know that only objects inside the view frustum are visible to the camera. So how do these visible 3D objects get mapped onto a 2D plane? That is what the next step, the Projection Transform, is for.

Projection transformations come in two kinds: Orthographic Projection and Perspective Projection, and the difference between them is shown in the figure below. An orthographic projection defines a box, while a perspective projection defines a frustum whose top, bottom, left and right faces are not parallel and whose near and far faces differ in size. As a result, near and far objects keep the same size under an orthographic projection, while a perspective projection produces the effect of near objects looking larger and far objects smaller.

Mapping everything between the near and far clipping planes onto the near plane is easy in the orthographic case, since the box simply becomes the standard cube; it is more complicated for perspective projection, where the far plane is larger than the near plane. So we break the problem apart: first squeeze the top, bottom, left and right faces of the frustum into a box, just like the orthographic volume, and then apply the orthographic projection.

The squeeze follows three agreed rules: ① points on the near plane stay unchanged; ② the z value of the far plane stays unchanged; ③ the center point of the far plane stays unchanged.

Combining these rules with the properties of similar triangles, the matrix that turns the perspective frustum into the orthographic box can be derived.
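
For reference, here is the commonly derived form of that squeeze matrix (a sketch following the usual derivation, where n and f denote the z values of the near and far planes; the first two rows come from similar triangles and the third from rules ② and ③):

M_{persp \to ortho} =
\begin{pmatrix}
n & 0 & 0 & 0 \\
0 & n & 0 & 0 \\
0 & 0 & n + f & -nf \\
0 & 0 & 1 & 0
\end{pmatrix}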

In addition, two quantities are defined for the view frustum: ① the aspect ratio = width / height; ② the vertical field-of-view angle fovY.

The resulting perspective matrix takes the following parameters:

  • near: the distance to the near clipping plane
  • far: the distance to the far clipping plane
  • fov: the vertical opening angle of the view frustum (the larger the field of view, the smaller objects appear)
  • aspect: the aspect ratio of the camera (this is what keeps the model from being distorted when the canvas is resized)
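
As a sketch of how these parameters become an actual matrix, here is a hand-rolled version of the standard OpenGL/WebGL-style perspective matrix in column-major order (gl-matrix's mat4.perspective produces essentially the same result); cvs stands for the canvas element and is an assumed variable:

// fov in radians, aspect = width / height, near and far are positive distances.
function perspective(fov, aspect, near, far) {
  const f = 1.0 / Math.tan(fov / 2);
  const nf = 1 / (near - far);
  // Column-major 4x4, the layout WebGL expects.
  return new Float32Array([
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) * nf, -1, // the -1 copies -z into w, flipping the z axis
    0, 0, 2 * far * near * nf, 0,
  ]);
}

const projection = perspective(Math.PI / 3, cvs.width / cvs.height, 0.1, 100);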

So far, the projection transformation has given us clip-space coordinates. During the conversion the x, y and z components are scaled and shifted to different degrees: x and y correspond to the horizontal and vertical directions of the screen, and z is the depth perpendicular to the screen. Clipping then compares the transformed x, y, z against w: values inside the range [-w, w] are kept, everything else is discarded.
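
A minimal sketch of that test for a single vertex (in reality the GPU clips whole primitives against the six planes, so this only illustrates the comparison):

// clipPos is [x, y, z, w] as output by the vertex shader (gl_Position).
function insideClipVolume([x, y, z, w]) {
  return -w <= x && x <= w &&
         -w <= y && y <= w &&
         -w <= z && z <= w;
}

insideClipVolume([0.5, -0.2, 0.9, 1.0]); // true: kept
insideClipVolume([3.0,  0.0, 0.0, 1.0]); // false: outside the frustum, clipped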

Remark:

1. Because matrix multiplication is expensive and matrices obey the associative law, we usually multiply the matrices together first to get a single matrix, which is then multiplied by each point's coordinate vector (see the sketch after this list).

2. A property worth remembering: multiplying every component of the homogeneous coordinate (x, y, z, 1) by a constant k not equal to 0 gives (kx, ky, kz, k), which represents the same point as (x, y, z, 1) in 3D space.

3. Note that the perspective matrix flips the z axis: the clip-space coordinate system is left-handed (the z axis points away from the observer, into the screen).
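
Continuing the earlier sketches (again assuming gl-matrix; gl and program stand for the WebGL context and the linked shader program, and u_mvpMatrix is a hypothetical uniform name), remark 1 looks like this in code:

// Combine once on the CPU: MVP = P * (V * M); modelView already holds V * M.
const mvp = mat4.create();
mat4.multiply(mvp, projection, modelView);

// Upload the single combined matrix, so the vertex shader only has to do
// gl_Position = u_mvpMatrix * vec4(a_position, 1.0);
const mvpLocation = gl.getUniformLocation(program, 'u_mvpMatrix');
gl.uniformMatrix4fv(mvpLocation, false, mvp);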

4. Clip space -> Normalized device coordinates (homogeneous division)

After the previous transformations, we are left with the clipped portion of the scene inside the view frustum. The next step is to map the objects in the frustum onto the near plane so they can be displayed on a 2D surface.

The coordinates mapped onto the near plane are described by hardware-independent Normalized Device Coordinates (NDC). This conversion is called homogeneous division or perspective division: the x, y and z components are each divided by the w component, which maps them into the range [-1, 1]^3. Taking the x axis as an example, the conversion formula is as follows:
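
Spelled out in the standard form (using clip and ndc subscripts for clarity):

x_{ndc} = \frac{x_{clip}}{w_{clip}}, \qquad -w_{clip} \le x_{clip} \le w_{clip} \;\Rightarrow\; x_{ndc} \in [-1, 1]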

Until now, to make affine transformations convenient, we have been using homogeneous coordinates (x, y, z, w); the homogeneous division converts them back to Cartesian coordinates (x, y, z). As shown in the figure below, the transformed scene sits in a cube centered at (0, 0, 0) spanning 2 units along each of x, y and z, and the z axis has been flipped (from right-handed to left-handed).

Remark:

1. After the projection transformation and the homogeneous division, object coordinates are squeezed into the range [-1, 1]^3, which stretches the object; the viewport transformation that follows stretches it back.

5. Normalized device coordinates -> Screen coordinates (viewport transformation)

The last step is the viewport transformation, which maps coordinates from the NDC cube [-1, 1]^3 to viewport coordinates (screen coordinates) so pixels can be rendered on the screen.

The surface WebGL draws on is a Canvas element, whose lower-left corner is defined as the coordinate origin and whose upper-right corner has the pixel coordinates (pixelWidth, pixelHeight).

For x and y, the transformation maps [-1, 1]^2 to [0, width] x [0, height]: x and y are scaled and then shifted relative to the screen origin, as shown in the matrix below, giving a one-to-one correspondence between normalized device coordinates and the pixels of the screen window.
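
A common form of that viewport matrix is sketched here (z is left untouched, since it is kept for the depth test):

M_{viewport} =
\begin{pmatrix}
\frac{width}{2} & 0 & 0 & \frac{width}{2} \\
0 & \frac{height}{2} & 0 & \frac{height}{2} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}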

In WebGL there is a direct way to set the viewport:

// Lower-left corner at (0, 0), then the viewport width and height in pixels.
gl.viewport(0, 0, this.cvs.width, this.cvs.height);

Wait, what about the z coordinate? The z component produced by the homogeneous division carries the depth information; it is used by the Z-buffer algorithm, which performs a depth test for each pixel to achieve correct occlusion. That belongs to shading, which this article will not go into.

Conclusion

Wow, well done for having the patience to read all the way here ~

Finally, here is a summary diagram. Having read through the transformation principles above, doesn't this picture suddenly make sense?!

As for how shaders in WebGL apply the coordinate transformations above and combine them with textures to render the final image, here is a picture for now; the details will be covered later in a piece on shading principles.