Starting with the CPU

The central processing unit (CPU) is the brain of the computer. It provides the instruction set through which the programs we write ultimately control the operation of the computer.

The CPU decodes each instruction and executes it in its logic circuits. Execution is divided into several stages, which together form a pipeline. The classic instruction pipeline has five stages: instruction fetch, decode, execute, memory access, and write back. Together these make up one instruction cycle, and the CPU runs instruction cycles continuously to complete its tasks.
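
As a rough illustration of such a cycle, here is a minimal sketch of a toy interpreter in Python. The three operations (`LOAD`, `ADD`, `STORE`), the register name `r0`, and the `run` function are all hypothetical, invented for this example; a real CPU also overlaps the stages of different instructions instead of finishing one instruction before starting the next.

```python
# Toy model of an instruction cycle. The instruction set and every name
# here are made up for illustration; this is not a real ISA.

def run(program, memory):
    registers = {"r0": 0}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]          # fetch
        pc += 1
        op, reg, addr = instruction        # decode
        if op == "LOAD":
            registers[reg] = memory[addr]  # memory access, write back to register
        elif op == "ADD":
            operand = memory[addr]         # memory access
            registers[reg] = registers[reg] + operand  # execute, write back
        elif op == "STORE":
            memory[addr] = registers[reg]  # write back to memory
        else:
            raise ValueError(f"unknown operation: {op}")
    return registers


# memory[2] = memory[0] + memory[1]
memory = [2, 3, 0]
program = [("LOAD", "r0", 0), ("ADD", "r0", 1), ("STORE", "r0", 2)]
registers = run(program, memory)
```

After the three instructions run, the sum of the first two memory cells is stored in the third.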

Instructions and data are first loaded into memory and then fetched into the CPU as the program runs. Memory access is fairly fast, but it is still slow compared with the speed at which the CPU executes instructions. To narrow this gap, CPUs are designed with three levels of cache: L1, L2, and L3.

As shown in the figure, each core of a multi-core CPU has its own L1 and L2 caches, and all cores share the L3 cache. Capacity increases from L1 to L3 while speed decreases, but even L3 is still faster than accessing memory.

With these cache levels, the gap between the CPU's execution speed and memory access speed is narrowed. Instead of going to memory for every access, the CPU loads one cache line, typically 64 bytes of data, into the cache at a time. Accesses to neighboring data can then be served directly from the cache.
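
To make the benefit for neighboring data concrete, here is a small sketch that counts how many cache lines a sequence of memory accesses touches. It assumes 64-byte lines and an idealized cache that never evicts anything; the function names `cache_line_index` and `count_line_loads` are hypothetical.

```python
CACHE_LINE_SIZE = 64  # bytes per cache line, the size cited in the text


def cache_line_index(address):
    """Return the index of the cache line that a byte address falls into."""
    return address // CACHE_LINE_SIZE


def count_line_loads(addresses):
    """Count the distinct cache lines touched (idealized: nothing is evicted)."""
    return len({cache_line_index(address) for address in addresses})


# 16 four-byte integers stored back to back, starting at address 0
sequential = [i * 4 for i in range(16)]
# 16 integers that are each 64 bytes apart
strided = [i * 64 for i in range(16)]

sequential_loads = count_line_loads(sequential)
strided_loads = count_line_loads(strided)
```

In this model the 16 adjacent integers all fit in a single cache line, while the 16 widely spaced integers each need a line of their own.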

Data and instructions are loaded from memory into the CPU's cache. The control unit then directs the decoding and execution of the instructions, the arithmetic unit carries out the operations, and the results are written back to memory. This is how the CPU works.

In this simplified model, each CPU core executes one thread at a time: a single control flow operating on a single data flow. This architecture makes the CPU inefficient in some scenarios, such as rendering 3D scenes.

3D rendering process

3D rendering starts from a 3D model, which consists of a series of vertices in 3D space. Every three vertices form a triangle, and the triangles formed by all the vertices are joined together to make up the model.

Vertices and triangles are the basis of 3D. The 3D engine first computes the vertex data to determine the shape of the 3D graphics. Each face is then texture-mapped: a different texture can be painted onto each triangle.
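
One common way to store such a model is a list of vertices plus a list of triangles that refer to vertices by index. The sketch below uses that layout for a unit square made of two triangles; the variable names and the `triangle_area` helper are assumptions made for illustration and are not taken from any particular 3D engine.

```python
# Four corners of a unit square in 3D space, as (x, y, z) tuples.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Each triangle is three indices into the vertex list.
# The two triangles share vertices 0 and 2.
triangles = [(0, 1, 2), (0, 2, 3)]


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


total_area = sum(
    triangle_area(*(vertices[index] for index in triangle))
    for triangle in triangles
)
```

Sharing vertices by index means a corner used by several triangles is stored and computed only once.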

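To display 3D graphics on a two-dimensional screen, the model has to be projected onto the screen and converted into pixels, a process called rasterization. (A raster is the grid of pixels that makes up the screen image; rasterizing here means turning the 3D projection into that grid.)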
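
The projection step can be sketched as a simple perspective projection. This is only one possible convention, assumed here for illustration: the camera sits at the origin looking along +z, and the `project` function, its `focal_length` parameter, and the 640x480 screen size are all hypothetical.

```python
def project(vertex, focal_length=1.0, width=640, height=480):
    """Project a camera-space point (x, y, z) to pixel coordinates.

    Assumes the camera is at the origin looking along +z, so visible
    points have z > 0.
    """
    x, y, z = vertex
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: points farther away move toward the center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Map the range [-1, 1] to pixels; screen y grows downward, so flip it.
    pixel_x = (ndc_x + 1.0) * 0.5 * width
    pixel_y = (1.0 - ndc_y) * 0.5 * height
    return pixel_x, pixel_y


center = project((0.0, 0.0, 5.0))
upper_right = project((1.0, 1.0, 2.0))
```

A point straight ahead of the camera lands in the middle of the screen, and the same x and y offset shrinks on screen as z grows.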

Rasterization calculates the color of each pixel of the 3D graphics projected onto the screen. After all the pixels are calculated, they are written to the frame buffer in video memory. After one frame is rendered, it continues to calculate the color of the next frame.

In other words, the 3D rendering process is:

  • Compute the vertex data to form the 3D graphics
  • Map a texture onto each triangle
  • Project onto the two-dimensional screen and calculate the color of each pixel (rasterization)
  • Write the frame of data to the frame buffer in video memory
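
The last two steps can be sketched with a very small software rasterizer. It tests the center of every pixel against the three edges of an already projected 2D triangle and writes a flat color into a frame buffer. The `edge` and `rasterize` functions are illustrative assumptions; real GPUs use far more elaborate and heavily parallel versions of this idea, and textures are left out here.

```python
def edge(a, b, p):
    """Signed area term: its sign tells which side of the edge a->b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, color, width, height):
    """Fill the pixels covered by a 2D triangle and return the frame buffer."""
    a, b, c = triangle
    framebuffer = [[(0, 0, 0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(b, c, p)
            w1 = edge(c, a, p)
            w2 = edge(a, b, p)
            # Inside when all three tests agree in sign (either winding order).
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (
                w0 <= 0 and w1 <= 0 and w2 <= 0
            )
            if inside:
                framebuffer[y][x] = color
    return framebuffer


RED = (255, 0, 0)
frame = rasterize(((0, 0), (4, 0), (0, 4)), RED, width=4, height=4)
covered = sum(pixel == RED for row in frame for pixel in row)
```

Note that every pixel is tested independently of every other pixel, which is what makes this work easy to spread across many cores.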

The number of vertices is very large, and a CPU can only compute them sequentially, one by one, so it struggles with this kind of 3D rendering. This is why hardware dedicated to computing such 3D data in parallel, the GPU, emerged.

The composition of the GPU

Unlike CPUs, which process data one item at a time, GPUs are parallel, with hundreds or thousands of cores computing at the same time.

A GPU also fetches, decodes, and executes instructions, but each instruction performs N calculations in parallel. It is a single control flow with multiple data flows, whereas a CPU is a single control flow with a single data flow.
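
The difference can be simulated in plain Python by counting instruction steps. This is only a model: Python still runs the inner work sequentially, and the `scalar_scale` and `simd_scale` functions and the `lanes` parameter are hypothetical names. The idea of one instruction applied to several data elements is commonly called SIMD (single instruction, multiple data).

```python
def scalar_scale(values, factor):
    """CPU-style model: each step handles one data element."""
    result = []
    steps = 0
    for value in values:
        result.append(value * factor)
        steps += 1
    return result, steps


def simd_scale(values, factor, lanes=4):
    """GPU-style model: each step applies the same operation to `lanes` elements."""
    result = []
    steps = 0
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        # Conceptually simultaneous; Python only simulates the parallelism.
        result.extend(value * factor for value in chunk)
        steps += 1
    return result, steps


values = list(range(8))
scalar_result, scalar_steps = scalar_scale(values, 2)
simd_result, simd_steps = simd_scale(values, 2, lanes=4)
```

Both versions produce the same output, but the second needs only a quarter of the steps when four elements are processed per instruction.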

So, for 3D rendering scenes that have to compute tens of thousands of vertices and pixels, the GPU is much more efficient than the CPU.

But is the GPU better at everything? Not quite.

CPU and GPU differences

CPUs are general-purpose and can handle all kinds of logic and calculation, while GPUs are mainly used to compute large batches of repetitive tasks in parallel and are poorly suited to complex logic.

As shown in the figure above, control units and caches take up a large part of a CPU, while in a GPU they are much smaller and most of the chip is devoted to compute units.

By analogy, the CPU is like a college student who can solve all kinds of problems but is not especially fast at doing 10,000 additions, while the GPU is like a large group of primary school students who cannot solve hard problems but finish the additions quickly because there are so many of them.

That is to say, if the logic is complex, only the CPU will do, but if the amount of calculation is large and each calculation is the same, the GPU is the better fit.

3D rendering involves many of these repetitive but simple calculations, such as vertex data and rasterized pixel data, and the GPU can perform hundreds or thousands of them at a time.

OpenGL, WebGL, CSS hardware acceleration

The graphics card integrates a graphics processing unit (GPU) and comes with drivers. To use the GPU's capabilities, you go through the driver's API. There is an open standard for the GPU API called OpenGL, which has more than 300 functions for drawing various graphics. (Windows also has its own API, called DirectX.)

To draw 3D graphics in a web page we use the WebGL API. WebGL is itself based on OpenGL (specifically OpenGL ES), and the browser's WebGL implementation ultimately drives the GPU to render.

Most CSS styles are computed by the CPU, but CSS also has some 3D and animation styles. Computing these involves a large number of repetitive calculations, so the work can be handed to the GPU.

The browser uses GPU rendering for the following CSS properties:

  • transform
  • opacity
  • filter
  • will-change

The browser divides the content into different layers, renders them separately, and then composites them together. Triggering GPU rendering creates a new layer and hands the calculation of the element's style to the GPU.

opacity has to change the value of every pixel, which fits the pattern of large, repetitive computation, so a new layer is created and handed to the GPU for rendering. transform is typically used for animation, where each style value is computed repeatedly and in large numbers, so GPU acceleration is used by default. The same is true for filter.
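
To see why opacity is per-pixel work, here is a minimal sketch of blending a layer over a background with a uniform opacity. It assumes opaque RGB pixels and the simple formula `layer * opacity + background * (1 - opacity)`; the `apply_opacity` function is hypothetical and is not how any particular browser implements compositing.

```python
def apply_opacity(layer, background, opacity):
    """Blend every layer pixel over the background pixel at the same position."""
    return [
        [
            tuple(
                round(l * opacity + b * (1 - opacity))
                for l, b in zip(layer_pixel, background_pixel)
            )
            for layer_pixel, background_pixel in zip(layer_row, background_row)
        ]
        for layer_row, background_row in zip(layer, background)
    ]


# A 2x2 layer and background, each pixel an (r, g, b) tuple.
layer = [[(200, 100, 0)] * 2 for _ in range(2)]
background = [[(0, 100, 200)] * 2 for _ in range(2)]
blended = apply_opacity(layer, background, 0.5)
```

The same three multiply-and-add operations run for every pixel, with no dependence between pixels, which is the kind of workload the article describes as suitable for the GPU.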

Note that GPU hardware acceleration requires creating a new layer, and moving an element to a new layer is a time-consuming operation that may cause a flicker, so it is best done in advance. will-change tells the browser ahead of time to put the element on its own layer from the start, so that no new layer has to be created later when GPU rendering kicks in.

Sometimes we want to force hardware rendering. We can use the properties above to do so, for example:

will-change: transform; 

or

transform: translate3d(0, 0, 0);

Chrome DevTools can show whether an element is rendered by the CPU or the GPU. Open the Rendering panel and check Layer borders to see blue and yellow boxes. The blue ones are CPU rendered, and the yellow ones are GPU rendered.

For example, there is no separate layer for this text:

Add a will-change: transform property and the browser will create a new layer to render the element and use the GPU to render it:

GPU hardware acceleration can reduce the load on the CPU and make rendering smoother, but it also increases memory consumption. Hardware acceleration is enabled by default for transform, opacity, and filter. In other cases, it is best used only when necessary.

OpenCL and neural networks

Is 3D rendering the only scenario with massive, repetitive computing tasks?

No, machine learning in the field of AI is another typical case. It involves a large number of neurons that need to be computed, but each calculation is relatively simple, which makes it well suited to the GPU.
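
As a sketch of what "many simple calculations" means here, the following computes one small fully connected layer: each neuron takes a weighted sum of the inputs, adds a bias, and applies an activation. The ReLU activation, the `dense_layer` function, and the example numbers are assumptions chosen for illustration; the article does not name a specific network.

```python
def relu(x):
    """A common activation function: negative values become zero."""
    return x if x > 0 else 0.0


def dense_layer(inputs, weights, biases):
    """Compute every neuron's output; each neuron is independent of the others."""
    return [
        relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


inputs = [1.0, 2.0]
weights = [
    [1.0, 1.0],   # neuron 0
    [-1.0, 0.5],  # neuron 1
    [0.0, -2.0],  # neuron 2
]
biases = [0.0, 0.0, 1.0]
outputs = dense_layer(inputs, weights, biases)
```

Each neuron only needs multiplications and additions, and no neuron in the layer depends on another, so a real network with thousands of neurons per layer maps naturally onto many parallel compute units.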

Today's GPUs are not only capable of graphics rendering but also offer general programming capabilities. The standard for this part of the API is OpenCL. The GPU's parallel computing power can be used to run tasks that involve a lot of computation but little logic, often more efficiently than the CPU can.

Conclusion

The CPU provides the instruction set and continuously runs the instruction cycle of instruction fetch, decode, execute, memory access, and write back, which controls the operation of the computer.

To narrow the speed gap between the CPU and memory, a multi-level cache system (L1, L2, and L3) was introduced. From L1 to L3 the capacity gets larger and the access speed gets slower, but all three are still faster than memory access. Memory is read into the cache one cache line (64 bytes) at a time for the CPU to access.

The 3D rendering process computes the data of each vertex, connects the vertices into triangles, maps textures onto them, and then computes the color of each pixel projected onto the two-dimensional screen, which is known as rasterization. Finally, the result is written into the frame buffer in video memory, and rendering proceeds frame by frame.

CPU computations are performed sequentially, which is a poor fit for 3D rendering with its large number of vertices and pixels to compute. This is why GPUs appeared.

GPUs have hundreds or thousands of computing units. Compared with CPUs, they cannot handle complex logic, but they can perform a large number of repeated calculations in parallel. OpenGL provides a standard API for them.

In CSS, GPU-accelerated rendering can be used to reduce the load on the CPU and make the page feel smoother. By default, transform, opacity, and filter create new layers and hand them to the GPU for rendering. For such elements you can use will-change: <property name>; to tell the browser to render the element on its own layer from the start.

The parallel computing power of the GPU is useful not only for 3D rendering but also for machine learning, where the GPU can be controlled through the OpenCL APIs.

The GPU is closely related to the front end, whether it is WebGL, CSS hardware acceleration, or web page performance. I hope this article can help you understand the principle and application of GPU.