This first section is history; skip ahead if you want to get straight to the point.

1. Why WebGPU and not WebGL 3.0

If you dig down to the roots of Web graphics technology, you can trace it back to the OpenGL API proposed in the 1990s, and you will see that WebGL is indeed based on OpenGL ES. OpenGL came into its own at a time when graphics cards were weak.

Drivers

We all know that a graphics card needs a driver installed; through the API the driver exposes, we can operate the GPU and have the graphics processor do work for us.

The problem is that graphics drivers, like assembly in the general programming world, sit at the bottom layer and are hard to program against directly, so the major vendors wrapped these low-level operations.

A simple chronology of graphics APIs

OpenGL does exactly that, exposing high-level interfaces and dealing with the underlying graphics drivers, but its design style is notoriously out of step with modern GPUs.

Microsoft’s latest graphics API is Direct3D 12, Apple’s is Metal, and a well-known standards body called Khronos maintains Vulkan. D3D12 currently rules on Windows and Xbox, Metal on Mac and iPhone, and Vulkan you’ve probably seen in Android phone reviews. These three are known as the three modern graphics APIs, and they are closely tied to modern graphics hardware, whether PC or mobile.

Why WebGL runs in all browsers

Oh, and I forgot to mention that stewardship of OpenGL was handed over to Khronos in 2006, and by now almost no operating system still treats this old API as a first-class citizen.

The question then arises: why can WebGL, which is based on OpenGL ES, run in the browsers of every operating system?

The answer is that WebGL no longer sits on OpenGL ES everywhere: on Windows the calls are translated to D3D at the driver level, and on macOS to Metal. But the closer we get to the present, the harder this unfavored-stepchild arrangement is to keep up.

Apple’s Safari browser only gained WebGL 2.0 support in recent years, and it dropped the GPGPU-related features of OpenGL ES; we may never see WebGL 2.0 GPGPU in Safari. Apple is busy with Metal and its even greater M-series chips.

The origin of the WebGPU name

So, with all that said, is it clear why the next generation of Web graphics API won’t be called WebGL 3.0? It is not GL any more. To keep the modern giants from fighting over the name, the more hardware-oriented name WebGPU was adopted. In coding style and performance, WebGPU and WebGL are not even of the same era.

As an aside, OpenGL is not without learning value and will be around for a while, as will WebGL.

2. Coding style compared with WebGL

WebGL is essentially the shadow of OpenGL, and OpenGL’s style had a great influence on WebGL’s.

Those of you who have studied the WebGL API know one thing: the gl variable, specifically a WebGLRenderingContext object (in WebGL 2.0, a WebGL2RenderingContext).

OpenGL coding style

Whether you are manipulating shaders or VBOs, or creating Buffer and Texture objects, you basically go through the gl variable function by function. For example, here is the code that creates two shaders, compiles them, and links them:

const vertexShaderCode = `
attribute vec4 a_position;
void main() {
  gl_Position = a_position;
}
`

const fragmentShaderCode = `
precision mediump float;
void main() {
  gl_FragColor = vec4(1, 0, 0.5, 1);
}
`

const vertexShader = gl.createShader(gl.VERTEX_SHADER)
gl.shaderSource(vertexShader, vertexShaderCode)
gl.compileShader(vertexShader)
const fragmentShader = gl.createShader(gl.FRAGMENT_SHADER)
gl.shaderSource(fragmentShader, fragmentShaderCode)
gl.compileShader(fragmentShader)

const program = gl.createProgram()
gl.attachShader(program, vertexShader)
gl.attachShader(program, fragmentShader)
gl.linkProgram(program)

// You also need to explicitly specify which program to use
gl.useProgram(program)
// Continue to manipulate vertex data and trigger drawing
// ...

The three WebGL calls that create a shader, assign its source code, and compile it must be written this way. At most you can swap the order in which the vertex shader and fragment shader are created and compiled, but both must be finished before the program is linked.

The CPU overhead problem

Some will say that doesn’t matter: you can wrap it all in a JavaScript function, hide the procedural details, and just pass parameters. True, that is a good wrapper, many JS libraries have done it, and they are very useful.

But a big gap remains, and that gap is OpenGL itself.

Every call to a gl.xxx function completes a signal trip from CPU to GPU, and the GPU’s state changes immediately. Anyone familiar with computer fundamentals knows how much the physical distance between components costs in time; the industry has spent decades laboring over signal transmission. Every one of the GL calls above that changes GPU state travels roughly the whole long road of CPU ~ bus ~ GPU.

We all know it is better to do everything in one batch than to make so many round trips, yet round trips are exactly how OpenGL works. Why was it designed that way instead of submitting everything at once? Historical reasons: when OpenGL was popular, GPUs were not so complex, so the API did not need such an advanced design.

To sum up, WebGL carries a CPU-overhead hazard, and that is determined by OpenGL’s state-machine design.
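To make the cost concrete, here is a minimal sketch that counts how many separate driver round trips a naive per-object draw loop performs. The `gl` object below is a hypothetical mock, not a real WebGLRenderingContext; the call names are typical WebGL 1.0 functions:

```javascript
// Mock gl object: every gl.xxx(...) call stands in for one CPU -> driver -> GPU trip.
let driverCalls = 0
const gl = new Proxy({}, {
  get: () => () => { driverCalls++ } // any method call increments the counter
})

function drawOneObject() {
  gl.useProgram()
  gl.bindBuffer()
  gl.vertexAttribPointer()
  gl.enableVertexAttribArray()
  gl.uniformMatrix4fv()
  gl.drawArrays()
}

// Drawing 100 objects in a frame means 600 separate state-changing calls,
// each paying the CPU ~ bus ~ GPU round-trip cost described above.
for (let i = 0; i < 100; i++) drawOneObject()
console.log(driverCalls) // 600
```

Real drivers batch some of this internally, but the API contract is still one state change per call, which is exactly the overhead the modern APIs avoid.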

That is not the case with the three big modern graphics APIs. They tend to get everything ready first; what is finally presented to the GPU is a complete blueprint plus buffered data that the GPU can simply take and focus on executing.

WebGPU’s assembly-style coding

WebGPU also has a head object: device, of type GPUDevice, a high-level abstraction representing the GPU device. It is responsible for creating the various objects involved in graphics operations, which are finally assembled into a command buffer (GPUCommandBuffer) and submitted to the queue; with that, the CPU side’s work is done.

So when you create an object with device.createXXX, the GPU is not notified of a state change right away, as it would be in WebGL. Instead, the CPU-side code logically assembles, piece by piece, the objects that will be handed to the GPU, each slotted precisely into its place, ready to be committed to the GPU.

At that point the command buffer carries complete data (geometry, textures, shaders, pipeline scheduling logic, and so on), and the GPU knows what to do the moment it receives it.

// In asynchronous functions
const device = await adapter.requestDevice()
const buffer = device.createBuffer({
  /* Assembles geometry, passes in-memory data, and eventually becomes resources like vertexAttribute and Uniform */
})
const texture = device.createTexture({
  /* Assembly texture and sampling information */
})

/* Create shader module */
const vertexShaderModule = device.createShaderModule({ /* ... */ })
const fragmentShaderModule = device.createShaderModule({ /* ... */ })

// A compute shader uses a shader module too:
// const computeShaderModule = device.createShaderModule({ /* ... */ })

const bindGroupLayout = device.createBindGroupLayout({
  /* Create a layout object for the binding group */
})

const pipelineLayout = device.createPipelineLayout({
  /* Pass the binding group layout object */
})

/* The two layout objects above can actually be skipped. Although a bind group
   needs a bind group layout to tell the pipeline stages what the bound
   resources look like, the pipeline can infer each bind group layout from the
   code of its programmable stages. This example keeps the full, explicit
   process. */

const pipeline = device.createRenderPipeline({
  /* There are up to three programmable stages that take shaders: vertex,
     fragment, and compute (the latter via a compute pipeline). Each stage can
     also specify data such as buffers. The pipeline also needs the pipeline
     layout object; the bind group layouts inside it let the shaders know what
     the bind group resources will look like when they are set on the pass
     later. */
})

const bindGroup_0 = device.createBindGroup({
  /* Group buffers and textures into logical groups so each pipeline invocation
     can bind them together. A bind group layout is required; it can be
     inferred from the pipeline or passed explicitly. */
})

const commandEncoder = device.createCommandEncoder() // Create a command encoder
const renderPassEncoder = commandEncoder.beginRenderPass(/* ... */) // Begin a render pass encoder
// You could also begin a compute pass instead:
// const computePassEncoder = commandEncoder.beginComputePass({ /* ... */ })

/* Taking renderPassEncoder as an example, use it to set up the sequence of operations for the pass, e.g.: */

// Set pipeline 0, bind group 0, bind group 1, the VBO, and trigger a draw
renderPassEncoder.setPipeline(renderPipeline_0)
renderPassEncoder.setBindGroup(0, bindGroup_0)
renderPassEncoder.setBindGroup(1, bindGroup_1)
renderPassEncoder.setVertexBuffer(0, vbo, 0, size)
renderPassEncoder.draw(vertexCount)

// Set pipeline 1 and another bind group, then trigger a draw
renderPassEncoder.setPipeline(renderPipeline_1)
renderPassEncoder.setBindGroup(1, another_bindGroup)
renderPassEncoder.draw(vertexCount)

// End the pass encoding
renderPassEncoder.endPass()

// Calling finish() on the command encoder completes the encoding and returns a command buffer
device.queue.submit([
  commandEncoder.finish()
])

The procedure above is generic WebGPU code, very rough and without details, but the logic is basically this.

For the pass-encoder part I have kept the code more complete, so readers can better observe how a command encoder encodes a pass and finally ends the encoding to create a command buffer to submit to the queue.
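To make the descriptors above less abstract, here is a hedged sketch of what a render pipeline descriptor can look like in practice. The field names follow the WebGPU spec, but the entry points `vs_main`/`fs_main`, the vertex layout, and the texture format are illustrative assumptions:

```javascript
// Sketch of a GPURenderPipelineDescriptor. The shader modules are left as null
// placeholders; in a real app they come from device.createShaderModule.
const pipelineDescriptor = {
  layout: 'auto', // let the pipeline infer bind group layouts from the shaders
  vertex: {
    module: null, // would be a GPUShaderModule containing the vertex stage
    entryPoint: 'vs_main',
    buffers: [{
      arrayStride: 2 * 4, // two 32-bit floats per vertex
      attributes: [{ shaderLocation: 0, offset: 0, format: 'float32x2' }],
    }],
  },
  fragment: {
    module: null, // GPUShaderModule containing the fragment stage
    entryPoint: 'fs_main',
    targets: [{ format: 'bgra8unorm' }], // must match the canvas texture format
  },
  primitive: { topology: 'triangle-list' },
}
// In a real app:
// const pipeline = device.createRenderPipeline(pipelineDescriptor)
```

Note how everything the GPU needs to know about the draw (vertex layout, shader stages, output format, topology) is declared up front in one object, rather than being set call by call as in WebGL.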

A cooking analogy

To use a cooking analogy: OpenGL programming is like cooking one dish at a time, fetching each seasoning as it is needed, then moving on to the next dish. The modern graphics APIs, by contrast, have several burners lit and everything in place, including prepped ingredients and garnishes, so that even a single cook (the CPU) can turn out several dishes at once, efficiently.

3. Multithreading and powerful general-purpose computing (GPGPU)

WebWorker multithreading

WebGL’s head object is the gl variable, which depends on an HTML canvas element. That means it must be obtained on the main thread, and GPU state can only be scheduled from the main thread; the multithreading that WebWorker offers can then only be used for data processing, which is quite weak.

WebGPU changed how the head object is obtained. The navigator object that the adapter comes from is also accessible inside a WebWorker, so a device can be created in a worker too, and command buffers can be assembled there as well. This enables multithreaded command-buffer recording: multithreaded GPU scheduling from the CPU side.
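As a hypothetical sketch of that idea, the worker below owns its own GPUDevice and submits work independently of the main thread. The message shapes and the Blob-URL trick are assumptions for illustration; in a real app the worker would usually live in its own file:

```javascript
// Worker source kept as a string so the example is self-contained.
// Inside a worker, navigator.gpu is reachable, unlike WebGL's canvas-bound gl.
const workerSource = `
self.onmessage = async () => {
  const adapter = await navigator.gpu.requestAdapter()
  const device = await adapter.requestDevice()
  const encoder = device.createCommandEncoder()
  /* ... record render or compute passes here ... */
  device.queue.submit([encoder.finish()])
  self.postMessage('submitted')
}
`

// Main thread (browser only), commented out because it needs a DOM/worker context:
// const worker = new Worker(URL.createObjectURL(new Blob([workerSource])))
// worker.onmessage = (e) => console.log(e.data)
// worker.postMessage('go')
```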

General purpose Computing (GPGPU)

If WebWorker provides multithreading on the CPU side, then the multithreading of the GPU itself should be used as well.

What makes this possible is the compute shader, a programmable stage of the pipeline. It came late to OpenGL (early graphics cards did not exploit their parallel general-purpose computing capability), let alone to WebGL 2.0; the folks at Apple never even bothered to implement the feature in WebGL 2.0.

WebGPU ships with this capability out of the box. Through a compute shader you can use the shared memory that sits next to each CU (Compute Unit) in the GPU, which is much faster than ordinary video memory.
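Here is a hedged sketch of what such a compute shader looks like in WGSL: it doubles each element of a storage buffer, staging values through `var<workgroup>` memory, the fast shared memory mentioned above. The binding numbers and workgroup size are illustrative assumptions; the dispatch calls are commented out because they need a live GPUDevice:

```javascript
const computeShaderCode = `
var<workgroup> tile : array<f32, 64>;  // shared memory next to the CU

@group(0) @binding(0) var<storage, read_write> data : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>,
        @builtin(local_invocation_id) lid : vec3<u32>) {
  tile[lid.x] = data[gid.x];   // stage through fast workgroup memory
  workgroupBarrier();          // all 64 threads sync before reading back
  data[gid.x] = tile[lid.x] * 2.0;
}
`

// Dispatching it (requires a device; sketch only):
// const module = device.createShaderModule({ code: computeShaderCode })
// const pipeline = device.createComputePipeline({
//   layout: 'auto',
//   compute: { module, entryPoint: 'main' },
// })
// pass.setPipeline(pipeline)
// pass.dispatchWorkgroups(Math.ceil(elementCount / 64))
```

For a trivial doubling kernel the shared-memory staging is unnecessary; it is shown here only to demonstrate the `var<workgroup>` address space and barrier that real reduction or tiling kernels rely on.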

There is not much material on compute shaders yet, so for now you can only study examples; a blog post is included in the resources.

By bringing GPGPU to the Web, scripting-language runtimes (Deno, browser JavaScript, and possibly Node.js in the future) gain access to the GPU’s powerful parallel computing capability. TensorFlow.js is said to get a large performance boost from using WebGPU as its backend, which is a great help to fields such as deep learning. Even if users’ browsers are not so cutting-edge and rendering code is not replacing WebGL all that quickly, WebGPU’s general-computing capability can still shine in other fields. Not to mention that compute shaders can also be used in rendering.

How tempting!

4. Browser implementation

As of this post, Edge and Chrome can try WebGPU in their Canary builds by enabling a flag.

Both Edge and Chrome are built on the Chromium core, which implements the WebGPU API through a module called Dawn. According to the available material, the DawnNative part of Dawn is responsible for talking to the three graphics APIs; above it, information passes through a module called DawnWire, which communicates with the JavaScript API, i.e. the WebGPU code you write. WGSL is also implemented in this part. Dawn is written in C++, and you can find the link in the resources.

Firefox uses the gfx-rs project to implement WebGPU; as the name suggests, it is a WebGPU implementation written in Rust, with a module design similar to Dawn’s.

Safari is updating its own WebKit to implement WebGPU.

5. The future

As the GPUs of the red, green, and blue camps grow ever more sophisticated and mobile GPUs gradually improve, the three modern graphics APIs will certainly keep developing, and WebGPU is bound to unleash the power of modern graphics processors (GPUs) on the Web, whether for rendering or for the machine-learning and AI capabilities of general-purpose parallel computing.

Resources

  • Google Dawn Page

  • gfx-rs GitHub Home Page

  • Get started with GPU Compute on the web