Slides from the 2022 WebGL & WebGPU Meetup are linked at the end of this article.

1 Use the label attribute wherever it is available

Every object in WebGPU has a label attribute, whether you set it by passing label in the descriptor at creation time or by assigning the attribute directly afterward. The label is similar to an ID: it makes objects easier to identify and observe, it costs almost nothing to set, and it makes debugging far more pleasant.

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 12 * Float32Array.BYTES_PER_ELEMENT, // Deliberate error: a 4x4 matrix needs 16 elements
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
})
const projectionMatrixArray = new Float32Array(16)

gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray)

The GPUBuffer in the code above is deliberately created too small (12 elements instead of 16). When validation fails, the error message includes the buffer's label:

// Write range (bufferOffset: 0, size: 64) does not fit in [Buffer "Projection Matrix Buffer"] size (48).
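
As noted above, the label can also be assigned directly after creation, which is handy for objects created by code you don't control. A minimal sketch (the texture descriptor here is just for illustration):

// Assign the label after creation instead of in the descriptor
const depthTexture = gpuDevice.createTexture({
  size: { width: 1024, height: 1024 },
  format: 'depth24plus',
  usage: GPUTextureUsage.RENDER_ATTACHMENT,
})
depthTexture.label = 'Depth Texture'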

2 Use a debug group

The command encoder lets you push and pop debug groups, which are essentially labeled ranges marking which part of the code is executing. Validation errors then report this stack alongside the error message:

// -- Mark the current frame --
commandEncoder.pushDebugGroup(`Frame ${frameIndex}`);
  // -- First nested group: mark the light update --
  commandEncoder.pushDebugGroup('Clustered Light Compute Pass');
    // For example, update the lights here
    updateClusteredLights(commandEncoder);
  commandEncoder.popDebugGroup();
  // -- End the first nested group --
  // -- Second nested group: mark the main render pass --
  commandEncoder.pushDebugGroup('Main Render Pass');
    // Trigger drawing
    renderScene(commandEncoder);
  commandEncoder.popDebugGroup();
  // -- End the second nested group --
commandEncoder.popDebugGroup();
// -- End the frame group --

Now, if an error is reported, the message includes the debug group stack:

// Console output:
// Binding sizes are too small for bind group [BindGroup] at index 0
// Debug group stack:
//   > "Main Render Pass"
//   > "Frame 234"
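
If you find yourself sprinkling these calls everywhere, a tiny helper can keep every push paired with its pop. This is just a convenience sketch, not something from the original slides:

// Hypothetical helper: run fn inside a paired push/pop debug group
function withDebugGroup(encoder, label, fn) {
  encoder.pushDebugGroup(label)
  try {
    fn(encoder)
  } finally {
    encoder.popDebugGroup()
  }
}

withDebugGroup(commandEncoder, `Frame ${frameIndex}`, (encoder) => {
  withDebugGroup(encoder, 'Main Render Pass', renderScene)
})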

3 Load texture images from Blobs

ImageBitmaps created from Blobs give the best decoding performance for JPG/PNG textures.

/**
 * Asynchronously create a texture from an image URL and copy the image data into it
 * @param {GPUDevice} gpuDevice The device object
 * @param {string} url The texture image URL
 */
async function createTextureFromImageUrl(gpuDevice, url) {
  const blob = await fetch(url).then((r) => r.blob())
  const source = await createImageBitmap(blob)
  
  const textureDescriptor = {
    label: `Image Texture ${url}`,
    size: {
      width: source.width,
      height: source.height,
    },
    format: 'rgba8unorm',
    // RENDER_ATTACHMENT is required for copyExternalImageToTexture destinations
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT
  }
  const texture = gpuDevice.createTexture(textureDescriptor)
  gpuDevice.queue.copyExternalImageToTexture(
    { source },
    { texture },
    textureDescriptor.size,
  )
  
  return texture
}
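
Usage is straightforward; the URL below is just a placeholder:

const diffuseTexture = await createTextureFromImageUrl(gpuDevice, 'textures/diffuse.png')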

Prefer compressed texture formats

Use what you can.

WebGPU defines three optional compressed-texture features:

  • texture-compression-bc
  • texture-compression-etc2
  • texture-compression-astc

Support depends on the hardware. According to gpuweb issue #2083 on GitHub, every platform supports at least one of BC (also known as DXT/S3TC), ETC2, or ASTC, so you can always count on some form of texture compression being available.
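
A minimal sketch of detecting and requesting these features when creating the device (the feature names are exactly the three listed above):

const adapter = await navigator.gpu.requestAdapter()

// Collect whichever compression features this adapter actually supports
const requiredFeatures = []
for (const feature of ['texture-compression-bc', 'texture-compression-etc2', 'texture-compression-astc']) {
  if (adapter.features.has(feature)) {
    requiredFeatures.push(feature)
  }
}

// Optional features must be requested explicitly at device creation
const device = await adapter.requestDevice({ requiredFeatures })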

It is highly recommended to use a supercompressed texture format such as Basis Universal: it can be transcoded on the client to whatever format the device supports, so you avoid having to ship multiple versions of each texture.

toji/web-texture-tool on GitHub is a library for loading compressed textures in both WebGL and WebGPU.

Compressed-texture support in WebGL was never great, but it is built into WebGPU, so use it when you can!

4 Use the glTF processing library glTF-Transform

This is an open source library, available on GitHub, that provides command-line tools.

For example, you can use it to compress GLB textures:

> gltf-transform etc1s paddle.glb paddle2.glb
paddle.glb (11.92 MB) → paddle2.glb (1.73 MB)

The result is visually lossless, yet far smaller than the GLB exported from Blender. The original model's textures were five 2048 × 2048 PNG images.

In addition to compressing textures, the library can resize and resample them, apply Google Draco compression to the geometry, and more. After all of these optimizations, the GLB is less than 5% of its original size:

> gltf-transform resize paddle.glb paddle2.glb --width 1024 --height 1024
> gltf-transform etc1s paddle2.glb paddle2.glb
> gltf-transform resample paddle2.glb paddle2.glb
> gltf-transform dedup paddle2.glb paddle2.glb
> gltf-transform draco paddle2.glb paddle2.glb
paddle.glb (11.92 MB) → paddle2.glb (596.46 KB)

5 Upload buffer data

There are many ways to get data into a buffer in WebGPU, and the writeBuffer() method is never a bad choice. When calling WebGPU from WASM in particular, prefer the writeBuffer() API: it avoids an extra buffer copy.

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 16 * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// When the projection matrix changes (e.g. window changes size)
function updateProjectionMatrixBuffer(projectionMatrix) {
  const projectionMatrixArray = projectionMatrix.getAsFloat32Array();
  gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray);
}

The author also points out that you don't always need to set mappedAtCreation when creating a buffer; sometimes a buffer can be created without mapping at all, for example when loading buffers from glTF.
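
For contrast, here is a minimal sketch of the mappedAtCreation path, which suits data that is available at creation time (vertexData is an assumed Float32Array):

// Create the buffer already mapped, copy the data in, then unmap
const vertexBuffer = gpuDevice.createBuffer({
  label: 'Vertex Buffer',
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX,
  mappedAtCreation: true,
})
new Float32Array(vertexBuffer.getMappedRange()).set(vertexData)
vertexBuffer.unmap()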

6 Asynchronous pipeline creation is recommended

If you don't need a render or compute pipeline to be ready immediately, prefer the createRenderPipelineAsync and createComputePipelineAsync APIs over their synchronous counterparts.

Creating a pipeline synchronously may compile it on the spot, stalling other GPU-related work.

With asynchronous creation, the returned Promise does not resolve until the pipeline is ready: the GPU can finish what it is currently doing before compiling the pipeline you asked for.

Here’s the comparison code:

// Create the compute pipeline synchronously
const computePipeline = gpuDevice.createComputePipeline({ /* ... */ })

computePass.setPipeline(computePipeline)
computePass.dispatch(32, 32) // If the shader is still compiling when this dispatch runs, everything stalls

Now the asynchronous version:

// Create the compute pipeline asynchronously
const asyncComputePipeline = await gpuDevice.createComputePipelineAsync({ /* ... */ })

computePass.setPipeline(asyncComputePipeline)
computePass.dispatch(32, 32) // By now the shader has already compiled, so there is no stall

Use implicit pipeline layouts with caution

Implicit pipeline layouts, especially for standalone compute pipelines, may feel convenient when writing JavaScript, but they bring two potential problems:

  • They break sharing of resource binding groups across pipelines
  • They can behave in surprising ways when the shader is updated

If your use case is particularly simple, an implicit pipeline layout is fine, but create the pipeline layout explicitly whenever you can.

Here is how a so-called implicit pipeline layout works: create the pipeline without a layout, then call the pipeline's getBindGroupLayout() API to retrieve the layout objects inferred from the shader code.

const computePipeline = await gpuDevice.createComputePipelineAsync({
  // No layout object is passed
  compute: {
    module: computeModule,
    entryPoint: 'computeMain',
  },
})

const computeBindGroup = gpuDevice.createBindGroup({
  // Get the implicitly created bind group layout
  layout: computePipeline.getBindGroupLayout(0),
  entries: [{
    binding: 0,
    resource: { buffer: storageBuffer },
  }],
})
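
For comparison, here is a sketch of the same pipeline with an explicit layout, assuming a single storage buffer binding in the shader:

// Declare the bind group layout up front so it can be shared and reused
const computeBindGroupLayout = gpuDevice.createBindGroupLayout({
  label: `Compute BindGroupLayout`,
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.COMPUTE,
    buffer: { type: 'storage' },
  }],
})

const explicitComputePipeline = await gpuDevice.createComputePipelineAsync({
  // Pass an explicit pipeline layout this time
  layout: gpuDevice.createPipelineLayout({ bindGroupLayouts: [computeBindGroupLayout] }),
  compute: {
    module: computeModule,
    entryPoint: 'computeMain',
  },
})

const explicitComputeBindGroup = gpuDevice.createBindGroup({
  layout: computeBindGroupLayout, // No need to ask the pipeline for it
  entries: [{
    binding: 0,
    resource: { buffer: storageBuffer },
  }],
})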

7 Shared resource binding groups and binding group layouts

If some values are used frequently during rendering/compute but never change, you can create a simple resource binding group layout for them and reuse it with any pipeline that binds the group at the same index.

First, create the resource binding group and its layout:

// Create the camera UBO's bind group layout and the bind group itself
const cameraBindGroupLayout = gpuDevice.createBindGroupLayout({
  label: `Camera uniforms BindGroupLayout`,
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
    buffer: {},
  }],
})

const cameraBindGroup = gpuDevice.createBindGroup({
  label: `Camera uniforms BindGroup`,
  layout: cameraBindGroupLayout,
  entries: [{
    binding: 0,
    resource: { buffer: cameraUniformsBuffer },
  }],
})

Next, create two render pipelines. Both use two resource binding groups; the difference is that each has its own material binding group while sharing the camera binding group:

const renderPipelineA = gpuDevice.createRenderPipeline({
  label: `Render Pipeline A`,
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutA],
  }),
  /* Etc... */
});

const renderPipelineB = gpuDevice.createRenderPipeline({
  label: `Render Pipeline B`,
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutB],
  }),
  /* Etc... */
});

Finally, in each frame of the render loop, the camera's resource binding group only needs to be set once, reducing per-frame CPU-to-GPU binding overhead:

const renderPass = commandEncoder.beginRenderPass({ /* ... */ });

// Set the resource binding group for the camera only once
renderPass.setBindGroup(0, cameraBindGroup);

for (const pipeline of activePipelines) {
  renderPass.setPipeline(pipeline.gpuRenderPipeline)
  for (const material of pipeline.materials) {
	  // Each material's resource binding group is set separately
    renderPass.setBindGroup(1, material.gpuBindGroup)
    
    // Set the vertex buffer and issue the draw call for each mesh
    for (const mesh of material.meshes) {
      renderPass.setVertexBuffer(0, mesh.gpuVertexBuffer)
      renderPass.draw(mesh.drawCount)
    }
  }
}

renderPass.endPass()
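
Tying this back to tip 5: the shared camera uniform buffer itself only needs a single writeBuffer() call per frame, before the render pass is encoded (camera.getViewProjectionMatrix() is a hypothetical helper returning a Float32Array):

// Once per frame, before beginRenderPass
gpuDevice.queue.writeBuffer(cameraUniformsBuffer, 0, camera.getViewProjectionMatrix())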

Links from the original

  • Brandon Jones, Twitter @Tojiro
  • The original slides: docs.google.com/presentatio…
  • Further reading: toji.github.io/webgpu-best…
  • alain.xyz/blog/raw-we…
  • Texture comparison details: toji.github.io/webgpu-best…
  • Buffer upload details: toji.github.io/webgpu-best…