This is the ninth day of my participation in the First Challenge 2022. For details, see: First Challenge 2022.

This article is illustrated with code from the austinEng/webgpu-samples repository on GitHub.

Preface

In the previous article, Hello WebGPU: Compute Shader Basics (on juejin.cn), we used WebGPU to perform a matrix multiplication. Today we take on something a little harder: simulating the movement of a flock of birds in nature. Let's take a look at the final result:

Each triangle represents an individual bird, and they all start out flying in random directions.

After a period of time, their movements become the following:

We can see that as the program runs, individual birds gradually form a collective movement. We define the movement of each individual bird as follows:

  1. Cohesion: each bird wants to move toward the center of the nearby group it belongs to
  2. Separation: if two birds get too close to each other, they want to move a little farther apart
  3. Alignment: each bird tries to match the velocity of the birds near it

Today we use the general computing power provided by WebGPU to simulate such bird colony behavior.

Similarly, today’s work is divided into the following steps:

  1. Data preparation
  2. Write the compute shader
  3. Write the vertex shader & fragment shader
  4. Create the compute pipeline
  5. Create the render pipeline
  6. Write the render flow

Coding

Let's start coding. First of all, we need to prepare the relevant data.

Data preparation

First, we determine the initial position information of each individual in the flock:

  const numParticles = 1500;
  // Each particle occupies 4 floats: position (x, y) followed by velocity (x, y)
  const initialParticleData = new Float32Array(numParticles * 4);
  for (let i = 0; i < numParticles; ++i) {
    initialParticleData[4 * i + 0] = 2 * (Math.random() - 0.5);       // pos.x in [-1, 1)
    initialParticleData[4 * i + 1] = 2 * (Math.random() - 0.5);       // pos.y in [-1, 1)
    initialParticleData[4 * i + 2] = 2 * (Math.random() - 0.5) * 0.1; // vel.x in [-0.1, 0.1)
    initialParticleData[4 * i + 3] = 2 * (Math.random() - 0.5) * 0.1; // vel.y in [-0.1, 0.1)
  }

Next, we set the parameters for the motion rules mentioned above and create a uniform buffer that these parameters will be written into on each frame:

const simParams = {
    deltaT: 0.04,
    rule1Distance: 0.1,   // if two individuals are closer than 0.1, they are treated as part of the same group (cohesion)
    rule2Distance: 0.025, // if two individuals are closer than 0.025, they are too close and should separate
    rule3Distance: 0.03,  // if two individuals are closer than 0.03, they try to align their velocities
    rule1Scale: 0.02,  // weight of rule 1
    rule2Scale: 0.05,  // weight of rule 2
    rule3Scale: 0.005, // weight of rule 3
  };
  
  

  const simParamBufferSize = 7 * Float32Array.BYTES_PER_ELEMENT;
  const simParamBuffer = device.createBuffer({
    size: simParamBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });
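The excerpt above only creates the uniform buffer. Below is a minimal sketch of how the seven parameters might be uploaded into it each frame; the helper name updateSimParams is an assumption added for illustration, using the standard device.queue.writeBuffer API:

  // Assumed helper for illustration: pack the seven f32 parameters in
  // declaration order and upload them into simParamBuffer.
  function updateSimParams() {
    device.queue.writeBuffer(
      simParamBuffer,
      0,
      new Float32Array([
        simParams.deltaT,
        simParams.rule1Distance,
        simParams.rule2Distance,
        simParams.rule3Distance,
        simParams.rule1Scale,
        simParams.rule2Scale,
        simParams.rule3Scale,
      ])
    );
  }
  updateSimParams();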

Then we need two sets of GPUBuffer and GPUBindGroup objects: one to store the current flock state (positions and velocities), the other to receive the results of the computation. After each pass we swap the roles of the two sets, i.e. the output of the first pass becomes the input of the second, and the first pass's input buffer receives the second pass's results.
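To make the swap concrete, here is a tiny illustration (not part of the sample code) of the indexing pattern used below, where t is the frame counter:

  // Illustration only: on frame t the compute pass reads buffer (t % 2)
  // and writes buffer ((t + 1) % 2); the render pass then draws from the
  // freshly written buffer.
  function bufferIndices(t: number) {
    return {
      input: t % 2,        // buffer bound as read-only storage this frame
      output: (t + 1) % 2, // buffer bound as read_write storage and drawn afterwards
    };
  }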


  const particleBuffers: GPUBuffer[] = new Array(2);
  const particleBindGroups: GPUBindGroup[] = new Array(2);
  for (let i = 0; i < 2; ++i) {
    particleBuffers[i] = device.createBuffer({
      size: initialParticleData.byteLength,
      usage: GPUBufferUsage.VERTEX | GPUBufferUsage.STORAGE,
      mappedAtCreation: true,
    });
    new Float32Array(particleBuffers[i].getMappedRange()).set(
      initialParticleData
    );
    particleBuffers[i].unmap();
  }

  for (let i = 0; i < 2; ++i) {
    particleBindGroups[i] = device.createBindGroup({
      layout: computePipeline.getBindGroupLayout(0),
      entries: [
        {
          binding: 0,
          resource: {
            buffer: simParamBuffer,
          },
        },
        {
          binding: 1,
          resource: {
            buffer: particleBuffers[i],
            offset: 0,
            size: initialParticleData.byteLength,
          },
        },
        {
          binding: 2,
          resource: {
            buffer: particleBuffers[(i + 1) % 2],
            offset: 0,
            size: initialParticleData.byteLength,
          },
        },
      ],
    });
  }


Note that particleBuffers are used both in the compute shader and in the vertex shader, so their usage is GPUBufferUsage.VERTEX | GPUBufferUsage.STORAGE. At this point the data needed for the computation is ready, so let's prepare the data needed to render the birds: the vertex data for each individual in the flock. For simplicity, a simple triangle represents a single bird.

  const vertexBufferData = new Float32Array([
    -0.01, -0.02, 0.01, -0.02, 0.0, 0.02,
  ]);
  const spriteVertexBuffer = device.createBuffer({
    size: vertexBufferData.byteLength,
    usage: GPUBufferUsage.VERTEX,
    mappedAtCreation: true,
  });
  new Float32Array(spriteVertexBuffer.getMappedRange()).set(vertexBufferData);
  spriteVertexBuffer.unmap();

Write the compute shader

  struct Particle {
    pos : vec2<f32>;
    vel : vec2<f32>;
  };
  struct SimParams {
    deltaT : f32;
    rule1Distance : f32;
    rule2Distance : f32;
    rule3Distance : f32;
    rule1Scale : f32;
    rule2Scale : f32;
    rule3Scale : f32;
  };
  struct Particles {
    particles : array<Particle>;
  };
  @binding(0) @group(0) var<uniform> params : SimParams;
  @binding(1) @group(0) var<storage, read> particlesA : Particles;
  @binding(2) @group(0) var<storage, read_write> particlesB : Particles;

  @stage(compute) @workgroup_size(64)
  fn main(@builtin(global_invocation_id) GlobalInvocationID : vec3<u32>) {
    var index : u32 = GlobalInvocationID.x;

    var vPos = particlesA.particles[index].pos;
    var vVel = particlesA.particles[index].vel;
    var cMass = vec2<f32>(0.0, 0.0);
    var cVel = vec2<f32>(0.0, 0.0);
    var colVel = vec2<f32>(0.0, 0.0);
    var cMassCount : u32 = 0u;
    var cVelCount : u32 = 0u;
    var pos : vec2<f32>;
    var vel : vec2<f32>;

    // Compare this particle against every other particle
    for (var i : u32 = 0u; i < arrayLength(&particlesA.particles); i = i + 1u) {
      if (i == index) {
        continue;
      }
      pos = particlesA.particles[i].pos.xy;
      vel = particlesA.particles[i].vel.xy;
      // Rule 1 (cohesion): accumulate the positions of nearby particles
      if (distance(pos, vPos) < params.rule1Distance) {
        cMass = cMass + pos;
        cMassCount = cMassCount + 1u;
      }
      // Rule 2 (separation): steer away from particles that are too close
      if (distance(pos, vPos) < params.rule2Distance) {
        colVel = colVel - (pos - vPos);
      }
      // Rule 3 (alignment): accumulate the velocities of nearby particles
      if (distance(pos, vPos) < params.rule3Distance) {
        cVel = cVel + vel;
        cVelCount = cVelCount + 1u;
      }
    }
    if (cMassCount > 0u) {
      var temp = f32(cMassCount);
      cMass = (cMass / vec2<f32>(temp, temp)) - vPos;
    }
    if (cVelCount > 0u) {
      var temp = f32(cVelCount);
      cVel = cVel / vec2<f32>(temp, temp);
    }
    vVel = vVel + (cMass * params.rule1Scale) + (colVel * params.rule2Scale) + (cVel * params.rule3Scale);

    // Clamp the speed, then advance the position
    vVel = normalize(vVel) * clamp(length(vVel), 0.0, 0.1);
    vPos = vPos + (vVel * params.deltaT);

    // Wrap around the edges of the [-1, 1] clip-space box
    if (vPos.x < -1.0) { vPos.x = 1.0; }
    if (vPos.x > 1.0) { vPos.x = -1.0; }
    if (vPos.y < -1.0) { vPos.y = 1.0; }
    if (vPos.y > 1.0) { vPos.y = -1.0; }

    // Write the result back to the output buffer
    particlesB.particles[index].pos = vPos;
    particlesB.particles[index].vel = vVel;
  }

workgroup_size specifies the size of a workgroup, i.e. how many shader invocations can run in parallel within one workgroup. The dispatch API we call in the TS code specifies the number of workgroups, so the total number of shader invocations equals workgroup_size * workgroup_count. In this example the workgroup_size is (64, 1, 1), and when we call dispatch the workgroup count is 24, so the compute shader actually runs 24 x 64 = 1536 times.
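Concretely, the workgroup count is derived from the particle count; this mirrors the dispatch call shown later in the render flow (the intermediate variable workgroupCount is added here only for illustration):

  const workgroupCount = Math.ceil(numParticles / 64); // ceil(1500 / 64) = 24 workgroups
  passEncoder.dispatch(workgroupCount);                // 24 * 64 = 1536 invocations in total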

The rest of the compute shader's logic is explained in the comments in the shader code above, so it will not be elaborated further here.

Write the vertex shader & fragment shader

Next, we write the shader code for rendering:

  @stage(vertex)
  fn vert_main(@location(0) a_particlePos : vec2<f32>,
               @location(1) a_particleVel : vec2<f32>,
               @location(2) a_pos : vec2<f32>) -> @builtin(position) vec4<f32> {
    let angle = -atan2(a_particleVel.x, a_particleVel.y);
    let pos = vec2<f32>(
      (a_pos.x * cos(angle)) - (a_pos.y * sin(angle)),
      (a_pos.x * sin(angle)) + (a_pos.y * cos(angle)));
    return vec4<f32>(pos + a_particlePos, 0.0, 1.0);
  }

  @stage(fragment)
  fn frag_main() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 1.0, 1.0, 1.0);
  }

First, we derive the bird's heading (angle) from its current velocity, then rotate the triangle's vertices by that angle, and finally add the latest position computed by the compute shader in the previous step.
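The rotation in vert_main is simply the standard 2D rotation matrix applied to the triangle's local vertex position, followed by a translation to the particle position. A CPU-side TypeScript equivalent, written purely for illustration, would look like this:

  // Hypothetical helper, not part of the sample: rotate the local vertex
  // (x, y) by `angle`, then translate it by the particle position.
  function transformVertex(
    x: number, y: number,
    angle: number,
    particleX: number, particleY: number
  ): [number, number] {
    const rx = x * Math.cos(angle) - y * Math.sin(angle);
    const ry = x * Math.sin(angle) + y * Math.cos(angle);
    return [rx + particleX, ry + particleY];
  }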

The fragment shader is even simpler: it just outputs white.

Create the compute pipeline


  const computePipeline = device.createComputePipeline({
    compute: {
      module: device.createShaderModule({
        code: updateSpritesWGSL,
      }),
      entryPoint: 'main',
    },
  });

The code to create the compute pipeline is simple and will not be covered further here.

Create the render pipeline

  const spriteShaderModule = device.createShaderModule({ code: spriteWGSL });
  const renderPipeline = device.createRenderPipeline({
    vertex: {
      module: spriteShaderModule,
      entryPoint: 'vert_main',
      buffers: [
        {
          // instanced particles buffer
          arrayStride: 4 * 4,
          stepMode: 'instance',
          attributes: [
            {
              // instance position
              shaderLocation: 0,
              offset: 0,
              format: 'float32x2',
            },
            {
              // instance velocity
              shaderLocation: 1,
              offset: 2 * 4,
              format: 'float32x2',
            },
          ],
        },
        {
          // vertex buffer
          arrayStride: 2 * 4,
          stepMode: 'vertex',
          attributes: [
            {
              // vertex positions
              shaderLocation: 2,
              offset: 0,
              format: 'float32x2',
            },
          ],
        },
      ],
    },
    fragment: {
      module: spriteShaderModule,
      entryPoint: 'frag_main',
      targets: [
        {
          format: presentationFormat,
        },
      ],
    },
    primitive: {
      topology: 'triangle-list',
    },
  });

Most of the code here is much the same as the render setup explained in earlier articles, but there is one caveat: we specify stepMode as 'instance' for the first buffer in the buffers array.

stepMode has two modes: 'vertex' and 'instance'.

  • vertex: the buffer offset advances by arrayStride for every vertex, and is reset back to the start of the buffer data for each new instance
  • instance: the buffer offset advances by arrayStride once per instance and is not reset in between; every vertex of the same instance reads the same data

In other words, when we draw with instancing and want different instances to read different data from a buffer, that buffer needs the 'instance' stepMode, as illustrated by the sketch below.
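Here is a rough sketch, in TypeScript-flavored pseudocode, of how the GPU walks the two vertex buffers for the draw call used later (draw(3, numParticles)); the loop only illustrates the addressing rules and does not actually run in JavaScript:

  for (let instance = 0; instance < numParticles; ++instance) {
    // stepMode 'instance': locations 0 (position) and 1 (velocity) are read from
    // particleBuffers at byte offset instance * 16 and stay fixed for the whole triangle.
    for (let vertex = 0; vertex < 3; ++vertex) {
      // stepMode 'vertex': location 2 is read from spriteVertexBuffer at byte
      // offset vertex * 8 and advances for every vertex, resetting per instance.
    }
  }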

Write the render flow

Now we are almost ready to start writing the render flow.

First, before rendering, we record the compute pass. The whole process uses a single commandEncoder.

// A temporary variable t is used to swap the two bind groups and particle buffers
let t = 0;
const commandEncoder = device.createCommandEncoder();
{
  const passEncoder = commandEncoder.beginComputePass();
  passEncoder.setPipeline(computePipeline);
  passEncoder.setBindGroup(0, particleBindGroups[t % 2]);
  passEncoder.dispatch(Math.ceil(numParticles / 64));
  passEncoder.endPass();
}

The compute pass differs from the render pass in two ways:

  1. The passEncoder is no longer created with beginRenderPass but with beginComputePass.
  2. Instead of calling the draw command to render, we call the dispatch API to run the computation.

With that done, we’re ready to render.

{
  const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
  passEncoder.setPipeline(renderPipeline);
  passEncoder.setVertexBuffer(0, particleBuffers[(t + 1) % 2]);
  passEncoder.setVertexBuffer(1, spriteVertexBuffer);
  passEncoder.draw(3, numParticles, 0, 0);
  passEncoder.endPass();
}


It's worth noting that when we created the render pipeline object we bound three attributes to three shader locations, yet here we only set two GPUBuffers. That's because particleBuffer contains both the position and the velocity data. Also note that the first argument of setVertexBuffer does not correspond to a shader location, but to the index of the entry in the pipeline descriptor's buffers array.

After that, all we need to do is submit the commandEncoder, update the temporary variable t, and kick off the next frame:

device.queue.submit([commandEncoder.finish()]);
++t;
requestAnimationFrame(frame);
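Putting the pieces together, a minimal sketch of the per-frame loop might look as follows, assuming the compute-pass and render-pass code shown above is inlined where the comments indicate (the frame function name is illustrative):

  let t = 0;
  function frame() {
    const commandEncoder = device.createCommandEncoder();
    // ... compute pass: setPipeline, setBindGroup(0, particleBindGroups[t % 2]), dispatch ...
    // ... render pass: setPipeline, setVertexBuffer(0, particleBuffers[(t + 1) % 2]), draw ...
    device.queue.submit([commandEncoder.finish()]);
    ++t; // swap the roles of the two particle buffers for the next frame
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);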

Conclusion

Let’s briefly review what we learned today:

  1. First we set the rules the birds fly by;
  2. Then we used a compute shader to compute the new positions of the flock, wrote the results into a GPUBuffer, and reused that GPUBuffer, which contains the computed results, in the render pass to draw the flock;
  3. We learned about the different stepMode values a GPUBuffer can use in the render pipeline: when different instances need different vertex data in instanced rendering, the 'instance' stepMode is required.

OK, that's all for today. If you found this article useful, please give the author a thumbs up ~ your support is the author's motivation to keep updating ~