I am Chen Jiaming from Tencent's Rubik's Cube Studio Group. Today, together with my partner Hu Youwei from TiMi Studio Group, I will share Aqua, a large-area, deeply interactive water simulation and rendering technology.

1. Introduction to Aqua Project

Youwei and I come from two different departments, so why did we work on this water project together? Because we both joined Tech Future, an open-source program run by Tencent Games whose purpose is to strengthen internal technical exchange and promote cutting-edge game technology. Aqua is one of its projects, and its goal is to study interactive water simulation and rendering technology. After a year of development, colleagues from different departments and roles have produced some results, which we will share in this talk.

Water simulation and rendering has long been a frequently researched subject: the ocean in Ubisoft's Assassin's Creed, the water in Naughty Dog's Uncharted series, and the water system released in Unreal Engine 4.26 are all examples.

The Unreal Engine water system is close to our goal. It first defines the various water regions with splines at edit time, then constructs the water mesh at runtime. The system is very powerful, but it differs somewhat from the fully dynamic water we wanted to build. For example, we want rain to cause floods that submerge the whole scene, or to let players use a skill to place water anywhere they like.

So the Aqua team wanted a breakthrough in this area; we developed a solution and built a tech demo. In the demo, the visible water range is 1 km x 1 km, the simulation accuracy reaches 25 cm, and the rendering accuracy reaches 6.5 cm. As mentioned above, players can spray water into the scene with a skill and put out a bonfire, and the sprayed water stays in the scene. Besides the graphics effects, we also implemented features that need to communicate with the CPU, such as vehicles and buoyancy, mainly to verify the practicality of the solution. The demo also shows an event in which rain raises the water level and floods the scene. Besides lakes, rivers and oceans can also be simulated.

2.1 Simulation: multi-level water simulation

What is the philosophy behind Aqua? Let us first talk about the choice of simulation algorithm. Common water algorithms in games are as follows:

PBD/SPH: the most realistic simulation of water dynamics, using particles, but it generally requires a very large number of them. When applied to a large water area, say hundreds of meters, the performance problems in simulation and rendering are very hard to solve.

Wave Particles: the idea is to represent each wave as a particle and render all the waves into a height map. The effect is controllable, but changes in water volume are hard to simulate.

Grid method: the most traditional. Although it cannot reproduce effects such as hydraulic jumps the way SPH can, it simulates most water-body characteristics, and it has very good properties and scalability. Considering that we were aiming to support ranges of several hundred meters or more, Aqua chose the grid method as the basis of the solution.

However, the simulation cannot simply run on one very-high-resolution grid, because the memory and computation cost would be unaffordable on a typical GPU. So, borrowing the principle of clip mapping, we simulate at multiple levels. The idea is to cover the whole simulation range with grids of different precision, trading invisible detail for performance. The center of each clip is roughly aligned with the camera, so the closer the water is to the camera, the higher its accuracy. We start from the largest (coarsest) layer and then synchronize the edges, marked in red in the figure, to the next layer; this transmits important influences from outside a clip into it.

Each clip level goes through two stages: data collection and simulation. First, the simulation inputs within the clip are collected, including the height map of the scene or terrain, which mainly serves to block or collide with the water. In addition, a series of extra simulation inputs are collected and converted into textures for the later GPU simulation.

Then we solve the simulation with compute shaders. We implemented two grid algorithms, LBM and Pipe Water, which we will discuss further below. To keep the water velocity from exceeding the CFL condition, we also support sub-stepping, improving simulation stability by reducing the delta time. After the first clip level finishes simulating, we transfer its edge state, and so on down the levels; when all clips have been simulated, the results are exported to a TextureArray or TextureAtlas and handed to the rendering module for drawing.
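Putting the two stages together, here is a minimal sketch of the per-frame multi-level loop just described. All type and function names (FWaterClip, CollectInputs, CopyEdgeState, ComputeSubStepCount, DispatchSimulation, ExportResults) are illustrative, not Aqua's actual API:

```cpp
// Minimal sketch of the per-frame multi-level update, coarsest clip first.
// Illustrative only: the real pipeline dispatches compute shaders per stage.
void SimulateAllClips(TArray<FWaterClip>& Clips, float DeltaTime)
{
    // Clips are ordered finest (0) to coarsest (Num-1); simulate coarse first.
    for (int32 Level = Clips.Num() - 1; Level >= 0; --Level)
    {
        FWaterClip& Clip = Clips[Level];

        // Stage 1: data collection (scene/terrain height, extra sim inputs).
        CollectInputs(Clip);

        // Edge-state transfer: copy two grid widths of boundary F_i from the
        // coarser parent so outside influences can propagate inward.
        if (Level < Clips.Num() - 1)
        {
            CopyEdgeState(/*From*/ Clips[Level + 1], /*To*/ Clip);
        }

        // Stage 2: simulation (LBM or Pipe Water), sub-stepped to stay
        // inside the CFL limit.
        const int32 NumSubSteps = ComputeSubStepCount(Clip, DeltaTime);
        for (int32 Step = 0; Step < NumSubSteps; ++Step)
        {
            DispatchSimulation(Clip, DeltaTime / NumSubSteps);
        }
    }

    // Hand every clip's height/velocity to rendering as a TextureArray
    // or TextureAtlas.
    ExportResults(Clips);
}
```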

I mentioned that we need to collect the height map of the scene or terrain; this is actually done through Unreal's SceneCapture. The user defines in advance the minimum and maximum simulation heights in world space. When the camera moves horizontally beyond a threshold, we capture a top-down depth map from the highest point, then invert the depth and add the height of the lowest point to obtain the upward-facing scene height.
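In formula form, a sketch assuming the captured depth $d(x,y)$ is normalized to $[0,1]$ over the user-defined height range $[z_{\min}, z_{\max}]$:

$$h_{\text{scene}}(x,y) = z_{\min} + \bigl(1 - d(x,y)\bigr)\,(z_{\max} - z_{\min})$$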

In addition, we need to collect a series of extra simulation inputs, such as influences on water volume or velocity. The difficulty here is how to unify the various influences into a fixed input for the GPU simulation.

On the simulation algorithms themselves: so far we have implemented two grid algorithms, Pipe Water and LBM. Due to time constraints, please refer to the articles listed in the figure below for details. Although their formulas differ, both compute the water flux F_i along specific directions of the grid: the Pipe method computes F_i in four directions, while LBM with the D2Q9 lattice computes F_i in nine. The simulation compute shader derives the next frame's F_i from the current frame's F_i together with the height differences to neighboring cells, viscosity, gravity and other parameters. It stores F_i in a structured buffer and exports intermediate data, such as water volume and velocity, for rendering.
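As a concrete example of this per-direction flux update, the pipe model (following Mei et al.; the LBM D2Q9 update takes a different form, see the referenced articles) advances each outflow roughly as

$$F_i^{t+\Delta t} = \max\!\Bigl(0,\; F_i^{t} + \Delta t \cdot A \cdot \frac{g\,\Delta h_i}{\Delta x}\Bigr)$$

where $\Delta h_i$ is the water-height difference to neighbor $i$, $A$ the virtual pipe cross-section, $g$ gravity, and $\Delta x$ the cell size; the outflows are then rescaled so that a cell never emits more water than it contains.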

At the beginning we mentioned an edge-state-transfer operation. The edge state is the F_i at each clip boundary. Before each clip starts simulating, we must copy from the previous (coarser) layer the F_i of the two grid widths corresponding to this layer's boundary. In this way, influences from outside the clip are transferred into it. Take the following two videos as examples: the first shows that with state transfer turned off, the water fails to reach the central area of the arena, while in the second, with state transfer turned on, the water flows into the center of the scene normally.

Although the multi-level strategy enlarges the simulation range, a scene of several kilometers still needs another technique to move the simulated region incrementally. We call this method Scroll Update, or Sliding Window. First, each clip is centered on the camera, so when the camera pans, the clip's coverage updates automatically. Since the coverage changes, we must copy the existing F_i into the new grid according to world coordinates. The camera position is snapped to the grid size so that copying F_i does not cause jumps, and to avoid overly frequent updates we wait until the camera has moved more than one cell before triggering one. When a cache-miss area appears in the movement direction, we read F_i from the coarser clip level to improve stability.
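A minimal sketch of that scroll-update bookkeeping, assuming each clip stores its origin as a whole-cell index (the FWaterClip fields and ScrollCopyFluxBuffers are illustrative):

```cpp
// Scroll update / sliding window: re-center a clip on the camera, snapped
// to whole cells, and remap F_i by world coordinate. Illustrative names.
void UpdateClipOrigin(FWaterClip& Clip, const FVector& CameraPos)
{
    // Snap the camera to whole grid cells so copied F_i values land
    // exactly on cell centers and do not jump.
    const FIntPoint SnappedCell(
        FMath::FloorToInt(CameraPos.X / Clip.CellSize),
        FMath::FloorToInt(CameraPos.Y / Clip.CellSize));

    // Only trigger a scroll once the camera has crossed at least one cell.
    const FIntPoint Delta = SnappedCell - Clip.OriginCell;
    if (Delta == FIntPoint::ZeroValue)
    {
        return;
    }

    Clip.OriginCell = SnappedCell;

    // Copy surviving F_i to their new cells by world coordinate; cells newly
    // exposed in the movement direction (cache misses) are filled by reading
    // F_i from the next, coarser clip level.
    ScrollCopyFluxBuffers(Clip, Delta);
}
```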

As with physical simulation in general, when the flow velocity is too fast it easily violates the CFL condition and causes numerical explosion (the water "explosion" in the picture below). This is especially likely when the simulation accuracy is raised, because the delta x in the simulation formula becomes small, so the velocity easily exceeds the C_max limit. The only remedy is sub-stepping to reduce delta t, which simply means running the simulation several times within one frame. This is not performance-friendly, since the computation is multiplied accordingly. Observe, however, that in the multi-level scheme each clip's delta x doubles, which means its sub-step count can be halved. Compared with sub-stepping uniformly regardless of level, this optimization cuts roughly 90% of the computation.
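In formula form, the CFL condition requires $|u|\,\Delta t / \Delta x \le C_{\max}$, so each clip needs

$$n_{\text{sub}} = \Bigl\lceil \frac{|u|_{\max}\,\Delta t}{C_{\max}\,\Delta x} \Bigr\rceil$$

sub-steps; doubling $\Delta x$ at each coarser clip halves $n_{\text{sub}}$, which is where the roughly 90% saving comes from.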

Finally, let us compare the results of multi-level simulation: the left image uses a single level, the right is split into two levels, and the effects are very similar. The data from our tech demo shows that multi-level simulation brings significant improvements in both memory and performance.

2.2 Simulation: affector system

Next, I will hand over to Youwei to introduce the affector system and the rendering part of the water solution. We will show the purpose of and challenges in developing the affector system, how it is decoupled from the simulation, and how to customize your own affectors; finally we will show some practical demo cases.

In fact, both the LBM and the Pipe simulation algorithms essentially compute the height field and velocity field of the water. Without interference from other factors, the simulation result will eventually settle into a steady state.

But if we add external influences into the calculation, we get interactive results: the waves generated by a character swimming in a river, the ripples produced by rain, the waves driven by the sea breeze, and so on. How do we achieve these interactions? The answer is affectors.

We once considered letting affectors participate directly in the simulation compute shader, but that leads to a problem: there are many types of affectors, each with its own algorithm implementation, which makes it difficult to feed them into the simulation stage through a unified entry point. The simulation shader would become extremely complex and hard to manage and maintain.

So we use a decoupled approach: every affector writes its results into an affector map. However different their algorithm implementations, they all output results in the same unified form (rigid-body immersion depth, velocity, and volume), which we save into the RGBA channels of the affector map. The LBM or Pipe simulation can then sample this RT, take the data out of the RGBA channels, and apply each channel to the corresponding simulation parameter, completing the interactive simulation.

The key thing to share here is the framework of the affector system. Each frame it first collects the valid affectors and updates all of them in gameplay. The instance information they generate is then gathered, including location, size, custom data and so on, along with all the buffers affectors may access: last frame's simulated height field and velocity field, the scene height, and custom texture data. Then, via instanced draws, the affectors that share a material are batch-rendered as quads onto the RT. Finally, the unified outputs of each affector (volume, velocity in X and Y, and rigid-body data) are stored in the RGBA channels of the map. Different simulation algorithms can then access this RT and add the external influences to their results. Because the output on the RT is a uniform set of physical quantities, any simulation algorithm can consume it regardless of the affector's class, achieving the desired decoupling.
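As a sketch, the unified per-texel output might be packed like this (channel assignment as described above; the exact layout in Aqua's RT may differ):

```cpp
// Unified affector output, one value per covered texel of the affector RT.
// Whatever an affector's internal algorithm, it resolves to these four
// numbers, which the simulation compute shader later samples and folds
// into its own parameters. The channel mapping here is an assumption.
struct FAffectorOutput
{
    float Volume;     // R: water volume added or removed by the affector
    float VelocityX;  // G: flow velocity imparted along X
    float VelocityY;  // B: flow velocity imparted along Y
    float RigidDepth; // A: rigid-body immersion depth
};
```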

How can users quickly develop their own affectors? The original idea was to let TAs or programmers implement their affector algorithms by wiring nodes in the material editor, the same way they develop materials; we call these affector materials. As shown in the figure, several common affectors have already been implemented this way in the currently open-sourced project.

To enrich the customization options, we also packaged many material functions for developers, such as access to each affector's custom data, uniform buffer data, and the height-field and velocity-field map data; these can be dragged into the material editor and used directly. The screenshot below shows the implementation of a water-source affector.

How do we solve the problem of the draw-call count growing as large numbers of different affectors are used? As mentioned in the previous part, we batch affectors that share the same material. In the screenshot you can see that the open project uses many affectors to simulate water, waves and so on, yet the final draw-call count is only 3.

Here are some captured examples of affector effects in the open project. With the help of the affector framework, it is convenient to customize all kinds of water-interaction gameplay through Affector Materials.

3.1 Rendering: GPU-driven water rendering

In this topic we first review the traditional approach, then show how we apply CDLOD to achieve GPU-driven water mesh rendering, including how to build the quadtree on the GPU, how to do occlusion culling, ultra-high mesh density, vertex morphing, and so on. In traditional water mesh rendering, as we all know, a flat mesh plus a height map is usually used to reproduce the undulation of the water surface. On mobile you may not even need real waves; normal-map and UV animation are enough to fake them.

The algorithm restores the wave height and the vertex position. It is very simple: in the vertex shader we sample the height map to restore the vertex's Z coordinate, recompute each vertex's tangent space by sampling the adjacent height-field data, and finally interpolate the tangent space into the rasterization stage.
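A CPU-style sketch of that restoration, assuming SampleHeight bilinearly samples the simulated height field at a world-space XY (in Aqua this runs in the vertex shader; names are illustrative):

```cpp
// Restore a water vertex: sample the height field for Z, then build a
// normal from central differences of the neighboring heights.
FVector RestoreVertex(const FVector2D& WorldXY, float CellSize,
                      TFunctionRef<float(const FVector2D&)> SampleHeight,
                      FVector& OutNormal)
{
    const float Z = SampleHeight(WorldXY);

    // Central differences over the height field give the surface slope;
    // the cross product of the two tangent directions is the normal.
    const float Hx1 = SampleHeight(WorldXY + FVector2D(CellSize, 0.f));
    const float Hx0 = SampleHeight(WorldXY - FVector2D(CellSize, 0.f));
    const float Hy1 = SampleHeight(WorldXY + FVector2D(0.f, CellSize));
    const float Hy0 = SampleHeight(WorldXY - FVector2D(0.f, CellSize));

    const FVector TangentX(2.f * CellSize, 0.f, Hx1 - Hx0);
    const FVector TangentY(0.f, 2.f * CellSize, Hy1 - Hy0);
    OutNormal = FVector::CrossProduct(TangentX, TangentY).GetSafeNormal();

    return FVector(WorldXY.X, WorldXY.Y, Z);
}
```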

Before implementing the GPU-driven version, we also investigated the water plugin of UE 4.26. In our screenshot of its ocean example, you can see that Unreal uses six different mesh topologies to represent the different ocean mesh densities. Although a single water material is shared and instanced drawing is used, this still produces 6 draw calls. In addition, UE 4.26 divides water into separate ocean, river and lake water actors, which further increases the draw-call count. Finally, the UE 4.26 water is CPU-driven, while our goal was GPU-driven.

In the end we adopted CDLOD. CDLOD stands for Continuous Distance-Dependent Level of Detail. It was originally designed to optimize large-world terrain rendering, and we took this opportunity to put it into practice for large-scale water rendering. CDLOD has many advantages: it changes smoothly, produces no cracks, never lets adjacent LOD levels differ by more than 1, and is very well suited to quadtree-based node culling.

Building the quadtree on the GPU is a top-down process. We start from the root nodes, each root node being one thread in the compute shader. At each step, a node is either settled at the current recursion level, in which case it is identified as a renderable quad, or it must be decided at the next level, meaning its children continue recursing downward, and so on until level 0 is reached. Building a quadtree on the GPU raises many problems. The first one we encountered is that compute shaders cannot use recursive functions.

We could try to bypass this limitation, for example by writing large nested loops in the shader code to emulate recursion, but the shader would become extremely complex and exhaust the registers. Furthermore, in a typical build the work shrinks with each level: the root level is the most complex and time-consuming, and level 0 the least. With uniform shader code for all levels, the per-thread execution time would not average out well.

To solve this, we execute the compute shader in batches, with each build level having its own shader variant. Each pass takes as input the nodes the previous pass did not finalize, so each build pass only concerns itself with its current level. This greatly reduces shader complexity and evens out execution time across threads.
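A sketch of that batched build loop, with illustrative buffer and dispatch names:

```cpp
// Non-recursive quadtree build: one compute dispatch per LOD level,
// coarsest first. Each pass either emits a node as a renderable quad at
// its level or splits it into four children for the next pass.
void BuildQuadTree(int32 NumLevels)
{
    FNodeBuffer* Input = &RootNodes; // one GPU thread per root node
    for (int32 Level = NumLevels - 1; Level >= 0; --Level)
    {
        FNodeBuffer* Undecided = &PendingNodes[Level];

        // A dedicated shader variant per level keeps per-thread work small
        // and uniform instead of one giant nested-loop shader.
        DispatchQuadTreeLevelCS(Level,
                                /*In*/  *Input,
                                /*Out: nodes to split*/   *Undecided,
                                /*Out: renderable quads*/ FinalQuads);

        Input = Undecided; // the split children feed the next, finer pass
    }
    // FinalQuads now drives the instanced draw of the water patches.
}
```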

During the build, we first decide whether a quadtree node should be culled, to reduce the instance count. By sampling the simulated water-surface height, we obtain the Min/Max height within a node's quad, from which we construct a world-space box. Our culling is of two kinds: frustum culling and HZB culling. Frustum culling tests whether the box's 8 vertices lie inside clip space, while HZB culling selects the appropriate Z-buffer mipmap according to the box's screen-space size and performs a depth test.
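A sketch of the frustum half of that culling, testing the node's min/max-height box against the clip-space planes (HZB culling is omitted; names are illustrative):

```cpp
// Frustum test for one quadtree node. The box spans the quad's XY extent
// and the Min/Max water height sampled inside it. A node is culled only
// if all 8 corners lie outside the same clip plane (D3D-style clip space
// with z in [0, w] assumed).
bool IsNodeInFrustum(const FVector& BoxMin, const FVector& BoxMax,
                     const FMatrix& ViewProj)
{
    int32 OutsideCount[6] = { 0 };
    for (int32 i = 0; i < 8; ++i)
    {
        // Enumerate the 8 corners of the min/max box.
        const FVector Corner(
            (i & 1) ? BoxMax.X : BoxMin.X,
            (i & 2) ? BoxMax.Y : BoxMin.Y,
            (i & 4) ? BoxMax.Z : BoxMin.Z);
        const FVector4 C = ViewProj.TransformPosition(Corner);

        if (C.X < -C.W) ++OutsideCount[0];
        if (C.X >  C.W) ++OutsideCount[1];
        if (C.Y < -C.W) ++OutsideCount[2];
        if (C.Y >  C.W) ++OutsideCount[3];
        if (C.Z <  0.f) ++OutsideCount[4];
        if (C.Z >  C.W) ++OutsideCount[5];
    }
    for (int32 Plane = 0; Plane < 6; ++Plane)
    {
        if (OutsideCount[Plane] == 8)
        {
            return false; // entirely outside one plane
        }
    }
    return true;
}
```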

If the quadtree nodes were rendered directly, the mesh density would be very low, so how do we build a high-density mesh? As shown in the figure below, areas with the same color share the same mesh density. We pre-generate two mesh patch types at two densities, the second having one quarter as many vertices as the first.

Taking a full size of 32 and a half size of 16 as an example, we fill non-boundary and boundary quadtree nodes with the 32 and 16 patches respectively. In this way, only two topologies express multiple mesh densities, and the draw-call count is at most 2.

In the final rendering, the most important thing is to determine which vertices need to morph. For each level we define a morph region matched to that level's node size, and the vertices inside the region are morphed. Since two different patches fill the mesh densities following the power-of-two rule, it is the odd-indexed vertices of these patches whose positions must be morphed.
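A sketch of the vertex morph, following the standard CDLOD scheme (MorphFactor is 0 at full detail and 1 where the node hands over to the coarser level; names are illustrative):

```cpp
// CDLOD-style morph: inside the morph region, odd-indexed vertices slide
// toward the position they occupy in the next (coarser) grid, so the
// density change is continuous and crack-free.
FVector2D MorphVertex(const FVector2D& GridPos,  // vertex index in the patch
                      const FVector2D& WorldXY,
                      float CellSize,            // cell size at this LOD level
                      float MorphFactor)         // 0 = full detail, 1 = coarse
{
    // frac(GridPos * 0.5) * 2 is 1 for odd indices and 0 for even ones.
    const FVector2D OddMask(
        FMath::Frac(GridPos.X * 0.5f) * 2.f,
        FMath::Frac(GridPos.Y * 0.5f) * 2.f);

    // Odd vertices move by one cell so they coincide with the coarser grid.
    return WorldXY - OddMask * CellSize * MorphFactor;
}
```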

We also compiled a table comparing build resolutions with the corresponding build times. No matter how large the scene, the quadtree build time depends only on the build resolution, which stabilizes the performance ceiling. Build resolution and mesh density can be freely combined according to actual project needs to balance performance and quality. In the current open-source project, a 512 build resolution with 32 mesh density is the default configuration; 256 + 16 or lower can be chosen for better performance, though mesh accuracy drops and jaggedness or deformation may appear.

Finally, let us look at the actual demo. The mesh density near the character is always high, while the sparse mesh in the distance transitions smoothly as the character's camera moves.

3.2 Rendering: Dynamic shallow surface water based on Height Blend

Next: how we improved UE4's native height-blend algorithm, implemented static shallow water on top of it, and then combined it with the simulation results to achieve dynamic shallow water. Finally, we will show how it actually works during gameplay.

In an open world, beyond regular water bodies such as oceans, lakes and rivers, shallow water is a very important way to express environmental detail, especially ponds, water pooled in depressions, and water on roads after rain. Terrain rendering in UE4 is the blended result of multiple layers, and height-based blending lets water show in the surface's cracks and low-lying areas. So we imagined inserting a shallow-water layer into the Landscape and height-blending it with the other layers to express the shallow-water effect.

Let us first look at UE's native HeightBlend algorithm. It is very simple: multiply each layer's weight by its height value, sum them, normalize into percentages, and blend each layer's result by those percentages. Simple, but it causes the layers to blend across the entire weight range, which looks dirty.

Also, the layer weights and height-map values both live in the range [0, 1], which is not a real physical unit and is too small a range, so calculation precision is lost. Moreover, seams cannot transition toward the mean, and the transition threshold cannot be controlled.

On the right is the original effect of this algorithm. The improved approach is to first map the height values into real physical space, then multiply by the corresponding weights; next we find the layer with the highest weighted height, that is, the topmost layer; then we divide each layer's difference from the top layer by a transition threshold to obtain smooth transition weights; finally, the smoothed weights are normalized into percentages and the layers are blended. The advantages of this algorithm: mapping the [0, 1] height values into actual physical units first improves calculation precision; seams tend toward the mean, achieving a mean transition; and the transition threshold is controllable. The screenshot on the right shows the improved effect, which basically meets the requirements of shallow water.
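A sketch of the improved blend for N layers, assuming heights arrive in [0, 1] and HeightScale maps them into physical units (parameter names and the exact transition shape are illustrative):

```cpp
// Improved height blend: physical-unit heights, a topmost layer, and a
// controllable transition threshold instead of full-range blending.
void HeightBlendWeights(const float* Weights, const float* Heights01,
                        int32 NumLayers, float HeightScale,
                        float TransitionThreshold, float* OutBlend)
{
    float MaxH = -FLT_MAX;
    TArray<float> H;
    H.SetNum(NumLayers);
    for (int32 i = 0; i < NumLayers; ++i)
    {
        // Map to physical units, then weight; track the topmost layer.
        H[i] = Heights01[i] * HeightScale * Weights[i];
        MaxH = FMath::Max(MaxH, H[i]);
    }

    float Sum = 0.f;
    for (int32 i = 0; i < NumLayers; ++i)
    {
        // Layers within the threshold of the top layer transition smoothly;
        // anything further below contributes nothing (no dirty blending).
        OutBlend[i] = FMath::Clamp(
            1.f - (MaxH - H[i]) / TransitionThreshold, 0.f, 1.f);
        Sum += OutBlend[i];
    }
    for (int32 i = 0; i < NumLayers; ++i)
    {
        OutBlend[i] /= FMath::Max(Sum, KINDA_SMALL_NUMBER); // normalize
    }
}
```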

It is also important that the shallow-water HeightBlend process is not locked inside C++ code; we want art or TA to edit it by wiring nodes in the material editor. UE's default Landscape material node, Layer Blend, does not decouple the shallow-water layer from the other layers, so we developed our own material node called Height Blend. As the screenshot shows, our node outputs the shallow-water layer's weight separately, in addition to the blend result of the other layers, so that the final shallow-water blend can be controlled in the material.

We encapsulated the final shallow-water blending operation in a material function. As mentioned earlier, our own material node outputs the shallow-water layer's weight separately. With that weight we can blend the shallow-water layer's color, normal and PBR information independently. Normals and PBR use fairly conventional linear interpolation, while for color we consider the transition from dry surface to wet area, and from wet area to shallow water.

What we have described so far is static shallow water. To further add dynamic shallow water, we feed in the LBM simulation results. The left side of the picture below shows the effect with LBM results applied: we use the LBM simulation to influence the HeightBlend height, and you can see it interacting with the surrounding environment and reflected in the actual blend. We then tested the Pipe simulation together with the rain affector: besides the surface ripples, you can also see on the right side of the picture that the amount of water differs, reflecting the simulation's effect on water volume.

Finally, some interesting gameplay applications. The main character can cast a skill anywhere in the game world, leaving a shallow layer of water on the ground. The shallow water height-blends with the surface into an erosion-like effect, and is slowly absorbed over time. Next, Jiaming continues the share.

3.3 Application of simulation results

Now that we have seen the basics of water rendering, we will explain how the simulation results are applied to more detailed water effects. First, the detail normals on the water surface; with them, the sense of flowing water is much stronger. The effect is not hard to achieve: we simply use the water's velocity field as a flow map to sample the detail normal. Since the simulated velocity is defined in world space, a simple scale and clamp is needed, and we must also flatten the detail normal where the velocity is zero.
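A shader-style sketch of that flow-mapped detail normal, assuming SampleNormal reads the detail normal map and Velocity is the simulated world-space flow at the pixel (constants are illustrative):

```cpp
// Flow-mapped detail normal: scroll the normal-map UVs along the simulated
// velocity and flatten the result where the water is still.
FVector DetailNormalFromFlow(const FVector2D& UV, const FVector2D& Velocity,
                             float Time, float FlowScale, float MaxOffset)
{
    // Scale and clamp the world-space velocity into a usable UV offset.
    FVector2D Flow = Velocity * FlowScale;
    Flow.X = FMath::Clamp(Flow.X, -MaxOffset, MaxOffset);
    Flow.Y = FMath::Clamp(Flow.Y, -MaxOffset, MaxOffset);

    // Classic two-phase flow-map sampling to hide the UV reset.
    const float Phase = FMath::Frac(Time);
    const FVector N0 = SampleNormal(UV - Flow * Phase);
    const FVector N1 = SampleNormal(UV - Flow * (Phase - 1.f));
    FVector N = FMath::Lerp(N0, N1, FMath::Abs(Phase * 2.f - 1.f));

    // Flatten the detail normal where the velocity approaches zero.
    const float Strength = FMath::Clamp(Flow.Size() / MaxOffset, 0.f, 1.f);
    return FMath::Lerp(FVector::UpVector, N, Strength).GetSafeNormal();
}
```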

Next, the foam effect. Unlike other schemes, which analyze the water height map with a Jacobian, we observed that where water collides the curl is usually relatively high. Since ours is a two-dimensional velocity field, the computed curl is a scalar: writing the velocity components as f along x and g along y, only the z-axis (k) component survives, giving curl_z = ∂g/∂x − ∂f/∂y. Once again we use the velocity field as a flow map to sample the foam texture, multiply by the curl mask, and obtain the image below.
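A sketch of that scalar curl computed by central differences over the simulation grid (SampleVelocity is assumed to return the simulated (f, g) velocity at a cell):

```cpp
// Scalar curl of the 2D velocity field: curl_z = dg/dx - df/dy, computed
// with central differences. High |curl| marks colliding flow, i.e. foam.
float Curl2D(int32 X, int32 Y, float CellSize,
             TFunctionRef<FVector2D(int32, int32)> SampleVelocity)
{
    const float dGdX = (SampleVelocity(X + 1, Y).Y - SampleVelocity(X - 1, Y).Y)
                       / (2.f * CellSize);
    const float dFdY = (SampleVelocity(X, Y + 1).X - SampleVelocity(X, Y - 1).X)
                       / (2.f * CellSize);
    return dGdX - dFdY;
}
```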

Besides effects on the water surface itself, objects around the water also change color from the water. To track the amount of water, we use a double-buffering technique. First we copy the current frame's height in Buffer1 to Buffer2, then read the height saved last frame in Buffer2, apply a reduction (decay), take the max with the new water height, and write it back into Buffer1 as the recorded water height. When rendering a scene object, we subtract the water height recorded in Buffer1 from the Z of the object's world coordinate to compute a wetting weight; this weight interpolates the object's PBR parameters, used in the lighting calculation, between the dry and wet preset groups.
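A sketch of that decaying height history; in Aqua it is double-buffered on the GPU, but the idea reads more clearly single-buffered on the CPU (names and the decay model are illustrative):

```cpp
// Each frame: decay the previously recorded water height, then take the
// max with the fresh simulation height, so surfaces stay wet for a while
// after the water recedes.
void UpdateWetnessHistory(const TArray<float>& NewWaterHeight, // this frame
                          TArray<float>& History,              // persisted
                          float DecayPerFrame)
{
    for (int32 i = 0; i < History.Num(); ++i)
    {
        const float Decayed = History[i] - DecayPerFrame;    // slowly dry out
        History[i] = FMath::Max(Decayed, NewWaterHeight[i]); // re-wet if flooded
    }
}

// At shading time, a wetness weight for an object pixel at world height Z
// might be: Wetness = saturate((History(x, y) - Z) / FadeRange);
// the weight then lerps the PBR parameters between the dry and wet presets.
```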

Here is another example: GPU particles interacting with the water. The falling-leaf effect shown in the video not only floats on the water but also rotates with the waves. Each particle in this system is in one of three states: falling, floating, or vanishing. While falling, the particle checks against the scene height and the water height whether it has landed in water; if so, it switches to the floating state. While floating, it simply takes the water height as its height in world space, plus the curl mentioned earlier to control the leaf's rotation. If the particle falls onto the ground, it becomes vanishing: we simply set it transparent and wait a few frames for the particle system to recycle it into the pool.
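A sketch of the per-particle state machine, written CPU-side for clarity (in Aqua this runs in the GPU particle system; FLeafParticle, the field names and thresholds are illustrative):

```cpp
// Three-state update for a floating-leaf particle. SceneH / WaterH are the
// scene and water heights sampled at the particle's XY; Curl is the local
// scalar curl used to spin the leaf.
enum class ELeafState : uint8 { Falling, Floating, Vanishing };

struct FLeafParticle
{
    FVector    Pos;
    float      Yaw = 0.f;         // leaf rotation
    float      Alpha = 1.f;       // render opacity
    float      FallSpeed = 100.f;
    float      SpinScale = 1.f;
    ELeafState State = ELeafState::Falling;
};

void UpdateLeaf(FLeafParticle& P, float SceneH, float WaterH, float Curl, float Dt)
{
    switch (P.State)
    {
    case ELeafState::Falling:
        P.Pos.Z -= P.FallSpeed * Dt;
        if (P.Pos.Z <= WaterH && WaterH > SceneH)
        {
            P.State = ELeafState::Floating;  // landed on water
        }
        else if (P.Pos.Z <= SceneH)
        {
            P.State = ELeafState::Vanishing; // landed on dry ground
        }
        break;

    case ELeafState::Floating:
        P.Pos.Z = WaterH;                    // ride the simulated surface
        P.Yaw += Curl * P.SpinScale * Dt;    // rotate with the local curl
        break;

    case ELeafState::Vanishing:
        P.Alpha = 0.f; // transparent; the pool recycles it a few frames later
        break;
    }
}
```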

Two post-process effects related to the water height are also worth sharing. First, the WaterLine, which appears along the edge where the water surface crosses the camera. The idea is to find the screen-space pixels that are close to the water and blend them toward the WaterLine color. Using the inverse projection matrix, we compute each pixel's world coordinate on the near plane, then use that coordinate to sample the water height, which gives the distance between the near-plane pixel and the water surface. This distance is passed through a falloff function to get an interpolation weight, producing the WaterLine effect in the screenshot on the left.
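A sketch of the per-pixel weight, assuming NearPlaneWorldPos was already reconstructed with the inverse projection and SampleWaterHeight reads the simulated height (Falloff is an illustrative tuning distance):

```cpp
// WaterLine weight: pixels whose near-plane position sits close to the
// water surface are tinted toward the WaterLine color.
float WaterLineWeight(const FVector& NearPlaneWorldPos, float Falloff,
                      TFunctionRef<float(const FVector2D&)> SampleWaterHeight)
{
    const float WaterH = SampleWaterHeight(
        FVector2D(NearPlaneWorldPos.X, NearPlaneWorldPos.Y));
    const float Dist = FMath::Abs(NearPlaneWorldPos.Z - WaterH);
    // Close to the surface -> weight near 1 -> blend in the WaterLine color.
    return FMath::Clamp(1.f - Dist / Falloff, 0.f, 1.f);
}
```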

Extending the WaterLine idea, we can compute the world positions (WP) of points at different distances along the line of sight. At each WP we use some noise or a caustics-like field to estimate the scattering brightness at that point. By sampling the water height at the WP we know the water depth there, and feeding that depth into a falloff function gives the attenuation of the scattering due to depth. We then ray-march 4 such points along the line of sight and integrate the scattering, obtaining the Light Shafts effect in the video.
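A sketch of the four-point march, where ScatterAt stands in for the noise-based scattering estimate and the depth attenuation uses a simple exponential falloff (all names and the falloff shape are assumptions):

```cpp
// Integrate scattering at four points along the view ray; each sample is
// attenuated by how deep it sits below the simulated water surface.
float LightShafts(const FVector& CamPos, const FVector& ViewDir,
                  float MaxDist, float DepthFalloff,
                  TFunctionRef<float(const FVector&)> ScatterAt,
                  TFunctionRef<float(const FVector2D&)> SampleWaterHeight)
{
    float Total = 0.f;
    const int32 NumSteps = 4;
    for (int32 i = 1; i <= NumSteps; ++i)
    {
        const FVector WP = CamPos + ViewDir * (MaxDist * i / NumSteps);
        const float Depth = SampleWaterHeight(FVector2D(WP.X, WP.Y)) - WP.Z;
        if (Depth <= 0.f)
        {
            continue; // a point above the water surface contributes nothing
        }
        Total += ScatterAt(WP) * FMath::Exp(-Depth * DepthFalloff);
    }
    return Total / NumSteps;
}
```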

The last thing to share is buoyancy, which is closely tied to gameplay. Unlike the earlier effects, buoyancy is computed on the CPU and then applied to the physics engine's rigid bodies. By the buoyancy formula, the force relates to the volume of the object immersed in water, so the CPU needs to access the water depth at the object's position to compute the immersed volume. We therefore needed an efficient mechanism to read simulation results back from the GPU, so Aqua implemented a GPU read with one frame of latency. The general flow: each frame we record the locations to query, then dispatch a compute shader that reads the simulation results from the Texture Atlas and stores them into a staging RT at a very low resolution. We wait until the dispatch completes before reading back, and finally deliver the query results through a C++ callback in the next frame. This minimizes the bandwidth cost and latency of the GPU read. With the water depth we can compute buoyancy: the driftwood in the tech demo uses this GPU readback to compute the buoyancy at both ends of the log and applies AddImpulseAtLocation to the log's rigid body. In addition, when either end of the log detects that it is immersed, we apply a force at that end using the water velocity we read back, creating the effect of a rigid body drifting with the current.
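A sketch of the CPU side that consumes the one-frame-delayed readback, with an illustrative payload struct and constants; UPrimitiveComponent::AddImpulseAtLocation is the actual UE API named above:

```cpp
// Hypothetical readback payload: water height and 2D flow velocity at each
// queried probe point, filled by the one-frame-delayed GPU readback.
struct FWaterQueryResult
{
    float     WaterHeight[2];
    FVector2D WaterVelocity[2];
};

void ApplyBuoyancy(UPrimitiveComponent* Wood, const FWaterQueryResult& Q,
                   const FVector ProbePoints[2], float DeltaTime)
{
    const float WaterDensity = 1.0f;  // illustrative units
    const float Gravity      = 980.f; // cm/s^2, UE units
    const float ProbeArea    = 400.f; // assumed cross-section per probe, cm^2
    const float DragCoeff    = 0.5f;  // assumed flow-drag factor

    for (int32 i = 0; i < 2; ++i)
    {
        const float Immersion = Q.WaterHeight[i] - ProbePoints[i].Z;
        if (Immersion <= 0.f)
        {
            continue; // this end is above the surface
        }

        // Buoyancy ~ rho * g * submerged volume, approximated here from the
        // immersion depth at this probe point.
        const float Volume = ProbeArea * Immersion;
        const FVector Impulse(0.f, 0.f,
                              WaterDensity * Gravity * Volume * DeltaTime);
        Wood->AddImpulseAtLocation(Impulse, ProbePoints[i]);

        // Push the immersed end along with the simulated flow velocity.
        const FVector Drift =
            FVector(Q.WaterVelocity[i], 0.f) * DragCoeff * DeltaTime;
        Wood->AddImpulseAtLocation(Drift, ProbePoints[i]);
    }
}
```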

Today we shared the open-world water solution developed by the Aqua team, which includes the multi-level simulation system, the affector system, GPU-driven CDLOD water rendering, shallow-water rendering, and the various applications of the simulation results. After a year of development, Aqua has many features, but some are imperfect or can be continuously improved. For example, we do not yet support sound, and the scheme does not support water simulation in complex terrain such as caves. We hope to have the opportunity to further optimize and polish Aqua and make it better still.

Q & A

Q1: How do you control the grid precision of the water simulation?

A (Chen Jiaming): This is a very good question. While developing the tech demo we found that a simulation precision of 25 cm per grid cell achieves a good balance between performance and effect. In our demo we use three clip levels for the simulation, so the corresponding precisions are roughly 25 cm, 50 cm and 1 m. Of course, the simulation precision can be raised to meet a particular game's requirements, but every doubling of precision means the sub-step count must increase, mainly to avoid the numerical explosions caused by violating the CFL condition.

Q2: Is the water rendering grid fixed size or adaptive?

A (Hu Youwei): As I mentioned in my part just now, the grid size is determined by two things: first the build resolution, and second the configured mesh density. We currently generate the grid with a 512 build resolution + 32 density setting; of course you can choose others, such as 256 + 16. At present this configuration is set in the project before runtime; we have not implemented changing it at runtime, though in theory we could.

Q3: How does water lighting avoid aliasing at a distance? How can GPU Driven be adapted to mobile?

A (Hu Youwei): We did encounter this problem: even far away on the water surface we had some jaggies and distortion. How did we solve it? We moved the tangent-space reconstruction from the vertex into the pixel, that is, we restore the tangent space inside the pixel shader. And when restoring it, we do not sample the immediately adjacent heights but sample across multiple neighbors, such as two or three texels away, which smooths the tangent space.

Implementing GPU Driven on mobile is still somewhat difficult. As we all know, given mobile hardware architectures, compute shader support is still at the API level, and the mobile architectures themselves have not been upgraded for it, so adapting GPU Driven on mobile remains hard. If CDLOD is used on mobile, it may be necessary to fall back to a CPU-driven approach: we port the GPU quadtree build to the CPU, and information such as transforms and sizes can then be fed to instanced rendering through vertex streams. Custom data can only be baked into a texture or updated into a texture, since structured buffer support on mobile is currently not very good.