Recently, I used Cocos Creator to develop some gamified classroom interactions. Cocos is an excellent Chinese-made game engine that lets you write cross-platform games in JavaScript. After reading the documentation and getting started, everything seemed to run perfectly. However, at the hands-on review session, problems were raised such as a long black screen while loading, phones overheating badly, crashes, and stuttering. It was a headache, and the only way out was to find ways to optimize.

After several days of optimization, performance gradually reached the required standard, and I fell into many pitfalls along the way. So I want to record some methods for troubleshooting and optimizing performance problems and share them with anyone who needs them.

Although Cocos belongs to game development, its performance problems have a lot in common with those in front-end development. They come down to three metrics: load speed, CPU, and memory. The optimization methods below are organized around these three metrics.

1. Load speed optimization

The startup of Cocos can be roughly divided into five stages:

The time spent loading and running the Cocos engine itself cannot be changed by the business side, so that part of the black screen time cannot be optimized. What remains for black screen optimization is the loading of Cocos static resources.

There are two ways to optimize static resource loading: resource compression and resource caching.

Resource compression mainly means compressing image resources. Tinify supports online compression of PNG and JPG images, typically reduces file size by about 75% with no obvious visual difference, and is highly recommended.

PNG and JPG images can also be compressed in the Cocos Creator editor if a certain degree of distortion is acceptable.

Select PNG as the format for PNG images and JPG for JPG images, then adjust the quality. The lower the quality, the smaller the file and the greater the distortion.

Resource caching is divided into disk cache and memory cache.

On native platforms, the resources already exist locally. On the Web, resources can be cached on disk through HTTP caching or a PWA.

Resources can also be cached in memory. A game generally has multiple scenes, for example many levels with one scene per level. If a scene is never re-entered, its resources do not need to be cached. If a scene will be re-entered, cache its resources to speed up the second opening.

Generally speaking, disk space is relatively plentiful, so disk caching is rarely a problem. Memory, however, is precious. You cannot cram every resource into it, because that easily leads to high memory usage and a risk of memory leaks, so generally only resources that need to stay resident are cached in memory.
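The idea of caching only resident resources can be sketched as follows. This is a minimal standalone model, not Cocos API: `ResidentSceneCache`, `SceneAssets`, and the scene names are all hypothetical, and the byte counts are made up for illustration.

```typescript
// Hypothetical sketch: keep scene assets in memory only for scenes marked resident.
interface SceneAssets {
  sceneName: string;
  bytes: number;
}

class ResidentSceneCache {
  private cached: { [sceneName: string]: SceneAssets } = {};
  private residentScenes: string[];

  constructor(residentScenes: string[]) {
    this.residentScenes = residentScenes;
  }

  // Called when leaving a scene: keep its assets only if it will be re-entered.
  onSceneExit(assets: SceneAssets): void {
    if (this.residentScenes.indexOf(assets.sceneName) !== -1) {
      this.cached[assets.sceneName] = assets;
    }
  }

  // Returns cached assets, or undefined if the scene has to be loaded again.
  lookup(sceneName: string): SceneAssets | undefined {
    return this.cached[sceneName];
  }

  // Total bytes currently held by the cache.
  totalBytes(): number {
    let sum = 0;
    for (const key of Object.keys(this.cached)) {
      sum += this.cached[key].bytes;
    }
    return sum;
  }
}

const sceneCache = new ResidentSceneCache(["lobby"]);
sceneCache.onSceneExit({ sceneName: "lobby", bytes: 2000000 });   // kept
sceneCache.onSceneExit({ sceneName: "level-1", bytes: 5000000 }); // dropped
console.log(sceneCache.totalBytes());
```

Only the scene listed as resident stays in memory, so the cache size is bounded by a list you control rather than by how many levels the player has visited.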

2. CPU optimization

Games require a lot of computation and drawing, so they are inherently CPU-hungry, which makes CPU optimization during gameplay very important. If the CPU load is too high, the device may heat up, the frame rate may drop, or the game may stutter and crash.

The CPU is responsible for parsing and executing instructions, so the main cause of high CPU load is too many instructions to execute, especially time-consuming ones. In games, this mainly means draw instructions, also known as draw calls. Other sources are computationally intensive systems such as the physics and collision systems, the creation and destruction of nodes, and update logic in business code.

For draw call optimization, the ideal is to have as few draw calls as possible. To understand why and how to optimize draw calls, it helps to first know what the CPU does when a draw call is issued.

The CPU is not very good at graphics processing, so that work is generally handed to the GPU (Graphics Processing Unit). This is why large games need a better graphics card: what they really need is a more powerful GPU.

The CPU hands the data to the GPU for rendering, but that does not mean the CPU has nothing to do. It must write the data to be rendered into a data buffer (video memory) and set the render state (texture, shader, etc.) before the GPU can fetch the data for calculation and rendering.

Because the GPU is so powerful at graphics processing, it handles a small amount of data at a time about as fast as a large batch at once. For the CPU, however, issuing draw calls frequently with a little data each time keeps it very busy. So the most effective way to optimize draw calls is batching.

The way to batch is atlasing (combining images). Atlasing means packing the textures to be rendered into one large atlas and sending it to the GPU to render in one go. For example, if you have 3 sprites, each with its own texture, and they are not combined into an atlas, you need 3 draw calls. With an atlas, only 1 draw call is required.

With 3 star icon sprites in the scene, the draw call count is 4. Why not 3? Because the camera background itself needs one draw call, so the stars account for 3 draw calls in total.

After adding the atlas, the draw call count becomes 2, indicating that the stars now require only 1 draw call.

Besides Sprite, the Label component (text) also supports atlasing. In fact, rendering text also means sending textures to the GPU to render.

There are two kinds of fonts: bitmap fonts and FreeType fonts.

A bitmap font packs all of its characters into a single image. It is simple and crude but also efficient, because the font is effectively pre-rendered. The disadvantage is that when the character set is large, such as all Chinese characters, the image can be large and memory usage high. It is also not very flexible: the image resolution is fixed, so bitmap fonts look slightly jagged on high-resolution screens.

The other kind is FreeType fonts, such as TTF fonts. Unlike bitmap fonts, which represent glyphs as pixels, FreeType fonts only define the data needed to render the glyphs, which must be computed at run time and then rendered. Such fonts do not suffer from scaling problems, but they carry a computational cost, so they generally need to be optimized through caching.

When the text consists only of digits and English letters, and there are many text nodes or they change often, consider using a bitmap font, which can effectively reduce the number of draw calls caused by text rendering.

Let's look at a simple example. There are three Label nodes in the scene, and the font format is TTF.

A preview shows that the draw call count is 4. As mentioned earlier, the camera accounts for one draw call by default, which means the three text nodes bring three draw calls. If there are many text nodes, or text nodes change frequently, the result is a large number of draw calls.

If we use BMFont, we can see that the drawCall immediately drops to 2, which means that the three nodes are drawn only once, resulting in a significant drawCall optimization.

For built-in fonts, Cocos likewise creates a character texture for each Label component, and these textures do not take part in atlasing by default.

Cocos provides BMFont-like functionality for the Label component: we can use Cache Mode to optimize CPU usage.

When Cache Mode is NONE, Cocos creates a character texture for the text of each Label component, and the texture does not take part in atlasing by default.

When the value is BITMAP, Cocos still creates a character texture for the text of each Label component, but the texture can take part in dynamic atlasing (described later) and be drawn in batches.

When the value is CHAR, Cocos generates a separate character atlas for the font and caches it. Subsequent new text can be fetched directly from the character atlas cache without being rendered again. (The official Cocos documentation describes this as "the same characters will not be redrawn next time", but as far as I understand, the text still has to be drawn, otherwise how would the text on the screen be updated? So it should simply be reusing the already rendered glyph data.)
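My understanding of CHAR mode can be modeled roughly like this. This is a toy model and an assumption on my part, not the engine's implementation: `CharAtlasModel` and its methods are hypothetical names, and "rasterizing" is reduced to handing out a slot number.

```typescript
// Hypothetical model of a shared character atlas: each distinct character is
// rasterized once, and later text reuses the cached glyph slot.
class CharAtlasModel {
  private glyphSlots: { [ch: string]: number } = {};
  private rasterizedCount = 0;

  // Lays out a string and returns the atlas slot of each character.
  layout(text: string): number[] {
    const slots: number[] = [];
    for (let i = 0; i < text.length; i++) {
      const ch = text.charAt(i);
      if (this.glyphSlots[ch] === undefined) {
        // Expensive step in a real engine; done once per distinct character.
        this.glyphSlots[ch] = this.rasterizedCount;
        this.rasterizedCount++;
      }
      slots.push(this.glyphSlots[ch]);
    }
    return slots;
  }

  glyphsRasterized(): number {
    return this.rasterizedCount;
  }
}

const charAtlas = new CharAtlasModel();
charAtlas.layout("score: 100");
const glyphsAfterFirst = charAtlas.glyphsRasterized();
charAtlas.layout("score: 101"); // every character is already in the atlas
const glyphsAfterSecond = charAtlas.glyphsRasterized();
console.log(glyphsAfterFirst, glyphsAfterSecond);
```

The second string still has to be laid out and drawn, but no new glyph is rasterized, which is why this mode suits text that changes often but draws from a small character set.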

Unlike Auto Atlas, Cache Mode relies on dynamic atlasing. Static atlases are generated at build time, while dynamic atlases are generated at run time. Static atlasing saves some run-time cost, but it cannot be applied to dynamically loaded image resources; those can be optimized through dynamic atlasing instead. The official Cocos documentation already explains how to use dynamic atlasing in detail, so you can read it directly: docs.cocos.com/creator/man…

As mentioned earlier, reducing draw calls with an atlas is a common and effective method, but an atlas takes up a certain amount of memory, so keep an eye on memory metrics at the same time. Another thing to note: combining images into an atlas does not by itself guarantee batch rendering. The Sprite or Label nodes that share the atlas must also be contiguous. Take the star example again: there are 3 stars in the scene, that is, 3 sprites, which originally required 3 draw calls but only 1 after atlasing. Let's add a Sprite node between the first and second stars to break the batch:

After the red square is inserted, the draw call count becomes 4: the camera background, the first star, the red square, and the second and third stars together. The first star could have been batched with the other two stars, but the batch was broken by the rendering of the red square.

Now let's move the small square in front of the first star.

As you can see, the draw call count is now 3, even though nothing has changed in the display.

Therefore, try to keep the nodes that share an atlas contiguous, and do not insert other Sprite-type nodes between them, so as not to break the batch.
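The draw call counts in the star examples can be reproduced with a simplified model. This is a sketch under an assumption, not engine code: adjacent nodes that use the same texture are merged into one draw call, and the camera background always costs one. The function name and texture names are made up.

```typescript
// Simplified model (an assumption, not engine code): adjacent nodes that share
// a texture are merged into one draw call; the camera background costs one.
function countDrawCalls(textureOfEachNode: string[]): number {
  let calls = 1; // camera background
  let previous: string | null = null;
  for (const texture of textureOfEachNode) {
    if (texture !== previous) {
      calls++; // texture changed, so the current batch is flushed
      previous = texture;
    }
  }
  return calls;
}

// Three stars, each with its own texture.
const separateCalls = countDrawCalls(["star1.png", "star2.png", "star3.png"]);
// Three stars packed into one atlas.
const atlasedCalls = countDrawCalls(["atlas", "atlas", "atlas"]);
// A red square between the first and second star breaks the batch.
const interruptedCalls = countDrawCalls(["atlas", "red.png", "atlas", "atlas"]);
// The red square moved in front of the first star.
const reorderedCalls = countDrawCalls(["red.png", "atlas", "atlas", "atlas"]);

console.log(separateCalls, atlasedCalls, interruptedCalls, reorderedCalls);
```

The model counts runs of identical textures in render order, which is why the same four nodes cost 4 draw calls in one order and 3 in another.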

In addition, the Mask component can be one of the culprits behind an increased draw call count. Masks in Cocos are mainly used to produce shapes such as rounded corners.

Why is that?

There is a white square in the scene.

The total draw call count is 2, so rendering the square requires 1 draw call.

To show it as a circle, you can add a Mask component.

As you can see, the draw call count changes from 2 to 4, meaning the mask adds two extra draw calls. That's surprising. How does that happen?

The Cocos documentation explains it like this:

The conclusion is that drawing a node with a Mask component requires three draw calls, and such a node cannot be batched with adjacent nodes, even if they use the same atlas. Therefore, use masks as little as possible. If you want rounded corners or similar effects and the size of the node is relatively fixed, you can ask the designers to provide the finished image directly.

Of course, if you are like me and want to dig into the details, you may wonder: what is the stencil buffer, and why are three draw calls needed? The detailed explanation below requires a little OpenGL knowledge; skip it if you don't want to go deeper:

  1. What is a stencil test?

    A stencil test decides whether certain areas should be rendered, based on the values set in the stencil buffer.

For more information, see the Stencil testing chapter (under Advanced OpenGL) of LearnOpenGL CN at learnopengl-cn.readthedocs.io.

  2. Rendering a node with the Mask component is a three-step process

    The render frame information can be viewed with Spector.js. Here are the three frames associated with rendering the circle:

Render frame 1:

The render command is as follows. It draws 2 triangles with 6 vertices, which is the original square.

But it does not actually render the square.

The stencil buffer state is:

This means that the stencil buffer values for the area covered by the square are set directly to 0, which clears the stencil buffer for that area.

Render frame 2:

The render command is as follows. It draws many triangles with 186 vertices, which together form the circle, because in OpenGL (WebGL) shapes are built from triangles.

The stencil buffer state is:

The stencil buffer values for the area covered by the circular mask are set directly to 1.

Render frame 3:

The render command is as follows. As in the first frame, it draws the square, but this time the square is actually rendered.

The stencil buffer state is as follows. A pixel is rendered only where the stencil buffer value is 1, so the square is clipped to the circle.
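The three passes above can be imitated in software on a tiny grid. This is only a toy model to make the logic concrete: the grid size, the circle radius, and all names are illustrative assumptions, and a real GPU performs the test per fragment through stencil state rather than with loops like these.

```typescript
// Toy software model of the three passes. Grid size and shapes are
// illustrative assumptions; the square is assumed to cover the whole grid.
const GRID_SIZE = 9;
const MASK_CENTER = 4;
const MASK_RADIUS = 3;

function makeBuffer(initial: number): number[][] {
  const rows: number[][] = [];
  for (let y = 0; y < GRID_SIZE; y++) {
    const row: number[] = [];
    for (let x = 0; x < GRID_SIZE; x++) {
      row.push(initial);
    }
    rows.push(row);
  }
  return rows;
}

function insideCircle(x: number, y: number): boolean {
  const dx = x - MASK_CENTER;
  const dy = y - MASK_CENTER;
  return dx * dx + dy * dy <= MASK_RADIUS * MASK_RADIUS;
}

const stencilBuffer = makeBuffer(7); // stale values left by earlier rendering
const colorBuffer = makeBuffer(0);   // 0 = background, 1 = white square pixel

// Pass 1: draw the square with color writes off, resetting its stencil area to 0.
for (let y = 0; y < GRID_SIZE; y++) {
  for (let x = 0; x < GRID_SIZE; x++) {
    stencilBuffer[y][x] = 0;
  }
}

// Pass 2: draw the circle with color writes off, writing 1 into the stencil.
for (let y = 0; y < GRID_SIZE; y++) {
  for (let x = 0; x < GRID_SIZE; x++) {
    if (insideCircle(x, y)) {
      stencilBuffer[y][x] = 1;
    }
  }
}

// Pass 3: draw the square again; a pixel is written only where stencil is 1.
for (let y = 0; y < GRID_SIZE; y++) {
  for (let x = 0; x < GRID_SIZE; x++) {
    if (stencilBuffer[y][x] === 1) {
      colorBuffer[y][x] = 1;
    }
  }
}

console.log(colorBuffer.map((row) => row.join("")).join("\n"));
```

Each loop nest corresponds to one draw call, which is where the three draw calls per masked node come from.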

Besides draw calls, some logic calculations also affect CPU usage. One example is when the Widget component recalculates its alignment:

If ALWAYS is selected, the node's position and size are recalculated every frame, which costs a lot of computation. You can select ON_WINDOW_RESIZE instead, so that recalculation happens only when the window size changes. If the widget also needs to be recalculated at other times, call Widget.updateAlignment manually on demand.

In addition, since the update lifecycle callback is called every frame, take care that the logic inside update does not run more often than necessary. Constantly printing logs or recomputing values there hurts CPU performance.
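One way to keep heavy logic out of the per-frame path is to accumulate the frame time and run the logic only at a fixed interval. The sketch below is standalone and hypothetical: `ThrottledTask` and `tick` are names I made up, and in a component you would call `tick(dt)` from `update(dt)`.

```typescript
// Hypothetical helper: run expensive logic at a fixed interval instead of
// on every frame.
class ThrottledTask {
  private elapsed = 0;
  private intervalSeconds: number;
  private task: () => void;

  constructor(intervalSeconds: number, task: () => void) {
    this.intervalSeconds = intervalSeconds;
    this.task = task;
  }

  // Call this from update(dt); dt is the frame time in seconds.
  tick(dt: number): void {
    this.elapsed += dt;
    if (this.elapsed >= this.intervalSeconds) {
      this.elapsed -= this.intervalSeconds;
      this.task();
    }
  }
}

let heavyRuns = 0;
const heavyTask = new ThrottledTask(1, () => {
  heavyRuns++; // stands in for an expensive calculation
});

// Simulate 128 frames of 1/64 s each, i.e. two seconds of game time.
for (let frame = 0; frame < 128; frame++) {
  heavyTask.tick(1 / 64);
}
console.log(heavyRuns);
```

The per-frame cost drops to one addition and one comparison, while the expensive part runs once per interval.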

Node creation and destruction can also be performance intensive, so frequent node creation and destruction should be avoided and the number of nodes should be minimized.

Since Cocos draws on a canvas on the Web, the browser's developer tools cannot be used to inspect nodes. A Cocos plug-in, ccc-devtools, is recommended here. GitHub address: github.com/potato47/cc…

If there are too many nodes and they are frequently created and destroyed, as with monsters or bullets in a game with large numbers of repeated objects, this can usually be optimized with an object pool. An object pool means that when a node is no longer needed it is not destroyed but cached, and the next time a node is needed it is taken from the cache instead of being created again. Cocos itself provides an object pool interface, NodePool. For details, see: docs.cocos.com/creator/man…
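The object pool idea itself is small enough to sketch without the engine. The code below is a standalone illustration, not the NodePool API: `SimplePool`, `acquire`, `release`, and the `Bullet` type are hypothetical names.

```typescript
// Standalone sketch of an object pool; in a Cocos project NodePool plays
// this role for nodes.
class SimplePool<T> {
  private idle: T[] = [];
  private createItem: () => T;
  created = 0; // how many objects were actually constructed

  constructor(createItem: () => T) {
    this.createItem = createItem;
  }

  // Reuse an idle object if one exists, otherwise create a new one.
  acquire(): T {
    const reused = this.idle.pop();
    if (reused !== undefined) {
      return reused;
    }
    this.created++;
    return this.createItem();
  }

  // Return an object to the pool instead of destroying it.
  release(item: T): void {
    this.idle.push(item);
  }
}

interface Bullet {
  x: number;
  y: number;
}

const bulletPool = new SimplePool<Bullet>(() => ({ x: 0, y: 0 }));
const firstBullet = bulletPool.acquire();
bulletPool.release(firstBullet);        // bullet left the screen
const secondBullet = bulletPool.acquire(); // the same object comes back
console.log(bulletPool.created, firstBullet === secondBullet);
```

Note that a reused object keeps whatever state it had when it was released, so the caller should reset position and other fields after acquiring it.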

Collision detection in games can also be expensive. Use Box or Circle colliders wherever possible, and use Polygon colliders sparingly.
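To see why box and circle shapes are cheap, here is a sketch of the two overlap tests written as plain functions. This is not the engine's collision code, and the type and function names are my own; the point is only that each test is a fixed handful of arithmetic operations, whereas a general polygon test has to visit every edge or vertex.

```typescript
// Axis-aligned box given by its lower-left corner and size.
interface Box {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Circle given by its center and radius.
interface Circle {
  cx: number;
  cy: number;
  r: number;
}

// Two boxes overlap when their extents overlap on both axes: four comparisons.
function boxesOverlap(a: Box, b: Box): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Two circles overlap when the distance between centers is less than the sum
// of the radii. Comparing squared values avoids a square root.
function circlesOverlap(a: Circle, b: Circle): boolean {
  const dx = a.cx - b.cx;
  const dy = a.cy - b.cy;
  const radiusSum = a.r + b.r;
  return dx * dx + dy * dy < radiusSum * radiusSum;
}

const boxHit = boxesOverlap({ x: 0, y: 0, w: 10, h: 10 }, { x: 5, y: 5, w: 10, h: 10 });
const circleHit = circlesOverlap({ cx: 0, cy: 0, r: 1 }, { cx: 3, cy: 0, r: 1 });
console.log(boxHit, circleHit);
```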

3. Memory optimization

One of the main uses of memory in games is caching resources, such as images. Resources are divided into static resources and dynamic resources.

Static resources are those that are loaded as soon as the scene is entered. Dynamic resources are those loaded asynchronously within a scene. For example, some network images and audio are loaded through cc.loader.load or cc.loader.loadRes.

We can view the list of resources in the current scene through cc.loader._cache:

You can also view the resource list visually with the ccc-devtools plug-in mentioned earlier, which also shows the size of texture resources:

Note that an image takes up much more space in memory than on disk. On disk the image is encoded, for example as PNG or JPG, so the amount of data is much smaller. In memory it is decoded into pixel values, so it occupies a lot of space.
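A rough estimate of the decoded size is width times height times bytes per pixel. The sketch below assumes an uncompressed RGBA8888 texture at 4 bytes per pixel, which is my assumption rather than something stated above; the function name is made up, and mipmaps or padding are ignored.

```typescript
// Estimate the memory taken by a decoded texture.
// Assumption: uncompressed RGBA8888, i.e. 4 bytes per pixel by default.
function decodedTextureBytes(width: number, height: number, bytesPerPixel: number = 4): number {
  return width * height * bytesPerPixel;
}

const bytesFor1024 = decodedTextureBytes(1024, 1024);
const mebibytesFor1024 = bytesFor1024 / (1024 * 1024);
console.log(bytesFor1024, mebibytesFor1024); // 4194304 bytes, i.e. 4 MiB
```

So a 1024 x 1024 image costs about 4 MiB in memory regardless of how small its PNG or JPG file is on disk.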

There are essentially two ways to reduce memory: cut unnecessary resources, and compress resources.

As an example of cutting unnecessary resources, take a scene background that has one version for mobile and one for PC. The right approach is to detect the platform in code and dynamically load only the matching resource, rather than placing both the mobile and PC backgrounds in the scene and toggling their visibility. This saves the memory of one whole set of resources.

As for backgrounds, the images that designers hand over directly tend to be large. If a background is just a solid color, or can be produced by repeating or transforming a simple pattern, it can be implemented in code, which optimizes the large background image away.

In addition, when building atlases, make sure that only related images are combined into the same atlas. Otherwise, an entire atlas may be loaded when only one small image from it is used, which wastes a lot of memory.

Resource compression mainly refers to compressing image resources, also known as texture compression.

Simply compressing an image with a tool such as Tinify does not change its pixel dimensions, so it does not reduce the memory the image occupies; it only reduces its storage size on disk. For resources with low resolution requirements, you can use lower-resolution images (for example 2x assets) to reduce their size in memory.

Texture compression algorithms, such as Etc1, Etc2, and PVRTC, can reduce the size of an image in memory. The JPG and PNG formats compress image data, but the GPU cannot read them directly, so the CPU must decode them before the GPU can render. Data compressed with a texture compression algorithm can be read directly by the GPU, so texture compression saves CPU work as well as memory.

Note that texture compression is generally lossy, and the compression rate can be selected. In addition, which algorithm can be used depends on whether the device's GPU can decode it, so different platforms need different texture compression algorithms.

For more about texture compression algorithms, I recommend this article: zhuanlan.zhihu.com/p/237940807…

Etc1 is supported on most Android devices and PVRTC is supported on all iOS devices.

If the image does not need an alpha channel, choose Etc1 RGB for Android and PVRTC 4bits RGB for iOS. If it needs an alpha channel, choose Etc1 RGB Separate A for Android and PVRTC 4bits RGBA Separate A for iOS.
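The selection rule above can be written down as a small lookup, which is handy if you keep a build checklist in code. This is only a sketch: `pickCompressedFormat` and `TargetPlatform` are hypothetical names, and the returned strings are simply the option names mentioned above, not values accepted by any API.

```typescript
type TargetPlatform = "android" | "ios";

// Returns the texture compression option to select for a platform,
// depending on whether the image needs an alpha channel.
function pickCompressedFormat(platform: TargetPlatform, needsAlpha: boolean): string {
  if (platform === "android") {
    return needsAlpha ? "Etc1 RGB Separate A" : "Etc1 RGB";
  }
  return needsAlpha ? "PVRTC 4bits RGBA Separate A" : "PVRTC 4bits RGB";
}

console.log(pickCompressedFormat("android", false));
console.log(pickCompressedFormat("ios", true));
```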

Memory that is no longer used should also be released promptly to prevent leaks. There are two kinds of release: automatic and manual.

Static resources can be released automatically by checking the scene's auto-release option:

This way, the static resources in the scene are released automatically after the scene is switched.

If you don't want to wait for a scene switch to release static resources, you can also release them manually with cc.assetManager.releaseAsset.

One pitfall: dynamically loaded resources are not released automatically along with static resources when the scene switches. You need to set this manually through cc.loader.setAutoReleaseRecursively:

This way, dynamically loaded resources are also released automatically when the scene is switched. You can also release dynamically loaded resources manually with cc.loader.releaseRes.