Tencent mobile game optimization road

First, let me give some general background on the market. Many engineers, male or female, probably play mobile games these days; I saw plenty of people playing small mobile games on my way here. Tencent appears to have become the most profitable game company, so I hope what I share about mobile games will be of some help to everyone.

Looking at the domestic market first: after two or three years of rapid growth, mobile games are now entering a stage of gradual slowdown. If we look at the top 10 games, including Honor of Kings, CrossFire, Onmyoji, A Chinese Ghost and A Chinese Odyssey, we find that mobile gameplay is becoming more and more hardcore, concentrated in competitive games and MMOs (that is, RPGs). This also illustrates a point: after three years of rapid growth, the game industry's IP dividend is waning and it is entering a phase of refined operations. Tencent has gone through three stages since 2012, each with slightly different operational requirements and product controls.

This is a screenshot I took on Sina Weibo. You will find that players now demand more and more performance from their phones. Three years ago it was different: games had not grown so fast and there were not so many users, so when you bought a phone you probably cared more about the camera or other features. These days, many people buy a new phone for the sake of a game. Here is another screenshot. From these two screenshots we can see that players have real performance requirements for games. Many hardware vendors bundle a well-known game when they launch a phone in order to attract potential buyers. Many of the manufacturers we cooperate with, including domestic ones such as OPPO and VIVO, also want to use our platform to keep improving games so that they offer a better user experience when new phones are released.

Let me briefly describe Tencent's internal review process for mobile game versions. We focus mainly on performance. Whether in functional testing, automated testing, or the grayscale release process, we collect and store basic data, mostly performance data related to the game itself. During the various tests we analyze the data and conclude whether performance is up to standard. If not, the version is sent back for performance optimization and fixes.

That brings us to Honor of Kings, the game that is taking everything by storm. On the subway, on the bus, or in a hotel you see groups of friends playing Honor of Kings together; it is indeed Tencent's most popular game. I had the honor of taking part in the release and testing of the whole version, including performance optimization, before and after the launch of Honor of Kings. What you see here is the FPS curve before optimization. In-game FPS may be understood slightly differently from FPS in a traditional app: for a game, FPS should be as stable as possible. Before optimization the FPS fluctuated heavily; after optimization, by the time of the open beta release, the FPS waveform was very stable. I remember that before the open beta Honor of Kings went through nearly a thousand performance optimization points. Optimizing a large mobile game is not as simple as one or two points. It may involve hundreds or thousands, and each one has to be dug out bit by bit; it is not the case that fixing a single architectural problem yields a big chunk of optimization.

The six circles mainly show CPU, memory, drawcall, and the usage of each indicator. The text may be small, so just look at the colors. Before optimization there are many red parts; red is like a red light, meaning the indicator is not up to standard. After optimization, in the current version, basically every indicator is very good. So the popularity of Honor of Kings is justified: the product quality is very high, and the whole team, from top to bottom, is persistent about optimizing the product and its performance. Even after reaching the company's standard, they keep digging for optimization points.

The distribution of mobile game engines among domestic games is roughly like this. The Unity and Cocos engines each account for roughly half, and the rest are self-developed engines, mainly at NetEase; some small and medium-sized teams also develop on their own engines. Since 90% of Tencent's internal games are made in Unity, the details in this talk are based on Unity.

This figure shows six squares, which are the user experience problems we feel directly when playing a game: stutter, heat, power consumption, delay, teleporting, and crashes. Let's go through these six points and see what data needs to be collected and what testing needs to be done for each.

When you find your app or game stuttering, first look at FPS and at CPU and GPU overhead, and check whether there are too many drawcalls. If your application makes the device hot, again look at the CPU and GPU. Power consumption is the same: it depends on the CPU and GPU.

If delay is severe, say when you play Honor of Kings or Onmyoji, the cause may be a problem with your network, the game server itself, or network fluctuation that delays some packets. It is also possible that the client's computing logic has a problem. Teleporting is similar to delay. When your network has a problem, the client does some compensation or fault-tolerant processing and forces you to the coordinates that are synchronized with the server. In some games, such as League of Legends or Honor of Kings, this is not a teleport: the character glides over, and what you see is the state being synchronized. Other games have no such gliding and simply teleport you over.

Crashes are mainly a memory matter. There are many domestic Android devices, Android memory management is not as good as that of iOS, and each manufacturer customizes its strategy differently. When your app's memory reaches a certain peak, the system may forcibly kill it. As for CPU, iOS will kill your app when CPU usage stays at a high level. Generally speaking, these problems lead to user churn, fewer new users, and a decline in the product's reputation. User experience is therefore an all-round product quality issue.

There are some common methods for client performance testing. The first is putting a large number of bots on the same screen. This is where games differ from apps under load: when we use WeChat, Meituan, or a travel app, I am the only one on my client, but in a game there may be ten, twenty, or even hundreds of characters on screen, which puts pressure on the client.

The second is single-player play, the functional testing we usually do by playing normally. The third is multiplayer team play: Honor of Kings may need 10 people, CrossFire 16 or more. Multiplayer testing exercises client data synchronization and the fault tolerance of rendering. The fourth is automated testing, which Tencent uses a lot but small and medium-sized teams elsewhere may not use much.

Tencent uses automated testing mainly for smoke tests and for basic function verification after a version is built. For example, if a game has ten or a hundred dungeons, that is, a hundred scenes, how do we make sure each scene is usable? Automated tests can run through all the scenes after each daily build to make sure they work. As with any kind of testing, in the end it comes down to the data, so we capture a lot of performance data during testing and print a lot of logs in the application to analyze stutter.

Having mentioned automated testing, I would like to introduce the Unity automated testing framework used at Tencent, called GAutomator. It has been open-sourced on GitHub, and you are welcome to star it. It is based mainly on the Unity engine and was featured at Unite 2016 last year as the first Unity-based automation framework.

The three points above are my summary of how test automation has evolved. The earliest stage was blind tapping: tapping the phone at random in no particular order. The second stage was based on controls, for example the controls of a particular engine or standard Android controls. The third stage is writing scripts for deeper automated testing. This is a rough outline of the framework. It uses several capabilities. First, it uses features of the Unity engine. Second, it borrows some functionality from UIAutomator, which is used more in ordinary app testing, mainly to retrieve generic Android standard controls such as posting to Moments, QQ login, WeChat login, and share buttons. Inside the game, control goes through the Unity engine. Because so many games are built with Unity, the engine's internal controls, buttons, and images can all be obtained from memory and then operated through simulated input, which lets automated tests go deep into the game's scenes. With the old blind tapping you might at most get past login, but you could not enter the core scene, or you would just stand still in it, unable to act. So we dug a little deeper, wrapped another layer on top of the engine, and control gameplay through Python scripts.

Inside Tencent, most Unity games use this testing framework for smoke tests and performance tests of each version. More importantly, it helps you run deeper and broader compatibility tests. A compatibility test may require thousands of phones across all kinds of brands, and with automated testing the scenarios in the compatibility test can run deeper and more fully.

The second stage is data collection, analysis, and optimization. Some phenomena we perceive directly, such as stutter, heat, and crashes. Behind these phenomena lie the items in the red column on the left: mainly memory, network traffic, drawcall, FPS, CPU, GPU, and the number of triangles. We can obtain these through various methods: from Android system files, from OpenGL, or from the device itself.

Let's talk about the causes of stutter. When we play games, on a computer or on a phone, we often run into stop-and-go hitching, and the experience is very bad. If a movie we are watching stutters, our eyes cannot stand it. In a nutshell, what causes stutter? Whether in an app or a mobile game, the ultimate cause is that a frame takes too long, which delays the output of the image. The first cause is resource loading. What are resources? Resources are the images, animations, and special effects seen in the game; basically everything we can see with the naked eye is a resource. Loading resources is bound to cause stutter, because it involves IO on local files.

The second cause is inefficient logic functions, mainly because developers write poor algorithms or call functions that should not be called. The third is IO on the main thread, which is similar to resource loading, since resource loading may itself do IO on the main thread. Besides resource loading, we also find that many apps and games write logs to logcat. Some of these logs are meant only for the testing phase but are accidentally shipped in the live version, which causes significant performance overhead and a poor experience for users. The fourth is garbage collection, where Java and Unity are similar. Each GC can take a few hundred milliseconds, which delays the output of the image and is bound to cause stutter. The fifth is network fluctuation: while playing a mobile game you suddenly switch from Wi-Fi to 4G, or from a strong signal to no signal. After the fluctuation you will see all kinds of strange behavior: teleporting, gliding, crashes, and so on.

Let's talk about GC and memory in Unity. Unity runs C# on Mono, which acts as a managed virtual machine. Mono has two standard memory values: the first is Mono heap memory and the second is Mono used memory. Heap memory is a large chunk pre-allocated for your game, say 50 MB, while your C# objects and variables may use only 10 MB or 20 MB of it; that is Mono used memory. Look at the flow diagram on the right. When we need to allocate memory, we make a request. If the Mono heap has enough free memory, it is handed over directly. If not, Mono first does a GC, which is bound to cause a stall. If free memory is still not enough, Mono requests more memory from the operating system. For Unity games the Mono heap therefore only ever grows, and it is important to control the heap size during development, otherwise overall PSS will grow.
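The allocation flow just described can be sketched as a toy model. This is only an illustration under simplifying assumptions, not Mono's real allocator: the class name `ManagedHeap`, the `grow_step` parameter, and the idea of tracking garbage as a single byte count are all made up for the example.

```python
class ManagedHeap:
    """Toy model of the flow: use free heap, else GC, else grow the heap."""

    def __init__(self, heap_size, grow_step):
        self.heap_size = heap_size  # reserved heap (the "heap memory" value)
        self.used = 0               # bytes handed out (the "used memory" value)
        self.garbage = 0            # handed-out bytes nothing references any more
        self.grow_step = grow_step  # hypothetical growth increment
        self.gc_count = 0

    def release(self, size):
        # Dropping a reference frees nothing until the next collection.
        self.garbage += min(size, self.used - self.garbage)

    def collect(self):
        # A real collection pauses threads; here it only reclaims garbage.
        self.used -= self.garbage
        self.garbage = 0
        self.gc_count += 1

    def allocate(self, size):
        if self.heap_size - self.used < size:
            self.collect()                    # not enough free memory: GC first
        while self.heap_size - self.used < size:
            self.heap_size += self.grow_step  # still not enough: ask the OS
        self.used += size
```

In this model `heap_size` never decreases, which mirrors the point that the heap only grows, while `used` rises and falls with each collection.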

Look at two pictures. The first picture has two lines, one green and one blue. Green is the pre-allocated Mono heap memory; if it stays steady, memory is being managed well. The blue line is wave-shaped: it rises to a peak and then comes down. This indicates that the game manages memory well, keeping the heap level instead of letting it grow. The game in the second picture is not doing so well: the green line keeps rising, which means it may have a memory leak that keeps pushing the heap up.

GC itself works in four steps. The first step is that GC suspends the relevant threads, so your entire application is bound to stall. The second step is to traverse all the memory objects in use and mark those that are no longer referenced. The third step is to release them. The fourth step is to restart the suspended threads. Consider a piece of pseudocode like this: when we finally call GC collection, object A will not be collected because A is held by a static object, whereas object B is a local variable and will be collected soon. So many memory leaks are actually caused by objects being held by root nodes or global objects and not being freed when they should be. Now look at this graph with many red dots on it; each dot is a GC. GC happens often, so the FPS is very unstable and fluctuates a lot. In the other graph the FPS waveform is relatively stable, because there were only four GCs in the whole process. So we concluded that a lot of stuttering is caused by GC.

For example, in an early version of a Tencent MOBA mobile game, we found that in a 5v5 match a GC occurred about every 15 seconds on average, and the performance of that version was relatively poor. Let's look at what triggers GC. The first trigger is memory allocation: when you request memory and not enough is available, a GC is performed. The second is calling GC manually.

How do we count GC calls in Unity? We can use the mono_profiler_install_gc function to get the number of GCs. Since Mono is open source, we can use many of its functions. Here are some simple optimizations that reduce GC calls; that version had many optimizations, like the ones shown on the right. The first is object pooling. There are many minion objects on the map. If they do not use an object pool and are constantly created with new, this causes a lot of memory fragmentation and makes memory grow too fast.
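As a minimal sketch of the object pool idea (written in Python for brevity; a Unity project would do the same in C#), objects are handed back to the pool and reused rather than created anew each time. The class name `ObjectPool` and its `created` counter are illustrative, not taken from any engine API.

```python
class ObjectPool:
    """Reuses released objects so repeated spawns do not allocate again."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a brand-new object
        self._free = []          # objects waiting to be reused
        self.created = 0         # how many objects were ever allocated

    def acquire(self):
        if self._free:
            return self._free.pop()  # reuse: no new allocation
        self.created += 1
        return self._factory()

    def release(self, obj):
        # The caller is responsible for resetting the object's state.
        self._free.append(obj)
```

If a wave of minions is released when it dies and acquired again on respawn, `created` stays at the peak number alive at once instead of growing with every respawn.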

The second is to avoid passing Object as a parameter when Lua calls C#. If Lua calls C# with an Object parameter, reflection is involved, and reflection in C# involves boxing and unboxing. Each boxing operation generates approximately 20 bytes of memory overhead. By a common game standard of 25 frames per second, one boxing operation per frame means more than 20 bytes per frame, or about 500 bytes per second, and the memory cost keeps rising as the game goes on.

The third is optimizing network data sending and receiving. There was a small function that created a new object each time it received a data packet from the server; it was later changed to use a cache.

The fourth is to reduce the refresh frequency of parts of the UI. This may sound strange, but the UI is the menus and option buttons we see on the game interface. Because these UI elements are fixed and change very little, many of them do not need to refresh every frame; they can refresh once a second or every 500 milliseconds. Content outside the field of view can also be trimmed to reduce cost. For example, in Honor of Kings we go to the red and blue buffs, but the buff monsters are invisible while we are far away. Things outside the field of view do not need to be rendered, or even computed, which reduces client overhead.
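A throttled refresh of this kind might look like the sketch below. The class name `ThrottledRefresher` and the convention of passing the current time in milliseconds are assumptions for illustration; in a real game the check would sit in the per-frame update of the UI element.

```python
class ThrottledRefresher:
    """Lets a UI element refresh at most once per interval, not every frame."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self._last_refresh_ms = None

    def should_refresh(self, now_ms):
        # Called once per frame; returns True only when the interval has elapsed.
        if (self._last_refresh_ms is None
                or now_ms - self._last_refresh_ms >= self.interval_ms):
            self._last_refresh_ms = now_ms
            return True
        return False
```

At 25 frames per second a frame arrives every 40 ms, so a 500 ms interval turns 25 refreshes per second into roughly 2.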

It is also necessary to add abstract interfaces for destruction. Development is not one person's work; a large project may involve hundreds of people or more, and modules written by one developer are inherited directly by others. The point is that everyone should clean up their own garbage, and that requires more destructor interfaces. Last but not least, Mono memory leaks cause memory to climb, which leads to constant requests for memory and eventually to GC.

A Mono memory leak, in a word, is a memory object that is not collected when it should be; this is what is commonly called a memory leak. Strictly speaking there are no absolute leaks in C# or Java, because they have no pointer concept; in C++, which does have pointers, true leaks can occur.

When we do a GC collection, E and F are collected very quickly, but A, B, C and D on the left are not. Why? Because A is a static variable, the three objects below it stay in memory for a long time. In fact B, C and D should be collected at this point, but for some reason or through a developer's mistake the references were not removed, so extra memory remains occupied in the game. This is only a simple abstract diagram. In actual testing and analysis, the reference chain of a memory object may be more than a dozen layers deep, and we have to peel it back layer by layer to find the memory leak.
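The diagram can be reproduced with a small reachability sketch: anything reachable from a root survives, everything else is collected. The graph below is an assumption that matches the description (A is the static root and references B, C and D; E and F are referenced by nothing), and the function names are made up for the example.

```python
def reachable(roots, references):
    """Mark phase: every object reachable from a root stays alive."""
    alive = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in alive:
            continue
        alive.add(obj)
        pending.extend(references.get(obj, ()))
    return alive


def collected(all_objects, roots, references):
    """Sweep phase: objects that were not marked are reclaimed."""
    alive = reachable(roots, references)
    return sorted(obj for obj in all_objects if obj not in alive)


objects = ["A", "B", "C", "D", "E", "F"]
static_roots = ["A"]                 # A is held by a static variable
refs = {"A": ["B", "C", "D"]}        # A still references B, C and D
```

With these inputs only E and F are collected. Clearing A's references, that is `refs["A"] = []`, is what finally lets B, C and D go.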

The memory snapshot is the same concept as a snapshot at the Java or C++ layer. We take a memory snapshot at two points in time and compare them. For example, we take one snapshot when we enter the core scene and another after we come out; memory used by the core scene should be freed when we return to the lobby or main screen. From the snapshots we get two things: memory newly added between them, and memory from the first snapshot that is still retained in the second. For the added memory we have its size, the allocation stack of each object, and its object type. This figure shows the memory retained between the two snapshots. I should stress that, whether an object is new or retained, we as a third party cannot judge whether that is reasonable. So the results are handed to the developers of the relevant module, since only they know whether a variable should be retained or collected.
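A snapshot comparison of this kind can be sketched as follows, assuming each snapshot is a mapping from an object id to a (type name, size in bytes) pair. That data layout and the function names are assumptions for illustration; the real tool also records allocation stacks.

```python
from collections import Counter


def diff_snapshots(first, second):
    """Split the second snapshot into newly added and retained objects."""
    added = {oid: info for oid, info in second.items() if oid not in first}
    retained = {oid: info for oid, info in second.items() if oid in first}
    return added, retained


def bytes_by_type(objects):
    """Total size per object type, to show where the memory sits."""
    totals = Counter()
    for type_name, size in objects.values():
        totals[type_name] += size
    return dict(totals)
```

The output is only a list of candidates. Whether a retained object is a leak is still a question for the module's developer.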

Let's talk about Unity resources again. Resources are the things we can see and hear: special effects, animations, sounds, and images are all resources in the game, and they are stored in the native layer. Resource management is mainly about loading with scenes and releasing, and some resources can even be locked.

Why lock them? We can mark a resource with DontDestroyOnLoad. We lock resources so that the scene loads faster the next time we enter it, so we lock resources that are common to multiple scenes. There are pros and cons: locking too much may be unnecessary, while locking too little may make scene loading slow. On the right are the most common resource categories in Unity game development: textures, GameObjects, meshes, animations, audio, resource duplication rate, resources retained between levels, and resource copies.

How do we optimize resources? There was an FPS game, a shooter, and testing of one version found that peak memory on low-end devices was very high, around 350 MB and perhaps close to 400 MB. When memory is too high on a low-end device, the game is easily killed by the system and crashes. So we needed to see the cost of resources throughout the game. To get it, we can use Resources.FindObjectsOfTypeAll to obtain the allocation, recycling, and life cycle of all resources in the game. This function can cause the game to stall, so it should be used only during testing; the live version should not call it.

In the end we pinpointed four causes. First, the new maps and new resources in the new version were too large, and resources were not tiered, so the same images and small animations were loaded on low-end devices as well.

Second, an animation resource was not released. For example, when you first enter the game you see a short animation, which should be recycled or released promptly after it plays. This animation was not released, so it stayed in memory for much of the game's life cycle.

Third, the same goes for audio resources, which were also not released. What is audio? For example, when we enter a match we hear a clip such as “Welcome to Honor of Kings”. After playback the clip should be recycled immediately, because it will not play a second time during the whole game. So we recycle audio resources that are no longer in use.

Fourth, six mini-bosses did not use an object pool. Normally we would use one, but that version may have been released in a rush. Six new mini-bosses were added to the big map, and after a mini-boss is killed it keeps respawning, so memory was constantly allocated with new and grew too fast.

The optimizations were as follows. Preload small resources. Strengthen the notion of a resource life cycle: many resources should be deleted once they have been used. Added textures should not be too large (no more than 1024) and should follow Unity's official guideline of power-of-two dimensions. Then compress, compress, compress: whatever the resource, it must be compressed when released to the live version, because uncompressed resources take a lot of memory. Use resources of different precision on high-end, mid-range, and low-end devices, which is called tiered processing. Apps do not have this situation, but in games it is a common development strategy. Finally, avoid useless resource copies.
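The two texture rules (power-of-two dimensions, nothing larger than 1024) are easy to check automatically in a build step. The sketch below is one way to do it; the function names and the wording of the reported problems are made up for the example.

```python
def is_power_of_two(n):
    # A power of two has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0


def check_texture(width, height, max_size=1024):
    """Return a list of rule violations for one texture (empty if none)."""
    problems = []
    if not (is_power_of_two(width) and is_power_of_two(height)):
        problems.append("dimensions are not powers of two")
    if max(width, height) > max_size:
        problems.append("larger than %d" % max_size)
    return problems
```

Running such a check over every texture in a daily build catches oversized assets before they reach a low-end device.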

For resource optimization overall: first, the resource duplication rate, that is, avoid packaging the same resource repeatedly. For resources retained between levels, reduce the amount resident in memory and release resources promptly. For resource copies, watch the number of copies and avoid copying resources unnecessarily. For resource size, compress and then tier. A texture, for example, can be split into four levels so that we load images of different sizes on different devices.

There are a few other stutter optimizations. For example, in a nation-war RPG we found very few GCs on the map, yet stutter was severe, which was strange. We went to the C# function call stacks and found three problems: too much network IO, too much resource loading, and improper use of functions. We found that the iterator in a foreach statement generated garbage memory (foreach was officially discouraged), so we replaced it with for. Although dungeon loading was not slow, the memory loaded at one time should not be too large. Duplicate resources need to be packaged separately: we pull resources that are duplicated many times into their own package and have other resources reference it. The live version does not use the FindObjectOfType function. Our network IO was very heavy, so we merged and optimized data packets. Finally, we removed many empty callback functions, which Unity officially recommends against.

For performance optimization we can summarize four areas: resource optimization, rendering-layer optimization, code-layer optimization, and game strategy optimization. Resource, rendering, and code-layer optimizations are common to all games, but strategy optimizations differ because the gameplay and costs of each game differ, so they are for reference only.

As for Tencent's release standard for mobile games: after 3 to 4 years of development we have settled on a standard. iOS and Android devices are divided into tier 1, tier 2, and tier 3, and each tier has its own performance indicators. For example, on a tier 1 Android device the memory indicator must not exceed 550 MB, on a tier 2 device 450 MB, on a tier 3 device 350 MB, and so on. On different devices the performance overhead we measure and the tests we run also differ.
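A tiered memory check against these figures could be expressed as below. Only the three Android memory limits come from the talk; the table name, the function name, and the idea of testing against a measured peak are assumptions for illustration.

```python
# Android memory limits per device tier, in MB, as quoted above.
ANDROID_MEMORY_LIMIT_MB = {1: 550, 2: 450, 3: 350}


def meets_memory_standard(tier, peak_memory_mb):
    """True if the measured peak stays within the limit for that tier."""
    return peak_memory_mb <= ANDROID_MEMORY_LIMIT_MB[tier]
```

A peak close to 400 MB on a low-end device, as in the shooter example earlier, fails the tier 3 limit even though the same figure would pass on tier 1 or tier 2.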

Besides Tencent's standard, this chart also shows Unity's standard. For its engine, Unity officially gives a rough set of reference limits: CPU, memory per frame, number of triangles, VBO upload volume, as well as textures, meshes, and animations. It can be found on Unity's official website, although not as a table in Chinese. On the right are some examples: sizes and limits for dynamic and static objects. Try not to exceed them. Games are getting bigger, so exceeding them slightly is inevitable, but not by too much.

Performance testing has pros and cons. It can identify performance bottlenecks in most scenarios, ensure that the experience is smooth on mainstream devices, and reduce customer complaints. When performance is well optimized, retention and word of mouth improve. Its weakness is that the devices in a test environment are far from sufficient: there may be 3,000 Android models in the domestic market but only dozens or hundreds in a test environment. We also cannot simulate or cover users' scenarios 100 percent. In multiplayer we probably do not cover much either, because each player's actions are completely different from our own.

This leads to the next solution: online performance monitoring and analysis, or APM. Apps have APM too, which may also focus on memory and CPU indicators. Why do APM for mobile games? There is a fairly classic picture of this: the problems we see in the test environment are only ever a small part, while the online environment is bottomless and always dangerous. Online there may be more than two thousand device models, more kinds of user operations, some very nasty and complex domestic networks, and compatibility issues with third-party apps. These are all potential problems.

Our mobile game APM product has roughly five functions: cloud control, risk early warning, data analysis, problem discovery, and, most importantly, real-time monitoring. When you release a new version of a game you need to monitor players' performance in real time, to avoid losing players and receiving complaints because of poor performance.

We also collect a lot of mobile game APM data. The dimensions we currently collect are roughly these. The dimensions on the left are more general: mainly memory, CPU, FPS, and drawcall. FPS on the right is broken down further, with calculations of mean, variance, stutter, jitter, and segmentation. Because different projects have different concerns, we try to make the set as large and complete as possible. The main difference from apps is that apps pay more attention to page loading; some Activity-related mechanisms, including the stalls they cause, are almost absent in games.
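A few of these FPS statistics can be computed from per-second samples as in the sketch below. The function name, the `low_fps` threshold of 20, and the definition of jitter as the largest change between consecutive samples are assumptions for illustration, not the definitions used by the APM product.

```python
from statistics import mean, pvariance


def fps_stats(samples, low_fps=20):
    """Summarize per-second FPS samples from one play session."""
    low = [fps for fps in samples if fps < low_fps]
    jumps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return {
        "mean": mean(samples),
        "variance": pvariance(samples),       # spread around the mean
        "low_ratio": len(low) / len(samples),  # share of low-FPS seconds
        "max_jitter": max(jumps, default=0),   # biggest second-to-second jump
    }
```

Two sessions with the same mean can feel very different: a steady 20 FPS has zero variance, while alternating between 30 and 10 has the same mean but a variance of 100, which is why stability is tracked alongside the average.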

In an online environment there are several main causes of stutter. The first is the game version itself or the device hardware. The second is a server problem: the server logic may not keep up, causing delay. The third is the more common network fluctuation.

For mobile game APM we also offer more options. We filter and compute along different dimensions: version, time, device model, scene, picture quality, and platform. First, we can continuously observe the performance trend across versions. If the trend is flat, things are fine. If it keeps going down, as in this figure, the latest version probably has a problem. Looking at one version alone we may see nothing wrong, but looking at N consecutive versions we see the problem getting worse and worse. Second, within one release we can rank all the scenes and see which are very bad and which are good. Third, we look at device models. From the model ranking we can decide which model to optimize next, and we can work with manufacturers on joint in-depth optimization.

Low frame rate distribution shows where low frame rates occur while a game is played. The long progress bar is how long each person played, and we use an algorithm to mark the low-frame-rate stretches of the scene, which makes it more intuitive. We know in which scene and at what time the frame rate was low, which lets us narrow down where the problem is.
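The talk does not say which algorithm marks the low-frame-rate stretches. One simple possibility, sketched here as an assumption, is to scan per-second FPS samples and report each continuous run below a threshold as a (start, end) pair in seconds.

```python
def low_fps_segments(fps_per_second, threshold):
    """Return [start, end) second ranges where FPS stayed below threshold."""
    segments = []
    start = None
    for second, fps in enumerate(fps_per_second):
        if fps < threshold:
            if start is None:
                start = second             # a low stretch begins
        elif start is not None:
            segments.append((start, second))  # the stretch just ended
            start = None
    if start is not None:
        segments.append((start, len(fps_per_second)))  # low until the end
    return segments
```

Each pair can then be drawn as a marked region on the session's progress bar and matched against what was happening in the scene at that time.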

Let me mention a few technical difficulties. The first is zero performance impact: when we collect data, the game's performance must not suffer. Apps are a little easier here: an app's interface is mostly static and users do not operate it very often, while gamers operate constantly, so we must collect without sacrificing performance. The second is compatibility: data collection has to work across devices from all manufacturers. The third is customization, because different projects may need different metrics. The fourth is dynamic scaling: some games have a very large number of users, and all games together may add up to hundreds of millions. The fifth is cloud control: if we suddenly find an abnormal situation in some version or scenario, we can turn collection off in time. The last is high concurrency, which is the same as for apps.

First, performance impact. Data collection uses only one thread, and our resident memory on the client is 1 MB to 1.5 MB. When the memory buffer is full, it is written to a local file through buffered IO. We generate no network traffic inside the core scene; the data is uploaded after the player leaves the core scene.
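The buffering strategy can be sketched like this: samples accumulate in a bounded in-memory buffer, spill to a local file when the buffer is full, and are uploaded only once the player has left the core scene. The class name `PerfRecorder`, the `upload` callback, and the text format of the samples are assumptions for illustration.

```python
import os
import tempfile


class PerfRecorder:
    """Bounded in-memory buffer that spills to a local file when full."""

    def __init__(self, path, capacity_bytes):
        self.path = path
        self.capacity_bytes = capacity_bytes
        self._chunks = []
        self._size = 0

    def record(self, sample):
        data = (sample + "\n").encode("utf-8")
        if self._size + len(data) > self.capacity_bytes:
            self.flush()  # buffer full: write to disk, not to the network
        self._chunks.append(data)
        self._size += len(data)

    def flush(self):
        if not self._chunks:
            return
        with open(self.path, "ab") as f:
            f.write(b"".join(self._chunks))
        self._chunks.clear()
        self._size = 0

    def leave_core_scene(self, upload):
        # Network traffic happens only here, outside the core scene.
        self.flush()
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                upload(f.read())
            os.remove(self.path)
```

A capacity of around 1 MB would correspond to the resident memory figure quoted above; the small capacities used when trying this out just make the spill easy to observe.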

High concurrency is handled the same way as in traditional software development: continuous optimization, master-slave separation, load balancing, dynamic scaling, and queued reporting. Queued reporting means the data a client sends to the server goes into a queue. The server cannot scale indefinitely, so when the server is fully loaded and the client's connection fails, the file is placed in a queue and uploaded the next time the server is idle. Compatibility has three aspects: OpenGL version compatibility, Android version compatibility, and SDK compatibility, because the game contains many other SDKs and common components.

For cloud control, we dynamically turn features on and off. The supported conditions include probability, brand, IP address, game version, chip architecture, and so on.
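A switch of this kind might be evaluated on the client roughly as follows. Everything concrete here is an assumption for illustration: the configuration keys, the device fields, and the use of a hash of the device id so that the probability condition gives the same answer every time for a given device.

```python
import hashlib


def in_sample(device_id, probability):
    """Deterministically place a device in [0, 1) and compare to the rate."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2 ** 32
    return bucket < probability


def collection_enabled(config, device):
    """Apply the master switch, then brand, version and probability rules."""
    if not config.get("enabled", False):
        return False
    brands = config.get("brands")          # None means "all brands"
    if brands is not None and device["brand"] not in brands:
        return False
    versions = config.get("game_versions")  # None means "all versions"
    if versions is not None and device["game_version"] not in versions:
        return False
    return in_sample(device["device_id"], config.get("probability", 1.0))
```

Turning collection off for a problematic version then only requires publishing a new configuration, with no client release.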

Here is the performance reporting flow. When the client starts, it reads a configuration from the CDN to determine whether collection is currently on or off. It then connects to a TGW and, through TConnd, which is a load balancer, to a file server and an analysis server. Data goes from the file server directly to the cloud, and from the cloud through a load-balancing TSpider to the DB.

In conclusion, performance testing covers the full chain: automated testing, grayscale release, data monitoring, public opinion monitoring, and version fixes form a closed loop. From online data we can validate player feedback and spot issues with a build in time.

From the perspective of product quality, we hope for full coverage. The upper part of the figure may be used for monitoring and analysis in the online operation stage, while the lower part may be used for in-depth testing during research and development.

Thanks to Qin Yun for reviewing this article and to Xu Chuan for planning it.
