I've recently been working on Unity performance optimization, which falls into three categories: CPU, GPU, and memory. Since the core combat of our game is computationally intensive, it is largely CPU-bound. CPU optimization is divided into rendering and scripting, and this article focuses on scripting optimization.

In general, optimization requires knowing where the performance hot spots are, and finding them requires deep profiling on the target device. If you optimize by guesswork instead of profiling, you will often get little return for a lot of effort, or even make things worse.

This article describes proven, general optimization methods and ideas that can save you some profiling time. The sections below give optimization recommendations for the Unity API, C#, Lua, data structures, and algorithms.

Unity API

GameObject.GetComponent

Unity development is component-based, so GetComponent is a frequently used function. Each time GetComponent is called, Unity iterates through all the components to find the target component. Searching every time is unnecessary, and we can avoid this overhead by caching the result.

Transform is the component we use the most, and GameObject provides a .transform property to retrieve it. However, in our tests (Unity 2017.2.1p1), the cached reference was still the fastest. So if you want to access a particular component frequently, cache it.

private Transform m_transform;

void Awake()
{
    m_transform = transform;
}

void Start()
{
    // Cached m_transform
    for (int i = 0; i < 1000000; i++)
        m_transform.position = Vector3.one;

    // .transform property
    for (int i = 0; i < 1000000; i++)
        transform.position = Vector3.one;

    // GetComponent<Transform>()
    for (int i = 0; i < 1000000; i++)
        GetComponent<Transform>().position = Vector3.one;
}

GameObject.Find

GameObject.Find iterates through all the current GameObjects and returns the object with the matching name. So this function can be time-consuming when there are many objects in the game.

Instead of calling GameObject.Find repeatedly, you can look the object up once in Start or Awake and cache the result.

Alternatively, use GameObject.FindWithTag to find objects with a specific tag. If you know the object ahead of time, you can drag it directly onto a field in the Inspector (Inspector injection), avoiding the run-time lookup entirely.
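The two alternatives above can be combined, as in the minimal sketch below. The class name TargetFollower and the tag "Target" are hypothetical and exist only for illustration; the tag would have to be defined in the Tag Manager for FindWithTag to work.

```csharp
using UnityEngine;

public class TargetFollower : MonoBehaviour
{
    // Assigned by dragging an object onto this field in the Inspector,
    // so no run-time lookup is needed at all.
    [SerializeField] private GameObject m_target;

    void Awake()
    {
        // Fallback: look the object up once and cache the result.
        // "Target" is a hypothetical tag used only for this sketch.
        if (m_target == null)
            m_target = GameObject.FindWithTag("Target");
    }

    void Update()
    {
        // Per-frame code only touches the cached reference.
        if (m_target != null)
            transform.LookAt(m_target.transform);
    }
}
```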

Camera.main

Camera.main returns the main camera in the scene. Internally, Unity uses GameObject.FindWithTag to find the camera with the MainCamera tag.

When we need frequent access to the main camera, we can cache it for performance gains.

private Camera m_mainCamera;

void Awake()
{
    m_mainCamera = Camera.main;
}

void Start()
{
    // Camera.main
    for (int i = 0; i < 1000000; i++)
        Camera.main.transform.position = Vector3.zero;

    // Cached camera
    for (int i = 0; i < 1000000; i++)
        m_mainCamera.transform.position = Vector3.zero;
}

GameObject.tag

GameObject.tag is often used to compare object tags, but comparing with .tag == generates GC Alloc on every comparison, which adds up every frame. GameObject.CompareTag avoids these allocations, but only works if the tag being compared is defined in the Tag Manager.

// 46Bytes GC Alloc Per Frame
bool x = tag == "xxxxx";
 
// No GC Alloc, But Need to Define Tags in Tag Manager
bool y = CompareTag("xxxxx");

MonoBehaviour

MonoBehaviour provides many event functions that Unity calls internally, such as Update, Start, and Awake. They are very easy to use: once an Update function is defined in a script that inherits from MonoBehaviour, Unity executes it every frame. See Execution Order of Event Functions for details.

However, when the Update functions of a large number of MonoBehaviours need to be executed, you can see in the profiler that they take a lot of time. This is because Unity has to perform a series of checks every time it internally calls a MonoBehaviour's Update, as shown in the figure below:

We can build a MonoBehaviour manager that maintains a List. Every MonoBehaviour that needs an update is added to the List, and its Update function is renamed to something else, for example MonoUpdate. The manager's own Update function then loops through all registered MonoBehaviours and calls their MonoUpdate. The result can be an order of magnitude improvement, as shown below:

10000 Update() Calls
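A minimal sketch of such a manager might look like the following. The names ManagedBehaviour and UpdateManager are assumptions made for this example (only MonoUpdate comes from the text above), and in a real Unity project each class would live in its own file.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical base class: scripts derive from this instead of defining Update.
public abstract class ManagedBehaviour : MonoBehaviour
{
    protected virtual void OnEnable()  { UpdateManager.Register(this); }
    protected virtual void OnDisable() { UpdateManager.Unregister(this); }

    // Called by the manager once per frame.
    public abstract void MonoUpdate();
}

// Attach one instance of this component to an object in the scene.
public class UpdateManager : MonoBehaviour
{
    private static readonly List<ManagedBehaviour> s_behaviours =
        new List<ManagedBehaviour>();

    public static void Register(ManagedBehaviour b)   { s_behaviours.Add(b); }
    public static void Unregister(ManagedBehaviour b) { s_behaviours.Remove(b); }

    // The only Update that Unity itself has to call.
    void Update()
    {
        // Iterate backwards so that a behaviour that disables itself
        // during MonoUpdate does not cause another entry to be skipped.
        for (int i = s_behaviours.Count - 1; i >= 0; i--)
            s_behaviours[i].MonoUpdate();
    }
}
```

Registering in OnEnable and unregistering in OnDisable keeps the list in step with Unity's own rule that disabled components do not receive updates.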

Transform.SetPositionAndRotation

Unity notifies all child nodes each time a Transform's position or rotation is set.

When both the position and the rotation are known in advance, we can set them together with a single call to Transform.SetPositionAndRotation, avoiding the overhead of two separate notifications.
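As a short illustration (the class name Mover and the method MoveTo are hypothetical), the two assignments are replaced by one call:

```csharp
using UnityEngine;

public class Mover : MonoBehaviour
{
    public void MoveTo(Vector3 position, Quaternion rotation)
    {
        // Two separate assignments would notify the hierarchy twice:
        // transform.position = position;
        // transform.rotation = rotation;

        // One call applies position and rotation together.
        transform.SetPositionAndRotation(position, rotation);
    }
}
```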

Animator.Set…

Animator provides a set of methods such as SetTrigger and SetFloat to control the animation state machine. For example, m_animator.SetTrigger("Attack") triggers an attack animation. Inside the function, however, the "Attack" string is hashed to an integer. If we need to trigger the attack animation frequently, we can compute the hash ahead of time with Animator.StringToHash to avoid hashing on every call.

// Hash once, use everywhere!
private static readonly int s_Attack = Animator.StringToHash("Attack");

m_animator.SetTrigger(s_Attack);

Material.Set…

Like Animator, Material provides a set of Set methods to change shader properties. For example, m_mat.SetFloat("Hue", 0.5f) sets the material's float property named Hue. Similarly, we can compute the property ID ahead of time with Shader.PropertyToID.

// Hash once, use everywhere!
private static readonly int s_Hue = Shader.PropertyToID("Hue");

m_mat.SetFloat(s_Hue, 0.5f);

Vector Math

If you only need to compare distances rather than compute them, using sqrMagnitude instead of magnitude avoids a time-consuming square root operation.
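For instance, a range check can compare squared values on both sides. The helper below is a sketch; RangeCheck and IsWithinRange are hypothetical names, and range is assumed to be non-negative.

```csharp
using UnityEngine;

public static class RangeCheck
{
    public static bool IsWithinRange(Vector3 a, Vector3 b, float range)
    {
        // Compare the squared distance against the squared range:
        // the ordering is the same, and no square root is needed.
        return (a - b).sqrMagnitude <= range * range;
    }
}
```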

When we multiply vectors by scalars, we need to be careful about the order of multiplication. Multiplying a vector is more expensive than multiplying two numbers, so we should keep the number of vector multiplications to a minimum.

// Two vector multiplications: (3 * Vector3.one) * 2
for (int i = 0; i < 1000000; i++)
{
    Vector3 c = 3 * Vector3.one * 2;
}

// One vector multiplication: (3 * 2) * Vector3.one
for (int i = 0; i < 1000000; i++)
{
    Vector3 c = 3 * 2 * Vector3.one;
}

The two expressions produce exactly the same result, but there is a significant time difference, because the latter performs one fewer vector multiplication than the former. So, where possible, combine the scalar multiplications first and do the vector multiplication last.

Coroutine

Coroutines are the mechanism Unity uses to implement asynchronous calls. If you are not familiar with Coroutines, please refer to my previous article: Understanding Coroutines in Unity.

If you need to implement a timed operation, you might be tempted to check the timer in Update every frame. Assuming a frame rate of 60 frames per second and an operation that should run once per second, 59 out of every 60 Update calls do nothing useful.

With a coroutine, these wasted calls can be avoided by using yield return new WaitForSeconds(1f);.
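A minimal sketch of such a timed coroutine follows. The class name OncePerSecond and the DoTimedWork placeholder are hypothetical; reusing a single WaitForSeconds instance is an extra step not mentioned above, added here to avoid allocating a new object on every iteration.

```csharp
using System.Collections;
using UnityEngine;

public class OncePerSecond : MonoBehaviour
{
    private int m_ticks;

    void Start()
    {
        StartCoroutine(Tick());
    }

    private IEnumerator Tick()
    {
        // Allocate the wait object once and reuse it on every iteration.
        var wait = new WaitForSeconds(1f);
        while (true)
        {
            DoTimedWork();
            yield return wait;   // resume roughly one second later
        }
    }

    // Hypothetical placeholder for the timed operation.
    private void DoTimedWork()
    {
        m_ticks++;
    }
}
```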

SendMessage

SendMessage is used to call MonoBehaviour methods, but it uses reflection internally, which is extremely time-consuming and should be avoided as much as possible.

It can be replaced by an event mechanism.
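One possible shape for such an event mechanism uses a plain C# event. The classes Health and HealthBar and the Damaged event are hypothetical names chosen for this sketch.

```csharp
using System;
using UnityEngine;

public class Health : MonoBehaviour
{
    // Subscribers receive the damage amount through a direct delegate call.
    public event Action<int> Damaged;

    public void TakeDamage(int amount)
    {
        // Instead of: SendMessage("OnDamaged", amount);
        if (Damaged != null)
            Damaged(amount);
    }
}

public class HealthBar : MonoBehaviour
{
    // Assigned in the Inspector.
    [SerializeField] private Health m_health;

    void OnEnable()  { m_health.Damaged += OnDamaged; }
    void OnDisable() { m_health.Damaged -= OnDamaged; }

    private void OnDamaged(int amount)
    {
        // Update the UI here.
    }
}
```

Unsubscribing in OnDisable matters: a subscriber that stays registered keeps receiving events and cannot be garbage collected while the publisher is alive.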

Debug.Log

Writing logs is a notoriously time-consuming operation, and its cost is easy to overlook. So you should turn logging off for the official release.

Unity's log output is not automatically disabled in release builds, so it needs to be disabled manually. We can disable log output at run time with one line of code: Debug.logger.logEnabled = false; (in Unity 2017.1 and later the property is named Debug.unityLogger).

You can also wrap log output in your own layer using conditional compilation, so that the log calls are not compiled into the release build at all. For details, see: Unity3D Research Institute: Suppressing Debug.Log Output in Release Builds.
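A minimal sketch of such a wrapper uses the standard ConditionalAttribute. The class name GameLog and the symbol ENABLE_GAME_LOG are assumptions for this example, not names from the referenced article.

```csharp
using System.Diagnostics;

public static class GameLog
{
    // Calls to this method, including the evaluation of their arguments,
    // are omitted by the compiler unless the ENABLE_GAME_LOG symbol
    // (a hypothetical name) is defined for the build.
    [Conditional("ENABLE_GAME_LOG")]
    public static void Info(string message)
    {
        // Fully qualified because System.Diagnostics also has a Debug class.
        UnityEngine.Debug.Log(message);
    }
}
```

Because the call site itself disappears, a call such as GameLog.Info("hp: " + hp) also stops paying for the string concatenation in builds where the symbol is not defined.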

C#

Reflection

Reflection is a very time-consuming operation because it requires a lot of validation and cannot be optimized by the compiler.

Reflection can also fail under AOT compilation on iOS, so we should avoid it as much as possible.

Instead of reflection, we can build our own dictionary that maps strings to types, or use delegates.
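As a sketch of the dictionary-plus-delegate idea, the factory below creates objects by name without Type.GetType or Activator.CreateInstance. All type names here (ISkill, Fireball, Heal, SkillFactory) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public interface ISkill { string Name { get; } }
public class Fireball : ISkill { public string Name { get { return "Fireball"; } } }
public class Heal : ISkill { public string Name { get { return "Heal"; } } }

public static class SkillFactory
{
    // Maps a name to a delegate that constructs the object directly.
    private static readonly Dictionary<string, Func<ISkill>> s_creators =
        new Dictionary<string, Func<ISkill>>
        {
            { "Fireball", () => new Fireball() },
            { "Heal",     () => new Heal() },
        };

    // Returns a new instance for a known name, or null for an unknown one.
    public static ISkill Create(string name)
    {
        Func<ISkill> creator;
        return s_creators.TryGetValue(name, out creator) ? creator() : null;
    }
}
```

The trade-off is that every creatable type has to be registered by hand, which is the price for keeping the lookup a plain dictionary access.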

Memory allocation (stack and heap)

In C#, there are two strategies for memory allocation: allocation on the stack and allocation on the heap.

Objects allocated on the stack have a fixed size, which makes stack allocation very efficient.

Objects allocated on the heap do not have a fixed size. Because their sizes vary, heap allocation is prone to memory fragmentation, which makes it less efficient than stack allocation.

Value type and reference type

In C#, data can be divided into two types: Value Type and Reference Type.

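Value types include all numeric types, bool, char, DateTime, all struct types, and enumeration types. They all have a fixed size, and as local variables they are allocated on the stack (a value type that is a field of a class is stored on the heap inside that object).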

Reference types include strings, arrays of all types, classes, and delegates, all of which are allocated on the heap.

The figure above is an example of allocating memory on the stack and heap. You can see that references of reference types are themselves stored on the stack and the objects they point to are stored on the heap.

Boxing

Boxing refers to converting a value type to a reference type, while unboxing refers to converting a reference type back to a value type.

As we can see from the above figure, boxing and unboxing involve copying data between the stack and the heap and consume extra memory, so they are inherently expensive operations and should be avoided as much as possible.

Mono's foreach used to cause GC Alloc every frame, which was essentially caused by boxing and unboxing. This problem has been fixed since Unity 5.6.
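To make the cost concrete, the sketch below contrasts a non-generic collection, which boxes every int it stores, with a generic one that stores the values directly. BoxingDemo and its methods are hypothetical names.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int SumBoxed()
    {
        // ArrayList stores object, so every int added is boxed onto the
        // heap, and every read needs an unboxing cast.
        var list = new ArrayList();
        for (int i = 0; i < 3; i++) list.Add(i);

        int sum = 0;
        foreach (object o in list) sum += (int)o;
        return sum;
    }

    public static int SumGeneric()
    {
        // List<int> stores the values directly: no boxing or unboxing.
        var list = new List<int>();
        for (int i = 0; i < 3; i++) list.Add(i);

        int sum = 0;
        for (int i = 0; i < list.Count; i++) sum += list[i];
        return sum;
    }
}
```

Both methods compute the same sum; the difference is only in how many heap objects are created along the way.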

Garbage collection

The memory we allocate on the heap is reclaimed by the garbage collector. Garbage collection is very time-consuming because the collector needs to traverse all objects, find the islands of objects that are no longer referenced, mark them as "garbage", and reclaim their memory.

Frequent garbage collection is not only time-consuming, it also leads to memory fragmentation, making the next allocation more difficult or even impossible to satisfy. When that happens, the heap is expanded (its size limit doubles) and does not shrink back, which puts pressure on memory.

So we need to keep heap allocations under control: GC Alloc should be avoided as much as possible.

string

String concatenation causes GC Alloc: strings are immutable, so every concatenation performed at run time allocates a new string, and intermediate strings become garbage. For example, in s = s + "Alloc" the old value of s becomes garbage once the new string is assigned, provided nothing else references it. Similarly, string c = string.Format("one is {0}", 1) generates additional GC Alloc due to a boxing operation (the int 1 is boxed into an object before it is formatted).

So if string concatenation is a high-frequency operation, you should avoid using + for it. C# provides the StringBuilder class for concatenating strings.
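A minimal sketch of StringBuilder-based concatenation follows. The class name Joiner is hypothetical, and sharing one static builder assumes the method is only ever called from a single thread.

```csharp
using System.Text;

public static class Joiner
{
    // Reused across calls so the internal buffer is not reallocated
    // each time (it only grows if a result exceeds its capacity).
    private static readonly StringBuilder s_builder = new StringBuilder(64);

    public static string Join(string[] parts)
    {
        s_builder.Length = 0;            // reset without discarding the buffer
        for (int i = 0; i < parts.Length; i++)
            s_builder.Append(parts[i]);
        return s_builder.ToString();     // the final string is allocated here
    }
}
```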

Virtual functions

Calls to virtual functions are more expensive than direct calls, so we can use the sealed modifier on classes or functions that are guaranteed never to be inherited or overridden.

For details, see: IL2CPP Optimizations: Devirtualization.
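As a small illustration (Unit and Soldier are hypothetical types), sealing a leaf class tells the compiler that no further overrides can exist. Whether a given call is actually devirtualized depends on the scripting backend and should be confirmed by profiling.

```csharp
public abstract class Unit
{
    public abstract int Speed();
}

// sealed: no class can derive from Soldier, so a call made through a
// Soldier reference has exactly one possible target.
public sealed class Soldier : Unit
{
    public override int Speed() { return 5; }
}
```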

Lua

I previously wrote an article about pure Lua performance optimization: Writing high-performance Lua code. Here are some excerpts and additions.

local

Variables in Lua are global by default and must be declared with the local keyword to become local.

Local variables have the following advantages over global variables: reads and writes are faster, and at the end of their scope they are automatically marked as garbage, which avoids memory leaks.

So, although variables in Lua are global by default, we should declare them with local wherever possible.

table

Tables in Lua are internally divided into two parts: the hash part and the array part. When an empty table is created, both parts have size 0. As the table is filled, rehashes are triggered repeatedly to grow it. A rehash is a time-consuming operation, so avoid it as much as possible.

If you need to create many small tables, you can avoid rehashes by pre-populating the tables when you create them.

string

As in C#, string concatenation in Lua is expensive. But unlike C#, which provides StringBuilder, Lua does not offer a similar built-in class.

However, we can use a table as a buffer and then call table.concat(buffer, "") to return the final concatenated string.

Interacting with C#

Different Lua solutions have different strategies for interacting with C#, but some basic points are the same.

First, for bridging MonoBehaviour's three main update functions (Update, LateUpdate, and FixedUpdate), the best strategy is to let a single manager that inherits from MonoBehaviour receive the updates and forward them to Lua, where all Lua-side update functions are registered. This avoids many separate Lua/C# bridge crossings, which can save a lot of time.

Second, GC issues need to be considered. By default, structs such as Vector3 are passed to Lua through a boxing operation, which brings extra GC Alloc; this can be avoided with special configuration. For xLua, see the XLua Complex Value Types (Struct) GC Optimization Guide.

Finally, for general optimization ideas, see Making Good Use of Lua+Unity to Let Performance Fly: Lua and C# Interaction, in which the author analyzes concrete examples in detail.

Data structures

Container types

Choose containers according to the operations your application performs most frequently. For example:

  • For frequent random access by index, prefer an array or List.
  • For frequent lookups by key, prefer Dictionary.
  • For frequent insertion or deletion, prefer LinkedList.

There are also special data structures for specific needs. For example:

  • If duplicate elements must not exist, choose HashSet.
  • If you need last in, first out, for example to replace recursive function calls, choose Stack.
  • If you need first in, first out, choose Queue.
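Two of these choices are sketched below: a HashSet to discard duplicates, and an explicit Stack that replaces a recursive tree traversal. TreeNode and ContainerDemo are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

public class TreeNode
{
    public int Value;
    public List<TreeNode> Children = new List<TreeNode>();
}

public static class ContainerDemo
{
    // HashSet silently ignores duplicates, so Count is the number of
    // distinct ids.
    public static int CountUnique(int[] ids)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < ids.Length; i++) seen.Add(ids[i]);
        return seen.Count;
    }

    // Depth-first traversal with an explicit Stack instead of recursion.
    public static int SumTree(TreeNode root)
    {
        int sum = 0;
        var pending = new Stack<TreeNode>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            TreeNode node = pending.Pop();
            sum += node.Value;
            for (int i = 0; i < node.Children.Count; i++)
                pending.Push(node.Children[i]);
        }
        return sum;
    }
}
```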

Object pooling

Object pools avoid frequent creation and destruction of objects. Creating a GameObject may require requesting memory, may cause GC Alloc, and may even involve disk I/O. Frequent destruction of objects can cause serious memory fragmentation, making heap allocation more difficult.

Therefore, when a large number of objects need to be created and destroyed repeatedly, we should use an object pool to cache the objects that have already been created. When they are no longer needed, we do not destroy them but return them to the pool, so that the next request can reuse them instead of creating new ones.

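using System.Collections.Generic;

public class ObjectPool<T> where T : new()
{
    private Stack<T> objs;

    public ObjectPool()
    {
        objs = new Stack<T>();
    }

    // Reuse a pooled object if one is available, otherwise create a new one.
    public T GetObject()
    {
        T obj = objs.Count > 0 ? objs.Pop() : new T();
        return obj;
    }

    // Put an object back into the pool instead of destroying it.
    public void ReturnObject(T obj)
    {
        if (obj != null)
            objs.Push(obj);
    }
}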

Spatial partitioning

When computing collisions or searching for nearest neighbors, if the space is very large and many objects take part in the calculation, a brute-force pairwise traversal with two nested loops has O(N²) complexity.

We can reduce the complexity to O(N log N) with the help of a spatial partitioning data structure. Quadtrees are used to partition 2D space, octrees are used to partition 3D space, and k-d trees work in any number of dimensions.
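The trees named above take more code than fits here, so the sketch below swaps in a simpler relative, a uniform 2D grid, to show the shared idea: a query visits only the cells near the point instead of every object. UniformGrid2D and its members are hypothetical, and a grid only works well when objects are spread fairly evenly.

```csharp
using System;
using System.Collections.Generic;

public class UniformGrid2D
{
    private readonly float m_cellSize;
    private readonly Dictionary<long, List<int>> m_cells =
        new Dictionary<long, List<int>>();
    private readonly List<float> m_xs = new List<float>();
    private readonly List<float> m_ys = new List<float>();

    public UniformGrid2D(float cellSize) { m_cellSize = cellSize; }

    // Packs the two cell coordinates into one 64-bit dictionary key.
    private static long Key(int cx, int cy) { return ((long)cx << 32) ^ (uint)cy; }

    private int Cell(float v) { return (int)Math.Floor(v / m_cellSize); }

    // Stores a point and returns its id.
    public int Add(float x, float y)
    {
        int id = m_xs.Count;
        m_xs.Add(x);
        m_ys.Add(y);

        long key = Key(Cell(x), Cell(y));
        List<int> bucket;
        if (!m_cells.TryGetValue(key, out bucket))
        {
            bucket = new List<int>();
            m_cells[key] = bucket;
        }
        bucket.Add(id);
        return id;
    }

    // Appends the ids of all points within radius of (x, y) to results,
    // visiting only the cells that overlap the query square.
    public void Query(float x, float y, float radius, List<int> results)
    {
        int minX = Cell(x - radius), maxX = Cell(x + radius);
        int minY = Cell(y - radius), maxY = Cell(y + radius);
        float r2 = radius * radius;

        for (int cx = minX; cx <= maxX; cx++)
            for (int cy = minY; cy <= maxY; cy++)
            {
                List<int> bucket;
                if (!m_cells.TryGetValue(Key(cx, cy), out bucket)) continue;
                for (int i = 0; i < bucket.Count; i++)
                {
                    int id = bucket[i];
                    float dx = m_xs[id] - x, dy = m_ys[id] - y;
                    if (dx * dx + dy * dy <= r2) results.Add(id);
                }
            }
    }
}
```

The caller passes in the results list so that it can be cleared and reused between queries rather than allocated each time.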

I previously wrote an article on the principles and optimization of k-d trees, The Application and Optimization of KD Trees, which covers the topic in more detail.

Algorithms

Loops

The use of loops is very common and can easily become a performance hotspot. We should try to avoid time-consuming or ineffective operations within the loop, especially if the loop is on an Update call per frame.

void Update() {
    for (int i = 0; i < count; i++)
        if (condition)
            excuteFunc(i);
}

In the code above, the loop executes count times regardless of whether condition is true or false. If condition is false, all count iterations are wasted.

void Update() {
    if (condition)
        for (int i = 0; i < count; i++)
                excuteFunc(i);
}

Moving the condition outside the loop avoids these wasted iterations.

Another thing to watch is the order of nested loops: try to keep the loop with the most iterations in the innermost position.

// Busy loop on the outside: the inner loop is set up 1000000 times
for (int i = 0; i < 1000000; i++)
    for (int j = 0; j < 2; j++)
    {
        int k = i * j;
    }

// Busy loop on the inside: the inner loop is set up only twice
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 1000000; j++)
    {
        int k = i * j;
    }

When the iteration counts of the inner and outer loops differ by orders of magnitude, it is better to place the busier loop on the inside, because the inner loop counter is then initialized far fewer times.

Mathematical operations

Square roots and trigonometric functions are time-consuming mathematical operations and should be avoided as much as possible.

As mentioned earlier, if you are simply comparing distances rather than computing them, you can compare squared distances instead, which saves a time-consuming square root operation.

Trigonometric functions can often be avoided by using simple vector operations. For details, please refer to my previous article: Application and Thinking of Vector Operations in Game Development.

Similarly, if you often need to divide by a constant, for example dividing by 10000 when integers in ten-thousandths are used to represent decimals, you can multiply by 0.0001f instead, because division is more expensive than multiplication.
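The division-by-constant case can be sketched as follows. FixedPoint, InvTenThousand, and ToFloat are hypothetical names, and the result can differ from a true division in the last bits because 0.0001f is not exactly representable.

```csharp
public static class FixedPoint
{
    // Precomputed reciprocal of the constant divisor 10000.
    private const float InvTenThousand = 0.0001f;

    public static float ToFloat(int tenThousandths)
    {
        // Equivalent to tenThousandths / 10000f up to floating-point rounding.
        return tenThousandths * InvTenThousand;
    }
}
```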

Caching

One of my favorite optimization ideas is caching. The essence of caching is to trade space for time. For example, many of the time-consuming Unity API functions mentioned earlier can be made cheaper by caching their results.

Object pooling is also a caching technique. More generally, if a value depends on a complex calculation and is used frequently afterwards, you can cache it to avoid repeating the calculation and gain performance.
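One way to cache a computed value is a lookup table that is filled once and read many times. The sketch below precomputes sine for whole degrees; SinTable is a hypothetical name, the one-degree resolution is an assumption, and whether a table actually beats calling the math function directly on a given device should be confirmed by profiling.

```csharp
using System;

public static class SinTable
{
    private const int Steps = 360;

    // Computed once, when the class is first used.
    private static readonly float[] s_table = Build();

    private static float[] Build()
    {
        var table = new float[Steps];
        for (int i = 0; i < Steps; i++)
            table[i] = (float)Math.Sin(i * Math.PI / 180.0);
        return table;
    }

    // Returns the sine of a whole-degree angle from the precomputed table.
    public static float Sin(int degrees)
    {
        // Wrap negative and large angles into the range [0, 360).
        int index = ((degrees % Steps) + Steps) % Steps;
        return s_table[index];
    }
}
```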

References

  • General Optimizations in Unity
  • Optimizing scripts in Unity games
  • Optimizing garbage collection in Unity games
  • Six important .NET concepts: Stack, heap, value types, reference types, boxing, and unboxing
