Learn code optimization from Unity

Today we're going to talk about how to learn code optimization from Unity: specifically, by studying the optimization strategies of Unity's IL2CPP technology and applying them to our everyday logic development.

If you have done Unity development, IL2CPP will be familiar. In short, IL2CPP is Unity's scripting backend that replaces Mono. Why Unity uses IL2CPP instead of Mono is a topic for another article.

IL2CPP consists of two parts:

An AOT (Ahead of Time) compiler, written entirely in C#.
A VM runtime library, written mainly in C++, plus some platform-specific assembly code.

The IL2CPP AOT compiler works as its name suggests: it reads and parses the IL assemblies (whether it uses Mono.Cecil for parsing is not known), analyzes and optimizes them, and then generates C++ code. The approach is also quite simple: the generated C++ code corresponds almost one to one with the IL. Interested readers can write some C# and then look at the generated C++ code.

IL2CPP has been officially released for more than a year. At first everyone questioned it, but by now almost everyone has accepted it. This change did not happen overnight; it is the result of Unity's focus on IL2CPP and its ongoing optimization work.

In the past couple of months, Unity has published a three-part series of articles on IL2CPP optimizations. In the rest of this article, we will look at what we can learn from them.

Let’s start with the first optimization example:

This is the textbook introductory example of object-oriented programming. Common sense says that animal.speak() in this example is a polymorphic call, while cow.speak() is not and can be a direct function call. The performance difference between the two is one virtual function table lookup.

However, IL2CPP does not actually do this. Its optimization strategy is very conservative, and for simplicity it does not maintain context state while reading IL instructions. As a result, IL2CPP cannot determine the concrete type of cow when it sees cow.speak(). To be on the safe side, it has to go through the virtual function table, which shows up as a virtual function call.

The fix is also very simple: the programmer adds a hint. Adding the sealed modifier to Cow's type definition tells IL2CPP that no subclass can override the method, which solves the problem.
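The same idea can be sketched in C++, where final plays a role comparable to C#'s sealed. This is only an analogue under that assumption, not IL2CPP's generated code; the Animal and Cow types and the two helper functions are hypothetical names chosen to mirror the example.

```cpp
#include <string>

// Hypothetical types mirroring the article's Animal/Cow example.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string speak() const { return "..."; }
};

// `final` is the hint: no class can derive from Cow, so the compiler
// knows the dynamic type behind any Cow reference.
struct Cow final : Animal {
    std::string speak() const override { return "Moo"; }
};

// Dynamic type unknown: the call is dispatched through the vtable.
std::string speak_via_base(const Animal& animal) { return animal.speak(); }

// Static type is final: the compiler is allowed to emit a direct call.
std::string speak_via_cow(const Cow& cow) { return cow.speak(); }
```

Both functions return the same result for a Cow; the hint changes only how the call may be compiled, not what it does.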

Optimization means skipping logic that is not needed and simplifying logic that cannot be skipped. After all, virtual function call overhead is unavoidable in most cases. Next, the IL2CPP team describes how they optimized the virtual function call itself.

Take a look at the sample code, which calls the virtual method SayHello() on the object returned by MakeRuntimeBaseClass(), that is, MakeRuntimeBaseClass().SayHello().

IL2CPP's runtime library checks the virtual table when making the SayHello virtual function call and throws a managed exception if the target method is not found.


For those of us who write application logic, there seems to be nothing to optimize here. Programmers with instruction-level optimization experience would simply leave this check to the CPU's branch predictor.

However, the IL2CPP team chose to eliminate this if. A vtable[slot] that would otherwise be null now points to a stub method that throws the exception.

Thus, in the rare cases where an exception needs to be thrown there is one extra function call of overhead, but in the vast majority of cases the overhead of an if check is saved.
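The trade-off can be sketched as follows. This is a minimal illustration, assuming a vtable modelled as a plain array of function pointers; Method, say_hello, missing_method_stub and the two invoke functions are hypothetical names, not IL2CPP's runtime API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Method = int (*)();  // hypothetical signature for a vtable entry

int say_hello() { return 42; }

// Stub placed in slots that have no implementation: it only throws.
int missing_method_stub() { throw std::runtime_error("method not found"); }

// Before: every call pays for a null check.
int invoke_checked(const std::vector<Method>& vtable, std::size_t slot) {
    Method m = vtable[slot];
    if (m == nullptr) throw std::runtime_error("method not found");
    return m();
}

// After: empty slots hold the stub, so the call site needs no branch.
int invoke_stubbed(const std::vector<Method>& vtable, std::size_t slot) {
    return vtable[slot]();
}
```

With the stubbed table, a missing method still ends in an exception, but the exception is raised from inside the stub rather than at the call site.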

According to the IL2CPP blog, this optimization improved performance by 3% to 4%, though we should allow for some margin of error.

Here's the third example from the original blog.

Notice the things[i] != null line in TotalSize: if T is Tree, a boxing operation will be performed there.

Those of you who are familiar with code generation may also think of generic sharing: instantiations of a generic function over different reference types can share the same method instance, while instantiations over value types each resolve to their own method instance.

Because IL2CPP is an AOT compiler, all of this is already known at compile time, so IL2CPP can treat every value-type instance of a generic function specially and remove the boxing inside it.

In fact, IL2CPP does just that, which spares the programmer from having to think about it.
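C++ has no boxing, so the following is only an analogue of the idea: because each instantiation is known at compile time, the null check can be kept for pointer-like types and compiled away for value types. It assumes C++17 (for if constexpr); the Tree struct and total_size function are hypothetical stand-ins for the blog's Tree and TotalSize.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

struct Tree {  // hypothetical value type standing in for the blog's Tree
    int size;
    int calculate_size() const { return size; }
};

template <typename T>
int total_size(const std::vector<T>& things) {
    int total = 0;
    for (const T& thing : things) {
        if constexpr (std::is_pointer_v<T>) {
            // Pointer instantiation: the null check is meaningful.
            if (thing != nullptr) total += thing->calculate_size();
        } else {
            // Value-type instantiation: a value can never be null,
            // so this instance is generated without the check.
            total += thing.calculate_size();
        }
    }
    return total;
}
```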

To summarize, how can we apply these optimization techniques when writing logic? Let's look at each one in turn:

In the first example, IL2CPP obtains additional meta-information for optimization from compile-time hints.

The scenarios where this applies when writing logic are too numerous to list. If you often use a language (such as C#) that lets you annotate types with attributes, you may already have similar optimization experience.

Suppose we want to develop a non-invasive serialization library. The core requirement is to serialize incoming objects into byte streams.

For the library, an incoming object of unknown type requires reflection to retrieve the type's meta-information and then dynamic generation of serialization code, which is used for subsequent serialization of that type.

This is just like a JIT: the library has to generate methods dynamically the first time an object of each type is serialized. That is quite expensive, although the cost is amortized over later calls. For some servers, however, this unpredictable performance spike can be unacceptable.

We can therefore add hints to the definitions of types that might be serialized, creating a constraint that the programmer may only pass objects of those hinted types to the library at run time.

The serialization functions for these types can then be generated all at once while the serialization library initializes. This turns an unpredictable cost into a predictable one, moves run-time cost to startup, and improves overall performance.
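A minimal sketch of this design follows. C++ has no reflection, so explicit registration stands in for both the hint and the code generation step; the Serializer class, register_type, serialize, Bytes and Point are all hypothetical names invented for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

class Serializer {
public:
    // Called during initialization for every type that may be serialized.
    // The expensive "prepare a serializer" step happens here, once.
    template <typename T>
    void register_type(std::function<Bytes(const T&)> encode) {
        table_[std::type_index(typeid(T))] = [encode](const void* obj) {
            return encode(*static_cast<const T*>(obj));
        };
    }

    // At run time only a table lookup remains; unhinted types are rejected.
    template <typename T>
    Bytes serialize(const T& obj) const {
        auto it = table_.find(std::type_index(typeid(T)));
        if (it == table_.end()) throw std::invalid_argument("type was not registered");
        return it->second(&obj);
    }

private:
    std::unordered_map<std::type_index, std::function<Bytes(const void*)>> table_;
};

struct Point { std::uint8_t x, y; };  // hypothetical payload type
```

Registering Point at startup with an encoder such as [](const Point& p) { return Bytes{p.x, p.y}; } makes every later serialize call for Point a plain lookup, while passing an unregistered type fails immediately instead of triggering work at an unpredictable moment.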

In the second example, IL2CPP eliminates the null check by turning the rarely taken branch into a stub method.

Stub methods and stub classes can also be used to make our own code more elegant and efficient.

For example, suppose we have an IServiceProvider that is instantiated as a different ServiceProvider depending on the configuration. One design is to check for null everywhere a ServiceProvider is used; the other is to initialize the ServiceProvider as a TrivialServiceProvider and then use it freely.

Neither design is absolutely better than the other; it all depends on the role IServiceProvider plays in the logic.

If the IServiceProvider interface has no default-value semantics, the first design may suit you better. Otherwise, the second is more elegant than the first, and it also gains some performance when the trivial case is rare.
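The second design might look like the sketch below. The get_service method, the ConfiguredServiceProvider class and the make_provider factory are assumptions added for illustration; only IServiceProvider and TrivialServiceProvider come from the discussion above.

```cpp
#include <memory>
#include <string>

struct IServiceProvider {
    virtual ~IServiceProvider() = default;
    virtual std::string get_service(const std::string& name) const = 0;
};

// Default implementation: always safe to call, does nothing useful.
struct TrivialServiceProvider : IServiceProvider {
    std::string get_service(const std::string&) const override { return ""; }
};

// Hypothetical implementation selected by configuration.
struct ConfiguredServiceProvider : IServiceProvider {
    std::string get_service(const std::string& name) const override {
        return "service:" + name;
    }
};

// The returned provider is never null, so callers need no null check.
std::unique_ptr<IServiceProvider> make_provider(bool configured) {
    if (configured) return std::make_unique<ConfiguredServiceProvider>();
    return std::make_unique<TrivialServiceProvider>();
}
```

Callers simply write make_provider(flag)->get_service("log") in both configurations; the trivial provider absorbs the case that would otherwise need an if at every use site.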

In the third example, IL2CPP special-cases the situations that can be optimized.

For example, a Redis zset uses a ziplist when it has few elements and a skiplist when it has many.
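The same pattern of switching representation by size can be sketched in a few lines. This is not how Redis implements zset; it is a hypothetical AdaptiveSet in which a sorted std::vector stands in for the compact encoding and std::set stands in for the skiplist, with an arbitrary default threshold of 8.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Small sets live in a sorted vector (compact, cache friendly); once the
// size passes a threshold the data moves to a tree-based std::set.
class AdaptiveSet {
public:
    explicit AdaptiveSet(std::size_t threshold = 8) : threshold_(threshold) {}

    void insert(int value) {
        if (use_tree_) { tree_.insert(value); return; }
        auto pos = std::lower_bound(small_.begin(), small_.end(), value);
        if (pos != small_.end() && *pos == value) return;  // already present
        small_.insert(pos, value);
        if (small_.size() > threshold_) {  // switch representation
            tree_.insert(small_.begin(), small_.end());
            small_.clear();
            use_tree_ = true;
        }
    }

    bool contains(int value) const {
        if (use_tree_) return tree_.count(value) != 0;
        return std::binary_search(small_.begin(), small_.end(), value);
    }

    bool uses_tree() const { return use_tree_; }

private:
    std::size_t threshold_;
    bool use_tree_ = false;
    std::vector<int> small_;
    std::set<int> tree_;
};
```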
