What is DOTS?

DOTS is a step change for Unity and a major milestone on Unity's roadmap. The Unity website has a dedicated theme page for it, complete with the bold slogan "Rebuild Unity's Core!", which shows just how much emphasis Unity places on DOTS.

So what does DOTS mean? Take a look at the screenshot from the official website:

A high-performance, multithreaded, data-oriented technology stack. You can see the key words for DOTS right there: high performance, multithreading, data orientation, and stack.

So what does Unity use to deliver on those key words?

C# Job System

The Job System covers the high-performance, multithreading, and stack key words in DOTS. In the last article we looked at the general flow of the CPU executing code, and the flow here is basically the same: compile the code into an executable, load it into memory, and hand it to the CPU for execution.

A more detailed process can be found here: www.cnblogs.com/fengliu-/p/…

Processes, threads, and coroutines

Today's computer architectures are mostly designed around threads, but early computers evolved gradually from single-program processing to multitasking. Whether single-task or multi-task, though, the basic unit of execution is the process (if your foundations here are weak, you can roughly think of an EXE as a process). Each process has its own independently allocated resources, including but not limited to a text area, a data area, and a stack area.

The text area stores the code executed by the processor; the data area stores variables and dynamically allocated memory used during process execution; the stack area stores the instructions and local variables of active procedure calls. So how does a computer execute multiple programs at once? The answer is the operating system.

The operating system controls the computer's hardware resources in a unified manner and grants each process a time slice of execution according to its scheduling policy. Because the computer processes so quickly, users feel as though they are running multiple programs (processes) at the same time. However, this model is not without cost. When too many processes run in parallel, switching between them becomes expensive: the current context must be saved, the new context loaded, the time slice executed, the state saved again, and then the next process fragment executed. The cost of context switching can sometimes exceed the cost of the work itself.

A thread is the smallest unit of CPU execution, and that is what we are talking about when we talk about multithreading. A thread is an execution entity within a process; a process can have many threads, each scheduled and dispatched independently. In Unity mobile game development, for example, Unity's main thread plus a network socket thread is a typical instance of multithreading.

Because today's computers compute in parallel across multiple cores, programs are increasingly designed around multithreading. One pair of concepts to understand here is concurrency versus parallelism. Concurrency is an execution mode in which multiple tasks are executed alternately within the same time period. Parallelism is an execution mode in which different threads execute simultaneously at the same instant.

Another characteristic of threads is resource sharing: different threads within the same process share memory addresses and resources. A thread does not request system resources of its own (beyond the small amount it needs to run); all of its resources come from the process that contains it. This makes resource handling fast and convenient and lets multithreading improve computing efficiency, but it is also what makes multithreaded programming hard. Even after years of multi-core CPUs and thread-oriented computer architecture, multithreaded programming is still not widespread.

Coroutines can simply be understood as user-defined threads. Once you create a process or a thread, you lose control over it: the kernel allocates its time slices and executes it. A coroutine, however, is a "thread" the user creates, so at the operating-system level it is not subject to kernel scheduling. You can create numerous coroutines within a single thread (hardware permitting) to organize your code logic, control when and in what state each one runs, and switch from one coroutine to another at a fraction of the cost of a thread switch.
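Unity's own coroutines are a concrete illustration of this idea. A minimal sketch (the class and method names are mine): the main thread resumes the coroutine where it last yielded, with no kernel scheduling involved.

```csharp
using System.Collections;
using UnityEngine;

public class CoroutineDemo : MonoBehaviour
{
    void Start()
    {
        // Unity, not the kernel, decides when this resumes:
        // the coroutine runs on the main thread, suspended at each yield.
        StartCoroutine(CountDown(3));
    }

    IEnumerator CountDown(int seconds)
    {
        for (int i = seconds; i > 0; i--)
        {
            Debug.Log($"T-minus {i}");
            // Suspend here and resume on a later frame;
            // no thread context switch takes place.
            yield return new WaitForSeconds(1f);
        }
        Debug.Log("Liftoff");
    }
}
```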

So, to summarize: a process can have many threads, and each thread can host many coroutines. A process owns an independent address space and manages its resources; its threads share those resources. Threads improve CPU parallelism, and processes are portable across platforms; both consume the computer's time on context-switch scheduling. A coroutine executes inside a thread to avoid meaningless scheduling, handing that scheduling responsibility over to the developer; but because it is parasitic inside a thread, it cannot be dispatched by the kernel and cannot make full use of the hardware.

Multithreaded programming

We said that a thread is the smallest unit of kernel scheduling. Based on its running environment and who schedules it, a thread can be classified as either a kernel thread or a user thread. As the name implies, a kernel thread runs in the kernel environment and is allocated and scheduled by the kernel. User threads run in user space and are scheduled by a thread library.

When a process's kernel thread acquires the CPU, it loads a user thread to execute, so a kernel thread is essentially a container for user threads.

Because threads share the same process resources, thread safety is the most important problem in multithreaded programming. Simply put, it is the question of how to manage multiple threads' access to and modification of the same resource so that the program executes according to its intended logic without problems.

For example, suppose thread 1 needs to overwrite the value of A while thread 2 needs to read the value of A. Since thread scheduling is controlled by the kernel, if the operations happen in the wrong order the result will be completely off (known in industry jargon as a race condition).
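A minimal C# sketch of such a race (the class and field names are mine): two threads increment a shared counter without synchronization, and updates are lost depending on how the kernel interleaves them.

```csharp
using System;
using System.Threading;

class RaceDemo
{
    static int counter = 0;

    static void Increment()
    {
        for (int i = 0; i < 100000; i++)
            counter++; // not atomic: a separate load, add, and store
    }

    static void Main()
    {
        var t1 = new Thread(Increment);
        var t2 = new Thread(Increment);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Expected 200000, but interleaved read-modify-write
        // operations lose updates, so the printed value varies per run.
        Console.WriteLine(counter);
    }
}
```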

The common solutions are roughly listed below (without going into detail):

  • synchronized: marking a key method as synchronized causes any thread that needs it to wait until the thread currently holding it completes the call. The disadvantage is that if the locked method is not static, the object itself is locked, so all access to that object has to wait; several synchronized methods in your code can seriously affect performance. (In C#, the closest equivalent is [MethodImpl(MethodImplOptions.Synchronized)].)

  • Lock: unlike synchronized, a lock is taken on demand around just the code that needs it, which requires stronger control over the variables involved; see the sketch after this list.
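A C# sketch of on-demand locking (the gate object is mine), fixing the race above by serializing only the critical section rather than locking the whole object:

```csharp
using System;
using System.Threading;

class LockDemo
{
    static int counter = 0;
    static readonly object gate = new object();

    static void Increment()
    {
        for (int i = 0; i < 100000; i++)
        {
            // Only the read-modify-write is serialized; everything
            // else still runs in parallel on both threads.
            lock (gate)
            {
                counter++;
            }
        }
    }

    static void Main()
    {
        var t1 = new Thread(Increment);
        var t2 = new Thread(Increment);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        Console.WriteLine(counter); // now deterministically 200000
    }
}
```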

Multithreading for the Jobs System

Strictly speaking, the Job System is not in the realm of multithreaded programming, because it does not let you operate on threads or processes directly. Instead, to guarantee thread safety, it encapsulates its own multithreaded scheduling framework: users only need to implement certain interfaces and use the specified data types to get high-performance computing. I therefore personally think of Jobs as a multithreaded scheduling framework rather than a multithreaded programming framework.

Jobs avoid reference types in order to avoid collisions with the main thread's data. In addition, a set of custom data structures backed by dedicated unmanaged memory is defined, called NativeContainer types, including NativeArray, NativeList, NativeHashMap, and NativeQueue.

A simple example of code using Jobs:

1. Define a struct that implements IJob.

2. Give it fields that are blittable types (which you can roughly understand as C# value types) or NativeContainer types.

3. Implement the Execute method.
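A minimal sketch of such a job (the struct and field names are mine):

```csharp
using Unity.Collections;
using Unity.Jobs;

public struct AddJob : IJob
{
    // Blittable value types: copied into the job when it is scheduled.
    public float a;
    public float b;

    // NativeContainer: points at unmanaged memory shared with
    // the main thread, so results written here survive the job.
    public NativeArray<float> result;

    public void Execute()
    {
        result[0] = a + b;
    }
}
```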

Be very careful here: everything except the NativeContainer is a copy of the data, so the only way to get results back to the main thread is to put them inside a NativeContainer.
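Scheduling the job above from the main thread and reading the result back might look like this (continuing the sketch, e.g. inside a MonoBehaviour method):

```csharp
// Allocate the shared result buffer in unmanaged memory.
var result = new NativeArray<float>(1, Allocator.TempJob);
var job = new AddJob { a = 1f, b = 2f, result = result };

JobHandle handle = job.Schedule(); // queued for a worker thread
handle.Complete();                 // main thread waits for the job

Debug.Log(result[0]);              // 3, read back through the NativeContainer
result.Dispose();                  // unmanaged memory is freed manually
```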

In fact, using Jobs is not all that convenient, and there are a lot of points to pay attention to. You can refer to the official manual for the common pitfalls:

Docs.unity3d.com/Manual/JobS…

Unity ECS

ECS covers the high-performance, data-orientation, and stack key words in DOTS.

In previous chapters we went into detail about the idea behind ECS, the reasons for its high performance, and Entitas, an older ECS plugin for Unity. So in this section we will not expand on the principles of ECS again, just look at how Unity's version differs from Entitas.

Unity's ECS package is called Entities, similar in concept to Entitas, but the architecture of the implementation is quite different.

Let's start with creating an Entity and setting a Component:

 

Unity’s ECS on top, Entitas on the bottom.
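As a rough sketch of what the Unity half looks like in code (Entities was still a preview package at the time of writing, so the exact API differs between versions; World.Active, for instance, was later renamed):

```csharp
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Inside some bootstrap code:
EntityManager entityManager = World.Active.EntityManager;

// Create an entity whose archetype contains a Translation component...
Entity entity = entityManager.CreateEntity(typeof(Translation));

// ...then set the component's data directly on the entity.
entityManager.SetComponentData(entity, new Translation { Value = new float3(0f, 1f, 0f) });
```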

Then look at the System:

 

Unity’s ECS on top, Entitas on the bottom.

After all, it's Unity's own child, so the System in Unity's ECS gets the three-pronged treatment. The [BurstCompile] attribute is used for all jobs. For details, the documentation is here:

Docs.unity3d.com/Packages/co…
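And a rough sketch of the Unity half of the System comparison (the system, job, and field names are mine; later Entities versions replaced JobComponentSystem and IJobForEach):

```csharp
using Unity.Burst;
using Unity.Entities;
using Unity.Jobs;
using Unity.Transforms;

public class MoveUpSystem : JobComponentSystem
{
    // [BurstCompile] tells Burst to compile this job to optimized native code.
    [BurstCompile]
    struct MoveUpJob : IJobForEach<Translation>
    {
        public float deltaTime;

        public void Execute(ref Translation translation)
        {
            translation.Value.y += deltaTime;
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var job = new MoveUpJob { deltaTime = UnityEngine.Time.deltaTime };
        return job.Schedule(this, inputDeps);
    }
}
```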

Burst

Unity is currently pushing Burst, a compiler it claims is faster than the C++ compiler. Here's the description from the official website:

This part of the stack mainly covers the high-performance key word.

Before we get to LLVM, let’s take a quick look at the technical solutions Unity has been using.

Open a recent version of Unity (2018.4) and you can see this in the Player Settings:

The current scripting backend options are Mono and IL2CPP, with Mono as the default.

Mono

Mono, needless to say, is the foundation of Unity's cross-platform support and the means by which Unity got started. But after so many years with Unity, it is time for it to retire.

As the execution vehicle for the IL intermediate language, it provides an IL runtime for each platform.

 

Take a look at Mono's execution flow.

Although it gave Unity its cross-platform reach, the problems accumulated until Unity had to ditch it and look elsewhere, for several reasons:

  • Mono's licensing restrictions mean Unity often cannot ship the latest C# features in its latest releases.
  • Performance is a big problem; after all, it is a virtual machine.
  • Maintenance is very difficult: while IL is a standard, the VM is not! Mono is an open-source project, but it does not keep up with writing VMs for every new hardware platform, so Unity has to port or write them itself. Some web-based platforms, such as WebGL, required an almost complete rewrite.
  • Mono cannot fulfill the 64-bit requirements. In particular, Google made it mandatory in August that apps in the Google Play Store also ship 64-bit versions, and IL2CPP is currently the only option that meets that requirement.

IL2CPP

IL2CPP is a tool that converts IL to CPP, as the name suggests.

 

As you can see in red in the diagram, IL2CPP rewrites the compiled IL code into CPP code, which is then compiled into native executables by each platform's native compiler. This greatly improves performance, because the virtual machine is abandoned and the native compiler's optimizations kick in.

According to the official statistics, average performance improves by a factor of 1.5 to 2.0.

 

Note that IL2CPP does away with the virtual machine, yet there is still an IL2CPP VM in the execution diagram above. This is because C# is built on managed code, and IL itself is also managed code, so even after IL2CPP converts the IL to CPP, this part of the design cannot be thrown away. IL2CPP therefore still needs a "VM" to manage memory, allocate threads, and handle other administrative tasks. Rather than a VM, it is more appropriate to call it a manager.

Note the difference: a true VM is responsible for interpreting and executing code, while this manager is responsible only for memory and feature management, so the latter is much smaller in size and complexity.

LLVM

As you can see from Unity's feature page, Burst compiles on top of LLVM, so let's take a look at Wikipedia's definition of LLVM:

LLVM is a free-software project: a compiler infrastructure, written in C++, containing a series of modular compiler components and toolchains for developing compiler front ends and back ends. It is designed to optimize programs written in any programming language at compile time, link time, run time, and "idle time". It was originally implemented for C/C++ and currently supports ActionScript, Ada, D, Fortran, GLSL, Haskell, Java bytecode, Objective-C, Swift, Python, Ruby, Rust, Scala, and C#.

Link: zh.wikipedia.org/wiki/LLVM

LLVM provides the middle layers of a complete compiler system. It takes Intermediate Representation (IR) from a compiler front end, optimizes it, and then converts and links the optimized IR into the target platform's assembly language. LLVM can accept IR produced by the GCC toolchain, so existing compilers can sit on top of it. LLVM can also produce relocatable code at compile time, link time, and even run time.

Here’s an overview of the process:

 

LLVM is divided into three parts: the front end, the middle end, and the back end.

Front end:

In simple terms, it performs lexical, syntactic, and semantic analysis of the different source languages and generates the intermediate code.

Middle end:

At the heart of LLVM is the Intermediate Representation (IR), a low-level language similar to assembly. IR is a strongly typed, RISC-like (reduced instruction set computing) instruction set that abstracts away the details of the target instruction set.

A simple Hello World program can be expressed in IR form as follows (this is essentially the example from the Wikipedia article linked above):
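```llvm
@.str = internal constant [14 x i8] c"Hello, world\0A\00"

declare i32 @printf(i8*, ...)

define i32 @main(i32 %argc, i8** %argv) nounwind {
entry:
    ; take a pointer to the first character of the string constant
    %tmp1 = getelementptr [14 x i8], [14 x i8]* @.str, i32 0, i32 0
    %tmp2 = call i32 (i8*, ...) @printf(i8* %tmp1) nounwind
    ret i32 0
}
```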

Back end:

Crucially, it supports language-independent instruction sets and type systems. (Remember the difference between reduced and complex instruction sets we talked about last time, i.e. the differences between the ARM and x86 instruction sets?)

So far, LLVM supports a variety of back-end instruction sets, such as ARM, Qualcomm Hexagon, MIPS, the Nvidia parallel instruction set (PTX, called NVPTX in the LLVM documentation), PowerPC, AMD TeraScale, AMD Graphics Core Next (GCN), SPARC, z/Architecture (referred to as SystemZ in the LLVM documentation), x86, x86-64, and XCore. Some platform features are not fully implemented, but all the basic functionality of x86, x86-64, z/Architecture, ARM, and PowerPC is in place.

The linker:

The LLD linker subproject aims to develop a built-in, platform-independent linker for LLVM, removing the dependency on third-party linkers. As of May 2017, LLD supports ELF, PE/COFF, and Mach-O. Where LLD support is incomplete, users can fall back on other projects, such as the GNU ld linker. LLD supports link-time optimization: when it is enabled, LLVM can output bitcode instead of native code, and native code generation is handled by the linker's optimizer.

After looking at how LLVM works, does it feel familiar? Very similar to Mono, isn't it? Both convert source languages into an intermediate representation and then make that representation portable. Note, however, that Mono works at run time, while LLVM works at compile time! Mono is a virtual machine targeting hardware platforms, while LLVM is an architecture targeting instruction sets! So LLVM is far superior to Mono in performance, number of supported targets, and scalability. (The Burst compiler is said to be up to 30% faster than C++ at its best.)

Unity's DOTS is currently a whole family of packages, and there are plenty of videos on the technology on the official theme page, so check them out if you want to learn more. Only six chapters were planned for this hands-on ECS series, and all of them are now complete. Next time, we'll talk about the UI framework.