Original address: juejin.cn/post/684490…

Please credit the source when reproducing; plagiarism is strictly prohibited.

This article was first published, with authorization, on the WeChat public account hongyangAndroid.

Introduction

Recently, with Huawei's Ark Compiler about to be open-sourced, I went through the launch-event slides and found that, as an Android developer, I couldn't fully understand the points they covered. So I studied the background behind the slides and organized it into this article.

This article explains, in plain language, the historical causes of Android stutter and how Google has fought it from the ground up.

After reading this article you will:

  1. Understand how a computer interprets the programs we write and carries out their functions

  2. Know the evolution of the Android virtual machine

  3. Understand, from the bottom up, the three main causes of Android stutter

1. Basic Concepts

First, we need to brush up on some basic concepts to understand how a computer interprets what we write and performs its functions.

1. Compile & interpret

The source code of some programming languages, such as Java, becomes executable by a computer through a compile-then-interpret process.

Let's start with some Java code:

public static void main(String[] args) {
    System.out.println("Hello World");
}

This is the first lesson for every programmer: write this code, run it, and your computer or phone prints Hello World. So the question is: English is a language of the human world, so how does the computer (CPU) understand it?

As we all know, 0 and 1 are the language of the computer world; you could say computers only understand 0 and 1. So we just need to express the English code above to the computer in 0s and 1s, and the computer can understand and execute it.

As the figure above shows, Java source code is compiled into bytecode, which is then interpreted into machine code according to the rules of a template.
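To make the two steps concrete, here is a tiny Java method (the class and method names are my own, for illustration), with comments sketching roughly the stack-machine bytecode that javap -c prints for it; the exact listing can vary by compiler version:

```java
// Illustrative only: a minimal method plus (in comments) the rough bytecode
// that `javap -c Add` shows after compiling with `javac Add.java`.
public class Add {
    static int add(int a, int b) {
        // Approximate bytecode for this method body:
        //   iload_0   // push a onto the operand stack
        //   iload_1   // push b onto the operand stack
        //   iadd      // pop both, push a + b
        //   ireturn   // return the top of the stack
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

The interpreter (or a JIT/AOT compiler, as we'll see below) then turns those bytecode instructions into the machine code of whatever CPU the program lands on.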

Machine code & bytecode

  • Machine code

    Machine code is the language the CPU can read and execute directly.

    But if you take the machine code generated above and run it on a different computer, it will probably fail.

    That is because different computers understand different machine code. In layman's terms, machine code that works on computer A may not work on computer B.

    For example 🌰: person A is Chinese and knows Chinese and English; person B is Russian and knows Russian and English. Hand both of them an exam paper written in Chinese, and A can work through it while B can't even find where to write his name.

    So this is where we need bytecode.

  • Bytecode

    Chinese speaker A can't read a Russian paper, and Russian speaker B can't read a Chinese paper, but both can read an English paper.

    Bytecode is that intermediate code. Java is compiled into bytecode, and the same bytecode can then be interpreted into a specified machine code according to the rules of a specified template.

    Benefits of bytecode:

    1. Cross-platform: a piece of source code only needs to be compiled into bytecode once, and that bytecode is then interpreted into the machine code the current computer understands according to different templates. This is Java's famous "write once, run anywhere".

    2. The same source code compiles into bytecode that is much smaller than the equivalent machine code.

Compiled languages & compile-then-interpret languages

  • Compiled languages

    C/C++, as we know, are compiled languages: the programmer compiles them in one step into machine code, which the CPU can read and execute directly.

    One might ask: given the benefits of bytecode mentioned above, why not use bytecode?

    Because every programming language is designed differently. Some are designed to be cross-platform, such as Java; others are designed for a specific machine or a specific family of machines.

    For example 🌰, Objective-C and Swift are designed for Apple's products and don't care about anyone else's. So Objective-C and Swift are compiled directly into machine code that an iPhone or iPad can read and execute, which makes them fast. This is the main reason iPhone apps are bigger than Android apps, and it is also one of the reasons iPhones feel smoother! (No middleman taking a cut.)

  • Compile-then-interpret languages

    Take Java, the language used to develop Android. Java is a compile-then-interpret language: it compiles to bytecode (.class files in Java programs, .dex files in Android programs) rather than machine code. The bytecode then needs to be interpreted into machine code so the CPU can read it.

    This second step, interpreting bytecode into machine code, is carried out by the Java virtual machine after the program is installed or while it runs.

2. The Three Main Causes of Stutter


The latest version of Android this year is Android 10. Over the past two years, complaints about Android phones stuttering have gradually quieted down, replaced by talk of iOS-like smoothness.

Still, such praise is rarer than for iOS, and that traces back to Android's three historical causes of stutter: its starting point was simply lower than iOS's.

1. The virtual machine — interpretation is slow

From the description above, we know iOS doesn't stutter because it skips the intermediate step and talks to the hardware layer directly. Android, missing that shortcut and interpreting bytecode into machine code in real time on every execution, performs noticeably worse than iOS.

We already know that bytecode (the middleman) is one of the main causes of lag. Can we just throw the bytecode away like iOS does and compile in one step?

Obviously not, because there are only a handful of iOS device models. On the Android side there are countless phone models and CPU architectures, not to mention tablets, cars, and other devices. That many kinds of hardware means many different architectures, each with its own machine-code rules. Doing it all in one step like iOS simply isn't realistic.

So what do we do? Since we can't get rid of the bytecode middleman, we can only squeeze him harder and make the whole interpretation process faster and faster. That squeezing happens in the virtual machine.

Here comes the great evolution of the Android virtual machine!

① Android 1.0: Dalvik (DVM) + interpreter

The DVM is an Android virtual machine developed by Google that reads dex bytecode. When people speak of "the Java virtual machine" on the Android platform, this DVM is what they mean. In Android 1.0, the interpreter (the translator) in the DVM interpreted the bytecode while the program was running. As you can imagine, this was hopelessly inefficient. In a word: laggy.

② Android 2.2: DVM + JIT

The idea for fixing the DVM's problem is actually clear: interpret the code before the function is actually run.

So with Android 2.2, Google cleverly introduced JIT, short for just-in-time compilation.

For example 🌰: I often go to a certain restaurant, and the owner already knows what I like and prepares the food before I arrive, so I don't have to wait.

The JIT is that clever boss. It notes down the functions a user uses frequently in an APP and compiles them immediately when the user opens the APP, so that by the time the user reaches that content, the JIT already has the 'dish' ready. This improves overall efficiency.
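As a purely hypothetical toy model of that "remember the hot dishes" idea (the class name, threshold, and lambda-caching trick are all mine — a real JIT compiles hot bytecode to machine code, it does not cache Java lambdas): a function stays on the slow "interpreted" path until it crosses a hotness threshold, after which a promoted "compiled" version is used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy sketch of a JIT-style invocation counter (illustration only).
public class ToyJit {
    static final int HOT_THRESHOLD = 3;           // promote after 3 calls
    final Map<String, Integer> counts = new HashMap<>();
    final Map<String, IntUnaryOperator> compiled = new HashMap<>();

    int run(String name, IntUnaryOperator interpreted, int arg) {
        IntUnaryOperator fast = compiled.get(name);
        if (fast != null) {
            return fast.applyAsInt(arg);          // fast path: already "compiled"
        }
        int calls = counts.merge(name, 1, Integer::sum);
        if (calls >= HOT_THRESHOLD) {
            compiled.put(name, interpreted);      // hot code: "compile" (here: cache) it
        }
        return interpreted.applyAsInt(arg);       // slow path: the "interpreter"
    }

    public static void main(String[] args) {
        ToyJit jit = new ToyJit();
        for (int i = 0; i < 5; i++) jit.run("square", x -> x * x, i);
        System.out.println(jit.compiled.containsKey("square")); // prints true
    }
}
```

Note that in this model the counters start empty on every launch, and a "dish" never seen before always takes the slow path first — exactly the weaknesses the article lists next.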

While the JIT is smart and well-intentioned, the reality was that phones still stuttered.

Remaining problems:

  • Opening the APP becomes slower
  • Every time the APP is opened, the work has to be done all over again
  • If I suddenly order a dish I've never ordered before, I still have to wait for it: when the user opens a 'dish' the JIT hasn't prepared, they can only wait for the interpreter in the DVM to interpret it on the fly

③ Android 5.0: ART + AOT

Clever Google then reasoned: since we can compile bytecode to machine code when the APP is opened, why not compile it when the APP is installed? That way, opening the APP never requires redoing the work — once and for all.

With that idea, Google replaced the DVM with ART, short for Android Runtime. ART keeps some of the DVM's optimizations and compiles an application to machine code as soon as it is installed. This process is called ahead-of-time (AOT) compilation, i.e. precompilation.

But a new problem appeared: opening the APP no longer lagged, but installing an APP became extremely slow. Some might say an APP isn't installed that often, so the time can be sacrificed. Sorry, though: an Android phone recompiles every installed app on each OTA boot (i.e. after a system update or a reflash). Helpless! Despair! Remember those years of dreading Android system updates?

④ Android 7.0: hybrid compilation

Google finally played its trump card: DVM+JIT is bad, ART+AOT is bad. Fine — mix them all together, and it works!

So Google shipped hybrid compilation in Android 7.0. AOT quietly compiles whatever code can be compiled into machine code while the phone is idle (what that part is is covered below in the bytecode compilation template section). In effect, the work previously done at install time is now done secretly whenever the phone is idle.

If some code hasn't been compiled in time, the JIT and interpreter brothers are called in to compile or interpret it on the fly.

I have to hand it to Google for this brute-force solution; with it, Android phones began climbing out of the lag pit.

⑤ Android 8.0: an improved interpreter

In Android 8.0, Google went after the interpreter itself. The root of the problem was that the interpreter was too slow — so why not make interpretation itself faster? Google overhauled the interpreter, making interpreted execution much more efficient.

⑥ Android 9.0: improved compilation templates

This point is covered in detail below, in the section on bytecode compilation templates.

In a nutshell, Android 9.0 provides a way to ship hot-code profiles ahead of time, so that at install time the system already knows which frequently used code to precompile. (Quoted from Zhihu @weishu)

2. JNI — Java and C call each other slowly

JNI is the Java Native Interface, used for interacting with C/C++ code.

If you don't do Android development, you may not know that besides Java, an Android project usually contains some C/C++ code as well.

This is where a serious problem arises. First, see the figure above (from the Ark Compiler principles slides):

During development, Java source code is packaged into a .dex file, while the C code becomes a .so library directly, because C is itself a compiled language.

On the user's phone, the .dex file (bytecode) in the APK is interpreted into an .oat file (machine code) that runs inside the ART virtual machine, while the .so library is binary the machine can run directly (machine code). Calls between these two kinds of machine code inevitably carry overhead.

Here is why the two kinds of machine code differ.

We need to look at the bytecode → machine code compilation process. Although both end up as machine code that the hardware can execute directly, the two differ considerably in performance, efficiency, and implementation, mainly for the following two reasons:

  • Different programming languages produce different bytecode and therefore different machine code.

    For example 🌰, consider int a + b in C and in Java, both statically typed languages.

    C can load a and b from memory and compute directly in registers, because C is a static language and a and b are definitely ints.

    In Java, although we must declare a variable's type, such as int a = 0, the language also has dynamic features: reflection and proxies. No one can fully guarantee how a will be used when the code runs, so Java compilation has to take context into account — each case is compiled on its own terms.

    So even the bytecode differs, and the compiled machine code must differ too.

  • The compiled machine code differs because the running environments differ.

    Machine code compiled from Java runs wrapped inside ART — the Android RunTime, essentially a virtual machine. C code's runtime environment is not inside ART.

    A runtime provides basic services such as input/output and memory management, and calls between two different runtimes incur extra overhead.

    For example 🌰: because Java has garbage collection (GC), a Java object's address is not fixed and may be moved by the GC. That is, objects in the machine code running inside ART do not have stable addresses. C simply asks Java for an object's address, but if the object is then moved, everything falls apart. There are two solutions:

    1. Copy the object on the C side. Obviously this costs a lot.
    2. Tell ART: I'm using this object, so don't touch its address! Keep it right here. This costs less, but if the address can never be reclaimed, it may lead to an OOM.

    (See Zhihu @Zhang Duo's answer to "How much impact will Huawei's announced Ark Compiler have on the Android software ecosystem?")
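The reflection point from the first bullet can be made concrete with a small sketch (the class and method names are hypothetical, mine for illustration): even though x is declared as a plain int, reflection can rewrite it at runtime behind the compiler's back, which is one reason Java compilation must stay more conservative than C's.

```java
import java.lang.reflect.Field;

// Illustration only: reflection mutating a plainly-typed field at runtime.
public class ReflectDemo {
    static class Point {
        int x = 1;
    }

    // Rewrites p.x via reflection, bypassing ordinary compile-time analysis.
    static void setXByReflection(Point p, int value) {
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);
            f.setInt(p, value);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point();
        setXByReflection(p, 42);
        System.out.println(p.x); // prints 42
    }
}
```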

3. The bytecode compilation template — not optimized per APP

Let's use an example 🌰 to understand compilation templates. Think of a template as a translator's style guide: the same sentence, "Hello world", can be translated into another language in more than one valid way, and which rendering you get depends on the translator. In compilation, that difference comes from the compilation template.

① The unified compilation template (VM template)

Bytecode can be compiled into machine code using different compilation templates, which directly affects the performance of the resulting machine code.

In Android, ART has a single prescribed, unified compilation template, called the VM template, which isn't bad, but isn't great either.

It's certainly not bad — it's Google's work, after all — but it's not great either, because it isn't optimized for each individual APP.

② Problems with the VM template

The problem is precisely that it isn't optimized for each APP.

As mentioned above in Google's Android 2.2 virtual-machine optimization, Google had the JIT record the functions users use most (hot code) and compile them immediately when the user opened the APP; in other words, hot code was compiled first.

With hybrid compilation in Android 7.0, however, this was weakened by AOT: hot code recorded by the JIT is not persisted, and AOT's compilation priority follows the VM template — AOT precompiles certain bytecode to machine code based on the template's contents.

And this is where the problem arises.

An example 🌰 first: a Chinese restaurant's signature dish is tomato with scrambled eggs, so the ingredients for it are always well prepared. But customer A is a maverick who never eats it; every time he orders a rare-steak set, he can only wait while the boss cooks it from scratch.

If an APP's hot code — say, its home page — happens to fall outside the VM template, then AOT is virtually useless. (Imagine, for instance, that the VM template preferentially compiles classes and methods whose names are at most 15 characters long, but the home page's class name is longer than 15 characters. This is only an illustration, not how it actually works.)

Following the VM template, AOT for some reason skipped part of the home-page code and instead compiled the far less important Settings-page code:

The flow above shows that in certain cases AOT compilation doesn't help at all: everything falls back to the interpreter and the JIT compiling in real time, and the whole compilation scheme regresses to Android 2.2.

③ Clever ART

The problem exists, but it isn't especially serious, because ART is not as dumb as I made it sound. As the application is used, ART records and learns the user's habits (saving the hot code), then maintains a customized VM template for the current APP, continuously adding hot code and refining the template.

Does this sound familiar? It is part of the rationale behind the phone-launch slogan "learns your usage habits to make APPs open faster".

④ The final move: once and for all

Solving this once and for all isn't hard: we just need to order our dishes with the boss in advance, so there's no wait when we arrive.

In Android 9.0, Google introduced exactly this "pre-ordering": the build system supports using Clang's profile-guided optimization (PGO) on native Android modules that have blueprint build rules.

In plain language: Google lets you add a configuration file during development that marks the "hot code", and ART prioritizes that code when it quietly compiles the APP in the background after installation.

Although Google supports this technique, information about it is scarce here and few APP developers use it. I have attached the official link, as well as a blog post that covers it in fair detail. (Xcode next door even has a UI for PGO.)

3. The Solution

The solution can be summed up in one name: the Huawei Ark Compiler.

Ark's answers:

  1. On the virtual-machine problem, Ark says: I don't want your lousy virtual machine — we run bare.

  2. On the JNI call problem, Ark says: we compile Java straight to machine code at build time, just like C, kill off the virtual machine, and call the .so library directly — no JNI overhead at all.

  3. On the compilation-template problem, Ark says: we support different compilation optimizations for different APPs.

To summarize: at packaging/compile time, Ark applies per-APP compilation optimizations, packages the result directly as machine code into the installation file (probably no longer called an APK), and runs it directly.

So on paper the Ark Compiler solves all three problems — but at what cost?

If we follow this line of thinking, Ark is definitely not just a compiler; it must ship its own runtime. But that's a story for another day.

How Ark achieves all this is still only a rough idea here, not an in-depth analysis, because Ark is not yet open source and the launch slides are more marketing than technical detail. For now these fantastic ideas live entirely on paper; everything awaits the open-source release.

4. Programmers, Don't Take the Blame!

Since publishing this article, I have received some feedback, including this one:

The main cause of stutter is garbage code and domestic software tricks — keep-alive hacks and "family bucket" app bundles.

I won't deny that garbage code, keep-alive hacks, and family buckets are disgusting.

But if you elevate these to the top cause of stutter, then either you overrate the de-optimizing power of that garbage code, or you underrate system vendors like Xiaomi and Huawei — or do you believe the world's iOS developers are simply a level above Android developers?

If you insist garbage code causes lag, at least work out which code is the garbage — for example, code that causes memory churn, triggering frequent GC passes, really does cause lag. Don't just wave at "garbage code" and leave programmers holding the bag. It's already 2019; stop blaming programmers! And those of you who think this way: don't sell yourselves short either!

As for keep-alive: if, on today's Huawei or Xiaomi systems, apps really ran round-the-clock keep-alive and pulled one another's processes back up, it would probably end up like the hacker who broke into Alibaba's systems and reported to the company the next day.

As for cheap thousand-yuan phones, look into Google's new Android Go system: its requirements on APP development are unusually strict.

5. References

  1. How much impact will Huawei's Ark Compiler have on the Android software ecosystem? (Zhihu)
  2. Huawei's rising star! The glory and mission of the Ark Compiler
  3. One article to understand the Huawei Ark Compiler, a big step forward for Android
  4. What does a JVM have to do when calling a native method?
  5. About Dalvik, ART, DEX, ODEX, JIT, AOT, OAT