Preface

The series "Alipay Client Architecture Analysis" dissects the Alipay client, starting from its architecture design. It covers the concrete implementation of "container framework design", "network optimization", "startup performance optimization", "automatic log collection", "RPC component design", and "mobile application monitoring, diagnosis, and localization", to help you understand how Alipay's client architecture has been iterated and optimized.

This installment describes how Alipay sped up the startup of its Android client by controlling garbage collection (GC).

Startup time is an important part of the user experience of a mobile app. Compared with ordinary mobile apps, Alipay is very large, which inevitably slows its startup. The conventional optimization methods have already been applied in Alipay, so this article attempts to optimize Alipay's startup speed further from the perspective of GC.

Background

Compared with C, Java relieves developers of managing memory allocation and reclamation. Memory management is still an essential part of every process, however, so as a compromise the Java language designers moved object allocation and reclamation into the Java virtual machine. One point must be clear: GC comes with a cost. Google's engineers are aware of how GC affects applications, so Dalvik writes GC logs to Logcat by default. The following GC logs are commonly seen in Logcat:

  1. GC_EXPLICIT: Dalvik provides an API that lets developers trigger GC explicitly (System.gc()). The design of Google Maps shows how this API can be used.
  2. GC_FOR_ALLOC: a GC triggered when an object allocation fails. This GC suspends all Java threads in the application until it completes.
  3. GC_CONCURRENT: a GC triggered by the Java virtual machine based on the current state of the heap. It runs on Dalvik's separate GC thread and, for part of its duration, does not block the application's Java threads.
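When reading a startup trace, it helps to separate these GC kinds mechanically. The sketch below is not Alipay's tooling; it assumes only that each Dalvik GC log line contains one of the reason strings listed above. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* GC causes named in Dalvik's Logcat output (hypothetical enum). */
typedef enum {
    GC_KIND_UNKNOWN,
    GC_KIND_EXPLICIT,
    GC_KIND_FOR_ALLOC,
    GC_KIND_CONCURRENT
} GcKind;

/* Returns the GC cause named in one Logcat line, or GC_KIND_UNKNOWN. */
GcKind classify_gc_line(const char *line) {
    assert(line != NULL);
    if (strstr(line, "GC_EXPLICIT") != NULL) return GC_KIND_EXPLICIT;
    if (strstr(line, "GC_FOR_ALLOC") != NULL) return GC_KIND_FOR_ALLOC;
    if (strstr(line, "GC_CONCURRENT") != NULL) return GC_KIND_CONCURRENT;
    return GC_KIND_UNKNOWN;
}

/* The list above states only for GC_FOR_ALLOC that it suspends all Java
 * threads until it finishes, so only that kind is flagged here. */
int suspends_all_java_threads(GcKind kind) {
    return kind == GC_KIND_FOR_ALLOC;
}
```

A "WAIT_FOR_CONCURRENT_GC" line falls into GC_KIND_UNKNOWN here, because it records a thread waiting for a GC rather than a GC itself.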

Alipay startup is a typical critical-path scenario, so we want as few GC_CONCURRENT collections as possible during it, and GC_FOR_ALLOC should ideally be eliminated. Logcat, however, shows very poor GC behavior: many GC_FOR_ALLOC collections and a striking number of Java threads blocked in WAIT_FOR_CONCURRENT_GC, as shown in the figure below. Simply adding up the time spent in these GCs shows that GC seriously lengthens application startup.
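The "simple counting" mentioned above can be sketched as a per-line sum of stall time. This assumes the Android 4.x Dalvik log wording ("paused 2ms+3ms" on GC lines, "WAIT_FOR_CONCURRENT_GC blocked 12ms" on waiting threads); other Dalvik versions word these lines differently, and the function names are hypothetical. Only pause and blocked durations are counted, since the rest of a concurrent GC overlaps with application work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sums the "<n>ms" values that follow `key`, e.g. "paused 2ms+3ms" -> 5. */
static long sum_ms_after(const char *line, const char *key) {
    const char *p = strstr(line, key);
    long total = 0;

    if (p == NULL) return 0;
    p += strlen(key);
    for (;;) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p || strncmp(end, "ms", 2) != 0) break;  /* not "<n>ms" */
        total += v;
        p = end + 2;
        if (*p != '+') break;          /* "2ms+3ms" lists several pauses */
        p++;
    }
    return total;
}

/* Milliseconds that one log line reports application threads were stalled. */
long gc_stall_ms(const char *line) {
    assert(line != NULL);
    return sum_ms_after(line, "paused ")
         + sum_ms_after(line, "WAIT_FOR_CONCURRENT_GC blocked ");
}
```

Summing gc_stall_ms over every line logged between process start and the end of home-page loading gives a rough lower bound on the startup time lost to GC.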

Design ideas

Alipay is an Android application. How can it shorten its startup time by influencing Dalvik's GC behavior? The problem can be broken down into two questions:

  • Can Alipay influence the behavior of its own Dalvik instance?
  • How should Dalvik's behavior be changed to shorten startup time?

The answer to the first question is yes. In Android's design, each application runs in its own Dalvik instance, and once started, an application can modify the code and data in its own process space. Alipay therefore influences Dalvik's behavior by modifying the Dalvik library libdvm.so in memory.
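Before patching libdvm.so, the app first has to find where it is mapped in its own address space. A common way to do this on Linux-based systems, assumed here rather than taken from Alipay's code, is to read /proc/self/maps and pick the executable mapping whose path ends in libdvm.so. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses one /proc/<pid>/maps line. Returns 1 and fills [*start, *end) when the
 * line is an executable mapping of a file whose path ends in "/libdvm.so". */
int parse_libdvm_text_mapping(const char *line, unsigned long *start, unsigned long *end) {
    static const char suffix[] = "/libdvm.so";
    char perms[5] = {0};
    char path[256] = {0};
    size_t n, m = sizeof suffix - 1;

    assert(line != NULL && start != NULL && end != NULL);
    /* Fields: address range, permissions, offset, device, inode, pathname. */
    if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %255s", start, end, perms, path) != 4)
        return 0;                      /* anonymous mapping or malformed line */
    n = strlen(path);
    return perms[2] == 'x' && n >= m && strcmp(path + n - m, suffix) == 0;
}

/* Scans the calling process's maps; *start and *end are valid only on 1. */
int find_libdvm_text(unsigned long *start, unsigned long *end) {
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL) return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_libdvm_text_mapping(line, start, end);
    fclose(f);
    return found;
}
```

The returned range is where a fingerprint scan of Dalvik's code would run.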

The difficulty of the second question lies in the cost/benefit ratio: modifying code and data in the process space means working with binary code, which is far harder than changing source code, so even slightly complex improvements to Dalvik are impractical.

The approach is a "space for time" strategy: it spends more memory in exchange for shorter startup time. It is feasible under two conditions: first, the device manufacturer does not encrypt the Dalvik library in memory; second, the manufacturer has not modified Google's Dalvik source code, or has modified it only slightly. In theory, all devices could be covered with a whitelist, but the implementation and maintenance costs would be very high.

Implementation of GC suppression

The prerequisite for GC suppression is familiarity with how Dalvik's code decides its GC behavior. The approach is roughly as follows. First, identify in the Dalvik source the conditional branch, call it branch A, that leads to the unwanted GC behavior. Second, in the binary code, find the "instruction fingerprint" of the conditional jump for branch A, and prepare the binary code that changes the branch, called override_A. After the application starts, scan the in-memory libdvm.so, locate the position to modify by its instruction fingerprint, and overwrite it with override_A. Note that defining an instruction fingerprint requires some knowledge of compilers and the ARM instruction set. GC suppression consists of the following four parts:

  • Cancel the softLimit check
  • Stop waking the GC thread
  • Cancel the GC routine function
  • Stop GC suppression on OOM
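The fingerprint-and-override step can be sketched as a byte search over the mapped code followed by a copy of override_A. This sketch (hypothetical names) refuses to patch unless the fingerprint matches exactly once, a simple guard against locating the wrong code. It operates on an ordinary buffer; in a live process the code page would first have to be made writable (for example with mprotect) and the instruction cache flushed after writing (for example with GCC/Clang's __builtin___clear_cache), steps omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the only occurrence of fingerprint `fp` in `mem`,
 * or -1 when it occurs zero times or more than once (ambiguous). */
long find_unique_fingerprint(const unsigned char *mem, size_t mem_len,
                             const unsigned char *fp, size_t fp_len) {
    long hit = -1;

    assert(fp != NULL && fp_len > 0);
    for (size_t i = 0; i + fp_len <= mem_len; i++) {
        if (memcmp(mem + i, fp, fp_len) != 0) continue;
        if (hit != -1) return -1;      /* second match: refuse to guess */
        hit = (long)i;
    }
    return hit;
}

/* Writes override_A over the fingerprinted code, starting `patch_off` bytes
 * into the match. Returns 1 on success, 0 if nothing was patched. */
int apply_override(unsigned char *mem, size_t mem_len,
                   const unsigned char *fp, size_t fp_len,
                   size_t patch_off, const unsigned char *patch, size_t patch_len) {
    long off;

    if (patch_off + patch_len > fp_len) return 0;  /* patch must stay inside the match */
    off = find_unique_fingerprint(mem, mem_len, fp, fp_len);
    if (off < 0) return 0;
    memcpy(mem + off + patch_off, patch, patch_len);
    return 1;
}
```

Because the patched bytes no longer match the fingerprint, running the same patch twice is harmless: the second run finds nothing and returns 0.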

1. Cancel the softLimit check

The purpose of canceling the softLimit check is to allow as many objects as possible to be allocated. The following listing shows the ARM (Thumb) instructions that perform the softLimit check, located in the dvmHeapSourceAlloc function. The instruction encoded as 0xe057 (b.n 76224), reached after movs r4, #0 clears the return value, is the "return NULL" branch. To make sure this branch is never entered, we change the result of the CMP instruction: in the implementation we use 0x42 (the high byte of the cmp r1, r4 encoding 0x42a1) as the instruction fingerprint and rewrite the instruction as "CMP R0, R0". Because both operands are then equal, bls.n always jumps past the "return NULL" branch, which cancels the softLimit check.

   7616c: 42a1 cmp r1, r4
   7616e: d901 bls.n 76174 <_Z18dvmHeapSourceAllocj+0x20>
   76170: 2400 movs r4, #0
   76172: e057 b.n 76224 <_Z18dvmHeapSourceAllocj+0xd0>
   76174: f8df 90bc ldr.w r9, [pc, #188] ; 76234 <_Z18dvmHeapSourceAllocj+0xe0>
   76178: 6a28 ldr r0, [r5, #32]
   7617a: f853 3009 ldr.w r3, [r3, r9]
   7617e: 7d1a ldrb r2, [r3, #20]

   void* dvmHeapSourceAlloc(size_t n)
   {
       ...
       if (heap->bytesAllocated + n > hs->softLimit) {
           /*
            * This allocation would push us over the soft limit; act as
            * if the heap is full.
            */
           return NULL;
       }
       ...
   }
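A little Thumb arithmetic shows why rewriting the comparison as CMP R0, R0 works. The sketch below encodes the 16-bit Thumb "CMP (register)" instruction, reproducing 0x42a1 for cmp r1, r4 from the listing (stored little-endian as a1 42, so 0x42 is its high byte) and 0x4280 for cmp r0, r0. It then evaluates the BLS condition. It assumes, consistently with the C source, that r1 holds bytesAllocated + n and r4 holds softLimit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Thumb encoding T1 of "CMP Rn, Rm" for low registers r0-r7:
 * bits 0100 0010 10 | Rm (3 bits) | Rn (3 bits). */
uint16_t thumb_cmp_reg(unsigned rn, unsigned rm) {
    assert(rn < 8 && rm < 8);
    return (uint16_t)(0x4280u | (rm << 3) | rn);
}

/* Is "BLS" taken after "CMP a, b"? LS holds when C == 0 or Z == 1; CMP sets
 * C when a >= b (unsigned, no borrow) and Z when a == b. */
int bls_taken_after_cmp(uint32_t a, uint32_t b) {
    int carry = a >= b;
    int zero = a == b;
    return !carry || zero;
}
```

After the patch both operands are the same register, so Z is always set and bls.n always skips the "return NULL" path, whatever the values of bytesAllocated and softLimit.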

2. Stop waking the GC thread

The purpose of not waking the GC thread is to avoid the thread jitter caused by waking it frequently. Below are the corresponding C++ code and ARM instruction fragments, also from the dvmHeapSourceAlloc function. In the implementation, we scan the .dynstr, .dynsym, .rel.plt and .plt sections of libdvm.so to find the address of pthread_cond_signal@plt, then traverse all branch instructions in dvmHeapSourceAlloc and calculate each one's destination address.

If a branch's destination address equals pthread_cond_signal@plt, that instruction is erased.
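Resolving pthread_cond_signal@plt from those sections can be sketched as follows: walk .rel.plt, map each relocation to its .dynsym entry and its name in .dynstr, and turn the relocation's index into a PLT address. The PLT layout is an assumption: the classic GNU ld ARM layout with a 20-byte PLT header and 12-byte entries, where the i-th .rel.plt relocation belongs to the i-th entry. Other linkers or options (long PLT entries, for example) differ, so a real implementation should verify the layout. The structs mirror only the fields of Elf32_Rel and Elf32_Sym from <elf.h>, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Rel and Elf32_Sym in <elf.h>. */
typedef struct { uint32_t r_offset; uint32_t r_info; } Rel32;
typedef struct {
    uint32_t st_name; uint32_t st_value; uint32_t st_size;
    unsigned char st_info, st_other; uint16_t st_shndx;
} Sym32;

#define PLT_HEADER_SIZE 20u  /* assumed: classic ARM PLT0 (5 words) */
#define PLT_ENTRY_SIZE  12u  /* assumed: classic 3-instruction ARM PLT entry */

/* Returns the PLT stub address for `name`, or 0 if it has no .rel.plt entry. */
uint32_t plt_stub_for(const char *name, uint32_t plt_addr,
                      const Rel32 *relplt, size_t nrel,
                      const Sym32 *dynsym, size_t nsym, const char *dynstr) {
    assert(name && relplt && dynsym && dynstr);
    for (size_t i = 0; i < nrel; i++) {
        uint32_t sym = relplt[i].r_info >> 8;        /* ELF32_R_SYM(r_info) */
        if (sym >= nsym) continue;                   /* malformed entry */
        if (strcmp(dynstr + dynsym[sym].st_name, name) == 0)
            return plt_addr + PLT_HEADER_SIZE + (uint32_t)i * PLT_ENTRY_SIZE;
    }
    return 0;
}
```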

   if (heap->bytesAllocated > heap->concurrentStartBytes) {
       /*
        * We have exceeded the allocation threshold. Wake up the
        * garbage collector.
        */
       dvmSignalCond(&gHs->gcThreadCond);
   }

   7621c: 6800 ldr r0, [r0, #0]
   7621e: 30b4 adds r0, #180 ; 0xb4
   76220: f7a9 ed0e blx 1fc40
   76224: 4620 mov r0, r4
   76226: e8bd 83f8 ldmia.w sp!, {r3, r4, r5, r6, r7, r8, r9, pc}
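The branch traversal and erasure can be sketched for Thumb-2 code. The sketch decodes 32-bit BL and BLX (immediate) instructions, computes their destinations, and overwrites calls to a given address with two 16-bit Thumb-2 NOPs (0xbf00). Given the bytes from the listing above, it decodes the blx at 0x76220 to 0x1fc40. It assumes the walk starts on an instruction boundary and that the function contains no inline literal pools, which a plain linear walk would misread; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian Thumb halfword. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* A Thumb instruction is 32 bits wide when its first halfword begins with
 * 0b11101, 0b11110 or 0b11111. */
static int is_thumb32(uint16_t hw1) { return (hw1 >> 11) >= 0x1d; }

/* Decodes BL (encoding T1) or BLX immediate (T2) at `addr`. Returns 1 and
 * stores the destination in *target, or returns 0 for anything else. */
int decode_bl_blx(uint32_t addr, uint16_t hw1, uint16_t hw2, uint32_t *target) {
    uint32_t s, i1, i2, imm, pc = addr + 4;          /* Thumb PC reads 4 ahead */
    int32_t off;

    assert(target != NULL);
    if ((hw1 & 0xf800u) != 0xf000u || (hw2 & 0xc000u) != 0xc000u) return 0;
    s  = (hw1 >> 10) & 1u;
    i1 = !(((hw2 >> 13) & 1u) ^ s);                  /* I1 = NOT(J1 XOR S) */
    i2 = !(((hw2 >> 11) & 1u) ^ s);                  /* I2 = NOT(J2 XOR S) */
    imm = (s << 24) | (i1 << 23) | (i2 << 22)
        | ((uint32_t)(hw1 & 0x3ffu) << 12) | ((uint32_t)(hw2 & 0x7ffu) << 1);
    off = (int32_t)(imm ^ 0x1000000u) - 0x1000000;   /* sign-extend 25 bits */
    if (hw2 & 0x1000u) { *target = pc + (uint32_t)off; return 1; }  /* BL */
    if (hw2 & 1u) return 0;                          /* BLX requires H == 0 */
    *target = (pc & ~3u) + (uint32_t)off;            /* BLX: word-aligned ARM target */
    return 1;
}

/* Replaces every BL/BLX to `callee` in `code` (loaded at `base`) with two
 * 16-bit NOPs. Returns the number of calls erased. */
int erase_calls_to(unsigned char *code, size_t len, uint32_t base, uint32_t callee) {
    static const unsigned char nops[4] = {0x00, 0xbf, 0x00, 0xbf};
    int erased = 0;
    size_t i = 0;

    while (i + 2 <= len) {
        uint16_t hw1 = rd16(code + i);
        uint32_t t;
        if (!is_thumb32(hw1)) { i += 2; continue; }
        if (i + 4 > len) break;
        if (decode_bl_blx(base + (uint32_t)i, hw1, rd16(code + i + 2), &t) && t == callee) {
            memcpy(code + i, nops, sizeof nops);
            erased++;
        }
        i += 4;
    }
    return erased;
}
```

Erasing the call is safe at this spot because the next instruction, mov r0, r4, overwrites r0, so nothing after the erased call depends on its result.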

3. Cancel the GC routine function

Canceling the GC routine function is implemented with hooking. We wrap GC suppression in two native interfaces, doStartSuppressGC and doStopSuppressGC, which are further wrapped as JNI interfaces that developers can call from Java. Typical usage: when a developer sees in the logs that Alipay triggers a large number of GCs in some scenario and that these GCs hurt the user experience (slow responses or janky animations), the developer inserts doStartSuppressGC before the scenario and doStopSuppressGC after it.
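A minimal sketch of the two native entry points, assuming suppression is modeled as a counter that the patched paths consult: nested or overlapping scenes keep suppression on until every doStartSuppressGC has been matched by a doStopSuppressGC. The counter semantics and isGCSuppressed are assumptions, not the original design, and the JNI wrappers are omitted because the Java class that declares them is not given.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical nesting depth: > 0 while at least one scene suppresses GC. */
static atomic_int g_suppress_depth;

void doStartSuppressGC(void) {
    atomic_fetch_add(&g_suppress_depth, 1);
}

void doStopSuppressGC(void) {
    int cur = atomic_load(&g_suppress_depth);
    /* Decrement only while positive, so an unbalanced stop cannot go negative. */
    while (cur > 0 && !atomic_compare_exchange_weak(&g_suppress_depth, &cur, cur - 1)) {
        /* a failed exchange reloaded `cur`; retry */
    }
}

/* Hypothetical query that the patched GC paths would consult. */
int isGCSuppressed(void) {
    int depth = atomic_load(&g_suppress_depth);
    assert(depth >= 0);
    return depth > 0;
}
```

Using a counter rather than a boolean means one scene's doStopSuppressGC cannot switch GC back on while another scene that called doStartSuppressGC is still running.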

Taking Alipay's cold start as an example, we insert doStartSuppressGC in the attachBaseContext function of the Quinox container and doStopSuppressGC when the home page finishes loading.

4. Stop GC suppression on OOM

If we only cared about suppressing GC during Alipay startup, stopping GC suppression on OOM would be unnecessary, because Alipay's startup does not allocate enough to trigger an OOM. However, we want GC suppression to be a building block that can be applied to more scenarios. If the program is about to hit an OOM before doStopSuppressGC is called, GC suppression must be stopped before the OOM occurs. Instead of simply changing the direction of a branch, we need to inject a new branch before the OOM happens, whose code we implement ourselves. The new branch calls doStopSuppressGC, removes itself, and finally jumps back into Dalvik, which continues with its OOM handling.

This is also implemented with conventional hooking, in a hook function for dvmCollectGarbageInternal:

  • When the condition is not met, return immediately, which cancels the GC;
  • When the condition is met, remove the hook and execute the original dvmCollectGarbageInternal.
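The hook logic above can be sketched in C. Here the hooked call is modeled as a function-pointer slot rather than the inline code patch that a binary instrumentation framework installs, and the "condition" is assumed to be: suppression has been stopped, or Dalvik is running its last collection before throwing OOM (the GC_BEFORE_OOM spec in the Android 4.x sources). GcSpecSketch, collect_garbage_stub, and the other names are hypothetical stand-ins so the sketch runs off-device; thread safety is ignored for clarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Dalvik's GcSpec. */
typedef struct {
    const char *reason;   /* e.g. "GC_FOR_ALLOC" */
    bool before_oom;      /* true for the last-chance GC before OOM */
} GcSpecSketch;

typedef void (*CollectFn)(const GcSpecSketch *spec);

static CollectFn *g_slot;        /* call slot currently redirected to the hook */
static CollectFn g_original;     /* original collector saved at install time */
static bool g_suppressing;

/* Stand-in for the real dvmCollectGarbageInternal; counts how often it runs. */
int g_real_gc_runs;
void collect_garbage_stub(const GcSpecSketch *spec) { (void)spec; g_real_gc_runs++; }

/* Stops suppression and removes the hook by restoring the original collector. */
void stop_suppression(void) {
    g_suppressing = false;
    if (g_slot != NULL) {
        *g_slot = g_original;
        g_slot = NULL;
    }
}

/* Hook that runs in place of the collector. */
void hooked_collect(const GcSpecSketch *spec) {
    CollectFn original = g_original;
    if (g_suppressing && !spec->before_oom)
        return;                  /* condition not met: skip this GC */
    stop_suppression();          /* condition met: unhook ... */
    original(spec);              /* ... and run the original collector */
}

void install_hook(CollectFn *slot) {
    assert(slot != NULL && *slot != NULL);
    g_slot = slot;
    g_original = *slot;
    *slot = hooked_collect;
    g_suppressing = true;
}
```

In this model stop_suppression plays the role of doStopSuppressGC. Because the hook removes itself before calling the original collector, the last-chance collection and every later GC run unmodified.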

The implementation uses an open-source binary injection framework: github.com/crmulliner/…

Note that the pre_hook and post_hook mechanisms provided by the framework have very high performance overhead when used on hot functions.

The design in this article uses pre_hook only once, so it has no performance problem. Readers may ask whether "instruction fingerprinting" is reliable. My answer is that a missed match does not affect correctness (the patch is simply not applied), and a false match, where the instruction fingerprint locates the wrong code, is possible in theory but has a very small probability. Even if a false match occurs, there is a final layer of protection: the disaster-recovery mechanism implemented by colleagues on the infrastructure team. If the program fails to start normally because of a false match, Alipay is restarted and GC suppression is abandoned on subsequent launches.

The effect

The startup time data in the figure above was collected on internal Android 4.x test devices (builds not marked "release" are debug builds). According to the chart, the startup time of the Alipay client was shortened by 15 to 30 percent.

Summary

This installment gave a first look at how Alipay optimizes its Android client's startup performance through garbage collection, and at the concrete practice behind it. Due to space limitations, we cannot elaborate on every technical point. The corresponding core technology has also been applied in mPaaS and made available externally; you are welcome to try it:

Tech.antfin.com/docs/2/4954…

We look forward to your feedback on these design ideas and practices for Android startup performance optimization, and welcome further discussion.
