
0. Preface

I wrote an article about startup optimization a little over a year ago; see "Performance Optimization Series: APP Startup Optimization Theory and Practice (Part 1)". There are new insights every year, so this article supplements the startup optimization schemes covered in that one. Please check it out.

The main contents of this article are as follows:

  • Startup time monitoring in practice: a comparison of manual instrumentation and AspectJ;
  • Startup optimization: directed acyclic graph initiator, IdleHandler initiator, and other "black technology" solutions;
  • A description of the optimization tools.

I. Optimization tools

1.1 TraceView (deprecated)

TraceView is a performance analysis tool for the Android platform that displays trace logs graphically, but it has been deprecated. In addition, TraceView's runtime overhead is too high, so the results it reports are not realistic.

1.2 CPU Profiler

Its replacement is the CPU Profiler. It can inspect .trace files captured by instrumenting the app with the Debug class, record new method traces, export .trace files, and show the real-time CPU usage of the application process. For details, see "Inspect CPU activity with CPU Profiler".
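For reference, a minimal sketch of capturing a method trace with the Debug class; the resulting "App.trace" file can then be opened and inspected in CPU Profiler. The commented-out Debug calls in the onCreate example later in this article do the same thing.

import android.app.Application
import android.os.Debug

class App : Application() {
    override fun onCreate() {
        super.onCreate()
        // Start recording a method trace; the file name "App" is arbitrary
        Debug.startMethodTracing("App")
        // ... the initialization code to be measured ...
        Debug.stopMethodTracing()
    }
}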

1.3 Systrace + function instrumentation

Systrace lets you collect and inspect timing information for all processes running on the device. It includes some Android kernel data (such as the CPU scheduler, I/O, and app threads) and generates an HTML report that lets you view and analyze the trace. However, it does not time your application code on its own; if you need to analyze the execution time of specific program code, you have to combine Systrace with function instrumentation. Section II below gives a practical example.
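For example, a suspect code path can be wrapped in a named section with androidx TraceCompat so that it shows up as a labeled slice in the Systrace report; a minimal sketch (the function name is illustrative):

import androidx.core.os.TraceCompat

// Wrap the code whose duration you want to see in the Systrace report in a named section.
fun initRouterTraced() {
    TraceCompat.beginSection("initRouter")
    try {
        // ... the code path being measured ...
    } finally {
        TraceCompat.endSection()
    }
}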

II. Startup time monitoring

There are many ways to measure startup speed, such as manual instrumentation, AOP instrumentation, adb commands, TraceView and Systrace. These were introduced in "Performance Optimization Series: APP Startup Optimization Theory and Practice (Part 1)" and will not be repeated here. The following approaches startup time monitoring from a practical angle.

To have something to monitor, I initialize a few third-party frameworks in Application#onCreate, such as ARouter, Bugly and LoadSir, to simulate time-consuming operations.

2.1 How to monitor the execution time of each method?

2.1.1 Method 1: Manual instrumentation

We know that manual instrumentation can measure app startup time, but does it scale to every method? Let's try adding instrumentation before and after every initialization method of the third-party frameworks.

override fun onCreate() {
    super.onCreate()
    //Debug.startMethodTracing("App")
    //TraceCompat.beginSection("onCreate")
    TimeMonitorManager.instance?.startMonitor()
    initRouter()
    TimeMonitorManager.instance?.endMonitor("initRouter")

    TimeMonitorManager.instance?.startMonitor()
    initBugly()
    TimeMonitorManager.instance?.endMonitor("initBugly")

    TimeMonitorManager.instance?.startMonitor()
    initLoadSir()
    TimeMonitorManager.instance?.endMonitor("initLoadSir")

    //Debug.stopMethodTracing()
    //TraceCompat.endSection()
}
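TimeMonitorManager itself is not listed in this article; a minimal sketch of what such a helper might look like, assuming it is a simple singleton that records a start timestamp and logs the elapsed time (the API is inferred from the calls above and the log output later on):

import android.util.Log

class TimeMonitorManager private constructor() {
    private var startTime: Long = 0

    // Record the start timestamp
    fun startMonitor() {
        startTime = System.currentTimeMillis()
    }

    // Log the elapsed time under the given tag
    fun endMonitor(tag: String) {
        Log.d("TimeMonitorManager", "$tag: ${System.currentTimeMillis() - startTime}")
    }

    companion object {
        val instance: TimeMonitorManager? by lazy { TimeMonitorManager() }
    }
}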

In this way, the time each method takes can indeed be calculated, but you have to add the same two lines of code around every method. How many methods does a real project have? Adding them one by one by hand is simply not realistic.

This approach is too clumsy, and it is extremely invasive to the source code, so we abandon it.

Is there a more elegant way to calculate the execution time of each method? The answer is yes.

AOP (Aspect Oriented Programming) is a technique that uses precompilation and runtime dynamic proxies to add functionality to a program uniformly and dynamically, without modifying the source code.

Its main purpose is to separate concerns such as logging, performance statistics, security control, transaction processing and exception handling from the business logic code, so that these behaviors can be changed independently without affecting the business logic.

The manual approach above is highly coupled to the business logic, and AOP solves exactly this problem. There are many ways to implement AOP on Android; here we look at one of the more common implementations: AspectJ.

2.1.2 Method 2: AOP with AspectJ

AspectJ is one of the concrete implementations of AOP and deals with crosscutting concerns. It brings the concept of join points to Java and adds a few new constructs to the language, such as pointcuts, advice, inter-type declarations and aspects. Pointcuts and advice affect the program flow dynamically, inter-type declarations statically affect the class hierarchy of the program, and aspects encapsulate all of these new constructs.

Let's use AspectJ to measure the method execution time.

Add the dependencies

build.gradle

dependencies {
    classpath 'com.hujiang.aspectjx:gradle-android-plugin-aspectjx:2.0.10'
}

app#build.gradle

plugins {
    id 'com.android.application'
    id 'kotlin-android'
    id 'android-aspectjx'
}

...

dependencies {
    implementation 'org.aspectj:aspectjrt:1.8.+'
}

Create a new class and add the @Aspect annotation to it, indicating that the current class is an aspect to be picked up by the framework.

@Aspect
class PerformanceAOP {
}

The next step is to write the logic for our requirement: calculating the execution time of each method. We use @Around together with a ProceedingJoinPoint to handle the matched methods uniformly.

@Around("call(* com.fuusy.fuperformance.App.**(..) )")
fun getMethodTime(joinPoint: ProceedingJoinPoint) {
    val signature = joinPoint.signature
    val time: Long = System.currentTimeMillis()
    joinPoint.proceed()
    Log.d(TAG, "${signature.toShortString()} speed time = ${System.currentTimeMillis() - time}")}Copy the code

Run it to see the effect:

21:05:44.504 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initRouter() speed time = 2009
21:05:45.104 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 599
21:05:45.112 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initLoadSir() speed time = 8

III. Startup optimization techniques

To optimize the startup speed of an app, the application layer can really only intervene in the business logic inside its Application and Activity classes. For example, initializing third-party frameworks in Application#onCreate is often time-consuming. How can that be optimized?

There are two main directions for startup optimization: asynchronous execution and delayed execution.

3.1 Asynchronous execution

3.1.1 Starting child threads

When it comes to handling logic asynchronously, is your first reaction to start a child thread? Let's put it to the test. Again simulating time-consuming operations in Application, this time I create a thread pool and perform the initialization of the three frameworks inside it.

override fun onCreate() {
    super.onCreate()
    TimeMonitorManager.instance?.startMonitor()
    // Create a thread pool
    val newFixedThreadPool = Executors.newFixedThreadPool(CORE_POOL_SIZE)
    newFixedThreadPool.submit { initRouter() }
    newFixedThreadPool.submit { initBugly() }
    newFixedThreadPool.submit { initLoadSir() }
    TimeMonitorManager.instance?.endMonitor("APP onCreate")
}

Look at the execution time:

// Total time
com.fuusy.fuperformance D/TimeMonitorManager: APP onCreate: 45
// Single method execution time
com.fuusy.fuperformance D/PerformanceAOP: App.initLoadSir() speed time = 8
com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 678
com.fuusy.fuperformance D/PerformanceAOP: App.initRouter() speed time = 1768

The single-method times are 8 ms for initLoadSir, 678 ms for initBugly and 1768 ms for initRouter, yet with the thread pool the total time of onCreate is only 45 ms. Going from roughly 2400 ms to 45 ms is more than a 90% improvement, so the effect is undoubtedly significant.

However, real project business is more complex, and the thread pool solution does not cover every case: some third-party frameworks can only be initialized on the main thread, and some must finish initializing inside onCreate before the next step can proceed. What do we do in these cases?

If a method can only run on the main thread, the child-thread approach has to be abandoned for it.

If a method must be completed within a specific phase, you can use CountDownLatch, a synchronization aid.

CountDownLatch is a versatile synchronization aid. A CountDownLatch initialized with a count of 1 serves as a simple on/off latch or gate: all threads that call await wait at the gate until some thread calls countDown. A CountDownLatch initialized to N can be used to make one thread wait until N threads have completed some action, or until some action has been completed N times. A useful property of CountDownLatch is that the thread calling countDown does not have to wait for the count to reach zero before continuing; await simply blocks every calling thread until the count reaches zero.

Put simply, in our case CountDownLatch is a utility that lets the main thread wait for a child-thread task to complete before moving on to the next step. Let's see how to use it.

Create a CountDownLatch with a count of 1 and make onCreate wait for the simulated initBugly method.

class App : Application() {
    // Create a CountDownLatch
    private val countDownLatch: CountDownLatch = CountDownLatch(1)

    override fun onCreate() {
        ...
        newFixedThreadPool.submit {
            initBugly()
            // Execute countDown
            countDownLatch.countDown()
        }
        // await
        countDownLatch.await()
        TimeMonitorManager.instance?.endMonitor("APP onCreate")
    }
}

Restart the app:

com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 642
com.fuusy.fuperformance D/TimeMonitorManager: APP onCreate: 667

As you can see, the total time now includes waiting for initBugly to finish, so onCreate takes correspondingly longer.

From the above, simply starting a thread pool only works in the general case and shows its weaknesses once the logic gets complex. For example, what happens when there is a dependency between two tasks? I also noticed that every method requires submitting its own Runnable, which is itself a waste of resources.

Is there a more elegant way to execute code asynchronously while also resolving dependencies between tasks? Of course: a more elegant asynchronous approach is the directed acyclic graph initiator.

3.1.2 Directed acyclic graph initiator

In real projects, some tasks must be executed in a specific order. For example, when initializing the WeChat Pay SDK, you first need to fetch the corresponding App key from the backend, and then initialize the payment SDK with that key.

The problem of task execution order can be solved very well by one particular data structure: the directed acyclic graph. Let's start with a description of it.

3.1.2.1 Directed Acyclic Graph (DAG)

Directed acyclic graph: if a directed graph contains no cycles, it is called a directed acyclic graph, or DAG for short.

The figure above is a directed acyclic graph; no set of edges forms a cycle. If the graph also contained an edge B -> A, there would be a cycle and it would no longer be a directed acyclic graph.

So what does startup optimization have to do with this?

As mentioned above, the DAG is meant to address dependencies between tasks. Solving this actually involves another piece of knowledge: the AOV network (Activity On Vertex network).

3.1.2.2 AOV Network (Activity On Vertex Network)

An AOV network is one of the typical applications of a DAG: the DAG represents a project, vertices represent activities, and a directed edge <Vi, Vj> indicates that activity Vi must be performed before activity Vj. In the directed acyclic graph above, B must be executed after A, D must be executed before E, and so on; the vertices have a precedence relationship.

This matches the dependencies between startup tasks exactly: as long as the startup tasks are executed as an AOV network, the dependency problem is solved.

In an AOV network, topological sorting is used to find the order in which tasks are executed.

3.1.2.3 Topological sorting

A topological sort is an ordering of the vertices of a directed acyclic graph such that if there is a path from vertex A to vertex B, then B appears after A in the ordering. Every AOV network has one or more topological orderings. The steps for a topological sort are also very simple, as follows:

Topological sort implementation:

  1. Select a vertex with no predecessor (in-degree 0) from the AOV network and output it;
  2. Remove that vertex, together with all directed edges starting from it, from the network;
  3. Repeat steps 1 and 2 until the AOV network is empty, or until the remaining network contains no vertex without a predecessor (which means there is a cycle).

Take a real-life example: making tea.

The figure above is the directed acyclic graph for making tea. Following the topological sort steps above:

  1. Find the vertices with in-degree 0. Here only "prepare the tea set" and "buy tea" have in-degree 0; choose either one, say "prepare the tea set";
  2. Remove the vertex "prepare the tea set" together with the edges starting from it, which gives the following;
  3. Now only the "buy tea" vertex has in-degree 0, so select it and repeat steps 1 and 2.

Repeat this until all vertices have been output; the execution order is as follows:

Of course, a topological sort can produce more than one valid result. For example, if we had chosen "buy tea" as the initial vertex with in-degree 0, the result would be different; we will not discuss this further here.

Directed acyclic graphs, AOV networks and topological sorting have now been explained; the next step is to combine them with the startup tasks. In essence, the tasks are simply executed in the order produced by the topological sort.

/**
 * Topological sort
 */
fun topologicalSort(): Vector<Int> {
    val indegree = IntArray(mVerticeCount)
    for (i in 0 until mVerticeCount) { // Initialize the in-degree of every vertex
        val temp = mAdj[i] as ArrayList<Int>
        for (node in temp) {
            indegree[node]++
        }
    }
    val queue: Queue<Int> = LinkedList()
    for (i in 0 until mVerticeCount) { // Find all vertices with in-degree 0
        if (indegree[i] == 0) {
            queue.add(i)
        }
    }
    var cnt = 0
    val topOrder = Vector<Int>()
    while (!queue.isEmpty()) {
        val u = queue.poll()
        topOrder.add(u)
        for (node in mAdj[u]) { // Traverse all vertices adjacent to this (in-degree 0) vertex
            if (--indegree[node] == 0) { // If its in-degree drops to 0, add it to the queue
                queue.add(node)
            }
        }
        cnt++
    }
    check(cnt == mVerticeCount) { // Check for a cycle: the number of output vertices should equal the vertex count; if not, there is a cycle
        "Exists a cycle in the graph"
    }
    return topOrder
}

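The fields mVerticeCount and mAdj above belong to the initiator's graph class, which is not listed here. A simplified sketch of such a class (field names follow the code above; addEdge and everything else are assumptions), with topologicalSort() above as one of its members:

class Graph(private val mVerticeCount: Int) {
    // Adjacency lists: mAdj[u] holds all vertices that depend on u
    private val mAdj: Array<ArrayList<Int>> = Array(mVerticeCount) { ArrayList<Int>() }

    // Add a directed edge u -> v, meaning task u must run before task v
    fun addEdge(u: Int, v: Int) {
        mAdj[u].add(v)
    }
}

In this sketch each startup task is mapped to a vertex index, and an edge added with addEdge(u, v) encodes "task u before task v".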

For the complete initiator and startup-task implementation, see FuPerformance on GitHub.

After the initiator is implemented, the basic usage in Application or Activity is as follows:

  1. Separate each task into its own class that inherits the Task abstract class and runs in a child thread, for example initializing ARouter;
class RouterTask : Task() {
    override fun run() {
        if (BuildConfig.DEBUG) {
            ARouter.openLog()
            ARouter.openDebug()
        }
        ARouter.init(mContext as Application?)
    }
}

If a task must execute on the main thread, inherit MainTask instead. If subsequent work must wait until the task completes, override needWait and return true.

override fun needWait(): Boolean {
    return true
}
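The Task and MainTask base classes live in the FuPerformance project and are not listed in this article. A rough sketch of the shape implied by the usage above; dependsOn, needWait, run and mContext come from the snippets in this article, while runOnMainThread and the defaults are assumptions:

import android.content.Context

abstract class Task {
    // Injected by the dispatcher; RouterTask above casts it to Application
    protected var mContext: Context? = null

    // Tasks this task depends on; they must finish before run() is called
    open fun dependsOn(): List<Class<out Task?>?>? = null

    // Whether the dispatcher should wait for this task before letting onCreate continue
    open fun needWait(): Boolean = false

    // Whether this task must run on the main thread (a MainTask would return true)
    open fun runOnMainThread(): Boolean = false

    abstract fun run()
}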
  2. If there is a dependency between tasks, override the dependsOn method; for example, WeChat Pay initialization depends on obtaining the AppId.
class WeChatPayTask : Task() {

    /**
     * WeChat Pay depends on the AppId
     */
    override fun dependsOn(): List<Class<out Task?>?>? {
        val task = mutableListOf<Class<out Task?>>()
        // Add the task that loads the AppId
        task.add(LoadAppIdTask::class.java)
        return task
    }

    override fun run() {
        // Initialize WeChat Pay
    }
}
  3. After the individual tasks are defined, add them to the task queue in Application#onCreate.
// Method 2. Initiator
TaskDispatcher.init(this)

TaskDispatcher.newInstance()
    .addTask(RouterTask())
    .addTask(LoadSirTask())
    .addTask(BuglyTask())
    .addTask(LoadAppIdTask())
    .addTask(WeChatPayTask())

That is the implementation and usage of the directed acyclic graph initiator. As you can see, it keeps the code elegant and solves the pain points mentioned earlier:

  • Dependencies between tasks running on child threads;
  • Waiting for a task on a child thread to finish before continuing;
  • Specifying that a task must run on the main thread;
  • High code coupling and wasted resources.

3.2 Delayed execution

The second optimization direction is delayed execution. There are many ways to delay an operation:

  • Thread sleep
object : Thread() {
    override fun run() {
        super.run()
        sleep(3000) // Sleep for 3 seconds
        /** * The operation to be performed */
    }
}.start()
  • Handler#postDelayed
handler.postDelayed(
    Runnable {
        /** * The operation to be performed */
    }, 3000
)
  • TimerTask implementation
val task: TimerTask = object : TimerTask() {
    override fun run() {
        /** * The operation to be performed */
    }
}
val timer = Timer()
timer.schedule(task, 3000) // Execute the run method of TimerTask after 3 seconds

All three can delay execution, but when applied to startup tasks they share a common pain point: it is hard to determine the right delay duration.

So how do you solve this pain point?

You can take advantage of the IdleHandler mechanism that comes with the Handler message queue.

3.2.1 IdleHandler

During startup, some tasks do not have to be executed immediately after the app starts; for these we need to find the right moment to run them. How do we find that moment? Android actually provides a great mechanism for this: the Handler mechanism exposes the time when the message queue is idle, through IdleHandler.

IdleHandler executes when the current thread's message queue is idle. You might ask: what if the message queue is never idle and the IdleHandler never runs? Exactly because the execution time of IdleHandler is not controllable, it needs to be used in combination with the project's actual business.
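As a reference, the raw usage without any wrapper looks roughly like this (the function name is just illustrative):

import android.os.Looper

// A minimal sketch: run a non-urgent initialization task the next time the main thread's
// message queue becomes idle.
fun scheduleWhenIdle() {
    Looper.myQueue().addIdleHandler {
        // ... a non-urgent initialization task ...
        false // false removes this IdleHandler after one run; true keeps it registered
    }
}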

Based on these characteristics, we can implement an IdleHandler-based delayed initiator as follows:

class DelayDispatcher {
    private val mDelayTasks: Queue<Task> = LinkedList<Task>()

    private val mIdleHandler = IdleHandler {
        if (mDelayTasks.size > 0) {
            val task: Task = mDelayTasks.poll()
            DispatchRunnable(task).run()
        }
        // Keep this IdleHandler registered while there are still pending tasks
        !mDelayTasks.isEmpty()
    }

    /**
     * Add a delayed task
     */
    fun addTask(task: Task): DelayDispatcher? {
        mDelayTasks.add(task)
        return this
    }

    fun start() {
        Looper.myQueue().addIdleHandler(mIdleHandler)
    }
}
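DispatchRunnable above comes from the FuPerformance initiator and is not listed here; as a stand-in, assume it simply runs the task and records how long it took:

import android.util.Log

class DispatchRunnable(private val task: Task) : Runnable {
    override fun run() {
        val start = System.currentTimeMillis()
        task.run()
        Log.d("DispatchRunnable", "${task.javaClass.simpleName} cost ${System.currentTimeMillis() - start} ms")
    }
}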

Usage:

DelayDispatcher().addTask(Task())?.start()

3.3 Other schemes

  • Load SharedPreferences in advance (a sketch follows this list);
  • Do not start child processes during the startup phase;
  • Class loading optimization;
  • I/O optimization.
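For the first item, a minimal sketch of preloading SharedPreferences on a background thread; the file name "app_config" is made up for illustration:

import android.app.Application
import android.content.Context

class App : Application() {
    override fun onCreate() {
        super.onCreate()
        // Touch the SharedPreferences file on a background thread so the XML is parsed into the
        // in-memory cache before the main thread needs it
        Thread {
            getSharedPreferences("app_config", Context.MODE_PRIVATE)
        }.start()
    }
}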

Regarding I/O optimization, Zhang Shaowen mentioned in his development master class:

When the load is high, I/O performance degrades rapidly, especially on low-end devices, where the same I/O operation may take dozens of times longer than on high-end devices. Network I/O is not recommended during startup, and disk I/O optimization requires knowing which files are read during startup, how many bytes, how big the buffer is, how long the reads take, on which thread, and so on.

  • Class rearrangement

The class loading order of the startup process can be obtained by overriding the ClassLoader:

class GetClassLoader extends PathClassLoader {
    public Class<?> findClass(String name) {
        // Record the class name to a file
        writeToFile(name, "coldstart_classes.txt");
        return super.findClass(name);
    }
}

Then use Facebook's open-source dex optimization tool, ReDex, to rearrange the classes inside the dex files.

ReDex is an Android bytecode (dex) optimizer originally developed by Facebook. It provides a framework for reading, writing and analyzing .dex files, plus a set of optimizations built on that framework to improve the bytecode.

  • Rearrange resource files

For the principle of resource file rearrangement and a production-level scheme, refer to "Alipay App construction optimization analysis: optimizing Android startup performance through installation package reordering".

3.4 "Black technology"

  • Suppress GC during startup

Alipay uses this approach; see "Alipay client architecture analysis: 'garbage collection' in Android client startup speed optimization".

  • CPU frequency locking

The higher the CPU frequency, the faster the computation, but also the higher the power consumption. To improve startup speed, the CPU frequency can be raised and locked during startup; the app starts faster, but the phone also consumes power faster.

IV. Summary

The above covers some business-related optimization methods as well as some business-independent "black technology", all of which can effectively improve the startup speed of an app. There are many startup optimization schemes; we still need to evaluate and implement them based on the actual project.

Finally, performance optimization is a long-term process. I will write a series on performance optimization theory and practice covering startup, memory, jank, app size and network optimization, so please stay tuned.

  • Startup optimization

Project address: fuusy/FuPerformance

References:

  • Alipay client architecture analysis: "garbage collection" in Android client startup speed optimization
  • Optimizing Android startup performance through installation package reordering
  • Hands-on Android performance analysis and optimization

Recommended Reading:

“Performance Optimization Series” APP Startup Optimization Theory and Practice (I)