Startup performance is the face of the app experience, and a long startup process can dampen users' interest in using the app. Douyin has verified through degradation experiments on startup performance that it has a significant impact on business metrics. Douyin has hundreds of millions of daily active users, and an increase of a few hundred milliseconds in startup time can cost the retention of tens of thousands of users. Optimizing startup performance has therefore become the top priority of Douyin's Android basic technology team in experience optimization.

The previous chapter, on the theory and tools of startup performance optimization, introduced Douyin's startup optimization from the perspective of principles, methodology and tooling. This article introduces Douyin's startup optimization schemes and ideas from a practical perspective, through concrete case studies.

Preface

Startup refers to the whole process from the user clicking the icon to seeing the first frame of the page. The goal of startup optimization is to reduce the time consumed by this process.

The startup process is fairly complex. In the process and thread dimension, it involves multiple cross-process communications and switches between multiple threads; in the time-cost dimension, it includes CPU time, CPU scheduling, IO, lock waits and other kinds of time. Complex as it is, the startup process can ultimately be abstracted into a linear procedure on the main thread, so optimizing startup performance comes down to shortening this linear main-thread procedure.

Next, organized as direct optimization of the main thread, indirect optimization of background threads, and global optimization, we will introduce some typical cases the team encountered in startup optimization practice, and also briefly introduce some excellent schemes from the industry.

Optimization case study

1. Direct optimization of the main thread

Optimizations of the main thread are described in life-cycle order.

1.1 MultiDex optimization

Let's start with the first phase, the attachBaseContext phase of the Application. Apart from Application Context assignment and a few similar things, this phase usually does not involve much business code and is not expected to be time-consuming. However, in actual testing we found that on some models, the first startup after installation was extremely slow. Preliminary investigation showed that most of the time was spent in MultiDex.install.

After detailed analysis, we determined that the problem was concentrated on 4.x devices and affected the first boot after initial installation and after each subsequent update.

The root cause is that the DEX instruction format was not designed perfectly: the total number of Java methods referenced in a single DEX file cannot exceed 65536, so when the method count exceeds 65536 the code is split into multiple dex files. In general, the Dalvik VM can only execute optimized Odex files. To speed up installation, the Dalvik VM optimizes only the application's first dex at install time; the non-first dex files are optimized at the first runtime call to MultiDex.install, and this optimization can be very time-consuming, which causes the slow first boot on 4.x devices.

Several conditions are all necessary for this problem to occur: the code is split into multiple dex files; only the first dex is optimized during installation; MultiDex.install is called during startup; and the Dalvik VM needs to load Odex.

Obviously we cannot break the first two conditions — for Douyin it is impractical to shrink to a single DEX, and we cannot change the system's installation process. The condition of calling MultiDex.install during startup is also hard to break: first, as the business grows it is difficult to host all startup code in one dex; second, even if we could, it would be hard to maintain.

Therefore, we chose to break the restriction that “Dalvik VM needs to load Odex”, that is, to bypass Dalvik’s restriction and directly load the unoptimized dex. The core of this solution is the native function Dalvik_dalvik_system_DexFile_openDexFile_bytearray, which supports loading unoptimized dex files. The specific optimization scheme is as follows:

  1. First, extract the bytecode of the original non-first DEX files from the APK;
  2. Call Dalvik_dalvik_system_DexFile_openDexFile_bytearray, passing in the DEX bytecode obtained from the APK one by one, to complete DEX loading and obtain legal DexFile objects;
  3. Add the DexFile objects to the DexPathList of the app's PathClassLoader (see the sketch after this list);
  4. Odex optimization of the non-first dex files is performed asynchronously after a delay.
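For illustration, here is a minimal sketch of step 3 under stated assumptions: steps 1 and 2 have already produced legal DexFile objects, and the reflective field and constructor layout matches Android 4.x (BaseDexClassLoader.pathList, DexPathList.dexElements). The real open-source implementation handles far more detail, such as version differences, verification and fallbacks.

import dalvik.system.DexFile;

import java.io.File;
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.List;

public final class DexElementsInjector {

    // Append DexFile-backed elements to the PathClassLoader's DexPathList (step 3).
    public static void appendDexFiles(ClassLoader pathClassLoader, List<DexFile> dexFiles)
            throws Exception {
        Field pathListField = Class.forName("dalvik.system.BaseDexClassLoader")
                .getDeclaredField("pathList");
        pathListField.setAccessible(true);
        Object pathList = pathListField.get(pathClassLoader);

        Field dexElementsField = pathList.getClass().getDeclaredField("dexElements");
        dexElementsField.setAccessible(true);
        Object[] oldElements = (Object[]) dexElementsField.get(pathList);

        // Android 4.x Element constructor: Element(File, boolean, File, DexFile).
        Class<?> elementClass = oldElements.getClass().getComponentType();
        Constructor<?> ctor = elementClass.getDeclaredConstructor(
                File.class, boolean.class, File.class, DexFile.class);
        ctor.setAccessible(true);

        Object[] newElements = (Object[]) Array.newInstance(
                elementClass, oldElements.length + dexFiles.size());
        System.arraycopy(oldElements, 0, newElements, 0, oldElements.length);
        for (int i = 0; i < dexFiles.size(); i++) {
            newElements[oldElements.length + i] =
                    ctor.newInstance(null, false, null, dexFiles.get(i));
        }
        dexElementsField.set(pathList, newElements);
    }
}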

For more details of the MultiDex optimization, refer to the earlier article on our public account; the scheme has been open-sourced — see the project on GitHub (github.com/bytedance/B…

1.2 ContentProvider optimization

Next comes the optimization related to ContentProvider. ContentProvider is one of the four Android components, and it is unique in terms of life cycle: Activities, Services and BroadcastReceivers are instantiated and have their life cycles executed only when they are invoked, whereas a ContentProvider is automatically instantiated during the startup phase and has its life cycle executed even if it is never called. During process initialization, after the Application's attachBaseContext method is called, the installContentProviders method is executed to install all ContentProviders of the current process.

This process instantiates all ContentProviders of the current process in a for loop, calling their attachInfo and onCreate life-cycle methods; finally, the ContentProviderHolders associated with these ContentProviders are published to the AMS process in one batch.

This characteristic of being initialized automatically during process startup, on top of being a component for cross-process communication, has led many modules to use ContentProvider for automatic initialization. The most typical example is the official Lifecycle component, whose initialization is performed with the help of a ContentProvider named ProcessLifecycleOwnerInitializer.

Lifecycle's initialization merely registers ActivityLifecycleCallbacks, and there is not much to optimize at the logic level. What is worth noting is that when there are many such initialization-only ContentProviders, the creation of the ContentProviders themselves and the execution of their life cycles can add up to a significant cost. To address this, we can aggregate multiple initialization ContentProviders into a single one using the App Startup library provided by Jetpack (sketched below).
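As a sketch of the aggregation idea (the initializer body is illustrative), with App Startup each piece of initialization becomes an Initializer, and the library runs all of them from its single InitializationProvider instead of one ContentProvider each; the initializer is declared as meta-data under InitializationProvider in the manifest.

import android.content.Context;

import androidx.startup.Initializer;

import java.util.Collections;
import java.util.List;

// A minimal sketch: initialization that previously required its own
// ContentProvider now runs from App Startup's single InitializationProvider.
public class LifecycleInitializer implements Initializer<Boolean> {

    @Override
    public Boolean create(Context context) {
        // The logic that used to live in a dedicated ContentProvider's onCreate,
        // e.g. LifecycleDispatcher.init(context); ProcessLifecycleOwner.init(context);
        return Boolean.TRUE;
    }

    @Override
    public List<Class<? extends Initializer<?>>> dependencies() {
        return Collections.emptyList(); // no upstream initializers required
    }
}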

Besides this kind of initialization ContentProvider (the source of ProcessLifecycleOwnerInitializer is shown below for reference), we also found some genuinely time-consuming ContentProviders during optimization. Here is a general introduction to our optimization ideas.

public class ProcessLifecycleOwnerInitializer extends ContentProvider {
    @Override
    public boolean onCreate() {
        LifecycleDispatcher.init(getContext());
        ProcessLifecycleOwner.init(getContext());
        return true;
    }
}

For our own ContentProviders, if initialization is time-consuming we can refactor them from automatic initialization to on-demand initialization. Some third-party or even official ContentProviders cannot be optimized directly through refactoring; below, we take the official FileProvider as an example to introduce our optimization ideas.

FileProvider usage

FileProvider is a component introduced in Android 7.0 for file access control. Before FileProvider was introduced, cross-process file operations such as taking a photo passed file URIs directly. With FileProvider, the whole flow becomes:

  1. First, inherit FileProvider to implement a custom FileProvider, register this provider in the manifest, and associate a file-path XML file through its FILE_PROVIDER_PATHS property;
  2. Convert the file path to a content URI with FileProvider's getUriForFile method, then call ContentProvider methods such as query and openFile;
  3. When the FileProvider is invoked, the file path is checked first to determine whether it falls within the paths defined in the XML from step 1; only if the check passes does the subsequent logic continue.

Time-consuming analysis

From the flow above, as long as we do not call FileProvider during startup there should be no FileProvider-related cost. In fact, however, the startup trace did show FileProvider-related time, specifically inside the attachInfo life-cycle method of FileProvider. Besides calling the familiar onCreate method, FileProvider's attachInfo also calls getPathStrategy, and that is where the time was concentrated.

In terms of implementation, getPathStrategy mainly parses the XML file associated with the FileProvider, and the parse result is assigned to the mStrategy variable. Further analysis shows that mStrategy is used for file path verification in FileProvider's query, getType, openFile and other interfaces. Since those interfaces are not called during startup, calling getPathStrategy in attachInfo is completely unnecessary; we can defer the getPathStrategy logic until query, getType, openFile and the like are actually called.

Optimization scheme

FileProvider is code in AndroidX; we cannot modify it directly, but it does participate in our compilation, so we can change its implementation by rewriting bytecode at compile time. The specific scheme is as follows:

  1. Instrument the FileProvider's attachInfo method: before calling the original implementation, set grantUriPermissions on the ProviderInfo to false; then call the original implementation and catch the resulting exception; after the call completes, set grantUriPermissions back to true. This uses the grantUriPermissions check to bypass the execution of getPathStrategy (see the sketch after this list). We do not use the exported check of ProviderInfo to bypass getPathStrategy, because the exported attribute is cached when super.attachInfo runs.
public void attachInfo(@NonNull Context context, @NonNull ProviderInfo info) {
    super.attachInfo(context, info);

    // Sanity check our security
    if (info.exported) {
        throw new SecurityException("Provider must not be exported");
    }
    if (!info.grantUriPermissions) {
        throw new SecurityException("Provider must grant uri permissions");
    }

    mStrategy = getPathStrategy(context, info.authority);
}
  2. Instrument FileProvider's query, getType, openFile and other methods: initialize getPathStrategy before the original method is called, and call the original implementation once initialization is done.
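A minimal sketch of what the rewritten attachInfo call site amounts to is shown below; in practice the change is injected by a compile-time bytecode plugin rather than written by hand, and the wrapper name here is ours.

// A minimal sketch of the injected wrapper around FileProvider.attachInfo.
public static void attachInfoBypassingPathStrategy(
        FileProvider provider, Context context, ProviderInfo info) {
    boolean original = info.grantUriPermissions;
    info.grantUriPermissions = false; // makes the original attachInfo throw before getPathStrategy
    try {
        // super.attachInfo(context, info) inside still runs and caches the ProviderInfo.
        provider.attachInfo(context, info);
    } catch (SecurityException expected) {
        // "Provider must grant uri permissions" — thrown and swallowed on purpose.
    } finally {
        info.grantUriPermissions = original; // restore so later permission grants work
    }
}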

Although a single FileProvider does not cost much, some large apps may have multiple FileProviders for module decoupling, in which case the benefit of this optimization is considerable. Similar to FileProvider, the WorkManager provided by Google also has an initialization ContentProvider, which can be optimized in a similar way.

1.3 Startup task refactoring and task scheduling

The third stage of startup is the Application's onCreate stage, which is the peak period of startup task execution. Optimization in this stage targets the various startup tasks and is strongly business-related; here is a brief introduction to our general approach.

The core idea of Douyin's startup task optimization is to maximize code value and resource utilization. Maximizing code value is mainly about deciding which tasks should run during startup, with the core goal of removing from the startup stage any task that should not be there. Maximizing resource utilization means, once the startup tasks are fixed, using system resources as fully as possible to reduce task execution time. For a single task we optimize its internal implementation and reduce its own resource consumption, freeing resources for other tasks; for multiple tasks we make full use of system resources through reasonable scheduling.

In terms of implementation, we mainly did two things: startup task refactoring and task scheduling.

Startup task refactoring

Due to the high complexity of the business and loose control of startup tasks early on, Douyin had more than 300 tasks in the startup stage. In this situation, scheduling the startup tasks can improve startup speed to a certain extent, but it is still difficult to raise it to a high level. So a very important part of startup optimization is reducing startup tasks.

We divide startup tasks into configuration tasks, preloading tasks and functional tasks. Configuration tasks mainly initialize various SDKs, and the corresponding SDK cannot work before its configuration task runs. Preloading tasks warm up subsequent functions to speed up their later execution. Functional tasks are function-related tasks that must run within the process startup life cycle. We refactored these three kinds of tasks differently:

  • Configuration tasks: Our ultimate goal is to remove configuration tasks from the startup phase, for two main reasons. First, some configuration tasks are genuinely time-consuming, and removing them improves startup speed. Second, an SDK cannot be used normally before its configuration task runs, which hurts functional availability, stability and scheduling during optimization. To eliminate configuration tasks, we atomized them: what previously required actively injecting context, callbacks and other parameters into the SDK was changed to on-demand acquisition via SPI (service discovery). For Douyin's own code, when context, callbacks or other parameters are needed, they are requested from the upper application layer through SPI. For third-party SDKs whose code we cannot modify, we wrapped them in an intermediate layer; all subsequent use of the SDK goes through this layer, and the SDK's configuration task is performed when the relevant interface of the intermediate layer is first called (see the sketch after this list). This way configuration tasks are removed from the startup phase and executed on demand.
  • Preloading tasks: We standardized preloading tasks to guarantee functional correctness even when they are degraded; at the same time we removed expired preloading tasks and redundant logic inside them to increase their value.
  • Functional tasks: We broke functional startup tasks into finer granularity, removed logic unnecessary in the startup stage, and added scheduling and degradation support to them for later use.
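As a sketch of the intermediate-layer idea for third-party SDKs (SomeSdk and AppContextHolder are hypothetical stand-ins), the configuration task runs lazily on first real use instead of in the startup phase:

// A minimal sketch: SomeSdk and AppContextHolder are hypothetical; the point is
// that SomeSdk.init() no longer runs as a startup task.
public final class SomeSdkFacade {

    private static volatile boolean initialized;

    private static void ensureInitialized() {
        if (!initialized) {
            synchronized (SomeSdkFacade.class) {
                if (!initialized) {
                    // The former startup-phase configuration task, now on demand.
                    SomeSdk.init(AppContextHolder.getContext());
                    initialized = true;
                }
            }
        }
    }

    public static String doWork(String input) {
        ensureInitialized(); // configuration happens on first use of the SDK
        return SomeSdk.doWork(input);
    }
}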

Task scheduling

Much has been written in the industry about task scheduling, so dependency analysis and basic task scheduling are not covered here; instead we mainly introduce some arguably novel points from Douyin's practice:

  • Scheduling based on landing page: Besides the home page, Douyin can start into different landing pages such as authorization login or a live stream opened from a push. The tasks to execute differ considerably across these landing pages. In the Application stage we can determine the target landing page by reflectively inspecting the startup message in the main thread's message queue, and then schedule tasks specifically for that landing page;
  • Scheduling based on device performance: Device performance data is scored and normalized on the backend, and the normalized result is delivered to the client, which schedules tasks according to the device's performance tier;
  • Scheduling based on feature activity: We collect statistics on each user's use of each feature, compute per-user activity data for each feature, and deliver it to the client, which schedules according to the feature's activity level;
  • Scheduling based on on-device intelligence: On the client, the user's subsequent behavior is predicted with on-device machine learning so that subsequent features can be warmed up;
  • Startup degradation: For some low-end devices and users, startup tasks and features are degraded, postponed until after startup, or even skipped entirely, to protect the overall experience.

1.4 Activity stage optimization

The stages above all belong to the Application phase. Next we look at optimizations in the Activity phase, where we introduce two typical cases: merging Splash and Main, and deserialization optimization.

1.4.1 Merge Splash and Main

First, the merger of SplashActivity and MainActivity. In earlier versions, Douyin's launcher activity was SplashActivity, which mainly carried splash-screen logic such as ads and activities. In general, the startup flow was:

  1. Enter SplashActivity and check whether there is a splash screen to display;
  2. If there is one, display it and jump to MainActivity when it finishes; if there is none, jump to MainActivity directly.

In this flow we start two Activities; merging them brings two benefits:

  1. One fewer Activity startup sequence;
  2. The time spent reading splash information can be used for concurrent tasks strongly associated with the Activity, such as asynchronous View preloading.

To realize the merger of Splash and Main, we need to solve two main problems:

  • How to keep external jumps that use the old Activity name working after the merge;
  • How to solve the launchMode and multi-instance problems.

The first problem is easy to solve: use activity-alias + targetActivity to point SplashActivity at MainActivity (a sketch follows). Now let's look at the second problem.
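A minimal sketch of the manifest change (names illustrative): external callers keep launching the old SplashActivity component name, which now resolves to the merged MainActivity.

<!-- A sketch: the alias preserves the externally visible component name. -->
<activity android:name=".MainActivity" />

<activity-alias
    android:name=".SplashActivity"
    android:targetActivity=".MainActivity">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity-alias>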

LaunchMode problem

Before the merger of Splash and Main, the launchModes of SplashActivity and MainActivity were standard and singleTask respectively. This guaranteed that MainActivity had only one instance, and that when the user left with the Home key and re-entered, they returned to the previous page.

After merging SplashActivity with MainActivity, the launcher activity becomes MainActivity. If we kept launchMode singleTask, then leaving from a secondary page via Home and tapping the icon again would not return to the secondary page but to the Main page. So after the merge MainActivity can no longer use singleTask; after investigation we finally chose singleTop as its launchMode.

Multi-instance problem

1. Multi-instance problems from internal launches

Using singleTop solves the problem of returning to the previous page after leaving via Home, but it introduces the problem of MainActivity having multiple instances. Some of Douyin's logic is strongly tied to MainActivity's life cycle, and multiple instances would disturb it; multiple MainActivity instances also mean unnecessary resource overhead, which is not what we expect. So we wanted to solve this.

Our solution is to add the FLAG_ACTIVITY_NEW_TASK and FLAG_ACTIVITY_CLEAR_TOP flags to every Intent inside the app that starts MainActivity, achieving a clear-top behavior similar to singleTask's.

FLAG_ACTIVITY_NEW_TASK + FLAG_ACTIVITY_CLEAR_TOP basically solves multiple instances caused by internal launches of MainActivity, but in testing we found that on some systems, even with clear-top in place, multiple instances still occurred.

The reason is that although SplashActivity points to MainActivity via activity-alias + targetActivity, on the AMS side the system still records that SplashActivity was started; when MainActivity is started later, AMS believes no MainActivity exists yet and starts another one.

Our fix is to change the Component of the Intent that starts MainActivity from MainActivity to SplashActivity, which completely solves the multi-instance problem caused by internal launches of MainActivity.

To minimize intrusion into the business, and to prevent future iterations from reintroducing the problem, we instrumented calls to Context.startActivity: for calls that start MainActivity, the original implementation is invoked after adding the flags to the Intent and replacing the Component information (a sketch follows). We chose instrumentation because Douyin's code structure is rather complex, with multiple Activity base classes, some of which cannot be modified directly. Projects without this constraint can simply override startActivity in the Activity base class and the Application.
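A minimal sketch of the injected call-site replacement (class and alias names are illustrative; in Douyin the redirection is applied by a bytecode plugin):

// A minimal sketch: every instrumented context.startActivity(intent) call is
// redirected here. The alias component name is illustrative.
public static void startActivityHooked(Context context, Intent intent) {
    ComponentName cn = intent.getComponent();
    if (cn != null && MainActivity.class.getName().equals(cn.getClassName())) {
        // Emulate singleTask's clear-top behavior...
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK | Intent.FLAG_ACTIVITY_CLEAR_TOP);
        // ...and target the alias so AMS matches the record it created at launch.
        intent.setComponent(new ComponentName(context.getPackageName(),
                context.getPackageName() + ".SplashActivity"));
    }
    context.startActivity(intent);
}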

2. Multi-instance problem from external launches

The above solution works by modifying the Intent before the Activity is started, so it obviously cannot solve multiple instances caused by MainActivity being started from outside the application. Is there another way to handle external launches?

Let's return to why we want to avoid multiple MainActivity instances in the first place: the point is to prevent multiple MainActivity objects from executing the MainActivity life cycle in unexpected ways. So the problem can be solved by guaranteeing that multiple MainActivity objects never exist at the same time.

To avoid simultaneous MainActivity objects, we first need to know whether a MainActivity object currently exists. This part is simple: we can listen to the Activity life cycle and increment or decrement a MainActivity instance count in onCreate and onDestroy respectively; when the count is 0, no MainActivity object exists.

With counting solved, we need to keep the number of live MainActivity objects at no more than 1 at all times. To do that, recall the Activity startup flow: starting an Activity first goes through AMS; AMS then calls back into the Activity's process, which posts to the main thread through the main thread's Handler, creates the Activity object through Instrumentation, and executes the subsequent life cycle. For external launches of MainActivity, what we control is the part after AMS returns to our process, and there we chose Instrumentation's newActivity as the hook point.

Specifically, our optimization plan is as follows:

  1. Derive a custom Instrumentation class from Instrumentation, overriding all of its methods to forward to a delegate;
  2. Obtain the original Instrumentation object from the ActivityThread, construct a custom Instrumentation wrapping it, and replace the ActivityThread's Instrumentation with the custom one via reflection;
  3. In the custom Instrumentation's newActivity method, check whether the Activity to create is MainActivity; if it is not MainActivity, or no MainActivity object currently exists, call the original implementation; otherwise replace the className argument to point to an empty Activity, so an empty Activity is created (see the sketch after this list);
  4. In the empty Activity's onCreate, finish itself and start SplashActivity with an Intent carrying the FLAG_ACTIVITY_NEW_TASK and FLAG_ACTIVITY_CLEAR_TOP flags.
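A minimal sketch of steps 1 and 3 is shown below; the instance counter is the one maintained by the ActivityLifecycleCallbacks described earlier, and StubActivity stands for the empty Activity of step 4. Forwarding of the remaining Instrumentation methods to the wrapped instance is omitted.

import android.app.Activity;
import android.app.Instrumentation;
import android.content.Intent;

import java.util.concurrent.atomic.AtomicInteger;

// A minimal sketch of the custom Instrumentation; only newActivity is shown.
public class RedirectInstrumentation extends Instrumentation {

    // Maintained by ActivityLifecycleCallbacks: ++ in onCreate, -- in onDestroy.
    public static final AtomicInteger MAIN_ACTIVITY_COUNT = new AtomicInteger();

    private final Instrumentation base; // the ActivityThread's original Instrumentation

    public RedirectInstrumentation(Instrumentation base) {
        this.base = base;
    }

    @Override
    public Activity newActivity(ClassLoader cl, String className, Intent intent)
            throws InstantiationException, IllegalAccessException, ClassNotFoundException {
        if (MainActivity.class.getName().equals(className)
                && MAIN_ACTIVITY_COUNT.get() > 0) {
            // A MainActivity already exists: create the empty Activity instead.
            // StubActivity finishes itself in onCreate and starts SplashActivity
            // with FLAG_ACTIVITY_NEW_TASK | FLAG_ACTIVITY_CLEAR_TOP (step 4).
            className = StubActivity.class.getName();
        }
        return base.newActivity(cl, className, intent);
    }
}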

Note that besides hooking Instrumentation, on newer Android versions the same replacement can be achieved through AppComponentFactory's instantiateActivity.

1.4.2 Deserialization optimization

Another typical optimization in Douyin's Activity phase is deserialization optimization. While Douyin runs, some data is serialized locally and must be deserialized during startup, which slows startup down. Previously we optimized case by case at the business level — asynchronizing blocking logic, snapshots and so on — with good results, but such fixes are troublesome to maintain and tend to regress during iteration, so we tried to optimize deserialization itself head-on.

Concretely, Douyin's startup-stage deserialization problem is the cost of Gson data parsing. Gson is a JSON parsing library from Google with low access cost, convenient usage and good extensibility, but it has one obvious weakness: the first parse of a Model is slow, and the cost balloons as the Model grows more complex.

Gson's first-parse cost follows from its implementation. A central role in Gson's parsing is played by the TypeAdapter: for each Class to be parsed, Gson first builds a TypeAdapter and then parses with it. Gson's default resolution uses ReflectiveTypeAdapterFactory to create the TypeAdapter, and both its creation and its parsing involve a lot of reflection. The concrete process is:

  1. First, all fields of the class to be parsed are obtained through reflection, and their annotations are read one by one to build a map from serialized name to Field;
  2. During parsing, each serialized name read from the input is looked up in that map to find the corresponding Field; the value is then parsed according to the Field's data type and assigned through reflection.

Therefore, the core of optimizing Gson parsing time is reducing reflection. Below are some optimization schemes used in Douyin.

Custom TypeAdapter optimization

From the Gson source we know that TypeAdapter resolution uses a chain of responsibility: if a TypeAdapterFactory registered ahead of ReflectiveTypeAdapterFactory can handle a class, ReflectiveTypeAdapterFactory is never reached — and Gson supports injecting custom TypeAdapterFactories. So one of our optimizations is to inject a custom TypeAdapterFactory that optimizes the parsing process.

This custom TypeAdapterFactory generates, at compile time, a custom TypeAdapter for each class to be optimized, and in that TypeAdapter generates parsing code for each field of the class, avoiding reflection.

For the bytecode processing involved in generating the custom TypeAdapters, we use the Douyin team's open-source ByteX bytecode framework (github.com/bytedance/B…

  1. Configure the classes to be optimized: during development, the classes we need to optimize are whitelisted through annotations or configuration files;
  2. Collect information about the classes to be optimized: at compile time, we read the classes configured in files, and during the traverse phase over all project classes we read the annotation-configured classes through ASM's ClassVisitor; for every class to be optimized, we collect all of its Field information through ClassVisitor's visitField method;
  3. Generate the custom TypeAdapters and TypeAdapterFactory: in the transform phase, the collected class and field information is used to generate the custom TypeAdapter classes, plus the custom TypeAdapterFactory that creates them.
public class GsonOptTypeAdapterFactory extends BaseAdapterFactory {

    protected BaseAdapter createTypeAdapter(String var1) {
        switch (var1.hashCode()) {
            case -1939156288:
                if (var1.equals("xxx/xxx/gsonopt/model/Model1")) {
                    return new TypeAdapterForModel1(this.gson);
                }
                break;
            case -1914731121:
                if (var1.equals("xxx/xxx/gsonopt/model/Model2")) {
                    return new TypeAdapterForModel2(this.gson);
                }
                break;
        }
        return null;
    }
}

public class TypeAdapterForModel1 extends BaseTypeAdapter {

    protected boolean setFieldValue(String var1, Object var2, JsonReader var3) {
        Object var4;
        switch (var1.hashCode()) {
            case 110371416:
                if (var1.equals("field1")) {
                    var4 = this.gson.getAdapter(String.class).read(var3);
                    ((Model1) var2).field1 = (String) var4;
                    return true;
                }
                break;
            case 1223751172:
                if (var1.equals("field2")) {
                    var4 = this.gson.getAdapter(String.class).read(var3);
                    ((Model1) var2).field2 = (String) var4;
                    return true;
                }
        }
        return false;
    }
}

Optimizing ReflectiveTypeAdapterFactory

The custom-TypeAdapter approach above cuts Gson's first-parse time by about 70%, but it adds parsing code at compile time, which increases package size, so it has limitations. We therefore also tried optimizing the Gson framework itself; to keep access cost low, we modify the ReflectiveTypeAdapterFactory implementation by rewriting bytecode.

The original ReflectiveTypeAdapterFactory reflects over all field information of a class before actually parsing data, yet not all fields are used in an actual parse. Take the Person class below: before Person can be parsed, the Person, Hometown and Job classes are all processed, but the actual input may carry only a simple name, in which case processing Hometown and Job is entirely unnecessary — and if Hometown and Job are complex, the unnecessary cost grows further.

class Person {
    @SerializedName(value = "name", alternate = {"nickname"})
    private String name;
    private Hometown hometown;
    private Job job;
}

class Hometown {
    private String name;
    private int code;
}

class Job {
    private String company;
    private int type;
}

// Input: {"name":"Zhang San"}

For such cases our solution is "on-demand parsing". When parsing the class structure of Person, we process the primitive-typed name field as usual; for the complex-typed hometown and job fields we only record their class types and return a wrapping TypeAdapter. During actual data parsing, only if the input contains hometown or job nodes do we parse the class structures of Hometown and Job. This optimization is especially effective when the class structure is complex but the actual data omits many nodes; in Douyin's practice the improvement approaches 80% in some scenarios.

Other optimization schemes

Beyond the two typical schemes above, other optimizations were tried in Douyin's practice and achieved good results in specific scenarios; for reference:

  • Unify the Gson object: Gson caches the TypeAdapters of parsed classes, but the cache is per Gson object and is not shared between different Gson objects; unifying on one Gson object enables TypeAdapter reuse;
  • Pre-create TypeAdapters: For scenarios with enough concurrency headroom, we create the TypeAdapters of relevant classes in advance on an asynchronous thread, then use the pre-created TypeAdapters directly for data parsing (see the sketch after this list);
  • Use other protocols: For serialization and deserialization of local data, we tried binary sequential storage, cutting deserialization time by 95%; when data is incompatible across versions, we fall back to version-compatible Gson parsing via version control.
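A sketch of the pre-creation idea: Gson.getAdapter builds and caches the TypeAdapter inside the Gson object, so warming it on a background thread lets the later main-thread parse skip the expensive first-time reflection (GsonHolder and Model are illustrative):

// A minimal sketch: warm Gson's TypeAdapter cache off the main thread.
Executors.newSingleThreadExecutor().execute(new Runnable() {
    @Override
    public void run() {
        GsonHolder.GSON.getAdapter(Model.class); // builds and caches the adapter
    }
});

// Later, on the main thread, the first real parse reuses the cached adapter:
Model model = GsonHolder.GSON.fromJson(json, Model.class);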

1.5 UI rendering optimization

With the Activity phase covered, let's look at the UI rendering phase, where we introduce View-loading optimizations.

In general there are two ways to create a View: build it directly in code, or load an XML file with LayoutInflater. The focus here is optimizing LayoutInflater's XML loading, which involves three steps:

  1. The IO of parsing the XML file into an in-memory XmlResourceParser;
  2. Java reflection to resolve the Class from each XmlResourceParser tag name;
  3. Creating the View instances and finally assembling the View tree.

All three steps carry cost. At the business level, we can reduce XML loading time to some extent by flattening the XML hierarchy and loading on demand with ViewStub.

Here we introduce another common scheme: asynchronous preloading. The fragment rootView is inflated during the measure phase of UI rendering, and there is a time gap between application startup and measure; we can use this window to inflate these Views into memory on a background thread, then fetch them straight from memory in the measure phase.

X2C solves the lock problem

AndroidX already provides AsyncLayoutInflater for asynchronously loading XML (basic usage sketched below), but using it directly easily runs into lock contention and can even make things slower.
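For reference, a basic usage sketch (layout id and cache are illustrative); inflation runs on AsyncLayoutInflater's background thread and the callback on the main thread:

// A minimal sketch of AsyncLayoutInflater usage; R.layout.fragment_feed and
// viewCache are illustrative.
new AsyncLayoutInflater(context).inflate(R.layout.fragment_feed, parent,
        new AsyncLayoutInflater.OnInflateFinishedListener() {
            @Override
            public void onInflateFinished(View view, int resid, ViewGroup parent) {
                viewCache.put(resid, view); // hand the preloaded View to later consumers
            }
        });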

Analysis shows that LayoutInflater holds an object lock, and even if that lock is bypassed by building a separate LayoutInflater per thread, there are still other locks at the AssetManager and native layers. Our solution is XML2Code (X2C): at compile time it generates View-creation code for annotated XML files, and the Views are then pre-created asynchronously. X2C removes the multi-thread lock problem and improves pre-creation efficiency. The scheme is still being polished and will be introduced in detail later.

The LayoutParams problem

Besides the lock problem during asynchronous inflate, another issue is LayoutParams.

LayoutInflater uses the root parameter to handle the View's LayoutParams: if root is not null, a LayoutParams of root's type is generated for the inflated View and set on it. During asynchronous inflate, however, we cannot get the real root layout; and if root is null, the inflated View's LayoutParams is null, so default values are used when the View is later added to its parent, and the layout attributes declared in XML are lost. The solution is to create a fake root of the corresponding type at preload time so that the inflated View's layout attributes are resolved correctly (see the sketch after the framework code below).

public View inflate(XmlPullParser parser, @Nullable ViewGroup root, boolean attachToRoot) {
    // omit other logic
    if (root != null) {
        // Create layout params that match root, if supplied
        params = root.generateLayoutParams(attrs);
        if (!attachToRoot) {
            // Set the layout params for temp if we are not
            // attaching. (If we are, we use addView, below)
            temp.setLayoutParams(params);
        }
    }
}

public void addView(View child, int index) {
    LayoutParams params = child.getLayoutParams();
    if (params == null) {
        params = generateDefaultLayoutParams();
        if (params == null) {
            throw new IllegalArgumentException("generateDefaultLayoutParams() cannot return null");
        }
    }
    addView(child, index, params);
}
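A minimal sketch of the fake-root workaround: passing a throwaway parent of the correct type with attachToRoot=false makes generateLayoutParams run, so the XML layout_* attributes are kept.

// A minimal sketch: preload with a fake root matching the eventual parent's type.
public static View preloadWithFakeRoot(LayoutInflater inflater, Context context, int layoutRes) {
    ViewGroup fakeRoot = new FrameLayout(context); // must match the real parent's type
    // root != null, attachToRoot == false: LayoutParams are generated and set,
    // but the View is not attached, so it can be added to the real parent later.
    return inflater.inflate(layoutRes, fakeRoot, false);
}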

Other problems

Besides the lock and LayoutParams problems above, we hit a few other issues during preloading:

  1. Inflate thread priority: background threads usually have low priority, and an async inflate thread with too low a priority may not finish preloading in time — preloading can even end up slower than not preloading. It is advisable to raise the async inflate thread's priority appropriately;
  2. Handler problem: some custom Views create Handlers; in that case we need to modify the Handler-creation code to explicitly specify the main thread's Looper;
  3. UI-thread requirements: some custom Views start animations, and animations check at start that they run on the UI thread; here we need to modify the business code to defer the relevant logic until the View is actually attached to the View tree;
  4. Scenarios that need the Activity context: one option is to preload asynchronously after the Activity has started — no special handling of the View's context is needed, but the concurrency window may be squeezed; another is to preload with the Application context during the Application phase and swap the preloaded View's context for the Activity context before adding it to the View tree, satisfying scenarios such as Dialog display and LiveData usage that need an Activity context.

1.6 Main-thread time-consuming message optimization

Above we covered the optimizations around each major life-cycle stage of the main thread. In practice, we found that time-consuming main-thread messages posted between these life-cycle stages — between Application and Activity, or between Activity and UI rendering — also affect startup speed: they delay the rest of the life cycle, so we need to optimize them too.

1.6.1 Main thread message scheduling

For our own project code, optimization is relatively easy; but some messages come from the internals of third-party SDKs and are hard to optimize, and even for messages that are easy to optimize away, the cost of preventing regressions is high. So we attacked the problem from another angle: rather than optimizing every message posted to the main thread, we adjust the main thread's message queue so that startup-related messages are executed first.

The core principle is to determine the critical startup path from the app's startup flow and use message-queue adjustment to ensure that the messages involved in the cold-start scenario are scheduled with priority, thereby improving startup speed. Specifically:

  1. Create a custom Printer, replace the original Printer through Looper's setMessageLogging interface, and forward calls to the original Printer;
  2. Update the next target message in the Application's onCreate and the MainActivity's onResume: after Application onCreate the expected target message is the Launch Activity message; after MainActivity's onResume it is the render-related doFrame message. To limit the impact, scheduling is disabled once startup completes or an abnormal path is taken;
  3. Message scheduling itself is carried out in the custom Printer's println method: the main-thread message queue is traversed, the target message is identified by message.what and message.getTarget(), and if present it is moved to the head of the queue to execute first (a rough sketch follows).
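A rough sketch of the mechanism is shown below. It relies on hidden fields (MessageQueue.mMessages, Message.next), so the reflection is version-dependent and simplified; real code must also fix up Message.when after moving a message, and the matching rule here is a hypothetical placeholder.

import android.os.Looper;
import android.os.Message;
import android.os.MessageQueue;
import android.util.Printer;

import java.lang.reflect.Field;

// A rough sketch of the Printer-driven scheduler; install it with
// Looper.getMainLooper().setMessageLogging(new StartupMessageScheduler(null)).
public class StartupMessageScheduler implements Printer {

    private final Printer original;          // the Printer we replaced, if any
    private volatile boolean enabled = true; // cleared after startup / abnormal paths

    public StartupMessageScheduler(Printer original) {
        this.original = original;
    }

    @Override
    public void println(String x) {
        if (original != null) original.println(x); // forward to the original Printer
        if (enabled) moveTargetMessageToHead();
    }

    private void moveTargetMessageToHead() {
        try {
            MessageQueue queue = Looper.getMainLooper().getQueue();
            Field messagesField = MessageQueue.class.getDeclaredField("mMessages");
            Field nextField = Message.class.getDeclaredField("next");
            messagesField.setAccessible(true);
            nextField.setAccessible(true);
            synchronized (queue) {
                Message head = (Message) messagesField.get(queue);
                Message prev = null, cur = head;
                while (cur != null) {
                    if (isStartupCritical(cur)) {
                        if (prev != null) {            // unlink, then relink at head
                            nextField.set(prev, nextField.get(cur));
                            nextField.set(cur, head);
                            messagesField.set(queue, cur);
                        }
                        return;
                    }
                    prev = cur;
                    cur = (Message) nextField.get(cur);
                }
            }
        } catch (Exception ignored) {
            // best-effort: fall back to the normal queue order
        }
    }

    private boolean isStartupCritical(Message m) {
        // Hypothetical matcher: the expected target (what + getTarget()) is updated
        // in Application.onCreate and MainActivity.onResume.
        return TargetMessages.matches(m.what, m.getTarget());
    }
}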
1.6.2 Main-thread time-consuming message optimization

Main-thread message scheduling mitigates the impact of main-thread messages on startup speed to a degree, but it has limitations:

  1. We can only adjust messages already in the queue. If, after MainActivity's onResume, a time-consuming main-thread message is queued while the doFrame message has not yet entered the queue, the time-consuming message still runs before doFrame, and startup speed is still affected;
  2. Even though we move time-consuming main-thread messages out of the startup phase, they will still cause jank after startup.

For these two reasons, we still need to optimize the time-consuming main-thread messages themselves.

Generally, time-consuming main-thread messages are mostly business-related, so the problematic logic can be found directly from the main-thread stacks output by the trace tool and optimized specifically. Here we introduce one case that other products may also encounter: main-thread time consumption caused by WebView initialization.

During optimization we found a large time-consuming message on the main thread whose top stack frame was WebViewChromiumAwInit.startChromiumLocked — code inside the system WebView. Analysis of the WebView code showed that WebViewChromiumAwInit.ensureChromiumStartedLocked posts this work to the main thread. It executes once per process lifetime, on the first use of WebView, and whether the triggering call is made on the main thread or a child thread, the work always ends up posted to the main thread, producing the cost. So we cannot fix the main-thread cost by changing the calling thread; and since this is system code, we cannot modify the implementation either, leaving only usage-level optimization at the business layer.

void ensureChromiumStartedLocked(boolean onMainThread) {
    // omit other logic
    // We must post to the UI thread to cover the case that the user has invoked Chromium
    // startup by using the (thread-safe) CookieManager rather than creating a WebView.
    PostTask.postTask(UiThreadTaskTraits.DEFAULT, new Runnable() {
        @Override
        public void run() {
            synchronized (mLock) {
                startChromiumLocked();
            }
        }
    });
    while (!mStarted) {
        try {
            // Important: wait() releases |mLock| so the UI thread can take it :-)
            mLock.wait();
        } catch (InterruptedException e) {
        }
    }
}

Locating the problem

From the business-optimization perspective we first need to find where the business triggers this. Although code analysis told us the time-consuming message is WebView-related, it did not reveal the final call site. To find it we need some background on the WebView call flow. The system WebView is an independent app; other applications use it through a framework class called WebViewFactory. In that class, the WebView app's Context is obtained from its package name, the WebView app's ClassLoader is obtained from that Context, the WebView's native libraries are loaded through the ClassLoader, and the implementation class of WebViewFactoryProvider inside the WebView is loaded and instantiated. All subsequent WebView calls go through the WebViewFactoryProvider interface.

Subsequent analysis found that the first invocation of WebViewFactoryProvider methods such as getStatics, getGeolocationPermission and createWebView triggers WebViewChromiumAwInit.ensureChromiumStartedLocked to post the lengthy message to the main thread. The problem thus becomes locating the call sites of these WebViewFactoryProvider methods.

One way is to instrument the callers. Since WebViewFactoryProvider cannot be accessed directly by the application, it must be reached through other framework code; we would have to analyze every framework call site of WebViewFactoryProvider, then instrument and log every application call into those call sites. Obviously this is costly and easy to get wrong by omission.

In fact, for WebViewFactoryProvider there is a much more convenient approach. From the analysis above we know WebViewFactoryProvider is an interface whose implementation is obtained from the WebView application by reflection, so we can replace the WebViewFactoryProvider held by WebViewFactory with a dynamically generated proxy object. In the proxy's invoke method we filter on the method name and print the call stack for the methods on our watch list (a rough sketch follows). This is how we finally located the time-consuming main-thread logic: it was triggered by our fetching of the WebView UA.
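A rough sketch of the locating trick, for debug builds only: the hidden getProvider method and sProviderInstance field of android.webkit.WebViewFactory are version-dependent assumptions, and on newer Android versions hidden-API restrictions apply.

// A rough sketch, debug-only: wrap the WebViewFactoryProvider in a dynamic
// proxy that dumps the stack for watched methods.
static void hookWebViewProvider() throws Exception {
    Class<?> factoryClass = Class.forName("android.webkit.WebViewFactory");
    Method getProvider = factoryClass.getDeclaredMethod("getProvider");
    getProvider.setAccessible(true);
    final Object provider = getProvider.invoke(null);

    Class<?> providerInterface = Class.forName("android.webkit.WebViewFactoryProvider");
    Object proxy = Proxy.newProxyInstance(
            providerInterface.getClassLoader(),
            new Class<?>[]{providerInterface},
            new InvocationHandler() {
                @Override
                public Object invoke(Object p, Method method, Object[] args) throws Throwable {
                    if ("getStatics".equals(method.getName())) { // the watch list
                        Log.d("WebViewHook", "call site", new Throwable());
                    }
                    return method.invoke(provider, args);
                }
            });

    Field instanceField = factoryClass.getDeclaredField("sProviderInstance");
    instanceField.setAccessible(true);
    instanceField.set(null, proxy);
}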

The solution

Having confirmed that the cost comes from obtaining the WebView UA, we can solve it with a local cache. The WebView UA records the WebView version and similar information that rarely changes, so we can cache it locally, read it directly from the cache, and refresh the cached value each time the application goes to the background, avoiding the delay at use time (a sketch follows).
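A minimal sketch of the cache approach (key names and storage are illustrative); WebSettings.getDefaultUserAgent is what triggers the expensive WebView startup on the first call:

// A minimal sketch of the cached-UA read; refresh the cache off the main
// thread whenever the app goes to the background.
public static String getWebViewUserAgent(Context context) {
    SharedPreferences sp = context.getSharedPreferences("webview_ua", Context.MODE_PRIVATE);
    String cached = sp.getString("ua", null);
    if (cached != null) {
        return cached; // fast path: no WebView startup on the main thread
    }
    String ua = WebSettings.getDefaultUserAgent(context); // slow first-time path
    sp.edit().putString("ua", ua).apply();
    return ua;
}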

When a WebView upgrade changes the UA, the cache may briefly be stale. If real-time accuracy is required, we can instead fetch the WebView UA in a child process through a ContentProvider running there: this blocks the child process's main thread but not the foreground process. Of course, since it needs to start a child process and go through the full WebView UA read, it is clearly slower than the local cache and unsuited to scenarios demanding fast reads; we can pick the scheme that matches actual needs.

2. Background task optimization

The cases so far have been about optimizing the main process flow directly. In fact, besides the direct cost on the main flow, the cost of background tasks also affects startup speed: they compete with foreground tasks for CPU, IO and other resources and lengthen foreground execution. So while optimizing foreground cost we must also optimize background tasks. Background task optimization is strongly tied to the specific business, but some general principles can be distilled:

  1. Reduce unnecessary task execution on background threads, especially CPU- and IO-heavy tasks;
  2. Converge the number of threads in the startup stage, preventing excessive concurrent tasks from preempting main-thread resources and avoiding frequent inter-thread scheduling that lowers concurrency efficiency.

Beyond these general principles, here are two typical cases of background task optimization in Douyin.

2.1 Process Startup Optimization

During optimization we need to watch not only the background threads of the current process but also the state of background processes. Most apps today have a push function; to reduce background power consumption and avoid being killed for excessive memory use, push-related functionality is generally placed in a separate process. If the push process starts during the startup stage, it significantly affects startup speed, so we delay its start appropriately to keep it out of the startup stage.

Offline, we can filter logcat for keywords such as "Start proc" to find out whether child processes are started during startup and which component triggered them. For complex projects or third-party SDKs, even knowing the triggering component may not locate the specific startup logic; we can pinpoint the trigger by instrumenting startService, bindService, broadcast and ContentProvider calls and logging the call stack, combined with the component shown in "Start proc". Besides processes declared in the manifest, there may also be natively forked processes, which can be found via adb shell ps.

2.2 GC suppression

Another typical case of background work affecting startup speed is GC, which can preempt CPU resources and even suspend our threads. A large number of GCs during startup greatly affects startup speed.

One approach is to execute less code at startup and use less memory; this requires changing our code and is the most fundamental way to reduce GC's impact on startup. In addition, we can reduce GC's impact through the general technique of GC suppression: suppressing certain GC types during the startup window to reduce the number of GCs.

Recently, the company's Client Infrastructure - App Health team investigated a GC suppression scheme for the ART virtual machine, which has been trialed for startup optimization on some of the company's products. Detailed technical details will be shared on the ByteDance Terminal Technology public account after further polishing.

3. Global optimization

The cases above basically all target relatively time-consuming points within a single phase. There are also points whose individual cost is small but which run at high frequency and therefore affect startup globally — high-frequency business functions, class loading, method execution efficiency, and so on. Here we introduce some of Douyin's attempts at optimizing these.

3.1 Class loading optimization

3.1.1 The parent delegation mechanism

First let's look at a class loading optimization case from Douyin. Class loading cannot be discussed without the parent delegation mechanism, so let's briefly review the class loading flow under it:

  1. First, findLoadedClass checks whether the class has already been loaded; if found it is returned directly, otherwise the parent ClassLoader's loadClass is called;
  2. If the parent ClassLoader can find the class it is returned; otherwise findClass is called to load it.
protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    Class<?> c = findLoadedClass(name);
    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClassOrNull(name);
            }
        } catch (ClassNotFoundException e) {
        }
        if (c == null) {
            c = findClass(name);
        }
    }
    return c;
}

Parent delegation in Android

The heart of parent delegation is the parent-child relationship between ClassLoaders, so let's look at ClassLoaders in Android. Generally there are two: BootClassLoader and PathClassLoader. BootClassLoader is responsible for loading Android SDK classes — Activity, TextView and the like; PathClassLoader is responsible for classes inside the app, such as our custom Activities, the support library's FragmentActivity, and other classes packaged into the app. BootClassLoader is the parent of PathClassLoader.

ART virtual machine class loading optimization

The ART virtual machine still follows parent delegation for class loading, but the implementation is optimized. Roughly, the flow is:

  1. The PathClassLoader's findLoadedClass method is called to look up the already-loaded class; via JNI this reaches ClassLinker's LookupClass method, and returns if found;
  2. If no loaded class is found, it does not immediately return to the Java layer; instead, class lookup continues in native code via ClassLinker's FindClassInBaseDexClassLoader;
  3. In FindClassInBaseDexClassLoader, it first checks whether the current ClassLoader is BootClassLoader; if so, it tries the loaded classes of the current ClassLoader, returning directly on a hit and otherwise trying to load with the current ClassLoader;
  4. If the current ClassLoader is not BootClassLoader, it checks whether it is PathClassLoader; if it is not PathClassLoader, it returns directly;
  5. If the current ClassLoader is a PathClassLoader, it checks whether it has a parent; if there is a parent, it recurses into FindClassInBaseDexClassLoader with the parent and returns directly on a hit; if nothing is found, or the current PathClassLoader has no parent, the class is loaded directly in the native layer through DexFile.

That is, when PathClassLoader is the only ClassLoader on the chain from PathClassLoader to BootClassLoader, the Java-layer findLoadedClass call does more than its name suggests: rather than merely looking up loaded classes, it loads the class directly in the native layer through DexFile. Compared with returning to the Java layer, calling findClass and dropping back into the native layer to load through DexFile, this saves an unnecessary JNI call and runs more efficiently. This is ART's optimization of class loading efficiency.

The ClassLoader model in Douyin

Why does this matter to us? To answer that, let's look at Douyin's ClassLoader model. To reduce package size, some non-core features of Douyin are delivered dynamically as plugins. With the plugin framework integrated, Douyin's ClassLoader model is as follows:

  1. In addition to BootClassLoader and PathClassLoader, DelegateClassLoader and PluginClassLoader are introduced;
  2. There is one global DelegateClassLoader; it is the parent of PathClassLoader, and its own parent is BootClassLoader;
  3. The parent of PluginClassLoader is BootClassLoader;
  4. DelegateClassLoader holds a reference to PluginClassLoader, and PluginClassLoader holds a reference to PathClassLoader.

This ClassLoader model has a clear advantage: it conveniently supports class isolation, class reuse, and switching between pluginized and componentized builds:

  1. Class isolation: if a class with the same name exists in both the host and several plugins, a use in the host loads from the host APK first, while a use in a plugin loads from that plugin's APK first. Plugin frameworks with a single-ClassLoader model cannot support this loading behavior;
  2. Class reuse: when the host uses a plugin-only class, the class-load failure is detected in DelegateClassLoader and the PluginClassLoader is used to load the class from the plugin, letting the host reuse plugin classes. When a plugin uses a host-only class, the failure is detected in PluginClassLoader and PathClassLoader is used to load it, letting the plugin reuse host classes. Plugin frameworks with other multi-ClassLoader models cannot support this reuse;
  3. Free switching between componentization and compileOnly: with this model, host/plugin classes load without specifying any explicit ClassLoader, so it is easy to switch between direct-dependency componentization and compileOnly + plugin.

Breaking ART's class loading optimization

The advantages of Douyin's ClassLoader model are described above, but it hides a drawback: it breaks the ART virtual machine's class loading optimization.

From the earlier introduction we know that classes can be loaded in the native layer, saving one JNI call, only when PathClassLoader is the sole ClassLoader on the chain from PathClassLoader to BootClassLoader. In Douyin's ClassLoader model, a DelegateClassLoader sits between PathClassLoader and BootClassLoader, breaking that condition, so the first load of every class in the app costs one extra JNI call. Ordinarily one extra JNI call is negligible, but in the startup stage, with its massive amount of class loading, the accumulated impact on startup speed is real.

Non-invasive optimization: delayed injection

Knowing how the plugin framework hurts class loading, the optimization idea is clear: remove the DelegateClassLoader from between PathClassLoader and BootClassLoader.

From the previous analysis we know that the DelegateClassLoader exists so that when PathClassLoader fails to load a class, the PluginClassLoader can load it from a plug-in. Before any plug-in functionality is used, the DelegateClassLoader is completely unnecessary, so we can defer injecting it until a plug-in is actually needed.

In practice, however, this is difficult because we do not know exactly when a plug-in class will be loaded: the host may depend on it implicitly via compileOnly, load it through reflection, or trigger a plug-in view from XML. Adapting every such business call site would be very intrusive.

There is another way to think about it: we cannot know exactly when plug-ins load, but we can know when they do not load. For example, if no plug-in is loaded during the Application phase, it is perfectly safe to wait until that phase completes before injecting the DelegateClassLoader. In fact, class loading during startup is concentrated in the Application stage, so injecting the DelegateClassLoader after the Application phase greatly reduces the plug-in framework's impact on startup speed while avoiding any intrusion into business code.
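A minimal sketch of what the injection itself can look like, assuming the common reflection-based approach of re-pointing PathClassLoader's parent; the helper name is ours, and on newer Android versions this kind of reflection on core classes may be subject to hidden-API restrictions:

import java.lang.reflect.Field;

final class DelegateInjector {

    // Called after the Application phase, when plug-in functionality is first needed.
    static void inject(ClassLoader pathClassLoader, ClassLoader delegate) {
        try {
            // ClassLoader stores its delegation parent in the private field "parent";
            // re-point it from BootClassLoader to the DelegateClassLoader.
            Field parentField = ClassLoader.class.getDeclaredField("parent");
            parentField.setAccessible(true);
            parentField.set(pathClassLoader, delegate);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to inject DelegateClassLoader", e);
        }
    }
}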

Intrusive optimization solution: transform the ClassLoader model

The above scheme requires no business changes and costs little to implement, but it only optimizes class loading during the Application stage; class loading in later stages still misses the ART optimization. In pursuit of ultimate performance, we went further. The core idea is to remove the DelegateClassLoader from between PathClassLoader and BootClassLoader entirely and solve the problem of the host loading plug-in classes in other ways. Analysis shows the host loads plug-in classes in a few main ways:

  1. Loading plug-in classes reflectively via Class.forName.
  2. Referencing plug-in classes directly at runtime through implicit compileOnly dependencies.
  3. Loading a plug-in's component classes when starting its four major components.
  4. Using plug-in classes in XML.

The problem then becomes how to support these four ways of loading plug-in classes from the host without injecting a ClassLoader.

For Class.forName, the most direct fix is to pass the DelegateClassLoader explicitly as the ClassLoader argument. However, this is unfriendly to business code and impossible for third-party SDK code that we cannot modify. Our final solution is to instrument Class.forName calls at the bytecode level: if the original class load fails, we retry with the DelegateClassLoader.
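As an illustration, the instrumented call sites might be redirected to a helper like the following. This is a hedged sketch: ClassLoaderCompat and PluginManager.getDelegateClassLoader() are hypothetical names, not the real instrumentation target:

public final class ClassLoaderCompat {

    // Build-time bytecode instrumentation rewrites Class.forName(name) call sites
    // in host and third-party code to call this method instead.
    public static Class<?> forName(String name) throws ClassNotFoundException {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // The host cannot resolve the class; retry via the plug-in chain.
            // PluginManager.getDelegateClassLoader() is an assumed accessor.
            return Class.forName(name, true, PluginManager.getDelegateClassLoader());
        }
    }
}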

Next are the implicit compileOnly dependencies, which are harder to handle generically because there is no good point at which to detect a class loading failure. Our solution is to migrate business code from implicit compileOnly dependencies to Class.forName, for two reasons:

  1. First, there are very few implicit compileOnly call sites in Tiktok, so the migration cost is manageable.
  2. Second, although compileOnly is convenient to use, it has problems with plug-in load control, troubleshooting, and compatibility between plug-in and host versions; Class.forName + interface solves these problems (see the sketch after this list).
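A hypothetical illustration of the Class.forName + interface pattern, with all names assumed for the example: the host compiles only against a stable interface, while the implementation class ships in the plug-in.

// Shipped with the host; the plug-in provides the implementation.
public interface ShareService {
    void share(String content);
}

final class ShareServices {
    // Host-side call site, replacing a former implicit compileOnly reference.
    static ShareService load() throws ReflectiveOperationException {
        Class<?> impl = Class.forName("com.example.plugin.ShareServiceImpl");
        return (ShareService) impl.getDeclaredConstructor().newInstance();
    }
}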

The loading of a plug-in's four major component classes and the use of plug-in classes in XML can be solved with the same approach: replace the ClassLoader held in LoadedApk with the DelegateClassLoader. Both component class loading and the class loading performed by LayoutInflater when inflating XML then go through the DelegateClassLoader. For the underlying principle, refer to analyses of plug-in frameworks such as DroidPlugin and RePlugin; we will not expand on it here.
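For reference, a hedged sketch of that replacement, based on public analyses of frameworks like DroidPlugin and RePlugin rather than Douyin's own code. It touches non-public framework internals (ActivityThread, LoadedApk), so field names may vary across Android versions:

import java.lang.ref.WeakReference;
import java.lang.reflect.Field;
import java.util.Map;

final class LoadedApkHook {

    @SuppressWarnings("unchecked")
    static void replaceClassLoader(ClassLoader delegate) throws Exception {
        Class<?> atClass = Class.forName("android.app.ActivityThread");
        Object activityThread = atClass.getMethod("currentActivityThread").invoke(null);

        // ActivityThread.mPackages: ArrayMap<String, WeakReference<LoadedApk>>
        Field packagesField = atClass.getDeclaredField("mPackages");
        packagesField.setAccessible(true);
        Map<String, WeakReference<?>> packages =
                (Map<String, WeakReference<?>>) packagesField.get(activityThread);

        for (WeakReference<?> ref : packages.values()) {
            Object loadedApk = ref.get();
            if (loadedApk == null) continue;
            // LoadedApk.mClassLoader is what component creation and
            // LayoutInflater class loading ultimately use.
            Field clField = loadedApk.getClass().getDeclaredField("mClassLoader");
            clField.setAccessible(true);
            clField.set(loadedApk, delegate);
        }
    }
}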

3.1.2 Class Verify Optimization

The ClassLoader work above optimizes the load stage of class loading; other stages can be optimized too, and a typical case is class verify optimization. The class verify process checks that a class complies with the Java specification; if it does not, a verify-related exception is thrown during the verify phase.

Typically, Android verifies classes when the application is installed or when a plug-in is loaded, but in certain cases verify must run at runtime instead: plug-ins on Android 10 and later, plug-ins compiled with the extract compiler filter, and classes whose static verification fails because of host/plug-in interdependencies. Besides checking the class itself, runtime verify also triggers loading of the classes it depends on, which can be quite time-consuming.

In fact, class verify mainly exists to validate bytecode from untrusted sources such as the network. Our own and plug-in code is already validated at compile time, and even if an invalid class did appear, the worst consequence of skipping verify is that the exception moves from the verify phase to the point where the class is used.

Therefore, runtime class verify can be considered unnecessary, and the loading of these classes can be optimized by turning it off. There are several viable ways to disable class verify. For example, at runtime, locate the memory address of the verify_ field in ART's Runtime object and set it to the skip-verify mode:

  // Fields of ART's Runtime class (art/runtime/runtime.h). verify_ holds the
  // verification mode; the neighboring fields help locate it in memory.

  // If kNone, verification is disabled. kEnable by default.
  verifier::VerifyMode verify_;

  // If true, the runtime may use dex files directly with the interpreter if an oat file is not available/usable.
  bool allow_dex_file_fallback_;

  // List of supported cpu abis.
  std::vector<std::string> cpu_abilist_;

  // Specifies target SDK version to allow workarounds for certain API levels.
  int32_t target_sdk_version_;

Of course, turning off class verify is not valuable for every application. Before applying it, use the oatdump command to dump information about which classes in the host and plug-ins will be verified at runtime; the scheme above is worthwhile only when a large number of classes are verified at runtime.

oatdump --oat-file=xxx.odex > dump.txt
cat dump.txt | grep -i "verified at runtime" | wc -l

3.2 Other global optimizations

Beyond the cases above, there are some other fairly common global optimization schemes; here is a brief introduction for reference:

  • High-frequency method optimization: optimize frequently called methods such as service discovery (SPI) and experiment-switch reads by moving work that originally happened at runtime, such as annotation parsing and reflection, to compile time; generated code replaces the original calls during compilation to speed up execution;
  • I/O optimization: reduce unnecessary I/O during startup, prefetch I/O on critical paths, and improve I/O efficiency;
  • Binder optimization: cache the results of binder calls that are made several times during startup, such as PackageInfo retrieval and network-state queries, to reduce the number of IPC calls (a minimal sketch follows this list);
  • Lock optimization: reduce the impact of locks on startup through general approaches such as removing unnecessary locks, reducing lock granularity, and shortening lock hold time;
  • Bytecode execution optimization: reduce unnecessary bytecode execution by inlining method calls; this capability has been integrated as a plug-in into the open-source ByteX framework (see the ByteX introduction for details);
  • Preloading optimization: make full use of the system's concurrency by precisely preloading resources on asynchronous threads, guided by user profiles and on-device intelligent prediction, to eliminate or reduce time spent at key nodes; preloadable content includes SharedPreferences, resources, views, classes, and more;
  • Thread scheduling optimization: reduce time spent in the sleeping and uninterruptible-sleep states through dynamic task priority adjustment and load balancing across CPU cores, improving CPU time-slice utilization without raising CPU frequency (solution provided by the Client Infrastructure - App Health team);
  • Cooperation with manufacturers: work with device manufacturers to obtain more system resources through CPU core binding and frequency boosting, thereby improving startup speed.
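As an example of the binder-result caching mentioned above, here is a minimal sketch; the helper class is hypothetical. The first caller pays the IPC cost of fetching PackageInfo from the package manager, and later callers during startup hit the in-process cache:

import android.content.Context;
import android.content.pm.PackageInfo;
import android.content.pm.PackageManager;

public final class PackageInfoCache {

    private static volatile PackageInfo sCached;

    public static PackageInfo get(Context context) {
        PackageInfo cached = sCached;
        if (cached == null) {
            synchronized (PackageInfoCache.class) {
                if (sCached == null) {
                    try {
                        // One binder call to the package manager; reused afterwards.
                        sCached = context.getPackageManager()
                                .getPackageInfo(context.getPackageName(), 0);
                    } catch (PackageManager.NameNotFoundException e) {
                        throw new AssertionError(e); // our own package always exists
                    }
                }
                cached = sCached;
            }
        }
        return cached;
    }
}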

Summary and Outlook

So far, we have introduced typical and common cases from Douyin's startup optimization, hoping they can serve as a reference for your own work. Looking back at all of Douyin's past startup optimizations, the generally applicable ones account for only a fraction; most are tied to specific business logic and cannot be migrated directly to other products. For that part we have summarized some optimization methodology; see "The Theory and Tools of Startup Performance Optimization" for details. Finally, from a practical perspective, we offer some summaries and outlooks on startup optimization, hoping they are helpful.

Continuous iteration

Startup optimization is a process of continuous iteration and polishing. The initial stage is typically a fast-paced one: the optimization space is large, the granularity is coarse, and good returns come from relatively little manpower. The second stage is the hard one; it requires more investment than the first, and the final improvement largely depends on it. The third stage is anti-deterioration plus continuous fine-grained optimization. It lasts the longest, is especially important for rapidly iterating products, and is the only way to reach ultimate startup performance.

Scenario generalization

Startup optimization also needs to be extended and generalized. We usually focus on the time from the user clicking the icon to the first frame of the home page, but with the growth of splash-ad and push-click launch scenarios, we need to cover those as well. Moreover, the first frame of a page often appears before the user can see the content they want: what users care about may not be the page's first frame but the moment valid content finishes loading. In Douyin, for example, alongside startup speed we also track the time to the first video frame, a metric that AB experiments show to be even more important than startup time. Other products can likewise define corresponding metrics for their own business, verify their impact on user experience, and decide whether optimization is needed.

Global awareness

Generally speaking, we measure startup performance by startup speed. To speed up startup, some tasks originally executed during the startup phase may be postponed or made on-demand. This improves startup speed but can hurt the subsequent experience: for example, if a task is deferred from the startup phase to its first use, and that first use happens on the main thread, the result may be a lag during use. Therefore, besides startup performance itself, we must watch the other metrics it may affect.

In terms of performance, we need a macro metric that reflects global performance, to avoid settling for local optima. We also need to connect startup performance with business metrics. Concretely, during optimization, larger changes should support AB experiments wherever possible. On the one hand, this enables qualitative analysis of each optimization and prevents negative optimizations, those that gain local performance at the cost of the global experience, from shipping. On the other hand, experiments can quantify each optimization's effect on the business and guide subsequent optimization directions, while also providing rollback capability to stop losses quickly when a change causes stability or functional problems.

At present, Volcano Engine, ByteDance's enterprise-level technical service platform, has opened its AB experiment capability to external users; those interested can visit its official website for more information.

Full coverage and refined operation

Tiktok's startup optimization has two major goals going forward. The first is to maximize the coverage of startup optimization. In terms of experience, we will optimize interaction and content quality alongside performance, improving how efficiently and how well features reach users. In terms of scenarios, we will fully cover all startup modes, such as cold start, warm start, and hot start, as well as all landing pages. In terms of direction, we will cover CPU, I/O, memory, locks, UI rendering, and more. The second goal is refined operation of startup optimization: adopt different startup strategies for different users, device capabilities and states, and startup scenarios, personalizing startup for each user to maximize the experience.

Join us

The Tiktok Android basic technology team is a team in deep pursuit of the ultimate. We focus on performance, architecture, package size, stability, basic libraries, and compilation and build, safeguarding both the R&D efficiency of a very large team and the experience of hundreds of millions of users. We currently have many openings in Beijing, Shanghai, Hangzhou, and Shenzhen; we welcome ambitious people to join us in building an APP serving hundreds of millions of users!

If you are interested, you can click here to visit ByteDance's recruitment website and search for positions related to "Douyin Basic Technology Android", or contact us by email: [email protected], or send your resume to us directly.