I. Background

It has been almost two years since the last startup optimization (startup task classification). Startup has been maintained since then, but every release still costs a lot of time tracking down startup-time increases. We wanted an automated startup monitoring process to cut that cost, and while developing it we found parts of startup that could still be optimized, so we optimized startup again as well.

This article covers the following:

  • 1. Startup optimization: the startup process, how to optimize, optimizing launch from a push, binary rearrangement, and the follow-up plan
  • 2. Automated startup monitoring

II. Results

1. Startup optimization: in our own test on an iPhone 8 Plus, the time from tapping the icon to the home page images being fully loaded dropped from 1.2 s to 0.51 s. Our QA colleagues verified on an iPhone 6 and an iPhone 8, where total startup time was 50%-60% lower than the online version.

2. Startup monitoring: at a fixed time every night, a device automatically launches the app 10 times, uploads the launch data, and diffs it against the previous day's data. Methods whose time increase exceeds the threshold are emailed to the code committer, prompting them to fix the code.

The figure below shows the optimized startup on the iPhone 8 Plus.

III. Optimization approach

1. How to define the start and end time?

Before optimizing, we need to standardize how startup time is calculated, so that both startup time and the effect of optimization can be measured.

1.1 Startup Process

As the figure below shows, the start time is when the user taps the icon, and the end time is when the home page data has finished displaying.

1.2 Calculate the start time and end time

1.2.1 Test Criteria:

Record the app launch with a screen recording tool, then measure the launch time with QuickTime Player's trim function or by splitting the video into frames. The start time is when the app icon turns gray after being tapped, and the end time is when the home page images are fully displayed.

1.2.2 Measuring in code:

  • Start time: look up the current process by its identifier (NSProcessInfo processIdentifier); the process creation time (__p_starttime) in the process information is the start time.

#import <Foundation/Foundation.h>
#import <sys/sysctl.h>

// Process creation time in milliseconds since the Unix epoch
+ (NSTimeInterval)processStartTime {
    struct kinfo_proc kProcInfo;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        return kProcInfo.kp_proc.p_un.__p_starttime.tv_sec * 1000.0 + kProcInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000.0;
    } else {
        NSAssert(NO, @"Unable to get process information");
        return 0;
    }
}

// Fills procInfo with the kernel's record for the given pid
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd) / sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}

  • End time: the end time is when all images on the home page have finished loading. Hook the image download method. Before launch finishes, every URL passed to the method is added to an array, and it is removed from the array when its download completes. When the array is empty, the home page images have all been downloaded, and that moment is the end time. The hook looks like this in pseudocode:

- (void)hook_setImageWithUrl:(NSString *)url completed:(completedBlock)completed {
    // Launch has already finished: fall through to the original, pre-hook logic
    if (LaunchSteps.launchFinished) {
        [self hook_setImageWithUrl...];
        return;
    }
    [LaunchImageArray addObject:url];
    completedBlock newCompletedBlock = ^(...) {
        [LaunchImageArray removeObject:url];
        if (LaunchImageArray.count == 0) {
            // An empty array means all images have finished downloading
            LaunchSteps.launchFinished = YES;
        }
        if (completed) {
            completed(...);
        }
    };
    [self hook_setImageWithUrl:url completed:newCompletedBlock];
}
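The bookkeeping behind the end-time hook can be sketched in plain C, independent of any image library. This is only an illustration of the "pending set becomes empty" idea; the type and function names (LaunchImageTracker, tracker_image_requested, tracker_image_completed) and the fixed capacity are assumptions, not part of the original project.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PENDING 64 // assumed upper bound on first-screen images

// Hypothetical tracker for image URLs requested before launch finished.
typedef struct {
    const char *urls[MAX_PENDING];
    int count;
    bool launch_finished;
} LaunchImageTracker;

// Call when an image download starts; ignored once launch has finished.
static void tracker_image_requested(LaunchImageTracker *t, const char *url) {
    if (t->launch_finished || t->count >= MAX_PENDING) return;
    t->urls[t->count++] = url;
}

// Call when a download completes. Returns true once every tracked
// image has completed, which marks the end time of the launch.
static bool tracker_image_completed(LaunchImageTracker *t, const char *url) {
    if (t->launch_finished) return true;
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->urls[i], url) == 0) {
            t->urls[i] = t->urls[--t->count]; // swap-remove one matching entry
            break;
        }
    }
    if (t->count == 0) t->launch_finished = true;
    return t->launch_finished;
}
```

Unlike removeObject:, this sketch removes one entry per completion, so a URL requested twice must complete twice before it stops counting as pending.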

2. The two steps of optimization

2.1 Find

Following the startup process, find the time-consuming methods that run during app launch but are not needed for it.

2.2 Change

Break the time-consuming methods down to find the parts that can be optimized, and modify them. Lazy-load the methods that launch does not need, or defer them until after launch completes.

3. Optimization of pre-main

The whole pre-main process is shown in the earlier figure. There is plenty of mature material explaining it, so we will not repeat it here. In short, this is what can be done at each stage:

  • 1. Load dylibs: reduce or merge dylibs, and turn dynamic libraries into static libraries.
  • 2. Rebase/Bind: reduce the number of classes, methods, and categories.
  • 3. Objc setup: nothing to do.
  • 4. Initializers: optimize +load methods, reduce constructor functions, and reduce C++ static global variables.

In fact, most of the above was already done in the last optimization two years ago. For example, the company's SDKs were converted to static libraries, and categories and +load methods were cleaned up. Unused classes, methods, and resources were removed thoroughly during an earlier package size optimization, and an automated process now tracks the daily package size increase. So there is very little left to do for pre-main, but to beat the previous result we still did some grunt work.

Because the project has a long history and a large amount of code and libraries, the total pre-main time is relatively high, so we wanted to measure how much each library contributes to startup. We created a new project, imported the libraries from the Podfile into it one by one, and roughly estimated the pre-main time of each library as follows:

Set the environment variable DYLD_PRINT_STATISTICS to 1 in Xcode to print the time spent in pre-main.

  • 1. Add podA to the Podfile.
  • 2. Run pod update.
  • 3. Restart the device (important!), run the new project from Xcode, record the pre-main time, and compare it with the time before podA was added. Repeat steps 1, 2, and 3 to roughly work out how much pre-main time each library adds.

After working out the approximate cost of each library, we reviewed the libraries that were not very important yet cost tens of milliseconds, such as a refresh library (we already had similar logic of our own); we removed it and changed the places that used it. For libraries that were expensive and whose owners were easy to work with, we pushed for optimization (pushing the ones that are hard to push may get you hit, so stay safe ⚠️). This step removed about 3-5 libraries.

4. Optimization in main stage

The first step in optimizing the main stage is finding the tasks that can be optimized. Here are three ways to find them:

Option 1:

Walk through the code to see which tasks in the startup chain are unnecessary, then defer them. By instrumenting the code with NSLog, we can print the elapsed time after each method runs and collect the time of each method. Expensive methods can be broken down further: split the tasks that can be split, and defer the tasks that can be deferred. This approach is straightforward and simple. If startup tasks have never been prioritized, it alone can speed up launch. The drawback is that unnecessary tasks pulled in through dependencies cannot be found this way.

CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
[self doSomething];
NSLog(@"doSomething : %f", CFAbsoluteTimeGetCurrent() - start);

// To use the launch timestamp in other classes, define it in main.m
// and declare it with the extern keyword where it is needed.
// main.m
CFAbsoluteTime kAppStartTime;
int main(int argc, char *argv[]) {
    kAppStartTime = CFAbsoluteTimeGetCurrent();
    // ... rest of main unchanged
}

// someClass.m
extern CFAbsoluteTime kAppStartTime;
CFAbsoluteTime duration = (CFAbsoluteTimeGetCurrent() - kAppStartTime);

Option 2:

Hook objc_msgSend and generate a JSON file in the data format the flame graph requires, then load the JSON file in chrome://tracing/ to render the flame graph. This makes it easy to see which methods run during launch and how long each takes. Next, analyze whether each task really needs to run at launch, and optimize accordingly. Below is the flame graph before optimization:

The launch timeline runs from left to right, and the call hierarchy from top to bottom: if method A calls methods B, C, and D, then A is on top with B, C, and D below it. For example, the AppDelegate's swizzied_didFinishLuanch method calls launch… and MainTabbarController. By the same logic, you can trace which method was ultimately called and how long it took.
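As a rough illustration of the JSON that chrome://tracing/ accepts, the sketch below serializes recorded calls as "complete" events (ph set to "X", with ts and dur in microseconds) in the Trace Event Format's JSON array form. The CallRecord layout, the function name write_trace_json, and the fixed pid/tid of 1 are assumptions made for the example; the original article does not show its serializer.

```c
#include <stddef.h>
#include <stdio.h>

// One recorded method call (hypothetical record layout).
typedef struct {
    const char *name;   // e.g. "-[AppDelegate application:didFinishLaunchingWithOptions:]"
    long long start_us; // start time in microseconds
    long long dur_us;   // duration in microseconds
} CallRecord;

// Writes the records as a JSON array of complete ("X") trace events.
// Returns the number of bytes written (excluding the terminating NUL),
// or -1 if the buffer is too small.
// Assumption: names contain no characters that need JSON escaping.
static int write_trace_json(const CallRecord *recs, size_t n, char *buf, size_t cap) {
    size_t used = 0;
    int w = snprintf(buf, cap, "[");
    if (w < 0 || (size_t)w >= cap) return -1;
    used += (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + used, cap - used,
                     "%s{\"name\":\"%s\",\"ph\":\"X\",\"ts\":%lld,\"dur\":%lld,\"pid\":1,\"tid\":1}",
                     i ? "," : "", recs[i].name, recs[i].start_us, recs[i].dur_us);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, cap - used, "]");
    if (w < 0 || (size_t)w >= cap - used) return -1;
    used += (size_t)w;
    return (int)used;
}
```

Nested calls need no explicit parent links: the viewer stacks events on the same thread whose time ranges contain one another, which produces the top-to-bottom layout described above.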

Option 3: the App Launch tool

App Launch is currently the most complete tool for startup optimization and inspection, and it is the one Apple officially recommends. It also includes the Time Profiler and System Trace functions. The flame graph only captured the main thread (other threads could be captured as well), and it misses some very small time-consuming operations, so doing startup optimization directly with this tool is entirely feasible. A brief introduction to using it:

  • In Build Settings, set Debug Information Format to DWARF with dSYM File.
  • Build and run the project in Xcode.
  • In Xcode, choose Open Developer Tool -> Instruments -> App Launch and start the app. App Launch automatically stops the app after 5 seconds.
  • If the resulting data is not symbolicated, choose File -> Symbols in the upper left corner of the App Launch window, select the symbols marked with a green light, and run the project again in App Launch.

As the figure above roughly shows, find the time of every task on the main thread and, using the call stack on the right side of the figure, check one by one whether each task in the startup chain can be optimized (even if it only takes a few milliseconds).

Fixing the time-consuming tasks found

The approaches above will surface the tasks that can be optimized. Here are some examples for reference:

  1. Lazy loading / deferring methods: early in the didFinishLaunching method there are many manual hooks, some of which can be optimized. One example is the hook on route jumps. Its purpose is to prevent the exception that occurs if the app enters the live room after launch while the live room components are not yet initialized. But no routing happens during launch, so this hook can be deferred to the initialize method that runs when a route is executed for the first time.

  2. Preloading images: according to the last part of the App Launch animation, 21 ms (around 45 ms in the flame graph statistics) is spent setting images on the tab items, with 5 tabs and 10 images in total. The time goes mainly to image decoding in imageNamed:. Can this be optimized?

Because imageNamed: caches its results and is thread-safe, images can be decoded on a background thread at an earlier point. We hooked imageNamed: to find out which local images are needed at launch, then preloaded those images earlier:

// Preload the launch images on a background queue at an earlier point
dispatch_async(dispatch_get_global_queue(0, 0), ^{
    NSArray *preloadImage = @[@"image1", @"image2"...];
    for (NSString *imageName in preloadImage) {
        [UIImage imageNamed:imageName];
    }
});

// Whether preloading helps can be evaluated by timing the method separately,
// using the timing approach from the first option above.
- (void)setAllTabbarItems {
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    [self setItemImage...];
    NSLog(@"setAllTabbarItems : %f", CFAbsoluteTimeGetCurrent() - start);
}

Another point that is easy to overlook: our pull-to-refresh control carries a group of animation images, which are also expensive to decode. Rather than dumping all of those images into the preload list, we can defer the whole pull-to-refresh control until after launch. Operations that read the database cache and sandbox cache can also be preloaded on this background thread.

  3. Deferring automatic login: after automatic login succeeds, a notification is sent, and several places respond to it with time-consuming work such as pulling configuration. Automatic login is not needed during launch (if home page requests need parameters such as the UID, cache them in advance). Our automatic login is fairly hidden and triggered from many places, so set a breakpoint at the login call, run the app to see which launch steps trigger login, and optimize them.

  4. Pre-requesting home page data: the flame graph showed two long periods in which the main thread was almost idle. Could this be optimized? Was the main thread being suspended by another thread? We concluded that the app was requesting the home page data at that time and waiting for it before rendering the page. The home page network request was issued in the home page's viewDidLoad; it can be moved earlier, to didFinishLaunchingWithOptions, to shorten the main thread's idle time. Pre-requesting also raises the issue of competition for network resources. With a custom NSURLProtocol we intercepted all NSURLSession requests made during launch, so that the home page pre-request could go out first and unnecessary requests could be delayed. We found that one SDK sent many requests right at initialization, and that our own IP direct connection logic also interfered, which made the pre-request much less effective. How do we evaluate the effect of the pre-request? After changing the IP direct connection logic and the SDK requests, home page data came back 150-200 ms earlier. With launch on the iPhone 8 Plus taking just over 1 s, that is a speedup of about 15%. In addition, our home page uses a parent/child controller structure: when a child controller is displayed, it preloads and renders the controllers to its left and right. During launch this step is now deferred until launch completes (it is also fairly expensive).

  5. Caching data: pre-requesting returns home page data earlier, while a cache removes the network request time altogether. The common pattern of showing the cache first and then refreshing the UI with the requested data gives a very poor experience. Using only the cache works very well, but if the interval between launches is too long, most of the hosts on the cached home page will already have gone offline. So we set a validity period for the cache (currently 3-5 minutes). If the gap between this launch and the last cached data is within the period, the cache is used directly; otherwise the pre-requested data is used.

  6. Loading the home page in sections: our home page is divided into the search box at the top and the data block below it. The data block is clearly more important, so loading of the search box can be delayed. We did not do this, because the gain is small (20 ms) and the product team was not keen on it. If your app has clearly distinct sections, you can make sure the important ones are shown first.
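The cache validity rule from the caching item above comes down to a single age comparison. A minimal sketch follows, assuming the cache stores the wall-clock time at which it was written; the function name should_use_cache and the convention that a zero timestamp means "no cache" are illustrative choices, not the project's actual API.

```c
#include <stdbool.h>
#include <time.h>

// Returns true when the cached home page data is fresh enough to show
// directly; otherwise the caller should fall back to the pre-requested data.
// cached_at == 0 is treated as "nothing cached" (an assumption of this sketch).
static bool should_use_cache(time_t cached_at, time_t now, double max_age_seconds) {
    if (cached_at == (time_t)0) return false;
    double age = difftime(now, cached_at);
    // A negative age means the clock moved backwards; do not trust the cache.
    return age >= 0 && age <= max_age_seconds;
}
```

With a validity period of 5 minutes, the call would be should_use_cache(cached_at, time(NULL), 300.0).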

The power of the App Launch tool

After the optimizations above, we used App Launch to check for anything else that could be optimized, this time using the System Trace functions. Click the triangle below to expand all of the app's threads and find the main thread.

From the above we can see a lock wait that blocks the main thread for 47 ms (sometimes measured at 80 ms). We needed to find the cause and deal with it. A simple way is to look at which background threads were running tasks during the period when the main thread was blocked. Checking all the threads in the tool, it was easy to find a background thread that was doing work in a repeated interrupt -> execute -> interrupt pattern. We modified it, and in the end this time dropped from 188 ms to 78 ms. We also looked at the other blocked periods; they are system behavior and cannot be adjusted.

5. Optimizing launch from a push notification

Startup tasks are divided into high, medium, and low priority. High-priority tasks are required for app launch, medium-priority tasks are required before entering the live room or jumping to a page, and low-priority tasks run after launch completes.

Optimizing the launch that goes from tapping a push to the landing page relies, first, on the task prioritization described above. Second, a user who enters the live room from a push expects to see the live content, while the home page's request, loading, and rendering take up resources. So when launching into the live room from a push, the home page requests along the way can be deferred until the user leaves the live room.
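The three priority tiers can be pictured as a small task table that is drained in stages. The sketch below is one possible shape for it, not the project's actual launcher: the LaunchTask struct, the enum names, and run_tasks_up_to are all hypothetical.

```c
#include <stddef.h>

typedef enum { PRIORITY_HIGH = 0, PRIORITY_MEDIUM = 1, PRIORITY_LOW = 2 } LaunchPriority;

// Hypothetical startup task entry.
typedef struct {
    const char *name;
    LaunchPriority priority;
    void (*run)(void); // may be NULL for a placeholder task
    int done;          // set once the task has been executed
} LaunchTask;

// Runs every pending task whose priority is at least as urgent as `up_to`,
// most urgent tier first. Returns the number of tasks executed.
static size_t run_tasks_up_to(LaunchTask *tasks, size_t n, LaunchPriority up_to) {
    size_t ran = 0;
    for (int p = PRIORITY_HIGH; p <= (int)up_to; p++) {
        for (size_t i = 0; i < n; i++) {
            if (!tasks[i].done && (int)tasks[i].priority == p) {
                if (tasks[i].run) tasks[i].run();
                tasks[i].done = 1;
                ran++;
            }
        }
    }
    return ran;
}
```

Under this shape, launch would call run_tasks_up_to with PRIORITY_HIGH, a push that opens the live room would call it with PRIORITY_MEDIUM before the jump, and PRIORITY_LOW would be passed once launch has completed. Tasks already marked done are skipped, so the calls can overlap safely.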

6. Binary rearrangement

The principle of binary rearrangement:

Use App Launch to count page faults. Because the system keeps caches after the app has been launched, the device must be restarted to clear the memory cache before measuring.

When compiling, the binary is by default written in the order in which the object files (.o) are linked, and within each object file in the order of its functions. The link order of the object files follows Build Phases -> Compile Sources. In Xcode, set Build Settings -> Write Link Map File to YES to generate a Link Map file, then check the order of symbols in the # Symbols: section of the Link Map.

With that in mind, implementing binary rearrangement only requires placing the symbols needed for launch together at link time, producing an executable that triggers as few page faults as possible when its pages are loaded into memory.

  • How do we adjust the symbol order of the project? The linker Xcode uses is ld, and ld has an option called order_file. Given the file and its path, Xcode lays out the binary in the order of the symbols in the file.
  • How do we get the symbols used at launch? Add -fsanitize-coverage=func,trace-pc-guard to Other C Flags and implement two functions, __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard. The first is the initialization callback, and the second is called each time an instrumented function is entered. Record every function intercepted during launch, write out the symbols to generate the order_file, and set the file path in Build Settings -> Order File.
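Once the guard callback has produced a list of symbol names in call order, generating the order file is mostly a matter of keeping the first occurrence of each symbol. The sketch below shows only that last step and assumes the names have already been symbolized; write_order_file is a hypothetical helper, and the quadratic duplicate check is kept for brevity.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Writes each symbol once, in order of first call, one symbol per line.
// Returns the number of unique symbols written, or -1 on a write error.
// `calls` is the raw sequence recorded during launch (duplicates allowed).
static int write_order_file(FILE *out, const char *const *calls, size_t n) {
    int unique = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++) {
            seen = strcmp(calls[i], calls[j]) == 0;
        }
        if (seen) continue; // already emitted at its first call
        if (fprintf(out, "%s\n", calls[i]) < 0) return -1;
        unique++;
    }
    return unique;
}
```

For a real launch trace with many thousands of calls, a hash set would be a better fit than the nested loop.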

Afterwards, the # Symbols: section of the Link Map can be inspected to confirm that the symbols were reordered, and then the result can be checked. On an iPhone 8 Plus running iOS 13, the difference in page fault count and time, and in launch time, with and without binary rearrangement was very small, roughly within the fluctuation between measurements. We are not sure whether this is because iOS 13 upgraded dyld2 to dyld3 with its own optimizations (search for material on the dyld upgrade if you are interested). So in the end we gave up on binary rearrangement.

7. Follow-up plan for startup optimization

  1. Modularize startup. Currently all startup items are concentrated in one class, where the #import lines alone take up 200 lines. The next version will split startup items into modules by business area.
  2. Push other SDKs to optimize, especially to limit the number of background threads they occupy. The relevant SDKs are already being worked on.

IV. Startup monitoring

Process

To monitor changes in startup time during daily development, we track the time of method calls during launch, and use a daily build to compare the current version with yesterday's version and analyze the causes of the time spent. The process is as follows:

  • Jenkins builds the project and reports the LinkMap when the build completes.
  • After packaging, install the app on a real device through ios-deploy.
  • Start Appium, which is used to launch the app multiple times.
  • Run the test script, which controls Appium, which in turn controls the device. Repeat the cold launch several times, report the data, and average it to reduce the effect of fluctuation.
  • Analyze the data: time increases, decreases, newly added methods, diffs, and so on.
  • Send the analysis results by email.
  • Optimize the code.

Analysis report

The first part is the pre-main time and the home page image loading time, as follows:

The second part is a diff of the startup time data of the two versions. If a method that runs during launch in the current version does not appear in the comparison version, it is treated as a new method.

The third part is the change in time of existing methods.

The fourth part is the time spent by each library during launch.

The fifth part is the time spent in +load methods.
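The second and third parts of the report amount to classifying each method of the current build against yesterday's data. A minimal sketch of that comparison is given below; the MethodTime layout, the DiffKind names, and the threshold parameter are assumptions for illustration, since the article does not show the analysis script.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *method; // symbol name, e.g. "-[HomeViewController viewDidLoad]"
    double ms;          // time averaged over the nightly launches
} MethodTime;

typedef enum { DIFF_WITHIN_THRESHOLD, DIFF_NEW_METHOD, DIFF_SLOWER } DiffKind;

// Linear lookup of a method in a build's data; returns NULL if absent.
static const MethodTime *find_method(const MethodTime *list, size_t n, const char *method) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].method, method) == 0) return &list[i];
    }
    return NULL;
}

// Classifies one method of the current build against the baseline build.
// *delta_ms receives the increase (the full time for a new method).
static DiffKind diff_method(const MethodTime *baseline, size_t n_baseline,
                            const MethodTime *current, double threshold_ms,
                            double *delta_ms) {
    const MethodTime *old = find_method(baseline, n_baseline, current->method);
    if (old == NULL) {
        *delta_ms = current->ms; // not in the comparison version: new method
        return DIFF_NEW_METHOD;
    }
    *delta_ms = current->ms - old->ms;
    return (*delta_ms > threshold_ms) ? DIFF_SLOWER : DIFF_WITHIN_THRESHOLD;
}
```

Methods classified as DIFF_NEW_METHOD or DIFF_SLOWER are the ones that would be listed in the email to the committer.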

Implementation

Record the methods called during launch and their times through hooks.

Measure the pre-main time and the home page image loading time.

Pre-main time = time of entering the main function - process creation time. The following implementation obtains the process creation time:

#import <Foundation/Foundation.h>
#import <sys/sysctl.h>

// Fills procInfo with the kernel's record for the given pid
+ (BOOL)processInfoForPID:(int)pid procInfo:(struct kinfo_proc *)procInfo {
    int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
    size_t size = sizeof(*procInfo);
    return sysctl(cmd, sizeof(cmd) / sizeof(*cmd), procInfo, &size, NULL, 0) == 0;
}

// Process creation time in milliseconds since the Unix epoch
+ (NSTimeInterval)processStartTime {
    struct kinfo_proc kProcInfo;
    if ([self processInfoForPID:[[NSProcessInfo processInfo] processIdentifier] procInfo:&kProcInfo]) {
        return kProcInfo.kp_proc.p_un.__p_starttime.tv_sec * 1000.0 + kProcInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000.0;
    } else {
        NSAssert(NO, @"Unable to get process information");
        return 0;
    }
}
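The subtraction itself is worth spelling out, because both timestamps must come from the same clock. The process creation time is a struct timeval relative to the Unix epoch, so the time of entering main should also be taken as Unix-epoch wall-clock time (for example with gettimeofday as the first statement of main), not with CFAbsoluteTimeGetCurrent, which counts from 2001. A small sketch with hypothetical helper names:

```c
#include <sys/time.h>

// Converts a timeval to milliseconds, using the same arithmetic
// as processStartTime above.
static double timeval_to_ms(struct timeval tv) {
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Pre-main time = time of entering main - process creation time.
// Both arguments are assumed to be Unix-epoch wall-clock times.
static double pre_main_ms(struct timeval process_start, struct timeval main_entered) {
    return timeval_to_ms(main_entered) - timeval_to_ms(process_start);
}
```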

Home page image loading completion time: hook the image download method. Before launch finishes, every URL passed to this method is stored in an array and removed when its download completes. When the array is empty, all images on the first screen of the home page have finished downloading, and that moment is the end time.

Measure the time of pre-main +load methods, C++ static constructors, __attribute__((constructor)) functions, functions in the __mod_init_func section, and OC methods.

+load

The +load methods in a project all affect startup time to some degree. We measure them by hooking: at a point earlier than any +load method runs, hook the +load methods defined by classes and insert timing code before and after each one.

In Mach-O, the __DATA,__objc_nlclslist and __DATA,__objc_nlcatlist sections hold the non-lazy classes and non-lazy categories respectively. getsectbynamefromheader is used to find the classes and categories that define +load so they can be hooked. To run before the main binary's +load methods are called, we define a +load method in the dynamic library that is loaded earliest and do the hooking inside it.

// Note: section, ptr, mach_header and loadMethod are defined elsewhere in the project.
struct Category {
    char * _Nonnull category_name;
    char * _Nonnull class_name;
    struct objc_method_list * _Nullable instance_methods;
    struct objc_method_list * _Nullable class_methods;
    struct objc_protocol_list * _Nullable protocols;
};

// Non-lazy classes (classes that implement +load)
const section *nonLazyClass = getsectbynamefromheader((void *)mach_header, "__DATA", "__objc_nlclslist");
if (NULL != nonLazyClass) {
    for (ptr address = nonLazyClass->offset; address < nonLazyClass->offset + nonLazyClass->size; address += sizeof(const void *)) {
        Class cls = (__bridge Class)(*(void **)(mach_header + address));
    }
}

// Non-lazy categories (categories that implement +load)
const section *nonLazyCategory = getsectbynamefromheader((void *)mach_header, "__DATA", "__objc_nlcatlist");
if (NULL != nonLazyCategory) {
    for (ptr address = nonLazyCategory->offset; address < nonLazyCategory->offset + nonLazyCategory->size; address += sizeof(const void **)) {
        struct Category *cat = (*(struct Category **)(mach_header + address));
    }
}

// Hook: replace the +load IMP with a block that calls the original.
// Blocks passed to imp_implementationWithBlock receive self but not _cmd,
// so the selector is supplied explicitly when calling the original IMP.
IMP originIMP = loadMethod->imp;
IMP replaceIMP = imp_implementationWithBlock(^(__unsafe_unretained id self) {
    // record the time before the call here
    ((void (*)(id, SEL))originIMP)(self, @selector(load));
    // record the time after the call here
});
loadMethod->imp = replaceIMP;

objc_msgSend

Every OC method call goes through objc_msgSend, so hooking it lets us measure the time of OC methods. objc_msgSend takes variable arguments, so the hook has to save the register state, keep the arguments unchanged, and then call the original objc_msgSend. The implementation follows InspectiveC:

__attribute__((__naked__))
static void replacementObjc_msgSend() {
    __asm__ volatile (
        // save q0-q7
        "stp q6, q7, [sp, #-32]!\n"
        "stp q4, q5, [sp, #-32]!\n"
        "stp q2, q3, [sp, #-32]!\n"
        "stp q0, q1, [sp, #-32]!\n"
        // save x0-x8 and lr
        "stp x8, lr, [sp, #-16]!\n"
        "stp x6, x7, [sp, #-16]!\n"
        "stp x4, x5, [sp, #-16]!\n"
        "stp x2, x3, [sp, #-16]!\n"
        "stp x0, x1, [sp, #-16]!\n"
        // arrange the arguments for the pre hook
        "mov x2, x1\n"
        "mov x1, lr\n"
        "mov x3, sp\n"
        // call preObjc_msgSend
        "bl __Z15preObjc_msgSendP11objc_objectmP13objc_selectorP9RegState_\n"
        "mov x9, x0\n"
        "mov x10, x1\n"
        "tst x10, x10\n"
        // restore x0-x8 and lr
        "ldp x0, x1, [sp], #16\n"
        "ldp x2, x3, [sp], #16\n"
        "ldp x4, x5, [sp], #16\n"
        "ldp x6, x7, [sp], #16\n"
        "ldp x8, lr, [sp], #16\n"
        // restore q0-q7
        "ldp q0, q1, [sp], #32\n"
        "ldp q2, q3, [sp], #32\n"
        "ldp q4, q5, [sp], #32\n"
        "ldp q6, q7, [sp], #32\n"
        "b.eq Lpassthrough\n"
        // call the original objc_msgSend
        "blr x9\n"
        // save x0-x9
        "stp x0, x1, [sp, #-16]!\n"
        "stp x2, x3, [sp, #-16]!\n"
        "stp x4, x5, [sp, #-16]!\n"
        "stp x6, x7, [sp, #-16]!\n"
        "stp x8, x9, [sp, #-16]!\n"
        // save q0-q7
        "stp q0, q1, [sp, #-32]!\n"
        "stp q2, q3, [sp, #-32]!\n"
        "stp q4, q5, [sp, #-32]!\n"
        "stp q6, q7, [sp, #-32]!\n"
        // call the postObjc_msgSend hook
        "bl __Z16postObjc_msgSendv\n"
        "mov lr, x0\n"
        // restore q0-q7
        "ldp q6, q7, [sp], #32\n"
        "ldp q4, q5, [sp], #32\n"
        "ldp q2, q3, [sp], #32\n"
        "ldp q0, q1, [sp], #32\n"
        // restore x0-x9
        "ldp x8, x9, [sp], #16\n"
        "ldp x6, x7, [sp], #16\n"
        "ldp x4, x5, [sp], #16\n"
        "ldp x2, x3, [sp], #16\n"
        "ldp x0, x1, [sp], #16\n"
        "ret\n"
        // not hooked: jump straight to the original objc_msgSend
        "Lpassthrough:\n"
        "br x9"
    );
}
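What the pre and post hooks have to do on the C side can be sketched as a call stack: the pre hook pushes the method and its entry time, and the post hook pops the frame and computes the duration. The sketch below is a simplified illustration with hypothetical names (CallStack, call_enter, call_exit); a real hook also keeps one stack per thread and stores the original return address, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 128 // assumed maximum tracked call depth

typedef struct {
    const char *name;  // method being executed
    uint64_t enter_us; // timestamp taken in the pre hook
} Frame;

typedef struct {
    Frame frames[MAX_DEPTH];
    int depth;    // frames currently stored
    int overflow; // calls deeper than MAX_DEPTH, which are not timed
} CallStack;

// Pre hook: remember which method was entered and when.
static void call_enter(CallStack *s, const char *name, uint64_t now_us) {
    if (s->depth >= MAX_DEPTH) { s->overflow++; return; }
    s->frames[s->depth].name = name;
    s->frames[s->depth].enter_us = now_us;
    s->depth++;
}

// Post hook: pop the matching frame and report its duration.
// Returns 1 and fills *name and *dur_us when a timed frame was popped,
// or 0 for an untimed (overflowed) call or an empty stack.
static int call_exit(CallStack *s, uint64_t now_us, const char **name, uint64_t *dur_us) {
    if (s->overflow > 0) { s->overflow--; return 0; }
    if (s->depth == 0) return 0;
    Frame f = s->frames[--s->depth];
    *name = f.name;
    *dur_us = now_us - f.enter_us;
    return 1;
}
```

The depth at the time of the push also gives the nesting level needed to draw the flame graph.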

C++ static constructors, __attribute__((constructor)), and __mod_init_func section functions

The __mod_init_func section stores the addresses of initialization functions. It lives in the __DATA segment, and its pointers point into the __TEXT segment. A project contains many such functions, of the kinds shown below, and they are executed during the pre-main phase. We read the section with getsectiondata(machHeader, "__DATA", "__mod_init_func", &size), record the addresses of the original functions in a global array, and replace each entry with a hook function that calls the original function from the array by index.

void myinit(int argc, char **argv, char **envp) {}

__attribute__((section("__DATA, __mod_init_func"))) typeof(myinit) *__init = myinit;

TestClass test = TestClass();

__attribute__((constructor)) void testConstructor() {}
void HookInitFuncInitializer(int argc, const char *argv[], const char *envp[], const char *apple[], const struct ProgramVarsStr *vars) {
    ++CurrentPointerIndex;
    InitializerType f = (InitializerType)Initializer[CurrentPointerIndex];
    f(argc, argv, envp, apple, vars);
    
    NSString *symbol = [NSString stringWithFormat:@"%p", f];
    Dl_info info;
    // dli_sname can be NULL when no symbol matches, and @(NULL) would throw
    if (0 != dladdr(f, &info) && info.dli_sname != NULL) {
        NSString *sname = @(info.dli_sname);
        if (sname.length > 0) {
            symbol = sname;
        }
    }
}

static void HookModInitFunc() {
    Dl_info info;
    dladdr(HookModInitFunc, &info);
    mach_header *machHeader = info.dli_fbase;
    unsigned long size = 0;
    pointer *p = (pointer *)getsectiondata(machHeader, "__DATA", "__mod_init_func", &size);
    int count = (int)(size / sizeof(void *));
    for (int i = 0; i < count; ++i) {
        pointer ptr = p[i];
        Initializer[i] = ptr;
        p[i] = (pointer)HookInitFuncInitializer;
    }
}

Library time statistics

From the Link Map we get object file entries such as libAFNetworking.a(AFHTTPSessionManager.o), which are parsed into the library AFNetworking and the class AFHTTPSessionManager. This tells us roughly which libraries there are and which classes each contains, so each method can be attributed to a library. This approach cannot attribute classes whose names do not match their file names, common categories, and so on.

. /Products/Debug-iphoneos/AFNetworking/libAFNetworking.a(AFHTTPSessionManager.o)
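Splitting such an entry into a library name and a class name is plain string handling. The sketch below assumes the entry has the form path/libName.a(Class.o); the function name parse_object_entry and the buffer-based interface are illustrative choices rather than the project's actual parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Parses "path/libAFNetworking.a(AFHTTPSessionManager.o)" into the library
// name ("AFNetworking") and the class name ("AFHTTPSessionManager").
// Returns false if the entry has no "(...)" part or a buffer is too small.
static bool parse_object_entry(const char *path,
                               char *lib, size_t lib_cap,
                               char *cls, size_t cls_cap) {
    const char *open = strrchr(path, '(');
    const char *close = strrchr(path, ')');
    if (!open || !close || close < open) return false;

    // Library: last path component before '(', without "lib" and ".a".
    const char *lib_start = open;
    while (lib_start > path && lib_start[-1] != '/') lib_start--;
    size_t lib_len = (size_t)(open - lib_start);
    if (lib_len > 3 && strncmp(lib_start, "lib", 3) == 0) { lib_start += 3; lib_len -= 3; }
    if (lib_len > 2 && strncmp(lib_start + lib_len - 2, ".a", 2) == 0) lib_len -= 2;

    // Class: object file name inside the parentheses, without ".o".
    const char *cls_start = open + 1;
    size_t cls_len = (size_t)(close - cls_start);
    if (cls_len > 2 && strncmp(cls_start + cls_len - 2, ".o", 2) == 0) cls_len -= 2;

    if (lib_len == 0 || cls_len == 0 || lib_len >= lib_cap || cls_len >= cls_cap) return false;
    memcpy(lib, lib_start, lib_len);
    lib[lib_len] = '\0';
    memcpy(cls, cls_start, cls_len);
    cls[cls_len] = '\0';
    return true;
}
```

As noted above, the class name obtained this way is really the object file name, so it is only correct when the file is named after the class it contains.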

This article was written together with my colleagues.