Oasis iOS R&D engineer, OASIS ID: KeepFit storage box.

Contents

  1. How I Increased the Startup Speed of Weibo Oasis by 30%
  2. How I Increased the Startup Speed of Weibo Oasis by 30% (II)
  3. Lazy Binary Rearrangement

0. Preface

Startup is the first impression an App gives to the user and is crucial to the user experience. Imagine an App that requires more than 5 seconds to start. Do you still want to use it?

The project certainly had no such problem at the beginning, but as business requirements grew, so did the code. Left unchecked, the startup time keeps rising until it becomes unacceptable.

Starting from the principles behind the optimization, this article describes how I changed the library type, used Clang instrumentation to find the symbols needed at startup, and then modified the build parameters to rearrange the binary and improve the application's startup speed.

Let’s start with the conclusion:

  • Before optimization:

    Total pre-main time: 1.2 seconds (100.0%)
             dylib loading time: 567.72 milliseconds (45.5%)
            rebase/binding time: 105.14 milliseconds (8.4%)
                ObjC setup time:  40.01 milliseconds (3.2%)
               initializer time: 532.47 milliseconds (42.7%)
               slowest intializers :
                 libSystem.B.dylib :   4.70 milliseconds (0.3%)
              libglInterpose.dylib : 295.89 milliseconds (23.7%)
                      AFNetworking :  48.75 milliseconds (3.9%)
                             Oasis : 285.94 milliseconds (22.9%)
  • After optimization:

    Total pre-main time: 822.34 milliseconds (100.0%)
             dylib loading time: 196.71 milliseconds (23.9%)
            rebase/binding time: 104.95 milliseconds (12.7%)
                ObjC setup time:  31.14 milliseconds (3.7%)
               initializer time: 489.53 milliseconds (59.5%)
               slowest intializers :
                 libSystem.B.dylib :   4.65 milliseconds (0.5%)
              libglInterpose.dylib : 230.19 milliseconds (27.9%)
                      AFNetworking :  41.60 milliseconds (5.0%)
                             Oasis : 335.84 milliseconds (40.8%)

By combining two techniques, static library conversion and binary rearrangement, I reduced the pre-main time of Oasis from 1.2 s to about 0.82 s, an improvement of about 31.6%!

Both phones are iPhone 11 Pro, with the optimized build on the right. (Forgive me if my tap on the right was a little slow 😂)

1. Converting dynamic libraries to static libraries

Apple recommends keeping an app's total startup time under 400 milliseconds, and startup must finish within 20 seconds or the system kills the app. We can optimize the time from main to didFinishLaunchingWithOptions ourselves, but how do we debug a slow start in the code that runs before main is called?

1.1 Viewing the pre-main time

Many things happen before the system executes the application's main function and calls the application delegate methods (applicationWillFinishLaunching). To see them, we can add the DYLD_PRINT_STATISTICS environment variable to the project scheme.

Run it and we can see the console output:

Total pre-main time: 1.2 seconds (100.0%)
         dylib loading time: 567.72 milliseconds (45.5%)
        rebase/binding time: 105.14 milliseconds (8.4%)
            ObjC setup time:  40.01 milliseconds (3.2%)
           initializer time: 532.47 milliseconds (42.7%)
           slowest intializers :
             libSystem.B.dylib :   4.70 milliseconds (0.3%)
          libglInterpose.dylib : 295.89 milliseconds (23.7%)
                  AFNetworking :  48.75 milliseconds (3.9%)
                         Oasis : 285.94 milliseconds (22.9%)

This is the result I measured on an iPhone 11 Pro. It is shown here only to explain what each part means, not to discuss or compare optimizations, so don't read too much into these timings.

Note: If you are testing the slowest startup time of your app, use the slowest device you support.

The output shows the total time it took for the system to call the application main, followed by a breakdown of the major steps.

WWDC 2016 Session 406, Optimizing App Startup Time, covers each step in detail along with tips for improving it. Here is a brief summary:

  • dylib loading time: The dynamic loader finds and reads the dynamic libraries the application depends on. Each library may have dependencies of its own. Loading Apple's system frameworks is highly optimized, but loading embedded frameworks can be time-consuming. To speed up dynamic library loading, Apple recommends using fewer dynamic libraries or merging them.
    • The suggested target is six extra (non-system) frameworks.
  • rebase/binding time: Fixes up pointers inside the image (rebasing) and sets pointers to symbols outside the image (binding). To speed up rebasing/binding, we need fewer pointer fix-ups.
    • An application with a large number of Objective-C classes, selectors, and categories can add 800 ms to the startup time.
    • If the application uses C++ code, use fewer virtual functions.
    • Using Swift structs is also generally faster.
  • ObjC setup time: The Objective-C runtime has to register classes, categories, and selectors. Any improvement we make to rebase/binding time also improves this setup time.
  • initializer time: Runs the initializers. If you use Objective-C's +load method, replace it with the +initialize method.

After the system calls main, main calls UIApplicationMain and the application delegate methods in turn.

1.2 Loading time of dynamic and static libraries

1.2.1 Dynamic Library Loading Time

Let’s take a look at how many dynamic libraries there are in the project:

  1. In the project's Products folder, find the project's .app file, right-click it, and choose Show in Finder.
  2. In that directory, right-click the app and choose Show Package Contents.
  3. Open the Frameworks folder.
  4. The project is written in pure Swift, so the remaining entries are the system Swift libraries. We cannot optimize those, so leave them alone.

As you can see, we have 36 dynamic libraries in our project. Here is the total time of pre-main:

Total pre-main time: 1.2 seconds (100.0%)
         dylib loading time: 567.72 milliseconds (45.5%)
        rebase/binding time: 105.14 milliseconds (8.4%)
            ObjC setup time:  40.01 milliseconds (3.2%)
           initializer time: 532.47 milliseconds (42.7%)
           slowest intializers :
             libSystem.B.dylib :   4.70 milliseconds (0.3%)
          libglInterpose.dylib : 295.89 milliseconds (23.7%)
                  AFNetworking :  48.75 milliseconds (3.9%)
                         Oasis : 285.94 milliseconds (22.9%)

1.2.2 Using static Libraries

In the Pods project, select the library we are using, click Build Settings, find the Mach-O Type setting, and change it to Static Library.

Press ⇧+⌘+K to run Clean Build Folder, then rebuild.

Three dynamic libraries remain: because Objective-C has no namespaces, converting them caused symbol conflicts, so they stay dynamic. Here is the total pre-main time:

Total pre-main time: 877.84 milliseconds (100.0%)
         dylib loading time: 220.07 milliseconds (25.0%)
        rebase/binding time: 112.29 milliseconds (12.7%)
            ObjC setup time:  30.78 milliseconds (3.5%)
           initializer time: 514.70 milliseconds (58.6%)
           slowest intializers :
             libSystem.B.dylib :   4.33 milliseconds (0.4%)
          libglInterpose.dylib : 253.44 milliseconds (28.8%)
                  AFNetworking :  37.08 milliseconds (4.2%)
                        OCLibs :  61.75 milliseconds (7.0%)
                         Oasis : 246.28 milliseconds (28.0%)

As you can see, changing the Mach-O Type from dynamic library to static library greatly improved the dylib loading time, while the other timings changed little. The total time dropped from 1.2 seconds to approximately 0.9 seconds, saving about 0.3 seconds of startup time.

1.2.3 Pitfalls encountered

However, if you change only the Mach-O Type, Validate App in the Organizer fails after Archive with this error:

  • Found an unexpected Mach-O header code: 0x72613c21
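
The header code in that error is itself a hint about what went wrong. A static library is an ar archive, and every ar archive starts with the eight bytes "!<arch>\n"; read as a little-endian 32-bit value, the first four of them give exactly 0x72613c21. The following C sketch only illustrates that arithmetic; the function names are made up for this example and it is not how Xcode performs its validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ar archive (the container format of static libraries) starts with this. */
static const char AR_MAGIC[] = "!<arch>\n";

/* Read the first four bytes of a file as a little-endian 32-bit value,
   the way a Mach-O magic number would be read on arm64. */
static uint32_t header_code_le(const unsigned char *bytes, size_t len) {
    assert(bytes != NULL && len >= 4);
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Returns 1 if the buffer begins with the ar archive magic, 0 otherwise. */
static int looks_like_static_archive(const unsigned char *bytes, size_t len) {
    return len >= 8 && memcmp(bytes, AR_MAGIC, 8) == 0;
}
```

Calling header_code_le on the bytes '!', '<', 'a', 'r' (0x21, 0x3c, 0x61, 0x72) returns 0x72613c21, which suggests that a static archive was embedded where a dynamic Mach-O binary was expected.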

This is a CocoaPods configuration issue. CocoaPods adds a [CP] Embed Pods Frameworks run script to the project's Build Phases:

"${PODS_ROOT}/Target Support Files/Pods-ProjectName/Pods-ProjectName-frameworks.sh"

Running pod install generates the Pods-ProjectName-frameworks.sh script. Because we changed the Mach-O Type by hand, install_framework in this script still runs for those libraries, so we need to remove the libraries that were converted to static libraries from Pods-ProjectName-frameworks.sh.

Take AFNetworking as an example, which needs to be removed from the file:

install_framework "${BUILT_PRODUCTS_DIR}/AFNetworking/AFNetworking.framework"

You can also do this with a Ruby script in the CocoaPods post_install hook.

  1. Make the relevant libraries static.

    target.build_configurations.each do |config|
        config.build_settings['MACH_O_TYPE'] = 'staticlib'
    end
  2. Read the Pods-ProjectName-frameworks.sh file and delete the relevant lines.

    regex = /install_framework.*\/#{pod_name}\.framework\"/
    pod_frameworks_content.gsub!(regex, "")

2. Binary rearrangement

2.1 App launch

Letting processes access physical memory directly is unsafe, so the operating system builds a layer of virtual memory on top of physical memory. On top of that, Apple also applies ASLR (Address Space Layout Randomization) for protection, but that is not the focus here.

In iOS, virtual memory is mapped to physical memory in units of pages. When a process accesses a virtual memory page whose physical page is not yet present, a Page Fault interrupt occurs and the page is loaded. A single Page Fault is fast, but an app can trigger thousands of them (or more) during startup, and the time adds up.

A page on iOS is 16KB.

What we usually call startup is the whole span from tapping the app icon to the first page being displayed, which covers pre-main and main through the end of didFinishLaunchingWithOptions. The part from main to the end of didFinishLaunchingWithOptions is under our control, and many articles already explain how to optimize it, so it is not the focus here. The binary rearrangement discussed here is primarily an optimization that reduces Page Faults.

There are also two important concepts: cold start and warm start. Some readers may think that killing the app and launching it again is a cold start, but that is not the case.

  • Cold start

    A cold start happens only when the app has exited completely and the pages it had loaded have since been overwritten by other processes, when the device has been restarted, or on the first launch after installation.

  • Warm start

    The app is killed and then relaunched right away. The pages loaded earlier are still in physical memory and can be reused without being fully reloaded, so a warm start is faster.

Later we will use the System Trace tool in Instruments to compare the two kinds of start more directly.

2.2 Concepts related to binary rearrangement

2.2.1 Significance of binary rearrangement

Programs are executed sequentially by default.

Suppose two methods needed at startup, method1 and method3, sit on two different pages, Page1 and Page2. The system then has to take two Page Faults to execute that code.

If we rearrange the methods so that method1 and method3 are on the same Page, there will be one less Page Fault.
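
As a rough illustration of this reasoning, the C sketch below counts how many distinct 16KB pages a set of startup symbol addresses falls on. The addresses in the usage note are hypothetical, and the model is simplified: it looks only at each symbol's start address and ignores symbols that straddle a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOS_PAGE_BYTES 0x4000u  /* 16KB, the iOS page size mentioned above */

/* Count how many distinct pages the given start addresses fall on.
   Quadratic for brevity, which is fine for a handful of symbols. */
static size_t distinct_pages(const uint64_t *addrs, size_t n) {
    size_t count = 0;
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t page = addrs[i] / IOS_PAGE_BYTES;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / IOS_PAGE_BYTES == page) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            count++;
        }
    }
    return count;
}
```

With method1 at 0x100004100 and method3 at 0x10000C200 (made-up addresses), distinct_pages returns 2; after moving method3 next to method1 at 0x100004180, it returns 1, which corresponds to one fewer page that has to be faulted in.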

So how do you measure the rearrangement and verify it?

  • Check whether the number of Page faults decreases.
  • Check the LinkMap file, the intermediate of the compilation process, for confirmation.

2.2.2 System Trace

So how do you measure page load time? This is where the System Trace tool in Instruments is used.

First, reboot the device (for a cold start). Press ⌘+I to open Instruments and select the System Trace tool.

Click record ⏺, and stop ⏹ as soon as the first page appears. Filter to show only the main thread, then select Summary: Virtual Memory.

  • File Backed Page In is the number of times the Page Fault is triggered.
  • Page Cache Hit is the number of Page Cache hits.

Now let's look at a warm start. Kill the app and repeat the previous steps (without rebooting):

Comparing File Backed Page In for the cold and warm starts, you can see that a warm start triggers very few Page Faults.

2.2.3 Startup Sequence

2.2.3.1 File Order

The order of the Compile Sources list in Build Phases determines the order in which the files are executed (the order can be adjusted). Without rearrangement, the file order determines the order of the methods and functions.

We add the following code to the ViewController and AppDelegate and execute it.

+ (void)load {
    NSLog(@"%s", __FUNCTION__);
}

// Output
2020-04-23 22:56:13.551729+0800 BinaryOptimization[59505:5477304] +[ViewController load]
2020-04-23 22:56:13.553714+0800 BinaryOptimization[59505:5477304] +[AppDelegate load]

We adjust the order of the two classes in Compile Sources and then execute.

2020-04-23 23:00:08.248118+0800 BinaryOptimization[59581:5482198] +[AppDelegate load]
2020-04-23 23:00:08.249015+0800 BinaryOptimization[59581:5482198] +[ViewController load]

As you can see, the order of execution of the +load method changes as the order of the files in Compile Sources changes.

2.2.3.2 Symbol table order

Change Write Link Map File to YES in Build Settings. After compilation, a Link Map symbol table text file is generated.

After building with ⌘+B, select the app under Products, show it in Finder, go to the Intermediates.noindex folder, and find this file:

BinaryOptimization-LinkMap-normal-arm64.txt

Open the file and go to its first part.

We can see that this order is consistent with the order in Compile Sources. The next part:

# Sections:
# Address	Size    	Segment	Section
0x100005ECC	0x0000065C	__TEXT	__text
0x100006528	0x0000009C	__TEXT	__stubs
0x1000065C4	0x000000B4	__TEXT	__stub_helper
0x100006678	0x000000BE	__TEXT	__cstring
0x100006736	0x00000D2B	__TEXT	__objc_methname
0x100007461	0x00000070	__TEXT	__objc_classname
0x1000074D1	0x00000ADA	__TEXT	__objc_methtype
0x100007FAC	0x00000054	__TEXT	__unwind_info
0x100008000	0x00000008	__DATA_CONST	__got
0x100008008	0x00000040	__DATA_CONST	__cfstring
0x100008048	0x00000018	__DATA_CONST	__objc_classlist
0x100008060	0x00000010	__DATA_CONST	__objc_nlclslist
0x100008070	0x00000020	__DATA_CONST	__objc_protolist
0x100008090	0x00000008	__DATA_CONST	__objc_imageinfo
0x10000C000	0x00000068	__DATA	__la_symbol_ptr
0x10000C068	0x00001348	__DATA	__objc_const
0x10000D3B0	0x00000018	__DATA	__objc_selrefs
0x10000D3C8	0x00000010	__DATA	__objc_classrefs
0x10000D3D8	0x00000008	__DATA	__objc_superrefs
0x10000D3E0	0x00000004	__DATA	__objc_ivar
0x10000D3E8	0x000000F0	__DATA	__objc_data
0x10000D4D8	0x00000188	__DATA	__data

This part is Mach-O section information and is not the focus here. The symbols come after it; there are many of them, so only some are shown.

# Symbols:
# Address	  Size    	  File  Name
0x100005ECC	0x0000003C	[  1] +[AppDelegate load]
0x100005F08	0x00000088	[  1] -[AppDelegate application:didFinishLaunchingWithOptions:]
0x100005F90	0x00000108	[  1] -[AppDelegate application:configurationForConnectingSceneSession:options:]
0x100006098	0x00000080	[  1] -[AppDelegate application:didDiscardSceneSessions:]
0x100006118	0x0000003C	[  2] +[ViewController load]
0x100006154	0x0000004C	[  2] -[ViewController viewDidLoad]
0x1000061A0	0x000000A0	[  3] _main
0x100006240	0x000000B4	[  4] -[SceneDelegate scene:willConnectToSession:options:]
0x1000062F4	0x0000004C	[  4] -[SceneDelegate sceneDidDisconnect:]
0x100006340	0x0000004C	[  4] -[SceneDelegate sceneDidBecomeActive:]
0x10000638C	0x0000004C	[  4] -[SceneDelegate sceneWillResignActive:]
0x1000063D8	0x0000004C	[  4] -[SceneDelegate sceneWillEnterForeground:]
0x100006424	0x0000004C	[  4] -[SceneDelegate sceneDidEnterBackground:]
0x100006470	0x0000002C	[  4] -[SceneDelegate window]
0x10000649C	0x00000048	[  4] -[SceneDelegate setWindow:]
0x1000064E4	0x00000044	[  4] -[SceneDelegate .cxx_destruct]
0x100006528	0x0000000C	[  5] _NSLog
0x100006534	0x0000000C	[  5] _NSStringFromClass
0x100006540	0x0000000C	[  7] _UIApplicationMain
0x10000654C	0x0000000C	[  6] _objc_alloc
0x100006558	0x0000000C	[  6] _objc_autoreleasePoolPop
0x100006564	0x0000000C	[  6] _objc_autoreleasePoolPush
...

As you can see, the overall order is the same as in Compile Sources, and the methods are linked in the same order as in the file. Once the methods in the AppDelegate are added, the methods in the ViewController are added, and so on.

  • Address: the address of the method in the file.
  • Size: the size of the method.
  • File: the index of the file the method belongs to.
  • Name: the method name.
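
To make the four columns concrete, here is a small C sketch that parses one line of the # Symbols: section and derives the 16KB page a symbol starts on. The struct and function names are invented for this example, and it assumes the line layout shown above (two hex numbers, a bracketed file index, then the name).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t address;   /* Address column */
    uint64_t size;      /* Size column */
    int file_index;     /* number inside the [ ] of the File column */
    char name[256];     /* Name column */
} LinkMapSymbol;

/* Parse one symbol line such as
   "0x100005ECC\t0x0000003C\t[  1] +[AppDelegate load]".
   Returns 1 on success, 0 for headers and other lines. */
static int parse_symbol_line(const char *line, LinkMapSymbol *out) {
    assert(line != NULL && out != NULL);
    if (strncmp(line, "0x", 2) != 0) {
        return 0;  /* comment or header line */
    }
    int matched = sscanf(line, "%" SCNx64 " %" SCNx64 " [%d] %255[^\n]",
                         &out->address, &out->size, &out->file_index, out->name);
    return matched == 4;
}

/* Index of the 16KB page on which an address lies. */
static uint64_t page_index(uint64_t address) {
    return address / 0x4000u;
}
```

Applied to the listing above, +[AppDelegate load] at 0x100005ECC and _objc_autoreleasePoolPush at 0x100006564 have the same page index, so in this tiny demo project all the listed code already starts on one page; the effect of reordering only shows up in a large binary.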

2.2.4 Initial experience of binary rearrangement

Create an Order file in the project root directory.

touch BinaryOptimization.order

Then find Order File in Build Settings and set it to ./BinaryOptimization.order.

In the BinaryOptimization.order file, enter:

+[ViewController load]
+[AppDelegate load]
_main
-[ViewController someMethod]

Then execute the ⌘ + B build.

You can see that the first few methods in the Link Map are now in the same order as the methods we listed in the BinaryOptimization.order file!

Xcode's linker, ld, simply ignores the nonexistent method -[ViewController someMethod].

If the link option -order_file_statistics is provided, the missing symbols will be printed in the log as warnings.

2.3 Binary rearrangement in practice

To actually implement binary rearrangement, we need to collect the symbols of all the methods, functions, and so on used at startup, preserve their order, and write them to the order file.

Douyin published an article on its engineering practice, "A solution based on binary file rearrangement improves APP startup speed by more than 15%", but the article also mentions bottlenecks:

The solution based on static scanning + runtime trace still has a few bottlenecks:

  • initialize cannot be hooked
  • Some blocks cannot be hooked
  • Indirect C++ function calls through registers cannot be found by static scanning

The current rearrangement scheme covers 80% to 90% of symbols. In the future, we will try other schemes such as compile-time instrumentation to cover 100% of symbols, so that the rearrangement achieves the optimal effect.

The article thus also points to a solution: compile-time instrumentation.

2.3.1 Clang instrumentation

It is in fact a code coverage tool; more information can be found on the official Clang website.

In Build Settings, add -fsanitize-coverage=trace-pc-guard to Other C Flags. Building now reports:

Undefined symbol: ___sanitizer_cov_trace_pc_guard_init
Undefined symbol: ___sanitizer_cov_trace_pc_guard

The official documentation says we need to add two functions:

#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
// if(*guard)
// __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
// __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  // store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

We add this code to ViewController.m. We don't need extern "C", so it can be deleted. __sanitizer_symbolize_pc() still gives an error, so we comment it out for now.

#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // void *PC = __builtin_return_address(0);
  char PcDescr[1024];  // Left uninitialized while the symbolizer call is commented out, so the text printed after "PC" is garbage.
  // __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

The function __sanitizer_cov_trace_pc_guard_init counts the functions and methods. After running, we can see:

INIT: 0x104bed670 0x104bed6b0

(lldb) x 0x104bed670
0x104bed670: 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
0x104bed680: 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
(lldb) x 0x104bed6b0-0x4
0x104bed6ac: 10 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
0x104bed6bc: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Reading the memory shows something like a counter. The second address printed by INIT is the end position, and each guard is 4 bytes, so stepping back 4 bytes from the end should show the last guard.

In little-endian order, 10 00 00 00 is 0x00000010, which is 16. Let's add some functions and methods to the ViewController:

void (^block)(void) = ^(void) {

};

void test(void)
{
    block();
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    test();
}

Print it again:

(lldb) x 0x10426d6dc-0x4
0x10426d6d8: 13

The last guard is now 0x13, which is 19, up from 16. We added three functions (a block is an anonymous function), and the counter, which counts functions and methods, went up by 3.
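
The counter arithmetic can be double-checked with a short C sketch. It assumes, as the memory dumps suggest, that guards are 4-byte values stored back to back between the start and stop addresses printed by INIT; the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 4-byte guards between the start and stop addresses printed by INIT. */
static uint64_t guard_count(uint64_t start, uint64_t stop) {
    assert(stop >= start && (stop - start) % sizeof(uint32_t) == 0);
    return (stop - start) / sizeof(uint32_t);
}

/* Decode four bytes, in the order lldb's memory read shows them,
   as a little-endian 32-bit value. */
static uint32_t decode_le32(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For the first run, guard_count(0x104bed670, 0x104bed6b0) gives 16, which matches the last guard 10 00 00 00 decoding to 16; after adding the three functions, the last guard 13 00 00 00 decodes to 19.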

Let’s click on the screen again:

guard: 0x1007196ac 8 PC 
guard: 0x1007196a8 7 PC 
guard: 0x1007196a4 6 PC Hq

Each tap produces three prints. We set a breakpoint in touchesBegan:withEvent: and turn on the disassembly view (menu bar Debug → Debug Workflow → Always Show Disassembly).

If we look at other functions, we see similar calls in their assembly code.

In other words, Clang instrumentation inserts a call to __sanitizer_cov_trace_pc_guard into the assembly code.

Once we can capture all the symbols, we need to save them. A plain array won't do, because some code may run on a background thread and an array would have thread-safety problems. Here we use an atomic queue:

#import <libkern/OSAtomic.h>
#import <dlfcn.h>
#import <stdlib.h>

/* Atomic queue properties: 1. last in, first out; 2. thread-safe; 3. can only store structs */
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Node structure for the symbol list
typedef struct {
    void *pc;
    void *next;
} SymbolNode;

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.

    // When a function is called, the address of the instruction to run after it returns is saved in a register.
    // Here we read that return address; it lies inside the instrumented function that called this callback.
    void *PC = __builtin_return_address(0);

    SymbolNode *node = malloc(sizeof(SymbolNode));
    *node = (SymbolNode){PC, NULL};
    // Enqueue
    OSAtomicEnqueue(&symbolList, node, offsetof(SymbolNode, next));

    // The printing below is only for inspection; comment it out in practice.
    // dlopen goes from a dynamic library to the addresses of its functions;
    // dladdr goes from an address to the symbol that contains it.
    // Dl_info is declared in <dlfcn.h> as:
    // typedef struct dl_info {
    //     const char      *dli_fname;     /* Pathname of shared object */
    //     void            *dli_fbase;     /* Base address of shared object */
    //     const char      *dli_sname;     /* Name of nearest symbol */
    //     void            *dli_saddr;     /* Address of nearest symbol */
    // } Dl_info;
    Dl_info info;
    dladdr(PC, &info);
    printf("fnam:%s \n fbase:%p \n sname:%s \n saddr:%p \n",
           info.dli_fname,
           info.dli_fbase,
           info.dli_sname,
           info.dli_saddr);
}

After running, there is a lot of output. Taking just one entry, it is clear that sname is the symbol name we need.

fnam:/private/var/containers/Bundle/Application/3EAE3817-0EF7-4892-BC55-368CC504A568/BinaryOptimization.app/BinaryOptimization 
 fbase:0x100938000 
 sname:+[AppDelegate load] 
 saddr:0x10093d81c 

Now we export the symbols we need by tapping the screen. Note that C functions and Swift methods need a leading underscore. (You can confirm this in the Link Map file mentioned earlier.)

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event
{
    NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
    while (YES) {
        SymbolNode *node = OSAtomicDequeue(&symbolList, offsetof(SymbolNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info;
        dladdr(node->pc, &info);

        NSString *name = @(info.dli_sname);
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["]; // Objective-C methods are left as they are
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name]; // C functions and Swift methods get a leading underscore
        [symbolNames addObject:symbolName];
        printf("%s \n", info.dli_sname);
    }

    // The queue is last in, first out, so walk it in reverse and drop duplicates
    NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
    NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    NSString *name;
    while (name = [emt nextObject]) {
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    // Remove the current method, because the tap handler is not needed at startup
    [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];

    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"BinaryOptimization.order"];
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    // Create the file at the path
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];

    NSLog(@"%@", filePath);
}
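
The core of the export step (reverse the last-in-first-out order, keep the first occurrence of each symbol, and prefix non-Objective-C names with an underscore) can be sketched in portable C. This is a single-threaded illustration that takes the already dequeued names as a plain array instead of using OSAtomicDequeue and dladdr; build_order and MAX_SYMBOLS are names made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* Build order-file text from symbol names in dequeue order (most recent call first).
   Walks the list backwards to restore call order, keeps only the first occurrence
   of each symbol, and prefixes names that are not Objective-C methods with '_'.
   Returns the number of lines written to 'out'. */
static size_t build_order(const char *popped[], size_t n, char *out, size_t out_size) {
    const char *kept[MAX_SYMBOLS];
    size_t kept_n = 0, used = 0;
    assert(n <= MAX_SYMBOLS && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = n; i-- > 0;) {
        const char *name = popped[i];
        int dup = 0;
        for (size_t k = 0; k < kept_n; k++) {
            if (strcmp(kept[k], name) == 0) {
                dup = 1;
                break;
            }
        }
        if (dup) {
            continue;
        }
        kept[kept_n++] = name;
        int is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
        int w = snprintf(out + used, out_size - used, "%s%s\n", is_objc ? "" : "_", name);
        assert(w > 0 && (size_t)w < out_size - used);  /* the output buffer must be large enough */
        used += (size_t)w;
    }
    return kept_n;
}
```

For example, if the dequeued names are -[SceneDelegate window], -[SceneDelegate window], main, +[AppDelegate load], the resulting text lists +[AppDelegate load], then _main, then -[SceneDelegate window], one per line.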

Tapping the screen at this point reveals a big pitfall: the console keeps printing in an infinite loop:

-[ViewController touchesBegan:withEvent:] 
-[ViewController touchesBegan:withEvent:] 
...

We set a breakpoint inside the while loop:

It turns out that this method contains as many as 10 calls to __sanitizer_cov_trace_pc_guard. Each pass through the loop triggers an enqueue in __sanitizer_cov_trace_pc_guard while the loop itself dequeues, so the queue never empties and the loop never ends.

Solution: change the flag in Other C Flags to

-fsanitize-coverage=func,trace-pc-guard

so that only the entry of each function is instrumented.

Run it again and tap the screen, and the problem is gone.
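
Why the loop never ends can be shown with a deliberately simplified model in C. It assumes that, with edge-level instrumentation, each pass through the loop body triggers one guard callback that enqueues a node before the loop dequeues one; passes_to_drain is a hypothetical name and the real queue behavior is reduced to a counter.

```c
#include <assert.h>

/* Model of the export loop: 'queued' nodes are waiting and each pass dequeues one.
   If the loop body is itself instrumented, every pass first triggers the guard
   callback, which enqueues a fresh node.
   Returns the number of dequeues needed to empty the queue, or -1 if the queue
   is still not empty after max_passes. */
static long passes_to_drain(unsigned long queued, int loop_instrumented, long max_passes) {
    assert(max_passes > 0);
    for (long pass = 0; pass < max_passes; pass++) {
        if (loop_instrumented) {
            queued++;       /* guard callback enqueues a node */
        }
        if (queued == 0) {
            return pass;    /* dequeue returned NULL: the loop exits */
        }
        queued--;           /* dequeue one node */
    }
    return -1;
}
```

In this model, passes_to_drain(5, 0, 1000) returns 5, while passes_to_drain(5, 1, 1000) returns -1: with the loop instrumented, the queue length never reaches zero, which is the behavior the func option avoids by instrumenting only function entries.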

2.3.2 Getting the order file from a real device

We saved the order file in the tmp folder on the real device. How do we get it out?

Open Window → Devices and Simulators (⇧+⌘+2) and download the app's container; the order file is in its tmp folder.

2.3.3 Swift

Can Swift also be rearranged? Of course you can!

Let's add a Swift class to the project and call it from viewDidLoad:

class SwiftTest: NSObject {
    @objc public class func swiftTestLoad() {
        print("swiftTest")
    }
}

- (void)viewDidLoad {
    [super viewDidLoad];
    [SwiftTest swiftTestLoad];
}

In Build Settings, add the following to Other Swift Flags:

-sanitize-coverage=func
-sanitize=undefined

After running, click on the screen to view the console:

-[ViewController touchesBegan:withEvent:]
-[SceneDelegate window]
-[SceneDelegate window]
-[SceneDelegate sceneDidBecomeActive:]
-[SceneDelegate sceneWillEnterForeground:]
// The following 4 are Swift symbols
$ss5print_9separator10terminatoryypd_S2StFfA1_
$ss5print_9separator10terminatoryypd_S2StFfA0_
$s18BinaryOptimization9SwiftTestC05swiftD4LoadyyFZ
$s18BinaryOptimization9SwiftTestC05swiftD4LoadyyFZTo
-[ViewController viewDidLoad]
-[SceneDelegate window]
-[SceneDelegate window]
-[SceneDelegate window]
-[SceneDelegate scene:willConnectToSession:options:]
-[SceneDelegate window]
-[SceneDelegate window]
-[SceneDelegate setWindow:]
-[SceneDelegate window]
-[AppDelegate application:didFinishLaunchingWithOptions:]
main
2020-04-24 13:08:43.923191+0800 BinaryOptimization[459:65420] /private/var/mobile/Containers/Data/Application/DA2EC6F0-93C9-45A0-9D95-C21883E0532C/tmp/BinaryOptimization.order

Once everything is done, set Write Link Map File back to NO and delete the Other C Flags/Other Swift Flags configuration.

This configuration makes the compiler automatically insert calls to __sanitizer_cov_trace_pc_guard into our code. They are no longer needed once the order file has been collected, so remove them. Also remove the __sanitizer_cov_trace_pc_guard code from ViewController.

2.3.4 Comparison before and After Binary Rearrangement

After applying this in the project and testing:

  • Before binary rearrangement, File Backed Page In (the Page Fault count) is 2569, taking 298 ms.

  • After binary rearrangement, File Backed Page In (the Page Fault count) is 2311, taking 248 ms.

As you can see, binary rearrangement reduced the number of Page Faults and cut the total time from 298 ms to about 248 ms, saving about 50 ms of startup time.
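
For readers who want the relative figures, the reductions above can be expressed as percentages with a one-line helper. This is only arithmetic on the numbers reported in this section; percent_reduction is a name chosen for the example.

```c
#include <assert.h>

/* Percentage reduction when going from 'before' to 'after'. */
static double percent_reduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Going from 2569 to 2311 Page Faults is a reduction of roughly 10%, and going from 298 ms to 248 ms is a reduction of roughly 17%.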

3. Summary

  1. By converting dynamic libraries to static libraries, we optimized dylib loading time.
    • Apple's official recommendation is fewer than six; here we keep only three dynamic libraries, because of symbol conflicts.
  2. With binary rearrangement, the methods needed at startup are packed more tightly, which reduces the number of Page Faults.
    • When collecting the symbol table, Clang instrumentation hooks Objective-C methods, Swift methods, C functions, and blocks directly and without distinction. Compared with Douyin's earlier approach, it is indeed much simpler and has a lower barrier to entry.

Important:

Can third-party Pod libraries be added to the order file? Yes!

The binary rearrangement in this article does include the symbols that third-party libraries need at startup. The article doesn't single them out, but the principle is the same.

4. Addendum

After the conversion to static libraries, the third-party libraries are merged into the main project's Mach-O file.

  1. [NSBundle bundleForClass:[self class]] used inside a third-party library now behaves the same as [NSBundle mainBundle].
  2. As a result, a library's bundle may fail to be found (we are working on this).

The resource problem has been tracked down and a solution found. I'll tidy it up and publish it when I have time. Thank you for your support.

How I increased the startup speed of Weibo Oasis by 30% (2)

If you found this article helpful, give me a thumbs up