Preface

  • Since the Douyin team published "The practice of Douyin: improving app startup speed by more than 15% with binary file rearrangement", binary rearrangement as a pre-main startup optimization has become widely known.

  • This article first explains the principle behind binary rearrangement (the principle section of the Douyin article stops at a high level, so many readers come away without really understanding it). It then introduces and practices how to solve, with clang instrumentation, the problem the Douyin team left open:

    hooking objc_msgSend cannot cover pure Swift methods, blocks, or C++ functions,

    and thereby arrive at a more complete binary rearrangement scheme.

(This article is written from first principles; readers already familiar with the basics may find the pace slow and can skip ahead using the table of contents.)

Before we get to binary rearrangement, we need a little background on what kind of problem it is meant to solve.

Virtual memory vs. physical memory

Rather than repeating textbook definitions, let's look at this concept in terms of the actual problems it solves.

In computing, any technology or concept is born to solve a practical problem.

In the early days of computers there was no virtual memory: every application was loaded from disk into physical memory in full, and processes were laid out one after another.

This led to two problems:

Problems with using physical memory directly

  • Security: processes use real physical addresses and are arranged sequentially in memory, so process 1 can reach the memory of other processes simply by offsetting an address.
  • Efficiency: as software grew, a single application needed more and more memory, yet users rarely use all of an application's features. Loading everything wastes a great deal of memory, and processes opened later often have to wait.

In order to solve the above two problems, virtual memory came into being.

How virtual memory works

With virtual memory, the large contiguous memory space a process thinks it owns — say, addresses from 0x000000 to 0xFFFFFF — is actually virtual. Each such address is only a virtual address; it is translated through a mapping table to obtain the real physical address.

What does that mean?

  • In effect, the system restricts access to real physical memory: only addresses recorded in the mapping table can be accessed.
  • For example, any virtual address in the range 0x000000 ~ 0xFFFFFF can be accessed, but the physical memory page it maps to is assigned by the operating system and can be anywhere.
  • This mentions paging of physical memory, which is described in more detail below.

This is also why one process cannot read the data of another process: virtual memory is what enforces the isolation.

Here’s a diagram of how virtual memory works:

Virtual memory solves interprocess security problems

Obviously, with virtual memory in place, a process can no longer reach another process's memory simply by offsetting an address.

Each process has its own mapping table, so however your process accesses its own addresses, the translation always lands inside the physical pages the table allows; there is no way to stray into the memory space of another process.

In practice, because of paging, lazy loading, and the randomization introduced by ASLR, the physical memory actually allocated each time an application is loaded is neither fixed nor contiguous.

CPU addressing process

After virtual memory is introduced, the CPU accesses data through a virtual address as follows:

  • Based on the virtual address, find the mapping table of the corresponding process.
  • Look up the corresponding real physical address in that table, then fetch the data.

This process is called address translation, and it is carried out jointly by the operating system and the MMU, a hardware unit integrated into the CPU.
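To make the translation concrete, here is a toy sketch in C of the arithmetic involved (this is not how a real MMU works internally; the page size and mapping table below are made-up values purely for illustration): a virtual address is split into a page number and an in-page offset, the page number is looked up in the mapping table, and the offset is carried over unchanged.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 0x4000              // assume 16KB pages, as on iOS
#define PAGE_MASK (PAGE_SIZE - 1)

// Toy mapping table: virtual page index -> physical page index.
static uint64_t page_table[4] = {7, 2, 9, 0};

static uint64_t translate(uint64_t vaddr) {
    uint64_t vpage  = vaddr / PAGE_SIZE;   // which virtual page
    uint64_t offset = vaddr & PAGE_MASK;   // position inside the page
    return page_table[vpage] * PAGE_SIZE + offset;
}

int main(void) {
    // Virtual page 1 maps to physical page 2 in the toy table above.
    printf("0x%llx -> 0x%llx\n", 0x4010ULL, (unsigned long long)translate(0x4010));
    return 0;
}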

So with the security problem solved, how is the efficiency problem solved?

Virtual memory solves efficiency problems

As just mentioned, virtual memory and physical memory are connected through a mapping table, but the mapping cannot be one entry per address — such a table would itself waste enormous amounts of memory. For efficiency, real physical memory is divided into pages, and the mapping table also works page by page.

In other words, the mapping table only maps to a page, not to each address.

On Linux, the size of a page is 4KB, which may vary from platform to platform.

  • On macOS, one page is 4KB.
  • On iOS, one page is 16KB.

We can view it directly using the pagesize command.
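You can also query it programmatically; a minimal sketch (run it in any app or command-line tool; on a modern iOS device it prints 16384):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Same value that the `pagesize` terminal command prints.
    printf("page size: %d bytes\n", getpagesize());
    return 0;
}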

So why does paging solve the efficiency problem of memory waste?

Principle of memory paging

Suppose there are two processes currently running, and their states look like the following:

As the figure above shows, the physical memory backing a process is neither contiguous nor complete.

The 0s and 1s on the left of the mapping table indicate whether the corresponding page is currently resident in physical memory. Why put it that way?

  • When an application is launched, it is not loaded into memory in its entirety; this is lazy loading. In other words, physical memory only holds as much of the application as has actually been used.

  • When the application accesses an address whose mapping-table entry is 0 — that is, a page not yet loaded into physical memory — the system immediately blocks the process and triggers what is known as a page fault.

  • When a page fault is triggered, the operating system reads the missing page from disk into physical memory and updates the mapping table so the virtual page points to it. (If physical memory is already full, the OS picks a page to evict using a page-replacement algorithm. This is why opening ever more applications does not crash the device, and also why a previously opened application sometimes has to restart when you switch back to it.)

With this paging and eviction mechanism, the memory-waste and efficiency problems are solved nicely.

But now, there is a problem.

Q: With virtual memory as described so far, a given function sits at the same place in virtual memory no matter how many times the application runs.

What does that mean?

Suppose the application has a function whose offset from the start address is 0x00a000, and the virtual address space begins at 0x000000. Then no matter how many times the app runs, the virtual address 0x00a000 always leads to the real implementation of that function.

This gives attackers plenty of room to maneuver: they can work out the location of a known function in advance and hook or modify it.

To solve this problem, ASLR was introduced. Every time the program is loaded, a random offset (slide) is added when virtual addresses are mapped to real ones, so the fixed-address assumption just described no longer holds.

(Android 4.0, Apple iOS 4.3, and OS X Mountain Lion 10.8 all introduced ASLR across the board. Since its introduction the bar for attackers has been raised considerably; it is no longer an era in which anyone can be a hacker.)
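You can observe the slide for yourself. A minimal sketch using dyld's API (image index 0 is normally the main executable; call this anywhere after launch):

#import <Foundation/Foundation.h>
#import <mach-o/dyld.h>

void printASLRSlide(void) {
    // The random offset ASLR applied to the main executable for this launch.
    intptr_t slide = _dyld_get_image_vmaddr_slide(0);
    NSLog(@"ASLR slide of %s: 0x%lx", _dyld_get_image_name(0), (long)slide);
    // Consequently the same function lands at a different virtual address on every launch.
    NSLog(@"address of printASLRSlide: %p", (void *)&printASLRSlide);
}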

Now that we have covered physical memory, virtual memory, and memory paging, let's move on to the focus of this article: binary rearrangement.

Binary rearrangement

Overview

Now that we know that accessing an unloaded page triggers a page fault and that a page fault blocks the process, it is clear this can hurt performance.

In fact, for production (App Store) builds, iOS also performs a signature check every time a page is loaded in because of a page fault, so page faults in a production environment are even more time-consuming.

The Douyin team reported a cost of 0.6 to 0.8 ms per page fault. My own tests show that it varies from page to page and with CPU load, roughly in the range 0.1 to 1.0 ms.

The first thing a user perceives about an app is how long it takes to start. Because startup has to load and run a large number of classes, categories, third-party libraries and so on, the accumulated cost of the many page faults it triggers is not negligible. That is why binary rearrangement is worthwhile for startup optimization.

Binary rearrangement optimization principle

Suppose at startup we need to call two functions, method1 and method4. The position of a function in the Mach-O is determined by the link order of ld (Xcode's linker), not by the call order, so the two functions may well end up on different memory pages.

At startup, both Page1 and Page2 then have to be loaded into physical memory, triggering two page faults.

Binary rearrangement places method1 and method4 on the same memory page, so that only Page1 needs to be loaded at startup and only one page fault is triggered.

In a real project, the practice is to gather the functions that are called during startup together (for example, into the first ten or so pages) so as to minimize startup page faults. That is binary rearrangement.

At this point many readers probably can't wait to see how to do it. The rearrangement itself is easy, but there are a few things to sort out first:

  • How to measure page faults, so we can see the effect before and after optimization.

  • How to do the binary rearrangement.

  • How to verify that the rearrangement succeeded.

  • How to find all the methods called during startup:

    • hook objc_msgSend (only covers OC methods and Swift methods marked @objc dynamic);
    • statically scan the Mach-O for symbols and function data stored in specific sections (mainly used to obtain +load methods and C++ constructors; see the article "Comb through the dyld loading process from scratch" for a detailed walkthrough);
    • clang instrumentation (the complete solution: captures Swift, OC, C and block functions alike).

It’s a lot of stuff. Let’s take it one by one.

How to view page faults

Tip:

If you want to see a realistic page-fault count, uninstall the app and measure the very first launch after reinstalling, or open a number of other applications first.

If you have run the app before, part of it is already in physical memory and recorded in the mapping table, so relaunching — even after killing the process — triggers fewer page faults.

The point is to make sure the previously loaded physical pages have been overwritten or evicted, reducing the measurement error.

  • 1️⃣: Open Instruments and select System Trace.
  • 2️⃣: Choose a real device and your app, click start, and click stop once the first screen has rendered. It is best to delete and reinstall the app here: because of how the process is kept around, killing the app in the background and relaunching is not necessarily a true cold start.
  • 3️⃣: Wait for the analysis to finish, then check the number of page faults:
    • Relaunch after killing the app in the background
    • First launch of the application

You can also cross-check by adding the DYLD_PRINT_STATISTICS environment variable to the scheme to see the total time spent in the pre-main phase.

You can try the following scenarios to get a better feel for cold versus hot starts and for physical pages being evicted:

  • First launch after installing the application
  • Launch when the app is not already running in the background
  • Kill the app in the background and relaunch
  • Relaunch without killing the background process
  • Kill the app, open several other applications, then launch again

How does binary rearrangement work

With all that said, it's time to actually do the rearrangement. It is quite simple: Xcode already provides the mechanism, and libobjc itself is in fact optimized with binary rearrangement.

Refer to the figure below.

  • First of all, Xcode links with ld, and ld has a parameter called Order File through which we can supply the path to an order file.
  • In that order file, write the symbols you need, in the order you need them.
  • When the project builds, Xcode reads the file and lays the symbols out in the generated Mach-O in exactly that order (see the example below).
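For example, assuming the order file is named lb.order and sits in the project root (both are just the conventions used later in this article), the build setting would look like this; under the hood Xcode passes it to ld as -order_file:

Build Settings -> Linking -> Order File
$(SRCROOT)/lb.order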

Binary rearrangement questions – a digression:

  • 1️⃣: Will there be a problem if a symbol in the order file is wrong or does not exist?

    • A: ld simply ignores such symbols. If you pass the linker option -order_file_statistics, the missing symbols are printed as warnings in the build log.
  • 2️⃣: Some readers may wonder whether this approach affects App Store review or release.

    • A: First of all, the objc source code itself uses this approach.
    • Binary rearrangement merely changes the order of functions and symbols in the generated Mach-O; nothing else.

How to view the symbol order of my project

Before and after rearranging we need to check whether the symbol order has actually changed, and for that we use the Link Map.

The Link Map is a text file generated during compilation that records the layout of the binary. The build setting Write Link Map File controls whether it is produced; the default is No.

Set it to Yes, clean and build, then Products -> Show in Finder and go up a couple of directory levels from the Mach-O product into Intermediates.noindex, where you will find a text file named like <App>-LinkMap-normal-arm64.txt.

This file records the order of all symbols in its # Symbols: section (ignore the .o entries; the LLVM compiler chapter covers them in more detail).
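A hypothetical, abbreviated excerpt of what that section looks like (the addresses and sizes are made up for illustration):

# Symbols:
# Address	Size    	File  Name
0x100004A70	0x0000003C	[  2] -[ViewController viewDidLoad]
0x100004AAC	0x00000028	[  2] -[ViewController testOCFunc]
0x100004AD4	0x00000034	[  3] +[LBOCTools lbGetCurrentTimes]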

If you open your own Link Map, you will see that the symbol order simply follows the file order in Compile Sources (Build Phases).

Tip:

The leftmost address in this file is the address of the actual code, not merely a symbol-table entry. So binary rearrangement does not just shuffle a symbol table: by changing the symbol order it changes the offsets of the code itself within the file, moving the methods needed at startup onto the front memory pages. That is what reduces page faults and saves time; keep this in mind.

To help you see this, you can use MachOView to compare the __TEXT,__text section before and after rearrangement.

Let's practice

Go to the project root directory and create a new file with touch lb.order. Pick a few methods that are called at startup, for example the ones below.

-[LBOCTools lbCurrentPresentingVC]
+[LBOCTools lbGetCurrentTimes]
+[RSAEncryptor stripPublicKeyHeader:]

Write them into the file, save it, and configure the file path in the Order File build setting.

Rebuild and look at the Link Map again.

As you can see, the three methods we wrote down now come first: they are the code with the smallest offset from the start of the Mach-O. Assuming they originally sat on three different pages, we have just saved two page faults.

Error message

If the build complains that it can't open the order file, it is a file-format problem: don't create the file with a Mac text editor, create it from the command line with touch.

Getting the symbols of everything called at startup

At this point only one problem remains: how do we know which methods are called when the project starts? We touched on this briefly above.

  • hook objc_msgSend (only covers OC methods and Swift methods marked @objc dynamic; and because of the variable argument list, assembly is needed to fetch the parameters);
  • statically scan the Mach-O for symbols and function data stored in specific sections (mainly used to obtain +load methods and C++ constructors; see the article "Comb through the dyld loading process from scratch" for a detailed walkthrough);
  • clang instrumentation (the complete solution: captures Swift, OC, C and block functions alike).

In this article we hook the symbols of all functions by means of compile-time instrumentation.

Clang instrumentation

The official reference is clang's SanitizerCoverage documentation, which gives a detailed overview of the code-coverage instrumentation along with a short demo.

The idea

There are two main ways to implement clang instrumentation. One is to write a clang plugin ourselves (a later LLVM article will walk you through writing your own plugin). The other is to use a mechanism clang already provides to collect all the symbols we need. This article follows the second approach.

Exploring the principle

Create a new project to experiment with the mechanism and principle of the static instrumentation code-coverage tool.

Follow the instructions in the document.

  • First, add the compiler setting.

In Build Settings -> Apple Clang - Custom Compiler Flags -> Other C Flags, add:

-fsanitize-coverage=trace-pc-guard
  • Then add the hook code:
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.

  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  //__sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

This goes into the ViewController.m of an empty project.

  • Run the project and look at the output.

The code names the two pointer addresses printed after INIT start and stop, so let's use LLDB to inspect what is stored in the memory range from start to stop.
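If you prefer not to poke around in LLDB, a quick alternative sketch is to dump the values from code right after they are initialized (for inspection only, not part of the final solution):

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                         uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.

    // Inspection: print every guard slot we just numbered.
    for (uint32_t *x = start; x < stop; x++)
        printf("guard slot %ld = %u\n", (long)(x - start), *x);
}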

We find that the numbers stored there run from 1 to 14. So let's add an OC method:

- (void)testOCFunc{
    
}

Run it again.

This time the values stored in the range run from 1 to 15 (0x0f) instead of 1 to 14 (0x0e): adding one method added one guard slot.

So let’s add another C function, a block, and a touch screen method.

The count likewise rises to 18, so our guess is that this memory range holds one slot for every symbol in the project.

Next, we call the C function from the touch method, and the C function calls the block. When we tap the screen we see this:

Each method we actually execute prints a guard line.

This is very similar to how event-tracking instrumentation works. Add a breakpoint in the touch method and look at the assembly:

In the assembly we find that at the start of every function, right after stack balancing and register preparation, a bl call to __sanitizer_cov_trace_pc_guard has been inserted.

This is the principle of static instrumentation, and where the name comes from.

Summary of static instrumentation

Static instrumentation inserts hook code (our __sanitizer_cov_trace_pc_guard function) into the binary of every function at compile time, which gives us a hook on every method globally.

A possible doubt

Some readers may question the summary above:

Is the binary really modified so that assembly calling the hook is added inside every function, or does the compiler merely add some flag to the binary so that at runtime an extra step calls the hook whenever the flag is present?

I used Hopper to inspect the generated Mach-O binary.

As the disassembly shows, the assembly that calls the extra hook really is inside each function from the start. That is why it is called static instrumentation.

Now that we roughly understand the principle, how do we obtain the symbol of each function?

Getting all function symbols

Let's sort out the idea.

The approach

We now know that __sanitizer_cov_trace_pc_guard is called as the first step of every function. Readers familiar with assembly may already have this idea:

When functions are nested, the jump into the callee saves the address of the next instruction in x30 (also known as the LR register).

For example, when function A calls function B — in ARM64 assembly a bl 0x**** instruction — bl first stores the address of the following instruction in x30 and then jumps to the target address (bl jumps by changing the pc register to point at that address). Function B, for its part, saves x29/x30 on entry so that calls it makes to other functions do not overwrite x30 (leaf functions excepted).

When B executes ret, the return instruction, it reads the address in x30 and jumps there, returning to the caller.

So this is feasible. Recall this line we wrote in __sanitizer_cov_trace_pc_guard:

void *PC = __builtin_return_address(0); 

It reads the return address stored in x30 — hence the name __builtin_return_address. In other words, this is the address the current function will return to once it finishes executing.

The bt (backtrace) command is implemented with the same idea.

That is, inside __sanitizer_cov_trace_pc_guard we can use __builtin_return_address to obtain the address of the instruction, in the instrumented function, that follows the call to __sanitizer_cov_trace_pc_guard.

If that sounds convoluted, here is a picture to sort out the flow.

Getting the function name from a memory address

Now that we have an address inside the function, how do we get the function's name? Here is my approach.

To defeat fishhook-style hooks on particular functions, people sometimes dlopen a dynamic library, obtain a handle, and then take the memory address of the target function and call it directly.

That is the opposite direction of what we need — and the opposite direction works too.

Alongside dlopen, dlfcn.h also provides the following:

typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of the image containing the address */
        void            *dli_fbase;     /* Base address of that image */
        const char      *dli_sname;     /* Name of the nearest symbol */
        void            *dli_saddr;     /* Address of that symbol */
} Dl_info;

// This function finds the symbol containing a given address
int dladdr(const void *, Dl_info *);

To experiment, import the header #import <dlfcn.h> and modify the code as follows:

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.
    
    void *PC = __builtin_return_address(0);
    Dl_info info;
    dladdr(PC, &info);
    
    printf("fname=%s \nfbase=%p \nsname=%s\nsaddr=%p \n",info.dli_fname,info.dli_fbase,info.dli_sname,info.dli_saddr);
    
    char PcDescr[1024];
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

View the print result:

We finally see the symbols we are looking for.


Collecting the symbols

Many readers might now think: why not just run the project, grab all the symbols, write them into the order file, and be done with it?

Not so fast. Why?

Clang static instrumentation – pit 1

→ Multithreading

Since the methods in your project may run on different threads, __sanitizer_cov_trace_pc_guard will be called from multiple threads too. So you can't simply append every symbol to a plain array.

There are many ways to handle this; here is mine:

Since this hook is called very frequently and a lock would hurt performance, I use Apple's lock-free atomic queue (OSAtomicEnqueue/OSAtomicDequeue; despite the name it behaves as a last-in-first-out stack, with atomicity guaranteeing consistency) to collect the records.

// Define the symbol node structure
typedef struct {
    void * pc;
    void * next;
} SymbolNode;

// Lock-free atomic queue (despite the name, last in, first out)
static OSQueueHead symboList = OS_ATOMIC_QUEUE_INIT;

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.
    void *PC = __builtin_return_address(0);
    SymbolNode * node = malloc(sizeof(SymbolNode));
    *node = (SymbolNode){PC, NULL};

    // Enqueue
    // offsetof tells OSAtomicEnqueue where the `next` pointer lives inside the struct
    OSAtomicEnqueue(&symboList, node, offsetof(SymbolNode, next));
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{
    // Walk through the queue
    while (true) {
        // offsetof finds the offset of the `next` pointer relative to the struct
        SymbolNode * node = OSAtomicDequeue(&symboList, offsetof(SymbolNode, next));
        if (node == NULL) break;
        Dl_info info;
        dladdr(node->pc, &info);

        printf("%s \n", info.dli_sname);
    }
}

Excited to have the multithreading handled, you finish writing it, run it, and find:

It loops forever.

Clang static instrumentation – pit 2

→ The instrumentation above also inserts hook calls inside loops.

Once we are sure the enqueue/dequeue logic and our own save-and-read code are fine, we hit this pit: an infinite loop. Why?

I won't walk through the assembly here; the conclusion is:

In the assembly you can see that a method containing a while loop gets the call to __sanitizer_cov_trace_pc_guard inserted more than once (one guard per branch/basic block). Our dequeue loop in touchesBegan is itself instrumented, so every iteration enqueues another node and the queue never drains — an infinite loop.

→ Solution

Change Other C Flags to the following:

-fsanitize-coverage=func,trace-pc-guard

The func option means only the entry of each function is instrumented. Run again.

Think we're done? Not yet.

Pit 3: load method

→ For +load methods, the guard parameter passed to __sanitizer_cov_trace_pc_guard is 0.

The output above contains no load methods.

Solution: comment out this check in __sanitizer_cov_trace_pc_guard:

if (!*guard) return;

Now the load methods show up as well.

Here’s a hint:

If you want to start or stop collecting after/before some particular function, you can use a global static variable, flip its value at the right moment, and check it inside __sanitizer_cov_trace_pc_guard.
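A minimal sketch of that idea (the variable name and the place where it is flipped are assumptions; pick whatever boundary fits your startup path, e.g. viewDidAppear: of the first screen):

// Stop collecting once the chosen point in startup has been reached (assumption).
static BOOL stopCollecting = NO;

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (stopCollecting) return;   // ignore everything after the chosen point
    void *PC = __builtin_return_address(0);
    SymbolNode * node = malloc(sizeof(SymbolNode));
    *node = (SymbolNode){PC, NULL};
    OSAtomicEnqueue(&symboList, node, offsetof(SymbolNode, next));
}

// e.g. in the first view controller:
// - (void)viewDidAppear:(BOOL)animated {
//     [super viewDidAppear:animated];
//     stopCollecting = YES;
// }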

Remaining cleanup work

  • If you use my multithreaded approach, the symbols come out in reverse order (the atomic queue is last in, first out) and need to be reversed.
  • They also need to be deduplicated.
  • The order file format requires that C functions and blocks have an _ (underscore) prefix added to their symbol names.
  • Finally, write the result to a file.

The complete code of the author’s demo is as follows:

#import "ViewController.h"
#import <dlfcn.h>
#import <libkern/OSAtomic.h>
@interface ViewController(a)
@end

@implementation ViewController
+ (void)load{
    
}
- (void)viewDidLoad {
    [super viewDidLoad];
    testCFunc();
    [self testOCFunc];
}
- (void)testOCFunc{
    NSLog("@" oc function);
}
void testCFunc(){
    LBBlock();
}
void(^LBBlock)(void) = ^ (void) {NSLog(@"block");
};

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                         uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.
}
- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{
    NSMutableArray<NSString *> * symbolNames = [NSMutableArray array];
    while (true) {
        //offsetof is to find the offsetof a property relative to a structure
        SymbolNode * node = OSAtomicDequeue(&symboList, offsetof(SymbolNode, next));
        if (node == NULL) break;
        Dl_info info;
        dladdr(node->pc, &info);
        
        NSString * name = @(info.dli_sname);
        
        / / add _
        BOOL isObjc = [name hasPrefix:@ "+ ["] || [name hasPrefix:@"-["];
        NSString * symbolName = isObjc ? name : [@ "_" stringByAppendingString:name];
        
        / / to heavy
        if (![symbolNames containsObject:symbolName]) {
            [symbolNames addObject:symbolName];
        }
    }

    / / the not
    NSArray * symbolAry = [[symbolNames reverseObjectEnumerator] allObjects];
    NSLog(@ "% @",symbolAry);
    
    // Write the result to a file
    NSString * funcString = [symbolAry componentsJoinedByString:@"\n"];
    NSString * filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"lb.order"];
    NSData * fileContents = [funcString dataUsingEncoding:NSUTF8StringEncoding];
    BOOL result = [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    if (result) {
        NSLog(@ "% @",filePath);
    }else{
        NSLog(Error writing file); }}// Atomic queue
static OSQueueHead symboList = OS_ATOMIC_QUEUE_INIT;
// Define the symbolic structure
typedef struct{
    void * pc;
    void * next;
}SymbolNode;

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    //if (!*guard) return;  // Duplicate the guard check.
    
    void *PC = __builtin_return_address(0);
    
    SymbolNode * node = malloc(sizeof(SymbolNode));
    *node = (SymbolNode){PC,NULL};
    
    / / team
    // offsetof is used here to add the next node to the queue and find the position of the next pointer to the previous node
    OSAtomicEnqueue(&symboList, node, offsetof(SymbolNode, next));
}
@end
Copy the code

The file is written to the tmp path. Run the app, then download the app container from the device (Xcode -> Window -> Devices and Simulators) to view it:

With that done, you can use the generated order file to optimize your own project right away.

Swift projects / mixed projects

The method above obtains symbols for a pure OC project.

Because Swift uses its own compiler front end, the configuration is slightly different.

Search for Other Swift Flags and add two configurations:

  • -sanitize-coverage=func
  • -sanitize=undefined

Swift classes can then have their symbols collected through the same method.

CocoaPods projects

Libraries introduced through CocoaPods belong to different targets, so settings such as Write Link Map File, -fsanitize-coverage=func,trace-pc-guard, and Order File have to be applied to the targets you care about.

For SDKs imported into the project manually, whether static libraries (.a) or dynamic libraries, the main target's default settings apply and their symbols can be collected.

As a final note, a manually imported third-party library that is never referenced or used will not be loaded, so it contributes no symbols; the same goes for its load methods.

Some readers reported an error after adding the settings

After adding -fsanitize-coverage=func,trace-pc-guard to Other C Flags, you must also implement the hook functions; otherwise the build fails complaining that the __sanitizer_cov_trace_pc_guard symbols cannot be found.

Add the hook functions as shown in the "Clang instrumentation -> Exploring the principle" section of this article, or simply use the complete demo code.

Measuring the effect after optimization

The page-fault sample is again taken on the first launch after a complete fresh install, up to the first interactive screen, to keep the environment identical. The results before and after rearrangement are as follows.

  • Before optimization
  • After optimization

In a production environment, page faults additionally require signature verification, so the optimization effect in a release build is even larger.

Conclusion

Working through the process step by step, this article implemented the complete flow of using clang static instrumentation to do binary rearrangement and optimize startup time.

The specific implementation steps are as follows:

  • 1️⃣: use clang instrumentation to obtain the symbols of everything that needs to run at startup: functions/methods, blocks, Swift methods, and C++ constructors.
  • 2️⃣: implement the binary rearrangement through the Order File mechanism.

If you have any questions or different views, please leave a message.