This is the 27th day of my participation in the August Writing Challenge.

In the previous two articles, I introduced some basic concepts of launch optimization and where to start. This article focuses on one optimization for the pre-main stage: binary rearrangement.

1. Principle of binary rearrangement

As covered in the virtual memory section, when a process accesses a virtual memory page that has no corresponding physical memory yet, a page fault is triggered and the process blocks. The data must be loaded into physical memory before the access can proceed, and this costs time.

During an app's cold launch, a large number of classes, categories, and third-party libraries have to be loaded and executed, and the page faults they trigger can add up to a significant amount of time. Taking WeChat as an example, let's look at the number of page faults during launch:

  • Press Cmd+I to open Instruments and select System Trace

  • Click Start (restart the phone beforehand so that this is a true cold launch with no cached data), stop once the first screen appears, and follow the steps in the figure below

As the figure shows, WeChat incurred more than 2,800 page faults during launch, which clearly costs a lot of launch time.

  • Next, let's use a demo to see how methods are ordered at compile time. Define the following functions and methods, in this order, in ViewController:
```objc
#import "ViewController.h"

@implementation ViewController

void test1(){
    printf("1");
}

void test2(){
    printf("2");
}

- (void)viewDidLoad {
    [super viewDidLoad];
    
    test1();
}

+(void)load{
    printf("3");
    test2();
}
@end
```
  • In Build Settings, set Write Link Map File to YES

  • Press Cmd+B to build the demo, then open the link map file at the corresponding path, shown below. Within a file, functions are laid out from top to bottom in the order they are written, and files are laid out in the order listed under Build Phases -> Compile Sources

From the page-fault count and the layout order, we can see that the root cause of the many page faults is that the methods called at launch are scattered across different pages. The optimization idea is therefore to line up all the methods called at launch so they sit together on as few pages as possible (ideally a single page), turning many page faults into a few. This is the core principle of binary rearrangement, as shown below
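
To make the principle concrete, here is a small portable C sketch (not part of the original demo) that counts how many distinct pages a list of function addresses falls on; in this simplified model, the first touch of each page is one potential page fault. It assumes a 16 KB page size, as used on arm64 iOS devices, and the addresses are invented for illustration. It ignores pages that are already resident, so it gives an upper bound rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 16 KB, as on arm64 iOS devices. */
#define PAGE_SIZE_16K 0x4000u

/* Counts how many distinct pages the given addresses touch. In this model,
   the first access to each page costs one page fault. */
static size_t count_distinct_pages(const uint64_t *addrs, size_t n,
                                   uint64_t page_size) {
    size_t pages = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t page = addrs[i] / page_size;
        int seen = 0;
        /* Was this page already touched by an earlier address? */
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / page_size == page) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            pages++;
        }
    }
    return pages;
}
```

For example, four launch functions at 0x0000, 0x4000, 0x8000 and 0xC000 touch four pages, while the same four functions packed at 0x0000 to 0x0300 touch only one. That difference is exactly what binary rearrangement aims for.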

Note: in a production environment, iOS also performs signature verification on a page when a page fault loads it, so page faults in production take more time than in a Debug build.

2. Binary rearrangement practice

Now let's put this into practice. First, a few terms:

Link Map is an intermediate product of iOS compilation that records the layout of the binary. It must be enabled with Write Link Map File in Xcode's Build Settings. A Link Map consists of three parts:

  • Object Files: the paths and file numbers of the link units used to generate the binary
  • Sections: the address range of each segment/section in the Mach-O
  • Symbols: the address range of each symbol, listed in order
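
As an illustration of the Symbols section, here is a minimal C sketch that parses one symbol line. It assumes the usual layout of that section (an address, a size, a bracketed file number referring back to the Object Files list, then the symbol name); the sample address values are invented. LinkMapSymbol and parse_symbol_line are hypothetical names, not part of any Apple tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry of the "# Symbols:" section of a link map (assumed layout). */
typedef struct {
    unsigned long long address;
    unsigned long long size;
    int file_index;      /* refers to the [n] entries in "# Object files:" */
    char name[256];
} LinkMapSymbol;

/* Parses a line such as "0x100003F20\t0x00000020\t[  2] _test1".
   Returns 1 on success, 0 for headers or malformed lines. */
static int parse_symbol_line(const char *line, LinkMapSymbol *sym) {
    /* %llx accepts the 0x prefix; "[%d]" matches the bracketed file number;
       %255[^\n] keeps spaces, which OC method names contain. */
    return sscanf(line, "%llx %llx [%d] %255[^\n]",
                  &sym->address, &sym->size, &sym->file_index,
                  sym->name) == 4;
}
```

Reading the addresses this way lets you see which page each launch symbol lands on, which is the information binary rearrangement is trying to improve.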

ld

ld is the linker used by Xcode, and it has an order_file parameter. In Build Settings -> Order File we can configure the path to a file with the .order suffix. In this order file we write the symbols in the order we want them laid out; when the project is linked, ld arranges them in that order, which achieves our optimization.

So the essence of binary rearrangement is to reorder the symbols that are loaded at launch.
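
To show what this reordering means, here is a simplified C model of how an order file affects layout. It assumes the behaviour described above: symbols listed in the order file come first, in the listed order, and every other symbol keeps its original relative order. Real ld64 applies the order per section and has additional rules, so apply_order_file is a hypothetical illustration, not the linker's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if name appears in list[0..count). */
static bool contains(const char *const *list, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* symbols: original layout (e.g. Compile Sources order), assumed unique.
   order:   lines of the order file.
   out:     must have room for n entries; receives the new layout.
   Returns the number of entries written. */
static size_t apply_order_file(const char *const *symbols, size_t n,
                               const char *const *order, size_t m,
                               const char **out) {
    size_t k = 0;
    /* 1. Listed symbols first, in order-file order. Repeated lines and
          names that do not exist in the binary are skipped. */
    for (size_t i = 0; i < m; i++) {
        if (contains(order, i, order[i])) continue;  /* repeated line */
        if (contains(symbols, n, order[i])) out[k++] = order[i];
    }
    /* 2. Everything not listed keeps its original relative order. */
    for (size_t j = 0; j < n; j++) {
        if (!contains(order, m, symbols[j])) out[k++] = symbols[j];
    }
    return k;
}
```

With the demo's layout (_test1, _test2, -[ViewController viewDidLoad], +[ViewController load], _main) and an order file listing +[ViewController load], _test2, -[ViewController viewDidLoad], _test1, the launch path moves to the front and _main follows.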

For a small project, you can write an order file by hand and list the methods yourself. But in a large project with many methods, how do we find the functions that run at launch? There are several approaches:

  • 1. Hook objc_msgSend: calling an OC method is essentially sending a message, which ends up in objc_msgSend. Because objc_msgSend takes variable arguments, the hook has to be written in assembly, which is demanding for developers. It also only captures OC methods and Swift methods exposed with @objc
  • 2. Static scanning: scan the symbol and function data stored in specific segments and sections of the Mach-O
  • 3. Clang instrumentation: a batch hook that can achieve 100% symbol coverage, capturing all Swift, OC, C, and block functions

3. Clang instrumentation

LLVM has simple built-in code coverage instrumentation. It inserts calls to user-defined functions at the function, basic-block, and edge level. For our batch hook we use SanitizerCoverage.

Clang's official SanitizerCoverage documentation gives a detailed overview of this instrumentation, along with a short demo.

  • [Step 1: Configuration] Enable SanitizerCoverage
    • For an OC project, add -fsanitize-coverage=func,trace-pc-guard to Other C Flags in Build Settings
    • For a Swift project, also add -sanitize-coverage=func and -sanitize=undefined to Other Swift Flags
    • SanitizerCoverage must be enabled in every binary linked into the app in order to cover all calls
    • The flags can also be configured in the Podfile:
```ruby
post_install do |installer|
  installer.pods_project.targets.each do |target|
    target.build_configurations.each do |config|
      config.build_settings['OTHER_CFLAGS'] = '-fsanitize-coverage=func,trace-pc-guard'
      config.build_settings['OTHER_SWIFT_FLAGS'] = '-sanitize-coverage=func -sanitize=undefined'
    end
  end
end
```
  • [Step 2: Implement the callbacks] Create a new OC file, CJLOrderFile, and implement two functions
    • __sanitizer_cov_trace_pc_guard_init

      • Parameter 1, start: a pointer to unsigned int (4 bytes), equivalent to the start of an array; it points at the guard of the first symbol

      • Parameter 2, stop: the end of that array. stop is not the address of the last guard but the address just past it; because each guard occupies 4 bytes, the last guard's real address = stop's printed address - 0x4

      • What does the value stored in that last guard represent? It is the number of instrumented symbols: when you add a method, block, C++ function, or property (a property adds three), the value increases by the corresponding number; for example, adding a test1 method increases it by one

    • __sanitizer_cov_trace_pc_guard captures every symbol called during launch and enqueues it

      • The guard parameter is a sentinel that tells us which instrumented location was called
      • Storing the symbols requires a linked list, so we define a list node, CJLNode
      • An atomic queue, OSQueueHead, keeps reads and writes thread-safe
      • OSAtomicEnqueue pushes each node onto the queue; the next symbol is reached through the node's next pointer
```objc
#import <Foundation/Foundation.h>
#import <dlfcn.h>
#import <libkern/OSAtomic.h>

// Atomic (lock-free, LIFO) queue that keeps writes thread-safe
static OSQueueHead queue = OS_ATOMIC_QUEUE_INIT;
// Set to YES by getOrderFile() when collection ends
static BOOL collectFinished = NO;

// Linked-list node: pc is the captured address, next links to the next node
typedef struct {
    void *pc;
    void *next;
} CJLNode;

/*
 - start: address of the first guard
 - stop: not the address of the last guard but one past it; the last guard
   is at stop - 1, i.e. the printed address - 0x4, because each guard is a
   4-byte unsigned int
 */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t N;  // number of guards handed out so far
    if (start == stop || *start) return;  // initialize each module only once
    printf("INIT: %p - %p\n", (void *)start, (void *)stop);
    for (uint32_t *x = start; x < stop; x++) {
        *x = ++N;  // guards are numbered 1..N
    }
}

/*
 Hooks every method, function and block call to capture its symbol. It can be
 called from multiple threads, so it only stores the PC, in a linked list.
 - guard: the sentinel telling us which instrumented location was called
 */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // if (!*guard) return;  // duplicate-guard check from the Clang docs, disabled here
    // __builtin_return_address(0): return address of this callback, i.e. a PC
    // inside the instrumented function that called it
    // __builtin_return_address(1): return address of that function's caller
    void *PC = __builtin_return_address(0);
    // Create a node and fill it in
    CJLNode *node = malloc(sizeof(CJLNode));
    *node = (CJLNode){PC, NULL};
    // Nodes are linked through their next pointer rather than indexed, so
    // OSAtomicEnqueue needs offsetof(CJLNode, next)
    OSAtomicEnqueue(&queue, node, offsetof(CJLNode, next));
}
```
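
Because the code above depends on Apple-only APIs (OSAtomicEnqueue, dladdr), here is a portable C sketch of the same two callbacks that can be exercised without Xcode. It swaps the PC queue for a plain array of guard numbers and uses the "fire once" idea from the SanitizerCoverage documentation (zeroing *guard after the first hit). recorded and recorded_count are hypothetical names, and unlike the OSQueueHead version this sketch is not thread-safe. In a real build the compiler creates the guard array and calls these functions; in the test below they are called by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDED 1024

/* Guard numbers in first-call order (hypothetical storage for this sketch). */
static uint32_t recorded[MAX_RECORDED];
static size_t recorded_count = 0;

/* Numbers the guards 1..N; afterwards *(stop - 1) holds the total count. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t N = 0;
    if (start == stop || *start) return;  /* already initialised */
    for (uint32_t *x = start; x < stop; x++) {
        *x = ++N;
    }
}

/* Records which instrumented location ran, once per location. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  /* this location was already recorded */
    if (recorded_count < MAX_RECORDED) {
        recorded[recorded_count++] = *guard;
    }
    *guard = 0;  /* later calls from the same location return early */
}
```

Zeroing the guard moves de-duplication into the callback itself; the original code leaves that check disabled and de-duplicates later, when the order file is built.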
  • [Step 3: Collect all symbols and write them to a file]
    • A while loop dequeues the symbols, adds a _ prefix to non-OC symbols (dladdr returns C function names without the leading underscore the linker uses), and stores them in an array

    • The array is reversed because the atomic queue is LIFO, so the symbols come out in reverse call order

    • Duplicates are removed, as is the symbol of getOrderFile itself

    • The symbols are joined into a string and written to the cjl.order file

```objc
extern void getOrderFile(void(^completion)(NSString *orderFilePath)) {
    collectFinished = YES;
    __sync_synchronize();
    // Symbol of this function itself, removed from the result below
    NSString *functionExclude = [NSString stringWithFormat:@"_%s", __FUNCTION__];
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.01 * NSEC_PER_SEC)), dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // Create the symbol array
        NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
        while (YES) {
            CJLNode *node = OSAtomicDequeue(&queue, offsetof(CJLNode, next));
            if (node == NULL) break;
            // Resolve the PC into symbol info
            Dl_info info = {0};
            dladdr(node->pc, &info);
            free(node);
            if (info.dli_sname) {
                NSString *name = @(info.dli_sname);
                // OC methods keep their name; C functions and blocks get a "_" prefix
                BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
                NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
                [symbolNames addObject:symbolName];
            }
        }
        if (symbolNames.count == 0) {
            if (completion) completion(nil);
            return;
        }
        // The queue is LIFO: reverse it to get call order, then de-duplicate
        NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
        NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
        NSString *name;
        while ((name = [emt nextObject])) {
            if (![funcs containsObject:name]) {
                [funcs addObject:name];
            }
        }
        // Remove getOrderFile itself
        [funcs removeObject:functionExclude];
        // Join the array into a string
        NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
        NSLog(@"Order:\n%@", funcStr);
        // Write the string to a file
        NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"cjl.order"];
        NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
        BOOL success = [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
        if (completion) completion(success ? filePath : nil);
    });
}
```
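
The post-processing in getOrderFile (prefixing, reversing, de-duplicating, excluding itself, joining) can also be sketched in portable C. This is an illustration, not the author's code: to_linker_symbol and build_order_file are hypothetical names, and the input is assumed to be symbol names already resolved (as dladdr would) and listed in dequeue order, i.e. most recent call first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* dladdr reports C functions and blocks without the leading "_" that the
   linker uses; OC methods ("+[..." / "-[...") are kept unchanged. */
static void to_linker_symbol(const char *name, char *out, size_t cap) {
    bool is_objc = (name[0] == '+' || name[0] == '-') && name[1] == '[';
    snprintf(out, cap, "%s%s", is_objc ? "" : "_", name);
}

/* names: linker symbols in dequeue order (most recent call first).
   Walks them backwards to get call order, drops duplicates and `exclude`,
   and writes newline-separated order-file text into out. Returns its length. */
static size_t build_order_file(const char *const *names, size_t n,
                               const char *exclude, char *out, size_t cap) {
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = n; i-- > 0;) {
        if (exclude != NULL && strcmp(names[i], exclude) == 0) continue;
        /* Already emitted if it also appears later in the array (earlier call). */
        bool dup = false;
        for (size_t k = i + 1; k < n; k++) {
            if (strcmp(names[k], names[i]) == 0) { dup = true; break; }
        }
        if (dup) continue;
        int w = snprintf(out + len, cap - len, "%s%s",
                         len > 0 ? "\n" : "", names[i]);
        if (w < 0 || (size_t)w >= cap - len) {  /* out of space */
            out[len] = '\0';                    /* drop the partial entry */
            break;
        }
        len += (size_t)w;
    }
    return len;
}
```

For the demo's call sequence (+[ViewController load], _test2, -[ViewController viewDidLoad], _test1 twice, then _getOrderFile), the dequeued array is that sequence reversed, and the output is the four launch symbols in call order, each listed once.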
  • [Step 4: Call getOrderFile in didFinishLaunchingWithOptions] Where you call it is up to you; generally it is after the first screen has rendered
```objc
// Defined in CJLOrderFile.m
extern void getOrderFile(void(^completion)(NSString *orderFilePath));

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    
    [self test11];
    
    getOrderFile(^(NSString *orderFilePath) {
        NSLog(@"OrderFilePath:%@", orderFilePath);
    });
    
    return YES;
}

- (void)test11{
    
}
```

At this point, cjl.order contains only these three symbols

  • [Step 5: Copy the file into place and configure the path] Usually the file goes in the main project directory, and Build Settings -> Order File is set to ./cjl.order. The comparison before and after configuration is shown below (top: the symbol order before configuration; bottom: the symbol order after configuration)

Note: avoid infinite loops

  • If Build Settings -> Other C Flags is set to -fsanitize-coverage=trace-pc-guard (without func), the while loop in getOrderFile becomes an infinite loop (we debugged this in the touchesBegan method)

  • With assembly debugging enabled, we found three calls to __sanitizer_cov_trace_pc_guard

  • The first bl is for touchesBegan
  • The second bl comes from the while loop
  • The third bl is for printf
  • Without func, trace-pc-guard instruments at the edge level, so every jump is hooked: any b or bl gets a guard call. The loop body therefore enqueues a new node on every iteration, and the queue it is draining never empties

The solution: in Build Settings -> Other C Flags, change -fsanitize-coverage=trace-pc-guard to -fsanitize-coverage=func,trace-pc-guard
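
A toy C model makes the infinite loop easier to see. It is an abstraction, not real instrumentation: drain_queue is a hypothetical function, the queue is reduced to a counter, and loop_body_instrumented stands for whether the loop body carries a guard call (edge-level trace-pc-guard) or not (func,trace-pc-guard, where only function entries are instrumented). A max_iters cap keeps the demo from actually hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models getOrderFile's while loop: each iteration dequeues one node.
   If the loop body is itself instrumented, the guard callback enqueues a
   fresh node on every iteration, so the queue never drains. */
static size_t drain_queue(size_t queued, bool loop_body_instrumented,
                          size_t max_iters) {
    size_t iters = 0;
    while (queued > 0 && iters < max_iters) {
        queued--;               /* OSAtomicDequeue */
        if (loop_body_instrumented) {
            queued++;           /* guard callback fires inside the loop */
        }
        iters++;
    }
    return iters;
}
```

With func-level instrumentation the loop stops after dequeuing the existing nodes; with edge-level instrumentation it only stops at the artificial cap.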