Routine startup optimizations

I have written a summary of iOS startup optimization before. As a recap, these are the routine optimizations for the pre-main loading phase of an app. First, measure it:

  • In Xcode, add the environment variable DYLD_PRINT_STATISTICS.

  • Print the time spent in each stage of pre-main, then analyze each stage separately.

1. Dynamic library loading time: Apple officially recommends using at most 6 dynamic libraries. If the app uses more than 6, merge them.

2. Rebinding time: the rebinding step, which touches the mapping between virtual and physical memory, can be optimized by rearranging the binary.

3. OC class registration time: reduce the number of OC classes. Any class that exists in the project has a cost, whether or not it is used.

4. +load and constructor time: move time-consuming startup work to a child thread.

Binding vs. linking

If you follow an app from source code to IPA package to launch, you will see that binding happens at run time, while linking happens at build time.

Binary rearrangement

Beyond these routine measures, where else can startup be optimized? We can start with the rebinding step in pre-main.

Background: virtual memory

Early computers worked directly with physical memory, and memory problems were solved by adding more of it. That approach had two obvious problems:

  • Out of memory
  • Security issues

Both problems were solved as computers developed, mainly through the invention of virtual memory. Memory loading also became lazy: only the memory that is actually used gets loaded, which avoids waste. Data in memory is arranged page by page, and paging makes memory use more efficient.

The CPU contains a component called the MMU (memory management unit), whose main job is to translate memory addresses. Between virtual memory and physical memory sits a mapping table. Each process has its own mapping table, so no matter how a process changes the virtual address it accesses, it cannot reach outside the address space the system allocated to it. This isolation is what keeps memory secure. Physical memory itself is managed by the operating system.
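The translation step can be pictured with a toy model. The sketch below is only an illustration under simplifying assumptions: a single-level table, 8 virtual pages per process, and hypothetical names (PageTable, translate). A real MMU uses multi-level tables in hardware.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 16384u      /* 16 KB, the iOS page size quoted in this article */
#define TOY_NUM_PAGES 8u          /* toy address space: 8 virtual pages per process */
#define UNMAPPED UINT32_MAX       /* marker: this virtual page has no physical frame */

/* One mapping table per process: index = virtual page, value = physical frame. */
typedef struct {
    uint32_t frame_of_page[TOY_NUM_PAGES];
} PageTable;

/* Start with every page unmapped, as with lazy loading. */
void page_table_init(PageTable *t) {
    for (size_t i = 0; i < TOY_NUM_PAGES; i++) {
        t->frame_of_page[i] = UNMAPPED;
    }
}

/* Translate a virtual address. Returns 0 and writes *phys on success,
   -1 if the address is outside the process space or not mapped yet. */
int translate(const PageTable *t, uint64_t vaddr, uint64_t *phys) {
    uint64_t page = vaddr / TOY_PAGE_SIZE;
    uint64_t offset = vaddr % TOY_PAGE_SIZE;
    if (page >= TOY_NUM_PAGES) return -1;          /* cannot leave the process space */
    uint32_t frame = t->frame_of_page[page];
    if (frame == UNMAPPED) return -1;              /* not loaded into physical memory */
    *phys = (uint64_t)frame * TOY_PAGE_SIZE + offset;
    return 0;
}
```

Two processes can use the same virtual address and still land in different physical frames, because each one is translated through its own table.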

Relationship between segments and pages

Segments belong to the Mach-O file format and have nothing to do with memory. A page is a unit of memory. A memory page is 16 KB on iOS and 4 KB on an Intel Mac.
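Because memory is handed out in whole pages, the same number of bytes occupies a different number of pages on each platform. A minimal sketch of that arithmetic, with a hypothetical helper name and example sizes:

```c
#include <stdint.h>

/* Number of whole pages needed to hold `size` bytes (rounds up). */
uint64_t pages_needed(uint64_t size, uint64_t page_size) {
    return (size + page_size - 1) / page_size;
}
```

For example, 40000 bytes need 3 pages of 16 KB but 10 pages of 4 KB.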

Page fault

The addresses an app accesses are virtual addresses. When the user triggers a feature whose data has not yet been loaded into physical memory, the operating system raises a page fault (a page-fault interrupt). The interrupt suspends the code currently running on the CPU, so the process stalls while the operating system finds a suitable place in physical memory and loads the page there.

Page replacement

When the operating system loads data and physical memory is full, it swaps out the less active parts of physical memory.

In the author's example, the phone has 8 GB of memory, yet each app gets only 4 GB of virtual memory. Why 4 GB rather than 8 GB? 4 GB is what 32-bit addressing can cover, which keeps the scheme compatible with 32-bit systems. Because address spaces are isolated, processes can communicate only through the interfaces the system provides, such as signals.

Apple adopted ASLR to deal with the remaining data-security problem. ASLR (Address Space Layout Randomization) is a protection against buffer overflow attacks. By randomizing the layout of linear regions such as the heap, the stack, and shared library mappings, it makes it harder for an attacker to predict a target address and to jump straight to attack code. Sensitive data ends up at addresses that a malicious program cannot know in advance.

Because of ASLR, the load address of executables and dynamic libraries in virtual memory is different on every launch, so the pointers inside the image have to be fixed up at launch, not at compile time, to point to the correct addresses:

The correct memory address = ASLR address + offset value.

This fix-up is called rebase. In other words, ASLR starts the application at a random address for memory security, and the address of a code block in the Mach-O file is really an offset from that start.
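The rebase formula is simple enough to write down directly. The sketch below uses hypothetical function names and made-up values; it only illustrates the arithmetic, not how dyld implements it.

```c
#include <stdint.h>

/* Runtime address = random slide chosen at launch + offset stored in the Mach-O file. */
uint64_t rebased_address(uint64_t slide, uint64_t file_offset) {
    return slide + file_offset;
}

/* Inverse: recover the file offset from a runtime address, as needed when symbolizing. */
uint64_t file_offset_of(uint64_t slide, uint64_t runtime_address) {
    return runtime_address - slide;
}
```

The same offset resolves to a different runtime address on each launch because the slide changes, while subtracting the slide always leads back to the same offset in the file.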

Binary rearrangement

Why does binary rearrangement improve startup speed? A single page fault is usually on the order of a millisecond, but when a large number of them occur together the user notices the delay. That is exactly what happens on a cold start. So how can we see the number of page faults, and how can we reduce them?

  • The number of page faults during app launch can be measured with the System Trace template in Instruments, which ships with Xcode.

In the demo, System Trace reports 184 page faults. This is only a test demo; in a real project the number would be far larger. The methods called at startup are scattered across different pages in memory. If they are all placed together at the front of the binary, the number of page faults drops.

Using the demo as an example, configuring an order file for the project is enough to change the layout of the binary. Anyone who has studied the objc source code knows that it contains a .order file. In short, a .order file is a list of symbols, and the linker arranges the binary in the order the file lists them.

Next, note that Xcode can be configured to write a Link Map file, which shows the order in which the project's code is laid out. After enabling it and building the project, right-click the .app file under Products and choose Show in Finder.

From there we can find the Link Map file and open it. Now create a demo.order file in the demo's project root directory and list some symbols in it. Then set Order File in the Xcode configuration to this file, build again, and reopen the Link Map file. The Link Map clearly shows that the code is now laid out in the order given in the .order file.

Does this mean we can rearrange the binary of a real project the same way? Not by hand: the methods called at startup are usually nested several layers deep in a complex structure, which makes the actual call order hard to work out by analysis.
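To see why packing the startup methods together helps, count how many distinct pages they touch. The sketch below is a simplified model with hypothetical offsets: it assumes 16 KB pages, treats each function as a single address, and ignores functions that straddle a page boundary.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_BYTES 16384u   /* 16 KB page, as on iOS */

/* Count the distinct pages touched by a set of function offsets.
   On a cold start each distinct page costs one page fault. */
size_t pages_touched(const uint64_t *offsets, size_t count) {
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t page = offsets[i] / PAGE_BYTES;
        int seen = 0;
        /* Was this page already counted for an earlier function? */
        for (size_t j = 0; j < i; j++) {
            if (offsets[j] / PAGE_BYTES == page) { seen = 1; break; }
        }
        if (!seen) distinct++;
    }
    return distinct;
}
```

Four startup functions spread over four pages cost four page faults. After reordering, the same four functions sitting next to each other can share one page and cost a single fault.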

Going further

What we need is the order in which methods are called during startup. How can we get it? Reading the code by eye is certainly unrealistic, and the result has to cover OC methods, C functions, and blocks. The options are:

  • Hook objc_msgSend().
  • Scan the code with a script.
  • Use Clang instrumentation, which hooks 100% of the methods.

Clang instrumentation

Before studying Clang instrumentation, read the official Clang documentation on it. The key section is Tracing PCs, whose main purpose is to trace the code the CPU executes.

  • Create a test demo and, under Build Settings > Other C Flags, add -fsanitize-coverage=trace-pc-guard.

  • Copy the example from the documentation into the project and build it to check that it succeeds.
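For orientation, the two callbacks look roughly like this. The sketch is adapted from the example in the Clang documentation, with one deliberate change: the symbolizer call is replaced by printing the raw PC, so the code also compiles without the sanitizer runtime.

```c
#include <stdint.h>
#include <stdio.h>

/* Called once per module at startup. [start, stop) is the array of guards,
   one 32-bit guard per instrumented point. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t N;                       /* running guard counter */
    if (start == stop || *start) return;     /* empty range, or already initialized */
    printf("INIT: %p %p\n", (void *)start, (void *)stop);
    for (uint32_t *x = start; x < stop; x++) {
        *x = ++N;                            /* guards are numbered 1..N */
    }
}

/* Called at every instrumented point. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;                     /* a zeroed guard means "skip" */
    void *pc = __builtin_return_address(0);  /* address inside the calling function */
    printf("guard: %p %x PC %p\n", (void *)guard, *guard, pc);
}
```

Because the guards are numbered from 1, the value stored in the last guard (the 4 bytes just before stop) equals the number of instrumented points.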

__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) receives the start and end addresses of a block of memory set aside by the instrumentation. From start and stop we can work out the number of symbols, that is, the number of functions this hook can monitor globally. Let's test that guess. Take stop and subtract 4, the size of a uint32_t, to reach the last entry. At a breakpoint, printing the memory at that address shows that the total number of symbols is 11. After adding one more function, the total becomes 12, which confirms the guess.

Our goal is to collect all the symbol names in call order and write them to a .order file. __sanitizer_cov_trace_pc_guard(uint32_t *guard) is the callback that hooks everything, so we start there. If we start a child thread and print the current thread inside the callback, the output shows that the callback is invoked from multiple threads, so whatever we do in it must be thread-safe.

Inside the callback, void *PC = __builtin_return_address(0) gives the return address, which points into the function that called the callback, that is, the instrumented function itself. This is our chance to get the function's symbol name: we pass PC to dladdr, which fills a Dl_info structure with information about the function that contains the address.

The result, however, was that the console printed in an endless loop. Why? Clang also hooks the while loop, so every iteration triggers the callback again. After some experimenting, changing Other C Flags to -fsanitize-coverage=func,trace-pc-guard solves the problem.

With the new setting, build and run, tap the screen, and look at the console. The output is now very close to the symbol names we need; it only has to be de-duplicated and reversed to restore the call order. After de-duplicating and reordering, the final .order file is written to the app sandbox, where we can open it.

The effort paid off and we have our .order file. Now follow the steps from earlier in this article to configure the Link Map and Order File settings, build, and check the Link Map file: the symbols follow the order in our .order file. Next, rearrange the binary of the demo in this way and check whether the number of page faults changes.
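The recording, reversing, and de-duplicating steps can be sketched as follows. This is a portable illustration under stated assumptions, not the demo's code: it uses a C11 atomic stack as a stand-in for the atomic queue, and record_call takes a name string directly, where the real callback would pass the symbol name found through the Dl_info lookup. The names record_call and drain_in_call_order are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One recorded call; `name` stands in for the symbol name resolved from the PC. */
typedef struct Node {
    const char *name;
    struct Node *next;
} Node;

static _Atomic(Node *) g_head = NULL;   /* lock-free LIFO shared by all threads */

/* Safe to call from any thread: push with a compare-and-swap loop. */
void record_call(const char *name) {
    Node *node = malloc(sizeof *node);
    if (!node) return;
    node->name = name;
    node->next = atomic_load(&g_head);
    while (!atomic_compare_exchange_weak(&g_head, &node->next, node)) {
        /* On failure node->next now holds the current head; retry. */
    }
}

/* Drain the stack into `out` in first-call order without duplicates.
   Returns the number of names written (at most `capacity`). */
size_t drain_in_call_order(const char **out, size_t capacity) {
    Node *node = atomic_exchange(&g_head, NULL);   /* detach everything at once */

    /* The stack yields the newest call first; reverse it to get the oldest first. */
    Node *oldest_first = NULL;
    while (node) {
        Node *next = node->next;
        node->next = oldest_first;
        oldest_first = node;
        node = next;
    }

    size_t n = 0;
    while (oldest_first) {
        Node *next = oldest_first->next;
        int seen = 0;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(out[i], oldest_first->name) == 0) { seen = 1; break; }
        }
        if (!seen && n < capacity) out[n++] = oldest_first->name;
        free(oldest_first);
        oldest_first = next;
    }
    return n;
}
```

Writing the returned names to a file, one per line, gives the contents of the .order file. Keeping only the first occurrence of each name matters because a method called many times should appear once, at the position of its first call.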

Binary rearrangement in mixed OC and Swift projects

Binary rearrangement for a mixed OC and Swift project works the same way as above, but Build Settings needs additional configuration: under Other Swift Flags, add -sanitize-coverage=func and -sanitize=undefined.

Conclusion

  • With the func option, Clang instrumentation hooks only functions and methods.
  • Once the Clang instrumentation flag is added, the compiler inserts a call to __sanitizer_cov_trace_pc_guard at the edge of every method, function, and block. Clang instrumentation is officially supported and is commonly used for code coverage, and it costs some performance. So once we have the .order file, the instrumentation flag must be removed.
  • To record symbol names during Clang instrumentation, store the data in a thread-safe atomic queue so that the order in which symbols execute is preserved.
  • During project development, do not spend resources optimizing business logic too early. Every optimization is built on waste; where nothing is wasted, there is nothing to optimize.

That concludes this analysis of binary rearrangement for iOS startup optimization. The views in this article are my own; if you have questions, feel free to comment. If you need the test demo for this article, leave a message below.