Background:

  • Mach-O is the file format for executables, object code, and dynamic libraries on operating systems based on the Mach kernel. It was designed as an alternative to the a.out format, offering greater extensibility and faster access to the information in the symbol table.
  • A symbol table records information about the identifiers in source code, such as names, declarations, line numbers, and function names, along with their data types, scopes, and memory addresses. For iOS apps, the symbol information used for debugging is kept in dSYM files.
  • The build process has three steps. First, preprocessing handles the preprocessor directives and produces a .i file. Next, compilation performs lexical analysis (splitting the source into tokens according to lexical rules, as a lex-generated scanner does), then syntax and semantic analysis, and produces a .s assembly file. Finally, assembly turns the .s file into a binary object file (.o). The linker then links the object files into an executable, such as an a.out or Mach-O file.
  • Linking is either static or dynamic. Static linking copies the entire contents of all object files (.o) into the executable, so every other executable that needs the same functions must also include its own copy. Dynamic linking solves this waste of space: only information about the referenced functions is linked into the executable.
  • dyld is the dynamic linker. It recursively loads all the dynamic libraries an executable needs. These include the iOS system frameworks, the Objective-C runtime library libobjc, and system-level libraries in libSystem, such as libdispatch (GCD) and libsystem_blocks (Blocks).
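Every Mach-O file begins with a header. Its magic number and filetype field say whether the file is an object file, an executable, or a dynamic library. The sketch below is a minimal C illustration under stated assumptions:

- The constants and struct layout are copied from <mach-o/loader.h>, so the sketch also compiles on non-Apple systems.
- `describe_macho` and `mach_header_64_sketch` are hypothetical names.
- Only 64-bit, host-byte-order, non-fat files are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from <mach-o/loader.h> so this compiles anywhere. */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O, host byte order */
#define MH_OBJECT   0x1u        /* relocatable object file (.o) */
#define MH_EXECUTE  0x2u        /* executable */
#define MH_DYLIB    0x6u        /* dynamic library */

/* Same field layout as struct mach_header_64. */
struct mach_header_64_sketch {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;       /* number of load commands that follow */
    uint32_t sizeofcmds;  /* total size of those load commands */
    uint32_t flags;
    uint32_t reserved;
};

/* Describe the file type of a 64-bit Mach-O image held in memory.
   Assumes host byte order; byte-swapped and fat files are not handled. */
const char *describe_macho(const void *buf, size_t len) {
    struct mach_header_64_sketch h;
    assert(buf != NULL);
    if (len < sizeof h) return "too small";
    memcpy(&h, buf, sizeof h);  /* memcpy avoids unaligned reads */
    if (h.magic != MH_MAGIC_64) return "not a 64-bit Mach-O";
    switch (h.filetype) {
    case MH_OBJECT:  return "object file";
    case MH_EXECUTE: return "executable";
    case MH_DYLIB:   return "dynamic library";
    default:         return "other Mach-O type";
    }
}
```

On a Mac, you can read the first 32 bytes of a compiled binary into a buffer and pass it to this function to see the file type. `otool -h` prints the same header fields.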

App startup process

The launch of an executable is divided into two phases:

  1. Pre-main: from the moment the operating system starts executing the executable until main is called. This covers process creation, loading the executable, dynamic linking, and environment setup.
  2. Main: from the main function entry point through the AppDelegate's launch callbacks.

The operating system loads the App executable file

The operating system creates a new process and loads the executable into its address space. It then loads the Mach-O files the executable depends on and fixes up their internal and external pointer references (to strings, functions, and so on). Finally it runs the initialization work:

- C functions declared with __attribute__((constructor))
- loading the methods in categories (Category)
- constructing C++ static objects
- calling Objective-C +load methods

Basic process:

When an app launches, the system first loads its executable (built from all of the app's .o files). It then loads dyld, the dynamic linker used to load dynamic libraries. Execution starts in dyld, which begins with the executable's dependencies and recursively loads every dynamic library they need. These libraries include all the system frameworks the app uses, libobjc (the Objective-C runtime), and system-level libraries in libSystem, such as libdispatch (GCD) and libsystem_blocks (Blocks).

dyld loads the dynamic libraries

Loading dynamic libraries is handled mainly by dyld, Apple's dynamic linker.

  1. The system first reads the Mach-O header of the app's executable.
  2. dyld initializes its runtime environment, reads the dynamic dependencies from the header, and uses the shared cache. It then loads the dependent libraries (the app's executable is also among the loaded images), links them, and finally calls each library's initializers. The Objective-C runtime is initialized during this step. After all dependent libraries are initialized, the app's executable itself is initialized last.
  3. Check that the symbol table exists and is valid.
  4. Map all the Mach-O files into memory and collect the declarations they contain (variables, functions, and so on).
  5. Rebase: fix up pointers that refer to locations inside the same image.
  6. Bind: set pointers that refer to symbols (functions, data, and so on) in other libraries to their correct memory addresses.
  7. Runtime initialization: initialize the class structures of all classes in the project, then call all +load methods.
  8. Finally, dyld returns the address of main, main is called, and execution reaches the familiar program entry point.

When loading a Mach-O file (an executable or a library), the dynamic linker first checks whether the file is in the shared cache. If it is, dyld uses the copy from the shared cache. Each process maps this shared cache into its own address space, which greatly improves application launch times on OS X and iOS.
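Steps 5 and 6 (rebase and bind) can be illustrated with a toy model. This is a minimal C sketch, not dyld's real data structures or encoding; `apply_fixups`, `struct fixup`, and `struct export_entry` are hypothetical.

- Rebasing adds the load-time slide to pointers that refer inside the same image.
- Binding looks up symbols exported by another image by name.

The sketch applies rebases first, then binds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of fix-ups; dyld's real formats are far more compact. */
enum fixup_kind { FIXUP_REBASE, FIXUP_BIND };

struct fixup {
    enum fixup_kind kind;
    size_t slot;          /* index of the pointer slot to patch */
    const char *symbol;   /* symbol name, used only by FIXUP_BIND */
};

/* One symbol exported by another (hypothetical) library. */
struct export_entry {
    const char *name;
    uintptr_t address;
};

static uintptr_t find_export(const struct export_entry *exports, size_t n,
                             const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].address;
    return 0;  /* not found */
}

/* Patch the image's pointer slots in place.
   Returns the number of binds that could not be resolved. */
size_t apply_fixups(uintptr_t *slots, size_t nslots,
                    const struct fixup *fixups, size_t nfixups,
                    uintptr_t slide,
                    const struct export_entry *exports, size_t nexports) {
    size_t unresolved = 0;

    /* Pass 1, rebase: internal pointers were written for the preferred
       load address, so add the slide chosen at load time (ASLR). */
    for (size_t i = 0; i < nfixups; i++) {
        if (fixups[i].kind != FIXUP_REBASE) continue;
        assert(fixups[i].slot < nslots);
        slots[fixups[i].slot] += slide;
    }

    /* Pass 2, bind: external pointers receive the address of a symbol
       exported by another image, looked up by name. */
    for (size_t i = 0; i < nfixups; i++) {
        if (fixups[i].kind != FIXUP_BIND) continue;
        assert(fixups[i].slot < nslots);
        uintptr_t addr = find_export(exports, nexports, fixups[i].symbol);
        if (addr == 0) unresolved++;
        slots[fixups[i].slot] = addr;
    }
    return unresolved;
}
```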

Mach-O image file

The official document: developer.apple.com/library/arc…

Mach-O is the native executable format for binaries in OS X and the preferred format for shipping code. The executable format determines the order in which the code and data in a binary are read into memory. That ordering affects memory usage and paging activity, which in turn directly affects program performance.

A Mach-O binary is organized into segments. Each segment contains one or more sections, and each section holds a particular type of code or data. Segments always start on a page boundary, but sections are not necessarily page-aligned. The size of a segment is the number of bytes in all the sections it contains, rounded up to the next virtual memory page boundary. Therefore a segment is always 4096 bytes (4 KB) or a multiple of it, with 4096 bytes as the minimum size.

The segments and sections of a Mach-O executable are named according to their intended use. By convention, segment names use all uppercase letters preceded by two underscores (for example, __TEXT). Section names use all lowercase letters preceded by two underscores (for example, __text). A Mach-O executable can contain several segments, but only two are relevant to performance: the __TEXT segment and the __DATA segment.
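The page-rounding rule can be written as a short calculation. This is a minimal C sketch, assuming a power-of-two page size and ignoring the alignment padding that can appear between sections; `segment_vm_size` is a hypothetical helper. The page size is a parameter because the 4096 bytes quoted above is not universal: recent arm64 Apple devices use 16 KB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Virtual memory size of a segment: the sum of its section sizes,
   rounded up to the next page boundary, with one page as the minimum.
   Simplification: padding between sections for alignment is ignored. */
uint64_t segment_vm_size(const uint64_t *section_sizes, size_t count,
                         uint64_t page_size) {
    /* The mask trick below requires a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += section_sizes[i];
    uint64_t rounded = (total + page_size - 1) & ~(page_size - 1);
    return rounded != 0 ? rounded : page_size;
}
```

For example, sections totalling 4097 bytes need 8192 bytes of address space with 4 KB pages.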

The __TEXT Segment: Read Only

The __TEXT segment is a read-only area containing executable code and constant data. By convention, the compiler tools create every executable with at least one read-only __TEXT segment. Because the segment is read-only, the kernel can map it directly from the executable into memory just once. Once mapped, its contents can be shared among all processes that use them; this is mainly the case for frameworks and other shared libraries.

The read-only attribute also means that the pages making up the __TEXT segment never have to be saved to backing store. If the kernel needs to free physical memory, it can discard one or more __TEXT pages and re-read them from disk when they are needed again. The main sections of the __TEXT segment are:

  • __text: the compiled machine code of the executable
  • __const: general constant data
  • __cstring: literal C string constants (quoted strings in source code)
  • __picsymbol_stub: position-independent code stub routines used by the dynamic linker (dyld)

The __DATA Segment: Read/Write

The __DATA segment contains the non-constant data of the executable. The segment is both readable and writable, so the __DATA segment of a framework or other shared library is logically copied for each process that links with the library. The kernel marks readable and writable pages copy-on-write, which defers the actual copy. The page stays shared in memory until a process writes to it; at that point the kernel makes a private copy of the page for the writing process.

The __DATA segment has many sections, some of which are used only by the dynamic linker. The more important sections that can appear in the __DATA segment are listed below. For a complete list of sections, see Mach-O Runtime Architecture.

  • __data: initialized global variables (for example, int a = 1; or static int a = 1;)
  • __const: constant data that needs relocation (for example, char * const p = "foo";)
  • __bss: uninitialized static variables (for example, static int a;)
  • __common: uninitialized external global variables (for example, int a; outside of a function block)
  • __dyld: a placeholder section used by the dynamic linker
  • __la_symbol_ptr: lazy symbol pointers, one for each undefined function called by the executable
  • __nl_symbol_ptr: non-lazy symbol pointers, one for each undefined data symbol referenced by the executable
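The difference between __la_symbol_ptr and __nl_symbol_ptr is when the pointer gets filled in. A lazy slot is resolved on its first call and cached afterwards; non-lazy slots are all resolved up front. The C sketch below models this with hypothetical names (`struct symbol_ptr`, `call_lazy`, `bind_all_now`). It is a toy: in the real mechanism, the first call goes through stub and helper code into dyld, which patches the lazy pointer. A NULL check stands in for that here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*int_fn)(int);

/* Functions "exported" by a hypothetical library. */
static int lib_double(int x) { return 2 * x; }
static int lib_negate(int x) { return -x; }

static const struct { const char *name; int_fn fn; } lib_exports[] = {
    {"_double", lib_double},
    {"_negate", lib_negate},
};

static int bind_count = 0;  /* how many symbol lookups were performed */

/* A symbol pointer slot: target stays NULL until the slot is bound. */
struct symbol_ptr {
    const char *name;
    int_fn target;
};

static int_fn resolve(const char *name) {
    bind_count++;
    for (size_t i = 0; i < sizeof lib_exports / sizeof lib_exports[0]; i++)
        if (strcmp(lib_exports[i].name, name) == 0)
            return lib_exports[i].fn;
    return NULL;
}

/* Non-lazy style: bind every slot up front, before any call happens. */
void bind_all_now(struct symbol_ptr *slots, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (slots[i].target == NULL)
            slots[i].target = resolve(slots[i].name);
}

/* Lazy style: bind a slot on its first call, then reuse the cached target. */
int call_lazy(struct symbol_ptr *slot, int arg) {
    if (slot->target == NULL)
        slot->target = resolve(slot->name);
    assert(slot->target != NULL && "unresolved symbol");
    return slot->target(arg);
}
```

Lazy binding spreads the lookup cost over the program's run and skips functions that are never called. Non-lazy binding pays everything before the first use.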

The makeup of the __TEXT and __DATA segments of a Mach-O executable has a direct bearing on performance. The techniques and goals for optimizing the two segments differ, but they share a common aim: using memory more efficiently.

In a typical Mach-O file, the executable code lives in the __text section of the __TEXT segment. The __TEXT segment is read-only and maps directly to the executable. So if the kernel needs to reclaim the physical memory used by some __TEXT pages, it does not have to save those pages to backing store and page them back in later. It simply frees the memory and reads the pages back from disk when the code references them again.

This is cheaper than swapping, because it takes one disk access (a read) rather than two (a write and a later read). It still hurts performance, especially when many pages have to be recreated from disk.

The remedy is to improve the code's locality of reference through procedure reordering, as described in Improving Locality of Reference. This technique groups methods and functions according to the order in which they are executed, how often they are called, and how often they call one another. If the pages of the __text section are grouped logically in this way, they are less likely to be freed and read back repeatedly. For example, if all the launch-time initialization functions are placed on one or two pages, those pages do not have to be recreated after initialization has finished.
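A toy calculation shows why grouping launch-time functions helps. The C sketch below uses hypothetical names (`struct func`, `launch_pages_touched`, `order_launch_first`), made-up sizes, and no alignment. It counts how many pages of code must be resident to run the launch-time functions, before and after moving those functions to the front. In practice the ordering is given to the linker as an order file (ld64's -order_file option, exposed in Xcode as the Order File build setting) rather than computed like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: functions laid out back to back in __text. */
struct func {
    size_t size;          /* bytes of machine code */
    int used_at_launch;   /* nonzero if it runs during startup */
};

/* Count the distinct pages occupied by launch-time functions when all
   functions are laid out contiguously in the given order. */
size_t launch_pages_touched(const struct func *f, size_t n, size_t page_size) {
    assert(page_size > 0);
    size_t offset = 0, touched = 0;
    size_t last_page = (size_t)-1;  /* no page counted yet */
    for (size_t i = 0; i < n; i++) {
        if (f[i].used_at_launch && f[i].size > 0) {
            size_t first = offset / page_size;
            size_t last = (offset + f[i].size - 1) / page_size;
            /* Offsets only grow, so comparing with the last counted page
               is enough to avoid counting a page twice. */
            for (size_t p = first; p <= last; p++) {
                if (p != last_page) {
                    touched++;
                    last_page = p;
                }
            }
        }
        offset += f[i].size;
    }
    return touched;
}

/* Stable partition: launch-time functions first, keeping their relative
   order, then everything else. `out` must have room for n entries. */
void order_launch_first(const struct func *in, struct func *out, size_t n) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i].used_at_launch) out[k++] = in[i];
    for (size_t i = 0; i < n; i++)
        if (!in[i].used_at_launch) out[k++] = in[i];
}
```

Consider five functions of 1000, 5000, 1000, 5000, and 1000 bytes, where the 1000-byte ones run at launch. The original layout touches 4 pages of 4096 bytes; the reordered layout touches 1.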

Unlike the __TEXT segment, the __DATA segment can be written to, so its pages are not shareable once written. Non-constant global variables in a framework can hurt performance, because every process that links with the framework gets its own copy of them. The main remedy is to move as many of these global variables as possible into the __const section of the __TEXT segment by declaring them const. Reducing Shared Memory Pages describes this and related techniques. This is usually not a problem for applications, because an application's __DATA segment is not shared with other applications.
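Declaring globals const has a subtlety: const data that contains pointers still needs fix-ups at load time, so it cannot go into read-only __TEXT. The sketch below contrasts three declarations. The sections in the comments are where Clang typically puts them on Apple platforms; this is an assumption, since placement depends on the toolchain. `color_name`, `color_name_ptr`, and `rename_color` are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Writable table: lives in a writable __DATA section, so a process that
   writes to it gets a private copy of the page. */
static char mutable_names[3][8] = {"red", "green", "blue"};

/* const data with no pointers inside needs no load-time fix-ups, so the
   compiler can typically place it in __TEXT,__const and share it. */
static const char const_names[3][8] = {"red", "green", "blue"};

/* const array of pointers: the pointers still need rebasing at load time,
   so this typically lands in a __DATA const section (__DATA,__const, or
   __DATA_CONST with newer toolchains) rather than in __TEXT. */
static const char *const const_ptrs[3] = {"red", "green", "blue"};

const char *color_name(int i) {
    assert(i >= 0 && i < 3);
    return const_names[i];
}

const char *color_name_ptr(int i) {
    assert(i >= 0 && i < 3);
    return const_ptrs[i];
}

void rename_color(int i, const char *name) {
    assert(i >= 0 && i < 3 && strlen(name) < sizeof mutable_names[i]);
    strcpy(mutable_names[i], name);
}
```

Storing strings inline in a fixed-size char array, rather than as an array of pointers, is what lets const_names stay fully read-only.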

The compiler stores different types of non-constant global data in different sections of the __DATA segment. These types are uninitialized static data, and symbols that match the ANSI C notion of a "tentative definition" and are not declared extern. Uninitialized static data goes in the __bss section of the __DATA segment. Tentative-definition symbols go in the __common section of the __DATA segment.

The ANSI C and C++ standards require the system to set uninitialized static variables to zero. (Other kinds of uninitialized data are left uninitialized.) Because uninitialized static variables and tentative-definition symbols are stored in separate sections, the system has to treat them differently. When variables live in different sections, they are also more likely to end up on different memory pages. Those pages can be swapped in and out separately, which makes the code run slower. As described in Reducing Shared Memory Pages, the solution is to consolidate the non-constant global data into a single section of the __DATA segment.
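The kinds of non-constant global data above correspond to familiar C declarations. The section names in the comments are where Clang typically places these variables on Apple platforms; actual placement depends on compiler flags. In particular, recent GCC and Clang releases default to -fno-common on many targets. In that case a tentative definition becomes an ordinary zero-filled definition instead of a __common symbol. `uninitialized_sum` is a hypothetical helper.

```c
#include <assert.h>

int initialized_global = 1;  /* __DATA,__data: the value is stored in the file */
static int zeroed_static;    /* __DATA,__bss: zero-filled at load, no file bytes */
int tentative_global;        /* tentative definition: __DATA,__common with
                                -fcommon, or a zero-filled definition with
                                -fno-common */

/* The standard guarantees both uninitialized variables start as zero,
   whichever section they end up in. */
int uninitialized_sum(void) {
    assert(initialized_global == 1);
    return zeroed_static + tentative_global;
}
```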

ObjC Runtime

The Runtime is initialized while dyld is loading the app, and this stage leaves considerable room for optimization. The work includes:

  1. Registering all class definitions: Objective-C classes are not fixed by the compiler but are registered dynamically in a global table at runtime
  2. Updating non-fragile ivar offsets, which corrects the memory offsets of instance variables
  3. Attaching categories: the methods in each category are inserted into the method list of its class
  4. Ensuring that every selector is globally unique
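Selector uniquing (step 4) is what lets selectors be compared by pointer: after uniquing, two selectors with the same name are the same pointer. The Objective-C runtime provides this guarantee through sel_registerName. The sketch below is not that implementation. It is a hypothetical `register_selector` that gives the same guarantee for plain C strings, using a linear scan for brevity where a real runtime would use a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy selector table: every distinct name maps to one canonical pointer. */
#define MAX_SELECTORS 256
static const char *selector_table[MAX_SELECTORS];
static size_t selector_count = 0;

const char *register_selector(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < selector_count; i++)
        if (strcmp(selector_table[i], name) == 0)
            return selector_table[i];  /* already registered: reuse it */
    assert(selector_count < MAX_SELECTORS);
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);          /* canonical copy owned by the table */
    assert(copy != NULL);
    memcpy(copy, name, len);
    selector_table[selector_count++] = copy;
    return copy;
}
```

After registration, checking whether two selectors are equal is a single pointer comparison, which is why the runtime pays the uniquing cost once at launch.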

Initializers phase

After the Runtime has been set up, the initializers run:

  1. Objective-C +load methods
  2. C functions declared with __attribute__((constructor)), for example __attribute__((constructor)) void DoSomeInitializationWork()
  3. Construction of C++ static global variables of non-basic types (usually classes or structs), such as a global static struct object; heavy work in such constructors can slow launch
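Initializer 2, and the warning in initializer 3, can be illustrated in plain C.

- `do_some_initialization_work` runs before main() because of the GCC/Clang constructor attribute, so anything slow there is paid on every launch.
- `square_lookup` (a hypothetical example) shows the usual alternative: defer expensive setup until first use. The lazy version is not thread-safe as written.

```c
#include <assert.h>

static int constructor_ran = 0;

/* GCC/Clang extension: runs before main(), during the initializer phase.
   Anything slow here delays every launch. */
__attribute__((constructor))
static void do_some_initialization_work(void) {
    constructor_ran = 1;  /* keep it cheap: just record that it ran */
}

/* Lazy alternative for expensive setup: build the table on first use
   instead of at launch. Real code would guard this with pthread_once
   or dispatch_once to make it thread-safe. */
static int table_ready = 0;
static int squares[256];

int square_lookup(int i) {
    assert(i >= 0 && i < 256);
    if (!table_ready) {
        for (int k = 0; k < 256; k++)
            squares[k] = k * k;
        table_ready = 1;
    }
    return squares[i];
}
```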

Pre-main stage analysis

From the above, the factors that affect launch time in this phase are:

  1. Loading the executable and mapping its segments and sections into memory, which involves virtual memory paging
  2. dyld's dynamic linking of the images in memory, including checking shared data and resolving calls at run time
  3. Runtime initialization, including class registration, category loading, and ivar offset fixes
  4. Construction of C++ static objects and global variables
  5. Calls to all Objective-C +load methods

Optimization measures:

  1. Reduce Objective-C class bloat: remove unused classes and merge small, loosely related classes
  2. Avoid separating the declaration of static variables from their initialization, for example:
```c
// Before: declaration and initialization are separated
static int x;
static short conv_table[128];

// After: initialized where they are declared
static int x = 0;
static short conv_table[128] = {0};
```

     In general, also reduce the use of static variables.
  3. Reduce the number of exported symbols. The linker options -exported_symbols_list and -unexported_symbols_list restrict which symbols are exported, which reduces the work dyld has to do.
  4. Remove unused framework dependencies, and mark each linked framework as Required or Optional; Optional frameworks are checked for dynamically at runtime.
  5. Remove unused methods.
  6. Implement fewer +load methods and keep the logic inside them to a minimum.
  7. Turn frequently used code into prebuilt binaries: build it as static libraries, prefer static libraries over dynamic ones, and merge multiple static libraries into a single static framework, which reduces dyld's linking work.
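Limiting exported symbols can also be done in source code rather than with a linker symbol list. The sketch below uses the GCC/Clang visibility attribute; `checked_area` and `rect_area` are hypothetical. When the file is built into a dynamic library, the hidden helper stays out of the exported symbol table but remains callable inside the library. Compiling with -fvisibility=hidden makes hidden the default, so only functions explicitly marked default are exported. That is a similar effect to an -exported_symbols_list file.

```c
#include <assert.h>

/* Internal helper: hidden visibility keeps it out of the shared
   library's exported symbol table; code in the same library can
   still call it. */
__attribute__((visibility("hidden")))
int checked_area(int w, int h) {
    assert(w >= 0 && h >= 0);
    return w * h;
}

/* Public API: the only symbol meant to be exported. With
   -fvisibility=hidden, only functions marked "default" like this
   one are exported. */
__attribute__((visibility("default")))
int rect_area(int w, int h) {
    return checked_area(w, h);
}
```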

The main stage

  1. The AppDelegate's launch lifecycle callbacks
  2. Layout, drawing, and loading of the application window
  3. Loading of the RootViewController

Optimization points:

  1. Compress and shrink the launch image.
  2. Avoid using storyboards or NIBs to lay out the root view controller.
  3. Keep blocking work in didFinishLaunchingWithOptions to a minimum. Logic can be moved to background threads. Be careful when the main thread synchronizes with those threads, though: blocking the main thread this way can cause a black screen.
  4. Run initialization logic that does not have to finish during launch asynchronously.