Does a program really start at main? For that question, refer to the runtime documentation; knowing the answer will help with what comes next.

First acquaintance with dyld

1.1. What is dyld?

Dyld is short for "the dynamic link editor", that is, the dynamic linker, and it is an important part of Apple's operating systems. On iOS and macOS, only a very small number of processes can be fully loaded by the kernel alone; basically all processes are dynamically linked. A Mach-O image therefore contains many references to external libraries and symbols. These references cannot be used directly: at startup they must be filled in with the content they refer to. This filling-in, known as symbol binding, is done by the dynamic linker dyld. Dyld exists in the system as a user-space executable file. Applications generally specify an LC_LOAD_DYLINKER load command in the load commands of their Mach-O file, and this command gives the path of dyld, which is usually /usr/lib/dyld. The kernel maps the application's Mach-O file and dyld (at /usr/lib/dyld) into memory and hands control to dyld, which then loads and links the dynamic libraries the application depends on. Starting with iOS 13, the new dyld 3 replaces dyld 2.

Dyld is open source, so you can download the source code from Apple's open source site to read how it works and learn the details of how the system loads dynamic libraries. This article uses the dyld-750.6 source code for debugging.

1.2 DYLD Shared Cache

1.2.1 Preprocessing

The preprocessing step mainly handles the preprocessor directives, which start with "#" in source code files.

  • Remove all "#define" directives and expand all macros.

  • Handle all conditional directives such as "#if", "#ifdef", "#elif", "#else", "#endif".

  • Process each "#include" directive by inserting the included file at the location of the directive. Note that this process is recursive, since the included file may itself include other files.

  • Delete all comments, both "//" and "/* */".

  • Add line numbers and file name markers, such as # 2 "hello.c" 2, so that the compiler can generate line number information for debugging and show line numbers in compile errors and warnings.

  • Keep all #pragma directives, because the compiler needs them.
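As a small illustration of the rules above, here is a minimal sketch of a C file that exercises macro expansion and conditional compilation. The names BUFFER_SIZE, SQUARE, GREETING and USE_VERBOSE_GREETING are made up for this example and do not come from any real project.

```c
#include <assert.h>
#include <stdio.h>

/* Object-like and function-like macros: every use is expanded,
   and the #define lines themselves disappear from the output. */
#define BUFFER_SIZE 16
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch survives preprocessing.
   USE_VERBOSE_GREETING is a hypothetical flag that is not defined here. */
#ifdef USE_VERBOSE_GREETING
#define GREETING "hello from the verbose build"
#else
#define GREETING "hello"
#endif

/* After preprocessing, the body reads: return ((n + 1) * (n + 1)); */
int square_of_next(int n) {
    return SQUARE(n + 1);
}

/* Formats GREETING and SQUARE(BUFFER_SIZE) into buf.
   Returns the number of characters snprintf would write. */
int format_greeting(char *buf, size_t len) {
    assert(len >= BUFFER_SIZE);
    return snprintf(buf, len, "%s:%d", GREETING, SQUARE(BUFFER_SIZE));
}
```

Running the file through `clang -E` shows the preprocessed text: the macro names are gone and only the selected branch of the conditional remains.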

1.2.2 Compilation

The compilation process is generally divided into six steps: scanning, parsing, semantic analysis, source code optimization, code generation and object code optimization.

Scanning

The preprocessed source code is fed into the scanner. The scanner's job is simple: it performs lexical analysis. Using an algorithm similar to a finite state machine, it splits the character sequence of the source code into a series of tokens, such as keywords, identifiers, brackets, equals signs, and string literals.

In languages with a preprocessing step, such as C, macro replacement and file inclusion are usually kept out of the compiler proper and handed to a separate preprocessor.

The lexical analyzer reads the stream of characters that make up the source program and groups them into meaningful sequences called lexemes. For each lexeme, it produces as output a token of the following form:

<token-name,attribute-value>

This lexical unit is passed on to the next step, parsing.
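To make the token form concrete, here is a minimal sketch of a hand-written scanner in C. It is not how Clang's lexer is implemented; the Token layout and the three token kinds are simplifying assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;     /* plays the role of token-name */
    const char *start;  /* attribute-value: where the lexeme begins */
    size_t length;      /* ... and how many characters it spans */
} Token;

/* Scans one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    assert(p != NULL);
    while (isspace((unsigned char)*p)) p++;   /* skip whitespace */

    Token tok = { TOK_END, p, 0 };
    if (*p == '\0') {
        /* end of input: tok stays TOK_END with length 0 */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        tok.kind = TOK_PUNCT;                 /* single-character operator or bracket */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Counts the tokens in src, not including the end marker. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (next_token(&src).kind != TOK_END) n++;
    return n;
}
```

Each branch corresponds to one state of the finite state machine: the first character selects the state, and the inner loop consumes characters until the state no longer accepts them.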

Syntax analysis

In a token, the first component, token-name, is an abstract symbol used by the syntax analysis step, while the second component, attribute-value, points to the entry in the symbol table for this token. The symbol table entry is used by the semantic analysis and code generation steps.

The parser verifies that the syntax is correct. If an expression is not valid, for example because of mismatched parentheses or a missing operator, the compiler reports the error in the parsing phase. All nodes are then assembled into an abstract syntax tree (AST).

Semantic analysis

The semantic analyzer uses information from the syntax tree and symbol table to check whether the source program is consistent with the semantics defined by the language. It also collects type information and stores it in the syntax tree or symbol table for later use during intermediate code generation.

An important part of semantic analysis is type checking. The compiler checks whether each operator has matching operands. For example, many programming languages require an array subscript to be an integer. If you use a floating-point number as an array subscript, the compiler must report an error.

Parsing only analyzes the expression at the syntactic level; it does not know whether the statement really makes sense. For example, in C it makes no sense to multiply two pointers, yet the statement is syntactically valid. Likewise, is it legal to multiply a pointer by a floating-point number? The only semantics the compiler can analyze are static semantics, meaning semantics that can be determined at compile time, as opposed to dynamic semantics, which can only be determined at run time.

Static semantics usually include declaration and type matching, and type conversion. Dynamic semantics generally refers to semantic problems that only appear at run time; for example, dividing by zero is a run-time semantic error.

Generate IR (intermediate code)

In the process of translating a source program into object code, a compiler may construct one or more intermediate representations. These intermediate representations can take many forms. Syntax trees are intermediate representations that are commonly used in parsing and semantic analysis.

After parsing and semantic analysis of the source program, many compilers generate an explicit intermediate representation of low-level or machine-like languages. We can think of this representation as the program of some abstract machine. This intermediate representation should have two important properties: it should be easy to generate and easily translated into the language on the target machine.

Intermediate code allows the compiler to be divided into a front end and a back end. The compiler front end is responsible for generating machine-independent intermediate code, and the compiler back end converts the intermediate code into target machine code. This allows cross-platform compilers to use the same front end for different platforms and several back ends for different machine platforms.

After completing these steps, generation of the IR can begin. CodeGen is responsible for translating the syntax tree, top-down, into LLVM IR, which is the output of the compiler front end and the input of the back end.

  • Objective-C is compiled to IR by the Clang compiler, from which the .o file is then generated.

  • Swift is compiled to IR by the swiftc compiler, from which the executable file is then generated.

The following is the compilation process in Swift. SIL (Swift Intermediate Language) is the intermediate code of the Swift compilation process and is mainly used to further analyze and optimize Swift code. SIL sits between the AST and LLVM IR.

Note: Swift differs from Objective-C in that Swift generates a high-level SIL.

1.2.3 Assembly

Generate the assembly:

clang -S -fobjc-arc main.m -o main.

Generate object file

clang -fmodules -c main.m -o main.o

Object file:

An object file is an intermediate file that has been compiled but not linked. It is similar to the content and structure of an executable file, so it is generally stored in the same format as the executable file format.

It’s not just executable files that are stored in executable format. Dynamic link library and static link library files are stored in executable file format.

One question: what if a variable used in the object code is defined in another module? In fact, the absolute run-time addresses of global variables and functions defined in other modules are only determined at final link time. So modern compilers compile each source file into an unlinked object file, and the linker then links these object files together to form an executable file. Let's take this question with us into the world of linking.

1.2.4 Linking

Static linking

For example, suppose the program module main.c uses foo(), which is defined in another module, func.c. Every call to foo in main.c needs the exact address of the function, but since each module is compiled separately, the compiler does not know the address of foo when it compiles main.c, so it leaves the target address of the instructions that call foo unresolved for the time being. The linker fixes up the target addresses of these instructions at final link time. Without a linker, we would have to correct each instruction that calls foo by hand, filling in the correct address of foo. And whenever func.c is recompiled, the address of foo may change, so every instruction in main.c that uses foo would have to be adjusted again. This tedious work would be a programmer's nightmare. With a linker, you can refer directly to functions and global variables in other modules without knowing their addresses: at link time, the linker looks up the address of foo in func.c based on the symbol foo that you reference, and then corrects every instruction in main.c that refers to foo so that its target address is the real address of foo. This is the basic process and purpose of static linking.

Generally speaking, the whole linking process is divided into two steps:

Step 1 Space and address allocation

All input object files are scanned to obtain the length, attributes and position of each of their segments, and all symbol definitions and symbol references in the symbol tables of the input object files are collected into a single global symbol table. In this step the linker obtains the segment lengths of all input object files, merges them, computes the merged length and position of each segment in the output file, and establishes the mapping.

  • The concept of a symbol came into wide use with assembly language. A symbol represents an address, either the starting address of a subroutine (later, a function) or the starting address of a variable.

Step 2 Symbol resolution and relocation

Using all the information gathered in the first step, the linker reads the segment data and relocation information from the input files, performs symbol resolution and relocation, adjusts the addresses in the code, and so on. This second step is the heart of the linking process, especially relocation.

  • The process of recalculating the address of each target is called relocation.
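The two steps can be sketched in a few lines of C. This is a toy model, not a real linker: the Symbol and Relocation structures, the idea that "code" is an array of 32-bit words, and all addresses are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One entry of the global symbol table built in step 1. */
typedef struct { const char *name; uint32_t address; } Symbol;

/* One relocation record: "patch code[offset] with the address of symbol". */
typedef struct { uint32_t offset; const char *symbol; } Relocation;

/* Symbol resolution: look a name up in the global symbol table.
   Returns 0 and stores the address on success, -1 if the symbol is undefined. */
int resolve_symbol(const Symbol *table, size_t count, const char *name, uint32_t *out) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = table[i].address;
            return 0;
        }
    }
    return -1;
}

/* Relocation: overwrite each placeholder with the resolved address.
   Returns the number of references that could not be resolved. */
int apply_relocations(uint32_t *code, size_t code_len,
                      const Relocation *relocs, size_t nrelocs,
                      const Symbol *table, size_t nsyms) {
    int unresolved = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t addr;
        assert(relocs[i].offset < code_len);
        if (resolve_symbol(table, nsyms, relocs[i].symbol, &addr) == 0)
            code[relocs[i].offset] = addr;   /* fix up the call target */
        else
            unresolved++;                    /* a real linker reports "undefined symbol" */
    }
    return unresolved;
}
```

In the foo example, main.c would contribute a relocation record naming foo, func.c would contribute the symbol definition, and the patch in apply_relocations is what replaces the shelved target address.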

Dynamic linking

Under Linux, the Dynamic Linker ld.so is actually a shared object that the operating system loads into the process’s address space via mapping. After loading the dynamic linker, the operating system cedes control to the dynamic linker’s entry address (like executables, shared objects have entry addresses). When the dynamic linker gains control, it performs a series of initialization operations of its own, and then dynamically links executable files based on the current environment parameters. When all the dynamic linking work is complete, the dynamic linker transfers control to the entry address of the executable file and the program begins to execute.

Waste of space is one of the problems with static linking, and the other is that static linking can cause a lot of trouble for updating, deploying, and distributing applications. The easiest way to solve these two problems is to separate the program’s modules from each other into separate files, rather than statically linking them together. Simply put, you don’t link the object files that make up the program until the program is ready to run. That is, you defer linking until runtime, which is the basic idea of dynamic linking.

When program1.c is compiled as program1.o, the compiler does not yet know the address of the foobar() function, as explained in the static link. When the linker links program1.o to an executable, the linker must determine the nature of the foobar() function referenced in program1.o. If foobar() is a function defined in another static target module, the linker relocates the foobar() address reference in program1.o according to the rules of static linking. If foobar() is a function defined in a dynamically shared object, the linker marks the symbol’s reference as a dynamically linked symbol without relocating it, leaving the process for load time.

So how does the linker know whether a reference to foobar() is a static symbol or a dynamic symbol? This is why lib.so is passed to the linker. lib.so contains complete symbol information (because it is also needed for dynamic linking at run time), and it is one of the link input files. When resolving symbols, the linker therefore knows that foobar() is a dynamic symbol defined in lib.so, so it can treat the reference to foobar() specially and make it a reference to a dynamic symbol.
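The idea of leaving a reference unbound until load time can be modeled in plain C. The sketch below is a simulation only: ExportedSymbol, DynamicReference and foobar_impl are hypothetical names, and a real dynamic linker works on binary tables rather than C structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

/* What a shared library exports: names paired with code addresses. */
typedef struct { const char *name; func_ptr impl; } ExportedSymbol;

/* What the static linker leaves behind for each dynamic reference:
   the symbol name and an empty slot to be filled in at load time. */
typedef struct { const char *name; func_ptr slot; } DynamicReference;

/* Stands in for foobar() as defined inside lib.so. */
static int foobar_impl(int x) { return x * 2; }

static const ExportedSymbol lib_exports[] = { { "foobar", foobar_impl } };

/* Load-time binding: fill every slot whose name the library exports.
   Returns the number of references that stay unbound. */
size_t bind_references(DynamicReference *refs, size_t nrefs,
                       const ExportedSymbol *exports, size_t nexports) {
    size_t unbound = 0;
    for (size_t i = 0; i < nrefs; i++) {
        refs[i].slot = NULL;
        for (size_t j = 0; j < nexports; j++)
            if (strcmp(refs[i].name, exports[j].name) == 0)
                refs[i].slot = exports[j].impl;
        if (refs[i].slot == NULL) unbound++;
    }
    return unbound;
}

/* The program calls through the slot, never through a hard-coded address. */
int call_bound(const DynamicReference *ref, int arg) {
    assert(ref->slot != NULL);
    return ref->slot(arg);
}
```

The design point is the indirection: because the call goes through a slot, the library can be replaced or loaded at a different address without touching the program's code.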

After a program is divided into several modules, how to combine these modules into a single program is a problem that has to be solved. The problem of how to combine modules comes down to how the modules communicate. In the static languages C and C++, modules most commonly communicate in two ways: function calls between modules and variable access between modules. A function call must know the address of the target function, and a variable access must know the address of the target variable, so both can be reduced to one mechanism, the symbol reference between modules. Modules rely on symbols to communicate. As in a jigsaw puzzle, the module that defines a symbol has a piece sticking out, and the module that references the symbol has a gap of exactly that shape. The process of fitting the modules together is linking.

Each source code module is compiled independently and the modules are then "linked" as needed. The main job of linking is to handle the mutual references between modules so that they can be joined correctly. The linking process mainly includes address and storage allocation, symbol resolution, and relocation.

Generate executable files

In the link step, the source code of each module, compiled by the compiler into object files (.o or .obj), is linked together with the referenced dynamic libraries (.so, .framework, .dylib) and static libraries (.a, .lib) to form the final executable file.

1.3 Static library, dynamic library and executable file

It’s not just executable files that are stored in executable format. Both dynamic and static link library files are stored in executable file format.

Static library

A static library can be thought of as an archive of object files (a collection of object files, that is, many object files compressed into a single file). When linking such a library into an application, the static linker collects the object files from the library and packages them into a separate binary along with the application’s object code. This means that the executable file size of your application will grow as the number of libraries increases. In addition, when the application is launched, the application code (including library code) is imported into the application’s address space once and for all.

Dynamic library

Dynamic libraries can be shared by multiple app processes, so there is only one copy in memory. With a static library there would be multiple copies, since each app's Mach-O file would contain its own copy. Compared to static libraries, dynamic libraries can therefore reduce the amount of memory an app occupies. A dynamic library cannot be run directly; it has to be loaded into memory and executed through dyld, the system's dynamic link loader.

When dyld loads libraries, it uses the shared cache to speed up program startup. The shared cache is mapped into memory by dyld when the process starts. After that, whenever a Mach-O image is to be loaded, dyld first checks whether the image is in the shared cache, and if so, maps its address in shared memory directly into the process's address space. When the program depends on many system dynamic libraries, this can significantly improve startup performance.
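The lookup described above can be sketched as a simple table search. This is a conceptual model, not dyld's implementation: CachedImage, load_image and the idea of a flat path-to-address table are assumptions for illustration, and real paths and addresses are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One image that is already mapped as part of the shared cache. */
typedef struct { const char *path; uintptr_t address; } CachedImage;

typedef enum { FROM_SHARED_CACHE, FROM_DISK } LoadSource;

/* Returns the cache entry for path, or NULL if the image is not cached. */
const CachedImage *find_in_cache(const CachedImage *cache, size_t count, const char *path) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(cache[i].path, path) == 0)
            return &cache[i];
    return NULL;
}

/* Decides how an image would be loaded: reuse the cached mapping if there
   is one, otherwise fall back to loading the file from disk. */
LoadSource load_image(const CachedImage *cache, size_t count,
                      const char *path, uintptr_t *address_out) {
    assert(path != NULL && address_out != NULL);
    const CachedImage *hit = find_in_cache(cache, count, path);
    if (hit != NULL) {
        *address_out = hit->address;   /* no file I/O needed */
        return FROM_SHARED_CACHE;
    }
    *address_out = 0;                  /* a real loader would map the file here */
    return FROM_DISK;
}
```

The saving comes from the first branch: a hit costs a lookup instead of opening, mapping and binding a separate file.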

The difference between dyld 3 and dyld 2 can be seen in dyld's main function. In the old main function, the main app was initialized first and the shared cache was loaded afterwards. Dyld 3 changed the order: mapSharedCache is executed first (loading the shared cache), and then the main app is loaded.

Linking the object file produces an executable that can be run to see the output:

clang main.o -o main

Running ./main prints:

starming rank 14

The executable file: Mach-O


The object file contains at least compiled machine instruction code, data, and some information needed for linking, such as symbol tables, debugging information, strings, etc. The general object file stores this information in segments with different attributes. Such as: code segment, data segment, read-only data segment, other segments, file header, segment table, relocation table, string table, symbol table, debugging table, etc.

ELF Header (ELF file header):

It contains the basic attributes that describe the entire file, such as the ELF file version, the target machine model, the program entry address, and so on.

The ELF file header defines the ELF magic number, the machine word length of the file, the data storage mode, the version, the running platform, the ABI version, the ELF relocation type, the hardware platform, the hardware platform version, the entry address, the program header entry and length, and the position, length and number of entries of the segment table.

The ELF magic number

View the ELF file header with readelf -h:

➜ ~ readelf -h test
ELF Header:
  Magic:   7f 45 4C 46 02 01 01 00 00 00 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400430
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6624 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28

The first four bytes are an identifier that must be the same for all ELF files: 0x7F, 0x45, 0x4C, and 0x46. The first byte is the DEL control character in ASCII, and the next three bytes are the ASCII codes of the letters E, L and F. These four bytes are known as the ELF magic number, and almost every executable file format begins with a magic number in its first few bytes. For example, the first two bytes of the a.out format are 0x01 and 0x07, and the first two bytes of a PE/COFF file are 0x4d and 0x5a, the ASCII characters MZ. The magic number is used to confirm the type of the file. The operating system verifies the magic number when loading an executable file and refuses to load the file if it is incorrect.

The next byte identifies the ELF file class: 0x01 for 32-bit and 0x02 for 64-bit. The sixth byte is the byte order, which specifies whether the ELF file is big-endian or little-endian. The seventh byte specifies the major version number of the ELF file, which is usually 1 since the ELF standard has not been updated since version 1.2. The last nine bytes are not defined in the ELF standard and are usually filled with 0. Some platforms use these nine bytes as an extension flag.
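The byte-by-byte description above translates directly into code. The following is a minimal sketch that inspects the first 16 bytes of a buffer; ElfIdent and parse_elf_ident are names invented for this example, and the byte-order values (1 for little-endian, 2 for big-endian) follow the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IDENT_SIZE = 16 };   /* length of the identification bytes */

typedef struct {
    int is_elf;         /* 1 if the magic number matches */
    int bits;           /* 32 or 64, 0 if the class byte is unknown */
    int little_endian;  /* 1 little-endian, 0 big-endian, -1 unknown */
    int version;        /* major version byte, normally 1 */
} ElfIdent;

/* Decodes the identification bytes at the start of an ELF file. */
ElfIdent parse_elf_ident(const uint8_t *bytes, size_t len) {
    ElfIdent id = { 0, 0, -1, 0 };
    assert(bytes != NULL);
    if (len < IDENT_SIZE) return id;

    /* Bytes 0-3: magic number 0x7F 'E' 'L' 'F'. */
    if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F')
        return id;
    id.is_elf = 1;

    /* Byte 4: file class. */
    id.bits = bytes[4] == 1 ? 32 : bytes[4] == 2 ? 64 : 0;

    /* Byte 5: byte order. */
    id.little_endian = bytes[5] == 1 ? 1 : bytes[5] == 2 ? 0 : -1;

    /* Byte 6: version. */
    id.version = bytes[6];
    return id;
}
```

Fed the identification bytes from the readelf output above (7f 45 4c 46 02 01 01 followed by zeros), this reports a 64-bit, little-endian, version 1 ELF file.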

Section Header Table (section table):

This table describes information about all segments contained in the ELF file, such as segment name, segment length, offset in the file, read and write permissions, and other attributes of segments.

Relocation table

When the linker processes the object file, it needs to relocate some parts of the object file, that is, references to absolute addresses in the code segment and data segment. This relocation information is recorded in the ELF file relocation table, which has a relocation table for each code and data segment that needs to be relocated.

String table

ELF files use many strings, such as segment names, variable names, etc. Because the length of a string is often variable, it is difficult to represent it with a fixed structure. A common practice is to store strings together in a table, and then reference strings using the offset of strings in the table.
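A minimal sketch of this scheme in C: the strings are stored back to back, each terminated by a NUL byte, and a reference is just an offset. The table contents and the helper name strtab_get are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A string table: offset 0 holds the empty string, then ".text" at
   offset 1, ".data" at offset 7 and "main" at offset 13. */
static const char strtab[] = "\0.text\0.data\0main";

/* Returns the string stored at offset, or NULL if the offset is out of
   range or the string is not terminated inside the table. */
const char *strtab_get(const char *table, size_t size, size_t offset) {
    assert(table != NULL);
    if (offset >= size) return NULL;
    if (memchr(table + offset, '\0', size - offset) == NULL) return NULL;
    return table + offset;
}
```

A fixed-size structure such as a symbol or section header then only needs to store a small integer (1, 7 or 13 here) instead of a variable-length name.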

The symbol table

The symbol table in an ELF file is usually a segment of the file, named .symtab. The structure of the symbol table is very simple: it is an array of Elf32_Sym structures, and each Elf32_Sym structure corresponds to one symbol. The first element of the array, the element with subscript 0, is an invalid "undefined" symbol. The Elf32_Sym structure is defined as follows:

typedef struct {
    Elf32_Word    st_name;   // the symbol name (offset into the string table)
    Elf32_Addr    st_value;  // the corresponding value of the symbol
    Elf32_Word    st_size;   // symbol size
    unsigned char st_info;   // symbol type and binding information
    unsigned char st_other;  // currently unused, 0
    Elf32_Half    st_shndx;  // index of the segment the symbol belongs to
} Elf32_Sym;
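The st_info byte packs two values: the binding in the high four bits and the type in the low four bits. Here is a small sketch of how it can be decoded. The helper names are made up for this example; the constant values mirror those in <elf.h> but are redeclared locally so the sketch builds on any platform.

```c
#include <assert.h>
#include <stdint.h>

/* Binding values (high 4 bits of st_info). */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };

/* Type values (low 4 bits of st_info). */
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3, STT_FILE = 4 };

/* Extracts the binding from st_info. */
unsigned sym_binding(uint8_t info) { return info >> 4; }

/* Extracts the type from st_info. */
unsigned sym_type(uint8_t info) { return info & 0x0f; }

/* Packs a binding and a type back into one st_info byte. */
uint8_t make_info(unsigned binding, unsigned type) {
    assert(binding < 16 && type < 16);
    return (uint8_t)((binding << 4) | type);
}

/* Human-readable name for a binding value. */
const char *binding_name(unsigned binding) {
    switch (binding) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}
```

For example, a global function such as main has st_info 0x12: binding 1 (GLOBAL) and type 2 (FUNC).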

Debugging information

The object file may also contain debugging information. Almost all modern compilers support source-level debugging, such as setting breakpoints in functions, watching variables change, and single-stepping, provided the compiler records the relationship between source code and object code in advance. For example, which source line corresponds to which address in the object code, the types of functions and variables, the definitions of structures, and strings are all saved in the object file. Some advanced compilers and debuggers even support viewing the contents of STL containers, so the programmer can directly inspect the values of a container's members during debugging.

It is worth mentioning that debugging information takes up a lot of space in object files and executable files, often several times more than the code and data themselves. So when a program is finished and ready for release, the debugging information, which is of no use to users, should be removed to save space. Under Linux, the "strip" command strips debugging information from ELF files:

$ strip foo

Executables contain programs that can be executed directly, and are represented by ELF executables, which generally do not have extensions.

COFF is a format specification first proposed and used by Unix System V Release 3. Later, Microsoft developed the PE format based on COFF and used it in the Windows NT system of the time. System V Release 4 introduced the ELF format on top of COFF, and today's popular Linux systems use ELF as their basic executable file format. This is the main reason why PE and ELF are so similar: both derive from the same executable file format, COFF.

Unix's earliest executable file format, the a.out format, was designed to be so simple that it became inadequate once the concept of shared libraries came along.

Mach-O, in turn, is the executable file format for OS X and iOS.

Mach-O is short for Mach Object file format. It is the general name for the executable-type files used by iOS at its various stages: a file format for executables, object code, dynamic libraries, and core dumps.

There are three Mach-O file types: Executable, Dylib, and Bundle.

  • Executable

Executable is the app's main binary file, which can be found in the Products folder of the Xcode project.

  • Dylib

A dylib is a dynamic library. Dynamic libraries are divided into dynamically linked libraries and dynamically loaded libraries.

Dynamically linked libraries: when the executable is loaded, the dynamically linked libraries are loaded into memory along with it, without any explicit load call. (They start as the program starts.)

Dynamically loaded libraries: loaded on demand by code or command, using dlopen and the like. (They are loaded after the program starts.)

  • Bundle

A bundle is a special type of dylib that you cannot link against. It can only be loaded at run time through dlopen, which is useful for plug-ins on macOS.

  • Image and Framework

Image contains all three of these types;

The Framework can be thought of as a dynamic library.

The structure of Mach-O

A Mach-O file is a binary byte stream organized into blocks of data. Each Mach-O file contains a Mach-O header, followed by a series of load commands, followed by one or more segments, each containing 0 to 255 sections.

The Header structure

The header stores basic information about the Mach-O file, including the running platform, the file type, the number of load commands, the total size of the commands, the dyld flags, and so on.
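As a rough sketch, the 64-bit header can be read like this. The struct mirrors the field layout of mach_header_64 in <mach-o/loader.h>, but it is redeclared here under a different name so the example also builds outside macOS; read_header and describe_filetype are hypothetical helpers, and the code assumes the file and the host use the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu   /* magic number of a 64-bit Mach-O file */
#define MH_EXECUTE  0x2u          /* file type: executable */
#define MH_DYLIB    0x6u          /* file type: dynamic library */
#define MH_BUNDLE   0x8u          /* file type: bundle */

typedef struct {
    uint32_t magic;       /* identifies the file as Mach-O */
    int32_t  cputype;     /* running platform */
    int32_t  cpusubtype;
    uint32_t filetype;    /* executable, dylib, bundle, ... */
    uint32_t ncmds;       /* number of load commands */
    uint32_t sizeofcmds;  /* total size of the load commands */
    uint32_t flags;       /* flags used by dyld */
    uint32_t reserved;
} MachHeader64;

/* Copies the header out of a raw buffer. Returns 0 on success,
   -1 if the buffer is too short or the magic number does not match. */
int read_header(const uint8_t *bytes, size_t len, MachHeader64 *out) {
    assert(bytes != NULL && out != NULL);
    if (len < sizeof *out) return -1;
    memcpy(out, bytes, sizeof *out);
    return out->magic == MH_MAGIC_64 ? 0 : -1;
}

/* Maps the filetype field to the three types discussed above. */
const char *describe_filetype(uint32_t filetype) {
    switch (filetype) {
    case MH_EXECUTE: return "executable";
    case MH_DYLIB:   return "dylib";
    case MH_BUNDLE:  return "bundle";
    default:         return "other";
    }
}
```

The load commands start immediately after this header, and ncmds and sizeofcmds tell the loader how many there are and how many bytes they occupy.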

Load Commands

The load commands follow the header. They tell the loader explicitly how to handle the binary data; some are handled by the kernel, others by the dynamic linker. They determine the memory layout used when the Mach-O file is loaded, and they guide the kernel loader and the dynamic linker. Examples include the entry address of our main() function, the file path of the dyld the program needs, and the file paths of the dependent libraries.

Data

The specific data for each segment is stored here, including the specific code, data, and so on.

II. Dyld loading process

Certain operations of a program must be performed before main, and other operations must be performed after main. These are typically C++ global object constructors (before main) and destructors (after main).
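The same before-main and after-main hooks are available in plain C through a GCC/Clang extension. The sketch below is an assumption-laden illustration, not code from dyld: the function names are invented, and it relies on the constructor and destructor attributes supported by those compilers.

```c
#include <assert.h>
#include <stdio.h>

static int constructor_runs = 0;

/* Runs before main(): it is called while the image is being initialized. */
__attribute__((constructor))
static void before_main(void) {
    constructor_runs++;
    puts("constructor: before main");
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void after_main(void) {
    assert(constructor_runs == 1);   /* the constructor must already have run */
    puts("destructor: after main");
}

/* Reports how many times the constructor has run so far. */
int constructors_run_so_far(void) {
    return constructor_runs;
}
```

Calling constructors_run_so_far() as the very first statement of main already returns 1, which shows that some code ran before main was entered.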

2.1 Finding the entry: _dyld_start

Download the latest version of the dyld source code, dyld-750.6.

Add a load method to the ViewController file and a C++ method to the main.m file:

__attribute__((constructor)) void kcFunc() {
    printf(": %s\n", __func__);
}

Add a breakpoint to the load method. Run the program. The LLDB debugging instruction bt looks at the function call stack:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100661D5C 002- Application load analysis`+[ViewController load](self=0x00000001007252d0, _cmd=<no value available>) at ViewController.m:16
    frame #1: 0x00000001984b53bc libobjc.A.dylib`load_images + 944
    frame #2: 0x00000001006b221c dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 464
    frame #3: 0x00000001006c35e8 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 512
    frame #4: 0x00000001006c1878 dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 184
    frame #5: 0x00000001006c1940 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 92
    frame #6: 0x00000001006b26d8 dyld`dyld::initializeMainExecutable() + 216
    frame #7: 0x00000001006b7928 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 5216
    frame #8: 0x00000001006b1208 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 396
    frame #9: 0x00000001006b1038 dyld`_dyld_start + 56

So far, we have found the program's entry function: _dyld_start. Next we can search the source code for _dyld_start to find out more. The file that defines it has separate implementations for the different architectures, such as i386, x86_64, arm64 and arm. The following uses x86_64 as an example to analyze dyld.

A look at the source shows that it is all assembly. What if we cannot read it? We can follow the flow through the comments: the comment shows that _dyld_start calls dyldbootstrap::start.

call is the instruction that calls a function (the counterpart of bl on ARM). This function is where our app starts.

#if __x86_64__ && !TARGET_OS_SIMULATOR
...
__dyld_start:
...
	# call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
	movl	8(%rbp),%esi	# param2 = argc into %esi
	leaq	16(%rbp),%rdx	# param3 = &argv[0] into %rdx
	leaq	___dso_handle(%rip),%rcx	# param4 = dyldsMachHeader into %rcx
	leaq	-8(%rbp),%r8	# param5 = &glue into %r8
	call	__ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
	movq	-8(%rbp),%rdi
	cmpq	$0,%rdi
	jne	Lnew
...

2.2 dyldbootstrap::start

dyldbootstrap::start refers to the start function in the dyldbootstrap namespace. In the source code, search for dyldbootstrap and find the start function.

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
...
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

Again following the LLDB stack trace, dyldbootstrap::start goes on to call the dyld::_main function.

2.3 dyld::_main

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
		int argc, const char* argv[], const char* envp[], const char* apple[],
		uintptr_t* startGlue)
{
	... // more than 600 lines of code omitted
	sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
	...
	{
		// find entry point for main executable
		result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
		if ( result != 0 ) {
			// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
			if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
				*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
			else
				halt("libdyld.dylib support not present for LC_MAIN");
		}
		else {
			// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
			result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
			*startGlue = 0;
		}
	}
	...
	return result;
}

Again following the LLDB stack trace, dyld::_main is followed by the dyld::initializeMainExecutable function.

The "// find entry point for main executable" step works on sMainExecutable. Searching for where sMainExecutable is assigned leads to the instantiateFromLoadedImage function.

2.4 instantiateFromLoadedImage

This function is short and only calls one other function, so we jump on to instantiateMainExecutable.

The main purpose of this function is clear from its comment: the kernel maps the main executable before dyld gains control, and this is where the ImageLoader for the already-mapped executable is created. It is returned as our main program, sMainExecutable, and added to the image list.

// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}

2.5 instantiateMainExecutable

instantiateMainExecutable instantiates the main program; the actual inspection of the file is done by sniffLoadCommands.

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
	//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
	//	sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
	bool compressed;
	unsigned int segCount;
	unsigned int libCount;
	const linkedit_data_command* codeSigCmd;
	const encryption_info_command* encryptCmd;
	sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
	// instantiate concrete class based on content of load commands
	if ( compressed ) 
		return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
	else
#if SUPPORT_CLASSIC_MACHO
		return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
		throw "missing LC_DYLD_INFO load command";
#endif
}

2.6 sniffLoadCommands

sniffLoadCommands is also a member of ImageLoaderMachO. Let’s take a look:

// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                         unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                         const linkedit_data_command** codeSigCmd,
                                         const encryption_info_command** encryptCmd)
{
    *compressed = false;
    *segCount = 0;
    *libCount = 0;
    *codeSigCmd = NULL;
    *encryptCmd = NULL;
    ...
    switch (cmd->cmd) {
        case LC_DYLD_INFO:
        case LC_DYLD_INFO_ONLY:
        case LC_LOAD_DYLIB:
        case LC_LOAD_WEAK_DYLIB:
        case LC_REEXPORT_DYLIB:
        case LC_LOAD_UPWARD_DYLIB:
        ...
    }
}

This function inspects the Load Commands in order to load the main program. A few of its parameters deserve explanation:

  • compressed: determined by the presence of LC_DYLD_INFO or LC_DYLD_INFO_ONLY.
  • segCount: the number of segment commands; it cannot exceed 255.
  • libCount: the number of dependent libraries, taken from LC_LOAD_DYLIB and related commands (Foundation, UIKit, ...); it cannot exceed 4095.
  • codeSigCmd: the application's code signature.
  • encryptCmd: the application's encryption information (commonly called the app "shell"; re-signing an app for debugging in a non-jailbroken environment requires a copy whose shell has been removed, that is, a decrypted app).

After the above steps, the instantiation of the main program is complete.
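
To make the counting concrete, here is a much-reduced sketch of such a walk over load commands. It is not dyld's implementation: sniff, SniffResult and appendCommand are hypothetical names, only a few command types are handled, and every segment is counted (no further filtering is modelled). The numeric constants are the values defined in <mach-o/loader.h>, repeated so that the sketch builds on any platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr std::uint32_t kReqDyld       = 0x80000000u;
constexpr std::uint32_t kLoadDylib     = 0x0c;             // LC_LOAD_DYLIB
constexpr std::uint32_t kSegment64     = 0x19;             // LC_SEGMENT_64
constexpr std::uint32_t kLoadWeakDylib = 0x18 | kReqDyld;  // LC_LOAD_WEAK_DYLIB
constexpr std::uint32_t kDyldInfo      = 0x22;             // LC_DYLD_INFO
constexpr std::uint32_t kDyldInfoOnly  = 0x22 | kReqDyld;  // LC_DYLD_INFO_ONLY

// Every load command starts with these two fields.
struct LoadCommand {
    std::uint32_t cmd;
    std::uint32_t cmdsize;
};

struct SniffResult {
    bool compressed;
    unsigned segCount;
    unsigned libCount;
};

// Walks `ncmds` load commands stored back to back in `cmds`.
SniffResult sniff(const std::vector<std::uint8_t>& cmds, std::uint32_t ncmds) {
    SniffResult r{false, 0, 0};
    std::size_t offset = 0;
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > cmds.size())
            throw std::runtime_error("load command extends past end of buffer");
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(LoadCommand) || lc.cmdsize > cmds.size() - offset)
            throw std::runtime_error("malformed load command size");
        switch (lc.cmd) {
            case kDyldInfo:
            case kDyldInfoOnly:
                r.compressed = true;
                break;
            case kSegment64:
                ++r.segCount;
                break;
            case kLoadDylib:
            case kLoadWeakDylib:
                ++r.libCount;
                break;
            default:
                break;
        }
        offset += lc.cmdsize;  // commands are variable-sized; cmdsize gives the stride
    }
    // The limits quoted in the parameter list above.
    if (r.segCount > 255)  throw std::runtime_error("too many segments");
    if (r.libCount > 4095) throw std::runtime_error("too many dependent libraries");
    return r;
}

// Helper for experiments: appends a zero-filled command of `cmdsize` bytes.
void appendCommand(std::vector<std::uint8_t>& buf, std::uint32_t cmd, std::uint32_t cmdsize) {
    if (cmdsize < sizeof(LoadCommand))
        throw std::invalid_argument("cmdsize smaller than the command header");
    LoadCommand lc{cmd, cmdsize};
    const std::size_t start = buf.size();
    buf.resize(start + cmdsize, 0);
    std::memcpy(buf.data() + start, &lc, sizeof(lc));
}
```

Building a buffer with appendCommand and passing it to sniff is enough to see how the compressed flag and the two counters fall out of a single pass.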

2.7 Return to dyld::_main

We now go back to dyld::_main and walk through its main steps.

  1. Configure environment variables
configureProcessRestrictions(mainExecutableMH, envp);

On Linux systems, LD_LIBRARY_PATH is an environment variable consisting of several paths separated by colons. It is empty by default. If we set LD_LIBRARY_PATH for a process, the dynamic linker searches the directories it lists first when it looks for shared libraries at program start. This makes it easy to test new shared libraries or to use non-standard ones. For example, to use a modified libc.so.6, we can put the new version into /home/user and specify LD_LIBRARY_PATH:

$ LD_LIBRARY_PATH=/home/user /bin/ls
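
The lookup order just described can be sketched as follows. This is a simplification, not how any real dynamic linker is written: splitSearchPath and resolveLibrary are hypothetical names, empty path entries are simply skipped, and the file-existence check is passed in as a callback so the sketch does not touch the file system.

```cpp
#include <functional>
#include <string>
#include <vector>

// Splits a colon-separated list such as "/home/user:/opt/lib" into directories.
// Empty entries are skipped in this sketch.
std::vector<std::string> splitSearchPath(const std::string& value) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        std::string::size_type end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) dirs.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Returns the first candidate path for `name` accepted by `exists`, trying the
// override directories before the default ones. Returns "" if nothing matches.
std::string resolveLibrary(const std::string& name,
                           const std::string& overridePath,
                           const std::vector<std::string>& defaultDirs,
                           const std::function<bool(const std::string&)>& exists) {
    std::vector<std::string> dirs = splitSearchPath(overridePath);
    dirs.insert(dirs.end(), defaultDirs.begin(), defaultDirs.end());
    for (const std::string& dir : dirs) {
        const std::string candidate = dir + "/" + name;
        if (exists(candidate)) return candidate;
    }
    return std::string();
}
```

Because the override directories are placed in front of the defaults, a libc.so.6 in /home/user wins over the system copy whenever the variable is set.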

  2. Shared cache
// load shared cache
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
    if ( sSharedCacheOverrideDir)
        mapSharedCache();
#else
    mapSharedCache();
#endif
}

When dyld starts, it enables the shared cache to speed up program launch. The shared cache is mapped into memory by dyld when the process starts. After that, whenever a Mach-O image is loaded, dyld first checks whether the required dynamic library is in the shared cache; if it is, dyld uses the copy in shared memory, which is mapped directly into the process's address space. mapSharedCache is shown for reference:

static void mapSharedCache()
{
    dyld3::SharedCacheOptions opts;
    opts.cacheDirOverride = sSharedCacheOverrideDir;
    opts.forcePrivate = (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
#if __x86_64__ && !TARGET_IPHONE_SIMULATOR
    opts.useHaswell = sHaswell;
#else
    opts.useHaswell = false;
#endif
    opts.verbose = gLinkContext.verboseMapping;
    loadDyldCache(opts, &sSharedCacheLoadInfo);

    // update global state
    if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
        dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
        dyld::gProcessInfo->sharedCacheSlide = sSharedCacheLoadInfo.slide;
        dyld::gProcessInfo->sharedCacheBaseAddress = (unsigned long)sSharedCacheLoadInfo.loadAddress;
        sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
        dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0, 0}, {{0, 0}}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
    }
}

The core function called here is loadDyldCache, shown in abridged form:

bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    /** omitted **/
#if TARGET_IPHONE_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    if ( options.forcePrivate ) {
        // mmap cache into this process only
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        if ( reuseExistingCache(options, results) )
            return (results->errorMessage != nullptr);

        // slow path: this is first process to load cache
        return mapCacheSystemWide(options, results);
    }
#endif
}

loadDyldCache makes a few key decisions:

  • Whether the process runs in the simulator. The simulator gets separate handling: the cache is only mapped privately into the process.

  • If the cache is already mapped into the shared region, that existing mapping is reused in the process's address space (see reuseExistingCache for the details). If not, this process is the first to load the cache, and the cache is loaded and mapped system-wide.

  • The cache files are located under /System/Library/Caches/com.apple.dyld/, in several sets, one set per CPU architecture.
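
The decision tree above is small enough to write down as a pure function. The sketch below only models which path is chosen; CacheStrategy, CacheRequest and chooseCacheStrategy are hypothetical names, and the simulator field stands in for what is a compile-time TARGET_IPHONE_SIMULATOR check in the listing.

```cpp
enum class CacheStrategy {
    MapPrivate,     // mapCachePrivate: mmap the cache into this process only
    ReuseExisting,  // fast path: the cache is already in the shared region
    MapSystemWide   // slow path: this is the first process to load the cache
};

struct CacheRequest {
    bool simulator;      // stands in for the TARGET_IPHONE_SIMULATOR build flag
    bool forcePrivate;   // corresponds to options.forcePrivate
    bool alreadyMapped;  // what reuseExistingCache would report
};

CacheStrategy chooseCacheStrategy(const CacheRequest& req) {
    if (req.simulator)     return CacheStrategy::MapPrivate;
    if (req.forcePrivate)  return CacheStrategy::MapPrivate;
    if (req.alreadyMapped) return CacheStrategy::ReuseExisting;
    return CacheStrategy::MapSystemWide;
}
```

Only the last two outcomes involve the shared region; the first two keep the mapping private to the process.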

  3. Initialize (instantiate) the main program
// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
  4. Insert dynamic libraries

The load function is called not only by loadInsertedDylib but also by functions such as dlopen that load dynamic libraries at runtime.

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}
  5. Link the main program
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
    gLinkContext.bindFlat = true;
    gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
  6. Link the inserted dynamic libraries
// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    if ( gLinkContext.allowInterposing ) {
        // only INSERTED libraries can interpose
        // register interposing info after all inserted libraries are bound so chaining works
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            image->registerInterposing(gLinkContext);
        }
    }
}
  7. Run the initializers: initializeMainExecutable
#if SUPPORT_OLD_CRT_INITIALIZATION
    // Old way is to run initializers via a callback from crt1.o
    if ( ! gRunInitializersOldWay ) 
        initializeMainExecutable(); 
#else
    // run all initializers
    initializeMainExecutable(); 
#endif

    // notify any montoring proccesses that this process is about to enter main()
    notifyMonitoringDyldMain();

The dyld::_main function, abridged:

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
        int argc, const char* argv[], const char* envp[], const char* apple[],
        uintptr_t* startGlue)
{
    if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
        launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
    }

    // Check and see if there are any kernel flags
    dyld3::BootArgs::setFlags(hexToUInt64(_simple_getenv(apple, "dyld_flags"), nullptr));

    // configure the signature information
    // Grab the cdHash of the main executable from the environment
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
        mainExecutableCDHash = mainExecutableCDHashBuffer;
    ...
#endif

    CRSetCrashLogMessage("dyld: launch started");

    setContext(mainExecutableMH, argc, argv, envp, apple);

    // Pickup the pointer to the exec path.
    sExecPath = _simple_getenv(apple, "executable_path");
    ...
    // configure the environment variables
    configureProcessRestrictions(mainExecutableMH, envp);
    ...
    if ( sJustBuildClosure )
        sClosureMode = ClosureMode::On;
    getHostInfo(mainExecutableMH, mainExecutableSlide);

    // load shared cache
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
        if ( sSharedCacheOverrideDir)
            mapSharedCache();
#else
        mapSharedCache();
#endif
    }
    ...
    // install gdb notifier
    stateToHandlers(dyld_image_state_dependents_mapped, sBatchHandlers)->push_back(notifyGDB);
    stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);
    ...
    // load any inserted libraries
    if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
        for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
            loadInsertedDylib(*lib);
    }
    // record count of inserted libraries so that a flat search will look at
    // inserted libraries, then main, then others.
    sInsertedDylibCount = sAllImages.size()-1;

    // link main executable
    gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
    if ( mainExcutableAlreadyRebased ) {
        // previous link() on main executable has already adjusted its internal pointers for ASLR
        // work around that by rebasing by inverse amount
        sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
    }
#endif
    link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
    sMainExecutable->setNeverUnloadRecursive();
    if ( sMainExecutable->forceFlat() ) {
        gLinkContext.bindFlat = true;
        gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
    }

    // link any inserted libraries
    // do this after linking main executable so that any dylibs pulled in by inserted
    // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
    if ( sInsertedDylibCount > 0 ) {
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
            image->setNeverUnloadRecursive();
        }
        if ( gLinkContext.allowInterposing ) {
            // only INSERTED libraries can interpose
            // register interposing info after all inserted libraries are bound so chaining works
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->registerInterposing(gLinkContext);
            }
        }
    }

    if ( gLinkContext.allowInterposing ) {
        // <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
        for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
            ImageLoader* image = sAllImages[i];
            if ( image->inSharedCache() )
                continue;
            image->registerInterposing(gLinkContext);
        }
    }
    ...
#if SUPPORT_OLD_CRT_INITIALIZATION
    // Old way is to run initializers via a callback from crt1.o
    if ( ! gRunInitializersOldWay )
        initializeMainExecutable();
#else
    // run all initializers
    initializeMainExecutable();
#endif

    // notify any montoring proccesses that this process is about to enter main()
    notifyMonitoringDyldMain();
    if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
        dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
    }
    ARIADNEDBG_CODE(220, 1);
    ...
    return result;
}

Following the LLDB call stack, the next function entered is the initialization function initializeMainExecutable.

2.8 initializeMainExecutable

According to the LLDB call stack, the next step is runInitializers, which, as its name suggests, runs the initializers.

void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL )
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

2.9 runInitializers

If the editor cannot jump to the definition of runInitializers, a global search finds it:

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

According to the LLDB call stack, the next function is processInitializers.

2.10 processInitializers

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

According to the LLDB call stack, the next function is recursiveInitialization.

2.11 recursiveInitialization

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);

    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }

            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);

            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);

            if ( hasInitializers ) {
                uint64_t t2 = mach_absolute_time();
                timingInfo.addTime(this->getShortName(), t2-t1);
            }
        }
        catch (const char* msg) {
            // this image is not initialized
            fState = oldState;
            recursiveSpinUnLock();
            throw;
        }
    }

    recursiveSpinUnLock();
}
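
The core idea of this function, initializing lower-level libraries before the image that depends on them and marking an image before recursing so that cycles terminate, can be sketched in isolation. This is a simplified model and not dyld's code: Node and recursiveInit are hypothetical names, and upward dependencies, depth checks, locking and error handling are all left out.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an image and its direct dependencies.
struct Node {
    std::string name;
    std::vector<int> deps;  // indices of the libraries this image links against
    bool visited;           // plays the role of the fState check that breaks cycles
};

// Initializes the dependencies of images[index] first, then the image itself.
// Each name is appended to `order` at the point where its initializers would run.
void recursiveInit(std::vector<Node>& images, int index, std::vector<std::string>& order) {
    Node& node = images[index];
    if (node.visited) return;
    node.visited = true;  // mark before recursing so dependency cycles terminate
    for (int dep : node.deps)
        recursiveInit(images, dep, order);
    order.push_back(node.name);
}
```

With three images where main depends on libFoo and libSystem, and libFoo depends on libSystem, starting at main produces the order libSystem, libFoo, main.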

According to the LLDB call stack, the next function is notifySingle.

2.12 notifySingle

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress = image->machHeader();
        info.imageFilePath = image->getRealPath();
        info.imageFileModDate = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
            const char* result = (*it)(state, 1, &info);
            if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
                //fprintf(stderr, " image rejected by handler=%p\n", *it);
                // make copy of thrown string so that later catch clauses can free it
                const char* str = strdup(result);
                throw str;
            }
        }
    }
    if ( state == dyld_image_state_mapped ) {
        // <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
        if ( ! image->inSharedCache() ) {
            dyld_uuid_info info;
            if ( image->getUUID(info.imageUUID) ) {
                info.imageLoadAddress = image->machHeader();
                addNonSharedCacheImageUUID(info);
            }
        }
    }
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    // mach message csdlc about dynamically unloaded images
    if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
        notifyKernel(*image, false);
        const struct mach_header* loadAddress[] = { image->machHeader() };
        const char* loadPath[] = { image->getPath() };
        notifyMonitoringDyld(true, 1, loadAddress, loadPath);
    }
}

The LLDB call stack gives no further frames at this point, so we search for sNotifyObjCInit and find that it is assigned in the registerObjCNotifiers function.

2.13 registerObjCNotifiers

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

sNotifyObjCInit is just the init callback that was passed in, so it cannot be followed any further from here. Searching the source shows that registerObjCNotifiers is called from the _dyld_objc_notify_register function.

2.14 _dyld_objc_notify_register

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

_dyld_objc_notify_register is in turn called from the _objc_init function:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
    cache_init();
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

Here we see that _dyld_objc_notify_register is called with three arguments:

  • map_images: called when dyld loads an image into memory.
  • load_images: called when dyld initializes an image. (The familiar +load methods are also called here.)
  • unmap_image: called when dyld removes an image.

That is, _objc_init registers and stores the addresses of map_images, load_images, and unmap_image functions.
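
The registration pattern used here, storing a callback and immediately replaying it for images that finished initializing before anyone registered, can be sketched on its own. This is a simplified model rather than the dyld or objc code: Notifier, LoadedImage and the method names are hypothetical, and std::function replaces the plain function pointers used in the real interface.

```cpp
#include <functional>
#include <string>
#include <vector>

struct LoadedImage {
    std::string path;
    bool initialized;
};

class Notifier {
public:
    using InitCallback = std::function<void(const std::string& path)>;

    // Stores the callback, then replays it for every image that was already
    // initialized at registration time (the "below libSystem" loop above).
    void registerInit(InitCallback cb, const std::vector<LoadedImage>& images) {
        init_ = std::move(cb);
        for (const LoadedImage& image : images)
            if (image.initialized && init_) init_(image.path);
    }

    // Called later, each time another image is reported to the callback.
    void notify(const LoadedImage& image) const {
        if (init_) init_(image.path);
    }

private:
    InitCallback init_;
};
```

The replay step matters because the registering library is itself initialized partway through the launch, after some images have already been processed.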

The call stack has still not been closed into a loop. Go back to this line in 2.11 recursiveInitialization and look at the doInitialization method:

// initialize this image
bool hasInitializers = this->doInitialization(context);

2.15 doInitialization

The only frame of the LLDB call stack not yet accounted for is load_images, so we step into doImageInit to see what happens.

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

2.16 doImageInit

In brief, this function walks the load commands, adds fSlide to the init_address of each LC_ROUTINES_COMMAND to obtain the initializer's actual address, and calls it.

Note the comment "libSystem initializer must run first". The path to follow is doInitialization -> doModInitFunctions -> libSystemInitialized.

Initializing libSystem calls libdispatch_init, libdispatch_init calls _os_object_init, and _os_object_init calls _objc_init.

When the runtime is initialized, _objc_init registers several callbacks with dyld, and the runtime takes over part of dyld's work, including setting up the class structures of each dependent library and calling all the +load methods in those libraries.

In _objc_init, map_images, load_images, and unmap_image are registered and stored. This is where the load_images call comes from, and it closes the loop on the LLDB call stack.

  • map_images: called when dyld loads an image (image file) into memory.
  • load_images: called when dyld initializes an image. Once the image file has been mapped, load_images runs to handle the +load methods of the images dyld has mapped.
  • unmap_image: called when dyld removes an image.

The runtime takes over and calls map_images for parsing and processing, then load_images calls call_load_methods, which iterates over all loaded classes and calls the +load methods of the classes first and then the +load methods of their categories.

The libSystem initializer must run first. The system library is itself loaded by dyld as an image, so load_images also closes the loop here.

void ImageLoaderMachO::doImageInit(const LinkContext& context)
{
    if ( fHasDashInit ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            switch (cmd->cmd) {
                case LC_ROUTINES_COMMAND:
                    Initializer func = (Initializer)(((struct macho_routines_command*)cmd)->init_address + fSlide);
#if __has_feature(ptrauth_calls)
                    func = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)func, ptrauth_key_asia, 0);
#endif
                    // <rdar://problem/8543820&9228031> verify initializers are in image
                    if ( ! this->containsAddress(stripPointer((void*)func)) ) {
                        dyld::throwf("initializer function %p not in mapped image for %s\n", func, this->getPath());
                    }
                    if ( ! dyld::gProcessInfo->libSystemInitialized ) {
                        // <rdar://problem/17973316> libSystem initializer must run first
                        dyld::throwf("-init function in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                    }
                    if ( context.verboseInit )
                        dyld::log("dyld: calling -init function %p in %s\n", func, this->getPath());
                    {
                        dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                        func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                    }
                    break;
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}
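
The address arithmetic in doImageInit (init_address + fSlide, followed by the containsAddress check) can be tried out with plain integers. The sketch below is an illustration only: MappedImage and resolveInitializer are hypothetical names, the addresses are made up, and pointer authentication is not modelled.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical description of where an image ended up in memory.
struct MappedImage {
    std::uintptr_t loadAddress;  // actual start address of the mapped image
    std::uintptr_t size;         // number of mapped bytes
    std::uintptr_t slide;        // actual address minus link-time address
};

bool containsAddress(const MappedImage& image, std::uintptr_t addr) {
    return addr >= image.loadAddress && addr - image.loadAddress < image.size;
}

// Converts a link-time initializer address into a runtime address and rejects
// results that fall outside the image, mirroring the check in the listing.
std::uintptr_t resolveInitializer(const MappedImage& image, std::uintptr_t initAddress) {
    const std::uintptr_t runtime = initAddress + image.slide;
    if (!containsAddress(image, runtime))
        throw std::runtime_error("initializer function not in mapped image");
    return runtime;
}
```

An image linked at 0x4000 and loaded at 0x5000 has a slide of 0x1000, so an initializer recorded at 0x4800 is called at 0x5800.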

Conclusion

  • This article starts from the start function at app launch and uses it as the entry into an exploration of dyld.
  • From _dyld_start in dyld's assembly we find dyldbootstrap::start.
  • dyldbootstrap::start calls rebaseDyld, which rebases and binds the pointers in the DATA segment of the Mach-O and initializes the Mach system calls. The rest of the start function performs a lot of further initialization.
  • dyldbootstrap::start then calls dyld::_main.
  • In dyld::_main:
    1. Environment variable configuration
    2. Shared cache
    3. Initialization of the main program
    4. Insertion of dynamic libraries
    5. Linking of the main program
    6. Linking of the inserted dynamic libraries
    7. initializeMainExecutable()
  • Loading an image starts by instantiating an ImageLoader. Dynamic libraries are loaded through ImageLoader, and the number of inserted libraries is recorded so that a flat search looks at the inserted libraries first, then the main program, and finally the other libraries. Once loading is complete, linking follows.
  • Linking handles the main binary first and then the inserted dynamic libraries. Linking is recursive, so the libraries that each library references are linked as well. After linking, dyld loops over the inserted images to register their interposing information.
  • Initialization then starts from the actual address of each image. Tracing the calls made during initialization shows that the concrete initialization work is done in objc's _objc_init function.
  • Finally, notifyMonitoringDyldMain notifies all monitoring processes that the process is about to enter main().