App startup: dyld details

Simple summary

  • The system first reads the App's executable file (a Mach-O file), obtains the path of dyld from it, and then loads dyld. dyld initializes the runtime environment.
  • dyld turns on its caching strategy, loads the dependent libraries (a set that also includes our executable), links them, and finally calls the initializers of each dependent library. The runtime is initialized in this step.
  • Once all dependent libraries have been initialized, the last image (the program executable) is initialized. At this point the runtime sets up the class structures for all classes in the project and then calls all +load methods. Finally, dyld returns the address of the main function, main is called, and we arrive at the program entry point.

1. Find the dyld dynamic linker and load it

On macOS and iOS, executable files use the Mach-O (Mach Object) format. A Mach-O file has three parts:

  • The Header holds basic information, including whether the file is 32-bit or 64-bit, the processor architecture the file targets, the file type, information about the LoadCommands, and so on.
  • The Load Commands tell the kernel and dyld how to load the resources the App needs into memory, for example the entry point of the main function, the path of the dynamic linker dyld, and the file paths of the dependent libraries.
  • Data contains the actual code, data, and so on.

Let’s look at the details of a Mach-O file with MachOView:

You can see:

  • The path of the dyld dynamic linker is stored in the LC_LOAD_DYLINKER command; it is usually /usr/lib/dyld.
  • LC_MAIN gives the entry point of the main function.
  • Each LC_LOAD_DYLIB command points to a library that the program depends on and needs to have loaded.
  • The libraries we import through CocoaPods (when they are built as dynamic frameworks) also appear in LC_LOAD_DYLIB commands.

After loading the App's executable file, the system parses it to obtain the path of the dynamic linker dyld, loads dyld, and hands the rest of the work over to dyld.
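To make the lookup concrete, here is a minimal sketch of walking the load commands to find the dynamic linker path. It is not the kernel's code. The struct names are simplified stand-ins for the definitions in <mach-o/loader.h>, and makeDemoImage is a hypothetical helper that builds a tiny synthetic image so the sketch does not need a real Mach-O file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-ins for the 64-bit structs in <mach-o/loader.h>.
struct MachHeader64 {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
};
struct LoadCommand { uint32_t cmd, cmdsize; };
struct DylinkerCommand { uint32_t cmd, cmdsize, nameOffset; };

constexpr uint32_t kMagic64 = 0xfeedfacf;   // MH_MAGIC_64
constexpr uint32_t kLoadDylinker = 0xe;     // LC_LOAD_DYLINKER

// Returns the dynamic linker path recorded in the image, or "" if none is found.
std::string findDylinkerPath(const std::vector<uint8_t>& image) {
    if (image.size() < sizeof(MachHeader64)) return "";
    MachHeader64 header;
    std::memcpy(&header, image.data(), sizeof(header));
    if (header.magic != kMagic64) return "";

    size_t offset = sizeof(MachHeader64);   // load commands follow the header
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > image.size()) return "";
        LoadCommand lc;
        std::memcpy(&lc, image.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(LoadCommand) || offset + lc.cmdsize > image.size()) return "";
        if (lc.cmd == kLoadDylinker && lc.cmdsize >= sizeof(DylinkerCommand)) {
            DylinkerCommand dc;
            std::memcpy(&dc, image.data() + offset, sizeof(dc));
            if (dc.nameOffset >= lc.cmdsize) return "";
            // The path is a C string stored inside the command itself.
            const char* name = reinterpret_cast<const char*>(image.data() + offset + dc.nameOffset);
            size_t maxLen = lc.cmdsize - dc.nameOffset;
            const void* end = std::memchr(name, 0, maxLen);
            size_t len = end ? static_cast<size_t>(static_cast<const char*>(end) - name) : maxLen;
            return std::string(name, len);
        }
        offset += lc.cmdsize;               // step to the next load command
    }
    return "";
}

// Hypothetical demo helper: a header plus one LC_LOAD_DYLINKER command.
std::vector<uint8_t> makeDemoImage(const std::string& path) {
    uint32_t cmdsize = static_cast<uint32_t>((sizeof(DylinkerCommand) + path.size() + 1 + 7) & ~size_t(7));
    std::vector<uint8_t> image(sizeof(MachHeader64) + cmdsize, 0);
    MachHeader64 header{kMagic64, 0, 0, 2 /* MH_EXECUTE */, 1, cmdsize, 0, 0};
    std::memcpy(image.data(), &header, sizeof(header));
    DylinkerCommand dc{kLoadDylinker, cmdsize, static_cast<uint32_t>(sizeof(DylinkerCommand))};
    std::memcpy(image.data() + sizeof(header), &dc, sizeof(dc));
    std::memcpy(image.data() + sizeof(header) + sizeof(dc), path.data(), path.size());
    return image;
}
```

Calling findDylinkerPath(makeDemoImage("/usr/lib/dyld")) returns the path that was stored in the synthetic command. A real parser would also have to handle fat binaries and 32-bit headers, which this sketch ignores.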

2. Bootstrap dyld and initialize it

dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. Once the kernel has finished preparing the program, dyld takes care of the remaining work. dyld is open source: anyone can download the source code from Apple's website to read how it works and learn the details of how the system loads dynamic libraries.

The author downloaded version 655.1 of the dyld source. dyld3 was introduced starting with version 519.

Now that we have the path to dyld, how is dyld itself loaded? How is the rest of the loading process implemented?

The main() function is the familiar program entry point. Let's start from main() and explore what dyld does before main() is executed. Set a breakpoint on main() and run; the call stack looks like this:

As you can see, the call stack contains only main() and start. Is there a way to see more of it? One method runs before main() is called: +load. So write a +load method in the controller, set a breakpoint on it, and run; the call stack looks like this:

The __dyld_start function comes from dyldStartup.s in the dyld source. It is implemented in assembly, with a version for each supported architecture. In the arm64 assembly code you can see a bl instruction that jumps to dyldbootstrap::start(), which matches the call stack above:

// call dyldbootstrap::start(app_mh, argc, argv, slide, dyld_mh, &startGlue)
bl	__ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm

The implementation of dyldbootstrap::start() can be found in dyldInitialization.cpp:

//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
				intptr_t slide, const struct macho_header* dyldsMachHeader,
				uintptr_t* startGlue)
{
	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
	// Calculate the virtual address offset based on the executable Header of dyld
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        // If the kernel did not load dyld at its preferred address, we need to relocate the __DATA segment in dyld
        rebaseDyld(dyldsMachHeader, slide);
    }

	// allow dyld to use mach messaging
	// Mach message initialization
	mach_init();

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while (*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	// Stack overflow protection
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	// if DYLD_INITIALIZER is supported, run all the C++ initializers inside dyld
	runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

	// now that we are done bootstrapping dyld, call dyld's main
	// Now that we have finished booting dyld, let's execute the main function of dyld
	// Calculate the virtual address offset according to the executable file Header of App
	uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
	// Enter the dyld::_main() function
	return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

dyldbootstrap::start() bootstraps dyld and initializes the runtime environment. When it is done, it calls dyld::_main() and passes the return value back to __dyld_start, which then calls the App's real main() function.
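The envp and apple pointer arithmetic in start() relies on the kernel placing three NULL-terminated arrays back to back: argv, then envp, then apple. The sketch below isolates that walk so it can be tried on an ordinary array. StartupPointers and locateEnvAndApple are hypothetical names introduced for illustration, not dyld symbols.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: where the two follow-on arrays begin.
struct StartupPointers {
    const char** envp;
    const char** apple;
};

// Mirrors the pointer walk in dyldbootstrap::start(): the kernel lays out
// argv[0..argc-1], NULL, envp[...], NULL, apple[...], NULL contiguously.
StartupPointers locateEnvAndApple(int argc, const char* argv[]) {
    assert(argc >= 0);
    const char** envp = &argv[argc + 1];      // skip the arguments and their NULL
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }    // walk to envp's NULL terminator
    ++apple;                                  // apple[] starts right after it
    return {envp, apple};
}
```

Given an array such as {"app", "-v", nullptr, "HOME=/", "PATH=/bin", nullptr, "executable_path=/app", nullptr} with argc = 2 (the strings are illustrative values only), envp points at "HOME=/" and apple points at the "executable_path=/app" entry.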

Before continuing to explore the details of the dyld::_main() function, a few concepts are introduced:

Virtual memory

  • All the memory we developers deal with during development is virtual memory. Virtual memory lets the App believe it has contiguous available memory (a continuous and complete logical address space). This is a gift from the system: in reality the memory is usually scattered across multiple fragments of physical memory. The virtual memory map, vm_map, maps virtual memory to physical memory.

  • Virtual memory is a logical address space built on top of physical memory. Upward, it offers the application a contiguous logical address space; downward, it hides the details of physical memory.

  • Virtual memory allows a logical address to have no physical address behind it, and allows multiple logical addresses to map to one physical address.

  • Virtual memory is divided into pages of equal size (16 KB on 64-bit iOS devices) to make management and reads and writes more efficient. Pages are either read-only or read-write.

  • Virtual memory is an intermediate layer between processes and physical memory. On iOS, when memory is low, the system first tries to free read-only pages, which can be read back from disk the next time they are accessed. If there is still not enough memory, Apps in the background are notified (they receive a memory warning), and if memory is still insufficient after that, Apps in the background are killed.

  • On 64-bit ARM processors, the App's address range is 0x000000000-0xFFFFFFFFF. That is nine hexadecimal digits of 4 bits each, or 2 to the 36th power bytes, which is 64 GB. In other words, the maximum virtual memory space of an App is 64 GB.
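The page and address-space arithmetic in the list above can be checked with a few lines of code. This is only a worked example, assuming the 16 KB page size mentioned in the list; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 16 * 1024;   // assumed 16 KB pages

// Which page an address falls in, counted from address 0.
inline uint64_t pageIndex(uint64_t addr) { return addr / kPageSize; }

// Byte offset of the address inside its page.
inline uint64_t pageOffset(uint64_t addr) { return addr % kPageSize; }

// First address of the page that contains addr.
inline uint64_t pageBase(uint64_t addr) { return addr & ~(kPageSize - 1); }

// Size in bytes of an address space written with the given number of hex digits.
inline uint64_t addressSpaceBytes(unsigned hexDigits) {
    assert(hexDigits < 16);                 // keep the shift below 64 bits
    return uint64_t(1) << (4 * hexDigits);  // each hex digit is 4 bits
}
```

For example, the address 0x10000D388 lies at offset 0x1388 of the page starting at 0x10000C000, and addressSpaceBytes(9) is 2 to the 36th power, which is the 64 GB figure quoted above.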

Page Fault

A page fault occurs when a process accesses a logical page that has no corresponding page in physical memory. When a page fault occurs, the current program is interrupted, a free page is found in physical memory, the data is read from disk into that page, and the program then continues executing.

Dirty Page & Clean Page
  • If a page can be regenerated from disk, it is called a clean page; read-only pages are clean.

  • If a page contains process-specific information, it is called a dirty page.

A read-only page, such as a page of the code segment, is a clean page. For the __DATA segment, writing to a page triggers COW (copy-on-write): the page is copied and the copy is marked dirty.
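The interplay between page faults, clean pages, and dirty pages can be imitated with a toy model. ToyPager below is entirely hypothetical and only tracks bookkeeping flags; it does no real paging, but it shows why clean pages are cheap to give up under memory pressure while dirty pages are not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each page is either resident in "physical memory" or not,
// and becomes dirty once it has been written to.
class ToyPager {
public:
    explicit ToyPager(size_t pageCount) : pages_(pageCount) {}

    void read(size_t page) { touch(page); }

    void write(size_t page) {
        touch(page);
        pages_[page].dirty = true;      // process-specific data now lives here
    }

    // Under memory pressure only clean pages can be dropped, because they
    // can be re-read from disk later. Returns how many pages were dropped.
    size_t evictCleanPages() {
        size_t evicted = 0;
        for (Page& p : pages_) {
            if (p.resident && !p.dirty) { p.resident = false; ++evicted; }
        }
        return evicted;
    }

    int faultCount() const { return faults_; }

private:
    struct Page { bool resident = false; bool dirty = false; };

    // Accessing a non-resident page counts as a page fault and loads it.
    void touch(size_t page) {
        assert(page < pages_.size());
        if (!pages_[page].resident) {
            ++faults_;
            pages_[page].resident = true;
        }
    }

    std::vector<Page> pages_;
    int faults_ = 0;
};
```

Reading page 0 and writing page 1 costs two faults. After evictCleanPages(), only page 0 is gone, so reading it again faults a third time while page 1 stays resident.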

Rebase & Binding

ASLR & Code Sign

ASLR (Address Space Layout Randomization) randomizes the layout of the address space. When an App starts, the program is mapped into virtual memory and given a logical address space, which has a starting address. Before iOS 4.3 this address was essentially fixed, which means the initial virtual memory layout of a given program on a given architecture was basically always the same, and the address layout stayed highly predictable over the normal life of the process. That gave attackers a lot of room to work with (code injection, overwriting memory): the address of a function could easily be found as start address + offset. ASLR makes the starting address change randomly on every launch, so the load addresses of the program's code, files, and dynamic libraries in virtual memory differ from launch to launch, which prevents attackers from guessing addresses.

The purpose of Code Sign is to ensure that code has not been tampered with. Note, however, that the signature is not computed by hashing the entire Mach-O file as a whole. Instead, a separate cryptographic hash is generated for each page and stored in __LINKEDIT, which allows the contents of each page of the file to be verified and protected against tampering.
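The idea of per-page hashes can be sketched as follows. This is an illustration of the principle only: it uses the non-cryptographic FNV-1a hash as a stand-in for the real cryptographic hash, and hashPages and firstTamperedPage are hypothetical helper names, not part of any Apple API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a, used here only as a simple stand-in for a cryptographic hash.
uint64_t fnv1a(const uint8_t* data, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// One hash per page, so each page can be verified on its own when it is paged in.
std::vector<uint64_t> hashPages(const std::vector<uint8_t>& image, size_t pageSize) {
    assert(pageSize > 0);
    std::vector<uint64_t> hashes;
    for (size_t off = 0; off < image.size(); off += pageSize) {
        size_t len = std::min(pageSize, image.size() - off);
        hashes.push_back(fnv1a(image.data() + off, len));
    }
    return hashes;
}

// Index of the first page that no longer matches its recorded hash, or -1 if all match.
std::ptrdiff_t firstTamperedPage(const std::vector<uint8_t>& image,
                                 const std::vector<uint64_t>& recordedHashes,
                                 size_t pageSize) {
    std::vector<uint64_t> actual = hashPages(image, pageSize);
    size_t n = std::min(actual.size(), recordedHashes.size());
    for (size_t i = 0; i < n; ++i) {
        if (actual[i] != recordedHashes[i]) return static_cast<std::ptrdiff_t>(i);
    }
    if (actual.size() != recordedHashes.size()) return static_cast<std::ptrdiff_t>(n);
    return -1;
}
```

Because every page has its own hash, changing one byte invalidates only the page that contains it, and a page can be checked when it is paged in without reading the rest of the file.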

Fix-up

After the executable and all the dynamic libraries have been loaded into virtual memory, they are still independent of one another, and we need to wire them together. Before it is loaded into memory, each dynamic library has a preferred address, but because of ASLR these addresses are no longer correct, so calls into the dynamic libraries would not land in the right place. At the same time, because of Code Sign, we cannot modify the call instructions that hold the wrong addresses. However, modern code generation uses PIC (Position Independent Code), which means code can be loaded at any address and reaches its targets indirectly. For a call, the code generator creates a pointer to the callee in the __DATA segment; the code loads that pointer and jumps through it.

The process of correcting these pointers so that each one points to the right target is called fix-up. There are two kinds of fix-up: rebasing and binding.

Rebasing adjusts pointers that point to the inside of the image; binding sets pointers that point to the outside of the image.

We can use dyldinfo to see what an executable needs to Fix up:

➜ xcrun dyldinfo -rebase -bind Mach-o-analysis
rebase information (from compressed dyldinfo):
segment       section           address      type     value
__DATA_CONST  __cfstring        0x100008018  pointer  0x100006700
__DATA_CONST  __cfstring        0x100008038  pointer  0x100006705
__DATA_CONST  __objc_classlist  0x100008048  pointer  0x10000D388
__DATA_CONST  __objc_classlist  0x100008050  pointer  0x10000D400
__DATA        __la_symbol_ptr   0x10000C000  pointer  0x100006664
__DATA        __la_symbol_ptr   0x10000C008  pointer  0x100006670
__DATA        __objc_const      0x10000C078  pointer  0x1000074D1
__DATA        __objc_const      0x10000C080  pointer  0x100006034
__DATA        __objc_const      0x10000D348  pointer  0x10000D2B0
__DATA        __objc_selrefs    0x10000D350  pointer  0x100006795
__DATA        __objc_selrefs    0x10000D358  pointer  0x1000067A6
__DATA        __objc_selrefs    0x10000D360  pointer  0x1000067AB
__DATA        __objc_classrefs  0x10000D370  pointer  0x10000D400
__DATA        __objc_superrefs  0x10000D378  pointer  0x10000D388
__DATA        __objc_data       0x10000D388  pointer  0x10000D3B0
__DATA        __objc_data       0x10000D3A8  pointer  0x10000C0F0
__DATA        __objc_data       0x10000D3D0  pointer  0x10000C088
__DATA        __data            0x10000D488  pointer  0x100007492
__DATA        __data            0x10000D498  pointer  0x10000C138
__DATA        __data            0x10000D4A8  pointer  0x10000C308
__DATA        __data            0x10000D4B8  pointer  0x10000C328
...
bind information:
segment       section           address      type     addend  dylib           symbol
__DATA        __objc_data       0x10000D408  pointer  0       UIKit           _OBJC_CLASS_$_UIResponder
__DATA        __objc_data       0x10000D458  pointer  0       UIKit           _OBJC_CLASS_$_UIResponder
__DATA        __objc_classrefs  0x10000D368  pointer  0       UIKit           _OBJC_CLASS_$_UISceneConfiguration
__DATA        __objc_data       0x10000D390  pointer  0       UIKit           _OBJC_CLASS_$_UIViewController
__DATA        __objc_data       0x10000D3E0  pointer  0       UIKit           _OBJC_METACLASS_$_UIResponder
__DATA        __objc_data       0x10000D430  pointer  0       UIKit           _OBJC_METACLASS_$_UIResponder
__DATA        __objc_data       0x10000D3B8  pointer  0       UIKit           _OBJC_METACLASS_$_UIViewController
__DATA        __objc_data       0x10000D3B0  pointer  0       libobjc         _OBJC_METACLASS_$_NSObject
__DATA        __objc_data       0x10000D3D8  pointer  0       libobjc         _OBJC_METACLASS_$_NSObject
__DATA        __objc_data       0x10000D428  pointer  0       libobjc         _OBJC_METACLASS_$_NSObject
__DATA        __objc_data       0x10000D398  pointer  0       libobjc         __objc_empty_cache
__DATA        __objc_data       0x10000D3C0  pointer  0       libobjc         __objc_empty_cache
__DATA_CONST  __got             0x100008000  pointer  0       libSystem       dyld_stub_binder
__DATA_CONST  __cfstring        0x100008008  pointer  0       CoreFoundation  ___CFConstantStringClassReference
__DATA_CONST  __cfstring        0x100008028  pointer  0       CoreFoundation  ___CFConstantStringClassReference

Rebase

Rebasing adjusts the pointers that point inside the image, correcting the data of a Mach-O file that was loaded into memory at a non-fixed base address (ASLR).

In the past, dyld loaded each dylib at its specified address, so all pointers in code and data were already correct and dyld did not need to do any fix-ups. With ASLR, a dylib is loaded at a new random address that differs from the address the code and data were built for, so dyld has to correct for the slide. Rebasing adds this offset to the pointers inside the dylib. The offset is computed as follows:

slide = actual_address - preferred_address

The offset is then added to every pointer in the __DATA segment that needs rebasing. This brings us back to page faults and COW (copy-on-write): touching these pages can create an I/O bottleneck. But because the rebase entries are ordered by address, the work is sequential from the kernel's point of view, so the kernel can read ahead and reduce the I/O cost.

The rebase step is performed first. The image has to be read into memory and verified page by page against its code signature to make sure it has not been tampered with, so the bottleneck of this step is I/O. Binding happens afterwards. Its bottleneck is CPU computation, because symbol tables have to be searched to resolve references across images, while the images themselves were already read in and verified during the rebase phase.
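As a rough sketch of what rebasing amounts to, the code below computes the slide and adds it to selected slots of a toy __DATA segment. The segment is modeled as a plain vector of 64-bit words, the list of slots stands in for the rebase information stored in the image, and the actual load address 0x104A3C000 is a made-up value; real dyld reads compressed rebase opcodes instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// slide = actual_address - preferred_address
inline uint64_t computeSlide(uint64_t actualAddress, uint64_t preferredAddress) {
    return actualAddress - preferredAddress;
}

// Adds the slide to every slot that the (toy) rebase info marks as an
// internal pointer. Slots not listed are ordinary data and stay untouched.
void rebasePointers(std::vector<uint64_t>& dataSegment,
                    const std::vector<size_t>& rebaseSlots,
                    uint64_t slide) {
    for (size_t slot : rebaseSlots) {
        assert(slot < dataSegment.size());
        dataSegment[slot] += slide;
    }
}
```

With a preferred address of 0x100000000 and the assumed actual address 0x104A3C000, the slide is 0x4A3C000, so a pointer that was 0x100006700 on disk becomes 0x104A42700 after rebasing.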

Binding

Binding is the process of setting the pointers that refer to symbols outside the image. For example, when we use NSObject in Objective-C code we reference the symbol _OBJC_CLASS_$_NSObject, but that symbol is not in our binary. It lives in a system library (libobjc), so a binding operation is needed to connect the reference to its definition.

Lazy binding means that a symbol is not bound when the dynamic library is loaded; the binding is done when the function is first called. The mechanism is simple: it relies on the symbol dyld_stub_binder. The first call to a lazily bound function goes to dyld_stub_binder, which finds the real function and writes its address into the stub's pointer, so that subsequent calls do not need to bind again.
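The lazy-binding idea (resolve on first call, then reuse the cached address) can be sketched in ordinary C++. LazyStub, exportedSymbols, and the symbol _square are all hypothetical; the real mechanism is implemented with assembly stubs and the __la_symbol_ptr section, not with a class like this.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Fn = int (*)(int);

int squareImpl(int x) { return x * x; }

// Hypothetical "exported symbols" of some other image.
const std::map<std::string, Fn>& exportedSymbols() {
    static const std::map<std::string, Fn> table = {{"_square", &squareImpl}};
    return table;
}

class LazyStub {
public:
    explicit LazyStub(std::string symbol) : symbol_(std::move(symbol)) {}

    int call(int arg) {
        if (target_ == nullptr) {
            // First call: play the role of dyld_stub_binder, look the
            // symbol up by name and patch the lazy pointer.
            auto it = exportedSymbols().find(symbol_);
            assert(it != exportedSymbols().end());
            target_ = it->second;
            ++bindCount_;
        }
        return target_(arg);   // later calls jump straight through the pointer
    }

    int bindCount() const { return bindCount_; }

private:
    std::string symbol_;
    Fn target_ = nullptr;      // the lazy pointer, empty until first use
    int bindCount_ = 0;
};
```

Calling the stub twice performs the lookup only once: bindCount() stays at 1 after the second call, which is the saving lazy binding is after.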

Binding handles pointers that point outside the dylib. These are bound by symbol name, which is a string. The __LINKEDIT segment stores which pointers need binding and which symbol each one refers to. dyld has to find the implementation of each symbol, which takes a lot of computation because it searches symbol tables. Once found, the address is stored in the corresponding pointer in the __DATA segment. Binding requires more computation than rebasing but fewer I/O operations: its time is spent mostly on computation, because rebasing has already done the I/O for these pages, so the two steps are closely tied together.
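A minimal sketch of non-lazy binding, under the same toy model of __DATA as a vector of 64-bit slots: each bind record names a slot, a dylib, and a symbol, and binding is a name lookup followed by a store. BindRecord and ExportTable are hypothetical types, and the addresses used with them are invented; real dyld decodes bind opcodes and searches each dylib's export trie.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One entry of (toy) bind information, matching the columns dyldinfo prints.
struct BindRecord {
    size_t slot;          // index of the pointer in the toy __DATA segment
    std::string dylib;    // library expected to export the symbol
    std::string symbol;   // symbol name, looked up as a string
    int64_t addend;       // constant added to the resolved address
};

// dylib name -> (symbol name -> address), standing in for export tables.
using ExportTable = std::map<std::string, std::map<std::string, uint64_t>>;

// Resolves every record it can and returns how many slots were bound.
// Unresolved slots are left untouched.
size_t bindAll(std::vector<uint64_t>& dataSegment,
               const std::vector<BindRecord>& records,
               const ExportTable& exports) {
    size_t bound = 0;
    for (const BindRecord& r : records) {
        assert(r.slot < dataSegment.size());
        auto lib = exports.find(r.dylib);
        if (lib == exports.end()) continue;
        auto sym = lib->second.find(r.symbol);          // the costly string lookup
        if (sym == lib->second.end()) continue;
        dataSegment[r.slot] = sym->second + static_cast<uint64_t>(r.addend);
        ++bound;
    }
    return bound;
}
```

The string lookups are where the CPU time goes, which matches the observation that binding is compute-bound while rebasing is I/O-bound.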

rebaseDyld

slideOfMainExecutable() is the function that dyldbootstrap::start() uses to calculate the offset:

static uintptr_t slideOfMainExecutable(const struct macho_header* mh)
{
	const uint32_t cmd_count = mh->ncmds;
	const struct load_command* const cmds = (struct load_command*)(((char*)mh)+sizeof(macho_header));
	const struct load_command* cmd = cmds;
	for (uint32_t i = 0; i < cmd_count; ++i) {
		if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
			const struct macho_segment_command* segCmd = (struct macho_segment_command*)cmd;
			if ( (segCmd->fileoff == 0) && (segCmd->filesize != 0) ) {
				return (uintptr_t)mh - segCmd->vmaddr;
			}
		}
		cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
	}
	return 0;
}

As you can see, the function walks all the load commands of the executable file, looks at each LC_SEGMENT_COMMAND, and finds the segment command whose file offset is 0 and whose file size is not 0. Subtracting that segment's vmaddr from (uintptr_t)mh gives the slide. In terms of the offset formula, (uintptr_t)mh is the actual load address of the file and segCmd->vmaddr is its preferred load address.
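The two conditions in that test are easier to appreciate on concrete numbers. In the sketch below, Segment is a simplified, hypothetical stand-in for macho_segment_command, and the segment values are illustrative. A typical executable starts with a __PAGEZERO segment that also has file offset 0 but occupies no bytes in the file, which is why the file-size check is needed to skip it and land on __TEXT.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for the fields of macho_segment_command used here.
struct Segment {
    std::string name;
    uint64_t vmaddr;     // preferred virtual address of the segment
    uint64_t fileoff;    // where the segment starts in the file
    uint64_t filesize;   // how many bytes of the file it covers
};

// Same rule as slideOfMainExecutable(): the segment that maps the start of
// the file (fileoff == 0) and is not empty tells us the preferred address.
uint64_t slideOfImage(uint64_t loadAddress, const std::vector<Segment>& segments) {
    for (const Segment& seg : segments) {
        if (seg.fileoff == 0 && seg.filesize != 0) {
            return loadAddress - seg.vmaddr;
        }
    }
    return 0;
}
```

With __PAGEZERO at vmaddr 0 (file size 0) and __TEXT at vmaddr 0x100000000, an image whose header was loaded at the assumed address 0x104A3C000 has a slide of 0x4A3C000.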

dyld is itself a Mach-O file, so after dyld is loaded it also needs fix-ups before it can work properly. But in the dyldbootstrap::start() code above there is only rebaseDyld(dyldsMachHeader, slide); there is no binding step. Why? A bold guess: dyld makes no calls to external symbols, so nothing needs to be bound. Let's open a terminal and run the following command:

➜ ~ cd /usr/lib
➜ ~ xcrun dyldinfo -rebase -bind dyld
for arch x86_64:
rebase information (from local relocation records and indirect symbol table):
segment       section          address     type
__DATA_CONST  __got            0x0008F000  pointer
__DATA_CONST  __got            0x0008F008  pointer
__DATA_CONST  __got            0x0008F010  pointer
__DATA_CONST  __got            0x0008F018  pointer
__DATA_CONST  __got            0x0008F020  pointer
binding information (from external relocations and indirect symbol table):
segment  section  address  type  weak  addend  dylib  symbol
for arch i386:
rebase information (from local relocation records and indirect symbol table):
segment       section          address     type
__DATA_CONST  __got            0x00072000  pointer
__DATA        __nl_symbol_ptr  0x000750B8  pointer
__DATA        __nl_symbol_ptr  0x000750BC  pointer
__DATA        __nl_symbol_ptr  0x000750C0  pointer
__DATA        __nl_symbol_ptr  0x000750C4  pointer
__DATA        __nl_symbol_ptr  0x000750C8  pointer
... // omitted
binding information (from external relocations and indirect symbol table):
segment  section  address  type  weak  addend  dylib  symbol

As you can see, there really are no pointers to bind inside the dyld executable.