In the previous chapter we followed the compilation process: our App was compiled successfully and produced its Mach-O executable. Now it is time to run it.


When writing code we usually treat the main function as the starting point, but a lot happens before main is reached, most notably a series of operations performed by dyld. This chapter explores them in detail.

Loading and dynamic linking

First, a book recommendation: “Programmer's Self-Cultivation: Linking, Loading and Libraries” is a refreshing read on exactly this topic.

Going from an executable file to actually running code, an App goes through two main steps: loading and dynamic library linking.

Loading

An executable (program) is a static concept: until it runs, it is just a file on disk. A process is a dynamic concept: it is the program while it is running. Every running program has its own independent virtual address space, and the upper limit of that space is determined by the hardware (the CPU's word size, e.g. 32-bit or 64-bit).

The virtual address space of a process is managed by the operating system. Many processes run on the operating system at the same time, and their virtual address spaces are isolated from one another.

Loading means mapping the executable file on disk into the process's virtual memory. Memory, however, is expensive and scarce, so loading all of a program's instructions and data into memory up front is clearly impractical. Programs exhibit locality: at any moment they use only a small part of their code and data. So only the most frequently used parts need to stay in memory, while less frequently used data stays on disk. This is the basic principle of dynamic loading.
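To make the on-demand idea concrete, here is a minimal POSIX sketch (it builds on macOS and Linux, and is only an analogy, not the kernel's loader): mmap records a mapping of a file into the address space, and the kernel reads a page from disk only when it is first touched. The helper name first_byte_via_mmap is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only into this process's address space and return
 * its first byte (or -1 on error / empty file). mmap only records the
 * mapping; the page is read from disk when the byte is first accessed. */
int first_byte_via_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;

    /* This access is what faults the first page into memory. */
    int value = ((const unsigned char *)base)[0];

    munmap(base, len);
    return value;
}
```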

Loading can also be understood as the process of creating a process. The operating system only needs to do three things:

  • Create an independent virtual address space
  • Read the executable's header and establish the mapping between the virtual space and the executable file
  • Set the CPU's instruction register to the executable's entry address and start running
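Step 2 starts with the header. Here is a minimal sketch, assuming a 64-bit Mach-O file: the struct mirrors the field layout of mach_header_64 from <mach-o/loader.h> but is redeclared under a hypothetical name so it also compiles off macOS, and the check only looks at the magic number and the file type.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as struct mach_header_64 in <mach-o/loader.h>. */
struct macho_header64 {
    uint32_t magic;      /* 0xfeedfacf for a 64-bit image in host byte order */
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;   /* MH_EXECUTE (0x2) for a main executable */
    uint32_t ncmds;      /* number of load commands after the header */
    uint32_t sizeofcmds; /* total size in bytes of those load commands */
    uint32_t flags;
    uint32_t reserved;
};

#define SKETCH_MH_MAGIC_64 0xfeedfacfu
#define SKETCH_MH_EXECUTE  0x2u

/* Copy the header out of the file image (memcpy avoids unaligned reads)
 * and report whether it looks like a 64-bit Mach-O executable. */
int is_macho64_executable(const uint8_t *image, size_t len, struct macho_header64 *out) {
    if (len < sizeof *out) return 0;
    memcpy(out, image, sizeof *out);
    return out->magic == SKETCH_MH_MAGIC_64 && out->filetype == SKETCH_MH_EXECUTE;
}
```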

Dynamic library linking

Concept

The libraries we link against fall into two groups: static libraries and dynamic libraries. Static libraries are linked at build time and copied into your Mach-O file; to update one, you have to rebuild. Dynamic libraries are linked at run time and are loaded by dyld.

In real iOS development you will find that many features are already provided by the system and shared not only with your app but with other apps too: the GUI framework, I/O, networking and so on. Linking your Mach-O file against these shared libraries is also the job of a linker.

All system frameworks used in iOS (UIKit, Foundation, etc.) are dynamically linked. Think of it as plugs and sockets: with static linking, every plug is pushed into its socket at build time, right after compilation, and the resulting binary simply runs. With dynamic linking, the “plugging in” happens when the program starts, so the dynamic linker has to finish its work before any code we wrote executes.

Shared cache

To save space, Apple keeps these system libraries in one place: the dyld shared cache.

The Mach-O file is the result of compilation and static linking. A dynamic library's code is not copied into it; the library is linked at run time instead, so the Mach-O file does not contain the definitions of the dynamic library's symbols.

Those symbols are marked as undefined, but their names and the paths of the libraries that provide them are recorded. When the dynamic libraries are brought in at run time (programmatically, this is what dlopen and dlsym do), the recorded path is used to find the library first, and then the recorded symbol name is used to find the address to bind.

dlopen loads a shared library into the address space of the running process. The loaded library may have undefined symbols of its own, which can trigger further libraries to be loaded. dlopen also lets you choose whether to resolve all references immediately (RTLD_NOW) or lazily, when they are first used (RTLD_LAZY). dlopen returns a handle to the opened library; dlsym takes that handle plus a symbol name and returns the symbol's address, which you can then use, for example as a function pointer.
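Here is a small sketch of the dlopen/dlsym pair, assuming a POSIX system (macOS or a recent Linux; older glibc also needs -ldl at link time). Passing NULL to dlopen returns a handle for the main program, and dlsym on that handle also searches the libraries loaded with it, so strlen from the C library can be found by name. The wrapper strlen_via_dlsym is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Look up "strlen" by name at run time and call it through the pointer.
 * Returns (size_t)-1 if the handle or the symbol cannot be obtained. */
size_t strlen_via_dlsym(const char *s) {
    void *handle = dlopen(NULL, RTLD_LAZY); /* handle for the main program */
    if (handle == NULL) return (size_t)-1;

    /* dlsym returns void *; POSIX allows converting it to a function pointer. */
    strlen_fn fn = (strlen_fn)dlsym(handle, "strlen");
    size_t result = fn ? fn(s) : (size_t)-1;

    dlclose(handle);
    return result;
}
```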

Advantages

The benefits of dynamic linking are:

  • Code sharing: many programs link these libraries dynamically, but only one copy exists in memory and on disk
  • Easy maintenance: for example, libSystem.dylib is a stand-in for libSystem.B.dylib. To upgrade, you switch to a new version such as libSystem.C.dylib and repoint the stand-in
  • Smaller executables: unlike static linking, dynamic linking does not copy library code into the executable at build time, so the executable is much smaller

Watching program startup through dyld

Introduction

dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. After an application is compiled and packaged into an executable Mach-O file, dyld is responsible for loading and linking it.

dyld is open source; Apple publishes its code on its open-source site.

The startup flow

Create an empty project. Since +load runs before main, set a breakpoint in a +load method and look at the call stack.

_dyld_start

dyldbootstrap::start

dyldbootstrap::start is the start function in the dyldbootstrap namespace. Search the dyld source for dyldbootstrap to find it:

//
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
				intptr_t slide, const struct macho_header* dyldsMachHeader,
				uintptr_t* startGlue)
{
	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
	slide = slideOfMainExecutable(dyldsMachHeader);
	bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        rebaseDyld(dyldsMachHeader, slide);
    }

	// allow dyld to use mach messaging
	mach_init();

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

	// now that we are done bootstrapping dyld, call dylds main
	uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
	return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
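The envp/apple pointer arithmetic in start() relies on how the kernel lays out its string arrays. The sketch below reproduces that walk on a hand-built array; the layout (argv entries, NULL, envp entries, NULL, apple entries, NULL) is taken from the comments in start(), and find_envp/find_apple are hypothetical helper names.

```c
#include <stddef.h>

/* envp begins right after argv's terminating NULL. */
const char **find_envp(const char **argv, int argc) {
    return &argv[argc + 1];
}

/* apple begins right after envp's terminating NULL. */
const char **find_apple(const char **envp) {
    const char **apple = envp;
    while (*apple != NULL) ++apple; /* walk to the NULL that ends envp */
    return apple + 1;
}
```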


The main steps of start are:

1. Bootstrap dyld itself first. dyld is itself a dynamic library, but since it is the component that links all the other dynamic libraries, it cannot depend on any of them. It therefore has to relocate its own global and static variables by itself

  • const struct macho_header: the header of a Mach-O file
  • intptr_t slide: the ASLR (Address Space Layout Randomization) offset. The image is loaded at a random offset, the slide, to make attacks harder
  • rebaseDyld: rebases dyld's own pointers by that slide

2. Allow dyld to use Mach messaging: mach_init()

3. Set up stack protection (the stack canary): __guard_setup

4. Start linking the main program and its shared objects: dyld::_main
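The slide itself is simple arithmetic. A minimal sketch, assuming a flat list of pointer slots that need fixing (real images describe these locations in their rebase or fixup information, which this ignores); compute_slide and rebase_pointers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* slide = where the image actually landed - where the linker assumed it would be */
intptr_t compute_slide(uintptr_t actual_load_addr, uintptr_t preferred_vmaddr) {
    return (intptr_t)(actual_load_addr - preferred_vmaddr);
}

/* Rebase: add the slide to every stored pointer that was written assuming
 * the preferred load address. */
void rebase_pointers(uintptr_t *slots, size_t count, intptr_t slide) {
    for (size_t i = 0; i < count; ++i) {
        slots[i] += (uintptr_t)slide;
    }
}
```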

dyld::_main

uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
		int argc, const char* argv[], const char* envp[], const char* apple[],
		uintptr_t* startGlue)
{
	// ... the main function of dyld's linking work; the code is long,
	// so we analyze it step by step below ...
}

1. Configure environment variables

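1.1 Get the main executable's cdHash from the environment. The dyld environment variables are defined by the system and can be configured in Xcode.

1.2 setContext: set up the link context (gLinkContext)

1.3 configureProcessRestrictions: configure the process restrictions

1.4 checkEnvironmentVariables: check the DYLD_* environment variables

1.5 getHostInfo: get information about the host, such as the CPU type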

2. Load the shared cache

2.1 Check whether the shared region is disabled: checkSharedRegionDisable

2.2 Map the shared cache into the process: mapSharedCache

3. Add dyld to the UUID list

Add dyld itself to the UUID list: addDyldImageToUUIDList

4. reloadAllImages

4.1 instantiateFromLoadedImage: instantiate the main program

sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

The kernel maps in the main executable before dyld gets control, so dyld needs to create an ImageLoader for the already mapped main executable:

// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}

instantiateMainExecutable calls sniffLoadCommands to read the load commands of the main program's Mach-O file. Along the way it enforces some limits:

  • The maximum number of segments is 256
  • The maximum number of dynamic libraries (including system ones) is 4096
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
											unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
											const linkedit_data_command** codeSigCmd,
											const encryption_info_command** encryptCmd)
{
    ...
    for (uint32_t i = 0; i < cmd_count; ++i) {
        ...
    }
    ...
}
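To see what "reading the load commands" means, here is a sketch of the walk, assuming load commands laid out back to back after the header. Every load command begins with cmd and cmdsize; the two constants match LC_SEGMENT_64 and LC_LOAD_DYLIB from <mach-o/loader.h>, while the real sniffLoadCommands also counts other kinds of dylib commands and validates much more. All names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Every load command starts with these two fields (struct load_command). */
struct sketch_load_command {
    uint32_t cmd;
    uint32_t cmdsize; /* size of this command, including these 8 bytes */
};

#define SKETCH_LC_SEGMENT_64 0x19u
#define SKETCH_LC_LOAD_DYLIB 0xcu

/* Walk ncmds commands in cmds[0..sizeofcmds), counting 64-bit segments and
 * linked dylibs. Returns 0 on success, -1 if a command is malformed. */
int count_load_commands(const uint8_t *cmds, uint32_t ncmds, uint32_t sizeofcmds,
                        unsigned *seg_count, unsigned *lib_count) {
    uint32_t offset = 0;
    *seg_count = 0;
    *lib_count = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        struct sketch_load_command lc;
        if (sizeofcmds - offset < sizeof lc) return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset) return -1;
        if (lc.cmd == SKETCH_LC_SEGMENT_64) ++*seg_count;
        else if (lc.cmd == SKETCH_LC_LOAD_DYLIB) ++*lib_count;
        offset += lc.cmdsize; /* jump to the next command */
    }
    return 0;
}
```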

Once the image has been created, it is added to the global image list sAllImages. The main program is always the first entry in sAllImages.

static void addImage(ImageLoader* image)
{
	// add to master list
	allImagesLock();
	sAllImages.push_back(image);
	allImagesUnlock();
	...
}

4.2 Load the inserted dynamic libraries: loadInsertedDylib

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
	for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
		loadInsertedDylib(*lib);
}
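DYLD_INSERT_LIBRARIES is documented (man dyld) as a colon-separated list. Here is a sketch of turning that string into the kind of NULL-terminated array the loop above iterates over; split_insert_libraries is a hypothetical helper, and skipping empty entries is this sketch's own choice, not necessarily dyld's.

```c
#include <stddef.h>
#include <string.h>

/* Split `value` in place at ':' into out[0..n-1], set out[n] = NULL and
 * return n. At most max_out - 1 paths are stored (one slot for NULL). */
size_t split_insert_libraries(char *value, const char **out, size_t max_out) {
    size_t n = 0;
    if (max_out == 0) return 0;
    for (char *p = value; p != NULL && n + 1 < max_out; ) {
        char *colon = strchr(p, ':');
        if (colon) *colon = '\0';      /* terminate the current path */
        if (*p != '\0') out[n++] = p;  /* skip empty entries such as "a::b" */
        p = colon ? colon + 1 : NULL;
    }
    out[n] = NULL;
    return n;
}
```

With the array in hand, iterating is the same pattern as dyld's loop: for (const char **lib = out; *lib != NULL; ++lib) { ... }.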

4.3 Link the main program with its dynamic libraries and bind symbols: link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1)

		// link main executable
		gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
		if ( mainExcutableAlreadyRebased ) {
			// previous link() on main executable has already adjusted its internal pointers for ASLR
			// work around that by rebasing by inverse amount
			sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
		}
#endif
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

At this point dyld has configured the environment variables, loaded the shared cache, instantiated the main program, loaded the inserted dynamic libraries and linked everything together.

5. Run all initializers

The call is initializeMainExecutable(), which runs the initializers of the main executable and of everything it brought in.

5.1 runInitializers -> processInitializers: prepare for initialization

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.images[0] = this;
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

5.2 processInitializers iterates over the images (images.count of them) and initializes each one recursively:

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.images[i]->recursiveInitialization(context, thisThread, images.images[i]->getPath(), timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

5.3 recursiveInitialization: initialize an image after its dependencies

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
	...
	uint64_t t1 = mach_absolute_time();
	fState = dyld_image_state_dependents_initialized;
	oldState = fState;
	context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

	// initialize this image
	bool hasInitializers = this->doInitialization(context);

	// let anyone know we finished initializing this image
	fState = dyld_image_state_initialized;
	oldState = fState;
	context.notifySingle(dyld_image_state_initialized, this, NULL);
	...
}
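The ordering rule behind recursiveInitialization, dependencies first and each image only once, can be sketched with a toy image graph. The struct and function names are hypothetical; marking an image before recursing is similar to how dyld breaks dependency cycles.

```c
#include <stddef.h>

#define TOY_MAX_DEPS 4

/* Hypothetical toy image: a name, its dependencies and an init flag. */
struct toy_image {
    const char *name;
    struct toy_image *deps[TOY_MAX_DEPS];
    size_t dep_count;
    int initialized;
};

/* Depth-first: initialize every dependency before the image itself.
 * `order` receives the names in the order they were initialized. */
void toy_recursive_init(struct toy_image *img, const char **order, size_t *n) {
    if (img->initialized) return;
    img->initialized = 1; /* mark first so cycles terminate */
    for (size_t i = 0; i < img->dep_count; ++i)
        toy_recursive_init(img->deps[i], order, n);
    order[(*n)++] = img->name; /* stands in for doInitialization() */
}
```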

5.3.1 notifySingle: dispatch the callbacks for a single image's state change

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{ ... }

The next step is the call to load_images, but you will not find load_images referenced in notifySingle itself: it is reached through a callback, the function pointer sNotifyObjCInit.

5.3.2 sNotifyObjCInit is assigned in registerObjCNotifiers

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

5.3.3 registerObjCNotifiers is called from _dyld_objc_notify_register

This function is used by external shared libraries, such as libobjc (the Objective-C runtime), to register the callbacks that dyld should invoke.

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}


In the objc source we can see that _dyld_objc_notify_register is called from _objc_init.

The three arguments mean the following:

  • map_images: triggered when dyld maps an image into memory.
  • load_images: triggered when dyld initializes an image (this is where the familiar +load methods are called).
  • unmap_image: triggered when dyld removes an image.
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}
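The registration-and-replay pattern used by _dyld_objc_notify_register and registerObjCNotifiers can be sketched in plain C. Everything here is a hypothetical toy: one global hook stands in for sNotifyObjCInit, and record_load_images stands in for libobjc's load_images.

```c
#include <stddef.h>

typedef void (*image_init_cb)(const char *path);

#define TOY_MAX_IMAGES 8
static const char *g_inited_paths[TOY_MAX_IMAGES]; /* images initialized so far */
static size_t g_inited_count = 0;
static image_init_cb g_notify_init = NULL;          /* role of sNotifyObjCInit */

/* Like registerObjCNotifiers: remember the hook, then replay it for every
 * image that was initialized before anyone registered. */
void toy_register_init_notifier(image_init_cb init) {
    g_notify_init = init;
    for (size_t i = 0; i < g_inited_count; ++i)
        g_notify_init(g_inited_paths[i]);
}

/* Like notifySingle: record the image and call the hook if one exists. */
void toy_notify_image_initialized(const char *path) {
    if (g_inited_count < TOY_MAX_IMAGES)
        g_inited_paths[g_inited_count++] = path;
    if (g_notify_init != NULL)
        g_notify_init(path);
}

/* Example hook standing in for load_images: it only records the paths. */
static const char *g_seen[TOY_MAX_IMAGES];
static size_t g_seen_count = 0;

void record_load_images(const char *path) {
    if (g_seen_count < TOY_MAX_IMAGES)
        g_seen[g_seen_count++] = path;
}
```

The replay step is why libobjc still sees images that were initialized before it registered.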

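doInitialization is where an image's own initializers run: doImageInit runs the Mach-O -init routine, and doModInitFunctions runs the static initializers such as C++ constructors.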

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}
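doModInitFunctions essentially walks a table of function pointers emitted by the compiler (classically the __mod_init_func section) and calls each entry in order. Here is a toy version in which a hand-written table stands in for that section; the names are hypothetical, and newer toolchains may encode the table differently.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Records which initializers ran, in order. */
static int g_init_log[4];
static size_t g_init_log_len = 0;

static void init_a(void) { g_init_log[g_init_log_len++] = 1; }
static void init_b(void) { g_init_log[g_init_log_len++] = 2; }

/* Stand-in for the section contents: one pointer per initializer. */
static init_fn toy_mod_init_funcs[] = { init_a, init_b };

/* Like doModInitFunctions: call every entry in table order. */
size_t toy_do_mod_init_functions(const init_fn *funcs, size_t count) {
    for (size_t i = 0; i < count; ++i)
        funcs[i]();
    return count;
}
```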

These static initializers are written in a specific form, and you can find the corresponding function in the Mach-O file. For example:

#include <stdio.h>

__attribute__((constructor)) void CPFunc(void) {
    printf("C++Func1\n");
}

6. notifyMonitoringDyldMain: notify anything monitoring dyld that main is about to be called

7. Find main

Find the real entry point of main and return it:

// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
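getEntryFromLC_MAIN reads the LC_MAIN load command, whose entryoff field gives the offset of main from the start of the image. A sketch, assuming the struct layout of entry_point_command in <mach-o/loader.h> (redeclared under a hypothetical name) and that the result is roughly the header address plus entryoff:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_LC_REQ_DYLD 0x80000000u
#define SKETCH_LC_MAIN     (0x28u | SKETCH_LC_REQ_DYLD)

/* Same layout as struct entry_point_command. */
struct sketch_entry_point_command {
    uint32_t cmd;       /* LC_MAIN */
    uint32_t cmdsize;   /* 24 */
    uint64_t entryoff;  /* offset of main() from the start of __TEXT */
    uint64_t stacksize; /* initial stack size, 0 if not set */
};

/* Find LC_MAIN among the load commands and return header_addr + entryoff,
 * or 0 if there is no (well-formed) LC_MAIN. */
uintptr_t entry_from_lc_main(uintptr_t header_addr, const uint8_t *cmds,
                             uint32_t ncmds, uint32_t sizeofcmds) {
    uint32_t offset = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        uint32_t cmd, cmdsize;
        if (sizeofcmds - offset < 8) return 0;
        memcpy(&cmd, cmds + offset, 4);
        memcpy(&cmdsize, cmds + offset + 4, 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - offset) return 0;
        if (cmd == SKETCH_LC_MAIN && cmdsize >= sizeof(struct sketch_entry_point_command)) {
            struct sketch_entry_point_command ep;
            memcpy(&ep, cmds + offset, sizeof ep);
            return header_addr + (uintptr_t)ep.entryoff;
        }
        offset += cmdsize;
    }
    return 0;
}
```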

Summary

At this point, the whole startup process is complete.

The runtime-related part of loading works as follows:

  • dyld starts and initializes the program binary
  • ImageLoader reads each image, which contains our classes, methods and other symbols
  • Because the runtime has registered callbacks with dyld, dyld notifies the runtime whenever an image is loaded into memory
  • Once the runtime takes over, it calls map_images to parse and process the image, and then load_images calls call_load_methods, which walks all loaded classes and calls each class's +load method (superclasses before subclasses), followed by the +load methods of the categories
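The +load ordering in the last bullet can be sketched as follows. This is a toy model of the idea behind the runtime's scheduling (a superclass is scheduled before its subclass, classes before categories); the struct and function names are hypothetical, and the real runtime only queues classes and categories that actually implement +load.

```c
#include <stddef.h>

/* Hypothetical class record: a name, a superclass link and a flag. */
struct toy_class {
    const char *name;
    struct toy_class *superclass;
    int load_scheduled;
};

/* Schedule the superclass chain first, so a parent's +load runs first. */
static void toy_schedule_load(struct toy_class *cls, const char **queue, size_t *n) {
    if (cls == NULL || cls->load_scheduled) return;
    toy_schedule_load(cls->superclass, queue, n);
    cls->load_scheduled = 1;
    queue[(*n)++] = cls->name;
}

/* Classes first (parents before children), then category +load methods in
 * discovery order. Returns the number of entries written to queue. */
size_t toy_load_order(struct toy_class **classes, size_t class_count,
                      const char **categories, size_t category_count,
                      const char **queue) {
    size_t n = 0;
    for (size_t i = 0; i < class_count; ++i)
        toy_schedule_load(classes[i], queue, &n);
    for (size_t i = 0; i < category_count; ++i)
        queue[n++] = categories[i];
    return n;
}
```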

Conclusion

The process diagram

Dyld call order

1. Bootstrap itself from the raw call stack left by the kernel

2. Recursively load the dynamic libraries the program depends on into memory (with a caching mechanism, the shared cache)

3. Bind non-lazy symbols immediately; lazy symbols are recorded in a table and bound later

4. Run the executable's static initializers

5. Locate the executable's main function, prepare its arguments and call it

6. While the program runs, dyld binds lazy symbols, provides runtime dynamic loading services and provides the debugger interface

7. Run the static terminators after main returns

8. In some scenarios, call libSystem's _exit after main returns
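Steps 3 and 6 describe classic lazy binding. The toy below models it with a function pointer slot that initially points at a resolver; the first call patches the slot, so the resolver runs only once. It is an illustration only: the names are hypothetical, and newer toolchains may bind everything at launch instead.

```c
typedef int (*binop_fn)(int, int);

static int resolver_calls = 0;

static int real_add(int a, int b) { return a + b; }
static int resolve_add(int a, int b);

/* The "lazy pointer": starts out pointing at the resolver. */
static binop_fn add_slot = resolve_add;

static int resolve_add(int a, int b) {
    ++resolver_calls;    /* a real dynamic linker would look the symbol up here */
    add_slot = real_add; /* bind: patch the slot with the real address */
    return add_slot(a, b);
}

/* Call sites always go through the slot. */
int call_add(int a, int b) { return add_slot(a, b); }
```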

Hierarchical sequence diagram
