Introduction to dyld

dyld (the dynamic link editor) is Apple’s dynamic linker. It loads executables and all the libraries they depend on, and it is an important part of Apple’s operating systems: once the kernel has prepared the process, dyld is responsible for the rest of the launch work. dyld is also open source, so anyone can download the source code from Apple’s official website to read and understand how it works. Let’s trace dyld’s loading process through the source code.

First, let’s add a breakpoint in a load method to see where execution in dyld starts.

From the call stack we can see that dyld begins executing in the start function. Next we open the source code, find start, and analyze dyld’s call flow step by step. Version 832.7.3 of dyld is used here.

The dyld entry function: start

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

    // Rebase dyld itself
    rebaseDyld(dyldsMachHeader);

    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // Stack overflow protection
    __guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(argc, argv, envp, apple);
#endif

    // Initialize dyld
    _subsystem_init(apple);

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    // Call _main (the core code is in _main)
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

The start function mainly does preparatory work; it is the bootstrap phase. The core logic lives in dyld’s _main function, which we step into next.
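
To make the pointer arithmetic in start more concrete, here is a minimal sketch of how envp and apple can be located from argc and argv alone. The helper locateVectors, the LaunchVectors struct, and the simulated array contents are hypothetical names invented for illustration; only the layout (argv, NULL, envp, NULL, apple, NULL) comes from the comments in the dyld source.

```cpp
#include <cassert>

// Hypothetical helper (not dyld's code): given argc/argv laid out the way the
// kernel prepares them, locate the envp and apple arrays that follow argv.
struct LaunchVectors {
    const char** envp;
    const char** apple;
};

LaunchVectors locateVectors(int argc, const char** argv) {
    assert(argv != nullptr);
    // envp starts just past argv's terminating NULL
    const char** envp = &argv[argc + 1];
    // apple starts just past envp's terminating NULL
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }
    ++apple;
    LaunchVectors v = {envp, apple};
    return v;
}

// A simulated contiguous block: argv, NULL, envp, NULL, apple, NULL.
// The strings are made up for illustration.
static const char* kSimulatedStack[] = {
    "/path/to/app", "-flag", nullptr,           // argv (argc == 2)
    "HOME=/Users/demo", "LANG=en_US", nullptr,  // envp
    "executable_path=/path/to/app", nullptr     // apple
};
```

Calling locateVectors(2, kSimulatedStack) yields an envp that points at the "HOME=..." entry and an apple that points at the "executable_path=..." entry, which is the same walk start performs before calling _main.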

The main function of dyld

Now let’s step into _main.

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)
{
    // 1. Kernel detection code, omitted here
    ......

    // 2. Main program configuration
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
        unsigned bufferLenUsed;
        if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
            mainExecutableCDHash = mainExecutableCDHashBuffer;
    }

    // mainExecutableMH: header of the main program; mainExecutableSlide: slide of the main program
    // Set up host information such as the CPU type
    getHostInfo(mainExecutableMH, mainExecutableSlide);

    uintptr_t result = 0;
    sMainExecutableMachHeader = mainExecutableMH;
    sMainExecutableSlide = mainExecutableSlide;

    // 3. Set up the context: save the configuration into the gLinkContext variable
    setContext(mainExecutableMH, argc, argv, envp, apple);

    // Configure whether the process is restricted; envp: environment variables
    // Related to AMFI (Apple Mobile File Integrity)
    configureProcessRestrictions(mainExecutableMH, envp);

    if ( dyld3::internalInstall() ) {
        ...
    }

    // 4. Check the environment variables and set defaults for the unset ones
    checkEnvironmentVariables(envp);
    defaultUninitializedFallbackPaths(envp);

    // Print the options and environment variables if requested
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    if ( sEnv.DYLD_PRINT_ENV )
        printEnvironmentVariables(envp);

    // 5. Load the shared cache, which stores system libraries such as UIKit and Foundation.
    //    This happens before the main program is loaded, but its header has already been read,
    //    and from the header dyld knows the CPU, architecture, system and other information of the main program.
    //    The shared cache makes sure only one copy of these libraries is loaded, because access between iOS processes is limited.
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);

    // 6. Load the frameworks we depend on and the third-party libraries
    // 7. Determine whether to launch with 'dyld2' or 'dyld3'
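
The first step of _main reads the executable_cdhash entry from the apple array and decodes it from hex. The sketch below illustrates that idea only; lookupValue and decodeHex are hypothetical stand-ins written for this article, not the implementations of _simple_getenv or hexStringToBytes, and the sample strings are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for _simple_getenv: find "key=value" in a
// NULL-terminated string array and return a pointer to the value.
const char* lookupValue(const char* const* vars, const char* key) {
    const std::size_t keyLen = std::strlen(key);
    for (const char* const* p = vars; *p != nullptr; ++p) {
        if (std::strncmp(*p, key, keyLen) == 0 && (*p)[keyLen] == '=')
            return *p + keyLen + 1;
    }
    return nullptr;
}

// Hypothetical stand-in for hexStringToBytes: decode pairs of hex digits
// into buffer. Returns false on odd length, a bad digit, or overflow.
bool decodeHex(const char* hex, std::uint8_t* buffer, std::size_t bufferSize,
               std::size_t& used) {
    used = 0;
    const std::size_t len = std::strlen(hex);
    if (len % 2 != 0 || len / 2 > bufferSize) return false;
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    for (std::size_t i = 0; i < len; i += 2) {
        const int hi = nibble(hex[i]);
        const int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        buffer[used++] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}
```

With an array such as {"executable_path=/path/to/app", "executable_cdhash=0a1bff", NULL}, lookupValue returns "0a1bff" and decodeHex turns it into the three bytes 0x0a, 0x1b, 0xff.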

Load flow of dyld3

// ClosureMode was introduced after iOS 11; starting with iOS 13, dynamic libraries and third-party libraries are all loaded in closure mode
if ( sClosureMode == ClosureMode::Off ) {
    ...
} else {
    // Set sLaunchModeUsed to record that dyld is launching in closure mode
    sLaunchModeUsed = DYLD_LAUNCH_MODE_USING_CLOSURE;

    // 1. Configure the main program information mainFileInfo; mainExecutableMH is the main program's header
    const dyld3::closure::LaunchClosure* mainClosure = nullptr;
    dyld3::closure::LoadedFileInfo mainFileInfo;
    mainFileInfo.fileContent = mainExecutableMH;

    // check for closure in cache first
    if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
        // Look for a mainClosure instance in the shared cache
        mainClosure = sSharedCacheLoadInfo.loadAddress->findClosure(sExecPath);
    }

    // If no closure was found, build one
    if ( mainClosure == nullptr ) {
        mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp, bootToken);
    }

    // If we have a closure, launch with it
    if ( mainClosure != nullptr ) {
        bool launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
                                          mainExecutableSlide, argc, argv, envp, apple, diag, &result, startGlue, &closureOutOfDate, &recoverable);
        // If the launch failed because the closure is out of date
        if ( !launched && closureOutOfDate && allowClosureRebuilds ) {
            // Build another mainClosure instance
            mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp, bootToken);
            // and launch again
            launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
                                         mainExecutableSlide, argc, argv, envp, apple, diag, &result, startGlue, &closureOutOfDate, &recoverable);
        }
        // If the launch succeeded, record that the main program has loaded and return result
        if ( launched ) {
            gLinkContext.startedInitializingMainExecutable = true;
            if (sSkipMain)
                result = (uintptr_t)&fake_main;
            return result;
        }
        // The launch failed, so report the error
        else {
            if ( gLinkContext.verboseWarnings ) {
                dyld::log("dyld: unable to use closure %p\n", mainClosure);
            }
            if ( !recoverable )
                halt(diag.errorMessage());
        }
    }
}
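
The control flow of the closure path (use a cached closure if there is one, otherwise build one, launch, and rebuild once if the closure turns out to be stale) can be modelled in a few lines. Everything in this sketch is hypothetical: Closure, Launcher, and the integer "version" only stand in for the real closure validation, which is far more involved.

```cpp
#include <cassert>

// All names below are hypothetical; they only model the control flow.
struct Closure { int builtForVersion; };

struct Launcher {
    int currentVersion;     // stands in for the state of the binaries on disk
    const Closure* cached;  // closure found in a cache, nullptr if none
    int buildCount;         // how many closures had to be built

    Launcher(int version, const Closure* cachedClosure)
        : currentVersion(version), cached(cachedClosure), buildCount(0) {}

    Closure build() {
        ++buildCount;
        Closure c = {currentVersion};
        return c;
    }

    // Fails and reports outOfDate when the closure no longer matches.
    bool launchWith(const Closure& c, bool& outOfDate) const {
        outOfDate = (c.builtForVersion != currentVersion);
        return !outOfDate;
    }

    // Cached closure if present, else build one; launch; if the closure is
    // stale, rebuild once and retry.
    bool launch() {
        Closure closure = cached ? *cached : build();
        bool outOfDate = false;
        bool launched = launchWith(closure, outOfDate);
        if (!launched && outOfDate) {
            closure = build();
            launched = launchWith(closure, outOfDate);
        }
        return launched;
    }
};
```

A Launcher given an up-to-date cached closure launches without building anything, while one given a stale closure builds exactly one replacement before succeeding.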

Load flow of dyld2

// Not closure mode, so launch in dyld2 mode
sLaunchModeUsed = 0;

// Register the image-state callbacks (the source mentions notifyGDB in sBatchHandlers and updateAllImages in sSingleHandlers)
stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);

// Reserve initial capacity for the bookkeeping arrays
sImageRoots.reserve(16);
sAddImageCallbacks.reserve(4);
sRemoveImageCallbacks.reserve(4);
sAddLoadImageCallbacks.reserve(4);
sImageFilesNeedingTermination.reserve(16);
sImageFilesNeedingDOFUnregistration.reserve(8);

// Add dyld itself to the UUID list
addDyldImageToUUIDList();

// Load the main program; mainExcutableAlreadyRebased flags whether the main program has already been rebased
bool mainExcutableAlreadyRebased = false;
if ( (sSharedCacheLoadInfo.loadAddress != nullptr) && !dylibsCanOverrideCache() && !sDisableAcceleratorTables && (sSharedCacheLoadInfo.loadAddress->header.accelerateInfoAddr != 0) ) {
    struct stat statBuf;
    if ( dyld3::stat(IPHONE_DYLD_SHARED_CACHE_DIR "no-dyld2-accelerator-tables", &statBuf) != 0 )
        sAllCacheImagesProxy = ImageLoaderMegaDylib::makeImageLoaderMegaDylib(&sSharedCacheLoadInfo.loadAddress->header, sSharedCacheLoadInfo.slide, mainExecutableMH, gLinkContext);
}

// Instantiate the image of the main program, i.e. the executable file (the first image dyld loads is the main program)
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
// Code signing
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

// Check whether the main program was built for the current system and device version
{
    if ( !isSimulatorBinary((uint8_t*)mainExecutableMH, sExecPath) ) {
        throwf("program was built for a platform that is not supported by this runtime");
    }
    uint32_t mainMinOS = sMainExecutable->minOSVersion();

    // dyld is always built for the current OS, so we can get the current OS version
    // from the load command in dyld itself.
    uint32_t dyldMinOS = ImageLoaderMachO::minOSVersion((const mach_header*)&__dso_handle);
    if ( mainMinOS > dyldMinOS ) {
        ...
    }
}

if (dyld::isTranslated()) {
    ...
}

// Check for versioned dynamic library paths
checkVersionedPaths();

// Load the inserted dynamic libraries listed in DYLD_INSERT_LIBRARIES
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}

// Record the inserted dynamic libraries, then link the main program
gLinkContext.linkingMainExecutable = true;
// link records the start time on entry, then recursively loads the libraries the main program depends on.
// Once they are loaded it sends a notification, fixes up ASLR, binds non-lazy symbols, binds weak symbols,
// recursively applies the inserted dynamic libraries, registers them, and so on.
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);

// Link the inserted dynamic libraries (they follow the main program in sAllImages)
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    if ( gLinkContext.allowInterposing ) {
        // only INSERTED libraries can interpose
        // register interposing info after all inserted libraries are bound so chaining works
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            image->registerInterposing(gLinkContext);
        }
    }
}

if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        // Bind the inserted dynamic library
        image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
    }
}

// Bind weak symbols
sMainExecutable->weakBind(gLinkContext);
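
The dyld2 path indexes the inserted libraries as sAllImages[i+1], which implies the main program sits at index 0 and the inserted libraries follow it. The sketch below models that ordering with plain strings. It assumes DYLD_INSERT_LIBRARIES is given as a colon-separated list of paths; splitInsertLibraries and buildImageList are hypothetical helpers, and the real list holds ImageLoader objects rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a colon-separated DYLD_INSERT_LIBRARIES value
// into individual paths, skipping empty entries.
std::vector<std::string> splitInsertLibraries(const std::string& value) {
    std::vector<std::string> paths;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        std::string::size_type end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Models the image list: the main executable at index 0, followed by the
// inserted libraries in the order they were listed.
std::vector<std::string> buildImageList(const std::string& mainPath,
                                        const std::string& insertValue) {
    std::vector<std::string> images;
    images.push_back(mainPath);
    const std::vector<std::string> inserted = splitInsertLibraries(insertValue);
    for (std::vector<std::string>::size_type i = 0; i < inserted.size(); ++i)
        images.push_back(inserted[i]);
    return images;
}
```

For example, buildImageList("/app/Main", "/tmp/a.dylib:/tmp/b.dylib") produces a three-element list in which indices 1 and 2 are the inserted libraries, matching the i+1 indexing in the loops above.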

The process of instantiating the main program

This is the process of instantiating an image. Other dynamic libraries and third-party libraries are instantiated in the same way as the main program.

// The process of instantiating an image
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // Load the Mach-O file of the main program (header, load commands)
    ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
    // Add the image to the allImages array
    addImage(image);
    return (ImageLoaderMachO*)image;
}

// Step into the instantiateMainExecutable function
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    //dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
    //	sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // Choose which subclass to instantiate based on the value of compressed
    if ( compressed )
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}

// Step into the sniffLoadCommands function
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                         unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                         const linkedit_data_command** codeSigCmd,
                                         const encryption_info_command** encryptCmd)
{
    *compressed = false;
    *segCount = 0;
    *libCount = 0;
    *codeSigCmd = NULL;   // Code signature
    *encryptCmd = NULL;
    ...
    // fSegmentsArrayCount is only 8-bits
    if ( *segCount > 255 )
        dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

    if ( *libCount > 4095 )
        dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

    if ( needsAddedLibSystemDepency(*libCount, mh) )
        *libCount = 1;

    // dylibs that use LC_DYLD_CHAINED_FIXUPS have that load command removed when put in the dyld cache
    if ( !*compressed && (mh->flags & MH_DYLIB_IN_CACHE) )
        *compressed = true;
}
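
The counting that sniffLoadCommands performs can be pictured as a walk over variable-sized load commands. The following is a simplified sketch under stated assumptions: LoadCommand, SniffResult, sniff, and appendCommand are hypothetical, only the cmd and cmdsize fields are modelled, and the three command values are the ones defined in mach-o/loader.h. The real function inspects many more command types and validates far more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for a Mach-O load command header.
struct LoadCommand { std::uint32_t cmd; std::uint32_t cmdsize; };

// Command values as defined in <mach-o/loader.h>.
const std::uint32_t kLC_LOAD_DYLIB     = 0x0C;
const std::uint32_t kLC_SEGMENT_64     = 0x19;
const std::uint32_t kLC_DYLD_INFO_ONLY = 0x80000022;

struct SniffResult { bool compressed; unsigned segCount; unsigned libCount; };

// Hypothetical sketch: walk ncmds variable-sized commands packed in bytes,
// counting segments and dependent libraries.
SniffResult sniff(const std::vector<std::uint8_t>& bytes, std::uint32_t ncmds) {
    SniffResult r = {false, 0, 0};
    std::size_t offset = 0;
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        assert(offset + sizeof(LoadCommand) <= bytes.size());
        LoadCommand lc;
        std::memcpy(&lc, &bytes[offset], sizeof(lc));
        assert(lc.cmdsize >= sizeof(LoadCommand));
        if (lc.cmd == kLC_SEGMENT_64) ++r.segCount;
        else if (lc.cmd == kLC_LOAD_DYLIB) ++r.libCount;
        else if (lc.cmd == kLC_DYLD_INFO_ONLY) r.compressed = true;
        offset += lc.cmdsize;  // cmdsize covers the header plus its payload
    }
    return r;
}

// Helper for building sample input: append one command followed by
// payloadSize zero bytes.
void appendCommand(std::vector<std::uint8_t>& bytes, std::uint32_t cmd,
                   std::uint32_t payloadSize) {
    LoadCommand lc = {cmd, static_cast<std::uint32_t>(sizeof(LoadCommand)) + payloadSize};
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&lc);
    bytes.insert(bytes.end(), p, p + sizeof(lc));
    bytes.insert(bytes.end(), static_cast<std::size_t>(payloadSize), std::uint8_t(0));
}
```

Feeding sniff a buffer with two segment commands, one LC_DYLD_INFO_ONLY command, and one LC_LOAD_DYLIB command yields segCount 2, libCount 1, and compressed set, which is the information instantiateMainExecutable uses to pick the ImageLoader subclass.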

The initializeMainExecutable function

Everything before initializeMainExecutable is initialization and load preparation. Only at this point do we really begin to enter the main program.

  1. Enter the initializeMainExecutable function
void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL )
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
  1. As the call stack shows, initializeMainExecutable then enters the ImageLoader::runInitializers function. Both dyld2 and dyld3 go through here.

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}
  1. Following the call stack again, we reach the recursiveInitialization function. Let’s see how this function eventually arrives at the load_images function.
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps) 
{
  context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
}
  1. Step into the notifySingle function. We do not find load_images here, because load_images belongs to the objc library. So how does it get called?
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
		uint64_t t0 = mach_absolute_time();
		dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
		(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		uint64_t t1 = mach_absolute_time();
		uint64_t t2 = mach_absolute_time();
		uint64_t timeInObjC = t1-t0;
		uint64_t emptyTime = (t2-t1)*100;
		if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
			timingInfo->addTime(image->getShortName(), timeInObjC);
		}
	}
}
  1. How load_images gets called

At line 1019, inside notifySingle, there is a call through a callback pointer, guarded by a check that sNotifyObjCInit is not NULL. So where is sNotifyObjCInit assigned?

Searching the current file, we find that sNotifyObjCInit is assigned in registerObjCNotifiers, from its init parameter. So let’s do a global search to see where registerObjCNotifiers is called and what is passed as init.

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
}

The global search shows that registerObjCNotifiers is called from _dyld_objc_notify_register, which passes its init argument straight through.

Searching the dyld source for callers of _dyld_objc_notify_register turns up nothing. So let’s set a symbolic breakpoint and see who calls _dyld_objc_notify_register.

After running, we print the stack and see that _dyld_objc_notify_register is called by _objc_init in libobjc. The next step is to open the objc source code, find the _objc_init function, and see what it passes to _dyld_objc_notify_register.

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    // Here _dyld_objc_notify_register is called, and the second parameter is load_images
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

Here we find the call to _dyld_objc_notify_register, and its second argument is load_images. So init is load_images, which means the callback invoked at line 1019 of notifySingle is the load_images function. This completes the path from start to load_images.
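
The registration pattern traced above (one library stores a function pointer handed over by another, then calls it only if it is non-null) can be sketched as follows. The names are hypothetical simplifications: InitCallback takes only a path, whereas the real callback also receives a mach_header pointer, and fakeLoadImages merely records its argument.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified callback type.
typedef void (*InitCallback)(const char* path);

// Plays the role of sNotifyObjCInit on the "linker" side.
static InitCallback sInitCallback = nullptr;

// Plays the role of registerObjCNotifiers: the "runtime" hands over its function.
void registerInitCallback(InitCallback init) { sInitCallback = init; }

// Plays the role of notifySingle: invoke the callback only if one was registered.
bool notifyImageInitialized(const char* path) {
    if (sInitCallback == nullptr) return false;
    (*sInitCallback)(path);
    return true;
}

// Plays the role of load_images on the "runtime" side.
static std::vector<std::string> sLoadedPaths;
void fakeLoadImages(const char* path) { sLoadedPaths.push_back(path); }
```

Before registerInitCallback(&fakeLoadImages) is called, notifyImageInitialized does nothing. Afterwards each notification reaches fakeLoadImages, which is why load_images never has to appear by name in the dyld source.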

The call_load_methods function

From _objc_init we get to load_images, and from there to call_load_methods. At line 353 we can see call_class_loads, which is where each class’s load method is called.

void load_images(const char *path __unused, const struct mach_header *mh)
{
    ...
    call_load_methods();
}

void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +load methods until there are none left
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +load methods
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0 || more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}
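
The do/while loop in call_load_methods drains all pending class +load methods before the category +load methods, and repeats while work remains. Here is a minimal model of that ordering. LoadQueues and the three functions are hypothetical, and callCategoryLoads is deliberately simplified: it reports whether any category ran, which is an assumption made so the loop has something to react to, not the runtime's actual return condition.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical queues standing in for the runtime's loadable class and
// category lists.
struct LoadQueues {
    std::deque<std::string> classes;
    std::deque<std::string> categories;
    std::vector<std::string> log;  // order in which the +load methods "ran"
};

void callClassLoads(LoadQueues& q) {
    while (!q.classes.empty()) {
        q.log.push_back("class:" + q.classes.front());
        q.classes.pop_front();
    }
}

// Runs the queued categories; returns true if any ran (a simplification).
bool callCategoryLoads(LoadQueues& q) {
    const bool any = !q.categories.empty();
    while (!q.categories.empty()) {
        q.log.push_back("category:" + q.categories.front());
        q.categories.pop_front();
    }
    return any;
}

void callLoadMethods(LoadQueues& q) {
    bool moreCategories;
    do {
        // 1. all pending class +load methods
        while (!q.classes.empty()) callClassLoads(q);
        // 2. category +load methods, once per pass
        moreCategories = callCategoryLoads(q);
        // 3. repeat while anything is still pending
    } while (!q.classes.empty() || moreCategories);
}
```

With classes Person and Student queued alongside a category on Person, the log ends up as class:Person, class:Student, category:Person(Extra), which illustrates why a class's load always runs before the load of its categories.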

Loading the C++ global constructor

Now that we have traced how load_images gets called, let’s switch to a demo project and observe a phenomenon.

#import "AppDelegate.h"
#import "ViewController.h"

__attribute__((constructor)) void func1() {
    printf("func1 executed\n");
}

__attribute__((constructor)) void func2() {
    printf("func2 executed\n");
}

@interface AppDelegate ()

Add the two functions func1 and func2, each marked __attribute__((constructor)), to the AppDelegate.m file. These two functions are global C++ constructors. After the project is compiled, drag the compiled binary into the MachOView application and you can see that a __mod_init_func section has been added to the file’s structure. When the project runs, the printf output shows that func1 and func2 execute after load and before main. So which piece of code controls the invocation of functions like func1 and func2?
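
The same behaviour can be reproduced outside an app project. The sketch below is a standalone variant of the demo that relies on the GCC/Clang constructor attribute; the counters are additions made here so the effect can be checked from code rather than read from the console. The relative order of func1 and func2 is not asserted, since this sketch makes no assumption about it.

```cpp
#include <cassert>
#include <cstdio>

// Plain integers with constant initializers are set up before any code runs,
// so the constructor functions can safely write to them.
static int gCallCount = 0;
static int gFunc1Order = 0;
static int gFunc2Order = 0;

// GCC/Clang extension: functions marked this way run before main().
__attribute__((constructor)) void func1() {
    gFunc1Order = ++gCallCount;
    std::printf("func1 executed\n");
}

__attribute__((constructor)) void func2() {
    gFunc2Order = ++gCallCount;
    std::printf("func2 executed\n");
}
```

By the time main starts, gCallCount is already 2 and both order fields are non-zero, confirming that the functions ran during image initialization and not in response to any call from main.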

Now let’s go back to the dyld source code, start from the recursiveInitialization function, and follow it to ImageLoaderMachO::doInitialization. Here you can see that the doModInitFunctions function is what actually invokes the constructors.

// 1. Step into the ImageLoaderMachO::doInitialization function
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    doImageInit(context);
    // Invoke the constructors
    doModInitFunctions(context);

    CRSetCrashLogMessage2(NULL);

    return (fHasDashInit || fHasInitializers);
}
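
The order observed in the demo (load first, then the constructor functions, then main) follows from how recursiveInitialization sequences its work. Here is a minimal sketch with a hypothetical Image type. It assumes that the libraries an image depends on are initialized first, and that the notification which reaches load_images is sent before the image's own initializers run through doInitialization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an image and the libraries it links against.
struct Image {
    std::string name;
    std::vector<Image*> dependents;
    bool initialized;
    explicit Image(const std::string& n) : name(n), initialized(false) {}
};

// events records what happened, in order, so the sequencing can be inspected.
void recursiveInit(Image& image, std::vector<std::string>& events) {
    if (image.initialized) return;  // each image is initialized only once
    image.initialized = true;       // mark first so dependency cycles terminate
    for (std::size_t i = 0; i < image.dependents.size(); ++i)
        recursiveInit(*image.dependents[i], events);  // lower-level libraries first
    events.push_back("notify:" + image.name);  // stands in for notifySingle (+load)
    events.push_back("init:" + image.name);    // stands in for doInitialization
}
```

For an App that depends on libobjc and libSystem, where libobjc itself depends on libSystem, the recorded events are notify/init for libSystem, then libobjc, then App, so each image's "notify" (load) precedes its own "init" (constructors).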

Returning the entry point of the main program

Now let’s go back to dyld’s _main function. getEntryFromLC_MAIN looks up the entry point of the main program (its main function) and assigns the address to result. If result is non-zero, line 7114 stores the address of gLibSystemHelpers->startGlueToCallExit in *startGlue; otherwise dyld falls back to getEntryFromLC_UNIXTHREAD. At the end of _main, result is returned.

{
    // find entry point for main executable
    result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
    if ( result != 0 ) {
        // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
        if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
            *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
        else
            halt("libdyld.dylib support not present for LC_MAIN");
    }
    else {
        // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
        result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
        *startGlue = 0;
    }
}
// Return result
return result;
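
The idea of handing back an entry point as a plain integer, with a fallback when the preferred entry is missing, can be sketched as below. FakeImage, findEntry, and fakeMain are hypothetical. The usesGlue flag only records which branch was taken, whereas the real code stores the address of a glue function in *startGlue.

```cpp
#include <cassert>
#include <cstdint>

typedef int (*MainFunc)(int argc, const char* argv[]);

// Hypothetical image that may or may not expose an LC_MAIN-style entry.
struct FakeImage {
    MainFunc lcMainEntry;      // nullptr when the image has no LC_MAIN
    MainFunc unixThreadEntry;  // fallback entry
};

// Prefer the LC_MAIN-style entry and fall back otherwise.
std::uintptr_t findEntry(const FakeImage& image, bool& usesGlue) {
    std::uintptr_t result = reinterpret_cast<std::uintptr_t>(image.lcMainEntry);
    if (result != 0) {
        usesGlue = true;
    } else {
        result = reinterpret_cast<std::uintptr_t>(image.unixThreadEntry);
        usesGlue = false;
    }
    return result;
}

// A stand-in for the program's main function.
int fakeMain(int argc, const char* argv[]) {
    (void)argv;
    return argc * 10;
}
```

The caller converts the returned integer back to a function pointer and jumps to it, which is conceptually what happens after dyld's _main returns result to its caller.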

Summary

  • dyld: the dynamic linker, which loads all libraries and executables.
  • dyld load process
    • Program execution starts with _dyld_start -> dyld`dyldbootstrap::start
    • Enter the dyld::_main function
    • Configure the environment: rebaseDyld
    • Load the shared cache
    • Determine the load mode: dyld2 or dyld3 (closure mode)
      • Instantiate the main program
      • Load the dynamic libraries (inserted dynamic libraries first); both the main program and the dynamic libraries are added to allImages
      • Link the main program and bind symbols (non-lazy symbols, weak symbols), and so on
      • Most critical: initializeMainExecutable
        • dyld`ImageLoader::runInitializers
          • dyld`ImageLoader::processInitializers:
            • dyld`ImageLoader::recursiveInitialization:
              • dyld`dyld::notifySingle:

                • This function invokes a callback
                • Breakpoint debugging shows that the callback is the load_images function registered by _objc_init
                  • call_load_methods is executed inside load_images
                    • call_class_loads: calls the load method of each class in a loop
              • doModInitFunctions

                • Internally invokes the global C++ constructors, that is, C functions marked __attribute__((constructor))
      • Return the entry function of the main program, and start the main function of the main program

If you spot any shortcomings, corrections are welcome.