1. Review

The previous post introduced the difference between dynamic and static libraries and took a first look at dyld, the dynamic linker. This post goes further into the dyld source code.

2. MachO

In the previous article we found the entry point of dyld, but a few more things need covering before analyzing the source code.

How do you get the Mach-O file in iOS?

2.1 Checking the Mach-O in a macOS project

Compiling and running the project directly produces the Mach-O file, shown with the black icon below.

2.2 Checking the Mach-O in an iOS project

For iOS projects, you need to find the .app file under Products.

Then use Show in Finder to locate the file.

Again, the Mach-O executable is the file with the black icon.

2.3 Viewing the Mach-O structure with MachOView

Take this Mach-O file and drag it into MachOView to look at its structure.

  • Header: contains the executable's CPU architecture, such as x86 or arm64
  • Load commands: describe how the file is organized and how it is laid out in virtual memory
  • Data: contains the data of each segment described by the load commands; the size of every segment must be an integer multiple of one page.
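
As a rough sketch of the layout described above, the C++ below mirrors the fields of the 64-bit Mach-O header and shows how a segment size rounds up to a whole number of pages. The struct is redeclared locally so that it compiles anywhere (on macOS the real definition is mach_header_64 in <mach-o/loader.h>); archName, roundUpToPage, and the 16 KB page size used in the note are illustrative assumptions, not dyld code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Field layout mirrors mach_header_64; redeclared here (an assumption made
// for portability) instead of including <mach-o/loader.h>.
struct MachHeader64 {
    uint32_t magic;       // MH_MAGIC_64 for a 64-bit image
    int32_t  cputype;     // CPU architecture
    int32_t  cpusubtype;  // CPU variant
    uint32_t filetype;    // executable, dylib, ...
    uint32_t ncmds;       // number of load commands that follow the header
    uint32_t sizeofcmds;  // total size in bytes of those load commands
    uint32_t flags;
    uint32_t reserved;
};

constexpr uint32_t kMagic64   = 0xfeedfacf;  // value of MH_MAGIC_64
constexpr int32_t  kCpuX86_64 = 0x01000007;  // value of CPU_TYPE_X86_64
constexpr int32_t  kCpuArm64  = 0x0100000c;  // value of CPU_TYPE_ARM64

// Hypothetical helper: name the architecture recorded in the header.
std::string archName(const MachHeader64& header) {
    if (header.magic != kMagic64) return "not a 64-bit Mach-O";
    if (header.cputype == kCpuX86_64) return "x86_64";
    if (header.cputype == kCpuArm64) return "arm64";
    return "unknown";
}

// Hypothetical helper: round a size up to a whole number of pages.
uint64_t roundUpToPage(uint64_t size, uint64_t pageSize) {
    return (size + pageSize - 1) / pageSize * pageSize;
}
```

With an assumed 16 KB page (0x4000), a segment holding 0x4001 bytes of content would occupy roundUpToPage(0x4001, 0x4000), that is 0x8000 bytes.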

3. Dyld source code analysis

3.1 dyld::_main

The main function at dyld's entry, wow! Nearly a thousand lines of code!

It is too long to post in full; paste it all and this post would basically be over 😂.

  • Of three thousand rivers, I take a single ladle;
  • Of a thousand lines of code, I only look at a few!

What a poem! Ha ha 😁

3.1.1 Setting Environment Variables

As the comments in the source code also show, this is the entry point of dyld

//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
	if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
		launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
	}

	//Check and see if there are any kernel flags
	dyld3::BootArgs::setFlags(hexToUInt64(_simple_getenv(apple, "dyld_flags"), nullptr));

#if __has_feature(ptrauth_calls)
	// Check and see if kernel disabled JOP pointer signing (which lets us load plain arm64 binaries)
	if ( const char* disableStr = _simple_getenv(apple, "ptrauth_disabled") ) {
		if ( strcmp(disableStr, "1") == 0 )
			sKeysDisabled = true;
	}
	else {
		// needed until kernel passes ptrauth_disabled for arm64 main executables
		if ( (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_V8) || (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_ALL) )
			sKeysDisabled = true;
	}
#endif

The excerpt above from the main function mainly uses if conditions to set up various environment variables and flags
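
To make the lookup in the excerpt concrete, here is a minimal sketch of how a value can be read out of a NULL-terminated vector of "key=value" strings such as apple[]. lookupValue is a hypothetical stand-in for _simple_getenv, and keysDisabled and kSampleApple are invented for illustration; none of this is the dyld implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for _simple_getenv: scan a NULL-terminated vector of
// "key=value" strings and return a pointer to the value, or nullptr.
const char* lookupValue(const char* const* vec, const char* key) {
    const std::size_t keyLen = std::strlen(key);
    for (const char* const* entry = vec; *entry != nullptr; ++entry) {
        // Match "key" followed immediately by '='.
        if (std::strncmp(*entry, key, keyLen) == 0 && (*entry)[keyLen] == '=')
            return *entry + keyLen + 1;
    }
    return nullptr;
}

// Same shape as the ptrauth_disabled check in the excerpt: the flag is set
// only when the key is present and its value is exactly "1".
bool keysDisabled(const char* const* apple) {
    const char* disableStr = lookupValue(apple, "ptrauth_disabled");
    return disableStr != nullptr && std::strcmp(disableStr, "1") == 0;
}

// Sample data, made up for this sketch.
static const char* const kSampleApple[] = {
    "executable_path=/bin/ls",
    "ptrauth_disabled=1",
    nullptr
};
```

Calling keysDisabled(kSampleApple) follows the same two steps as the source: find the string, then compare it with "1".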

3.1.2 Platform Information Settings

Set the platform ID in all images so that the debugger can determine the process type. Note: "image" here does not mean a picture; it means a binary image such as the executable or a dynamic library.

3.1.3 Shared Cache

Check whether the shared cache is enabled and whether it is mapped into the shared region. This is controlled at the system level. Caches are a valuable resource.

  • mapSharedCache
	if ( sJustBuildClosure )
		sClosureMode = ClosureMode::On;

	// load shared cache
	checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
	if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
		if ( sSharedCacheOverrideDir)
			mapSharedCache(mainExecutableSlide);
#else
		mapSharedCache(mainExecutableSlide);
#endif

That is hard going! Reading thousands of lines of code one by one is tough on the eyes and on the nerves. (PS: painful)

So, is there a better way to explore?

As it happens, there is!

Work backwards: go straight to the last line of the function and trace back from the returned result to see what leads up to the final return.

Searching the main function for result shows that it comes from sMainExecutable.


		CRSetCrashLogMessage(sLoadingCrashMessage);
		// instantiate ImageLoader for main executable
		sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
		gLinkContext.mainExecutable = sMainExecutable;
		gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

Search again to see where the key variable sMainExecutable is used.

sMainExecutable turns out to be the right thing to follow. What are we actually looking for as we trace it? We are looking for how images get loaded, so pay attention to words like link, bind, and load.

In the code around it, key names such as sInsertedDylibCount, weakBind, and linkingMainExecutable also confirm that following sMainExecutable is the right track.

3.1.4 link

As the code shows, sMainExecutable is passed as an argument to link

		// Load any inserted libraries
		if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
			for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
				loadInsertedDylib(*lib);
		}
		// record count of inserted libraries so that a flat search will look at 
		// inserted libraries, then main, then others.
		sInsertedDylibCount = sAllImages.size()-1;

		// link main executable
		gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
		if ( mainExcutableAlreadyRebased ) {
			// previous link() on main executable has already adjusted its internal pointers for ASLR
			// work around that by rebasing by inverse amount
			sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
		}
#endif
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

  • Iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib to load each inserted dynamic library.
  • Link the main program.
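
The bookkeeping in this step can be sketched as follows. DYLD_INSERT_LIBRARIES is given as a colon-separated list of paths; the code below splits such a string and models why the count of inserted libraries is the image count minus one (the main executable is already in the list). splitInsertLibraries and imagesAfterInsert are hypothetical helpers for illustration, not how dyld itself parses the variable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a colon-separated DYLD_INSERT_LIBRARIES value
// into individual paths, skipping empty entries.
std::vector<std::string> splitInsertLibraries(const std::string& value) {
    std::vector<std::string> paths;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Toy model of the image list: the main executable is already present,
// then every inserted library is appended after it.
std::vector<std::string> imagesAfterInsert(const std::string& mainPath,
                                           const std::string& insertValue) {
    std::vector<std::string> allImages{mainPath};
    for (const std::string& lib : splitInsertLibraries(insertValue))
        allImages.push_back(lib);
    return allImages;
}
```

In this model, imagesAfterInsert("App", "/tmp/a.dylib:/tmp/b.dylib").size() - 1 is 2, which corresponds to sInsertedDylibCount = sAllImages.size()-1 in the excerpt.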

3.1.5 Entry point of the main program

	#if SUPPORT_OLD_CRT_INITIALIZATION
		// Old way is to run initializers via a callback from crt1.o
		if ( ! gRunInitializersOldWay ) 
			initializeMainExecutable(); 
	#else
		// run all initializers
		initializeMainExecutable(); 
	#endif

		// notify any montoring proccesses that this process is about to enter main()
		notifyMonitoringDyldMain();
		if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
			dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
		}
		ARIADNEDBG_CODE(220, 1);

#if TARGET_OS_OSX
		if ( gLinkContext.driverKit ) {
			result = (uintptr_t)sEntryOverride;
			if ( result == 0 )
				halt("no entry point registered");
			*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
		}
		else
#endif
		{
			// find entry point for main executable
			result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
			if ( result != 0 ) {
				// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
				if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
					*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
				else
					halt("libdyld.dylib support not present for LC_MAIN");
			}
			else {
				// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
				result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
				*startGlue = 0;
			}
		}

  • Run the initializers through initializeMainExecutable
  • notifyMonitoringDyldMain notifies any monitoring process that this process is about to enter main()
  • The if on result looks for the entry point of the main program
  • When result != 0: the main executable uses LC_MAIN, and the helper in libdyld is needed to call into main()
  • When result == 0: the main executable uses LC_UNIXTHREAD, and dyld lets "start" in the program set up for main()
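
The two branches can be condensed into a small decision function. This is only a sketch of the logic listed above: EntryInfo, LaunchTarget, and chooseEntry are hypothetical names, a zero address stands for "load command not present", and the halt paths of the real code are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what the main executable offers as entry points.
struct EntryInfo {
    uintptr_t lcMainEntry;        // entry found via LC_MAIN, 0 if absent
    uintptr_t lcUnixThreadEntry;  // entry found via LC_UNIXTHREAD, 0 if absent
};

// Hypothetical result: the address to jump to, and whether start glue from
// libdyld is needed to call main() and then exit.
struct LaunchTarget {
    uintptr_t result;
    bool      needsStartGlue;
};

LaunchTarget chooseEntry(const EntryInfo& image) {
    if (image.lcMainEntry != 0) {
        // LC_MAIN: the entry is main() itself, so a helper must call it.
        return LaunchTarget{image.lcMainEntry, true};
    }
    // LC_UNIXTHREAD: the entry is "start" in the program, which sets up
    // for main() on its own, so no glue is needed.
    return LaunchTarget{image.lcUnixThreadEntry, false};
}
```

The needsStartGlue flag plays the role of *startGlue in the excerpt: it is set in the LC_MAIN case and cleared in the LC_UNIXTHREAD case.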

3.2 initializeMainExecutable

3.2.1 runInitializers

Get the number of images, then loop over them and run their initializers

// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}
	
	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
	

runInitializers --> processInitializers

3.2.2 processInitializers

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

processInitializers calls recursiveInitialization on each image; the core code is as follows

3.2.3 recursiveInitialization

// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
			
			// initialize this image
			bool hasInitializers = this->doInitialization(context);

			// let anyone know we finished initializing this image
			fState = dyld_image_state_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_initialized, this, NULL);
			
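
The recursion through processInitializers and recursiveInitialization boils down to "initialize what an image depends on before the image itself". The toy model below shows only that ordering; Image, recursiveInit, and the three-image graph are assumptions for illustration, and the real code also tracks image states, timing, and upward dependencies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy image: a name, the images it depends on, and an "already handled" flag.
struct Image {
    std::string         name;
    std::vector<Image*> dependencies;
    bool                initialized;
};

// Initialize dependencies first, then the image itself. Appending the name
// to `order` stands in for the call to doInitialization.
void recursiveInit(Image& image, std::vector<std::string>& order) {
    if (image.initialized) return;
    image.initialized = true;  // mark before recursing so cycles terminate
    for (Image* dep : image.dependencies)
        recursiveInit(*dep, order);
    order.push_back(image.name);
}

// Hypothetical dependency graph: App -> libobjc -> libSystem.
std::vector<std::string> demoInitOrder() {
    Image libSystem{"libSystem", {}, false};
    Image libobjc{"libobjc", {&libSystem}, false};
    Image app{"App", {&libobjc, &libSystem}, false};
    std::vector<std::string> order;
    recursiveInit(app, order);
    return order;
}
```

In this model the main program is initialized last, after everything it brings in, which matches the comment "run initializers for main executable and everything it brings up".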

3.2.4 notifySingle

recursiveInitialization calls notifySingle

  • Key code of notifySingle

The important part is the call through sNotifyObjCInit, which is passed the image's path and its Mach-O header

A global search for sNotifyObjCInit shows that it is of type _dyld_objc_notify_init

3.2.5 Connection between _objc_init and dyld

The second parameter of the registerObjCNotifiers function is assigned to sNotifyObjCInit

Search globally for registerObjCNotifiers to see where it is called

The call to registerObjCNotifiers is found in the dyldAPIs.cpp file

// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

This _dyld_objc_notify_register looks familiar! Deja vu! It is called from _objc_init in libobjc.dylib, as shown in the figure below:

When _dyld_objc_notify_register is called, three arguments are passed: the address of map_images, load_images, and unmap_image

Back in the dyld source project, registerObjCNotifiers then looks like this

sNotifyObjCMapped   = mapped   = &map_images
sNotifyObjCInit     = init     = load_images
sNotifyObjCUnmapped = unmapped = unmap_image

This is where _objc_init is connected to dyld

_objc_init() registers three functions with dyld, and dyld calls them back at specific points while it loads and initializes images.
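
The registration pattern itself is small enough to sketch: three function pointers are stored once and called later by the party that holds them. Everything below is a simplified assumption for illustration. The Notifier signature, the toy callbacks, and simulateLifecycle are invented, and the real callback types take different parameters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified callback type (an assumption; the real types differ).
using Notifier = void (*)(const char* path);

// The "dyld side": storage for the registered callbacks.
static Notifier sMapped   = nullptr;
static Notifier sInit     = nullptr;
static Notifier sUnmapped = nullptr;

static std::vector<std::string> gLog;  // records which callback ran

void registerNotifiers(Notifier mapped, Notifier init, Notifier unmapped) {
    sMapped   = mapped;
    sInit     = init;
    sUnmapped = unmapped;
}

// The "objc side": toy callbacks named after the functions in the text.
void toyMapImages(const char* path)  { gLog.push_back(std::string("map_images ") + path); }
void toyLoadImages(const char* path) { gLog.push_back(std::string("load_images ") + path); }
void toyUnmapImage(const char* path) { gLog.push_back(std::string("unmap_image ") + path); }

// Register once, then let the holder call back at the matching moments:
// image mapped, image about to initialize, image unmapped.
std::vector<std::string> simulateLifecycle() {
    gLog.clear();
    registerNotifiers(&toyMapImages, &toyLoadImages, &toyUnmapImage);
    if (sMapped)   sMapped("App");
    if (sInit)     sInit("App");
    if (sUnmapped) sUnmapped("App");
    return gLog;
}
```

The point of the design is that the registering side never calls these functions itself; it hands them over and the holder decides when each one runs.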

So far we have stayed in the dyld source; now go back to _objc_init and reason forwards. Set a breakpoint in _objc_init and check the stack: _objc_init is called by _os_object_init, which lives in the libdispatch source.

Go to the libdispatch source now

The source used here is libdispatch-1271.120.2

_os_object_init is called by libdispatch_init, and the stack shows that libdispatch_init is called by libSystem_initializer

Now go to the Libsystem source and confirm that libSystem_initializer does call libdispatch_init

The source used here is Libsystem-1292.120.1

So who calls libSystem_initializer?

The stack information reveals doModInitFunctions.

doModInitFunctions is back in dyld, which completes the verification from backwards to forwards.

doModInitFunctions is called by ImageLoaderMachO::doInitialization, as in the following code:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

ImageLoader::recursiveInitialization in turn calls doInitialization, recursively initializing the images

	// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
			
			// initialize this image
			bool hasInitializers = this->doInitialization(context);

To sum up, the call flow that leads to _objc_init() is as follows:

_dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> doInitialization --> doModInitFunctions --> libSystem_initializer (libSystem.B.dylib) --> libdispatch_init --> _os_object_init (libdispatch.dylib) --> _objc_init (libobjc.A.dylib)

When are the functions registered through _dyld_objc_notify_register called?

3.3 map_images and load_images

3.3.1 map_images

Go back to libobjc.dylib, that is, the objc source project, and set breakpoints on map_images and load_images. The map_images breakpoint is hit first; then print the stack with bt in the console

map_images executes first, and the stack at that point is: _dyld_objc_notify_register --> registerObjCNotifiers --> notifyBatchPartial --> map_images

The dyld source code confirms this:

3.3.2 load_images

Click continue to reach the load_images breakpoint and print the stack information

When load_images is called:

_dyld_objc_notify_register --> registerObjCNotifiers --> load_images

3.4 Exploring when main is called

In dyld (part 1), we had a test case showing that the order of execution is +load --> C++ --> main.

3.4.1 How the +load method is called

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // ... code omitted ...

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

prepare_load_methods is what discovers the +load methods; its key code is:

classref_t const *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        schedule_class_load(remapClass(classlist[i]));
    }

    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);

schedule_class_load recursively schedules +load for the class and any superclasses whose +load has not been scheduled yet

// Recursively schedule +load for cls and any un-+load-ed superclasses.
// cls must already be connected.
static void schedule_class_load(Class cls)
{
    if (!cls) return;
    ASSERT(cls->isRealized());  // _read_images should realize

    if (cls->data()->flags & RW_LOADED) return;

    // Ensure superclass-first ordering
    schedule_class_load(cls->getSuperclass());

    add_class_to_loadable_list(cls);
    cls->setInfo(RW_LOADED); 
}
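
The superclass-first ordering is easier to see in a stripped-down model. The sketch below keeps only the recursion and the "already scheduled" flag; ToyClass, its fields, and the NSObject/Person/Student hierarchy are hypothetical and stand in for the runtime's real class structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy class record: a name, a superclass link, whether it implements +load,
// and whether it has been scheduled already (the role of RW_LOADED).
struct ToyClass {
    std::string name;
    ToyClass*   superclass;
    bool        hasLoad;
    bool        scheduled;
};

void scheduleClassLoad(ToyClass* cls, std::vector<std::string>& loadable) {
    if (!cls) return;
    if (cls->scheduled) return;

    // Ensure superclass-first ordering.
    scheduleClassLoad(cls->superclass, loadable);

    // Only classes that implement +load end up in the list.
    if (cls->hasLoad) loadable.push_back(cls->name);
    cls->scheduled = true;
}

// Hypothetical hierarchy: Student : Person : NSObject, where only Person
// and Student implement +load.
std::vector<std::string> demoSchedule() {
    ToyClass root{"NSObject", nullptr, false, false};
    ToyClass person{"Person", &root, true, false};
    ToyClass student{"Student", &person, true, false};
    std::vector<std::string> loadable;
    scheduleClassLoad(&student, loadable);
    scheduleClassLoad(&person, loadable);  // no effect: already scheduled
    return loadable;
}
```

Even though Student is scheduled first here, Person lands in the list ahead of it, and scheduling Person a second time adds nothing.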

add_class_to_loadable_list adds a class that implements +load to the loadable list. All +load methods are collected this way, classes first, then categories.

/***********************************************************************
* add_class_to_loadable_list
* Class cls has just become connected. Schedule it for +load if
* it implements a +load method.
**********************************************************************/
void add_class_to_loadable_list(Class cls)
{
    IMP method;

    loadMethodLock.assertLocked();

    method = cls->getLoadMethod();
    if (!method) return;  // Don't bother if cls has no +load method
    
    if (PrintLoading) {
        _objc_inform("LOAD: class '%s' scheduled for +load", 
                     cls->nameForLogging());
    }
    
    if (loadable_classes_used == loadable_classes_allocated) {
        loadable_classes_allocated = loadable_classes_allocated*2 + 16;
        loadable_classes = (struct loadable_class *)
            realloc(loadable_classes,
                              loadable_classes_allocated *
                              sizeof(struct loadable_class));
    }
    
    loadable_classes[loadable_classes_used].cls = cls;
    loadable_classes[loadable_classes_used].method = method;
    loadable_classes_used++;
}

The methods collected by the recursion are stored in loadable_classes[loadable_classes_used].method, and each one is obtained through getLoadMethod.

/***********************************************************************
* objc_class::getLoadMethod
* fixme
* Called only from add_class_to_loadable_list.
* Locking: runtimeLock must be read- or write-locked by the caller.
**********************************************************************/
IMP 
objc_class::getLoadMethod()
{
    runtimeLock.assertLocked();

    const method_list_t *mlist;

    ASSERT(isRealized());
    ASSERT(ISA()->isRealized());
    ASSERT(!isMetaClass());
    ASSERT(ISA()->isMetaClass());

    mlist = ISA()->data()->ro()->baseMethods();
    if (mlist) {
        for (const auto& meth : *mlist) {
            const char *name = sel_cname(meth.name());
            if (0 == strcmp(name, "load")) {
                return meth.imp(false);
            }
        }
    }

    return nil;
}

getLoadMethod iterates over all of the metaclass's baseMethods() and matches "load" with strcmp; this is how the +load method is found
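
The lookup is a plain linear search by name. A minimal sketch of the same idea follows, with a hypothetical ToyMethod list in place of the runtime's method_list_t and a simplified Imp type; it is an illustration of the technique, not runtime code.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Simplified implementation pointer (an assumption for this sketch).
using Imp = int (*)();

// Toy method entry: selector name plus implementation.
struct ToyMethod {
    const char* name;
    Imp         imp;
};

// Walk the list and return the implementation whose name is exactly "load",
// or nullptr when the class does not implement it.
Imp findLoadMethod(const std::vector<ToyMethod>& methods) {
    for (const ToyMethod& meth : methods) {
        if (std::strcmp(meth.name, "load") == 0)
            return meth.imp;
    }
    return nullptr;
}

// Sample implementations and method lists, made up for illustration.
int toyInitialize() { return 1; }
int toyLoad()       { return 2; }

const std::vector<ToyMethod> kWithLoad    = {{"initialize", &toyInitialize}, {"load", &toyLoad}};
const std::vector<ToyMethod> kWithoutLoad = {{"initialize", &toyInitialize}};
```

A class whose list has no "load" entry yields nullptr here, which corresponds to add_class_to_loadable_list returning early when getLoadMethod finds nothing.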

3.4.2 When C++ functions are called

Set a breakpoint at the start of the C++ function, then print the call stack with bt

From the stack information, the call sequence is doInitialization --> doModInitFunctions --> C++ (JPFunc). Then go back to the dyld source code and search for doInitialization

notifySingle calls load_images, which then calls +load, as verified above. Because notifySingle runs before doInitialization, +load is called before the C++ function.

Follow the call stack into the doModInitFunctions method.

doModInitFunctions walks the load commands; for each macho_segment_command, a for loop iterates over its macho_section entries and calls the function pointers stored there, which include the C++ functions.
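
The core of that loop is "treat a block of memory as an array of function pointers and call each one in order". Here is a minimal sketch under simplifying assumptions: kInitSection stands in for the contents of one section, the initializers take no arguments (the real ones receive several), and all names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified initializer type (an assumption; real initializers take arguments).
using Initializer = void (*)();

static std::vector<std::string> gTrace;  // records the order of calls

// Two toy initializers, standing in for C++ static initializers.
void initA() { gTrace.push_back("initA"); }
void initB() { gTrace.push_back("initB"); }

// Stands in for the contents of a section that holds initializer pointers.
static const Initializer kInitSection[] = { &initA, &initB };

// Compute the entry count from the section size, then call each pointer
// in the order it is stored.
std::vector<std::string> runSectionInitializers() {
    gTrace.clear();
    const std::size_t count = sizeof(kInitSection) / sizeof(kInitSection[0]);
    for (std::size_t j = 0; j < count; ++j) {
        Initializer func = kInitSection[j];
        func();
    }
    return gTrace;
}
```

Because the pointers are simply called in storage order, the initializers of one image run in the order in which they were placed in that section.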

3.4.3 How dyld gets to the main function in main.m

After the C++ functions have run, dyldbootstrap::start goes on to find the main function. The stack shows that dyld starts at _dyld_start

_dyld_start is written in assembly. The assembly shows that the address of main is stored in the rax register. So let's go to the project and verify that.

Stepping through the assembly in the debugger shows the same instructions as the assembly in the dyld source.

Use register read to inspect the registers; register read rax shows that rax = main. That is the whole path from dyld to main.

That wraps up the exploration and analysis of the dyld process. Who else could analyze something this complex in such an orderly way? All this charm and nowhere to put it! Ha ha 😁

4. Summary

  • dyld analysis: work backwards first, then verify forwards
  • notifySingle: delivers a single notification for an image
  • _dyld_objc_notify_register: registers callback functions that are held and called later, similar to a block. dyld stores map_images, load_images, and unmap_image, and calls them back after _objc_init has registered them.
  • libSystem is the first library to be initialized; this can be checked in ImageLoaderMachO::doModInitFunctions and ImageLoaderMachO::doImageInit. Because objc depends on the system libraries, dyld waits until they are initialized before carrying on with image initialization, mapping, and the other operations.

  • main, +load, C++: the order of execution is +load --> C++ --> main
  • Finally, the dyld process analysis chart

More content to come.
