Preface

In earlier articles we read the source and used the debugger to analyze the objc class structure, method lookup, and the message-sending mechanism. All of that rests on the premise that dyld has already loaded the relevant information. This article uses the dyld source code to explore how dyld does this:

  1. Code -> Mach-O;
  2. Load into memory;
  3. _objc_init -> objc;

How application loading works

Dynamic libraries and static libraries

  1. Every program uses a number of base libraries, such as UIKit, CoreFoundation, and libSystem.
  2. Libraries are binaries of executable code that the operating system can load into memory. They come as static libraries (.a, .lib) and dynamic libraries (.so, .dll, .framework, .dylib).
  3. The difference between static and dynamic libraries comes down to the difference between static linking and dynamic linking.

  4. Static libraries are linked in one after another, and the same library may be copied in more than once, which wastes space and loading time (in the figure, static libraries B and D are each linked twice).

  5. Dynamic libraries can be shared where possible, so the same dynamic library is not duplicated and memory use is optimized.

  6. Advantages of dynamic libraries: smaller package size and shared memory; most hot-update schemes are also built on dynamic libraries. However, on current systems only system-level dynamic libraries are truly shared in memory; developers' own dynamic libraries cannot be.
  7. When you run the executable of a Mac project directly, the relevant information is printed straight to the console.
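To make the space argument in points 4 and 5 concrete, here is a toy C++ model, not a description of how any real linker works. With static linking every program carries its own copy of each library it uses; with shared dynamic libraries each distinct library is counted once. The library names, sizes, and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: library names and sizes (in KB) are hypothetical.
using Program = std::vector<std::string>;          // libraries one program links
using Sizes = std::map<std::string, std::size_t>;  // library name -> size

std::size_t sizeOf(const Sizes& sizes, const std::string& lib) {
    assert(sizes.count(lib) == 1 && "every linked library needs a size");
    return sizes.at(lib);
}

// Static linking: each program embeds its own copy of every library it uses.
std::size_t staticFootprint(const std::vector<Program>& programs, const Sizes& sizes) {
    std::size_t total = 0;
    for (const Program& program : programs)
        for (const std::string& lib : program)
            total += sizeOf(sizes, lib);
    return total;
}

// Shared dynamic libraries: each distinct library is in memory only once.
std::size_t sharedFootprint(const std::vector<Program>& programs, const Sizes& sizes) {
    std::set<std::string> loaded;
    for (const Program& program : programs)
        loaded.insert(program.begin(), program.end());
    std::size_t total = 0;
    for (const std::string& lib : loaded)
        total += sizeOf(sizes, lib);
    return total;
}
```

With two programs linking {A, B, D} and {C, B, D} and sizes A=10, B=20, C=30, D=40, the static total is 160 while the shared total is 100, because B and D are counted once. The model ignores that each process still maps a shared library into its own address space; only the physical pages are shared.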

From the main to dyld

  1. The start function in libdyld.dylib runs before main(). However, a symbolic breakpoint on start is never hit, which suggests start is not the real symbol name, so the breakpoint has to be set some other way.

  2. Instead, we set a breakpoint in the +load method of a class, which also runs before main().
  3. The call stack at that breakpoint shows that dyld is the library responsible for loading Mach-O files into memory when an app starts: dynamic libraries, static libraries, and other Mach-O files all reach memory through dyld.

Dyld introduction

  1. dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. During app launch, once the kernel has prepared the program, dyld is responsible for the remaining work.
  2. During objc initialization, all libraries are loaded through dyld.

Analysis of the dyld loading process

  1. The dyld source code shows that __dyld_start is written in assembly and calls dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue):
#if __arm__   // arm implementation of __dyld_start
	.text
	.align 2
__dyld_start:
	mov	r8, sp		// save stack pointer
	sub	sp, #16		// make room for outgoing parameters
	bic     sp, sp, #15	// force 16-byte alignment

	// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
	// this is where dyldbootstrap::start is called
	ldr	r0, [r8]	// r0 = mach_header
	ldr	r1, [r8, #4]	// r1 = argc
	add	r2, r8, #8	// r2 = argv
	adr	r3, __dyld_start
	sub	r3, r3, #0x1000 // r3 = dyld_mh
	add	r4, sp, #12
	str	r4, [sp, #0]	// [sp] = &startGlue

	bl	__ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
	ldr	r5, [sp, #12]
	cmp	r5, #0
	bne	Lnew

	// traditional case, clean up stack and jump to result
	add	sp, r8, #4	// remove the mach_header argument.
	bx	r0		// jump to the program's entry point

	// LC_MAIN case, set up stack for call to main()
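The branch after the call (cmp r5, #0 / bne Lnew) depends on the startGlue value that dyld::_main writes back: zero means the traditional LC_UNIXTHREAD path (jump straight to the program's own start), non-zero means the LC_MAIN path, where main() returns into glue in libdyld. The C++ sketch below models only that decision; EntryInfo, LaunchDecision, chooseEntry, and usesLcMainPath are hypothetical names, and addresses are plain integers rather than real code pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the entry information in an executable.
struct EntryInfo {
    std::uintptr_t lcMainEntry;        // main() from LC_MAIN, 0 if absent
    std::uintptr_t lcUnixThreadEntry;  // "start" from LC_UNIXTHREAD, 0 if absent
};

struct LaunchDecision {
    std::uintptr_t jumpTarget;  // value handed back to __dyld_start in r0
    std::uintptr_t startGlue;   // 0 => traditional path, non-zero => LC_MAIN path
};

// Mirrors the "find entry point" step at the end of dyld::_main.
LaunchDecision chooseEntry(const EntryInfo& exe, std::uintptr_t glueToCallExit) {
    if (exe.lcMainEntry != 0) {
        // LC_MAIN: main() is entered with libdyld's glue as its return path.
        // dyld halts in this case if the glue is unavailable.
        assert(glueToCallExit != 0 && "libdyld glue required for LC_MAIN");
        return {exe.lcMainEntry, glueToCallExit};
    }
    // LC_UNIXTHREAD: jump to the program's own start routine, no glue.
    return {exe.lcUnixThreadEntry, 0};
}

// What __dyld_start checks: bne Lnew is taken when startGlue != 0.
bool usesLcMainPath(const LaunchDecision& decision) {
    return decision.startGlue != 0;
}
```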
  2. Analyzing start() in the dyldbootstrap namespace shows that it ends by calling dyld::_main(). Before that, dyldbootstrap::start() does much of dyld's own initialization, including:
  • rebaseDyld(): rebases dyld to account for the ASLR slide.
  • mach_init(): initializes Mach messaging.
  • __guard_setup(): sets up stack overflow protection (the stack canary).
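Rebasing, as done by rebaseDyld(), means adding the slide to every stored absolute address that was computed for the image's preferred load address. The toy C++ model below treats the image as a vector of pointer-sized slots plus a list of slots that need fixing up; rebaseImage and its parameters are hypothetical, and real dyld reads the fixup locations from the Mach-O rebase information instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy rebase: slots holds pointer-sized values; rebaseLocations lists the
// indices that contain absolute addresses computed for the unslid load address.
void rebaseImage(std::vector<std::uintptr_t>& slots,
                 const std::vector<std::size_t>& rebaseLocations,
                 std::uintptr_t slide) {
    for (std::size_t index : rebaseLocations) {
        assert(index < slots.size() && "fixup location outside the image");
        slots[index] += slide;  // pointer now matches where the image actually sits
    }
}
```

Slots that are not listed (plain data such as the 42 in the usage below) are left untouched, which is why the loader needs explicit fixup information rather than adding the slide to everything.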

Here is the annotated code of dyldbootstrap::start():

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    // Rebase dyld to account for the ASLR slide
    rebaseDyld(dyldsMachHeader);
    

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while (*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
    // Stack overflow protection
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
    // Get the ASLR slide of the main executable
	uintptr_t appsSlide = appsMachHeader->getSlide();
    // Enter dyld::_main(), dyld's core function
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
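The pointer arithmetic at the top of start() relies on the kernel laying out argv, envp, and apple[] as consecutive NULL-terminated arrays. The C++ sketch below reproduces just that walk on a fake in-memory layout; StartArrays and locateArrays are hypothetical names, and the sample strings are made up.

```cpp
#include <cassert>

// Pointers located just past argv, as in dyldbootstrap::start.
struct StartArrays {
    const char** envp;
    const char** apple;
};

StartArrays locateArrays(int argc, const char* argv[]) {
    assert(argv[argc] == nullptr && "argv must be NULL-terminated");
    const char** envp = &argv[argc + 1];  // envp begins just past argv's NULL
    const char** apple = envp;
    while (*apple != nullptr) ++apple;    // skip the environment strings
    ++apple;                              // apple[] begins just past envp's NULL
    return {envp, apple};
}
```

For a layout of {"/bin/app", "-v", NULL, "HOME=/", "PATH=/bin", NULL, "executable_path=/bin/app", NULL} with argc = 2, envp points at index 3 and apple at index 6. This is also why dyld::_main can later read keys such as executable_path from apple[].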
  3. Inside dyld::_main() there is so much code that reading it top to bottom is impractical, so we work backwards from the return value and look for where result is assigned. result turns out to be tied to sMainExecutable (result is obtained from sMainExecutable), and sMainExecutable is assigned at sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

Here is a simplified version of the dyld::_main() function:

//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)
{
    // Grab the cdHash of the main executable from the environment
    // Set up the operating environment
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
        // Get the hash of the main program
        mainExecutableCDHash = mainExecutableCDHashBuffer;
    // Trace dyld's load
    notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
    // Trace the main executable's load
    notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif
    uintptr_t result = 0;
    // Get the macho_header structure of the main program
    sMainExecutableMachHeader = mainExecutableMH;
    // Get the slide value of the main program
    sMainExecutableSlide = mainExecutableSlide;
    CRSetCrashLogMessage("dyld: launch started");
    // Set the context information
    setContext(mainExecutableMH, argc, argv, envp, apple);
    // Pickup the pointer to the exec path.
    // Get the main program path
    sExecPath = _simple_getenv(apple, "executable_path");
    // <rdar://problem/13868260> Remove interim apple[0] transition code from dyld
    if ( !sExecPath ) sExecPath = apple[0];
    if ( sExecPath[0] != '/' ) {
        // have relative path, use cwd to make absolute
        char cwdbuff[MAXPATHLEN];
        if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
            // maybe use static buffer to avoid calling malloc so early...
            char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
            strcpy(s, cwdbuff);
            strcat(s, "/");
            strcat(s, sExecPath);
            sExecPath = s;
        }
    }
    // Remember short name of process for later logging
    // Get the process name
    sExecShortName = ::strrchr(sExecPath, '/');
    if ( sExecShortName != NULL )
        ++sExecShortName;
    else
        sExecShortName = sExecPath;
    
    // Configure the process restricted mode
    configureProcessRestrictions(mainExecutableMH);
    // Check environment variables
    checkEnvironmentVariables(envp);
    defaultUninitializedFallbackPaths(envp);
    // If DYLD_PRINT_OPTS is set, printOptions() is called to print the parameters
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    // If DYLD_PRINT_ENV is set, printEnvironmentVariables() is called to print the environment variables
    if ( sEnv.DYLD_PRINT_ENV ) 
        printEnvironmentVariables(envp);
    // Get the current program architecture
    getHostInfo(mainExecutableMH, mainExecutableSlide);
    //------------- Step 1 end -------------
    
    // load shared cache
    // Step 2 load the shared cache
    // Check whether the shared region is disabled (on iOS it must stay enabled)
    checkSharedRegionDisable((mach_header*)mainExecutableMH);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
        mapSharedCache();
    }
    ...
    try {
        // add dyld itself to UUID list
        addDyldImageToUUIDList();
        // instantiate ImageLoader for main executable
        // Step 3 instantiate the main program
        sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
        gLinkContext.mainExecutable = sMainExecutable;
        gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
        // Now that shared cache is loaded, setup an versioned dylib overrides
    #if SUPPORT_VERSIONED_PATHS
        checkVersionedPaths();
    #endif
        // dyld_all_image_infos image list does not contain dyld
        // add it as dyldPath field in dyld_all_image_infos
        // for simulator, dyld_sim is in image list, need host dyld added
#if TARGET_IPHONE_SIMULATOR
        // get path of host dyld from table of syscall vectors in host dyld
        void* addressInDyld = gSyscallHelpers;
#else
        // get path of dyld itself
        void*  addressInDyld = (void*)&__dso_handle;
#endif
        char dyldPathBuffer[MAXPATHLEN+1];
        int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
        if ( len > 0 ) {
            dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
            if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
                gProcessInfo->dyldPath = strdup(dyldPathBuffer);
        }
        // load any inserted libraries
        // Step 4 load the inserted dynamic library
        if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
                loadInsertedDylib(*lib);
        }
        // record count of inserted libraries so that a flat search will look at 
        // inserted libraries, then main, then others.
        // Record the number of dynamic libraries inserted
        sInsertedDylibCount = sAllImages.size()- 1;
        // link main executable
        // Step 5 link the main program
        gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
        if ( mainExcutableAlreadyRebased ) {
            // previous link() on main executable has already adjusted its internal pointers for ASLR
            // work around that by rebasing by inverse amount
            sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
        }
#endif
        link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        sMainExecutable->setNeverUnloadRecursive();
        if ( sMainExecutable->forceFlat() ) {
            gLinkContext.bindFlat = true;
            gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
        }
        // link any inserted libraries
        // do this after linking main executable so that any dylibs pulled in by inserted 
        // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
        // Step 6 link the inserted dynamic library
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }
            // only INSERTED libraries can interpose
            // register interposing info after all inserted libraries are bound so chaining works
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->registerInterposing();
            }
        }
        // <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
        for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
            ImageLoader* image = sAllImages[i];
            if ( image->inSharedCache() )
                continue;
            image->registerInterposing();
        }
        ...
        // apply interposing to initial set of images
        for(int i=0; i < sImageRoots.size(); ++i) {
            sImageRoots[i]->applyInterposing(gLinkContext);
        }
        gLinkContext.linkingMainExecutable = false;
        
        // <rdar://problem/12186933> do weak binding only after all inserted images linked
        // Step 7 perform weak symbol binding
        sMainExecutable->weakBind(gLinkContext);
        // If cache has branch island dylibs, tell debugger about them
        if ( (sSharedCacheLoadInfo.loadAddress != NULL) && (sSharedCacheLoadInfo.loadAddress->header.mappingOffset >= 0x78) && (sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset != 0) ) {
            uint32_t count = sSharedCacheLoadInfo.loadAddress->header.branchPoolsCount;
            dyld_image_info info[count];
            const uint64_t* poolAddress = (uint64_t*)((char*)sSharedCacheLoadInfo.loadAddress + sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset);
            // <rdar://problem/20799203> empty branch pools can be in development cache
            if ( ((mach_header*)poolAddress)->magic == sMainExecutableMachHeader->magic ) {
                for (int poolIndex=0; poolIndex < count; ++poolIndex) {
                    uint64_t poolAddr = poolAddress[poolIndex] + sSharedCacheLoadInfo.slide;
                    info[poolIndex].imageLoadAddress = (mach_header*)(long)poolAddr;
                    info[poolIndex].imageFilePath = "dyld_shared_cache_branch_islands";
                    info[poolIndex].imageFileModDate = 0;
                }
                // add to all_images list
                addImagesToAllImages(count, info);
                // tell gdb about new branch island images
                gProcessInfo->notification(dyld_image_adding, count, info);
            }
        }
        CRSetCrashLogMessage("dyld: launch, running initializers");
        ...
        // run all initializers
        // Step 8 executes the initialization method
        initializeMainExecutable(); 
        // notify any montoring proccesses that this process is about to enter main()
        dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN_DYLD2, 0, 0);
        notifyMonitoringDyldMain();
        // find entry point for main executable
        // Step 9 find the entry point and return
        result = (uintptr_t)sMainExecutable->getThreadPC();
        if ( result != 0 ) {
            // main executable uses LC_MAIN, needs to return to glue in libdyld.dylib
            if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
            else
                halt("libdyld.dylib support not present for LC_MAIN");
        }
        else {
            // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
            result = (uintptr_t)sMainExecutable->getMain();
            *startGlue = 0;
        }
    }
    catch(const char* message) {
        syncAllImages();
        halt(message);
    }
    catch(...) {
        dyld::log("dyld: launch failed\n");
    }
    ...
    return result;
}
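Two small steps early in dyld::_main are easy to miss in the listing: a relative executable path is turned into an absolute one using the current working directory, and the text after the last '/' becomes the short process name used in logs. The sketch below redoes both with std::string instead of dyld's getcwd/strcpy/strcat/strrchr calls; absoluteExecPath and shortName are hypothetical names, and the working directory is passed in rather than queried so the logic is easy to check.

```cpp
#include <cassert>
#include <string>

// Relative path -> absolute path, as dyld does with the result of getcwd().
std::string absoluteExecPath(const std::string& execPath, const std::string& cwd) {
    assert(!execPath.empty());
    if (execPath[0] == '/')
        return execPath;          // already absolute: keep as-is
    return cwd + "/" + execPath;  // relative: prefix the working directory
}

// Short process name: everything after the last '/', or the whole path.
std::string shortName(const std::string& execPath) {
    std::string::size_type slash = execPath.rfind('/');
    return slash == std::string::npos ? execPath : execPath.substr(slash + 1);
}
```

For example, "bin/app" with a working directory of "/Users/me" becomes "/Users/me/bin/app", and the short name of "/Applications/Demo.app/Demo" is "Demo".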

initializeMainExecutable() related logic

  1. Setting a breakpoint on _objc_init() in the objc source and debugging shows that it is ultimately called from libdispatch;

  2. libdispatch_init is called by libSystem_initializer;

  3. libSystem_initializer is in turn called by dyld's ImageLoaderMachO::doModInitFunctions;

  4. doModInitFunctions runs as part of initializing the image in doInitialization().
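Conceptually, ImageLoaderMachO::doModInitFunctions walks the table of initializer pointers stored in an image's __mod_init_func section and calls each entry in order; that is how libSystem_initializer gets invoked. The toy C++ model below replaces the section with a vector of function pointers. runModInitFunctions, g_initLog, and the two demo initializers are hypothetical, and real initializers also receive arguments such as argc and argv, which are omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Initializer = void (*)();

std::vector<std::string> g_initLog;  // records the order initializers ran

// Call every initializer in table order, as dyld does for __mod_init_func.
void runModInitFunctions(const std::vector<Initializer>& table) {
    for (Initializer init : table) {
        assert(init != nullptr && "initializer table entry must be set");
        init();
    }
}

// Hypothetical stand-ins for entries such as libSystem_initializer.
void demoInitializerA() { g_initLog.push_back("A"); }
void demoInitializerB() { g_initLog.push_back("B"); }
```

Running the table {demoInitializerA, demoInitializerB} leaves g_initLog as {"A", "B"}: order comes purely from the table.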

Loading order of +load, C++ constructors, and main():

  1. Third-party library images are loaded first, then the main program's image;
  2. Within the same image, +load methods run first, then C++ constructors;
  3. Finally, main() is called.
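The "constructor before main" part of this order can be observed without Objective-C. The sketch below uses __attribute__((constructor)), a GCC/Clang extension, to record an event during image initialization and then checks the log when main() starts; +load needs the Objective-C runtime and is not shown. launchLog, recordConstructor, and recordMainStarted are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static: constructed on first use, so it is safe to touch
// from a constructor function regardless of global initialization order.
std::vector<std::string>& launchLog() {
    static std::vector<std::string> log;
    return log;
}

// GCC/Clang extension: runs while the image is initialized, before main().
__attribute__((constructor)) static void recordConstructor() {
    launchLog().push_back("C++ constructor");
}

// Call at the top of main() to record that main() has started.
void recordMainStarted() {
    assert(!launchLog().empty() && "the constructor should already have run");
    launchLog().push_back("main");
}
```

Calling recordMainStarted() first thing in main() leaves the log as {"C++ constructor", "main"}.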

Notes on the load method:

  1. load_images calls all the +load methods, but first the methods to be called are collected into an array of loadable entries, and only then are they called;
  2. When a class's load method is found and added, the runtime does not only look at the current class: it keeps walking up the superclass chain and adds the superclasses' load methods to the array as well, superclasses first.
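The superclass-first rule in point 2 can be modelled with a short recursive function, roughly the shape of objc4's schedule_class_load: schedule the superclass before the class itself and never add a class twice. ClassInfo and scheduleClassLoad are hypothetical C++ stand-ins; the real runtime works on its own class structures and marks scheduled classes with a flag instead of searching a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ClassInfo {
    std::string name;
    const ClassInfo* superclass;  // nullptr for a root class
    bool hasLoad;                 // whether the class implements +load
};

void scheduleClassLoad(const ClassInfo* cls,
                       std::vector<const ClassInfo*>& scheduled,
                       std::vector<std::string>& loadList) {
    if (cls == nullptr) return;
    if (std::find(scheduled.begin(), scheduled.end(), cls) != scheduled.end())
        return;  // already scheduled, never add twice
    assert(cls->superclass != cls && "a class cannot be its own superclass");
    scheduleClassLoad(cls->superclass, scheduled, loadList);  // superclass first
    scheduled.push_back(cls);
    if (cls->hasLoad) loadList.push_back(cls->name);
}
```

Scheduling Student (superclass Person, whose superclass is NSObject) and then Person yields the load list {"Person", "Student"}: Person is added while handling Student, so the second call is a no-op.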

Dyld loading process

To be continued!!

Resources

  • WWDC 2017: the changes in dyld3
  • libdyld source code download
  • dyld