Preface

We have already explored a lot of what happens after a program starts. As programmers, we should also understand how the program is loaded and launched in the first place, so let's explore the principle of program loading.

Preparation

  • dyld source code.
  • libdispatch source code.
  • Libsystem source code.
  • objc4-818.2 source code.

1: Application loading principle

Before starting today's topic, here is a brief introduction to the relevant background, which will make the analysis below easier to follow.

1.1: Compilation process

The compilation process consists of four stages:

  • Preprocessing (precompile): replaces macros, removes comments, expands header files, and produces a .i file.
  • Compiling: converts the .i file into assembly language and produces a .s file.
  • Assembly: converts the assembly file into machine code and produces a .o file.
  • Linking: combines all the .o files and the linked libraries into a Mach-O executable.
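As a concrete illustration of the four stages, here is a tiny C++ translation unit. Its comments list the clang++ invocations that stop after each stage (-E, -S, -c); the file names are placeholders, and for C++ the preprocessed output conventionally uses the .ii extension rather than .i.

```cpp
// main.cpp: a minimal translation unit for walking through the stages.
// Assumed commands (file names are placeholders):
//   clang++ -E main.cpp -o main.ii   // preprocess: expand macros and #includes, strip comments
//   clang++ -S main.ii  -o main.s    // compile: translate to assembly
//   clang++ -c main.s   -o main.o    // assemble: produce an object file
//   clang++ main.o other.o -o demo   // link: combine object files (one must define main)
#include <cstdio>

#define FACTOR 3  // replaced textually during preprocessing

// After preprocessing, FACTOR no longer appears in main.ii; only the literal 3 remains.
int scale(int x) {
    return x * FACTOR;
}

void printScaled(int x) {
    std::printf("%d\n", scale(x));
}
```

Opening main.ii after the first command is a quick way to confirm that the macro and the comments are gone before the compiler proper ever runs.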

1.2: Compilation flow chart

1.3: Introduction to libraries

1.3.1: Differences between dynamic and static libraries

  • Static libraries are linked with the object code into the executable when the program is compiled. Their file extensions are .a and .framework.
  • Dynamic libraries are not linked into the object code at compile time; they are loaded when the program runs. Their file extensions are .dylib and .framework (every framework provided directly by the system is a dynamic library!). In the SDK, system dylibs appear as .tbd files, which are text-based stubs the linker uses in place of the actual .dylib.

Note: a .a file is a pure binary. A .framework contains resource files in addition to the binary. A .a file cannot be used on its own; it must come with its .h files and resource files, whereas a .framework can be used directly. In short, .a + .h + resource files = .framework, so the best way to build a static library is as a .framework.

1.3.2: Static library advantages and disadvantages

Advantages

  • The static library is packaged into the executable at compile time, so the executable runs independently of the external environment.

Disadvantages

  • The compiled executable becomes larger, and it must be recompiled whenever the static library is updated.
  • If multiple apps use the same static library, each app carries its own copy, which wastes memory.

1.3.3: Dynamic library advantages and disadvantages

Advantages

  • Reduces the size of the compiled executable (and of the APP).
  • Shares content and saves resources.
  • The program can be updated by updating the dynamic library.

Disadvantages

  • The executable cannot run on its own; it depends on the external environment (the dynamic libraries must be present at run time).

Supplement:

On most other platforms, dynamic libraries can be shared between applications, which saves a significant amount of memory. Before iOS 8, Apple did not allow third-party frameworks to be loaded dynamically. Starting with iOS 8, developers can create and use dynamic frameworks under certain conditions; these are called Cocoa Touch Frameworks. Although they are also dynamic frameworks, they differ from system frameworks: when the app is packaged and submitted, a dynamic library created with the Cocoa Touch Framework template is placed inside the app's main bundle and runs in the app's sandbox rather than in the system. In other words, even if different apps use the same framework, each app signs, packages, and loads its own copy. However, with the App Extension feature introduced in iOS 8, which lets you create a plug-in for an app, a dynamic library can still be shared between the main app and its extensions.

Apple's system frameworks (such as UIKit) are shared, but dynamic libraries we build ourselves with the Cocoa Touch Framework template are placed in the app bundle and run in the sandbox.

1.3.4: Diagrams of dynamic and static libraries

1.4: Introduction to dyld

So how are these libraries loaded into memory when the program starts?

They are loaded into memory by the dyld dynamic linker. The whole process is as follows:

dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. After the kernel finishes preparing the program, dyld is responsible for the rest of the work.

This article examines the dyld loading process in detail.

2: A preliminary look at dyld

Since dyld is responsible for loading libraries, it must run before the program's main function. So we set a breakpoint on main and look at the call stack.

As the call stack shows, only start in libdyld.dylib is called before main, and a symbolic breakpoint on start is never hit, which indicates that the symbol at the bottom of the stack is not really start.

The +load method is also called before main. Implement a +load method in a class and set a breakpoint in it to see the call stack.

According to the call stack, +load is reached through this call flow: _dyld_start -> dyldbootstrap::start -> dyld::_main -> dyld::initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::processInitializers -> ImageLoader::recursiveInitialization -> dyld::notifySingle -> load_images -> +[ViewController load].
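Objective-C is outside the scope of this sketch, but a C++ analogue makes the same "before main" ordering visible. A function marked with the GCC/Clang attribute __attribute__((constructor)) is run by the loader while it initializes the image, before control reaches main; on Apple platforms this is also done by dyld during its initializer phase. The names g_ranBeforeMain, earlyInit, and ranBeforeMain are illustrative only, and this is an analogue of +load, not +load itself.

```cpp
#include <cstdio>

// Set by the constructor below; code in main() can check it to confirm the ordering.
static bool g_ranBeforeMain = false;

// GCC/Clang extension: the loader runs this during image initialization,
// i.e. before control reaches main().
__attribute__((constructor))
static void earlyInit() {
    g_ranBeforeMain = true;
    std::puts("earlyInit: running before main");
}

// Accessor so other translation units (or main) can query the flag.
bool ranBeforeMain() {
    return g_ranBeforeMain;
}
```

Setting a breakpoint inside earlyInit on macOS and inspecting the call stack should show dyld's initializer frames beneath it, in the same spirit as the +load stack above.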

Open the dyld source code and search for _dyld_start; its assembly implementation is in the dyldStartup.s file (you can also view the assembly directly on a real device).

Static assembly:

Real-device assembly:

Both the real-device and the static assembly show that dyldbootstrap::start is called.

dyldbootstrap is a C++ namespace, and start is a function in it. Searching for dyldbootstrap::start leads to the dyldInitialization.cpp file. From there, we analyze the call sequence from dyldbootstrap::start to +load and the main function, and how dyld loads images.

3: dyld source code analysis

3.1: dyldbootstrap::start

// The file where the function resides: dyldInitialization.cpp
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit a kdebug trace point telling the debug server that dyld is starting
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

    // Rebase (relocate) dyld itself
    rebaseDyld(dyldsMachHeader);

    // kernel sets up env pointer to be just past end of argv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // Stack overflow protection: set up the stack canary
    __guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(argc, argv, envp, apple);
#endif

    _subsystem_init(apple);

    // Get the ASLR slide (virtual memory offset) of the main executable
    uintptr_t appsSlide = appsMachHeader->getSlide();
    // Call dyld::_main and hand its return value back to __dyld_start, which then calls the real main()
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

dyldbootstrap::start does the following:

  • Tells the debug server that dyld is starting.
  • Rebases (relocates) dyld.
  • Sets up stack overflow protection.
  • Runs all C++ initializers inside dyld.
  • Calls dyld::_main and passes its return value to __dyld_start, which then calls the program's real entry point, the main function.

dyldbootstrap::start only does some configuration and initialization work; the core logic is in the dyld::_main function.
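The envp and apple pointer arithmetic in dyldbootstrap::start relies on the layout the kernel sets up: the argv entries, a NULL, the envp entries, a NULL, then the apple entries and another NULL. The sketch below rebuilds that layout in an ordinary array and applies the same two steps. The helper name locateEnvAndApple is hypothetical, and the array contents in the usage are made up.

```cpp
// Hypothetical helper mirroring the arithmetic in dyldbootstrap::start.
// Assumes argv is laid out as: argv[0..argc-1], NULL, envp..., NULL, apple..., NULL.
void locateEnvAndApple(int argc, const char* argv[],
                       const char**& envp, const char**& apple) {
    // envp starts just past the NULL that terminates argv
    envp = &argv[argc + 1];
    // apple starts just past the NULL that terminates envp
    apple = envp;
    while (*apple != nullptr) { ++apple; }
    ++apple;
}
```

For an array such as {"/path/app", "-v", NULL, "HOME=/Users/me", "PATH=/usr/bin", NULL, "executable_path=/path/app", NULL} with argc = 2, envp ends up pointing at index 3 and apple at index 6.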

3.2: dyld::_main

dyld::_main is the key function of the whole app launch, with more than 800 lines of code. Here we only outline the flow; interested readers can explore the details on their own.

// The file where the function resides: dyld2.cpp
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)
{
    // Emit a kdebug trace point marking the start of the launch
    if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
        launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
    }
    ...
    /* -- Step 1: Set up the runtime environment -- begin -- */
    // cdHash of the main executable
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
        unsigned bufferLenUsed;
        if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
            mainExecutableCDHash = mainExecutableCDHashBuffer;
    }
    // Get the host CPU architecture information according to the Mach-O header
    getHostInfo(mainExecutableMH, mainExecutableSlide);
    ...
    CRSetCrashLogMessage("dyld: launch started");
    // Store the configuration in gLinkContext (this is where notifySingle is assigned)
    setContext(mainExecutableMH, argc, argv, envp, apple);
    ...
    // Use the envp environment variables to decide whether the process is restricted (AMFI related)
    configureProcessRestrictions(mainExecutableMH, envp);
    ...
#if TARGET_OS_OSX
    if ( !gLinkContext.allowEnvVarsPrint && !gLinkContext.allowEnvVarsPath && !gLinkContext.allowEnvVarsSharedCache ) {
        pruneEnvironmentVariables(envp, &apple);
        // set again because envp and apple may have changed or moved
        setContext(mainExecutableMH, argc, argv, envp, apple);
    }
    else
#endif
    {
        // Check the environment variables and set default values; no data has been loaded yet
        checkEnvironmentVariables(envp);
        defaultUninitializedFallbackPaths(envp);
    }
    ...
    // Print environment variables; these can be configured in "Scheme -> Arguments -> Environment Variables"
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    if ( sEnv.DYLD_PRINT_ENV ) 
        printEnvironmentVariables(envp);
    ...
    /* -- Step 1: Set up the runtime environment -- end -- */
    
    
    /* -- Step 2: Load the shared cache -- begin -- */
    // Check whether the shared region is disabled; iOS requires a shared cache
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
        if ( sSharedCacheOverrideDir)
            mapSharedCache(mainExecutableSlide);
#else
        // Load the shared cache
        mapSharedCache(mainExecutableSlide);
#endif
    /* -- Step 2: Load the shared cache -- end -- */
    ...
    /* -- Step 3: Load with dyld3 or dyld2 -- begin -- */
#if !TARGET_OS_SIMULATOR
    // dyld3 closure mode: introduced in iOS 11; since iOS 13, system and third-party dynamic libraries are loaded in closure mode
    if ( sClosureMode == ClosureMode::Off ) {
        // dyld2
        if ( gLinkContext.verboseWarnings )
            dyld::log("dyld: not using closures\n");
    } else {
        // dyld3
        // Launch mode: closure mode (DYLD_LAUNCH_MODE_USING_CLOSURE)
        sLaunchModeUsed = DYLD_LAUNCH_MODE_USING_CLOSURE;
        const dyld3::closure::LaunchClosure* mainClosure = nullptr;
        // Main program info and header
        dyld3::closure::LoadedFileInfo mainFileInfo;
        mainFileInfo.fileContent = mainExecutableMH;
        mainFileInfo.path = sExecPath;
        ...
        // First, look for the closure in the shared cache
        if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
            // Look up the launch closure in the shared cache
            mainClosure = sSharedCacheLoadInfo.loadAddress->findClosure(sExecPath);
            if ( gLinkContext.verboseWarnings && (mainClosure != nullptr) )
                dyld::log("dyld: found closure %p (size=%lu) in dyld shared cache\n", mainClosure, mainClosure->size());
            if ( mainClosure != nullptr )
                // Found: record that the closure came from the OS
                sLaunchModeUsed |= DYLD_LAUNCH_MODE_CLOSURE_FROM_OS;
        }
        ...
        // If a closure was found, validate it; discard it if validation fails
        if ( (mainClosure != nullptr) && !closureValid(mainClosure, mainFileInfo, mainExecutableCDHash, true, envp) ) {
            mainClosure = nullptr;
            // Closure is invalid: clear the status flag
            sLaunchModeUsed &= ~DYLD_LAUNCH_MODE_CLOSURE_FROM_OS;
        }
        ...
        // Check whether mainClosure is null
        if ( (mainClosure == nullptr) && allowClosureRebuilds ) {
            // if forcing closures, and no closure in cache, or it is invalid, check for cached closure
            if ( !sForceInvalidSharedCacheClosureFormat )
                // Look in the closure cache
                mainClosure = findCachedLaunchClosure(mainExecutableCDHash, mainFileInfo, envp, bootToken);
            if ( mainClosure == nullptr ) {
                // if no cached closure found, build new one
                mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp, bootToken);
                if ( mainClosure != nullptr )
                    // The build succeeded: record the state
                    sLaunchModeUsed |= DYLD_LAUNCH_MODE_BUILT_CLOSURE_AT_LAUNCH;
            }
        }
        ...
        // try using launch closure
        if ( mainClosure != nullptr ) {
            CRSetCrashLogMessage("dyld3: launch started");
            if ( mainClosure->topImage()->fixupsNotEncoded() )
                sLaunchModeUsed |= DYLD_LAUNCH_MODE_MINIMAL_CLOSURE;
            Diagnostics diag;
            bool closureOutOfDate;
            bool recoverable;
            // Launch the main program, using mainClosure as the loader
            bool launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
                                              mainExecutableSlide, argc, argv, envp, apple, diag, &result, startGlue, &closureOutOfDate, &recoverable);
            // The launch failed because the closure is out of date, and rebuilds are allowed
            if ( !launched && closureOutOfDate && allowClosureRebuilds ) {
                // closure is out of date, build new one
                mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp, bootToken);
                if ( mainClosure != nullptr ) {
                    diag.clearError();
                    sLaunchModeUsed |= DYLD_LAUNCH_MODE_BUILT_CLOSURE_AT_LAUNCH;
                    if ( mainClosure->topImage()->fixupsNotEncoded() )
                        sLaunchModeUsed |= DYLD_LAUNCH_MODE_MINIMAL_CLOSURE;
                    else
                        sLaunchModeUsed &= ~DYLD_LAUNCH_MODE_MINIMAL_CLOSURE;
                    // Launch again
                    launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
                                                 mainExecutableSlide, argc, argv, envp, apple, diag, &result, startGlue, &closureOutOfDate, &recoverable);
                }
            }
            if ( launched ) {
                // The main program was launched successfully
                gLinkContext.startedInitializingMainExecutable = true;
                if (sSkipMain)
                    // Skip the real main and return fake_main instead
                    result = (uintptr_t)&fake_main;
                return result;
            }
            else {
                // Report the error
                if ( gLinkContext.verboseWarnings ) {
                    dyld::log("dyld: unable to use closure %p\n", mainClosure);
                }
                if ( !recoverable )
                    halt(diag.errorMessage());
            }
        }
    }
#endif // TARGET_OS_SIMULATOR
    // could not use closure info, launch old way

    // dyld2 mode
    sLaunchModeUsed = 0;


    // install gdb notifier
    // Put the two callbacks into the stateToHandlers arrays
    stateToHandlers(dyld_image_state_dependents_mapped, sBatchHandlers)->push_back(notifyGDB);
    stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);
    // make initial allocations large enough that it is unlikely to need to be re-alloced
    sImageRoots.reserve(16);
    sAddImageCallbacks.reserve(4);
    sRemoveImageCallbacks.reserve(4);
    sAddLoadImageCallbacks.reserve(4);
    sImageFilesNeedingTermination.reserve(16);
    sImageFilesNeedingDOFUnregistration.reserve(8);
    ...
    try {
        // add dyld itself to UUID list
        addDyldImageToUUIDList();

#if SUPPORT_ACCELERATE_TABLES
        ...
        // The main program has not been rebased yet
        bool mainExcutableAlreadyRebased = false;
        if ( (sSharedCacheLoadInfo.loadAddress != nullptr) && !dylibsCanOverrideCache() && !sDisableAcceleratorTables && (sSharedCacheLoadInfo.loadAddress->header.accelerateInfoAddr != 0) ) {
            struct stat statBuf;
            if ( dyld3::stat(IPHONE_DYLD_SHARED_CACHE_DIR "no-dyld2-accelerator-tables", &statBuf) != 0 )
                sAllCacheImagesProxy = ImageLoaderMegaDylib::makeImageLoaderMegaDylib(&sSharedCacheLoadInfo.loadAddress->header, sSharedCacheLoadInfo.slide, mainExecutableMH, gLinkContext);
        }
// Label used to jump back and reload the list of all images
reloadAllImages:
#endif
    /* -- Step 3: Load with dyld3 or dyld2 -- end -- */
    ...
    /* -- Step 4: Instantiate the main program -- begin -- */
        // Instantiate the main program and add it to sAllImages (the first image dyld loads is the main program)
        sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
        gLinkContext.mainExecutable = sMainExecutable;
        // Record whether the main executable has a code signature
        gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
    /* -- Step 4: Instantiate the main program -- end -- */
    ...
#if defined(__x86_64__) && !TARGET_OS_SIMULATOR
        // Set the dynamic library version to load
        if (dyld::isTranslated()) {
            ...
        }
#endif

        // Now that shared cache is loaded, setup an versioned dylib overrides
#if SUPPORT_VERSIONED_PATHS
        // Check versioned paths
        checkVersionedPaths();
#endif
        ...
    /* -- Step 5: Load the inserted dynamic libraries -- begin -- */
        // DYLD_INSERT_LIBRARIES
        if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            // Load each inserted dynamic library
            for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
                loadInsertedDylib(*lib);
        }
        // -1 to exclude the main program
        sInsertedDylibCount = sAllImages.size() - 1;

    /* -- Step 5: Load the inserted dynamic libraries -- end -- */

        // link main executable
    /* -- Step 6: Link the main program -- begin -- */
        gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
        if ( mainExcutableAlreadyRebased ) {
            // previous link() on main executable has already adjusted its internal pointers for ASLR
            // work around that by rebasing by inverse amount
            sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
        }
#endif
        // Link the main program
        link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        sMainExecutable->setNeverUnloadRecursive();
    /* -- Step 6: Link the main program -- end -- */
    ...
    /* -- Step 7: Link the inserted dynamic libraries -- begin -- */
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                // i+1: the inserted images come right after the main program
                ImageLoader* image = sAllImages[i+1];
                // Link the inserted dynamic library, which may itself depend on other dynamic libraries
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }
            if ( gLinkContext.allowInterposing ) {
                // only INSERTED libraries can interpose
                // register interposing info after all inserted libraries are bound so chaining works
                for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                    ImageLoader* image = sAllImages[i+1];
                    image->registerInterposing(gLinkContext);
                }
            }
        }
    /* -- Step 7: Link the inserted dynamic libraries -- end -- */
        ...
        sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
        ...
        // Bind and notify for the inserted images now interposing has been registered
        if ( sInsertedDylibCount > 0 ) {
            // Bind recursively
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                // Bind the inserted dynamic library
                image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
            }
        }
        // <rdar://problem/12186933> do weak binding only after all inserted images linked

    /* -- Step 8: Weakly bind the main program -- begin -- */
        // Weak binding of the main program happens only after all images have been bound
        sMainExecutable->weakBind(gLinkContext);
        gLinkContext.linkingMainExecutable = false;

    /* -- Step 8: Weakly bind the main program -- end -- */

        sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);

        CRSetCrashLogMessage("dyld: launch, running initializers");
#if SUPPORT_OLD_CRT_INITIALIZATION
        // Old way is to run initializers via a callback from crt1.o
        if ( ! gRunInitializersOldWay )
            initializeMainExecutable();
#else
    /* -- Step 9: Run the initializers -- begin -- */
        // run all initializers
        initializeMainExecutable();
    /* -- Step 9: Run the initializers -- end -- */
#endif
        ...
    /* -- Step 10: Return main -- begin -- */
        {
            // find entry point for main executable
            // Find the main program's entry point (LC_MAIN)
            result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
            if ( result != 0 ) {
                // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
                if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                    *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
                else
                    halt("libdyld.dylib support not present for LC_MAIN");
            }
            else {
                // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
                result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
                *startGlue = 0;
            }
        }
    }
    ...
    if (sSkipMain) {
        notifyMonitoringDyldMain();
        if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
            dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
        }
        ARIADNEDBG_CODE(220, 1);
        result = (uintptr_t)&fake_main;
        *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
    }
    // Return the main program's entry point
    return result;
    /* -- Step 10: Return main -- end -- */
}

Important comments have been added to the code above to help the reader. The whole loading process can be broken down into ten steps:

  • Step 1: Set up the runtime environment.
  • Step 2: Load the shared cache.
  • Step 3: Load with dyld2 or dyld3 (ClosureMode, the closure mode).
  • Step 4: Instantiate the main program.
  • Step 5: Load the inserted dynamic libraries.
  • Step 6: Link the main program.
  • Step 7: Link the inserted dynamic libraries.
  • Step 8: Weakly bind the main program.
  • Step 9: Run the initializers.
  • Step 10: Return the main function.

Note: in this article, an "image" refers to a Mach-O image loaded by dyld, such as a dynamic library.

3.2.1: Set up the runtime environment

This step sets the runtime parameters, environment variables, and so on. At the start, the code calls getHostInfo to get the current architecture and assigns the mainExecutableMH parameter to sMainExecutableMachHeader. mainExecutableMH is a macho_header structure that represents the Mach-O header of the current main program; from this header, the loader can parse the entire Mach-O file. Next, setContext() is called to set up the context information, including callbacks, arguments, and flags. The callbacks are implemented by dyld itself; for example, the loadLibrary callback points to libraryLocator(), which is responsible for loading dynamic libraries. The code snippet is as follows:

static void setContext(const macho_header* mainExecutableMH, int argc, const char* argv[], const char* envp[], const char* apple[])
{
    gLinkContext.loadLibrary = &libraryLocator;
    gLinkContext.terminationRecorder = &terminationRecorder;
    gLinkContext.flatExportFinder = &flatFindExportedSymbol;
    gLinkContext.coalescedExportFinder = &findCoalescedExportedSymbol;
    gLinkContext.getCoalescedImages = &getCoalescedImages;
    gLinkContext.undefinedHandler = &undefinedHandler;
    gLinkContext.getAllMappedRegions = &getMappedRegions;
    gLinkContext.bindingHandler = NULL;
    gLinkContext.notifySingle = &notifySingle;
    ...
}
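setContext() is essentially filling a table of function pointers that the rest of dyld later calls through gLinkContext. The sketch below reduces that pattern to two callbacks. LinkContextSketch, its fields, the stub functions, and setContextSketch are hypothetical stand-ins, not dyld's real LinkContext (whose callbacks, notifySingle included, take more parameters).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for dyld's LinkContext.
struct LinkContextSketch {
    // Called to locate and load a library by path.
    std::string (*loadLibrary)(const std::string& path);
    // Called once per image as it is processed.
    void (*notifySingle)(const std::string& imagePath);
};

static LinkContextSketch gCtx;               // zero-initialized: both pointers start as null
static std::vector<std::string> gNotified;   // records which images were notified

// Stub implementations owned by the "loader" itself.
static std::string libraryLocatorStub(const std::string& path) {
    return "loaded:" + path;
}
static void notifySingleStub(const std::string& imagePath) {
    gNotified.push_back(imagePath);
}

// Mirrors the idea of setContext(): wire the callbacks once, up front.
void setContextSketch() {
    gCtx.loadLibrary = &libraryLocatorStub;
    gCtx.notifySingle = &notifySingleStub;
}

// Later code calls through the table without knowing the implementation.
std::string loadAndNotify(const std::string& path) {
    std::string handle = gCtx.loadLibrary(path);
    gCtx.notifySingle(path);
    return handle;
}

std::size_t notifiedCount() { return gNotified.size(); }
```

The benefit of this design is that code which only holds the context (for example, ImageLoader in dyld) can trigger loading or notifications without depending on the functions that implement them.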

This step also handles some environment variables beginning with DYLD_, for example:

    // If DYLD_PRINT_OPTS is set, printOptions() is called to print the launch arguments
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    // If DYLD_PRINT_ENV is set, printEnvironmentVariables() is called to print the environment variables
    if ( sEnv.DYLD_PRINT_ENV ) 
        printEnvironmentVariables(envp);

Set them under Scheme -> Arguments -> Environment Variables: add the environment variables as shown in the following figure and set each value to 1.

Run the project in Xcode, and the details are printed to the console:

opt[0] = "/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug/CategoryDemo"
__XCODE_BUILT_PRODUCTS_DIR_PATHS=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug
MallocNanoZone=0
CA_DEBUG_TRANSACTIONS=0
COMMAND_MODE=unix2003
LOGNAME=weichunfang
USER=weichunfang
CA_ASSERT_MAIN_THREAD_TRANSACTIONS=0
HOME=/Users/weichunfang
PWD=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug
DYLD_LIBRARY_PATH=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug:/usr/lib/system/introspection
__CF_USER_TEXT_ENCODING=0x1F5:0x19:0x34
LD_LIBRARY_PATH=/Applications/Xcode.app/Contents/Developer/../SharedFrameworks/
__XPC_DYLD_LIBRARY_PATH=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug
SQLITE_ENABLE_THREAD_ASSERTIONS=1
DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/usr/lib/libBacktraceRecording.dylib:/Applications/Xcode.app/Contents/Developer/usr/lib/libMainThreadChecker.dylib:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Debugger/libViewDebuggerSupport.dylib
DYLD_PRINT_OPTS=1
DYLD_PRINT_ENV=1
METAL_DEVICE_WRAPPER_TYPE=1
METAL_DEBUG_ERROR_MODE=0
DYLD_FRAMEWORK_PATH=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug
SECURITYSESSIONID=186a6
OS_ACTIVITY_DT_MODE=YES
SWIFTUI_VIEW_DEBUG=287
PATH=/Applications/Xcode.app/Contents/Developer/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.mls7Pr51Bb/Listeners
__XPC_DYLD_FRAMEWORK_PATH=/Users/weichunfang/Library/Developer/Xcode/DerivedData/CategoryDemo-evmsrqpceqbdzbdlwhxzunodpqfz/Build/Products/Debug
TMPDIR=/var/folders/f5/nnm3g2sd3pl8c6cz5bzpcxw80000gn/T/
XPC_FLAGS=0x0
SHELL=/bin/zsh
GPUProfilerEnabled=YES
XPC_SERVICE_NAME=com.apple.xpc.launchd.oneshot.0x10000001.Xcode
LaunchInstanceID=61EE7919-BC5B-4B07-8BAB-4505766DEAB9
NSUnbufferedIO=YES

There are many more environment variables beginning with DYLD_; interested readers can try them out for themselves.
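Flags such as DYLD_PRINT_ENV arrive as plain NAME=value strings in envp. The sketch below shows one way such a flag could be detected and the environment dumped. isFlagSet and dumpEnvironment are hypothetical helpers; dyld's own parsing in checkEnvironmentVariables is more involved and is not reproduced here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: true if envp contains an entry of the form "name=<anything>".
bool isFlagSet(const char* const envp[], const char* name) {
    const std::size_t len = std::strlen(name);
    for (const char* const* p = envp; *p != nullptr; ++p) {
        // Match the exact name followed by '=' so "DYLD_PRINT" does not match "DYLD_PRINT_ENV".
        if (std::strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return true;
    }
    return false;
}

// Hypothetical helper: collect every entry, the way a DYLD_PRINT_ENV-style dump would print them.
std::vector<std::string> dumpEnvironment(const char* const envp[]) {
    std::vector<std::string> lines;
    for (const char* const* p = envp; *p != nullptr; ++p)
        lines.emplace_back(*p);
    return lines;
}
```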

ASLR: in the output of the LLDB image list command, entry 0 is the main program, shown at its actual load address.
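Because of ASLR, the main program is not loaded at the address the linker recorded for it; the difference is the slide (the mainExecutableSlide that dyldbootstrap::start passes into dyld::_main). On Apple platforms the real values can be read with _dyld_get_image_header and _dyld_get_image_vmaddr_slide from mach-o/dyld.h. The portable sketch below only shows the arithmetic; the function names and the addresses in the usage are made up, and a 64-bit address space is assumed.

```cpp
#include <cstdint>

// slide = where the image actually landed - where the linker expected it
std::intptr_t computeSlide(std::uintptr_t loadedAddress, std::uintptr_t preferredAddress) {
    return static_cast<std::intptr_t>(loadedAddress - preferredAddress);
}

// A link-time address inside the image is found at run time by adding the slide.
std::uintptr_t runtimeAddress(std::uintptr_t linkTimeAddress, std::intptr_t slide) {
    return linkTimeAddress + static_cast<std::uintptr_t>(slide);
}
```

For example, an executable linked at 0x100000000 but loaded at 0x100abc000 has a slide of 0xabc000, so a function linked at 0x100003f00 actually lives at 0x100abff00.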

3.2.2: Load the shared cache

iOS requires a shared cache. The shared cache holds system-level dynamic libraries such as UIKit and CoreFoundation. Non-system dynamic libraries are not placed in the shared cache; they are loaded only inside the app's sandbox.

  • The checkSharedRegionDisable function states explicitly that iOS requires a shared cache.

Next, mapSharedCache is called to load the shared cache, and it in turn calls loadDyldCache.

static void mapSharedCache(uintptr_t mainExecutableSlide)
{
    ...
    // Call loadDyldCache
    loadDyldCache(opts, &sSharedCacheLoadInfo);
    ...
}

bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    results->loadAddress        = 0;
    results->slide              = 0;
    results->errorMessage       = nullptr;

#if TARGET_OS_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    // forcePrivate == YES
    if ( options.forcePrivate ) {
        // Map the cache privately into this process only; the system-wide shared region is not used
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        bool hasError = false;
        // The shared cache is already mapped, so it is reused as-is
        if ( reuseExistingCache(options, results) ) {
            hasError = (results->errorMessage != nullptr);
        } else {
            // slow path: this is first process to load cache
            hasError = mapCacheSystemWide(options, results);
        }
        return hasError;
    }
#endif
}

loadDyldCache handles three cases:

  • When privacy is forced (forcePrivate == YES), the cache is mapped only into the current process, not into the shared region, by calling mapCachePrivate.
  • If the shared cache is already mapped, it is reused without further processing.
  • If the current process is the first to load the shared cache, mapCacheSystemWide is called.
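The three branches of loadDyldCache (non-simulator path) can be restated as a small decision function. CacheMapping and chooseCacheMapping are hypothetical names used only to summarize the control flow above; they do not exist in dyld, and in the real code reuseExistingCache both checks for and performs the reuse.

```cpp
// Hypothetical restatement of the branch structure in loadDyldCache.
enum class CacheMapping {
    Private,        // mapCachePrivate: map into this process only
    ReuseExisting,  // reuseExistingCache: already in the shared region
    SystemWide      // mapCacheSystemWide: first process to map the cache
};

CacheMapping chooseCacheMapping(bool forcePrivate, bool alreadyMappedInSharedRegion) {
    if (forcePrivate)
        return CacheMapping::Private;         // checked first, regardless of shared state
    if (alreadyMappedInSharedRegion)
        return CacheMapping::ReuseExisting;   // fast path
    return CacheMapping::SystemWide;          // slow path
}
```

Note the order: forcePrivate wins even when the cache is already in the shared region, which matches the if/else nesting in loadDyldCache.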

mapCachePrivate and mapCacheSystemWide contain the actual shared cache mapping logic; interested readers can analyze them in detail.

3.2.3: dyld3 or dyld2 (ClosureMode) loading

iOS 11 introduced dyld3's closure mode, in which the launch is driven by a launch closure (mainClosure); closure mode is faster and more efficient. Since iOS 13, both system dynamic libraries and third-party libraries are loaded in closure mode.

  • dyld3:

    • Loads using mainClosure.
    • After mainClosure is found or built, launchWithClosure starts the main program. If the launch fails because the closure is out of date, a new mainClosure is built and the launch is retried. On success, result (the main program's main entry point) is returned. The logic of launchWithClosure is basically the same as dyld2's logic for starting the main program.
  • dyld2: start the main program

    • Instantiate the main program with instantiateFromLoadedImage. sMainExecutable is assigned by instantiateFromLoadedImage, which also adds the main program to sAllImages.
    • Load the inserted dynamic libraries with loadInsertedDylib. loadInsertedDylib calls load, and both the main program and the dynamic libraries end up in sAllImages (loadAllImages).
    • Link the main program, then link the inserted dynamic libraries (with link; the main program is linked first). The time dyld spends loading is recorded along the way and can be printed by configuring environment variables.
    • Bind symbols (non-lazy and weak); lazy symbols are bound when they are first called.
    • Initialize the main program with initializeMainExecutable; at this point, execution has not yet reached the main program's main function.
    • Find the main program's entry point LC_MAIN (the main function) and return it.
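The order in which the dyld3 branch of dyld::_main tries closures can be condensed into the sketch below. ClosureInputs and planLaunch are hypothetical simplifications: each boolean stands for the outcome of one real call (findClosure, closureValid, findCachedLaunchClosure, buildLaunchClosure, launchWithClosure), the relaunch after a rebuild is assumed to succeed, and the unrecoverable halt path is omitted. It restates the control flow only; it is not dyld code.

```cpp
#include <string>
#include <vector>

// Hypothetical inputs summarizing what dyld would discover at launch.
struct ClosureInputs {
    bool closureModeOn;         // sClosureMode != ClosureMode::Off
    bool inSharedCache;         // findClosure() in the shared cache succeeds
    bool sharedCacheValid;      // closureValid() accepts that closure
    bool inClosureCache;        // findCachedLaunchClosure() succeeds
    bool canBuild;              // buildLaunchClosure() succeeds
    bool firstLaunchOutOfDate;  // launchWithClosure() reports closureOutOfDate
};

// Returns the steps taken, ending in "launched (dyld3)" or "dyld2".
std::vector<std::string> planLaunch(const ClosureInputs& in, bool allowRebuilds = true) {
    std::vector<std::string> steps;
    if (!in.closureModeOn) {
        steps.push_back("dyld2");
        return steps;
    }
    bool haveClosure = false;
    if (in.inSharedCache) {
        steps.push_back("found in shared cache");
        haveClosure = in.sharedCacheValid;
        if (!haveClosure) steps.push_back("invalid, discarded");
    }
    if (!haveClosure && allowRebuilds) {
        if (in.inClosureCache) {
            steps.push_back("found cached closure");
            haveClosure = true;
        } else if (in.canBuild) {
            steps.push_back("built closure");
            haveClosure = true;
        }
    }
    if (haveClosure) {
        if (!in.firstLaunchOutOfDate) {
            steps.push_back("launched (dyld3)");
            return steps;
        }
        if (allowRebuilds && in.canBuild) {
            // Closure out of date: rebuild once and relaunch (assumed to succeed here).
            steps.push_back("out of date, rebuilt");
            steps.push_back("launched (dyld3)");
            return steps;
        }
    }
    // Could not use closure info: fall back to the old (dyld2) way.
    steps.push_back("dyld2");
    return steps;
}
```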

3.2.4: Instantiate the main program

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // Instantiate image
    ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
    // Add image to all Images
    addImage(image);
    return (ImageLoaderMachO*)image;
// throw "main executable not a known format";
}
  • Instantiate the main program from its Header, ASLR slide, and path, producing an image.
  • Add the image to allImages. The main program's image is the first one added to the array.

Instantiating the main program calls the ImageLoaderMachO::instantiateMainExecutable function.

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    // Obtain Load Commands
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    // Determine which subclass is used to load the image according to compressed, and select the corresponding subclass to instantiate the image according to the value.
    if ( compressed ) 
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}
  • sniffLoadCommands is called to collect information from the load commands, such as compressed.

  • The value of compressed determines which subclass loads the image. ImageLoader is an abstract class, and the matching subclass instantiates the main program.
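The selection step can be pictured with a small C++ sketch. The class names below (ImageLoaderSketch and its two subclasses) are hypothetical stand-ins, not dyld types. Only the pattern comes from the code above: an abstract base class with a pure virtual member, and a factory that picks the concrete subclass from compressed. When classic Mach-O support is compiled out, the factory throws the same message as dyld.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for dyld's abstract ImageLoader.
struct ImageLoaderSketch {
    virtual ~ImageLoaderSketch() = default;
    virtual std::string kind() const = 0;  // pure virtual: cannot be instantiated
};

// Stand-in for ImageLoaderMachOCompressed.
struct CompressedLoaderSketch : ImageLoaderSketch {
    std::string kind() const override { return "compressed"; }
};

// Stand-in for ImageLoaderMachOClassic.
struct ClassicLoaderSketch : ImageLoaderSketch {
    std::string kind() const override { return "classic"; }
};

// Mirrors the branch in instantiateMainExecutable: `compressed` picks the
// subclass; supportClassic plays the role of SUPPORT_CLASSIC_MACHO.
std::unique_ptr<ImageLoaderSketch> instantiateSketch(bool compressed, bool supportClassic) {
    if (compressed)
        return std::make_unique<CompressedLoaderSketch>();
    if (supportClassic)
        return std::make_unique<ClassicLoaderSketch>();
    throw std::runtime_error("missing LC_DYLD_INFO load command");
}
```

The sketch returns std::unique_ptr to keep ownership explicit. dyld itself hands back raw ImageLoader pointers.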

The sniffLoadCommands function:

// Only part of the code is shown due to length

void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                            unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                            const linkedit_data_command** codeSigCmd,
                                            const encryption_info_command** encryptCmd)
{
    // compressed is determined by LC_DYLD_INFO and LC_DYLD_INFO_ONLY
    *compressed = false;
    // number of segments
    *segCount = 0;
    // number of dependent libraries
    *libCount = 0;
    // code signing and encryption load commands
    *codeSigCmd = NULL;
    *encryptCmd = NULL;
    ...
    // fSegmentsArrayCount is only 8-bits
    // segCount can be at most 255
    if ( *segCount > 255 )
        dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

    // fSegmentsArrayCount is only 8-bits
    // libCount can be at most 4095
    if ( *libCount > 4095 )
        dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);
    // Make sure the image depends on libSystem
    if ( needsAddedLibSystemDepency(*libCount, mh) )
        *libCount = 1;

    // dylibs that use LC_DYLD_CHAINED_FIXUPS have that load command removed when put in the dyld cache
    if ( !*compressed && (mh->flags & MH_DYLIB_IN_CACHE) )
        *compressed = true;
}
  • compressed is determined by LC_DYLD_INFO and LC_DYLD_INFO_ONLY.
  • segCount is at most 255.
  • libCount is at most 4095.
  • Ensure that the image depends on libSystem.
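Here is a hedged sketch of what a sniffLoadCommands-style pass does, to make the counting concrete. It makes several assumptions:

- The input is a 64-bit Mach-O laid out as in <mach-o/loader.h>. The structs are redeclared locally so the sketch builds on any platform.
- Only LC_SEGMENT_64 and LC_LOAD_DYLIB are counted. The real function also counts weak, re-exported, and upward dylibs, and handles many more commands.
- makeFakeImage is a hypothetical helper that builds a synthetic image for testing.

The sketch applies the 255-segment and 4095-library limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Local mirrors of mach_header_64 and load_command from <mach-o/loader.h>.
struct MachHeader64 {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
};
struct LoadCommand {
    uint32_t cmd, cmdsize;
};

constexpr uint32_t kMagic64   = 0xfeedfacf;  // MH_MAGIC_64
constexpr uint32_t kSegment64 = 0x19;        // LC_SEGMENT_64
constexpr uint32_t kLoadDylib = 0xc;         // LC_LOAD_DYLIB

struct SniffResult {
    unsigned segCount = 0;
    unsigned libCount = 0;
};

// Walks the load commands after the header, counting segments and dependent
// libraries, then applies the same limits as sniffLoadCommands.
SniffResult sniffSketch(const std::vector<uint8_t>& file) {
    MachHeader64 mh;
    if (file.size() < sizeof mh) throw std::runtime_error("truncated header");
    std::memcpy(&mh, file.data(), sizeof mh);
    if (mh.magic != kMagic64) throw std::runtime_error("not a 64-bit Mach-O");

    SniffResult r;
    std::size_t offset = sizeof mh;
    for (uint32_t i = 0; i < mh.ncmds; ++i) {
        LoadCommand lc;
        if (offset + sizeof lc > file.size()) throw std::runtime_error("truncated load command");
        std::memcpy(&lc, file.data() + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc) throw std::runtime_error("bad cmdsize");
        if (lc.cmd == kSegment64) ++r.segCount;
        if (lc.cmd == kLoadDylib) ++r.libCount;
        offset += lc.cmdsize;  // cmdsize covers the whole command
    }
    if (r.segCount > 255) throw std::runtime_error("more than 255 segments");
    if (r.libCount > 4095) throw std::runtime_error("more than 4095 dependent libraries");
    return r;
}

// Hypothetical helper: builds an in-memory image whose load commands carry
// no payload, only the 8-byte cmd/cmdsize header.
std::vector<uint8_t> makeFakeImage(const std::vector<uint32_t>& cmds) {
    MachHeader64 mh{kMagic64, 0, 0, 0x2 /* MH_EXECUTE */, static_cast<uint32_t>(cmds.size()),
                    static_cast<uint32_t>(cmds.size() * sizeof(LoadCommand)), 0, 0};
    std::vector<uint8_t> out(sizeof mh);
    std::memcpy(out.data(), &mh, sizeof mh);
    for (uint32_t c : cmds) {
        LoadCommand lc{c, static_cast<uint32_t>(sizeof(LoadCommand))};
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&lc);
        out.insert(out.end(), p, p + sizeof lc);
    }
    return out;
}
```

For example, sniffSketch(makeFakeImage({kSegment64, kSegment64, kLoadDylib})) reports two segments and one dependent library.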

If segments, load commands, and the Mach-O header are still a little fuzzy, use the MachOView tool to get a sense of the executable:

The diagram makes it clear that the Mach-O header records the CPU architecture, the file type, and similar information. A Mach-O file consists of three parts: Header, Load Commands, and Data.

3.3.5: Load and insert the dynamic library

static void loadInsertedDylib(const char* path)
{
    unsigned cacheIndex;
    try {
        ...
        // Call load, the function that actually loads the dynamic library
        load(path, context, cacheIndex);
    }
    ...
}
  • After setting up the context configuration, call load to load the dynamic library.

3.3.6: Linking the main program and the dynamic libraries

void link(ImageLoader* image, bool forceLazysBound, bool neverUnload, const ImageLoader::RPathChain& loaderRPaths, unsigned cacheIndex)
{
    // add to list of known images. This did not happen at creation time for bundles
    if ( image->isBundle() && !image->isLinked() )
        addImage(image);

    // we detect root images as those not linked in yet 
    if ( !image->isLinked() )
        addRootImage(image);
    
    // process images
    try {
        const char* path = image->getPath();
#if SUPPORT_ACCELERATE_TABLES
        if ( image == sAllCacheImagesProxy )
            path = sAllCacheImagesProxy->getIndexedPath(cacheIndex);
#endif
        // Call link on the image
        image->link(gLinkContext, forceLazysBound, false, neverUnload, loaderRPaths, path);
    }
    ...
}

This calls the ImageLoader::link method. ImageLoader is responsible for loading image files (the main program and dynamic libraries), and each image corresponds to one ImageLoader instance.

void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{   
    // clear error strings
    (*context.setErrorStrings)(0, NULL, NULL, NULL);
    // Start time, used to record how long each phase takes
    uint64_t t0 = mach_absolute_time();
    // Recursively load the libraries the image depends on, then send a notification.
    this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
    context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);
    ...
    uint64_t t1 = mach_absolute_time();
    context.clearAllDepths();
    this->updateDepth(context.imageCount());

    __block uint64_t t2, t3, t4, t5;
    {
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_APPLY_FIXUPS, 0, 0, 0);
        t2 = mach_absolute_time();
        // Rebase recursively to correct for the ASLR slide
        this->recursiveRebaseWithAccounting(context);
        context.notifyBatch(dyld_image_state_rebased, false);

        t3 = mach_absolute_time();
        if ( !context.linkingMainExecutable )
            // Bind non-lazy symbols
            this->recursiveBindWithAccounting(context, forceLazysBound, neverUnload);

        t4 = mach_absolute_time();
        if ( !context.linkingMainExecutable )
            // Bind weak symbols
            this->weakBind(context);
        t5 = mach_absolute_time();
    }

    // interpose any dynamically loaded images
    if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_APPLY_INTERPOSING, 0, 0, 0);
        // Recursively apply interposing
        this->recursiveApplyInterposing(context);
    }

    // now that all fixups are done, make __DATA_CONST segments read-only
    if ( !context.linkingMainExecutable )
        this->recursiveMakeDataReadOnly(context);

    if ( !context.linkingMainExecutable )
        context.notifyBatch(dyld_image_state_bound, false);
    uint64_t t6 = mach_absolute_time();

    if ( context.registerDOFs != NULL ) {
        std::vector<DOFInfo> dofs;
        this->recursiveGetDOFSections(context, dofs);
        // register the DOF sections
        context.registerDOFs(dofs);
    }
    // Record the end time
    uint64_t t7 = mach_absolute_time();

    // clear error strings
    (*context.setErrorStrings)(0, NULL, NULL, NULL);
    // Accumulate the timings; configure an environment variable to see how long dyld took to load the app.
    fgTotalLoadLibrariesTime += t1 - t0;
    fgTotalRebaseTime += t3 - t2;
    fgTotalBindTime += t4 - t3;
    fgTotalWeakBindTime += t5 - t4;
    fgTotalDOF += t7 - t6;
    
    // done with initial dylib loads
    fgNextPIEDylibAddress = 0;
}
  • Load all dynamic libraries recursively.
  • Rebase recursively to correct for ASLR.
  • Recursively bind non-lazy symbols.
  • Bind weak symbols.
  • Register DOF sections.
  • Record the timings. With the right environment variable configured, dyld prints how long the application took to load.
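The gating in link is easy to lose among the timers, so the sketch below records only the order of the phases. It is a simplified model, not dyld code. Phase names are plain strings, linkingMainExecutable stands in for context.linkingMainExecutable, and hasInterposing stands in for a non-empty fgInterposingTuples. The DOF registration step and the notifications are left out. The sketch shows which phases are skipped at this point while the main executable itself is being linked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the phase order in ImageLoader::link; each phase is
// only recorded by name.
std::vector<std::string> linkPhasesSketch(bool linkingMainExecutable, bool hasInterposing) {
    std::vector<std::string> phases;
    phases.push_back("recursiveLoadLibraries");
    phases.push_back("recursiveRebase");              // correct pointers for the ASLR slide
    if (!linkingMainExecutable) {
        phases.push_back("recursiveBind");            // non-lazy symbols
        phases.push_back("weakBind");
        if (hasInterposing)
            phases.push_back("recursiveApplyInterposing");
        phases.push_back("recursiveMakeDataReadOnly");
    }
    return phases;
}
```

linkPhasesSketch(true, true) yields only the load and rebase phases, matching the if ( !context.linkingMainExecutable ) guards above.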

The recursiveLoadLibraries method is used to recursively load dynamic libraries.

void ImageLoader::recursiveLoadLibraries(const LinkContext& context, bool preflightOnly, const RPathChain& loaderRPaths, const char* loadPath)
{
    ...
    // get list of libraries this image needs
    // Get the dynamic libraries this image depends on
    DependentLibraryInfo libraryInfos[fLibraryCount]; 
    this->doGetDependentLibraries(libraryInfos);

    // get list of rpaths that this image adds
    // Get the rpaths (run-path search paths) this image adds
    std::vector<const char*> rpathsFromThisImage;
    this->getRPaths(context, rpathsFromThisImage);
    const RPathChain thisRPaths(&loaderRPaths, &rpathsFromThisImage);

    // Load the dynamic libraries this image depends on
    for(unsigned int i=0; i < fLibraryCount; ++i){
      ...
      dependentLib = context.loadLibrary(requiredLibInfo.name, true, this->getPath(),
      &thisRPaths, cacheIndex);
      // Save the loaded dynamic library
      setLibImage(i, dependentLib, depLibReExported, requiredLibInfo.upward);
      ...
    }
    // Tell each dependent library of this image to load the libraries it needs
    for(unsigned int i=0; i < libraryCount(); ++i) {
        ImageLoader* dependentImage = libImage(i);
        if ( dependentImage != NULL ) {
            dependentImage->recursiveLoadLibraries(context, preflightOnly, thisRPaths, libraryInfos[i].name);
        }
    }
}
  • Get the dynamic libraries the current image depends on, and the rpaths it adds.
  • Load the dynamic libraries the current image depends on and save them (setLibImage).
  • Tell each dependent library of the current image to load the dynamic libraries it needs in turn.
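Here is a minimal sketch of the same two-pass pattern over a hypothetical dependency table. The first loop loads an image's direct dependencies, and the second asks each newly loaded dependency to load its own. The loaded set approximates dyld reusing an image that is already loaded; dyld additionally guards with the image state. The library names and edges in sampleGraph are illustrative, not real link data.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency table: image name -> libraries it links against.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Two-pass pattern of recursiveLoadLibraries: load the direct dependencies,
// then ask each newly loaded dependency to load its own.
void recursiveLoadSketch(const DepGraph& graph, const std::string& image,
                         std::set<std::string>& loaded, std::vector<std::string>& order) {
    auto it = graph.find(image);
    if (it == graph.end()) return;

    std::vector<std::string> newlyLoaded;
    for (const std::string& dep : it->second) {   // like context.loadLibrary + setLibImage
        if (loaded.insert(dep).second) {          // already-loaded images are reused
            order.push_back(dep);
            newlyLoaded.push_back(dep);
        }
    }
    for (const std::string& dep : newlyLoaded)    // like dependentImage->recursiveLoadLibraries
        recursiveLoadSketch(graph, dep, loaded, order);
}

// Illustrative graph; the edges are assumptions, not real link data.
DepGraph sampleGraph() {
    return {
        {"main", {"Foundation", "libSystem.B.dylib"}},
        {"Foundation", {"libobjc.A.dylib", "libSystem.B.dylib"}},
        {"libobjc.A.dylib", {"libSystem.B.dylib"}},
        {"libSystem.B.dylib", {}},
    };
}

// Returns the order in which libraries get loaded, starting from `root`.
std::vector<std::string> loadOrderFor(const DepGraph& graph, const std::string& root) {
    std::set<std::string> loaded{root};
    std::vector<std::string> order;
    recursiveLoadSketch(graph, root, loaded, order);
    return order;
}
```

Loading from main yields Foundation, libSystem.B.dylib, then libobjc.A.dylib. Each image's direct dependencies are loaded before any of them is explored.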

Linking the dynamic libraries follows basically the same logic as linking the main program. Note that the loop that fetches image files starts at index 1, because index 0 is the main program. You can verify this by running any program and entering image list in LLDB.

  • In the LLDB image list output, the first entry is the main program.

3.3.7: Weakly bind the main program

You may have noticed that link includes a weak-symbol binding step. However, while linkingMainExecutable = true, the weak bind inside link is skipped for the main program. The dynamic libraries are weak-bound first, and the main program is weak-bound last.

3.3.8: Initialization

The initializeMainExecutable function has a complex process that is analyzed separately below.

3.3.9: Return the main function

Get the main program's entry point and return it to __dyld_start.

4: Analyzing the initialization process

The initialization process is complex and is explored separately in this section.

4.1: initializeMainExecutable function analysis

void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    // Timing records for the initializers
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        // Start from index 1; index 0 is the main program
        for(size_t i=1; i < rootCount; ++i) {
            // Initializing an image calls its +load methods and C++ constructors
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

    // run initializers for main executable and everything it brings up
    // Initialize the main program
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    
    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL ) 
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
  • Initialize the inserted images starting from index 1, then initialize the main program (index 0). Both go through runInitializers.
  • Set the environment variables DYLD_PRINT_STATISTICS and DYLD_PRINT_STATISTICS_DETAILS to print the related statistics.

4.2: ImageLoader::runInitializers function analysis

As the initializeMainExecutable function above shows, both the main program and the other images are initialized by calling ImageLoader::runInitializers.

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    // Process initialization
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
  • up.count is set to 1, and then ImageLoader::processInitializers is called.

4.3: ImageLoader::processInitializers function analysis

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount() +2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    for (uintptr_t i=0; i < images.count; ++i) {
        // Initialize recursively
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    // To ensure that all dependencies are initialized, initialize the uninitialized image again
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
  • It calls ImageLoader::recursiveInitialization.

4.4: ImageLoader::recursiveInitialization function analysis

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            // Initialize the underlying dependencies first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    ...
                    else if ( dependentImage->fDepth >= fDepth ) {
                        // Recursively initialize the dependency
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
            ...
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            // This call passes dyld_image_state_dependents_initialized and the image itself, so it
            // ends up calling this image's own +load methods (starting with libobjc.A.dylib).
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            
            // initialize this image
            // Run this image's initializers, including its C++ constructors. libSystem's
            // libSystem_initializer runs here and eventually calls _objc_init.
            // 1. libSystem_initializer -> _objc_init registers the callbacks.
            // 2. _dyld_objc_notify_register immediately calls back map_images and load_images for images
            //    that are already initialized, e.g. libdispatch.dylib, libsystem_featureflags.dylib,
            //    libsystem_trace.dylib and libxpc.dylib.
            // 3. The image's own C++ constructors.
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            // +load is not called here: notifySingle only calls +load when the state is
            // dyld_image_state_dependents_initialized.
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            ...
        }
        ...
    }
    recursiveSpinUnLock();
}
  • The dependent images to initialize come from libImage(), which returns the images saved by setLibImage in recursiveLoadLibraries when the dynamic libraries were linked.
  • The whole process is recursive. The lowest-level dependencies are initialized first, and then each image initializes itself.
  • notifySingle eventually calls all of objc's +load methods. The first notifySingle call triggers +load. The second one passes dyld_image_state_initialized and therefore does not call +load. dyld_image_state_dependents_initialized means that the image's dependencies have been initialized and the image is ready to initialize itself.
  • doInitialization eventually calls the C++ constructors. The first one called is libSystem_initializer, which leads to _objc_init and registers the callbacks. The callbacks invoke map_images and load_images (+load). At this point, load_images is called for some system libraries, such as libdispatch.dylib, libsystem_featureflags.dylib, libsystem_trace.dylib and libxpc.dylib.
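The bottom-up order can be sketched as follows, including the way a temporary state breaks dependency cycles. This is a simplified model under stated assumptions:

- The states are a hypothetical subset of dyld's image states.
- The "+load" log entry stands in for the first notifySingle, which reaches load_images.
- The "constructors" entry stands in for doInitialization.
- It ignores that load_images only exists once _objc_init has registered it.
- The image names and edges are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical subset of dyld's image states.
enum class InitState { NotInitialized, InitializingDependents, DependentsInitialized, Initialized };

struct InitImage {
    std::vector<std::string> deps;
    InitState state = InitState::NotInitialized;
};

using InitImages = std::map<std::string, InitImage>;

// Mirrors recursiveInitialization: mark the image first (breaks cycles),
// initialize dependencies, then run this image's +load and constructors.
void recursiveInitSketch(InitImages& images, const std::string& name, std::vector<std::string>& log) {
    InitImage& img = images.at(name);        // std::map references stay valid
    if (img.state != InitState::NotInitialized) return;
    img.state = InitState::InitializingDependents;
    for (const std::string& dep : img.deps)  // lower level libraries first
        recursiveInitSketch(images, dep, log);
    img.state = InitState::DependentsInitialized;
    log.push_back(name + ":+load");          // first notifySingle
    log.push_back(name + ":constructors");   // doInitialization
    img.state = InitState::Initialized;      // second notifySingle: no +load
}

// Illustrative graph; names and edges are assumptions.
std::vector<std::string> sampleInitLog() {
    InitImages images;
    images["main"].deps = {"Foundation"};
    images["Foundation"].deps = {"libobjc.A.dylib", "libSystem.B.dylib"};
    images["libobjc.A.dylib"].deps = {"libSystem.B.dylib"};
    images.emplace("libSystem.B.dylib", InitImage{});
    std::vector<std::string> log;
    recursiveInitSketch(images, "main", log);
    return log;
}

// Two images that depend on each other: the in-progress state stops the recursion.
std::vector<std::string> cycleInitLog() {
    InitImages images;
    images["X"].deps = {"Y"};
    images["Y"].deps = {"X"};
    std::vector<std::string> log;
    recursiveInitSketch(images, "X", log);
    return log;
}
```

In sampleInitLog, libSystem.B.dylib is initialized first and main last. In cycleInitLog, Y finishes before X, and the recursion terminates.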

4.5: notifySingle function analysis

notifySingle is a function pointer that is assigned in the setContext function.

Continue to look at the implementation of notifySingle.

  • No call to load_images can be found in notifySingle. (In Section 2, the first dyld exploration, the call after notifySingle was load_images; this is explored further below.)
  • In notifySingle, when state == dyld_image_state_dependents_initialized (that is, once the dependencies have finished initializing), the callback sNotifyObjCInit is invoked.

A global search for sNotifyObjCInit shows that it is assigned in registerObjCNotifiers.

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    // The first parameter is map_images
    sNotifyObjCMapped   = mapped;
    // The second parameter load_images
    sNotifyObjCInit     = init;
    // The third parameter is unmap_image
    sNotifyObjCUnmapped = unmapped;

    // call 'mapped' function with all images mapped so far
    try {
        // Call back to map_images immediately after assignment
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }

    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    // Call the init function on all images that are already initialized
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            // Call load_images for system libraries that were already initialized
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
  • Record the functions to call. sNotifyObjCInit is assigned from the second argument.
  • Right after the assignment, notifyBatchPartial is called, which internally calls sNotifyObjCMapped.
  • Loop over all images and call load_images for the system libraries that are already initialized, such as libdispatch.dylib, libsystem_blocks.dylib, libsystem_dnssd.dylib, libsystem_featureflags.dylib, libsystem_trace.dylib and libxpc.dylib.
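The hand-off boils down to a function pointer that dyld stores and calls later. The sketch below is hypothetical: the names echo dyld's, but the types are invented for illustration, and "early.dylib" is a made-up image. notifySingle fires the callback only for the dependents-initialized state, and only once a callback exists. registerObjCNotifiers replays the callback for images that finished initializing before registration. That replay is why libraries initialized before libobjc still get load_images.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Function-pointer type standing in for _dyld_objc_notify_init.
using InitCallback = void (*)(const std::string& path);

enum class NotifyState { DependentsInitialized, Initialized };

// Hypothetical model of dyld's side of the hand-off.
struct NotifierSketch {
    InitCallback sNotifyObjCInit = nullptr;
    std::vector<std::pair<std::string, NotifyState>> allImages;

    // Like notifySingle: fire only for the "dependents initialized" state,
    // and only once a callback has been registered.
    void notifySingle(const std::string& path, NotifyState state) {
        if (state == NotifyState::DependentsInitialized && sNotifyObjCInit != nullptr)
            sNotifyObjCInit(path);
    }

    // Like registerObjCNotifiers: store the callback, then replay it for
    // images that finished initializing before registration.
    void registerObjCNotifiers(InitCallback init) {
        sNotifyObjCInit = init;
        for (const auto& image : allImages)
            if (image.second == NotifyState::Initialized)
                sNotifyObjCInit(image.first);
    }
};

// Stub playing the role of load_images; records every path it receives.
std::vector<std::string> g_loadImagesCalls;
void loadImagesStub(const std::string& path) { g_loadImagesCalls.push_back(path); }

std::vector<std::string> demoNotifications() {
    g_loadImagesCalls.clear();
    NotifierSketch dyld;
    dyld.allImages.emplace_back("libdispatch.dylib", NotifyState::Initialized);
    dyld.allImages.emplace_back("libxpc.dylib", NotifyState::Initialized);
    dyld.notifySingle("early.dylib", NotifyState::DependentsInitialized);  // dropped: no callback yet
    dyld.registerObjCNotifiers(&loadImagesStub);                          // replays the 2 images
    dyld.notifySingle("Foundation", NotifyState::DependentsInitialized);  // normal path
    dyld.notifySingle("Foundation", NotifyState::Initialized);            // no +load
    return g_loadImagesCalls;
}
```

demoNotifications() records libdispatch.dylib, libxpc.dylib, then Foundation. The notification sent before registration is lost, which is exactly the gap the replay loop covers.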

A global search for registerObjCNotifiers shows that it is called from within _dyld_objc_notify_register.

// Called from _objc_init (objc-os.mm): _objc_init -> _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

A global search for _dyld_objc_notify_register shows that it is not called anywhere in the dyld source code.

To continue exploring, add a symbolic breakpoint on _dyld_objc_notify_register in your project and look at the call stack.

According to the call stack, the call order is: ImageLoaderMachO::doInitialization (dyld) -> ImageLoaderMachO::doModInitFunctions (dyld) -> libSystem_initializer (libSystem.B.dylib) -> libdispatch_init (libdispatch.dylib) -> _os_object_init (libdispatch.dylib) -> _objc_init (libobjc.A.dylib) -> _dyld_objc_notify_register (libdyld.dylib).

  • libSystem_initializer is in the libSystem.B.dylib system library.
  • libdispatch_init and _os_object_init are in the libdispatch.dylib system library.
  • _objc_init is in the libobjc.A.dylib system library, i.e. the objc source code we are most familiar with.

Look at the implementation of the _objc_init function in the objc source code.

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    // _objc_init calls _dyld_objc_notify_register (dyldAPIs.cpp); the second argument is load_images
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
  • _objc_init calls _dyld_objc_notify_register.
  • The first argument, map_images, is assigned to sNotifyObjCMapped.
  • The second argument, load_images, is assigned to sNotifyObjCInit.
  • The third argument, unmap_image, is assigned to sNotifyObjCUnmapped.

How these three callbacks interact with dyld is covered in more detail later.

4.6: ImageLoaderMachO::doInitialization function analysis

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    // Call some initialization methods in the image
    doImageInit(context);
    // Call the C++ constructors (libSystem_initializer is one of them)
    doModInitFunctions(context);
    
    CRSetCrashLogMessage2(NULL);
    
    return (fHasDashInit || fHasInitializers);
}
  • doImageInit calls the initialization routine (-init) recorded in the Mach-O file.
  • doModInitFunctions calls libSystem_initializer and the C++ constructors (C functions marked with __attribute__((constructor))).

  • The C++ constructors in the __mod_init_func section are validated and then called.
  • libSystem must be initialized before any other library, by calling libSystem_initializer, which is itself a C++ constructor marked with __attribute__((constructor)).

Verifying the C++ constructors in the Mach-O file

Add some C++ constructors to your project and look at the Mach-O file.

#include <stdio.h>

__attribute__((constructor)) void funcA() {
    printf("\n ---funcA--- \n");
}

__attribute__((constructor)) void funcB() {
    printf("\n ---funcB--- \n");
}
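The following C++ sketch confirms that these functions run before main, without inspecting the Mach-O file, by counting constructor calls. __attribute__((constructor)) is a GCC/Clang extension, and the sketch does not assume any order between the two functions. On the Apple toolchains discussed in this article, the function pointers are recorded in the __mod_init_func section that doModInitFunctions walks. Other platforms use their own mechanism, such as ELF's .init_array.

```cpp
#include <cassert>
#include <cstdio>

// Counts how many constructor functions ran before main().
static int g_constructorCalls = 0;

__attribute__((constructor)) static void funcA() {
    ++g_constructorCalls;
    std::printf("\n ---funcA--- \n");
}

__attribute__((constructor)) static void funcB() {
    ++g_constructorCalls;
    std::printf("\n ---funcB--- \n");
}
```

By the time main starts, g_constructorCalls is 2 and both messages have been printed.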

5: Tracing the link between objc and dyld backwards

The call stack from the symbolic breakpoint above shows calls into libraries other than dyld between doModInitFunctions and _dyld_objc_notify_register. Now let's trace the process backwards.

Set a breakpoint on _objc_init in the familiar objc library and look at the call stack.

Since the flow between doModInitFunctions and _objc_init is unknown, the best approach is to trace it backwards from _objc_init.

5.1: _os_object_init function analysis

_objc_init is called by _os_object_init in libdispatch.dylib. Download the latest libdispatch source (libdispatch-1271.120.2) and search for _os_object_init.

void
_os_object_init(void)
{
    // call _objc_init
    _objc_init();
    Block_callbacks_RR callbacks = {
        sizeof(Block_callbacks_RR),
        (void(*) (const void *))&objc_retain,
        (void(*) (const void *))&objc_release,
        (void(*) (const void *))&_os_objc_destructInstance
    };
    _Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
    const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
    v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
    v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}
  • _objc_init is indeed called by _os_object_init.

5.2: libdispatch_init function analysis

_os_object_init is in turn called by libdispatch_init:

  • Performs thread-related setup.
  • After setting up the main thread, calls a series of functions, including _os_object_init.

5.3: libSystem_initializer function analysis

libdispatch_init is called by libSystem_initializer in the libSystem.dylib library. Look at the implementation of libSystem_initializer:

  • It calls libdispatch_init, and also __malloc_init, _dyld_initializer, _libtrace_init and other functions.

5.4: ImageLoaderMachO::doModInitFunctions function analysis

libSystem_initializer is called by dyld's ImageLoaderMachO::doModInitFunctions, which connects the whole flow.

The following code is found in ImageLoaderMachO::doModInitFunctions:

  • libSystem must be initialized first. That makes sense: initializing it also initializes dispatch and objc, and other images depend on it.
  • Get the C++ constructors, then call them.

ImageLoaderMachO::doModInitFunctions contains no explicit call to libSystem_initializer. However, reading the register values at a breakpoint shows that it is indeed called.

We have already seen that doModInitFunctions calls C++ constructors, and libSystem_initializer happens to be a C++ constructor:

The whole call flow is now connected: the libSystem_initializer C++ constructor is called first.
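The ordering requirement can be summarized with a small hypothetical model. An initializer from any image other than libSystem is rejected until libSystem's own initializer has run. The check mirrors the rule described above rather than dyld's exact code, and the image names are illustrative.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the "libSystem first" rule.
struct InitRunSketch {
    bool libSystemInitialized = false;
    std::vector<std::string> ran;

    void runModInitFunctions(const std::string& image) {
        const bool isLibSystem = (image == "libSystem.B.dylib");
        if (!libSystemInitialized && !isLibSystem)
            throw std::logic_error("initializer in " + image + " ran before libSystem");
        ran.push_back(image);
        if (isLibSystem)
            libSystemInitialized = true;  // libSystem_initializer -> libdispatch_init -> _objc_init
    }
};

// Returns true if the given initialization order respects the rule.
bool orderIsValid(const std::vector<std::string>& order) {
    InitRunSketch runner;
    try {
        for (const std::string& image : order)
            runner.runModInitFunctions(image);
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

An order that starts with libSystem.B.dylib passes. Any order in which another image's initializer runs first is rejected.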

6: A brief analysis of the objc callbacks registered with dyld

As analyzed above, _objc_init registers its callbacks by calling _dyld_objc_notify_register, which calls dyld::registerObjCNotifiers. That function makes the following assignments:

    // The first parameter is map_images
    sNotifyObjCMapped   = mapped;
    // The second parameter load_images
    sNotifyObjCInit     = init;
    // The third parameter is unmap_image
    sNotifyObjCUnmapped = unmapped;

Let’s break down the logic of these three callbacks.

6.1: sNotifyObjCMapped (map_images)

In dyld, sNotifyObjCMapped is called only in notifyBatchPartial:

notifyBatchPartial is in turn called from notifyBatch, registerImageStateBatchChangeHandler, and registerObjCNotifiers. As analyzed earlier, registerObjCNotifiers calls notifyBatchPartial right after it registers the callbacks.

Add a breakpoint to the map_images function in objc source code:

The call stack confirms that map_images is called immediately after the callbacks are registered.

map_images takes the runtime lock and directly calls map_images_nolock, which performs the class-loading work. That will be analyzed in a separate article.

6.2: sNotifyObjCInit (load_images)

In dyld, sNotifyObjCInit is called in the following places:

  1. In notifySingleFromCache.
  2. In notifySingle.
  3. In registerObjCNotifiers.

notifySingleFromCache and notifySingle have basically the same logic; they differ only in whether a cache is used. registerObjCNotifiers invokes the callback directly when the callbacks are registered.

Add a breakpoint to the load_images function in the objc source code:

The call stack confirms that load_images is called for the system's base libraries immediately after the callbacks are registered.

Starting with the objc library, load_images is reached through the notifySingle callback:

6.2.1: load_images function

sNotifyObjCInit is actually load_images, which is implemented as follows:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        // Load all categories
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        // Prepare all load methods
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    // Call the +load methods
    call_load_methods();
}
  • Load all categories.
  • Prepare all +load methods.
  • Call call_load_methods.

6.2.2: prepare_load_methods function

void prepare_load_methods(const headerType *mhdr)
{
    size_t count, i;

    runtimeLock.assertLocked();

    classref_t const *classlist = _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        // Schedule +load for each non-lazy class (superclasses first)
        schedule_class_load(remapClass(classlist[i]));
    }
    
    // Get the list of non-lazily loaded categories
    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    for (i = 0; i < count; i++) {
        category_t *cat = categorylist[i];
        Class cls = remapClass(cat->cls);
        if (!cls) continue;  // category for ignored weak-linked class
        if (cls->isSwiftStable()) {
            _objc_fatal("Swift class extensions and categories on Swift "
                        "classes are not allowed to have +load methods");
        }
        // Realize the class if it has not been realized yet
        realizeClassWithoutSwift(cls, nil);
        ASSERT(cls->ISA()->isRealized());
        // Schedule the category's +load method
        add_category_to_loadable_list(cat);
    }
}
  • Prepare the +load methods of the classes.
  • Prepare the +load methods of the categories.

6.2.3: schedule_class_load function

static void schedule_class_load(Class cls)
{
    if (!cls) return;
    ASSERT(cls->isRealized());  // _read_images should realize
    
    if (cls->data()->flags & RW_LOADED) return;
    
    // Ensure superclass-first ordering
    // Schedule the superclass recursively first, until the superclass is nil
    schedule_class_load(cls->getSuperclass());
    
    add_class_to_loadable_list(cls);
    cls->setInfo(RW_LOADED); 
}
  • Schedule +load recursively up the superclass chain until the superclass is nil, so a superclass is always added to the list before its subclasses.
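The superclass-first guarantee is easy to check with a sketch over a hypothetical class table (Animal, Dog and Cat are invented names). The sketch assumes every class has a +load method, whereas add_class_to_loadable_list skips classes without one. It also uses a std::set in place of the RW_LOADED flag.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical class table: class name -> superclass name ("" stands for nil).
using SuperMap = std::map<std::string, std::string>;

// Mirrors schedule_class_load: recurse to the superclass first, skip classes
// already scheduled (the RW_LOADED flag), then append this class.
void scheduleClassLoadSketch(const SuperMap& supers, const std::string& cls,
                             std::set<std::string>& scheduled, std::vector<std::string>& list) {
    if (cls.empty()) return;                  // superclass is nil
    if (scheduled.count(cls)) return;         // RW_LOADED already set
    scheduleClassLoadSketch(supers, supers.at(cls), scheduled, list);
    list.push_back(cls);                      // add_class_to_loadable_list
    scheduled.insert(cls);                    // cls->setInfo(RW_LOADED)
}

// Schedules the given non-lazy classes and returns the resulting +load order.
std::vector<std::string> loadListFor(const std::vector<std::string>& nonLazyClasses) {
    const SuperMap supers = {{"NSObject", ""}, {"Animal", "NSObject"},
                             {"Dog", "Animal"}, {"Cat", "Animal"}};
    std::set<std::string> scheduled;
    std::vector<std::string> list;
    for (const std::string& cls : nonLazyClasses)
        scheduleClassLoadSketch(supers, cls, scheduled, list);
    return list;
}
```

loadListFor({"Dog", "Cat"}) produces NSObject, Animal, Dog, Cat. Animal is scheduled only once, even though both subclasses reach it.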

6.2.4: add_class_to_loadable_list and add_category_to_loadable_list functions

void add_class_to_loadable_list(Class cls)
{
    IMP method;

    loadMethodLock.assertLocked();

    // Get the class's own +load method
    method = cls->getLoadMethod();
    if (!method) return;  // Don't bother if cls has no +load method

    if (PrintLoading) {
        _objc_inform("LOAD: class '%s' scheduled for +load", 
                     cls->nameForLogging());
    }
    // Grow the array when it is full: first to 16 entries, then to capacity * 2 + 16
    if (loadable_classes_used == loadable_classes_allocated) {
        loadable_classes_allocated = loadable_classes_allocated*2 + 16;
        loadable_classes = (struct loadable_class *)
            realloc(loadable_classes,
                              loadable_classes_allocated *
                              sizeof(struct loadable_class));
    }
    // struct loadable_class has two members, cls and method
    // Store the entry at index loadable_classes_used, then increment the count
    loadable_classes[loadable_classes_used].cls = cls;
    loadable_classes[loadable_classes_used].method = method;
    loadable_classes_used++;
}
void add_category_to_loadable_list(Category cat)
{
    IMP method;

    loadMethodLock.assertLocked();

    // Get the category's +load method
    method = _category_getLoadMethod(cat);

    // Don't bother if cat has no +load method
    if (!method) return;

    if (PrintLoading) {
        _objc_inform("LOAD: category '%s(%s)' scheduled for +load", 
                     _category_getClassName(cat), _category_getName(cat));
    }
    // If the array is full, grow it: the first allocation holds 16 entries, later ones capacity*2 + 16 entries
    if (loadable_categories_used == loadable_categories_allocated) {
        loadable_categories_allocated = loadable_categories_allocated*2 + 16;
        loadable_categories = (struct loadable_category *)
            realloc(loadable_categories,
                              loadable_categories_allocated *
                              sizeof(struct loadable_category));
    }
    // struct loadable_category has two members, cat and method
    // Store the entry at index loadable_categories_used, then increment the count
    loadable_categories[loadable_categories_used].cat = cat;
    loadable_categories[loadable_categories_used].method = method;
    loadable_categories_used++;
}
struct loadable_class {
    Class cls;  // may be nil
    IMP method;
};

struct loadable_category {
    Category cat;  // may be nil
    IMP method;
};
  • Get the +load method.
  • If the array is full, grow it with realloc. The first allocation holds 16 entries, and each later growth sets the capacity to current capacity * 2 + 16 entries (the sizes count elements, not bytes).
  • Append the (class or category, IMP) pair to loadable_classes or loadable_categories.
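The growth rule is easier to see in isolation. The following sketch, assuming simplified stand-in types (ToyClass, ToyIMP, loadable_entry and loadable_list are hypothetical names), appends entries the way add_class_to_loadable_list does, with capacity counted in entries: 0, 16, 48, 112, and so on.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the runtime's Class and IMP types.
using ToyClass = const char *;
using ToyIMP = void (*)();

struct loadable_entry {
    ToyClass cls;
    ToyIMP method;
};

struct loadable_list {
    loadable_entry *items;
    int used;
    int allocated;
};

// Growth rule used by add_class_to_loadable_list: 0 -> 16 -> 48 -> 112 ...
int next_capacity(int allocated) { return allocated * 2 + 16; }

// Append one (cls, method) pair. Unlike the runtime, this sketch reports a
// failed realloc by returning false.
bool append_loadable(loadable_list &list, ToyClass cls, ToyIMP method) {
    if (list.used == list.allocated) {
        int new_cap = next_capacity(list.allocated);
        void *p = std::realloc(list.items, new_cap * sizeof(loadable_entry));
        if (!p) return false;
        list.items = static_cast<loadable_entry *>(p);
        list.allocated = new_cap;
    }
    assert(list.used < list.allocated);
    list.items[list.used].cls = cls;
    list.items[list.used].method = method;
    list.used++;
    return true;
}

void free_loadable(loadable_list &list) {
    std::free(list.items);
    list.items = nullptr;
    list.used = list.allocated = 0;
}
```

After 17 appends the array has been reallocated twice and has room for 48 entries.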

6.2.5: objc_class::getLoadMethod and _category_getLoadMethod functions

IMP 
objc_class::getLoadMethod()
{
    runtimeLock.assertLocked();

    const method_list_t *mlist;

    ASSERT(isRealized());
    ASSERT(ISA()->isRealized());
    ASSERT(!isMetaClass());
    ASSERT(ISA()->isMetaClass());

    // Class methods live in the metaclass: read its base method list
    mlist = ISA()->data()->ro()->baseMethods();
    if (mlist) {
        // Iterate over all methods and find +load by string comparison
        for (const auto& meth : *mlist) {
            const char *name = sel_cname(meth.name());
            // Match "load"
            if (0 == strcmp(name, "load")) {
                return meth.imp(false);
            }
        }
    }
    return nil;
}
IMP 
_category_getLoadMethod(Category cat)
{
    runtimeLock.assertLocked();

    const method_list_t *mlist;

    // Get the category's list of class methods
    mlist = cat->classMethods;
    if (mlist) {
        // Iterate over the class methods and find +load by string comparison
        for (const auto& meth : *mlist) {
            const char *name = sel_cname(meth.name());
            // Match "load"
            if (0 == strcmp(name, "load")) {
                return meth.imp(false);
            }
        }
    }
    return nil;
}
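  • Both functions find the +load method by comparing each method name with the string "load". Because getLoadMethod reads only the metaclass's ro()->baseMethods(), a category's +load never replaces the class's own +load; each one is recorded and called separately.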
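A stripped-down version of that lookup, assuming a plain array of (name, IMP) pairs instead of method_list_t and selectors (toy_method and find_load_method are hypothetical names):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical method entry: a selector name plus its implementation.
using ToyIMP = void (*)();

struct toy_method {
    const char *name;
    ToyIMP imp;
};

// Linear scan with strcmp, the same idea as getLoadMethod and
// _category_getLoadMethod: return the first method named "load", else nullptr.
ToyIMP find_load_method(const toy_method *list, int count) {
    if (!list) return nullptr;
    for (int i = 0; i < count; i++) {
        if (std::strcmp(list[i].name, "load") == 0) return list[i].imp;
    }
    return nullptr;
}

void toy_load() {}
void toy_initialize() {}
```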

6.2.6: call_load_methods function

void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    // This loop is where the +load methods are actually called
    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            // Call the +load method of each pending class
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        // Category +loads run only after all pending class +loads (for this image) have run
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}
  • call_class_loads is called first to run the +load methods of the classes.
  • call_category_loads is then called to run the +load methods of the categories. This is why, within an image, category +load methods run after all class +load methods.

This is where +load is actually called, which is why +load methods run before the main function.
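The ordering guarantee comes from the loop structure, not from the order in which entries were scheduled. This toy model uses plain strings instead of loadable_class and loadable_category entries (all names are hypothetical) and reproduces the three steps of call_load_methods:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the call_load_methods loop; "calling +load" just logs a name.
struct toy_load_queues {
    std::vector<std::string> classes;     // pending class +loads
    std::vector<std::string> categories;  // pending category +loads
    std::vector<std::string> log;         // order in which +load "ran"
};

void toy_call_load_methods(toy_load_queues &q) {
    bool more_categories;
    do {
        // 1. Drain class +loads until none are pending.
        while (!q.classes.empty()) {
            std::vector<std::string> batch;
            batch.swap(q.classes);  // detach the list, like call_class_loads
            for (const auto &c : batch) q.log.push_back(c);
        }
        // 2. Run the pending category +loads once.
        std::vector<std::string> cats;
        cats.swap(q.categories);
        for (const auto &c : cats) q.log.push_back(c);
        more_categories = !q.categories.empty();
        // 3. Repeat while anything new is pending.
    } while (!q.classes.empty() || more_categories);
}
```

Even though the category is queued before the classes, its +load is logged last.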

6.2.7: call_class_loads function

static void call_class_loads(void)
{
    int i;
    
    // Detach current loadable list.
    struct loadable_class *classes = loadable_classes;
    int used = loadable_classes_used;
    loadable_classes = nil;
    loadable_classes_allocated = 0;
    // Reset the used count so the global list is empty
    loadable_classes_used = 0;
    
    // Call all +loads for the detached list.
    for (i = 0; i < used; i++) {
        Class cls = classes[i].cls;
        // Get method from classes
        load_method_t load_method = (load_method_t)classes[i].method;
        if (!cls) continue;

        if (PrintLoading) {
            _objc_inform("LOAD: +[%s load]\n", cls->nameForLogging());
        }
        // Call +load directly through its IMP
        (*load_method)(cls, @selector(load));
    }
    
    // Destroy the detached list.
    if (classes) free(classes);
}
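  • call_class_loads detaches loadable_classes, then loops over it and calls each +load IMP directly through a function pointer rather than through objc_msgSend.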
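Two details of call_class_loads are worth isolating: the list is detached before iterating, and each +load is invoked directly through its function pointer. The sketch below models both with hypothetical types (ToyLoadFn stands in for load_method_t, which in the runtime takes an id and a SEL):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for load_method_t.
using ToyLoadFn = void (*)(const std::string &cls);

struct ToyPending {
    std::string cls;  // an empty string plays the role of a nil class
    ToyLoadFn method;
};

std::vector<ToyPending> g_pending;  // plays the role of loadable_classes
std::vector<std::string> g_called;  // records which "+load" ran

// Detach-then-iterate, like call_class_loads: anything scheduled while the
// loop runs goes into the fresh global list and is handled by a later pass.
void call_pending_loads() {
    std::vector<ToyPending> detached;
    detached.swap(g_pending);
    for (const ToyPending &p : detached) {
        if (p.cls.empty()) continue;  // like "if (!cls) continue;"
        p.method(p.cls);              // direct call through the function pointer
    }
}

void record_load(const std::string &cls) { g_called.push_back(cls); }

// A "+load" that causes another class to be scheduled.
void load_that_schedules_more(const std::string &cls) {
    g_called.push_back(cls);
    g_pending.push_back({"LateClass", record_load});
}
```

A +load that schedules another class does not disturb the pass in progress; the new entry waits in the fresh list, which is why call_load_methods keeps looping while loadable_classes_used > 0.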

6.2.8: call_category_loads function

static bool call_category_loads(void)
{
    int i, shift;
    bool new_categories_added = NO;
    
    // Detach current loadable list.
    struct loadable_category *cats = loadable_categories;
    int used = loadable_categories_used;
    int allocated = loadable_categories_allocated;
    loadable_categories = nil;
    loadable_categories_allocated = 0;
    loadable_categories_used = 0;

    // Call all +loads for the detached list.
    for (i = 0; i < used; i++) {
        Category cat = cats[i].cat;
        // Fetch load from cats
        load_method_t load_method = (load_method_t)cats[i].method;
        Class cls;
        if (!cat) continue;

        cls = _category_getClass(cat);
        if (cls  &&  cls->isLoadable()) {
            if (PrintLoading) {
                _objc_inform("LOAD: +[%s(%s) load]\n", 
                             cls->nameForLogging(), 
                             _category_getName(cat));
            }
            (*load_method)(cls, @selector(load));
            cats[i].cat = nil;
        }
    }
    ...
}
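  • Category +load methods are likewise called by looping over loadable_categories and calling each +load IMP, but only when the category's class is loadable (cls->isLoadable()). The elided part of the function contains the extra category logic, such as keeping categories that could not be called yet and picking up categories scheduled during these calls.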

So both +load methods and C++ constructors are called before the program's entry point, main. This can be verified with breakpoints and the assembly view:

This corresponds to where we started: the entry point that dyld returns is main, so if you rename main you get an error.

The analysis above shows that dyld calls runInitializers for each image in image-list order, starting at index 1, and calls runInitializers for the main program (index 0) last. Inside runInitializers, the +load methods of all classes are called first, then the +load methods of all categories, and finally the C++ constructors.

+load is called by objc, while the C++ constructors are called by dyld through doModInitFunctions.
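Objective-C +load cannot be shown in portable C++, but the C++ constructor half of the claim can. The sketch below assumes a compiler that supports the GCC/Clang __attribute__((constructor)) extension; the event names are hypothetical.

```cpp
#include <cassert>

// Constant-initialized globals are ready before any initializer runs,
// so the constructor below can safely write to them.
static int g_events[8];
static int g_event_count = 0;

enum { kEventConstructor = 1, kEventMain = 2 };

static void record_event(int e) {
    if (g_event_count < 8) g_events[g_event_count++] = e;
}

// GCC/Clang extension: this function runs before main(). On Apple platforms
// it is registered as a module initializer that dyld runs (through
// doModInitFunctions in the dyld version discussed here).
__attribute__((constructor)) static void early_initializer() {
    record_event(kEventConstructor);
}
```

By the time main starts, the constructor has already recorded its event.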

6.3: sNotifyObjCUnmapped (unmap_image)

In dyld, sNotifyObjCUnmapped is called only by removeImage:

removeImage is called by checkandAddImage, garbageCollectImages, _dyld_link_module, and NSUnLinkModule.

  • checkandAddImage: compares the install path of the newly loaded image with those of the images already loaded; if one matches, it returns the existing image directly (removing the new one), otherwise it adds the new image.

  • garbageCollectImages: called when linking fails or other errors occur, and when images are reclaimed.
  • _dyld_link_module: kept for compatibility with older versions.
  • NSUnLinkModule: called when a module is unlinked.

6.3.1: unmap_image calls unmap_image_nolock; the core code is as follows:

void
unmap_image_nolock(const struct mach_header *mh)
{
    ...
    header_info *hi;
    ...
    // Release the resources related to the image's classes and categories
    _unload_image(hi);

    // Remove header_info from header list
    removeHeader(hi);
    free(hi);
}
  • Unloads the image and releases the resources related to its classes and categories.
  • Removes the image's header_info from the header list and frees it.

7: dyld3 closure mode analysis

7.1: Overview

When closure mode is enabled, dyld returns directly after launching through the closure, so the core logic is in launchWithClosure:

static bool launchWithClosure(const dyld3::closure::LaunchClosure* mainClosure,
                              const DyldSharedCache* dyldCache,
                              const dyld3::MachOLoaded* mainExecutableMH, uintptr_t mainExecutableSlide,
                              int argc, const char* argv[], const char* envp[], const char* apple[], Diagnostics& diag,
                              uintptr_t* entry, uintptr_t* startGlue, bool* closureOutOfDate, bool* recoverable)
{
    ...
    // run initializers
    CRSetCrashLogMessage("dyld3: launch, running initializers");
    libDyldEntry->runInitialzersBottomUp((mach_header*)mainExecutableMH);
    //dyld::log("returned from runInitialzersBottomUp()\n");
    ...
}

launchWithClosure calls runInitialzersBottomUp (the spelling is dyld's own):

void AllImages::runInitialzersBottomUp(const closure::Image* topImage)
{
    // walk closure specified initializer list, already ordered bottom up
    topImage->forEachImageToInitBefore(^(closure::ImageNum imageToInit, bool& stop) {
        // get copy of LoadedImage about imageToInit, but don't keep reference into _loadedImages, because it may move if initialzers call dlopen()
        uint32_t    indexHint = 0;
        LoadedImage loadedImageCopy = findImageNum(imageToInit, indexHint);
        // skip if the image is already inited, or in process of being inited (dependency cycle)
        if ( (loadedImageCopy.state() == LoadedImage::State::fixedUp) && swapImageState(imageToInit, indexHint, LoadedImage::State::fixedUp, LoadedImage::State::beingInited) ) {
            // tell objc to run any +load methods in image
            if ( (_objcNotifyInit != nullptr) && loadedImageCopy.image()->mayHavePlusLoads() ) {
                dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)loadedImageCopy.loadedAddress(), 0, 0);
                const char* path = imagePath(loadedImageCopy.image());
                log_notifications("dyld: objc-init-notifier called with mh=%p, path=%s\n", loadedImageCopy.loadedAddress(), path);
                // +load
                (*_objcNotifyInit)(path, loadedImageCopy.loadedAddress());
            }

            // run all initializers in image
            // the c++ constructor
            runAllInitializersInImage(loadedImageCopy.image(), loadedImageCopy.loadedAddress());

            // advance state to inited
            swapImageState(imageToInit, indexHint, LoadedImage::State::beingInited, LoadedImage::State::inited);
        }
    });
}
  • _objcNotifyInit eventually calls the +load methods.
  • runAllInitializersInImage calls the C++ constructors.
void AllImages::runAllInitializersInImage(const closure::Image* image, const MachOLoaded* ml)
{
    image->forEachInitializer(ml, ^(const void* func) {
        Initializer initFunc = (Initializer)func;
#if __has_feature(ptrauth_calls)
        initFunc = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)initFunc, 0, 0);
#endif
        {
            ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)ml, (uint64_t)func, 0);
            // the c++ constructor
            initFunc(NXArgc, NXArgv, environ, appleParams, _programVars);

        }
        log_initializers("dyld: called initialzer %p in %s\n", initFunc, image->path());
    });
}
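runInitialzersBottomUp walks an order that the closure has already computed and uses image states to skip images that are initialized or still being initialized. As a rough model, assuming a hypothetical ToyImage type, the following computes such a bottom-up order with a depth-first search (closer to dyld2's recursiveInitialization) and applies the same fixedUp, beingInited, inited transitions:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical image model; dyld3's LoadedImage has similarly named states
// but a different layout.
enum class InitState { fixedUp, beingInited, inited };

struct ToyImage {
    std::string name;
    std::vector<int> deps;  // indices of images this one depends on
    InitState state;
};

// Dependencies first (post-order DFS). Images that are already inited, or
// still beingInited because of a dependency cycle, are skipped.
void init_bottom_up(std::vector<ToyImage> &images, int idx,
                    std::vector<std::string> &order) {
    ToyImage &img = images[idx];
    if (img.state != InitState::fixedUp) return;
    img.state = InitState::beingInited;
    for (int d : img.deps) init_bottom_up(images, d, order);
    // Here dyld would notify objc (+load) and then run the C++ initializers.
    order.push_back(img.name);
    img.state = InitState::inited;
}
```

With the cycle between libA and libB, each image is still initialized exactly once, and main comes last.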

In _dyld_objc_notify_register, dyld3::_dyld_objc_notify_register is called when gUseDyld3 is set; otherwise dyld's own __dyld_objc_notify_register is looked up and called (the dyld2 path).

View the relevant source code:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    if ( gUseDyld3 )
        // dyld3, use this to register the callback
        return dyld3::_dyld_objc_notify_register(mapped, init, unmapped);

    DYLD_LOCK_THIS_BLOCK;
    typedef bool (*funcType)(_dyld_objc_notify_mapped, _dyld_objc_notify_init, _dyld_objc_notify_unmapped);
    static funcType __ptrauth_dyld_function_ptr p = NULL;

    if(p == NULL)
        // Look up dyld's __dyld_objc_notify_register function
        dyld_func_lookup_and_resign("__dyld_objc_notify_register", &p);
    // Call __dyld_objc_notify_register
    p(mapped, init, unmapped);
}
  • If gUseDyld3 is true, the dyld3 path is taken; otherwise the dyld2 path is used. Debugging on a real device showed the dyld2 path being taken. Next, let's analyze the dyld3 path.

7.2: dyld3 process analysis

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)

{
    log_apis("_dyld_objc_notify_register(%p, %p, %p)\n", mapped, init, unmapped);

    gAllImages.setObjCNotifiers(mapped, init, unmapped);
}
  • Calls gAllImages.setObjCNotifiers to register the callbacks.
void AllImages::setObjCNotifiers(_dyld_objc_notify_mapped map, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmap)

{
    // dyld3 stores the three callback pointers in its own fields, separate from dyld2's
    _objcNotifyMapped   = map;
    _objcNotifyInit     = init;
    _objcNotifyUnmapped = unmap;

    // We couldn't initialize the objc optimized closure data in init() as that needs malloc but runs before malloc initializes.
    // So lets grab the data now and set it up.
    ...
    if ( !mhs.empty() ) {
        // Call objc's map callback for the images that are already loaded
        (*map)((uint32_t)mhs.count(), &paths[0], &mhs[0]);
        if ( log_notifications("dyld: objc-mapped-notifier called with %ld images:\n", mhs.count()) ) {
            for (uintptr_t i=0; i < mhs.count(); ++i) {
                log_notifications("dyld: objc-mapped: %p %s\n", mhs[i], paths[i]);
            }
        }
    }
}
  • Saves objc's three callbacks; the callback pointers registered with dyld3 are stored separately from dyld2's.
  • Immediately calls objc's map callback for the images that are already loaded, i.e. the _objcNotifyMapped callback.
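The register-then-replay behaviour of setObjCNotifiers can be modelled in a few lines. This is a sketch with hypothetical types (ToyImageRegistry, ToyMappedFn, ToyInitFn); the real callbacks also receive mach_header pointers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callback types, loosely modelled on the mapped/init callbacks.
using ToyMappedFn = void (*)(unsigned count, const char *const paths[]);
using ToyInitFn = void (*)(const char *path);

class ToyImageRegistry {
public:
    void add_loaded_image(const std::string &path) { paths_.push_back(path); }

    // Store the callbacks, then immediately report every image loaded before
    // registration, as setObjCNotifiers does with (*map)(...).
    void set_notifiers(ToyMappedFn mapped, ToyInitFn init) {
        mapped_ = mapped;
        init_ = init;
        if (!paths_.empty()) {
            std::vector<const char *> raw;
            for (const auto &p : paths_) raw.push_back(p.c_str());
            mapped_(static_cast<unsigned>(raw.size()), raw.data());
        }
    }

    // A later load reports only the new image: mapped first, then init.
    void load_image(const std::string &path) {
        paths_.push_back(path);
        const char *one[] = {paths_.back().c_str()};
        if (mapped_) mapped_(1, one);
        if (init_) init_(one[0]);
    }

private:
    std::vector<std::string> paths_;
    ToyMappedFn mapped_ = nullptr;
    ToyInitFn init_ = nullptr;
};
```

Images loaded before registration are reported in one batch; each later load is reported on its own.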

7.2.1: _objcNotifyMapped (map_images)

_objcNotifyMapped is called by AllImages::runImageCallbacks.

AllImages::runImageCallbacks is called by AllImages::applyInitialImages and AllImages::loadImage.

AllImages::applyInitialImages:

AllImages::applyInitialImages is called by _dyld_initializer.

  • From section 5.3 we know that _dyld_initializer is called by libSystem_initializer in libSystem.dylib. Because _dyld_initializer runs before libdispatch_init (which eventually leads to _objc_init, where the callbacks are registered), the callbacks have not been registered yet at this point.

AllImages::loadImage:

  • Calls AllImages::runImageCallbacks.
  • Calls AllImages::runInitialzersBottomUp, which indirectly calls _objcNotifyInit.

AllImages::loadImage is called by AllImages::dlopen, which:

  • Calls AllImages::loadImage.
  • Calls AllImages::runInitialzersBottomUp, which indirectly calls _objcNotifyInit.

7.2.2: _objcNotifyInit (load_images)

_objcNotifyInit is called by AllImages::runInitialzersBottomUp.

Next, find the callers of AllImages::runInitialzersBottomUp.

  • Combining sections 7.1 and 7.2.1, AllImages::runInitialzersBottomUp is called by launchWithClosure, AllImages::loadImage, and AllImages::dlopen.

7.2.3: _objcNotifyUnmapped (unmap_image)

The _objcNotifyUnmapped function is called by the AllImages::removeImages function.

AllImages::removeImages is called by AllImages::garbageCollectImages.

AllImages::garbageCollectImages is called by AllImages::decRefCount.

The AllImages::decRefCount function is called in two cases:

  • On platforms other than macOS, it is called by dlclose.
  • On macOS, it is called by NSUnLinkModule.
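The unload path above is driven by reference counts. A minimal model, assuming hypothetical names (toy_image_table, open, close, collect) rather than dyld's real signatures: open and close stand in for dlopen/dlclose, collect for garbageCollectImages, and the unmapped vector records what an _objcNotifyUnmapped-style callback would be told.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class toy_image_table {
public:
    void open(const std::string &path) { ++refs_[path]; }

    // Decrement the count (like decRefCount); at zero, collect unused images.
    void close(const std::string &path) {
        auto it = refs_.find(path);
        if (it == refs_.end()) return;  // unknown image: ignore
        if (--it->second == 0) collect();
    }

    std::vector<std::string> unmapped;  // images reported as unmapped

private:
    // Remove every image whose count reached zero, reporting it first.
    void collect() {
        for (auto it = refs_.begin(); it != refs_.end();) {
            if (it->second == 0) {
                unmapped.push_back(it->first);
                it = refs_.erase(it);
            } else {
                ++it;
            }
        }
    }
    std::map<std::string, int> refs_;
};
```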

I could not get a real device, the simulator, or a Mac to enter closure mode for debugging, and the closure-mode code is hard to follow, so these conclusions are drawn from reading the source alone and may not be accurate.

7.3: dyld3 closure mode flowchart

8: dyld overview

  • Startup Time: the time spent before main is executed.
  • Launch Closure: all the information needed to launch an application.

dyld has gone through three major versions so far; the following briefly summarizes its evolution.

8.1: dyld 1.0 (1996-2004)

  • Shipped with NeXTStep 3.3; before that, NeXT used static binaries.
  • dyld1 predates the standardization of the POSIX dlopen() call.
  • dyld1 was written before most systems used C++ dynamic libraries.

    C++ has a number of features, such as its initializer ordering, that work well in a static environment but are hard to do with good performance in a dynamic one. Large C++ code bases therefore made dyld do a lot of work and slowed it down.
  • Prebinding was added for macOS Cheetah (10.0).

    Prebinding assigns fixed addresses to every dylib in the system and to your program. dyld loads everything at those addresses and, if that succeeds, edits all of these binaries to store the precomputed addresses. The next time, when everything can be placed at the same addresses, no extra work is needed. This speeds things up a lot, but it also means your binaries are edited on every launch, which is not good practice from a security standpoint.

8.2: dyld 2.0 (2004-2007)

  • Shipped with macOS Tiger.
  • dyld2 is a complete rewrite of dyld.
  • It correctly supports C++ initializer semantics; the mach-o format was extended and dyld updated accordingly.
  • It has a full native dlopen and dlsym implementation with correct semantics, and the old APIs were deprecated (they remain only on macOS).
  • dyld2 was designed for speed, so it performs only limited sanity checking (there were far fewer malicious programs back then).
  • dyld had some security issues, so a number of features were improved to make it safer on today's platforms.
  • Because dyld2 was much faster, prebinding could be scaled back. Instead of editing your program's data as dyld1 did, dyld2 edits only the system libraries, and only when the software is updated. That is why you may see messages such as “Optimizing system performance” during an update: the prebinding is being updated.

8.2.1: dyld 2.x (2007-2017)

  • Added more architectures and platforms.

    • x86, x86_64, arm64
    • iOS, tvOS, watchOS
  • Improved security.

    • Added code signing and ASLR.
    • Added bounds checking of mach-o headers to prevent attacks with malformed binaries.
  • Improved performance.

    • Replaced prebinding with the shared cache.

8.2.2: Shared Cache

The shared cache (a dyld prelinker) was first introduced in iOS 3.1 and macOS Snow Leopard, completely replacing prebinding.

  • It is a single file that contains most of the system dylibs. Because they are merged into one file, it can be optimized:

    • Binaries are rearranged to improve load speed (all text segments and all data segments are rearranged, and the entire symbol table is rewritten to reduce its size).
    • Binary data segments can be packed, saving a lot of RAM.
    • Data structures for dyld and objc are pre-generated for use at runtime, so that work does not have to be done when an app launches; this saves even more RAM and time.
  • On macOS the shared cache is generated locally, and running dyld shared code from it greatly improves system performance. On the other platforms Apple builds it and ships it.

8.3: dyld 3.0 (2017-)

dyld3 is a new dynamic linker. Starting in 2017 (iOS 11), all system programs use dyld3 by default, and in 2019 (iOS 13) it completely replaced dyld2 for third-party apps as well. dyld3 brings three improvements:

  1. Performance: faster launch. dyld3 helps programs start and run faster.
  2. Security: the security features needed in practice are hard to add to dyld2.
  3. Testability and reliability.

    XCTest relies on dyld to insert its libraries into the process, so it cannot be used to test the existing dyld code. This makes security and performance hard to test.

    dyld3 moves most of dyld out of the process; most of dyld is now an ordinary daemon that can be tested with standard tools. A small part of dyld stays resident in the process, and that part is kept as small as possible to reduce the program's attack surface.

8.4: Loading in dyld2 vs. dyld3

8.4.1: dyld2 process

  • Parse mach-o headers & Find dependencies: parse the mach-o headers to determine which libraries are needed, then recursively parse the dependent libraries until every dylib has been found. A typical iOS app needs 300 to 600 dylibs, so this is a lot of work.
  • Map mach-o files: map all the mach-o files into the address space (into memory).
  • Perform symbol lookups: for example, if the program uses printf, dyld finds that printf is in the system library, gets its address, and copies it into a function pointer in the app.
  • Bind and rebase: because of ASLR the image is loaded at a random address, so pointers inside it must be shifted by that offset (rebase), and pointers to symbols in other images must be filled in with the addresses found during lookup (bind).
  • Run initializers: run all initializers. After that, dyld gets ready to call main.
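Rebase and bind boil down to pointer arithmetic and table lookups. This toy model, with hypothetical function names and a std::map standing in for the exported-symbol lookup, shows both steps on a vector of pointer-sized slots:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rebase: slots that point inside the image were written assuming the image
// loads at its preferred address, so each one is shifted by the ASLR slide.
void toy_rebase(std::vector<std::uint64_t> &slots,
                const std::vector<std::size_t> &rebase_offsets,
                std::uint64_t slide) {
    for (std::size_t off : rebase_offsets) slots.at(off) += slide;
}

// Bind: slots that refer to symbols in other images are filled in from the
// result of symbol lookup. Returns false if a symbol cannot be found.
bool toy_bind(std::vector<std::uint64_t> &slots,
              const std::vector<std::pair<std::size_t, std::string>> &binds,
              const std::map<std::string, std::uint64_t> &exported) {
    for (const auto &b : binds) {
        auto it = exported.find(b.second);
        if (it == exported.end()) return false;  // missing symbol
        slots.at(b.first) = it->second;
    }
    return true;
}
```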

8.4.2: dyld3 process

dyld3 as a whole consists of three parts:

  • An out-of-process mach-o parser and compiler (shown in red):

    • Resolves all search paths, @rpaths, and environment variables.
    • Parses the mach-o binaries.
    • Performs all symbol lookups.
    • Creates a launch closure from these results.
    • It is an ordinary daemon, so it can be tested normally.
    • Most launches use a cached closure and never need to call the out-of-process mach-o parser or compiler.
    • Launch closures are simpler than mach-o: they are memory-mapped files that need no complex parsing and can be validated easily, which makes them fast.
  • An in-process engine, which:

    • Validates the launch closure.
    • Maps all dylibs using the closure.
    • Applies binds and rebases.
    • Runs all initializers and jumps to the main program's main().

    ⚠️ dyld3 does not need to parse mach-o headers or perform symbol lookups when the app launches, which greatly improves launch speed.

  • A launch closure cache service:

    • Launch closures for system apps are built into the shared cache.
    • Closures for third-party apps are built at install time and rebuilt when the software is updated.
    • On macOS, the in-process engine can call out to the daemon when needed; on other platforms this is not required.
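The closure cache service amounts to "reuse the closure if it is still valid, otherwise rebuild it". The sketch below is a hypothetical model: how dyld actually decides that a closure is stale is not covered in this article, so a single fingerprint string stands in for it, and toy_closure and toy_closure_cache are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

// A "closure" here is just a string describing the precomputed launch plan.
struct toy_closure {
    std::string fingerprint;
    std::string plan;
};

class toy_closure_cache {
public:
    int builds = 0;  // how many times the expensive out-of-process step ran

    // Reuse the cached closure when its fingerprint still matches; otherwise
    // rebuild, like "built at install time, rebuilt on software update".
    const toy_closure &get(const std::string &app, const std::string &fingerprint) {
        auto it = cache_.find(app);
        if (it != cache_.end() && it->second.fingerprint == fingerprint) return it->second;
        ++builds;
        toy_closure c{fingerprint, "plan for " + app + " @ " + fingerprint};
        return cache_[app] = c;
    }

private:
    std::map<std::string, toy_closure> cache_;
};
```

Repeated launches with the same fingerprint hit the cache; a changed fingerprint forces one rebuild.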

WWDC 2017 Session 413: App Startup Time: Past, Present, and Future

9: Summary

9.1: dyld call flow

9.2: Text summary of the core flow

  • dyld (the dynamic linker): loads all libraries and executables.
  • dyld loading process:
    • The kernel starts dyld at _dyld_start
      • Rebase dyld itself (rebaseDyld)
      • Initialize dyld (_subsystem_init)
      • Call dyld::_main:
        • Load the shared cache (mapSharedCache)
          • This actually calls loadDyldCache, which handles three cases:
            • Load the cache into the current process only (mapCachePrivate). It is not shared and is used only by this process.
            • The cache is already loaded: nothing needs to be done.
            • The cache is loaded for the first time (mapCacheSystemWide).
          • Load through dyld2 or dyld3 (ClosureMode). iOS 11 introduced dyld3's closure mode, which loads faster and more efficiently; from iOS 13 on, both dynamic libraries and third-party libraries are loaded in ClosureMode:
            • dyld3:
              • Find or create mainClosure
              • Launch the main program through launchWithClosure and, on success, return result (the main program's entry point, main). The logic is basically the same as dyld2's launch logic.
            • dyld2: launch the main program
              • Instantiate the main program with instantiateFromLoadedImage (this actually creates an image)
                • Call sniffLoadCommands to gather information such as compressed; compressed determines which class is used for instantiation.
                  • compressed is derived from LC_DYLD_INFO and LC_DYLD_INFO_ONLY.
                  • segCount is at most 256.
                  • libCount is at most 4096.
                • Add the instantiated image to all images.
              • Load the inserted dynamic libraries (loadInsertedDylib)
                • Build the configuration from the context and call load to load each dynamic library
              • Link the main program, then the inserted dynamic libraries (link; the main program is linked first)
                • Rebase for ASLR, bind non-lazy symbols, bind weak symbols
              • Initialize the main program: initializeMainExecutable (the core method)
                • Initialize the images starting at index 1, then the main program (index 0), by calling dyld ImageLoader::runInitializers
                  • dyld ImageLoader::processInitializers:
                    • dyld ImageLoader::recursiveInitialization:
                      • dyld dyld::notifySingle:
                        • This function performs a callback
                        • Breakpoint debugging shows that this callback is load_images, the second argument passed in when _objc_init registers the callbacks
                          • load_images calls call_load_methods
                            • call_class_loads: calls each class's +load method.
                            • call_category_loads: calls each category's +load method.
                      • doInitialization
                        • Eventually calls doModInitFunctions
                          • Internally calls the global C++ constructors (C functions marked __attribute__((constructor)))
                          • libSystem_initializer is called first, which leads to the callbacks being registered
                            • Three callbacks are registered (map_images, load_images, unmap_image)
                              • map_images: loads the classes; it is called as soon as the callbacks are registered
                              • load_images: called from notifySingle, once per image
                              • unmap_image: called on errors, garbage collection, or when checking image files
              • Find the main program's entry point from LC_MAIN and return it (main)
  • Summary of the call order of +load, C++ constructors, and main:
    • dyld initializes images in Link Binary With Libraries order, starting at index 1 and initializing the main program (index 0) last; the calls can be thought of as grouped by image.
    • Within an image, all class +load methods run first, then all category +load methods, and finally the global C++ constructors (class +load -> category +load -> C++ constructors). +load is called by objc, and the global C++ constructors are called by dyld. (Ignoring optimizations such as binary reordering, the order within an image follows Compile Sources by default.)
    • main is the entry point that dyld returns.