1. Introduction to DYLD

DYLD was described in detail in Apple’s WWDC 2017 presentation. What follows is my summary of it.

1.1 Application Startup and Optimization

Before introducing DYLD, there are a few concepts you must understand.

  • Startup time: here, startup time means everything that happens before main executes. (An application still has plenty to do after that: it loads its NIB files, runs the UIApplicationDelegate code, and so on.)

  • Launch closure: all the information needed to launch your program, such as which dylibs it uses, the offsets of the symbols it uses in them, and where their code signatures are.

How do we optimize application startup? The principle is simple: the less code that has to be loaded and run before main, the faster the launch. Specifically:

  • Use fewer dylibs, and in particular fewer embedded dylibs; system libraries are cheaper from a launch-time point of view.

  • Declare fewer classes and methods.

  • Run fewer initializers.
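
One common way to run fewer initializers is to move the work to the point of first use. The following is a minimal sketch in C under that assumption; build_squares, square_lookup, and square_build_count are hypothetical names invented for illustration, and the plain flag is not thread-safe (real code would typically use pthread_once or dispatch_once).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical expensive setup that might otherwise live in an
   __attribute__((constructor)) function and run before main. */
static int g_build_count = 0;   /* how many times the setup has run */
static int g_squares[256];

static void build_squares(void) {
    for (int i = 0; i < 256; i++) {
        g_squares[i] = i * i;
    }
    g_build_count++;
}

/* The setup runs on first use, so it costs nothing at launch.
   Assumption: single-threaded callers; the flag is not thread-safe. */
int square_lookup(int index) {
    static bool initialized = false;
    assert(index >= 0 && index < 256);
    if (!initialized) {
        build_squares();
        initialized = true;
    }
    return g_squares[index];
}

/* Lets a caller observe whether the setup has happened yet. */
int square_build_count(void) {
    return g_build_count;
}
```

Nothing is built until square_lookup is called for the first time, and later calls reuse the table.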

Apple also recommends using more Swift, because Swift is designed to avoid many of the pitfalls you can run into with C, C++, and Objective-C:

  • Swift has no static initializers of the kind that run before main.

  • Swift does not allow certain kinds of misaligned data structures that cost time at startup.

  • Swift code is leaner and therefore performs better.

How do we verify that the initialization function really has an effect on the speed of application startup? First let’s create an iOS project and set its launch page to look like this:

Write the following code in the main.m file:

#import <UIKit/UIKit.h>
#import "AppDelegate.h"

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    @autoreleasepool {
        appDelegateClassName = NSStringFromClass([AppDelegate class]);
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

// An initializer that simulates a network request (a time-consuming operation).
// Note that this function is not called from anywhere in the program.
__attribute__((constructor)) void waitForNetworkDebugger(void) {
    NSLog(@"Initiate network request!");
    for (int i = 0; i < 100000; i++) {
        NSLog(@"%d", i);
    }
    NSLog(@"Network request completed!");
}

Set the following two breakpoints:

When we build and run the program, execution stops in the waitForNetworkDebugger initializer first, which shows that initializers run before main. If we continue, the launch screen stays up for a long time. If we did not know this and wrote many initializers that perform time-consuming work, the application would start very slowly and the user experience would suffer.

To help shorten startup time, Apple added static initializer tracing to Instruments in the iOS 11 toolchain. Instruments can now report the exact time spent in each static initializer. To use it:

  • Open Instruments.

  • Start from the blank template.

  • Add the static initializer instrument.

  • Add a Time Profiler, which makes it easier to see what is running.

  • Connect a physical device and click the red “Run” button, as shown in the picture below:

  • Look at the time taken by waitForNetworkDebugger.

As a result, we can quickly find which initializers are slowing down startup. The trace covers multiple dylibs, including system dylibs, because system libraries can take a long time depending on the input you give them (for example, complex NIB files). The feature depends on new infrastructure in the High Sierra and iOS 11 kernels and in dyld, so you need to be running those releases to see this information; it can now capture most initializers.

1.2 History of DYLD

1.2.1 DYLD1

DYLD1 (1996-2004) shipped as part of NeXTStep 3.3. Before that, NeXT used static binaries. DYLD1 predates the standardized POSIX dlopen call: dlopen did exist on some Unix systems at the time, but only as a proprietary extension. NeXTStep had its own, different proprietary extensions, so developers wrote third-party wrapper functions on early versions of Mac OS X to support standard Unix software. The problem was that these wrappers did not support quite the same semantics, so some edge cases did not work properly, and they were slow. DYLD1 was also written before most systems used large C++ dynamic libraries. C++ has a number of features, such as its initializer ordering, that work well in a static environment but are hard to implement efficiently in a dynamic one, so large C++ code bases made the dynamic linker do a lot of work, and it was slow.

To improve performance, Apple engineers added prebinding, a technique that finds fixed addresses for every dylib in the system and for your application. The dynamic loader tries to load everything at those addresses; if it succeeds, it edits all of those binaries so that they contain the precalculated addresses, and the next time everything is loaded at the same addresses no additional work is needed. This sped up launch considerably, but it also meant that your binaries were being edited on every launch, which is undesirable, not least from a security standpoint.

1.2.2 DYLD2.0 and DYLD2.x

DYLD 2.0 (2004-2007, macOS Tiger component) : a complete rewrite of DYLD1, with the following additions:

  • Correct support for C++ initializer semantics. Apple engineers extended the Mach-O format and updated DYLD to support C++ libraries efficiently.

  • Complete native dlopen and dlsym implementations with correct semantics, after which the legacy APIs were deprecated.

DYLD2 was designed for speed, so it did only limited sanity checking and had some security issues, which Apple engineers later went back and fixed to make it safer on current platforms. Because it was so much faster, the amount of prebinding could be reduced: instead of editing your application’s binaries, only the system libraries were edited, and only at software-update time. If you have ever seen the text “optimizing system performance” during a software update, that was the prebinding being updated; today that step is used for all DYLD optimizations. After DYLD2.0 shipped, many further improvements were made and performance improved significantly. DYLD2.x (2007-2017) extended it as follows:

1. Support more architectures and platforms

  • Added many architectures and platforms (x86, x86_64, arm, arm64, and a number of variants of them).

  • iOS, tvOS, and watchOS were launched, all of which required new DYLD functionality.

2. Enhance security in various ways

  • Added code signing and ASLR (address space layout randomization), which means that each time a library is loaded it may be at a different address (see the WWDC 2016 video for details).

  • Added extensive bounds checking of the fields in the Mach-O header, to protect against maliciously malformed binaries.

3. Improve performance, which allowed prebinding to be removed

  • The shared cache, first introduced in iOS 3.1 and macOS Snow Leopard, completely replaced prebinding. It is a single file that contains most of the system dylibs, and because they are merged into one file they can be optimized together. On macOS the shared cache is generated locally. It greatly improves system performance, among other benefits.

  • It rearranges all text segments and all data segments and rewrites the entire symbol table to reduce its size, so that fewer regions need to be mapped into each process.

  • It allows binary segments to be packed, which saves a lot of RAM; it is effectively a prelinker for dylibs.

  • It pre-generates data structures that DYLD and ObjC use at run time, which saves more RAM and time because the work no longer has to be done at program startup.

1.2.3 DYLD3

DYLD3 is a new dynamic linker, announced in 2017, that completely rethinks how dynamic linking is done. It was enabled by default for most macOS system applications first, and for all system applications on Apple’s 2017 OS platforms; Apple’s stated plan was for it to completely replace DYLD2 on future Apple OS platforms, including for third-party applications.

Why a new dynamic linker?

  • Performance: Apple wants launch to be as fast as it possibly can be.

  • Security: some security features were retrofitted onto DYLD2, but it is hard to keep strengthening security that way. Could security checks be more aggressive, and could security be built in by design?

  • Testability and reliability: could DYLD be made easier to test? Apple ships many good testing frameworks, such as XCTest, but they depend on low-level features of the dynamic linker to insert their libraries into a process, so they cannot be used to test the existing DYLD code. This made it hard for Apple engineers to test security and performance features.

To solve these problems, Apple’s programmers made the following changes to DYLD:

  • Moved most of DYLD out of the process. That part is now just a normal daemon that can be tested with standard testing tools, which in turn makes it easier to keep improving speed and performance.

  • Kept a part of DYLD resident in the process, but made that part as small as possible to reduce the attack surface of applications.

  • Less code runs at launch, which also makes startup faster.

1.2.4 DYLD2 and DYLD3 Start procedure flow

1.2.4.1 DYLD2 Start procedure flow

  1. Parse the Mach-O headers to find out which libraries your application needs.

  2. Find dependencies recursively, parsing those libraries in turn until there is a complete picture of all dylibs. A typical iOS program needs 300 to 600 dylibs, so this is a lot of data and a lot of processing.

  3. Map all the Mach-O files into the address space.

  4. Perform symbol lookups. If your program uses printf, for example, DYLD checks that printf is in libSystem, finds its address, and copies it into a function pointer in your program.

  5. Bind and rebase: copy in those pointers and, because the image is loaded at a random address, adjust every internal pointer by the base address.

  6. Run all initializers.

  7. Finally, jump to the program’s main function.
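
The difference between rebasing and binding in steps 4 and 5 can be illustrated with a toy model. This is only a sketch: the "image" here is a plain array of pointer-sized slots, and ToySymbol, toy_rebase, and toy_bind are hypothetical names, not the Mach-O format or any DYLD API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy export table entry: a name and the address where it is defined. */
typedef struct {
    const char *name;      /* symbol name, e.g. "printf" */
    uintptr_t   address;   /* address in the exporting library */
} ToySymbol;

/* Rebase: slots that point inside the image itself were written assuming
   the preferred base, so each one is shifted by the slide. */
void toy_rebase(uintptr_t *slots, const size_t *rebase_indexes, size_t count,
                uintptr_t preferred_base, uintptr_t actual_base) {
    uintptr_t slide = actual_base - preferred_base;
    for (size_t i = 0; i < count; i++) {
        slots[rebase_indexes[i]] += slide;
    }
}

/* Bind: look a symbol up by name in another library's export table and
   store its address in the slot. Returns 0 on success, -1 if not found. */
int toy_bind(uintptr_t *slot, const char *wanted,
             const ToySymbol *exports, size_t export_count) {
    assert(slot != NULL && wanted != NULL);
    for (size_t i = 0; i < export_count; i++) {
        if (strcmp(exports[i].name, wanted) == 0) {
            *slot = exports[i].address;
            return 0;
        }
    }
    return -1;
}
```

Rebasing needs only the slide, while binding needs a by-name search, which is why the symbol lookups are the expensive part of the flow.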

The process is shown in the figure below:

1.2.4.2 DYLD3 Start procedure flow

So how does DYLD3 speed things up, and which steps can be moved out of the process? What is the reasoning?

  1. Identify the security-sensitive components. From an Apple engineer’s point of view, the biggest security risks are parsing Mach-O headers and finding dependencies: malformed Mach-O headers can be used for attacks, and your program may use @rpaths, which are search paths that can be subverted by tampering with them or by inserting libraries in the right places. All of this work is therefore done out of process, in a daemon. As shown in the figure below:

  2. Identify the expensive parts that are cacheable, which are the symbol lookups. Within a given library, a symbol is always at the same offset unless a software update is performed or the library is changed on disk.

Now that these parts have been identified, let’s look at how they are handled in DYLD3.

These steps are moved to the front of the flow, and their result, the launch closure, is written to disk. The launch closure is everything needed to launch the program, as shown in the figure below:

The closure can then be used in process later. DYLD3 consists of three parts:

  1. An out-of-process Mach-O parser and compiler.

  2. An in-process engine that runs launch closures.

  3. A launch closure caching service.

Most program launches use the cache and never need to invoke the out-of-process Mach-O parser or compiler. Launch closures are also much simpler than Mach-O: they are memory-mapped files that need no complicated parsing, are easy to validate, and are built for speed. Let’s look at each part in detail.

The out-of-process Mach-O parser does the following:

  • Resolves all search paths, @rpaths, and environment variables that can affect your launch.

  • Parses the Mach-o binaries

  • Performs all symbol lookups.

  • Creates a launch closure with results

  • Is a normal daemon, so it can be tested with ordinary testing infrastructure.
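
To make the idea of "performing all symbol lookups and storing the results" concrete, here is a rough sketch of a closure builder. Every type and name in it (ToyExport, ToyLibrary, ToyClosure, toy_build_closure) is invented for illustration; the real closure format is not described in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types: not the real closure format. */
typedef struct {
    const char *name;     /* exported symbol name */
    uint32_t    offset;   /* offset of the symbol from the start of its library */
} ToyExport;

typedef struct {
    const char      *path;
    const ToyExport *exports;
    size_t           export_count;
} ToyLibrary;

typedef struct {
    uint32_t library_index;   /* which library defines the symbol */
    uint32_t offset;          /* where in that library it lives */
} ToyResolved;

#define TOY_MAX_SYMBOLS 8

typedef struct {
    size_t      count;
    ToyResolved symbols[TOY_MAX_SYMBOLS];
} ToyClosure;

/* Does the expensive by-name lookups once and records the answers.
   Returns 0 on success, -1 if a symbol is missing or there are too many. */
int toy_build_closure(const ToyLibrary *libs, size_t lib_count,
                      const char *const *wanted, size_t wanted_count,
                      ToyClosure *out) {
    assert(libs != NULL && wanted != NULL && out != NULL);
    if (wanted_count > TOY_MAX_SYMBOLS) return -1;
    out->count = 0;
    for (size_t w = 0; w < wanted_count; w++) {
        int found = 0;
        for (size_t l = 0; l < lib_count && !found; l++) {
            for (size_t e = 0; e < libs[l].export_count; e++) {
                if (strcmp(libs[l].exports[e].name, wanted[w]) == 0) {
                    out->symbols[out->count].library_index = (uint32_t)l;
                    out->symbols[out->count].offset = libs[l].exports[e].offset;
                    out->count++;
                    found = 1;
                    break;
                }
            }
        }
        if (!found) return -1;
    }
    return 0;
}
```

Because a symbol stays at the same offset until its library changes on disk, the recorded (library, offset) pairs can be reused across launches.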

DYLD3 also has a small in-process engine. This is the part that resides in the process and is what you normally see. It does the following:

  • Validates the launch closure (checks that it is correct).

  • Maps in all dylibs.

  • Applies fixups.

  • Runs initializers (runs all initializers).

  • Jumps to the main function to start executing the program’s code.
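
The validate and fixup steps above can be sketched as follows. This is a toy model under stated assumptions: the ToyClosure layout, the magic value, and the checks performed are all made up for illustration, since the article does not say what the real validation consists of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLOSURE_MAGIC 0x746f7921u   /* arbitrary marker chosen for this sketch */
#define TOY_MAX_FIXUPS 4

typedef struct {
    uint32_t library_index;   /* library that defines the target */
    uint32_t offset;          /* precomputed offset inside that library */
} ToyFixup;

typedef struct {
    uint32_t magic;
    uint32_t library_count;
    uint32_t fixup_count;
    ToyFixup fixups[TOY_MAX_FIXUPS];
} ToyClosure;

/* Returns 1 if the closure is self-consistent, 0 otherwise. */
int toy_validate_closure(const ToyClosure *c) {
    assert(c != NULL);
    if (c->magic != TOY_CLOSURE_MAGIC) return 0;
    if (c->fixup_count > TOY_MAX_FIXUPS) return 0;
    for (uint32_t i = 0; i < c->fixup_count; i++) {
        if (c->fixups[i].library_index >= c->library_count) return 0;
    }
    return 1;
}

/* Applies fixups with no by-name lookup: each slot becomes
   (load address of the library) + (cached offset).
   Returns 0 on success, -1 if the closure is rejected. */
int toy_apply_fixups(const ToyClosure *c, const uintptr_t *library_bases,
                     uintptr_t *slots) {
    if (!toy_validate_closure(c)) return -1;
    for (uint32_t i = 0; i < c->fixup_count; i++) {
        slots[i] = library_bases[c->fixups[i].library_index] + c->fixups[i].offset;
    }
    return 0;
}
```

The in-process work reduces to a few range checks and additions, which is the point of moving parsing and lookups out of process.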

As a result, DYLD3 no longer needs to parse Mach-O headers or perform symbol lookups in order to launch your application. Since those are the time-consuming parts, your application starts much faster, as shown in the figure below:

The DYLD3 launch closure cache service works as follows:

  • System application closures are built directly into the shared cache. Apple already has a tool that runs over and analyzes every Mach-O file in the system, so the closures can simply be placed in the shared cache, which is mapped in together with all the dylibs at startup; no other file even needs to be opened.

  • For third-party applications, closures are built during application installation or system update, because by then the system libraries have changed. By default the closures are prebuilt on iOS, tvOS, and watchOS before the program is even run.

  • On macOS, the in-process engine can call out to a daemon if necessary on first launch; after that it can use the cached closure. This is not required on the other platforms.

Possible problems with DYLD3:

  • It is fully compatible with dyld 2.x, but some existing APIs will make your program run slower under dyld3 or force it into fallback mode.

2. Analyze the main process of dyld loading the application

First we create a project and set a breakpoint in the +load class method in the ViewController.m file, as shown below:

Compile and run the program to the breakpoint and view the function call stack information, as shown below:

Print the stack with the bt command as follows:

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 9.1
    frame #0: 0x0000000105509e8c DemoApp`+[ViewController load](self=ViewController, _cmd="load") at ViewController.m:18:1
    frame #1: 0x00007fff20181ff2 libobjc.A.dylib`load_images + 1439
    frame #2: 0x000000010551fe2c dyld_sim`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010552eba5 dyld_sim`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 437
    frame #4: 0x000000010552cec7 dyld_sim`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010552cf68 dyld_sim`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x000000010552026b dyld_sim`dyld::initializeMainExecutable() + 199
    frame #7: 0x0000000105524f56 dyld_sim`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 4789
    frame #8: 0x000000010551f1c2 dyld_sim`start_sim + 122
    frame #9: 0x00000001072cea88 dyld`dyld::useSimulatorDyld(int, macho_header const*, char const*, int, char const**, char const**, char const**, unsigned long*, unsigned long*) + 2093
    frame #10: 0x00000001072cc162 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 1198
    frame #11: 0x00000001072c6224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
  * frame #12: 0x00000001072c6025 dyld`_dyld_start + 37

From the stack we can see that the whole process starts at _dyld_start.

_dyld_start then calls the start function in the dyldbootstrap namespace.

The start function code is as follows:

//
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader);

    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // set up random value for stack canary
    __guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(argc, argv, envp, apple);
#endif

    _subsystem_init(apple);

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

At the end of the start function, it calls the _main function in the dyld namespace and returns its result. The _main function has 853 lines of code and is the core of dyld’s loading process.
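
The pointer arithmetic that start uses to locate envp and apple can be tried in isolation. The sketch below assumes the same layout the comments in start describe (argv entries, a NULL, the environment strings, a NULL, then the apple strings and a final NULL); find_envp and find_apple are hypothetical helper names, not dyld functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the vector handed over by the kernel:
   argv[0..argc-1], NULL, env strings..., NULL, apple strings..., NULL */

/* envp starts just past the NULL that terminates argv. */
const char **find_envp(int argc, const char **argv) {
    assert(argc >= 0 && argv != NULL);
    return &argv[argc + 1];
}

/* apple starts just past the NULL that terminates envp. */
const char **find_apple(const char **envp) {
    const char **apple = envp;
    while (*apple != NULL) {
        ++apple;
    }
    ++apple;   /* step over the terminating NULL */
    return apple;
}
```

With a vector such as { "app", "-v", NULL, "HOME=/tmp", "LANG=C", NULL, "executable_path=/bin/app", NULL } and argc equal to 2, find_envp returns the address of the fourth element and find_apple the address of the seventh; the strings themselves are illustrative values.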

So where do we start to analyze the code in this function?

We can analyze the code based on the loading process of dyld2 and dyld3:

  1. Check and prepare the environment.

  • Call the setContext function, and parse information such as the main binary’s image header.

  • Get a pointer to the path of the executable file.

  • Check the environment variables (whether dyld3 should be forced, whether the shared cache should be forced, whether a closure mode is set, and so on) and handle each case accordingly.

  2. Check whether the shared cache has been mapped; if not, map the shared cache first.

  3. Determine whether closure mode can be used (a new feature in dyld3).

  4. If closure mode can be used and a launch closure is available, call the launchWithClosure function to obtain the result value and return it.

  5. If closure mode is not used (that is, the dyld2 flow), first instantiate the image loader (of type ImageLoader) for the main binary and check that the versions of the main binary and dyld match, as shown in the figure below:

  6. Check DYLD_INSERT_LIBRARIES, then load the inserted dynamic libraries (instantiating an image loader for each), as shown in the figure below:

  7. Perform the link operation. This is a complex process that recursively loads all dependent dynamic libraries (the dependencies are sorted so that a depended-on library always comes first) and performs symbol binding, that is, the rebase and bind operations, as shown in the figure below:

  • Link the main program.

  • Link all inserted dynamic libraries.

  • Recursively bind the main program and the libraries it depends on.

  • Recursively bind the inserted dynamic libraries and their dependencies.

  8. Run the initializers. Objective-C +load methods and C++ constructor functions are executed at this stage, as follows:

  9. Read the LC_MAIN load command of the Mach-O file to get the program’s entry address and call the main function. The code looks like this:
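
The last step, finding the entry point through LC_MAIN, can be sketched as a walk over the load commands. This is a simplified illustration, not dyld's code: ToyLoadCommand and ToyEntryPointCommand are cut-down stand-ins for the structures in <mach-o/loader.h>, and toy_find_entry_offset is a hypothetical name. The value 0x80000028 is LC_MAIN (0x28 combined with LC_REQ_DYLD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_LC_MAIN 0x80000028u   /* LC_MAIN: 0x28 | LC_REQ_DYLD */

/* Every load command starts with its type and its total size in bytes. */
typedef struct {
    uint32_t cmd;
    uint32_t cmdsize;
} ToyLoadCommand;

/* Simplified stand-in for the LC_MAIN payload. */
typedef struct {
    uint32_t cmd;
    uint32_t cmdsize;
    uint64_t entryoff;    /* offset of main from the start of the image */
    uint64_t stacksize;
} ToyEntryPointCommand;

/* Walks ncmds load commands packed back to back in cmds[0..size).
   Returns 1 and stores entryoff if LC_MAIN is found, 0 otherwise. */
int toy_find_entry_offset(const uint8_t *cmds, size_t size, uint32_t ncmds,
                          uint64_t *entryoff_out) {
    assert(cmds != NULL && entryoff_out != NULL);
    size_t pos = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        ToyLoadCommand lc;
        if (size - pos < sizeof lc) return 0;            /* truncated header */
        memcpy(&lc, cmds + pos, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > size - pos) return 0;
        if (lc.cmd == TOY_LC_MAIN) {
            ToyEntryPointCommand ep;
            if (lc.cmdsize < sizeof ep) return 0;        /* malformed LC_MAIN */
            memcpy(&ep, cmds + pos, sizeof ep);
            *entryoff_out = ep.entryoff;
            return 1;
        }
        pos += lc.cmdsize;                               /* next command */
    }
    return 0;
}
```

The entry address is then the load address of the image plus entryoff. The size checks are included because, as noted earlier, malformed headers are a security concern.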

3. Analyze the initialization process

3.1 loadImages Method call process

We’ve seen the _objc_init function in the objc source code, but we don’t know how it’s called. Now let’s look at it. The _objc_init function code looks like this:

/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

Set a breakpoint in the _objc_init function, build and run the objc source project until it stops at the breakpoint, and view the stack information, as follows:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001002e7804 libobjc.A.dylib`_objc_init at objc-os.mm:925:9
    frame #1: 0x000000010046588f libdispatch.dylib`_os_object_init + 13
    frame #2: 0x0000000100476a03 libdispatch.dylib`libdispatch_init + 285
    frame #3: 0x00007fff2a6745ff libSystem.B.dylib`libSystem_initializer + 238
    frame #4: 0x00000001000316c7 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #5: 0x0000000100031ad2 dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
    frame #6: 0x000000010002c4b6 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 492
    frame #7: 0x000000010002c421 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 343
    frame #8: 0x000000010002a26f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #9: 0x000000010002a310 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #10: 0x000000010001686b dyld`dyld::initializeMainExecutable() + 129
    frame #11: 0x000000010001ceb2 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #12: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #13: 0x0000000100015025 dyld`_dyld_start + 37

Looking at this familiar call chain, we can see that it all starts from the initializeMainExecutable function in the dyld namespace, which is called by the _main function in the dyld namespace. This brings us back to step 8 of the flow explored in part 2. Let’s first look at this function in the dyld source code:

void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    // (inserted dylibs are initialized first, so their +load methods and constructors are called first)
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL )
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

This function calls runInitializers to run the initializers of the main program. The code looks like this:

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

Inside this function, we call processInitializers, which looks like this:

// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                      InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}

Inside this function, we call recursiveInitialization, and the code in this function looks like this:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);

    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }

            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);

            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);

            if ( hasInitializers ) {
                uint64_t t2 = mach_absolute_time();
                timingInfo.addTime(this->getShortName(), t2-t1);
            }
        }
        catch (const char* msg) {
            // this image is not initialized
            fState = oldState;
            recursiveSpinUnLock();
            throw;
        }
    }

    recursiveSpinUnLock();
}
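
The core of recursiveInitialization, initializing dependencies before the image itself and marking the state early to break cycles, can be reduced to a small model. This is a sketch only: ToyImage, ToyState, and toy_recursive_init are hypothetical, and the model leaves out locking, upward dependencies, the depth check, and error handling.

```c
#include <assert.h>
#include <stddef.h>

enum ToyState { TOY_BOUND = 0, TOY_INITIALIZING = 1, TOY_INITIALIZED = 2 };

#define TOY_MAX_DEPS 4

typedef struct ToyImage {
    const char      *name;
    enum ToyState    state;
    size_t           dep_count;
    struct ToyImage *deps[TOY_MAX_DEPS];
} ToyImage;

/* Initializes the dependencies of an image before the image itself and
   appends each image to order[] at the moment it is initialized.
   The caller must provide room in order[] for every reachable image.
   Returns the updated number of entries in order[]. */
size_t toy_recursive_init(ToyImage *image, ToyImage **order, size_t count) {
    assert(image != NULL && order != NULL);
    if (image->state != TOY_BOUND) {
        return count;                       /* already done or in progress */
    }
    image->state = TOY_INITIALIZING;        /* set first, to break cycles */
    for (size_t i = 0; i < image->dep_count; i++) {
        count = toy_recursive_init(image->deps[i], order, count);
    }
    /* The real code notifies ObjC and runs the image's initializers here. */
    order[count++] = image;
    image->state = TOY_INITIALIZED;
    return count;
}
```

For an app that depends on libA and libSystem, where libA also depends on libSystem, the resulting order is libSystem, libA, app.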

The first thing we can see is that recursiveInitialization is called recursively: it initializes the libraries that the image depends on before the image itself. It then calls notifySingle through the context to send a notification, so we need to know what notifySingle does. Searching the source for notifySingle, we find that it is assigned in the setContext function, as shown in the figure below:

notifySingle is stored there as a function pointer; the function it points to looks like this:

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress = image->machHeader();
        info.imageFilePath    = image->getRealPath();
        info.imageFileModDate = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
            const char* result = (*it)(state, 1, &info);
            if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
                //fprintf(stderr, "  image rejected by handler=%p\n", *it);
                // make copy of thrown string so that later catch clauses can free it
                const char* str = strdup(result);
                throw str;
            }
        }
    }
    if ( state == dyld_image_state_mapped ) {
        // <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
        // <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
        if (!image->inSharedCache()
            || (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
            dyld_uuid_info info;
            if ( image->getUUID(info.imageUUID) ) {
                info.imageLoadAddress = image->machHeader();
                addNonSharedCacheImageUUID(info);
            }
        }
    }
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    // mach message csdlc about dynamically unloaded images
    if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
        notifyKernel(*image, false);
        const struct mach_header* loadAddress[] = { image->machHeader() };
        const char* loadPath[] = { image->getPath() };
        notifyMonitoringDyld(true, 1, loadAddress, loadPath);
    }
}

How do we approach this function, and what is all this code for? Let’s first look at which functions it calls, as shown in the figure below:

Searching the source for these functions, we find that sNotifyObjCInit is assigned in the registerObjCNotifiers function, as follows:

// _dyld_objc_notify_init
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;

    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }

    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
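
The registration step can be pictured with a small, self-contained sketch. It is a simplified model and not dyld's code: `Image`, `Registry`, `registerNotifier`, `recordInit` and `makeSampleRegistry` are hypothetical names, and the 'mapped' and 'unmapped' callbacks are left out. It only shows the two behaviours discussed here: the init callback is stored in a variable (the role of sNotifyObjCInit), and it is replayed at once for images that were already initialized before registration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for dyld's types (assumptions for illustration).
struct Image {
    std::string path;
    bool initialized;   // models getState() == dyld_image_state_initialized
    bool hasObjC;       // models image->notifyObjC()
};

using InitCallback = void (*)(const char* path);

struct Registry {
    std::vector<Image> images;
    InitCallback notifyInit = nullptr;   // plays the role of sNotifyObjCInit

    // Store the callback, then replay it for images initialized before registration.
    void registerNotifier(InitCallback init) {
        assert(init != nullptr);
        notifyInit = init;
        for (const Image& image : images) {
            if (image.initialized && image.hasObjC) {
                notifyInit(image.path.c_str());
            }
        }
    }
};

// Records which images the callback was invoked for.
static std::vector<std::string> g_notified;

static void recordInit(const char* path) {
    g_notified.push_back(path);
}

// Builds a registry with one initialized ObjC image and two that must be skipped.
inline Registry makeSampleRegistry() {
    Registry r;
    r.images = {
        {"libA.dylib", true,  true},    // initialized, has ObjC: replayed
        {"libB.dylib", true,  false},   // no ObjC content: skipped
        {"libC.dylib", false, true},    // not initialized yet: skipped
    };
    return r;
}
```

Registering `recordInit` on the sample registry invokes it only for `libA.dylib`; the other two images would be reported later, when they reach the initialized state.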

registerObjCNotifiers is called in _dyld_objc_notify_register, as follows:

// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

_dyld_objc_notify_register is a familiar function, because we have seen it called in the _objc_init function in the objc source code. The call through sNotifyObjCInit in notifySingle is therefore actually a call to load_images in the objc source code, and what load_images does is call the +load methods of the Objective-C classes and categories in the image, as follows:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

The code in the call_load_methods function is as follows:

void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}
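
The ordering enforced by this loop can be tried out in a small model. The sketch below is not the objc runtime: `PendingLoad`, `LoadQueues`, `runDetached` and `runSample` are hypothetical names, and a +load method is reduced to a name plus an optional action that may enqueue more work (as loading another image from inside +load would). It keeps the same shape as call_load_methods: drain all class loads, run the category loads once, and repeat while anything new has appeared.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct LoadQueues;

// Hypothetical model of one pending +load call.
struct PendingLoad {
    std::string name;
    std::function<void(LoadQueues&)> action;   // may be empty
};

struct LoadQueues {
    std::vector<PendingLoad> classes;      // models loadable_classes
    std::vector<PendingLoad> categories;   // models loadable_categories
    std::vector<std::string> log;          // order in which +load methods ran

    // Detach the current list and run it; entries added meanwhile land in the
    // fresh list. Returns true if new entries were added during the run.
    bool runDetached(std::vector<PendingLoad>& queue) {
        std::vector<PendingLoad> detached;
        detached.swap(queue);
        for (PendingLoad& pending : detached) {
            log.push_back(pending.name);
            if (pending.action) pending.action(*this);
        }
        return !queue.empty();
    }

    // Same shape as call_load_methods: classes first, categories once, repeat.
    void callLoadMethods() {
        bool moreCategories;
        do {
            while (!classes.empty()) runDetached(classes);
            moreCategories = runDetached(categories);
        } while (!classes.empty() || moreCategories);
    }
};

// Sample run: a category +load enqueues another class, which forces a second pass.
inline std::vector<std::string> runSample() {
    LoadQueues q;
    q.classes.push_back({"+[Person load]", nullptr});
    q.categories.push_back({"+[Person(Extra) load]", [](LoadQueues& queues) {
        queues.classes.push_back({"+[LateClass load]", nullptr});
    }});
    q.categories.push_back({"+[Person(Other) load]", nullptr});
    q.callLoadMethods();
    return q.log;
}
```

In the sample, the class added from inside a category +load is picked up by the outer do/while loop on its next pass, after the remaining category in the detached list has run.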

Here the call_class_loads function calls the +load method of every loadable class, and the call_category_loads function calls the +load method of every loadable category. The code is as follows:

typedef void(*load_method_t)(id, SEL);

static void call_class_loads(void)
{
    int i;

    // Detach current loadable list.
    struct loadable_class *classes = loadable_classes;
    int used = loadable_classes_used;
    loadable_classes = nil;
    loadable_classes_allocated = 0;
    loadable_classes_used = 0;

    // Call all +loads for the detached list.
    for (i = 0; i < used; i++) {
        Class cls = classes[i].cls;
        load_method_t load_method = (load_method_t)classes[i].method;
        if (!cls) continue;

        if (PrintLoading) {
            _objc_inform("LOAD: +[%s load]\n", cls->nameForLogging());
        }
        (*load_method)(cls, @selector(load));
    }

    // Destroy the detached list.
    if (classes) free(classes);
}

static bool call_category_loads(void)
{
    int i, shift;
    bool new_categories_added = NO;

    // Detach current loadable list.
    struct loadable_category *cats = loadable_categories;
    int used = loadable_categories_used;
    int allocated = loadable_categories_allocated;
    loadable_categories = nil;
    loadable_categories_allocated = 0;
    loadable_categories_used = 0;

    // Call all +loads for the detached list.
    for (i = 0; i < used; i++) {
        Category cat = cats[i].cat;
        load_method_t load_method = (load_method_t)cats[i].method;
        Class cls;
        if (!cat) continue;

        cls = _category_getClass(cat);
        if (cls && cls->isLoadable()) {
            if (PrintLoading) {
                _objc_inform("LOAD: +[%s(%s) load]\n",
                             cls->nameForLogging(),
                             _category_getName(cat));
            }
            (*load_method)(cls, @selector(load));
            cats[i].cat = nil;
        }
    }
    ...
}

This is how load_images in the objc source code comes to be called. The _objc_init function itself is called by the _os_object_init function in the libdispatch library (as you can see from the call stack of _objc_init). The code looks like this:

void
_os_object_init(void)
{
	_objc_init();
	Block_callbacks_RR callbacks = {
		sizeof(Block_callbacks_RR),
		(void (*)(const void *))&objc_retain,
		(void (*)(const void *))&objc_release,
		(void (*)(const void *))&_os_objc_destructInstance
	};
	_Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
	const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
	v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
	v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}

The _os_object_init function is called by libdispatch_init in the libdispatch library:

void
libdispatch_init(void)
{
    dispatch_assert(sizeof(struct dispatch_apply_s) <=
            DISPATCH_CONTINUATION_SIZE);
    ...
    _os_object_init();
    _voucher_init();
    _dispatch_introspection_init();
}

The libdispatch_init function is called by the initializer _libdispatch_init function, as shown below:

DISPATCH_NOTHROW __attribute__((constructor))
void
_libdispatch_init(void);

DISPATCH_NOTHROW
void
_libdispatch_init(void)
{
	libdispatch_init();
}
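
For readers who have not met `__attribute__((constructor))` before, here is a minimal sketch of what the attribute does. It assumes a GCC- or Clang-compatible compiler (the attribute is a compiler extension, not standard C++), and `earlyInit`, `g_constructorRuns` and `constructorRuns` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Constant-initialized before any code runs, so the constructor can safely touch it.
static int g_constructorRuns = 0;

// GCC/Clang extension: the function is recorded in the image's initializer list
// and is run by the loader before main(), even though nothing calls it by name.
__attribute__((constructor))
static void earlyInit(void) {
    ++g_constructorRuns;
    std::puts("earlyInit ran before main");
}

// Returns how many times the constructor function has run so far.
inline int constructorRuns() {
    return g_constructorRuns;
}
```

By the time any code in main runs, `constructorRuns()` already reports one run. This is the same mechanism that lets _libdispatch_init run without an explicit caller.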

libdispatch_init is in turn called from libSystem_initializer, the initializer function of the libSystem library (as you can see from the call stack of _objc_init). Part of the code looks like this:

extern void libdispatch_init(void);    // from libdispatch.dylib

__attribute__((constructor))
static void
libSystem_initializer(int argc,
                      const char* argv[],
                      const char* envp[],
                      const char* apple[],
                      const struct ProgramVars* vars)
{
    static const struct _libkernel_functions libkernel_funcs = {
        .version = 4,
#if !TARGET_OS_DRIVERKIT
        .dlsym = dlsym,
#endif
        .malloc = malloc,
        .free = free,
        .realloc = realloc,
        ._pthread_exit_if_canceled = _pthread_exit_if_canceled,
        // V2 functions (removed)
        // V3 functions
        .pthread_clear_qos_tsd = _pthread_clear_qos_tsd,
        // V4 functions
        .pthread_current_stack_contains_np = pthread_current_stack_contains_np,
    };
    ...
    libdispatch_init();
    _libSystem_ktrace_init_func(LIBDISPATCH);
    ...
}

The libSystem_initializer function is called by doModInitFunctions in the dyld library. The code for doModInitFunctions is as follows:

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
                const struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
                const struct macho_section* const sectionsStart = (struct macho_section*)((char*)seg + sizeof(struct macho_segment_command));
                const struct macho_section* const sectionsEnd = &sectionsStart[seg->nsects];
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        Initializer* inits = (Initializer*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uintptr_t);
                        // <rdar://problem/23929217> Ensure __mod_init_func section is within segment
                        if ( (sect->addr < seg->vmaddr) || (sect->addr+sect->size > seg->vmaddr+seg->vmsize) || (sect->addr+sect->size < sect->addr) )
                            dyld::throwf("__mod_init_funcs section has malformed address range for %s\n", this->getPath());
                        for (size_t j=0; j < count; ++j) {
                            Initializer func = inits[j];
                            // <rdar://problem/8543820&9228031> verify initializers are in the image
                            if ( !this->containsAddress(stripPointer((void*)func)) ) {
                                dyld::throwf("initializer function %p not in mapped image for %s\n", func, this->getPath());
                            }
                            if ( !dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                // The libSystem library initializer must be the first to be called
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            if ( context.verboseInit )
                                dyld::log("dyld: calling initializer function %p in %s\n", func, this->getPath());
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                    else if ( type == S_INIT_FUNC_OFFSETS ) {
                        const uint32_t* inits = (uint32_t*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uint32_t);
                        // Ensure section is within segment
                        if ( (sect->addr < seg->vmaddr) || (sect->addr+sect->size > seg->vmaddr+seg->vmsize) || (sect->addr+sect->size < sect->addr) )
                            dyld::throwf("__init_offsets section has malformed address range for %s\n", this->getPath());
                        if ( seg->initprot & VM_PROT_WRITE )
                            dyld::throwf("__init_offsets section is not in read-only segment %s\n", this->getPath());
                        for (size_t j=0; j < count; ++j) {
                            uint32_t funcOffset = inits[j];
                            // verify initializers are in image
                            if ( !this->containsAddress((uint8_t*)this->machHeader() + funcOffset) ) {
                                dyld::throwf("initializer function offset 0x%08X not in mapped image for %s\n", funcOffset, this->getPath());
                            }
                            if ( !dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            Initializer func = (Initializer)((uint8_t*)this->machHeader() + funcOffset);
                            if ( context.verboseInit )
                                dyld::log("dyld: calling initializer function %p in %s\n", func, this->getPath());
#if __has_feature(ptrauth_calls)
                            func = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)func, ptrauth_key_asia, 0);
#endif
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}

The code in this function is long, but its job is simple: it runs the constructors (initializer functions) of an image. Over the whole recursive initialization, this covers the main program and every library the main program depends on. During this process the lowest-level dependency, libSystem, must be initialized first; the libraries above it are then initialized in turn, and finally the main program's own constructors run. doModInitFunctions is called in the doInitialization function. The code looks like this:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}
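
The core of doModInitFunctions, stripped of the Mach-O parsing, can be sketched as follows. This is a simplified model under stated assumptions and not dyld's implementation: `ImageModel`, `ProcessState`, `runModInitFunctions`, `sampleInit` and `g_initCalls` are hypothetical names, the initializer section is modelled as a plain vector of function pointers, and the libSystemInitialized flag is flipped as soon as a libSystem initializer has run (dyld instead watches for gLibSystemHelpers becoming non-NULL).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Initializer = void (*)();

// Hypothetical model of one image: its install path plus the contents of its
// initializer-pointer section (what __mod_init_func holds in a Mach-O file).
struct ImageModel {
    std::string installPath;
    std::vector<Initializer> initializers;
};

struct ProcessState {
    bool libSystemInitialized = false;
};

// Runs every initializer of `image`, refusing to run anything before libSystem.
// `libSystemPath` is a plain parameter here; dyld derives it from the link context.
inline std::size_t runModInitFunctions(const ImageModel& image,
                                       ProcessState& state,
                                       const std::string& libSystemPath) {
    std::size_t called = 0;
    for (Initializer func : image.initializers) {
        if (func == nullptr)
            throw std::runtime_error("null initializer in " + image.installPath);
        if (!state.libSystemInitialized && image.installPath != libSystemPath)
            throw std::runtime_error("initializer in image (" + image.installPath +
                                     ") would run before libSystem");
        func();
        ++called;
        // Simplification: mark libSystem as ready once its initializer has run.
        if (image.installPath == libSystemPath)
            state.libSystemInitialized = true;
    }
    return called;
}

// Sample initializer that only counts how often it was called.
static int g_initCalls = 0;

static void sampleInit() {
    ++g_initCalls;
}
```

Running an application image's initializers against a fresh `ProcessState` throws, while running libSystem's first and the application's afterwards succeeds, which mirrors the "libSystem initializer must run first" check in the real function.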

doInitialization is called in recursiveInitialization, after the call to context.notifySingle, which closes the loop of function calls.

At this point you may have a question. If you write a constructor in the main project and also implement the +load method in a class, you will find that at run time the load method is called first, then the constructor, and finally the main function. The code and the result are as follows:

#import <Foundation/Foundation.h>

@interface Person : NSObject
@end

@implementation Person
+ (void)load {
    NSLog(@"Load");
}
@end

// Constructor in the main program; nothing calls it explicitly
__attribute__((constructor)) void initFun(void) {
    NSLog(@"constructor");
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSLog(@"execute main");
    }
    return 0;
}

The output:

As we saw earlier, for notifySingle to call sNotifyObjCInit, sNotifyObjCInit must first have been assigned by the objc library's initializer, _objc_init, and _objc_init only runs once dyld has called doInitialization for that library. Yet in dyld's code, an image's doInitialization is called after notifySingle. Shouldn’t the main program's constructor therefore be called before the load method? In practice the opposite is true. Why?

The answer is that initialization is recursive: the libraries the main project depends on are initialized before the main project itself. libSystem is the lowest-level library. libdispatch is initialized when libSystem's initializer runs, and the libdispatch initializer calls the objc initializer. Only after that does initialization of the main program begin. By the time notifySingle is called for the main program, the objc initializer has already run, so dyld's sNotifyObjCInit has been assigned and the Objective-C classes have been loaded (see the map_images process for details). All the +load methods in the project can therefore be called at that point. doInitialization then runs and calls the constructors of the main project.
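
The order described above can be reproduced with a toy model of the recursion. The sketch below is an illustration under assumptions, not dyld's code: `Node`, `Loader` and `simulateLaunch` are hypothetical names, the dependency chain is collapsed to "main program depends on libSystem", and +load and constructors are reduced to log entries. Each image goes through the same three steps: initialize dependencies, notify objc (if a callback has been registered), then run its own initializers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of an image in dyld's recursive initialization.
struct Node {
    std::string name;
    std::vector<Node*> dependencies;
    std::function<void()> initializer;   // models doInitialization (constructors)
    bool hasLoadMethods = false;
    bool initialized = false;
};

struct Loader {
    // Models sNotifyObjCInit: empty until the objc initializer registers it.
    std::function<void(const Node&)> notifyObjCInit;
    std::vector<std::string> log;

    void recursiveInitialization(Node& image) {
        if (image.initialized) return;
        image.initialized = true;
        // 1. Initialize lower-level libraries first.
        for (Node* dep : image.dependencies) recursiveInitialization(*dep);
        // 2. notifySingle step: runs +load only if objc has already registered.
        if (notifyObjCInit && image.hasLoadMethods) notifyObjCInit(image);
        // 3. doInitialization step: run this image's own constructors.
        if (image.initializer) image.initializer();
    }
};

// Simulates launching a program whose only dependency is libSystem.
inline std::vector<std::string> simulateLaunch() {
    Loader loader;

    Node libSystem;
    libSystem.name = "libSystem";
    // libSystem's initializer reaches _objc_init, which registers the callback.
    libSystem.initializer = [&loader] {
        loader.log.push_back("libSystem initializer (registers load_images)");
        loader.notifyObjCInit = [&loader](const Node& image) {
            loader.log.push_back("+load in " + image.name);
        };
    };

    Node app;
    app.name = "main program";
    app.dependencies = {&libSystem};
    app.hasLoadMethods = true;
    app.initializer = [&loader] {
        loader.log.push_back("constructor in main program");
    };

    loader.recursiveInitialization(app);
    loader.log.push_back("main()");
    return loader.log;
}
```

In this model the callback is still empty when libSystem itself passes step 2, but it is set by the time the main program reaches step 2, so the main program's +load entry is logged before its constructor entry.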

3.2 map_images Function Call Process

If you dig into the map_images function, you’ll see that it is used to load the Objective-C classes, but when is this function called?

First of all, we know that map_images is handed to dyld when objc calls _dyld_objc_notify_register, and that it is passed as a pointer (&map_images), which differs from how load_images is passed. Why? The reason is that loading the Objective-C classes is so important that the map_images function must be called synchronously. But how is it called once it has been passed to dyld? Looking at the _dyld_objc_notify_register source code again, it calls registerObjCNotifiers, which assigns the pointer to the map_images function to sNotifyObjCMapped, so we just need to see where this static variable is called. A global search for sNotifyObjCMapped turns up the following:

static void notifyBatchPartial(dyld_image_states state, bool orLater, dyld_image_state_change_handler onlyHandler, bool preflightOnly, bool onlyObjCMappedNotification)
{
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sBatchHandlers);
    if ( (handlers != NULL) || ((state == dyld_image_state_bound) && (sNotifyObjCMapped != NULL)) ) {
        ...
            // tell objc about new images
            if ( (onlyHandler == NULL) && ((state == dyld_image_state_bound) || (orLater && (dyld_image_state_bound > state))) && (sNotifyObjCMapped != NULL) ) {
                const char* paths[imageCount];
                const mach_header* mhs[imageCount];
                unsigned objcImageCount = 0;
                for (int i=0; i < imageCount; ++i) {
                    ImageLoader* image = findImageByMachHeader(infos[i].imageLoadAddress);
                    bool hasObjC = false;
                    if ( image != NULL ) {
                        if ( image->objCMappedNotified() )
                            continue;
                        hasObjC = image->notifyObjC();
                    }
#if SUPPORT_ACCELERATE_TABLES
                    else if ( sAllCacheImagesProxy != NULL ) {
                        const mach_header* mh;
                        const char* path;
                        unsigned index;
                        if ( sAllCacheImagesProxy->addressInCache(infos[i].imageLoadAddress, &mh, &path, &index) ) {
                            hasObjC = (mh->flags & MH_HAS_OBJC);
                        }
                    }
#endif
                    if ( hasObjC ) {
                        paths[objcImageCount] = infos[i].imageFilePath;
                        mhs[objcImageCount]   = infos[i].imageLoadAddress;
                        ++objcImageCount;
                        if ( image != NULL )
                            image->setObjCMappedNotified();
                    }
                }
                if ( objcImageCount != 0 ) {
                    dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_MAP, 0, 0, 0);
                    uint64_t t0 = mach_absolute_time();
                    (*sNotifyObjCMapped)(objcImageCount, paths, mhs);
                    uint64_t t1 = mach_absolute_time();
                    ImageLoader::fgTotalObjCSetupTime += (t1-t0);
                }
            }
        }
        ...
    }
}
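
The part of notifyBatchPartial that matters here is the batching: images that contain Objective-C and have not been reported yet are collected, and the 'mapped' callback is invoked once for the whole batch. A minimal sketch of that idea follows. It is a model, not dyld's code: `BoundImage`, `MappedCallback`, `notifyObjCMapped` and `recordBatch` are hypothetical names, and the callback receives only paths (the real one also receives the mach_header pointers).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an image as the batching code sees it.
struct BoundImage {
    std::string path;
    bool hasObjC;          // models image->notifyObjC()
    bool mappedNotified;   // models image->objCMappedNotified()
};

using MappedCallback = void (*)(std::size_t count, const char* const paths[]);

// Collects every ObjC image not yet reported and hands them over in ONE call.
// Returns the number of images reported in this batch.
inline std::size_t notifyObjCMapped(std::vector<BoundImage>& images,
                                    MappedCallback mapped) {
    if (mapped == nullptr) return 0;   // nothing registered yet
    std::vector<const char*> paths;
    for (BoundImage& image : images) {
        if (image.mappedNotified || !image.hasObjC) continue;
        paths.push_back(image.path.c_str());
        image.mappedNotified = true;   // never report the same image twice
    }
    if (!paths.empty()) mapped(paths.size(), paths.data());
    return paths.size();
}

// Sample callback that records how many batches arrived and how large the last was.
static std::size_t g_batches = 0;
static std::size_t g_lastCount = 0;

static void recordBatch(std::size_t count, const char* const[]) {
    ++g_batches;
    g_lastCount = count;
}
```

Calling `notifyObjCMapped` twice on the same image list reports the Objective-C images only the first time; the second call finds nothing new and does not invoke the callback.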

Then let’s search to see where notifyBatchPartial is called. The code looks like this:

When sNotifyObjCMapped is called, it executes the map_images function in objc to load the Objective-C classes, which we will discuss in more detail in the next article.

4. Mind map of the map_images and load_images call process