Application loading

Libraries: executable binary files that the system can load into memory. There are two kinds: static libraries and dynamic libraries (.so, .dll, .framework, …). Static libraries are linked into the binary in order at build time, so the same code may end up copied more than once. Dynamic libraries are loaded only when they are needed, and only once. The loaded copy is shared in memory, which reduces package size, and this is why Apple's system libraries are all dynamic.

The build process

Executable file

Build any project. After the build succeeds, choose Show in Finder on the product and then Show Package Contents; the file with the black icon is the Mach-O executable. To run it, drag the Mach-O file onto a terminal window. An iOS build needs authorization to open the simulator, while a macOS build runs directly.

Dyld Dynamic linker

Image: a library that has been mapped into memory is called an image.

Dyld Load process

Start the exploration in the objc source code, at the _objc_init function. Its header comment tells us two things: 1. it registers the image notifier with dyld; 2. it is called by libSystem before library initialization time.

/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    // (that is: defer initialization until an image that uses objc is found?)
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
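The first three lines of _objc_init are a run-once guard: however many times the function is called, the bootstrap body executes a single time. Below is a minimal sketch of that pattern in plain C. The names toy_runtime_init and bootstrap_runs are hypothetical and exist only for illustration; the counter stands in for the real calls such as environ_init() and tls_init().

```c
#include <stdbool.h>

/* Counts how many times the bootstrap body ran (illustration only). */
static int bootstrap_runs = 0;

/* Hypothetical stand-in for _objc_init: safe to call repeatedly. */
void toy_runtime_init(void)
{
    static bool initialized = false;
    if (initialized) return;   /* second and later calls are no-ops */
    initialized = true;

    bootstrap_runs++;          /* stands in for environ_init(), tls_init(), ... */
}

/* Returns how many times the bootstrap body actually executed. */
int toy_runtime_init_count(void)
{
    return bootstrap_runs;
}
```

Calling toy_runtime_init() several times leaves the count at 1. Note that a plain static flag like this is not thread-safe by itself; the sketch assumes a single caller during early startup.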

When we set a breakpoint on the main function of an iOS project, we find that the load method of ViewController has already run by the time main is reached: load methods are called before main. To see what runs before load, set a breakpoint inside the load method and read the call stack. A stack is last in, first out, so we analyze it from the bottom up. The bottom frames belong to dyld, so we download the latest dyld source from Apple's open source site. The library has many low-level dependencies and cannot be built and run for now, but that does not stop us from reading it. The frames, from the bottom up, are:

1. _dyld_start
2. dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*)
3. dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*)
4. dyld::useSimulatorDyld(int, macho_header const*, char const*, int, char const**, char const**, char const**, unsigned long*, unsigned long*)
5. dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*)
6. dyld::initializeMainExecutable()
7. ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&)
8. ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
9. dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*)
10. load_images

Now open the dyld source and follow frame 1 to frame 2. Searching for the dyldbootstrap namespace finds its start function, which corresponds to frame 2. The last line of start returns dyld::_main, which is frame 3. At this point the dyldbootstrap stage is finished and dyld's _main function is running. That function is about a thousand lines long.
It is hard to know where to begin, so start from what the function returns, result. result is assigned in only a few places; one of them carries the comment "find entry point for main executable" and reads the value from sMainExecutable. Searching for sMainExecutable shows what _main does with it, and these steps turn out to be the main line of the function:

1. Instantiate the ImageLoader for the main executable.
2. Load any inserted libraries.
3. Link the main executable.
4. Link any inserted libraries (this is done after the main executable has been linked).
5. Bind weak references (after all images are linked).
6. Run all initializers. This corresponds to frame 6.
   6.1 Run the initializers of the main executable and of everything it brings in. This corresponds to frame 7.
   6.2 Image loading and initialization. In ImageLoader::processInitializers, a comment explains that the initializers of upward-linked dylibs could otherwise run too soon: to handle dangling dylibs that are linked upward but not downward, all upward-linked dylibs have their initialization delayed until the recursion through the downward-linked dylibs is complete. The comment at line 602 says: call recursive init on all images in the image list, building a new list of uninitialized upward dependencies. A global search for recursiveInitialization(const leads to the recursive initialization method of an image. Its comments say that the image's dependent libraries are initialized first, and then objc is told that this image is about to be initialized. That is where notifySingle is called, which corresponds to frame 9.
   6.3 notifySingle notifies objc that the image is being initialized, through static _dyld_objc_notify_init sNotifyObjCInit. Searching for where that variable is assigned leads to a familiar name, _dyld_objc_notify_register, which is the call at the bottom of _objc_init. At this point we leave dyld and are back in objc, and the analysis has come full circle.
7. Notify any monitoring processes that this process is about to enter main.

_dyld_objc_notify_register(&map_images, load_images, unmap_image)

The first argument, map_images, handles class loading: protocols, properties, ro/rw setup, class initialization, lazy loading and so on. The second, load_images, gathers and calls the load methods. We need to find out where dyld stores the three arguments and when it calls them.

// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

Step into dyld::registerObjCNotifiers and look at the assignments in it

// registerObjCNotifiers
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;
 	// ...
}

Timing of the sNotifyObjCMapped call

A global search for sNotifyObjCMapped leads to the static void notifyBatchPartial function, which contains this call

				(*sNotifyObjCMapped)(objcImageCount, paths, mhs);

Searching next for callers of notifyBatchPartial leads back to registerObjCNotifiers, right after the assignments

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	// ... (catch handler and the rest of the function omitted)
}

That is, right after sNotifyObjCMapped is assigned, (*sNotifyObjCMapped) is called for all images mapped so far.
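The register-then-replay behaviour can be modelled in a few lines of plain C. This is only a toy sketch, not dyld's code: every toy_ name, the callback types, and the fixed-size image table are assumptions made for illustration. It stores the callbacks in statics, as registerObjCNotifiers does, and immediately calls the mapped callback for the images that were mapped before registration.

```c
#include <stddef.h>

#define MAX_IMAGES 8

/* Hypothetical callback types, loosely modelled on dyld's notify typedefs. */
typedef void (*toy_mapped_fn)(unsigned count, const char *const paths[]);
typedef void (*toy_init_fn)(const char *path);

static const char *g_images[MAX_IMAGES];  /* images mapped so far */
static unsigned    g_image_count = 0;

static toy_mapped_fn s_notify_mapped = NULL;
static toy_init_fn   s_notify_init   = NULL;

/* Map one more image; notify the observer if one is already registered. */
int toy_map_image(const char *path)
{
    if (g_image_count == MAX_IMAGES) return -1;
    g_images[g_image_count++] = path;
    if (s_notify_mapped) s_notify_mapped(1, &g_images[g_image_count - 1]);
    return 0;
}

/* Record the callbacks, then replay every image mapped so far. */
void toy_register_notifiers(toy_mapped_fn mapped, toy_init_fn init)
{
    s_notify_mapped = mapped;
    s_notify_init   = init;
    if (s_notify_mapped && g_image_count > 0)
        s_notify_mapped(g_image_count, g_images);
}

/* Called when a single image reaches its initialization point. */
void toy_notify_single(const char *path)
{
    if (s_notify_init) s_notify_init(path);
}

/* Example observer: it only counts what it is told about. */
unsigned seen_mapped = 0, seen_init = 0;
void on_mapped(unsigned count, const char *const paths[]) { (void)paths; seen_mapped += count; }
void on_init(const char *path) { (void)path; seen_init++; }

/* Demo: two images are mapped before registration and one after it. */
unsigned toy_demo_mapped_seen(void)
{
    toy_map_image("libSystem");
    toy_map_image("libobjc");
    toy_register_notifiers(on_mapped, on_init);  /* replays the first two */
    toy_map_image("App");                        /* notified directly */
    toy_notify_single("App");
    return seen_mapped;
}
```

In the demo the observer ends up having seen all three images even though it registered late, which is the reason for the replay step.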

Timing of the sNotifyObjCInit call

A global search for sNotifyObjCInit finds two call sites. One is in the registerObjCNotifiers function

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;
	// ...
	(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
}

That is, right after sNotifyObjCInit is assigned, (*sNotifyObjCInit) is called as well. The other call site is in static void notifySingle.

Why is load called before C++ functions?

From the above, (*sNotifyObjCInit) is called from notifySingle, and the call stack shows that notifySingle is called during the recursive initialization of images. Given that order, it would be natural to assume that C++ functions run, in order, before the load method. Is that really the case?

1. Write a C++ constructor function in the file that contains main

__attribute__((constructor)) void testFunc(void) {
    printf("Coming: %s \n",__func__);
}

2. Implement the +load method in the Person class

+ (void)load {
    printf("Coming: %s \n",__func__);
}

3. Add a C++ constructor function inside the objc source as well, then set a breakpoint on main. The observed call order is: C++ constructors in the objc image > load > C++ constructors in the current project.
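The observed order follows from the shape of the recursive initialization: dependencies are initialized first, then objc is notified for the image (which is when load methods run), and only then do the image's own constructors run. The following is a toy model of that recursion in plain C, not dyld's code; struct toy_image, the event names and the two-image graph are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPS 4

/* Hypothetical, heavily simplified image record. */
struct toy_image {
    const char       *name;
    struct toy_image *deps[MAX_DEPS];  /* NULL-terminated dependency list */
    int               initialized;
};

static char g_log[256];  /* records the order of events */

static void log_event(const char *what, const char *name)
{
    size_t used = strlen(g_log);
    snprintf(g_log + used, sizeof g_log - used, "%s(%s) ", what, name);
}

/* Dependencies first, then the objc notification, then the image's own constructors. */
void toy_recursive_init(struct toy_image *img)
{
    if (img->initialized) return;
    img->initialized = 1;            /* also guards against dependency cycles */

    for (int i = 0; i < MAX_DEPS && img->deps[i]; i++)
        toy_recursive_init(img->deps[i]);

    log_event("notify", img->name);  /* models notifySingle -> load_images */
    log_event("ctors", img->name);   /* models the image's C++ constructors */
}

/* Demo: an app image that depends on libobjc. */
const char *toy_demo_order(void)
{
    struct toy_image objc = { "libobjc", { NULL }, 0 };
    struct toy_image app  = { "App", { &objc, NULL }, 0 };
    g_log[0] = '\0';
    toy_recursive_init(&app);
    return g_log;
}
```

toy_demo_order() returns "notify(libobjc) ctors(libobjc) notify(App) ctors(App) ": the constructors of the dependency run before the app is notified, and the app's own constructors run last.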

Introducing libSystem

From the analysis above, dyld notifies objc about images through the notifier registered with _dyld_objc_notify_register, and that registration happens in _objc_init. So set a breakpoint on _objc_init in the objc source and read the call stack: _dyld_start -> … -> _os_object_init -> _objc_init. One step in the middle has not been analyzed yet, _os_object_init. It belongs to the libdispatch library, which we also download from Apple's open source site.

libdispatch/libSystem

Open the libdispatch source and search for _os_object_init. Internally it calls _objc_init (the objc function, not an init that belongs to dispatch), and _os_object_init is itself called from libdispatch_init. The frames in the call stack are: libSystem.B.dylib`libSystem_initializer -> libdispatch.dylib`libdispatch_init -> libdispatch.dylib`_os_object_init -> libobjc.A.dylib`_objc_init. This confirms once more that the four frames highlighted in the call stack involve the libdispatch and libSystem libraries.

Dyld Load process

Open dyld again and search for doModInitFunctions. A comment there says that the libSystem initializer must run first, so we can guess that this is where libSystem is initialized. doModInitFunctions is called from ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&), which in turn is called during the recursive image initialization described in 6.2 above. This closes the loop of the whole analysis.
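The "libSystem initializer must run first" rule can be pictured as a walk over a table of initializer function pointers that refuses to run an ordinary initializer before libSystem's. This is a sketch under assumptions, not the dyld implementation: the table layout, the toy_ names and the error convention are all hypothetical.

```c
#include <stddef.h>

typedef void (*toy_initializer)(void);

static int g_lib_system_initialized = 0;
int g_ran = 0;  /* number of initializers that have run */

void toy_libsystem_initializer(void) { g_lib_system_initialized = 1; g_ran++; }
void toy_other_initializer(void)     { g_ran++; }

/* Run each initializer in table order. Returns 0 on success, or -1 if an
   ordinary initializer is reached before libSystem's initializer has run. */
int toy_do_mod_init_functions(const toy_initializer *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!g_lib_system_initialized && table[i] != toy_libsystem_initializer)
            return -1;
        table[i]();
    }
    return 0;
}
```

With a table that lists toy_other_initializer first, the walk stops with -1 before running anything; with toy_libsystem_initializer first, both entries run.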

Supplement

image list

To view the images loaded by the current project, run the image list command in LLDB; it prints the local path of each library.
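The same information is available from inside the process. The sketch below uses _dyld_image_count and _dyld_get_image_name from <mach-o/dyld.h>, which exist only on Apple platforms, so the code is guarded and simply reports -1 elsewhere. The function name print_loaded_images and the limit parameter are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__APPLE__)
#include <mach-o/dyld.h>
#endif

/* Prints up to `limit` loaded image paths.
   Returns the number printed, or -1 on non-Apple platforms. */
int print_loaded_images(unsigned limit)
{
#if defined(__APPLE__)
    uint32_t total = _dyld_image_count();
    unsigned shown = 0;
    for (uint32_t i = 0; i < total && shown < limit; i++) {
        const char *name = _dyld_get_image_name(i);
        if (name) {
            printf("[%u] %s\n", (unsigned)i, name);
            shown++;
        }
    }
    return (int)shown;
#else
    (void)limit;
    return -1;
#endif
}
```

Calling print_loaded_images(5) early in main lists the first few images, which is a quick way to compare against the LLDB output.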

The difference between dyld 2 and dyld 3

Starting with iOS 13, dyld 3 replaces dyld 2 across iOS. Before going into detail, a figure from WWDC 2017 session 413 shows the difference between dyld 2 and dyld 3.

dyld2

According to the figure above and the dyld source code, the main workflow of dyld 2 is:

  • Initialize dyld. The main code is in dyldbootstrap::start, which then runs dyld::_main; dyld::_main is much longer and is the core of dyld's loading work.
  • Check and prepare the environment: get the binary path, check environment variables, parse the Mach-O header of the main binary, and so on.
  • Instantiate the image loader for the main binary and verify that the versions of the main binary and dyld match.
  • Check whether the shared cache is mapped; if not, map it first.
  • Check DYLD_INSERT_LIBRARIES and load the inserted dynamic libraries (instantiate an image loader for each).
  • Link. This is a complex step that recursively loads all dependent dynamic libraries (sorted so that dependencies always come first) and performs rebasing and symbol binding.
  • Run the initializers. Objective-C +load methods and C constructor functions run at this stage.
  • Read the LC_MAIN load command of the Mach-O file to get the program's entry address, then call main.

dyld3

dyld 3 was not new at WWDC19. It was introduced in iOS 11 back in 2017 to optimize the loading of system libraries. From iOS 13 it is also used to launch third-party apps, completely replacing dyld 2. Since the dyld 3 code is not open source, the improvements can currently only be learned from what Apple has disclosed officially. The most important feature of dyld 3 is that part of it runs out of process and its results are cached, so by the time the app is opened, much of the work has already been done.

dyld 3 contains three components:

  • An out-of-process Mach-O analyzer/compiler

In dyld 2's loading process, the "parse Mach-O headers" and "find dependencies" steps are a security risk (they can be attacked by modifying the Mach-O header or adding an illegal @rpath). The "perform symbol lookups" step costs a lot of CPU time, even though a symbol is always at the same offset in a library as long as the library file is unchanged. In dyld 3 the results of these steps are computed ahead of time and written to a file, forming a "launch closure".

  • An in-process engine that executes the launch closure

It verifies that the launch closure is correct, maps the dylibs, and jumps to main. It no longer has to parse the Mach-O headers or perform symbol lookups, which saves a lot of time.

  • A launch closure cache

Launch closures for system applications are built directly into the shared cache. For third-party applications, the closure is generated when the app is installed or updated, so that it is always ready before the app is opened. Overall, dyld 3 moves many time-consuming operations ahead of launch, which greatly improves startup time.
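The idea behind the closure can be illustrated with a tiny cache: because a symbol stays at the same offset as long as the library file is unchanged, the offset can be stored together with an identifier of the library build and reused until that identifier changes. This is a toy sketch, not dyld 3's format; struct toy_closure, the string identifier and the fake lookup are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "closure": one cached symbol offset for one library build. */
struct toy_closure {
    int      valid;
    char     lib_id[37];      /* identifies the build the offset was computed for */
    uint64_t symbol_offset;
};

static int g_slow_lookups = 0;

/* Stand-in for the expensive symbol-table walk; the value is an arbitrary fake. */
static uint64_t slow_symbol_lookup(const char *lib_id)
{
    g_slow_lookups++;
    return 0x1000 + (uint64_t)strlen(lib_id);
}

/* Reuse the cached offset while the library is unchanged, otherwise rebuild it. */
uint64_t toy_resolve(struct toy_closure *c, const char *lib_id)
{
    if (c->valid && strcmp(c->lib_id, lib_id) == 0)
        return c->symbol_offset;

    c->symbol_offset = slow_symbol_lookup(lib_id);
    strncpy(c->lib_id, lib_id, sizeof c->lib_id - 1);
    c->lib_id[sizeof c->lib_id - 1] = '\0';
    c->valid = 1;
    return c->symbol_offset;
}

/* Number of times the slow path was taken. */
int toy_slow_lookup_count(void) { return g_slow_lookups; }
```

Resolving twice against the same identifier takes the slow path once; changing the identifier invalidates the cached entry, which mirrors the "verify the closure, then use it" step.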