Preface

The author has put together a series of articles on the low-level internals of Objective-C (OC), in the hope that they will be helpful.

1. Alloc principle of OC object creation in iOS

2. Align the OC object memory of iOS

3. The underlying principle of iOS OC ISA

4. Structural analysis of OC source code of iOS

5. Source code analysis of OC method cache of iOS

6. Search principle of OC method of iOS

7. Resolution and message forwarding principle of OC method of iOS

An app depends on many low-level libraries while it loads. But what is a library? A library is a binary of executable code that the operating system can recognize and load into memory. Libraries come in two kinds: static libraries and dynamic libraries.

1. Static and dynamic libraries

Compiling a program goes through the following stages:

  • Preprocessing: mainly macro substitution. Imported headers are replaced with the contents of the header files, and preprocessor directives beginning with #, such as #define, #include and #import, are expanded.
  • Compilation: performs lexical, syntax and semantic analysis on the preprocessed files, optimizes the code, and generates assembly code.
  • Assembly: the assembler converts the assembly code into machine instructions and produces object files (.o files).
  • Linking: links the object files with the static and dynamic libraries they use to produce an executable.

Static libraries: during the linking stage, the object files generated by the assembler are linked and packaged into the executable together with the libraries they reference.

Dynamic libraries: the library's code is not copied into the executable when the program is built; it is loaded while the program is running.

Compared with static libraries, dynamic libraries have clear advantages: they reduce the size of the packaged app, their contents can be shared to save resources, and a program can be updated by updating the dynamic library. On iOS, the system libraries we use are generally dynamic libraries, such as UIKit, libdispatch and libobjc.A.dylib.

2. dyld

dyld stands for "the dynamic link editor", and it is an important part of Apple's operating systems. After an application has been compiled and packaged into an executable in the Mach-O format, dyld is responsible for linking and loading it. On iOS, therefore, app startup and loading are carried out by dyld, the dynamic linker. Apple has open-sourced this component, and the source can be downloaded from Apple if needed. This article is based on the dyld-635.2 source. For the rest of the discussion I created a demo project, added a +load method to ViewController, and set a breakpoint inside it. Debugging on a real device produces the call stack shown below.

The simulator gives a call stack that differs slightly. The walkthrough below follows the real-device flow; the same stack can be printed in lldb with the bt command.

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x0000000100942578 dyldDemo`+[ViewController load](self=ViewController, _cmd="load") at ViewController.m:19:5
    frame #1: 0x00000001b5b31e78 libobjc.A.dylib`load_images + 908
    frame #2: 0x00000001009a20d4 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 448
    frame #3: 0x00000001009b15b8 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 524
    frame #4: 0x00000001009b0334 dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 184
    frame #5: 0x00000001009b03fc dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 92
    frame #6: 0x00000001009a2420 dyld`dyld::initializeMainExecutable() + 216
    frame #7: 0x00000001009a6db4 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 4616
    frame #8: 0x00000001009a1208 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 396
    frame #9: 0x00000001009a1038 dyld`_dyld_start + 56
(lldb)

As the program's call stack shows, the entry point of dyld is _dyld_start.

2.1 _dyld_start

In lldb, use the up command or click the frame in the call stack to reach the assembly source of _dyld_start. There you can see that it calls a C++ function: the start function in the dyldbootstrap namespace. In the dyld source, search for the dyldbootstrap namespace first and then for start; using cmd + shift + j you can see that it is located in the file dyldInitialization.cpp.

//
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)
{
    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        rebaseDyld(dyldsMachHeader, slide);
    }

    // allow dyld to use mach messaging
    mach_init();

    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // set up random value for stack canary
    __guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

The appsMachHeader parameter is the header of the app's Mach-O. slide is the random ASLR offset: each time a Mach-O image is loaded into memory, a random offset is added to its addresses so that the memory layout varies between launches, which helps defend against buffer overflow attacks. The main work of start is as follows:

  • 1. Compute the ASLR offset (slide) from the Mach-O header loaded in memory.
  • 2. Rebase dyld itself with rebaseDyld, then initialize Mach messaging with mach_init().
  • 3. Set up stack overflow protection (the stack canary) with __guard_setup.
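
The pointer arithmetic that start uses to find envp and apple can be tried in isolation. The following is a minimal sketch, not dyld code: LaunchVectors, locateLaunchVectors and demoLaunchVectors are hypothetical names, and the strings in the fake launch area are made up. It assumes the layout the comments in start describe: argv, a NULL, the environment strings, a NULL, the apple strings, a NULL.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper type for illustration only.
struct LaunchVectors {
    const char** envp;
    const char** apple;
};

// Given argc/argv laid out as (argv..., NULL, envp..., NULL, apple..., NULL),
// recover envp and apple the same way dyldbootstrap::start does.
LaunchVectors locateLaunchVectors(int argc, const char* argv[]) {
    assert(argv[argc] == nullptr);          // argv must be NULL-terminated
    const char** envp = &argv[argc + 1];    // env starts just past argv's NULL
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // skip every environment string
    ++apple;                                // step over envp's NULL terminator
    return {envp, apple};
}

// Builds a fake launch area (all strings are invented) and checks the result.
bool demoLaunchVectors() {
    const char* area[] = {
        "dyldDemo", nullptr,                     // argv, argc == 1
        "HOME=/var/mobile", "LANG=en", nullptr,  // envp
        "executable_path=/demo", nullptr         // apple
    };
    LaunchVectors v = locateLaunchVectors(1, area);
    return std::strcmp(v.envp[0], "HOME=/var/mobile") == 0 &&
           std::strcmp(v.apple[0], "executable_path=/demo") == 0;
}
```

Calling demoLaunchVectors() should return true: envp begins one slot past the NULL that ends argv, and apple begins one slot past the NULL that ends envp.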

The next step is to execute the function dyld::_main.

2.2 The dyld::_main function

The dyld::_main function is in the dyld.cpp file.

//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)

This function is mainly responsible for loading the Mach-O images, and the key steps of app startup all happen inside dyld::_main. The rest of this article examines only selected fragments of it.

This section is used to configure environment variables.

 // Grab the cdHash of the main executable from the environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
		mainExecutableCDHash = mainExecutableCDHashBuffer;

// Trace dyld's load
notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
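
To make the fragment above concrete, here is a rough sketch of the two operations it relies on: looking up a key in a NULL-terminated "key=value" vector, and turning 40 hex characters into a 20-byte hash. lookupValue and hexStringToBytes are hypothetical stand-ins written for this article; they are not the implementations of _simple_getenv or hexToBytes, whose exact behavior should be checked in the dyld source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch of a "key=value" lookup over a NULL-terminated string vector.
// Returns a pointer to the value, or nullptr if the key is absent.
const char* lookupValue(const char* const vec[], const char* key) {
    const std::size_t keyLen = std::strlen(key);
    for (const char* const* p = vec; *p != nullptr; ++p) {
        if (std::strncmp(*p, key, keyLen) == 0 && (*p)[keyLen] == '=')
            return *p + keyLen + 1;
    }
    return nullptr;
}

// Returns the value of one hex digit, or -1 if c is not a hex digit.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts exactly hexLen hex characters into hexLen/2 bytes.
// Returns false for a null string, a wrong length, or a non-hex character.
bool hexStringToBytes(const char* hex, std::size_t hexLen, std::uint8_t* out) {
    assert(hexLen % 2 == 0);  // two hex characters per output byte
    if (hex == nullptr || std::strlen(hex) != hexLen) return false;
    for (std::size_t i = 0; i < hexLen; i += 2) {
        const int hi = hexDigit(hex[i]);
        const int lo = hexDigit(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i / 2] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}
```

With a 20-byte buffer, hexStringToBytes(lookupValue(apple, "executable_cdhash"), 40, buffer) mirrors the shape of the call in the dyld fragment; a missing key yields nullptr and the conversion simply reports failure.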

In this fragment, sMainExecutableMachHeader stores the Mach-O header of the main program, and sMainExecutableSlide stores the main program's ASLR slide.

uintptr_t result = 0;
sMainExecutableMachHeader = mainExecutableMH;
sMainExecutableSlide = mainExecutableSlide;

Next, the context information is set up, including callback functions, parameters and some flags.

CRSetCrashLogMessage("dyld: launch started");
setContext(mainExecutableMH, argc, argv, envp, apple);

configureProcessRestrictions configures the process restrictions; whether the current process is restricted is decided inside this function. checkEnvironmentVariables checks the environment variables. These settings and checks are performed by dyld itself. For example, these environment variables can control whether third-party libraries are loaded.

configureProcessRestrictions(mainExecutableMH);
...
checkEnvironmentVariables(envp);

If the DYLD_PRINT_OPTS environment variable is set, dyld prints the launch arguments; if DYLD_PRINT_ENV is set, it prints the environment variables. You can set these variables yourself to try this out.

if ( sEnv.DYLD_PRINT_OPTS )
	printOptions(argv);
if ( sEnv.DYLD_PRINT_ENV ) 
	printEnvironmentVariables(envp);
getHostInfo(mainExecutableMH, mainExecutableSlide);

2.2.1 Loading the Shared Cache

At this point it is time to load the shared cache, where checkSharedRegionDisable checks the disabled status of the cache.

// load shared cache
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
...
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
    mapSharedCache();
}

static void checkSharedRegionDisable(const dyld3::MachOLoaded* mainExecutableMH, uintptr_t mainExecutableSlide)
{
#if __MAC_OS_X_VERSION_MIN_REQUIRED
    // if main executable has segments that overlap the shared region,
    // then disable using the shared region
    if ( mainExecutableMH->intersectsRange(SHARED_REGION_BASE, SHARED_REGION_SIZE) ) {
        gLinkContext.sharedRegionMode = ImageLoader::kDontUseSharedRegion;
        if ( gLinkContext.verboseMapping )
            dyld::warn("disabling shared region because main executable overlaps\n");
    }
#if __i386__
    if ( !gLinkContext.allowEnvVarsPath ) {
        // <rdar://problem/15280847> use private or no shared region for suid processes
        gLinkContext.sharedRegionMode = ImageLoader::kUsePrivateSharedRegion;
    }
#endif
#endif
    // iOS cannot run without shared region
}

As the source shows, the shared cache cannot be disabled on iOS, so checkSharedRegionDisable has no effect there. The shared cache is where system libraries such as UIKit and Foundation live. The mapSharedCache function loads the shared cache by calling loadDyldCache(opts, &sSharedCacheLoadInfo). It either maps the cache for the current process only or, if the shared cache has already been mapped, does not load it again. Once the shared cache is loaded, the main program is instantiated.
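
The macOS-only branch of checkSharedRegionDisable asks whether the main executable overlaps the shared region. As a small sketch of that kind of test (an assumption about the idea, not the body of dyld3's intersectsRange), two half-open address ranges overlap when each one starts before the other ends. The function below is written for this article.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when [start, start + size) overlaps
// [regionBase, regionBase + regionSize). Both ranges are half-open.
bool rangesIntersect(std::uint64_t start, std::uint64_t size,
                     std::uint64_t regionBase, std::uint64_t regionSize) {
    assert(size > 0 && regionSize > 0);  // empty ranges are not meaningful here
    return start < regionBase + regionSize && regionBase < start + size;
}
```

For example, with made-up addresses, rangesIntersect(0x1000, 0x1000, 0x1800, 0x1000) is true, while rangesIntersect(0x1000, 0x1000, 0x2000, 0x1000) is false because the first range ends exactly where the second begins.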

2.2.2 reloadAllImages

This code loads the main program. Inside instantiateFromLoadedImage, isCompatibleMachO inspects the mach_header: the magic field tells whether the file is 64-bit or 32-bit, and cputype is used to judge whether the Mach-O file is compatible. If it is compatible, instantiateMainExecutable creates the ImageLoader, and addImage(image) adds it to the list of loaded images. At this point we are still initializing the main program.

reloadAllImages:
#endif
	CRSetCrashLogMessage(sLoadingCrashMessage);
	// instantiate ImageLoader for main executable
	sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
//======================
// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}

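
The compatibility test can be pictured with a simplified header check. This is a sketch under stated assumptions: HeaderPrefix, Compatibility, checkHeader and is64Bit are hypothetical names, the constants are copied from <mach-o/loader.h> and <mach/machine.h> so the example builds on any platform, and the real isCompatibleMachO performs further checks (for example on cpusubtype) that are left out here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMagic32      = 0xfeedface;  // MH_MAGIC
constexpr std::uint32_t kMagic64      = 0xfeedfacf;  // MH_MAGIC_64
constexpr std::int32_t  kCpuTypeX8664 = 0x01000007;  // CPU_TYPE_X86_64
constexpr std::int32_t  kCpuTypeArm64 = 0x0100000C;  // CPU_TYPE_ARM64

// Only the leading fields of mach_header that this check needs.
struct HeaderPrefix {
    std::uint32_t magic;
    std::int32_t  cputype;
    std::int32_t  cpusubtype;
};

enum class Compatibility { Ok, UnknownMagic, WrongCpu };

// Rejects unknown magics first, then compares the CPU type with the host's.
Compatibility checkHeader(const HeaderPrefix& h, std::int32_t hostCpuType) {
    if (h.magic != kMagic32 && h.magic != kMagic64)
        return Compatibility::UnknownMagic;
    if (h.cputype != hostCpuType)
        return Compatibility::WrongCpu;
    return Compatibility::Ok;
}

// The magic value alone distinguishes 32-bit from 64-bit images.
bool is64Bit(const HeaderPrefix& h) {
    assert(h.magic == kMagic32 || h.magic == kMagic64);
    return h.magic == kMagic64;
}
```

An arm64 header checked against an arm64 host gives Compatibility::Ok; the same header checked against an x86_64 host gives Compatibility::WrongCpu.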

Inside instantiateMainExecutable, the sniffLoadCommands function does the main work of the instantiation. ImageLoader itself is an abstract class, so the object that gets created here is an instance of one of its concrete subclasses.

// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                         unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                         const linkedit_data_command** codeSigCmd,
                                         const encryption_info_command** encryptCmd)
{
    *compressed = false;
    *segCount = 0;
    *libCount = 0;
    *codeSigCmd = NULL;
    *encryptCmd = NULL;
    ...
    // fSegmentsArrayCount is only 8-bits
    if ( *segCount > 255 )
        dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

    // fSegmentsArrayCount is only 8-bits
    if ( *libCount > 4095 )
        dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

    if ( needsAddedLibSystemDepency(*libCount, mh) )
        *libCount = 1;
}

  • compressed: indicates whether the Mach-O uses the compressed LINKEDIT format, that is, whether it carries an LC_DYLD_INFO_ONLY load command.
  • segCount: the number of LC_SEGMENT commands. The source shows that segCount cannot exceed 255.
  • libCount: the number of LC_LOAD_DYLIB commands, that is, the dependent libraries. The source shows that libCount cannot exceed 4095.
  • codeSigCmd: the code signature command.
  • encryptCmd: the encryption information, for example the encryption (shell) applied to an uploaded application.
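
The counting described above can be sketched as a walk over a buffer of load commands. This is a simplified illustration rather than the real sniffLoadCommands: the types and function names are hypothetical, only three command kinds are recognized (the real function handles more), and the command constants are copied from <mach-o/loader.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr std::uint32_t kLcLoadDylib    = 0x0000000C;  // LC_LOAD_DYLIB
constexpr std::uint32_t kLcSegment64    = 0x00000019;  // LC_SEGMENT_64
constexpr std::uint32_t kLcDyldInfoOnly = 0x80000022;  // LC_DYLD_INFO_ONLY

// Every load command starts with these two fields.
struct CommandHeader {
    std::uint32_t cmd;
    std::uint32_t cmdsize;
};

struct SniffResult {
    bool compressed = false;
    unsigned segCount = 0;
    unsigned libCount = 0;
};

// Test helper: appends one command whose payload is all zero bytes.
void appendCommand(std::vector<std::uint8_t>& bytes, std::uint32_t cmd,
                   std::uint32_t payloadSize) {
    assert(payloadSize % 8 == 0);  // keep 64-bit commands 8-byte aligned
    CommandHeader h{cmd, static_cast<std::uint32_t>(sizeof(CommandHeader)) + payloadSize};
    const std::uint8_t* raw = reinterpret_cast<const std::uint8_t*>(&h);
    bytes.insert(bytes.end(), raw, raw + sizeof(h));
    bytes.insert(bytes.end(), payloadSize, std::uint8_t{0});
}

// Walks ncmds commands, counting segments and dylibs and enforcing the limits.
SniffResult sniffCommands(const std::vector<std::uint8_t>& bytes, std::uint32_t ncmds) {
    SniffResult result;
    std::size_t offset = 0;
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        if (bytes.size() - offset < sizeof(CommandHeader))
            throw std::runtime_error("load command header past end of buffer");
        CommandHeader h;
        std::memcpy(&h, bytes.data() + offset, sizeof(h));
        if (h.cmdsize < sizeof(CommandHeader) || h.cmdsize > bytes.size() - offset)
            throw std::runtime_error("bad load command size");
        switch (h.cmd) {
            case kLcDyldInfoOnly: result.compressed = true; break;
            case kLcSegment64:    ++result.segCount;        break;
            case kLcLoadDylib:    ++result.libCount;        break;
            default:                                        break;
        }
        offset += h.cmdsize;  // the size field is what links commands together
    }
    if (result.segCount > 255)
        throw std::runtime_error("more than 255 segments");
    if (result.libCount > 4095)
        throw std::runtime_error("more than 4095 dependent libraries");
    return result;
}
```

Building a buffer with appendCommand and passing it to sniffCommands returns the three values discussed in the list above; a command count that runs past the end of the buffer is reported as an exception instead of being read.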

2.2.3 Loading dynamic libraries

After the main program is loaded, it is time to load the dynamic libraries. The environment variable DYLD_INSERT_LIBRARIES determines whether any libraries are inserted: if it has a value, loadInsertedDylib loads each inserted dynamic library through the load function.

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}

static void loadInsertedDylib(const char* path)
{
    ImageLoader* image = NULL;
    unsigned cacheIndex;
    try {
        LoadContext context;
        context.useSearchPaths      = false;
        context.useFallbackPaths    = false;
        context.useLdLibraryPath    = false;
        context.implicitRPath       = false;
        context.matchByInstallName  = false;
        context.dontLoad            = false;
        context.mustBeBundle        = false;
        context.mustBeDylib         = true;
        context.canBePIE            = false;
        context.enforceIOSMac       = true;
        context.origin              = NULL;  // can't use @loader_path with DYLD_INSERT_LIBRARIES
        context.rpath               = NULL;
        image = load(path, context, cacheIndex);
    }
    catch (const char* msg) {
        if ( gLinkContext.allowInsertFailures )
            dyld::log("dyld: warning: could not load inserted library '%s' into hardened process because %s\n", path, msg);
        else
            halt(dyld::mkstringf("could not load inserted library '%s' because %s\n", path, msg));
    }
    catch (...) {
        halt(dyld::mkstringf("could not load inserted library '%s'\n", path));
    }
}
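
The control flow above (a list of paths, one load attempt per path, and a policy for failures) can be sketched without dyld. In the sketch below, splitColonList and loadInserted are hypothetical names; the colon-separated format of DYLD_INSERT_LIBRARIES is the documented one, but rethrowing stands in for dyld's halt(), and the loader is whatever callable the caller supplies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Splits a colon-separated list into its non-empty entries.
std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> paths;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Tries to load every path. loadOne stands in for dyld's load() and signals
// failure by throwing a const char* message, as in the code above.
// Returns the paths that loaded successfully.
std::vector<std::string> loadInserted(
        const std::vector<std::string>& paths,
        const std::function<void(const std::string&)>& loadOne,
        bool allowFailures) {
    assert(loadOne);  // a loader must be supplied
    std::vector<std::string> loaded;
    for (const std::string& path : paths) {
        try {
            loadOne(path);
            loaded.push_back(path);
        } catch (const char*) {
            if (!allowFailures) throw;  // the real code calls halt() here
            // otherwise skip this library and continue with the next one
        }
    }
    return loaded;
}
```

With allowFailures set, a library that cannot be loaded is skipped and the remaining ones are still tried; without it, the first failure ends the whole operation, which matches the two branches of the catch block.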

sInsertedDylibCount is the number of dynamic libraries that have been inserted.

// record count of inserted libraries so that a flat search will look at 
// inserted libraries, then main, then others.
sInsertedDylibCount = sAllImages.size()-1;
// link main executable
gLinkContext.linkingMainExecutable = true;

The actual linking of the dynamic libraries is done by the link function. Inside link, the third-party libraries are loaded through the recursiveLoadLibraries function and then linked (symbol binding).

// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    ...
}

void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload,
                       const RPathChain& loaderRPaths, const char* imagePath)
{
    //dyld::log("ImageLoader::link(%s) refCount=%d, neverUnload=%d\n", imagePath, fDlopenReferenceCount, fNeverUnload);

    // clear error strings
    (*context.setErrorStrings)(0, NULL, NULL, NULL);

    uint64_t t0 = mach_absolute_time();
    this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
    ...
}

Weak binding, which resolves weak symbols, is performed only after all inserted dynamic libraries have been loaded and linked.

// <rdar://problem/12186933> do weak binding only after all inserted images linked
sMainExecutable->weakBind(gLinkContext);

To recap, the sequence is: configure environment variables -> load the shared cache -> instantiate the main program -> load dynamic libraries -> link third-party libraries. All of these steps amount to reading Mach-O files, and all of them happen inside the dyld::_main function.

2.3 Run the main program

Next comes the initializeMainExecutable function, which runs the main program. This is the fourth step in the process.

// run all initializers
initializeMainExecutable(); 

Following the source step by step, the path is initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization. Xcode cannot jump directly into recursiveInitialization, so press cmd + shift + o and search for recursiveInitialization to find its source. From the call stack we know that this function calls notifySingle.

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    // ... some code omitted ...

    // let objc know we are about to initialize this image
    uint64_t t1 = mach_absolute_time();
    fState = dyld_image_state_dependents_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

    // initialize this image
    bool hasInitializers = this->doInitialization(context);

    // let anyone know we finished initializing this image
    fState = dyld_image_state_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_initialized, this, NULL);

    // ... some code omitted ...
}
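
The ordering that the name recursiveInitialization suggests (dependencies first, then the notification, then the image's own initializers) can be modelled in a few lines. This is a toy model under that assumption and not dyld's implementation: Image and recursiveInit are hypothetical, and the two log entries merely stand in for notifySingle and doInitialization.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an image and the images it depends on.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized = false;
};

// Depth-first initialization: every dependency is initialized before the
// image itself, and an image is never initialized twice.
void recursiveInit(Image& image, std::vector<std::string>& log) {
    if (image.initialized) return;
    image.initialized = true;  // mark first so dependency cycles terminate
    for (Image* dep : image.dependencies) {
        assert(dep != nullptr);
        recursiveInit(*dep, log);
    }
    log.push_back("notify " + image.name);        // where +load would run
    log.push_back("initializers " + image.name);  // where constructors would run
}
```

If an app image depends on libobjc, which in turn depends on libSystem, the log lists libSystem first, then libobjc, and the app image last. Within each image the notify entry precedes the initializers entry, which is the same order in which +load methods and constructor functions run.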

The source of notifySingle can likewise be found with cmd + shift + o. sNotifyObjCInit is a callback; for it to be invoked it must have been assigned a value, and that assignment happens in the registerObjCNotifiers function.

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    // ... some code omitted ...
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    // ... some code omitted ...
}

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;
    // ... some code omitted ...
}

Searching for registerObjCNotifiers shows where it is called:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}
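
The register-then-notify pattern used here is ordinary function-pointer callback registration. The following sketch shows the pattern in isolation; every name in it is hypothetical, and it keeps a single callback where dyld stores three.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Callback type: invoked with the path of the image being initialized.
using InitCallback = void (*)(const char* path);

static InitCallback gInitCallback = nullptr;      // unset until someone registers
static std::vector<std::string> gNotifiedPaths;   // filled by recordLoad below

// The "runtime" side: store the function pointer for later use.
void registerInitCallback(InitCallback callback) {
    assert(callback != nullptr);
    gInitCallback = callback;
}

// The "loader" side: invoke the callback only if one has been registered.
// Returns true when the callback ran.
bool notifyImageInit(const char* path) {
    if (gInitCallback == nullptr) return false;
    (*gInitCallback)(path);
    return true;
}

// Example callback that records which images it was told about.
void recordLoad(const char* path) {
    gNotifiedPaths.push_back(path);
}
```

Before registerInitCallback(recordLoad) is called, notifyImageInit does nothing and returns false, which mirrors the sNotifyObjCInit != NULL check in notifySingle; after registration, each notification reaches the callback.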

Searching the dyld source for calls to _dyld_objc_notify_register finds nothing, so set a symbolic breakpoint on it in the demo. The call stack then shows that _dyld_objc_notify_register is called from the _objc_init function. The up command shows that _objc_init belongs to libobjc; the earlier articles in this series on underlying principles used the objc4-756.2 source.

In the objc4-756.2 source, press cmd + shift + o and search for _objc_init to locate it; the source shows the call to _dyld_objc_notify_register. The arguments passed are &map_images, load_images and unmap_image. load_images is passed as a function pointer, and it calls the call_load_methods function, as the call stack of the load method shown earlier confirms.

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

So the call (*sNotifyObjCInit)(image->getRealPath(), image->machHeader()) ends up invoking load_images. Everything from _dyld_start to dyld::notifySingle happens in dyld, while _objc_init and call_load_methods belong to libobjc. Inside call_load_methods, the call_class_loads function iterates over the classes and calls the +load method that each one defines.

void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}
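
The loop structure of call_load_methods (drain all class +loads, then run category +loads, and repeat while anything is left) can be modelled with two queues. This is a simplified model, not the runtime's code: PendingLoads and both functions are hypothetical, and the return value of runCategoryLoads is simpler than that of the real call_category_loads.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

struct PendingLoads {
    std::deque<std::string> classes;     // classes whose +load has not run yet
    std::deque<std::string> categories;  // categories whose +load has not run yet
};

// Runs the category +loads that are pending right now, once each.
// Returns true if any category +load ran.
static bool runCategoryLoads(PendingLoads& pending, std::vector<std::string>& order) {
    bool ranAny = false;
    const std::size_t batch = pending.categories.size();
    for (std::size_t i = 0; i < batch; ++i) {
        order.push_back("category " + pending.categories.front());
        pending.categories.pop_front();
        ranAny = true;
    }
    return ranAny;
}

// Mirrors the do/while shape: class +loads always run before category +loads.
std::vector<std::string> runAllLoads(PendingLoads& pending) {
    std::vector<std::string> order;
    bool moreCategories;
    do {
        while (!pending.classes.empty()) {
            order.push_back("class " + pending.classes.front());
            pending.classes.pop_front();
        }
        moreCategories = runCategoryLoads(pending, order);
    } while (!pending.classes.empty() || moreCategories);
    assert(pending.classes.empty() && pending.categories.empty());
    return order;
}
```

The point the model makes is the ordering: a category's +load is never called before the +load of any class that was already queued.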

After (*sNotifyObjCInit)(image->getRealPath(), image->machHeader()) returns, execution continues with the doInitialization function.

// let objc know we are about to initialize this image
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
	
// initialize this image
bool hasInitializers = this->doInitialization(context);

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

The main function has not been executed at this point. Functions marked with __attribute__((constructor)) are executed when doModInitFunctions runs. You can add the following code to ViewController; it is executed after +load and before main.

__attribute__((constructor)) void funcC1() {
    printf("\n execute funcC1\n");
}

__attribute__((constructor)) void funcC2() {
    printf("\n execute funcC2\n");
}

__attribute__((constructor)) void funcC3() {
    printf("\n execute funcC3\n");
}
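
The same behavior can be observed outside an iOS project. The sketch below assumes a compiler that supports the GCC-style constructor attribute, such as clang or gcc; the counter and function names are made up for the example.

```cpp
#include <cassert>

// Zero-initialized before any constructor function runs.
static int gConstructorCalls = 0;

// Marked as a constructor: the loader runs it before main() is entered.
__attribute__((constructor))
static void runsBeforeMain() {
    ++gConstructorCalls;
}

// Reports how many times the constructor function has run.
int constructorCalls() {
    return gConstructorCalls;
}
```

By the time main starts, constructorCalls() returns 1 even though nothing in the program calls runsBeforeMain explicitly.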

After all of this, control returns to the dyld::_main function. The code below looks up the entry point of the main program, that is, its main function. At this point the dyld loading process can be considered finished.

// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();

3. Summary

Reading the source makes the dyld loading process clear; the most important part of it takes place in dyld's main function. To summarize the dyld loading process:

  • dyld loads all the libraries and the executable.
  • The dyld loading process:
    • Program execution starts at _dyld_start.
    • It then enters dyld's main function, dyld::_main (the real main function of dyld):
      • Configure some environment variables first
      • Load the shared cache (whether it is disabled is checked first; on iOS it cannot be disabled)
      • Instantiate the main program (equivalent to creating an object)
      • Load the dynamic libraries
      • Link the main program
      • The most critical part: running the initializers
        • After a series of initialization calls, the notifySingle function is reached
          • This function invokes a callback
          • Breakpoint debugging shows that the callback is registered in _objc_init and is set to the load_images function
            • load_images calls the call_load_methods function
              • call_load_methods calls the +load method of each class
        • The doModInitFunctions function
          • Internally calls the C functions marked with __attribute__((constructor))
        • Returns the entry point of the main program, (uintptr_t)sMainExecutable->getEntryFromLC_MAIN(), after which the main program's main function is entered.