Preface

When you write a program, you cannot do without main, and we all know that main is the entry point. So what does the system do before main is called? Let’s explore.

Compilation processes and libraries

  • A program is built in these stages: preprocess -> compile -> assemble -> link -> executable file, as shown below:

Compilation diagram



Viewing the phases from the command line

  • You can also print the phases from the command line. Taking main.m as an example, run the following command:
clang -ccc-print-phases -framework Foundation main.m -o main

   +- 0: input, "Foundation", object
   |           +- 1: input, "main.m", objective-c
   |        +- 2: preprocessor, {1}, objective-c-cpp-output
   |     +- 3: compiler, {2}, ir
   |  +- 4: backend, {3}, assembler
   |- 5: assembler, {4}, object
+- 6: linker, {0, 5}, image
7: bind-arch, "x86_64", {6}, image

The core steps mean the following:

  • preprocessor, {1}, objective-c-cpp-output: the preprocessor, part of the compiler front end
  • compiler, {2}, ir: the compiler generates the intermediate representation (IR)
  • backend, {3}, assembler: the LLVM back end generates assembly
  • assembler, {4}, object: the assembler generates machine code (an object file)
  • linker, {0, 5}, image: the linker combines the object file with the libraries and frameworks it needs
  • bind-arch, "x86_64", {6}, image: produces the executable binary image for the given architecture

Static and dynamic libraries

At its core, loading an application means loading binary files, including the libraries it uses, into memory. Libraries are usually divided into two types, static libraries and dynamic libraries:

Static library

  • The content of a static library is copied into the executable during linking.
  • Advantages: after linking, the library file is no longer needed and the executable can run on its own. Execution is relatively fast.
  • Disadvantages: a static library may be copied into several binaries, which increases package size. After linking, its content cannot be changed.

Dynamic library

  • The content of a dynamic library is not copied in during linking. Instead, it is located and linked in memory when the program runs.
  • Advantages: shared code exists only once instead of being copied into every binary, so the package is smaller. The content can also be replaced at run time after linking.
  • Disadvantages: the required content has to be located and linked at run time, which costs time. If the environment lacks the dynamic library, the program will not run.

graphic

The dynamic linker: dyld

dyld (the dynamic link editor) is an important part of Apple’s operating systems. After the app has been compiled and packaged into an executable file, dyld takes care of the rest: loading and linking it at launch.

  • So the app startup process is as follows:



The startup entry point

Set a breakpoint in main and run. When the breakpoint is hit, the call stack shows a start frame before main:



But its contents are not visible. Adding a symbolic breakpoint on start does not help either: it never triggers, which means this is not really where execution begins. We know that +load is called before main, so set a breakpoint in a +load method and look at the call stack:



  • From this we know that the program starts running from _dyld_start.

Source code analysis

Next, we analyze the dyld-852 source code.

_dyld_start

  • In the dyld-852 project, search for _dyld_start:



  • This article takes the arm64 architecture as an example. In essence, _dyld_start calls dyldbootstrap::start. In C++, the part before :: is a namespace and the part after it is a function in that namespace, similar to calling a class method on a class.

dyldbootstrap::start

  • This is the start function in the dyldbootstrap namespace, so search for dyldbootstrap and find start:



    • The core of start is that it returns the result of dyld::_main. The first argument of dyld's _main is the macho_header, which is one of the components of Mach-O.
    • Mach-O is the executable file format. It organizes the file into three main parts: the Mach header, the load commands, and the data. You can inspect them with MachOView.
      • Mach header: holds the CPU information of the Mach-O and summary information about the load commands
      • Load commands: hold each command's type, name, and position in the binary
      • Data: made up of segments and the largest part of a Mach-O; it holds the code and the data, such as symbol tables. There are three main segments: __TEXT, __DATA, and __LINKEDIT. __TEXT and __DATA each contain one or more sections. __LINKEDIT has no sections and has to be read together with LC_SYMTAB to resolve the symbol table and string table. These are the key data of a Mach-O.
    • For details, see the article on Mach-O, the executable format of Apple's operating systems
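
As a rough illustration of the header and load command layout described above, the following sketch walks the load commands of a 64-bit Mach-O image held in memory. The names MachHeader64, LoadCommand, and listLoadCommands are made up for this example; the structs mirror the layouts in <mach-o/loader.h> but are redefined locally, and the code assumes a little-endian host reading a little-endian file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Local mirrors of the layouts in <mach-o/loader.h> (hypothetical names),
// redefined here so the sketch also compiles on non-Apple systems.
struct MachHeader64 {
    uint32_t magic;       // 0xfeedfacf for a 64-bit Mach-O
    int32_t  cputype;     // CPU information
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;       // number of load commands
    uint32_t sizeofcmds;  // total size of all load commands in bytes
    uint32_t flags;
    uint32_t reserved;
};

struct LoadCommand {
    uint32_t cmd;      // command type, for example LC_SEGMENT_64 (0x19)
    uint32_t cmdsize;  // size of this command, including these 8 bytes
};

constexpr uint32_t kMagic64 = 0xfeedfacf;  // MH_MAGIC_64

// Returns the type of every load command, in file order.
// Returns an empty list if the buffer is not a well-formed 64-bit header.
std::vector<uint32_t> listLoadCommands(const uint8_t* data, size_t size) {
    std::vector<uint32_t> cmds;
    MachHeader64 header;
    if (size < sizeof(header)) return cmds;
    std::memcpy(&header, data, sizeof(header));
    if (header.magic != kMagic64) return cmds;

    // The load commands follow the header back to back.
    size_t offset = sizeof(header);
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        LoadCommand lc;
        if (offset + sizeof(lc) > size) return {};
        std::memcpy(&lc, data + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || offset + lc.cmdsize > size) return {};
        cmds.push_back(lc.cmd);
        offset += lc.cmdsize;  // cmdsize tells us where the next command starts
    }
    return cmds;
}
```

To try it, read an executable into a byte buffer and pass the buffer to listLoadCommands; the resulting list can be compared with what MachOView shows.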

dyld::_main

  • Enter _main. It returns result, so we can trace back from result. The function goes through the following steps:

Environment Variable Configuration

  • Read the environment configuration, such as the CPU architecture



Shared cache

  • Check whether the shared cache is disabled. If it is not disabled, map the shared cache



Main program initialization

  • Load the executable file and create an ImageLoaderMachO instance for it



Inserting a dynamic library

  • Traverse DYLD_INSERT_LIBRARIES (the inserted dynamic libraries) and call loadInsertedDylib to load each one



Link main program

  • Call the link function to link the executable



Link dynamic library

  • Link the inserted dynamic libraries



Weak symbol binding



Execute the initialization method



Finding main

  • Read LC_MAIN from the load commands. If it is found, the libSystem helper is used to call main(). If it is not found, dyld lets start in the program set up and call main()
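
The decision in this last step can be sketched roughly as follows. This is only an illustration under assumptions: EntryKind and chooseEntry are hypothetical names, the constants are copied from <mach-o/loader.h> under different names, and treating LC_UNIXTHREAD as the fallback is my reading of the "start in the program" case rather than something the text above states.

```cpp
#include <cstdint>
#include <vector>

// Load command constants as defined in <mach-o/loader.h>, renamed so they do
// not clash with the system macros.
constexpr uint32_t kLcReqDyld    = 0x80000000u;
constexpr uint32_t kLcUnixThread = 0x5;                // LC_UNIXTHREAD
constexpr uint32_t kLcMain       = 0x28 | kLcReqDyld;  // LC_MAIN

// Hypothetical names for the ways the entry point can be reached.
enum class EntryKind {
    MainViaHelper,  // LC_MAIN found: a helper calls main()
    ProgramStart,   // no LC_MAIN (assumed LC_UNIXTHREAD): the program's start sets up main()
    None            // neither command is present
};

// Scans the load command types and picks the entry strategy.
EntryKind chooseEntry(const std::vector<uint32_t>& loadCommands) {
    bool hasUnixThread = false;
    for (uint32_t cmd : loadCommands) {
        if (cmd == kLcMain) return EntryKind::MainViaHelper;  // LC_MAIN wins
        if (cmd == kLcUnixThread) hasUnixThread = true;
    }
    return hasUnixThread ? EntryKind::ProgramStart : EntryKind::None;
}
```

For example, chooseEntry({0x19, kLcMain}) selects EntryKind::MainViaHelper, which corresponds to the LC_MAIN branch described above.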



The following sections focus on main program initialization and on running the initializers

Main program initialization in dyld::_main

instantiateFromLoadedImage

  • The main program is initialized by calling instantiateFromLoadedImage. Enter the method:
// The kernel maps in the main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
    addImage(image);
    return (ImageLoaderMachO*)image;
}
  • The method creates an ImageLoader instance through instantiateMainExecutable, registers it with addImage, and returns it cast to ImageLoaderMachO*

instantiateMainExecutable

  • Enter instantiateMainExecutable:
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    // (a debug log of the ImageLoader class sizes, commented out in dyld, is omitted here)
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    // Determine whether the macho file has a classic or compressed LINKEDIT and its number of segments
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    if ( compressed ) 
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
	throw "missing LC_DYLD_INFO load command";
#endif
}
  • The main purpose here is to create the image for the main program and return it as an ImageLoader object, that is, the image. sniffLoadCommands reads the load commands to determine what kind of Mach-O file this is
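
The shape of this function, sniff the load commands and then pick a concrete loader class, can be sketched as below. This is a simplified model, not dyld's code: Loader, CompressedLoader, ClassicLoader, sniffCompressed, and instantiateMain are hypothetical names, and the sketch assumes that the presence of LC_DYLD_INFO or LC_DYLD_INFO_ONLY is what marks an image as compressed (the real sniffLoadCommands checks considerably more).

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

constexpr uint32_t kLcDyldInfo     = 0x22;                // LC_DYLD_INFO
constexpr uint32_t kLcDyldInfoOnly = 0x22 | 0x80000000u;  // LC_DYLD_INFO_ONLY

// Hypothetical stand-ins for the ImageLoader class hierarchy.
struct Loader {
    virtual ~Loader() = default;
    virtual std::string kind() const = 0;
};
struct CompressedLoader : Loader {
    std::string kind() const override { return "compressed"; }
};
struct ClassicLoader : Loader {
    std::string kind() const override { return "classic"; }
};

// Assumption: an image counts as "compressed" if it carries dyld info.
bool sniffCompressed(const std::vector<uint32_t>& loadCommands) {
    for (uint32_t cmd : loadCommands) {
        if (cmd == kLcDyldInfo || cmd == kLcDyldInfoOnly) return true;
    }
    return false;
}

// Picks the concrete loader class from the sniffed load commands.
// supportClassic plays the role of the SUPPORT_CLASSIC_MACHO build flag.
std::unique_ptr<Loader> instantiateMain(const std::vector<uint32_t>& loadCommands,
                                        bool supportClassic) {
    if (sniffCompressed(loadCommands)) return std::make_unique<CompressedLoader>();
    if (supportClassic) return std::make_unique<ClassicLoader>();
    throw std::runtime_error("missing LC_DYLD_INFO load command");
}
```

The caller only ever sees the base type, which is why instantiateFromLoadedImage above can work with a plain ImageLoader* regardless of which subclass was created.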

Running initializers in dyld::_main

initializeMainExecutable

  • The initializers are run by calling initializeMainExecutable:
void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}

	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
	
	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL ) 
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
  • First, sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]) runs the initializers of any inserted dylibs. Then sMainExecutable->runInitializers runs the initializers of the main executable and everything it depends on

runInitializers

  • Enter runInitializers:
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}
  • From reading the code, the core calls are processInitializers and notifyBatch

processInitializers

  • Enter processInitializers:
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount() +2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	// Recursively init all the images in the images list to build a new list of uninitialized dependencies
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}
  • Here recursiveInitialization is called on each image.
recursiveInitialization
  • A global search locates the implementation of this function:



  • It mainly calls two functions: notifySingle and doInitialization. Let’s look at notifySingle first.
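
The recursion can be pictured with a small model. This sketch is not the dyld implementation: the Image struct and the log strings are hypothetical, and it assumes the order is "dependencies first, then notifySingle, then doInitialization" for each image, with every image initialized at most once.

```cpp
#include <string>
#include <vector>

// Hypothetical, much simplified stand-in for an ImageLoader.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized;
};

// Initializes the dependencies depth-first, then the image itself.
// Every step is appended to `log` so the order can be inspected.
void recursiveInitialization(Image& image, std::vector<std::string>& log) {
    if (image.initialized) return;
    image.initialized = true;  // mark early so dependency cycles terminate

    for (Image* dep : image.dependencies) {
        recursiveInitialization(*dep, log);
    }

    // Stand-in for the notification that ends up in the objc callback.
    log.push_back("notifySingle(" + image.name + ")");
    // Stand-in for running the image's own initializers.
    log.push_back("doInitialization(" + image.name + ")");
}
```

With an app image that depends on a libSystem image, the log lists both libSystem entries before the app entries, which is why lower-level libraries are always ready before the code that uses them.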
notifySingle
  • Search for notifySingle:

  • The analysis narrows the method down to its key line, (*sNotifyObjCInit)(image->getRealPath(), image->machHeader()). sNotifyObjCInit is a global variable of type _dyld_objc_notify_init:
static _dyld_objc_notify_init		sNotifyObjCInit;
  • Next, find where it is assigned. A global search for sNotifyObjCInit locates the assignment:
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}
  • In other words, sNotifyObjCInit is assigned when registerObjCNotifiers is called. Continue by searching for registerObjCNotifiers:
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}
  • This leads to _dyld_objc_notify_register, but nothing in the dyld source calls it. Look in the objc source instead:
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
  • So the value assigned to sNotifyObjCInit is in fact load_images. Let’s look at load_images:



  • load_images calls call_load_methods. Enter call_load_methods: it contains a do-while loop that calls call_class_loads and call_category_loads



  • Enter call_class_loads and call_category_loads in turn:
    • call_class_loads:



    • call_category_loads:

  • Here you can see that each class (cls) and each category has its load method called. In other words, load_images is what calls +load, which matches the frames at the top of the stack seen earlier:

Summary: the call path to +load is: _dyld_start -> dyldbootstrap::start -> dyld::_main -> initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::processInitializers -> ImageLoader::recursiveInitialization -> notifySingle -> sNotifyObjCInit -> load_images -> +load
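
The do-while loop inside call_load_methods mentioned above can be modeled roughly like this. It is a sketch under assumptions, not the objc runtime code: LoadQueues and callLoadMethods are hypothetical names, the queues hold plain strings instead of classes and categories, and the model assumes that all pending class +load calls are drained before any category +load runs.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical queues of classes and categories that still need +load.
struct LoadQueues {
    std::deque<std::string> classes;
    std::deque<std::string> categories;
};

// Returns the +load calls in the order they would be made.
std::vector<std::string> callLoadMethods(LoadQueues& queues) {
    std::vector<std::string> order;
    bool moreCategories = false;
    do {
        // 1. Drain every pending class first.
        while (!queues.classes.empty()) {
            order.push_back("+[" + queues.classes.front() + " load]");
            queues.classes.pop_front();
        }
        // 2. Then run the category loads that are pending right now.
        moreCategories = !queues.categories.empty();
        while (!queues.categories.empty()) {
            order.push_back("+[" + queues.categories.front() + " load]");
            queues.categories.pop_front();
        }
        // 3. Repeat while anything was, or became, pending.
    } while (!queues.classes.empty() || moreCategories);
    return order;
}
```

The practical consequence is the familiar rule that a class's +load runs before the +load of its categories.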

  • Now we know how +load is reached. But when is the _objc_init mentioned above called? Let’s continue exploring.
doInitialization
  • In the objc source, set a breakpoint in _objc_init, then run bt to view the stack:



  • The stack shows how _objc_init is reached: doInitialization -> doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init. Now work backward:
    1. First look at _os_object_init, which is in the libdispatch source:
void
_os_object_init(void)
{
	_objc_init();
	...
}

So _os_object_init does call _objc_init.

    2. Next, search for libdispatch_init:
void
libdispatch_init(void)
{
	...
	_os_object_init();
	...
}

libdispatch_init does call _os_object_init. Keep following the stack.

    3. libSystem_initializer is in the libSystem source. Search for it there:
static void
libSystem_initializer(int argc,
		      const char* argv[],
		      const char* envp[],
		      const char* apple[],
		      const struct ProgramVars* vars)
{
	...
	libdispatch_init();
	...
}
  • As expected, libSystem_initializer calls libdispatch_init. Moving further along the stack, doModInitFunctions and doInitialization are both in the dyld source, so we go back to it:
    4. Search the dyld source for doModInitFunctions:



Here, doModInitFunctions runs the image's initializer functions, which include C++ static initializers

    5. Then search for the implementation of doInitialization:
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

So doInitialization calls doModInitFunctions. Searching for callers of doInitialization leads back to recursiveInitialization:

At this point the main flow forms a closed loop: registerObjCNotifiers registers the listener, and notifySingle sends the notification.
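
This listener relationship can be reduced to a few lines. The sketch below is a simplified model rather than dyld's code: NotifyInit, LoadedImage, registerNotifier, and recordLoad are hypothetical names, and the callback only receives a path. It shows the two halves: registration stores the callback and replays it for images that are already initialized, and notifySingle invokes it for each image initialized afterwards.

```cpp
#include <string>
#include <vector>

// Hypothetical callback type, standing in for _dyld_objc_notify_init.
using NotifyInit = void (*)(const char* path);

struct LoadedImage {
    std::string path;
    bool initialized;
};

static NotifyInit sNotifyInit = nullptr;     // the registered listener
static std::vector<LoadedImage> sAllImages;  // images known so far

// Sender side: called when an image reaches the "initialize" step.
void notifySingle(const LoadedImage& image) {
    if (sNotifyInit != nullptr) {
        (*sNotifyInit)(image.path.c_str());
    }
}

// Listener side: store the callback, then replay it for every image
// that was already initialized before registration happened.
void registerNotifier(NotifyInit init) {
    sNotifyInit = init;
    for (const LoadedImage& image : sAllImages) {
        if (image.initialized) {
            (*sNotifyInit)(image.path.c_str());
        }
    }
}

// Example listener that just records which images it was told about.
static std::vector<std::string> gSeenImages;
void recordLoad(const char* path) { gSeenImages.push_back(path); }
```

Calling registerNotifier(recordLoad) plays the role of _objc_init registering load_images, and every later notifySingle call then reaches the recorded callback.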

Summary
  • The call path to +load is: _dyld_start -> dyldbootstrap::start -> dyld::_main -> initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::processInitializers -> ImageLoader::recursiveInitialization -> notifySingle -> sNotifyObjCInit -> load_images -> +load
  • The call path to _objc_init is: doInitialization -> doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init

The main function

We know that _dyld_start is an assembly function. Taking the simulator as an example, look at it:

  • Reading the assembly, rax appears to hold something related to main. How do we verify that? We know that C++ functions run after +load, so define a C++ function next to main:
__attribute__((constructor)) void wushuang_func(){
    printf("Coming: %s \n",__func__);
}

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    
    NSLog(@"1223333");
    @autoreleasepool {
        // Setup code that might create autoreleased objects goes here.
        appDelegateClassName = NSStringFromClass([AppDelegate class]);
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

Now look at the printed output:

The execution order is +load > C++ function > main. Next, set a breakpoint in the C++ function, switch to the assembly view, and click Step Over until you are back in _dyld_start, just past the call to dyldbootstrap::start. Then read the registers with register read:

Here rax holds main, and the last instruction in the assembly is jmpq *%rax, which jumps to main. That completes the path from _dyld_start to main.

dyld loading process

Conclusion

  • This article gives a rough analysis of the dyld loading process, mainly to convey a way of exploring. When an analysis reaches a point where there seems to be no way forward, try working backward; you may get unexpected results.