iOS underlying principles + reverse engineering: article index

The purpose of this article is to analyze the dyld loading process and find out what happens under the hood before main is called.

Primer

  • Create a project, override the load method in ViewController, and add a C++ function, kcFunc, in the file that contains main. In what order are they printed?

  • Run the program and check the print order of load, kcFunc, and main. The printed result is shown below; the order is load --> C++ function --> main

Why in this order? Conventionally, main is considered the entry function, so why doesn't main execute first?

With that in mind, let's explore what happens before execution reaches main.

Compilation process and libraries

Before analyzing app launch, we need to understand how an iOS app's code is compiled, and what dynamic and static libraries are.

The build process

The compilation process is shown in the following figure, which is mainly divided into the following steps:

  • Source files: load files such as .h, .m, and .cpp
  • Preprocessing: replace macros, remove comments, expand header files, and produce .i files
  • Compilation: convert .i files into assembly language, producing .s files
  • Assembly: convert assembly files into machine code, producing .o files
  • Linking: resolve the references in the .o files to other libraries and generate the final executable

Static and dynamic libraries

  • Static library: in the link phase, the object code generated by the assembler is linked and packaged into the executable together with the referenced library. The static library does not change afterwards, because it is copied directly into the target program at compile time
    • Advantages: once compilation is complete, the library file is no longer needed; the target program has no external dependencies and can run directly

    • Disadvantages: because the library is copied into the target program, its code exists in two places, which increases the size of the target program and costs memory, performance, and speed

  • Dynamic library: not linked into the target program at compile time. The target program only stores references to the dynamic library, which is loaded when the program runs
    • Advantages:
      • Smaller app package: the library is not copied into the target program, so it does not affect the target program's size, and the app is smaller than with a static library

      • Shared memory, saving resources: the same library can be used by more than one program

      • The program can be updated by updating the dynamic library: because the library is loaded at runtime, it can be replaced at any time without recompiling the code

    • Disadvantages: dynamic loading brings some performance loss. Using dynamic libraries also makes the program dependent on the external environment: if the environment lacks the dynamic library or the library version is incorrect, the program will not run

Diagrams of static and dynamic libraries are shown in the figure below

dyld loading process analysis

The analysis draws on the dyld source code together with the libobjc, libSystem, and libdispatch sources.

What is dyld?

dyld (the dynamic link editor) is an important part of Apple's operating systems. After an app is compiled and packaged into an executable Mach-O file, dyld is responsible for linking and loading the program.

So the startup flowchart of the app is as follows

The starting point of app launch

  • In the previous demo, set a breakpoint in the load method and use bt to view the stack information, to see where app launch starts

[App starting point]: running the program shows that launch starts from _dyld_start in dyld, so we need to download a copy of the dyld source from Apple's OpenSource site to analyze it

  • The entry can also be found in the stack information in the left pane of Xcode

dyld::_main function source analysis

  • In the dyld-750.6 source, search for _dyld_start and find the arm64 implementation. It is implemented in assembly, and the assembly comments show that it calls dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue), which is a C++ function (using the arm64 architecture as an example)

  • Search the source for dyldbootstrap to find the namespace, then look for the start function in that file. Its core is that it returns the result of calling dyld's _main function. Here macho_header is the Mach-O header, and the file dyld loads is a Mach-O file. Mach-O is the executable file format and consists of four parts: the Mach-O header, load commands, sections, and other data. You can inspect an executable's information with MachOView

  • Enter the dyld::_main source implementation. It is particularly long, about 600 lines. If you are not familiar with the dyld loading process, you can work backwards from the return value of _main. The _main function mainly does the following:

    • [Step 1: Environment variable configuration]: set values based on environment variables and get the current running architecture

    • [Step 2: Shared cache]: check whether the shared cache is enabled and whether it is mapped into the shared region, for example UIKit, CoreFoundation, etc.

    • [Step 3: Main program initialization]: call instantiateFromLoadedImage to instantiate an ImageLoader object

    • [Step 4: Insert dynamic libraries]: iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib to load each library

    • [Step 5: Link the main program]

    • [Step 6: Link the dynamic libraries]

    • [Step 7: Weak symbol binding]
    • [Step 8: Execute the initialization methods]

    • [Step 9: Find the main program entry, i.e. the main function]: read the LC_MAIN entry from the load commands; if it is absent, read LC_UNIXTHREAD. This brings us to the main function familiar from everyday development

The following mainly analyzes [Step 3] and [Step 8].

Step 3: Main program initialization

  • sMainExecutable represents the main program. Looking at where it is assigned shows that it is initialized through the instantiateFromLoadedImage function

  • Enter the instantiateFromLoadedImage source: it creates an ImageLoader instance through the instantiateMainExecutable function

  • Enter the instantiateMainExecutable source: its role is to create an image for the main executable and return an image object of type ImageLoader, i.e. the main program. Within it, the sniffLoadCommands function reads the load commands of the Mach-O file and performs various checks on them

Step 8: Execute the initialization methods

  • Enter the initializeMainExecutable source: it mainly loops over the images, and each iteration calls the runInitializers method

  • Search globally for runInitializers(cons to find the following source; its core is the call to processInitializers

  • Enter the processInitializers function, which calls recursiveInitialization on each image in the image list to initialize it recursively

  • Search globally for recursiveInitialization(cons; its source implementation is as follows

Here we need to explore two functions further, notifySingle and doInitialization. We will explore notifySingle first.

notifySingle function
  • Search globally for notifySingle(; the key is this line: (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());

  • Searching globally for sNotifyObjCInit finds no implementation, only an assignment

  • Searching for where registerObjCNotifiers is called shows that it is called in _dyld_objc_notify_register

Note: the _dyld_objc_notify_register function needs to be searched for in the libobjc source

  • In the objc4-781 source, search for _dyld_objc_notify_register. It is called in the _objc_init function with arguments passed in, so the value assigned to sNotifyObjCInit is load_images in objc, and load_images calls all the +load methods. To sum up, notifySingle is a callback function

Loading the load functions

Let's go into the source of load_images and look at its implementation to confirm that all load functions are called from load_images

  • From the _objc_init source in the objc source, enter the source implementation of load_images

  • Enter the call_load_methods source implementation: its core is a do-while loop that calls the +load methods

  • Enter the call_class_loads source implementation: the load method is called here, which confirms what was said earlier about the class load methods

So load_images calls all the load functions, and the source analysis above corresponds exactly to the printed stack information.

[Summary] The load call chain is: _dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> dyld::notifySingle (this is a callback) --> sNotifyObjCInit --> load_images (libobjc.A.dylib)

So the question is: when is _objc_init called? Read on.

doInitialization function
  • Going to objc's _objc_init function leads no further, so let's go back to the source of the recursive function recursiveInitialization, where we find a function we overlooked: doInitialization

  • Enter the doInitialization function source implementation

This is also divided into two parts: the doImageInit function and the doModInitFunctions function.

  • Enter the doImageInit source implementation: its core is a for loop that calls the initialization methods. Note that the libSystem initializer must run first

  • Enter the doModInitFunctions source implementation: this function loads all the Cxx files, i.e. it calls the C++ initializer functions. You can verify this by placing a breakpoint in the C++ function and checking the program's stack information

At this point we still have not found the call to _objc_init. What now? Give up? Of course not. We can set a symbolic breakpoint on _objc_init and look at the stack that leads to it.

  • Add a symbolic breakpoint on _objc_init, run the program, and look at the stack information once execution stops in _objc_init

  • In libSystem, find libSystem_initializer and look at its implementation

  • Based on the previous stack information, libSystem_initializer calls the libdispatch_init function, whose source is in the libdispatch open source library. Search for libdispatch_init in libdispatch

  • Enter the _os_object_init source implementation, which calls the _objc_init function

Combining the above analysis: during initialization, _objc_init registers its callbacks through _dyld_objc_notify_register, whose second parameter is load_images. That parameter is stored in sNotifyObjCInit, and notifySingle later calls sNotifyObjCInit(), which forms a closed loop.

So a simple way to think about it: notifySingle is where the notification is added, i.e. addObserver; _objc_init calling _dyld_objc_notify_register is like sending the notification, i.e. push; and sNotifyObjCInit is the notification handler, i.e. the selector.

[Summary] The _objc_init call chain is: _dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> doInitialization --> libSystem_initializer (libSystem.B.dylib) --> libdispatch_init --> _os_object_init (libdispatch.dylib) --> _objc_init (libobjc.A.dylib)

Step 9: Find the main entry function

  • With assembly debugging, you can see execution arrive at the +[ViewController load] method

  • Continue, and execution arrives at the C++ function kcFunc

  • Keep clicking step over and follow the flow through; execution returns to _dyld_start, which then calls the main() function, with the assignment of main's arguments and other operations done in assembly

dyld assembly source implementation

Note: main is a hard-coded function name. It is written into memory and read by dyld, so renaming the main function causes an error

So, to sum up, the complete dyld loading process is shown in the figure below, which also answers the opening question of why the call order is load --> Cxx --> main