
iOS underlying principles + reverse engineering: article index

This article focuses on a demonstration of the internal structure of a Mach-O file

dyld overview

  • dyld stands for "the dynamic link editor";

  • It is Apple's dynamic linker;

  • It is an important part of Apple's operating systems;

  • After an application is compiled and packaged into an executable (i.e., a Mach-O file), dyld is responsible for linking and loading it.

  • dyld is involved throughout app startup, including loading the main program and its dependent libraries. Anyone optimizing performance or startup time will inevitably have to deal with dyld.

  • dyld is open source, so its source code can be downloaded from Apple's official site and studied.

dyld 1.0 (1996-2004)

  • dyld 1 shipped as part of NeXTStep 3.3. Before that, NeXT used static binaries, which did not work very well.

  • dyld 1 was written before systems made widespread use of large C++ dynamic libraries. Several C++ features, such as the way its initializers work, behave well in a static environment but are hard to support in a dynamic environment without hurting performance. As a result, a large C++ dynamic library could force dyld to do a lot of work and slow it down.

  • Prebinding was added before the release of macOS 10.0 (Cheetah). Prebinding tries to find a fixed address for every dylib and application in the system, and dyld tries to load everything at those addresses. If loading succeeds, dyld edits all the dylib and program binaries to store the precomputed addresses. The next time everything is loaded at the same addresses, no extra work is needed, which makes launch much faster. However, it also means the binaries are edited on every launch, which is undesirable, not least from a security standpoint.

dyld 2 (2004-2017)

dyld 2 has gone through several iterations since its release in 2004. Some of the features we rely on today, such as ASLR, code signing, and the shared cache, were introduced in dyld 2

dyld 2.0 (2004-2007)

  • dyld 2 was released with macOS Tiger in 2004

  • dyld 2 is a complete rewrite of dyld 1. It correctly supports C++ initializer semantics, which required extending the Mach-O format and updating dyld, and thereby gained efficient C++ library support.

  • dyld 2 has a complete implementation of dlopen and dlsym (mainly used to load libraries dynamically and call their functions) with correct semantics, so the older APIs were deprecated

    • dlopen: opens a library and returns a handle

    • dlsym: looks up the value of a symbol in an open library

    • dlclose: closes the handle

    • dlerror: returns a string describing the error from the last call to dlopen, dlsym, or dlclose

  • dyld 2 was designed for launch speed, so it performs only limited sanity checking, mainly because there was far less malware at the time

  • dyld 2 also had some security issues, so a number of features were retrofitted to make it more secure on today's platforms

  • Because launch became so much faster, the amount of prebinding could be reduced. Instead of editing application binaries, only the system libraries are edited, and only during software updates. The "optimizing system performance" message shown during a software update is prebinding being refreshed at update time. Today that step is used for all of dyld's optimizations.
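The dlopen family listed above can be tried out without writing C. The following is a minimal sketch, assuming a POSIX system (macOS or Linux), where Python's ctypes module loads libraries through dlopen and resolves names through dlsym. Passing None as the library name corresponds to asking for a handle to the images already loaded in the current process; the choice of strlen and the made-up missing symbol name are only for illustration.

```python
import ctypes

# ctypes.CDLL(None) asks the dynamic linker for a handle to the current
# process, so no library path is needed for this sketch.
process = ctypes.CDLL(None)

# Attribute access triggers a symbol lookup for "strlen" in that handle.
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"dyld")
print(length)

# A symbol that cannot be found is reported by ctypes as AttributeError,
# the Python-side counterpart of a failed lookup described by dlerror().
try:
    process.symbol_that_should_not_exist_123
    missing_found = True
except AttributeError:
    missing_found = False
print(missing_found)
```

The handle is released when the Python object is garbage collected, so there is no explicit dlclose call in this sketch.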

dyld 2.x (2007-2017)

  • A number of improvements were made between 2004 and 2017, and dyld 2 performance improved significantly
  • First, support was added for many architectures and platforms.
    • Since dyld 2 was first released on PowerPC, support has been added for x86, x86_64, arm, arm64, and many derived platforms.
    • iOS, tvOS, and watchOS were also launched, all of which required new dyld functionality
  • Security was increased in a number of ways
    • Code signing was added
    • ASLR (Address Space Layout Randomization): each time a library is loaded, it may be at a different address
    • Bounds checking: bounds checks were added for Mach-O headers to prevent the injection of malicious binary data
  • Performance was enhanced
    • Prebinding could be eliminated and replaced by the shared cache

ASLR

  • ASLR is a computer security technique that guards against the exploitation of memory corruption vulnerabilities. It randomly arranges the address-space positions of a process's key data areas so that an attacker cannot reliably jump to a specific location in memory, such as a particular function.

  • Linux added ASLR in kernel version 2.6.12

  • Apple introduced random address offsets for some libraries in Mac OS X Leopard 10.5 (October 2007), but that implementation did not provide the full protection that ASLR is defined to offer. Mac OS X Lion 10.7 provides ASLR support for all applications.

  • Apple introduced ASLR in iOS 4.3.
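The effect of ASLR on addresses can be modeled in a few lines. The sketch below is only a model under stated assumptions: the page size, the link-time base address, the range of possible slides, and the function offset are all made-up values, and the real choice of load address is made by the operating system, not by code like this.

```python
import random

PAGE_SIZE = 0x4000              # assumed page size (16 KB) for this sketch
PREFERRED_BASE = 0x100000000    # assumed link-time base address of the image
MAX_PAGES = 0x1000              # assumed upper bound on the slide, in pages

def pick_slide(rng):
    """Pick a random, page-aligned offset (the "slide") for this launch."""
    return rng.randrange(MAX_PAGES) * PAGE_SIZE

def runtime_address(link_time_address, slide):
    """Run-time address = address chosen at link time + slide."""
    return link_time_address + slide

rng = random.Random()                      # a different slide on every run
slide = pick_slide(rng)
main_link_addr = PREFERRED_BASE + 0x3F20   # hypothetical offset of a function
main_runtime_addr = runtime_address(main_link_addr, slide)

print(hex(slide), hex(main_runtime_addr))
```

Because the slide changes from launch to launch, an address observed in one run tells an attacker nothing reliable about the next run.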

Bounds checking

  • Bounds checks were added for many parts of the Mach-O header, which helps prevent the injection of malicious binary data
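To make the idea of header bounds checking concrete, here is a rough sketch that validates the load commands of a 64-bit Mach-O image. It assumes the standard mach_header_64 layout (eight 32-bit fields) and the two-field load command prefix (cmd, cmdsize); the checks themselves are an illustration of the principle and are not taken from dyld's source. The fake_image helper and the command value 0x1B are hypothetical test data.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
# mach_header_64: magic, cputype, cpusubtype, filetype,
#                 ncmds, sizeofcmds, flags, reserved
HEADER_FMT = "<8I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 32 bytes

def load_commands_in_bounds(data):
    """Return True only if every load command lies inside the declared region."""
    if len(data) < HEADER_SIZE:
        return False
    fields = struct.unpack_from(HEADER_FMT, data, 0)
    magic, ncmds, sizeofcmds = fields[0], fields[4], fields[5]
    if magic != MH_MAGIC_64:
        return False
    end = HEADER_SIZE + sizeofcmds
    if end > len(data):                     # commands would run past the file
        return False
    offset = HEADER_SIZE
    for _ in range(ncmds):
        if offset + 8 > end:                # not even room for cmd + cmdsize
            return False
        _cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmdsize < 8 or offset + cmdsize > end:
            return False
        offset += cmdsize
    return True

def fake_image(cmdsize_claimed):
    """Build a tiny fake image with one 16-byte load command (test data only)."""
    header = struct.pack(HEADER_FMT, MH_MAGIC_64, 0, 0, 2, 1, 16, 0, 0)
    command = struct.pack("<2I", 0x1B, cmdsize_claimed) + b"\0" * 8
    return header + command

good = load_commands_in_bounds(fake_image(16))
bad = load_commands_in_bounds(fake_image(4096))  # claims to extend past the end
print(good, bad)
```

A loader that skipped such checks would follow the oversized cmdsize into memory that does not belong to the header, which is the kind of malformed input bounds checking is meant to reject.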

Shared cache

  • The shared cache was first introduced in iOS 3.1 and macOS Snow Leopard as a complete replacement for prebinding

  • The shared cache is a single file that contains most of the system dylibs. Because they are merged into a single file, they can be optimized together.

    • All text segments (__TEXT) and data segments (__DATA) are rearranged and the entire symbol table is rewritten to reduce size, so that fewer regions need to be mapped in each process. This also allows binary segments to be packed together, saving a lot of RAM

    • It is essentially a prelinker for dylibs, and the RAM savings are significant: roughly 500 MB to 1 GB at run time on a typical iOS system

    • Data structures that dyld and Objective-C use at run time can also be pre-generated, so they do not have to be built at launch, which saves more RAM and time

  • On macOS, the shared cache is generated locally by running the dyld shared cache update, which greatly improves system performance

dyld 2 workflow

dyld 2 is purely in-process: it executes inside the application's process, which means dyld 2 can only start doing its work once the application has been launched

The dyld 2 workflow consists of the following steps

  • 1. dyld initialization: the main code is in dyldbootstrap::start, which then calls dyld::_main. dyld::_main contains most of the code and is the core of dyld's loading process;
  • 2. Check and prepare the environment: obtain the binary path, check the environment configuration, and parse information such as the main binary's image header
  • 3. Instantiate the image loader for the main binary and check whether the main binary matches the dyld version
  • 4. Check whether the shared cache is already mapped; if not, map the shared cache first
  • 5. Check DYLD_INSERT_LIBRARIES; if it is set, load the inserted dynamic libraries (instantiate their image loaders)
  • 6. Perform the link operation: the dependent dynamic libraries are loaded recursively first (the libraries are sorted so that depended-on libraries always come first), and symbol binding is performed at this stage, along with the rebase and binding operations;
  • 7. Run the initializers: Objective-C +load methods and C constructor functions are executed at this stage;
  • 8. Read the LC_MAIN load command from the Mach-O file to get the address of the program's entry point, and call main
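Step 6 above says that dependencies are loaded recursively and sorted so that depended-on libraries come first. That ordering can be sketched as a depth-first traversal that records each image only after everything it depends on. The dependency graph below is hypothetical and far smaller than a real app's; the function is an illustration of the ordering rule, not dyld's actual implementation.

```python
def load_order(root, dependencies):
    """Return images in an order where every dependency precedes its user.

    `dependencies` maps an image name to the list of images it links against.
    """
    ordered, visited = [], set()

    def visit(image):
        if image in visited:        # already handled (also guards against cycles)
            return
        visited.add(image)
        for dep in dependencies.get(image, []):
            visit(dep)              # handle what this image depends on first
        ordered.append(image)       # then the image itself

    visit(root)
    return ordered

# Hypothetical dependency graph for illustration only.
deps = {
    "MyApp": ["UIKit", "Foundation"],
    "UIKit": ["Foundation", "libobjc"],
    "Foundation": ["libobjc", "libSystem"],
    "libobjc": ["libSystem"],
    "libSystem": [],
}
order = load_order("MyApp", deps)
print(order)
# ['libSystem', 'libobjc', 'Foundation', 'UIKit', 'MyApp']
```

The same order is a natural one for running initializers in step 7, since a library's initializer can then assume that the libraries it depends on are already set up.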

Simplified version

  • ① Parse the Mach-O file to find its dependent libraries, then recursively find the libraries those depend on, forming a dependency graph of dynamic libraries. Most iOS apps depend on hundreds of dynamic libraries (most of them system libraries), so this step involves a lot of work.

  • ② Map the Mach-O files into the process's address space

  • ③ Perform symbol lookups

  • ④ Rebase and bind: because the app is loaded at a randomized address, every pointer must have the base address applied to it

  • ⑤ Run the initializers, and then call main()
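Rebasing and binding in step ④ can be modeled with plain dictionaries. In this sketch, rebasing adds the slide to pointers that refer to locations inside the image itself, and binding fills a pointer with the run-time address of a symbol exported by another image. The slot offsets, addresses, slide, and the symbol name _printf are made-up values chosen only to make the arithmetic visible.

```python
def rebase(slots, rebase_offsets, slide):
    """Add the slide to every slot that holds an address inside the image itself."""
    for offset in rebase_offsets:
        slots[offset] += slide

def bind(slots, bind_entries, exported_symbols):
    """Fill each slot with the run-time address of a symbol from another image."""
    for offset, symbol in bind_entries:
        if symbol not in exported_symbols:
            raise LookupError(f"symbol not found: {symbol}")
        slots[offset] = exported_symbols[symbol]

# Hypothetical image data: slot offset -> pointer value written at link time.
slots = {0x10: 0x100003F20, 0x18: 0x100004000, 0x20: 0}
rebase_offsets = [0x10, 0x18]                     # pointers into this image
bind_entries = [(0x20, "_printf")]                # pointer to an external symbol
exported_symbols = {"_printf": 0x7FFF20001234}    # made-up run-time address

slide = 0x2C000                                   # made-up ASLR slide
rebase(slots, rebase_offsets, slide)
bind(slots, bind_entries, exported_symbols)
print({hex(k): hex(v) for k, v in slots.items()})
```

Rebasing needs only the slide, while binding needs a symbol lookup in another image, which is why symbol lookup shows up later as one of the expensive, cacheable steps.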

dyld 3 (2017-present)

  • dyld 3 is a new dynamic linker announced at WWDC 2017. It completely changes the concept of dynamic linking. It was announced as the default for most macOS system applications and as the default for all system applications on Apple's 2017 OS platforms.

  • dyld 3 first shipped in iOS 11 in 2017, where it was used to optimize the system libraries.

  • In iOS 13, dyld 3 fully replaced dyld 2. Because dyld 3 is fully compatible with dyld 2 and exposes the same API, developers in most cases need no extra adaptation and the transition is smooth.

Why was dyld 2 redesigned into dyld 3?

The redesign of dyld was driven mainly by the following considerations

  • Performance: launch should be as fast as possible

  • Security: security features were retrofitted onto dyld 2, but it is hard to add security after the fact, and despite a lot of work the goal was difficult to reach

  • Reliability and testability: Apple ships a number of good testing frameworks, such as XCTest, but they rely on low-level features of the dynamic linker to insert their libraries into a process. They therefore cannot be used to test the existing dyld code, which makes dyld's security and performance features hard to test

How was dyld 2 improved and optimized into dyld 3?

Directions for improvement and optimization

From the dyld 2 workflow above, we know how dyld 2 executes. It can be improved and optimized in the following two ways:

  • Identify the security-sensitive parts

    • Parsing Mach-O headers and finding dependencies are security-sensitive

    • A maliciously modified Mach-O header can enable certain attacks;

    • If your app uses @rpath (run-path search paths), it can be subverted by maliciously modifying the paths or by inserting libraries at specific locations.

  • Identify the resource-intensive parts (the cacheable parts)

    • Symbol lookup is one of them: a symbol in a given library is always at the same offset within that library unless a software update or a change to the library on disk occurs (i.e., the symbol offset is fixed);

dyld 2 improvements and optimizations

The changes from dyld 2 to dyld 3 are mainly these: the security-sensitive parts and the resource-intensive parts are moved upstream; their results are written to disk as a closure that serves as a cache; and the closure is then used inside the app process.

dyld 3 components/workflow

The dyld 3 workflow is divided into three parts, described below

Part 1: Out-of-process: Mach-O parser

The out-of-process Mach-O parser and compiler is an ordinary background program (daemon), so it can use the normal testing infrastructure, which improves testability.

The first part does the following work outside the app process:

  • Resolve all search paths, @rpaths, and environment variables, since they can affect launch

  • Parse the Mach-O binaries

  • Perform symbol lookups

  • Use these results to create the launch closure

Part 2: In-process: engine

The in-process engine resides in the app's memory. With dyld 3, launching an application no longer requires parsing Mach-O headers or performing symbol lookups. Since those are the time-consuming operations, launch speed improves greatly.

The second part does the following work inside the app process:

  • Validate the launch closure

  • Map in the dylibs and jump to main

Part 3: Launch closure cache

This part is the launch closure caching service. Most launches are served from the cache and never invoke the out-of-process Mach-O parser and compiler. A launch closure is also much simpler than a Mach-O file: it is a memory-mapped file that needs no complicated parsing and is easy to validate, which makes it fast

  • Launch closures for system applications are built directly into the shared cache

  • For third-party applications, the launch closure is built during app installation or system update, because the system libraries have changed at that point

  • By default, on iOS, tvOS, and watchOS, launch closures are pre-built before the app is ever run.

  • On macOS, apps can be sideloaded (installed from outside the App Store), so if necessary the in-process engine can make a remote procedure call (RPC) out to the daemon on first launch and use the cached closure afterwards.

In short, dyld 3 handles a lot of time-consuming lookup, computation, and I/O ahead of time, which greatly improves launch speed.

Launch closure

This is a newly introduced concept that refers to all the information an app needs during startup. For example, what dynamic link libraries the app uses, the offsets of each symbol, where the code signature is, and so on.
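The split between building a closure ahead of time and cheaply validating it at launch can be sketched as follows. This is a toy model under stated assumptions: a real launch closure is a binary, memory-mapped format, whereas this sketch uses JSON; SHA-256 over the file contents stands in for whatever dyld actually uses to detect that a binary has changed; and the app name, library name, and symbol offset are hypothetical.

```python
import hashlib
import json

def fingerprint(binary_bytes):
    """Stand-in for whatever identifies a binary on disk (a hash is assumed here)."""
    return hashlib.sha256(binary_bytes).hexdigest()

def build_closure(app_name, binaries, dependencies, symbol_offsets):
    """Out-of-process step: do the expensive work once and record the results."""
    return {
        "app": app_name,
        "dependencies": dependencies[app_name],
        "symbol_offsets": symbol_offsets,
        "fingerprints": {name: fingerprint(data) for name, data in binaries.items()},
    }

def closure_is_valid(closure, binaries):
    """In-process step: cheap check that nothing changed since the closure was built."""
    return all(
        name in binaries and fingerprint(binaries[name]) == digest
        for name, digest in closure["fingerprints"].items()
    )

# Hypothetical inputs for illustration only.
binaries = {"MyApp": b"app-v1", "libFoo.dylib": b"foo-v1"}
dependencies = {"MyApp": ["libFoo.dylib"]}
symbol_offsets = {"libFoo.dylib": {"_foo": 0x1F40}}

closure = build_closure("MyApp", binaries, dependencies, symbol_offsets)
cached = json.dumps(closure)                 # what would be written to disk

valid_before = closure_is_valid(json.loads(cached), binaries)
binaries["libFoo.dylib"] = b"foo-v2"         # simulate a library update on disk
valid_after = closure_is_valid(json.loads(cached), binaries)
print(valid_before, valid_after)
```

When validation fails, as in the second check, the closure has to be rebuilt, which matches the point that closures are regenerated when libraries change.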

Missing symbols in dyld 3

  • dyld 2 uses lazy symbol binding by default

  • In dyld 3, the results of symbol resolution are already in the launch closure before the app starts, so lazy symbols are no longer needed.

  • As a result, dyld 2 and dyld 3 behave differently when a symbol is missing

    • In dyld 2, the app crashes the first time the missing symbol is called

    • In dyld 3, a missing symbol causes the app to crash as soon as it launches
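The difference in behavior can be illustrated with a small simulation. Both functions below are models written for this article, not dyld code: the lazy model resolves a symbol only when it is first called, while the eager model requires every import to be resolved before the program starts. The symbol names and the call sequence are hypothetical.

```python
def launch_lazy(exported, calls):
    """dyld 2 style model: resolve a symbol only when it is first called."""
    log = ["launched"]
    bound = {}
    for name in calls:
        if name not in bound:
            if name not in exported:
                log.append(f"crash on first call to {name}")
                return log
            bound[name] = exported[name]
        log.append(f"called {name}")
    return log

def launch_eager(imports, exported, calls):
    """dyld 3 style model: every import must already be resolved at launch."""
    missing = [name for name in imports if name not in exported]
    if missing:
        return [f"crash at launch: missing {missing[0]}"]
    return ["launched"] + [f"called {name}" for name in calls]

imports = ["_foo", "_bar"]            # symbols the app references
exported = {"_foo": 0x1000}           # "_bar" is missing from the libraries
calls = ["_foo", "_foo", "_bar"]      # order in which the app uses them

lazy_log = launch_lazy(exported, calls)
eager_log = launch_eager(imports, exported, calls)
print(lazy_log)
print(eager_log)
```

In the lazy model the program runs until the missing symbol is actually needed, so a rarely used code path can hide the problem; in the eager model the problem is visible on the very first launch.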

Conclusion

  • dyld 2 workflow

    • Parse the Mach-O headers

    • Find dependent libraries

    • Map the Mach-O files into the address space

    • Perform symbol lookups

    • Rebase and bind (required because of ASLR)

    • Run all initializers

    • Execute main

  • dyld 3 workflow

    • Out-of-process: the Mach-O header parsing and symbol lookups of dyld 2 are moved out of process, and the results are stored on disk as a launch closure

    • In-process: validate the launch closure, map the dylibs, and execute main

    • Launch closure caching service
