
MachO file

MachO is short for Mach Object file format. It is the executable file format on macOS and iOS, similar to the PE (Portable Executable) format on Windows. Common MachO files on macOS and iOS include .o, .a, .dylib, Framework, dyld, and .dSYM files.

The MachO file format is as follows:

  • Header: contains general information about the binary: byte order, architecture type, number of load commands, processor, and file type.
  • Load commands: a table with many entries, including the location of regions, the symbol table, the dynamic symbol table, and so on.
      LC_SEGMENT_64            Maps a segment of the file (32-bit or 64-bit) into the process address space
      LC_DYLD_INFO_ONLY        Dynamic linking information
      LC_SYMTAB                Symbol table address
      LC_DYSYMTAB              Dynamic symbol table address
      LC_LOAD_DYLINKER         Loads dyld
      LC_UUID                  UUID of the file
      LC_VERSION_MIN_MACOSX    Minimum supported operating system version
      LC_SOURCE_VERSION        Source code version
      LC_MAIN                  Sets the program entry address and stack size
      LC_LOAD_DYLIB            Paths of dependent libraries, including third-party libraries
      LC_FUNCTION_STARTS       Function start address table
      LC_CODE_SIGNATURE        Code signature
  • Data: mainly holds the code and the data. MachO organizes data in Segments, and a Segment can contain zero or more Sections. Each Segment is described by its own load command. A Section within a Segment can hold code, constants, or other kinds of data. When the file is loaded into memory, the mapping is also done Segment by Segment.
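To make the relationship between Segments and Sections concrete, here is a minimal Swift sketch. The Section and Segment types, the locate function, and every offset and size in sampleLayout are invented for illustration; they are not taken from a real binary, and real MachO sections carry more fields than shown.

```swift
// Illustrative model only: a section is a named byte range inside the file.
struct Section {
    let name: String
    let fileOffset: UInt64
    let size: UInt64
}

// A segment groups zero or more sections.
struct Segment {
    let name: String
    let sections: [Section]
}

// Returns the segment and section names that cover `fileOffset`, if any.
func locate(fileOffset: UInt64, in segments: [Segment]) -> (segment: String, section: String)? {
    for segment in segments {
        for section in segment.sections
        where fileOffset >= section.fileOffset
            && fileOffset < section.fileOffset + section.size {
            return (segment: segment.name, section: section.name)
        }
    }
    return nil
}

// Made-up layout: the offsets and sizes are placeholders.
let sampleLayout = [
    Segment(name: "__PAGEZERO", sections: []),
    Segment(name: "__TEXT", sections: [
        Section(name: "__text", fileOffset: 0x1000, size: 0x2000),
        Section(name: "__cstring", fileOffset: 0x3000, size: 0x100),
    ]),
]

if let hit = locate(fileOffset: 0x3010, in: sampleLayout) {
    print("\(hit.segment),\(hit.section)")  // prints "__TEXT,__cstring"
}
```

The same kind of lookup is what a MachO viewer does when it shows which section a given file offset belongs to.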

Introduction to MachO files

Let’s take a look at the MachO file generated by the above code:

The Header

  • Magic Number: identifies whether the file is 32-bit or 64-bit;
  • CPU Type: the current CPU type, here arm64;
  • File Type: the current file type, here an executable file (MH_EXECUTE);
  • Number of Load Commands: the number of load commands to be loaded;
  • Size of Load Commands: the total size of the load commands;
  • Flags: flag bits;
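The header fields listed above can be decoded from the first 32 bytes of a 64-bit MachO file. The following Swift sketch shows one way to do it. The field order follows mach_header_64 from <mach-o/loader.h>; the type and function names are our own, cpuType and cpuSubtype are simplified to UInt32, and the load command count, size, and flags in sampleHeader are placeholder values rather than data from a real binary.

```swift
// Simplified model of the 64-bit MachO header (eight 32-bit fields).
struct MachHeader64 {
    let magic: UInt32                 // 0xFEEDFACF marks a 64-bit MachO
    let cpuType: UInt32               // 0x0100000C is arm64
    let cpuSubtype: UInt32
    let fileType: UInt32              // 0x2 is MH_EXECUTE
    let numberOfLoadCommands: UInt32
    let sizeOfLoadCommands: UInt32
    let flags: UInt32
    let reserved: UInt32
}

// Reads a 32-bit little-endian value starting at `offset`.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

// Returns nil when the data is too short or the magic number does not match.
func parseHeader(_ bytes: [UInt8]) -> MachHeader64? {
    guard bytes.count >= 32 else { return nil }
    let f = (0..<8).map { readUInt32LE(bytes, at: $0 * 4) }
    guard f[0] == 0xFEED_FACF else { return nil }
    return MachHeader64(magic: f[0], cpuType: f[1], cpuSubtype: f[2], fileType: f[3],
                        numberOfLoadCommands: f[4], sizeOfLoadCommands: f[5],
                        flags: f[6], reserved: f[7])
}

// Placeholder header bytes, stored little-endian as they would be on disk.
let sampleHeader: [UInt8] = [
    0xCF, 0xFA, 0xED, 0xFE,  // magic
    0x0C, 0x00, 0x00, 0x01,  // CPU type: arm64
    0x00, 0x00, 0x00, 0x00,  // CPU subtype
    0x02, 0x00, 0x00, 0x00,  // file type: MH_EXECUTE
    0x16, 0x00, 0x00, 0x00,  // number of load commands (placeholder)
    0x58, 0x0B, 0x00, 0x00,  // size of load commands (placeholder)
    0x85, 0x00, 0x20, 0x00,  // flags (placeholder)
    0x00, 0x00, 0x00, 0x00,  // reserved
]

if let parsed = parseHeader(sampleHeader) {
    print("magic: 0x" + String(parsed.magic, radix: 16))
    print("load commands: \(parsed.numberOfLoadCommands)")
}
```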

Binary instructions for code

A hard-coded string

__objc_classlist records the Objective-C (OC) classes

__swift5_types records Swift classes and structs

So __swift5_types holds the descriptor information for the Swift class (TargetClassDescriptor). Starting from that address, we can calculate our way to the methods of the class.

Verify function address

  1. Because the file is little-endian, the bytes must be read in reverse order. The value read is 0xFFFFFB8C; adding the offset 0xBB8C gives 0x10000B718.
  2. Because virtual memory addresses start at 0x1000...., we subtract that base address to get 0xB718, and then look up 0xB718 in the MachO file:

  1. 0xB718 is the start of the TargetClassDescriptor structure of our Teacher class. According to the definition of this structure, offsetting by twelve 4-byte fields brings us to size and the vTable:

  1. The vTable follows size, so the first entry of the vTable is at 0xB740 plus 12 bytes, which is exactly 0xB74C. This is the address of our teach function in the MachO file.
  2. Use the image list command to get the load address of the program (which includes the ASLR slide), then add 0xB74C to get the address of teach in memory:

**0x0000000102188000** + 0xB74C = 0x10219374C. This is the memory address of the teach function while our program is running.
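The arithmetic in the steps above can be checked with a short Swift sketch. All numeric values come from the walkthrough except assumedImageBase, which is our assumption of 0x100000000 for the base written as 0x1000....; the variable names are ours. The sketch also shows that reading the stored value as a signed 32-bit offset gives the same result without subtracting a base.

```swift
// Bytes in file order (little-endian); reversed, they read 0xFFFFFB8C.
let rawBytes: [UInt8] = [0x8C, 0xFB, 0xFF, 0xFF]
let stored = rawBytes.reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }

let fieldOffset: UInt64 = 0xBB8C             // offset added in step 1
let assumedImageBase: UInt64 = 0x1_0000_0000 // assumption for "0x1000...."

// Route used in the text: add as unsigned numbers, then drop the base.
let unsignedSum = UInt64(stored) + fieldOffset
let descriptorOffset = unsignedSum - assumedImageBase

// Equivalent route: treat the stored value as a signed 32-bit offset.
let signedOffset = Int64(Int32(bitPattern: stored))
let viaSignedOffset = UInt64(Int64(fieldOffset) + signedOffset)

// Twelve 4-byte fields lead to size; the vTable starts 4 bytes later.
let sizeFieldOffset = descriptorOffset + 12 * 4
let firstVTableEntryOffset = sizeFieldOffset + 4

// Load address reported by image list, plus the file offset of the entry.
let loadAddress: UInt64 = 0x0000_0001_0218_8000
let teachDescriptorAddress = loadAddress + firstVTableEntryOffset

print(String(unsignedSum, radix: 16))            // prints "10000b718"
print(String(descriptorOffset, radix: 16))       // prints "b718"
print(String(firstVTableEntryOffset, radix: 16)) // prints "b74c"
print(String(teachDescriptorAddress, radix: 16)) // prints "10219374c"
```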

  1. The data structure of a Swift method in memory is as follows:

So the 0x10219374C we calculated is the start address of the data structure for the teach method. To find imp, we still have to skip Flags (4 bytes). Note that Impl in this structure is a relative pointer that stores an offset, so we then add that offset to find imp.
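The layout just described, a 4-byte Flags field followed by a 4-byte relative Impl offset, can be modelled in Swift roughly as follows. MethodDescriptorSketch and its property names are hypothetical stand-ins for illustration, not the runtime's actual declaration.

```swift
// Hypothetical model of one vTable entry: two 4-byte fields.
struct MethodDescriptorSketch {
    var flags: UInt32      // kind and attribute bits of the method
    var implOffset: Int32  // signed distance from this field to the implementation
}

// The whole entry is 8 bytes, and the offset field sits 4 bytes in,
// which is why Flags has to be skipped before the offset is applied.
print(MemoryLayout<MethodDescriptorSketch>.size)                          // prints "8"
print(MemoryLayout<MethodDescriptorSketch>.offset(of: \.implOffset) ?? -1) // prints "4"
```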

  1. Flags takes 4 bytes. In the MachO file we can see that the second group of four bytes at 0xB74C, namely 0xFFFFAEF0, is the offset. So the address of teach in memory is: 0x10219374C + 0x4 + 0xFFFFAEF0 = 0x20218E640.
  2. Subtracting the virtual memory base address 0x1000.... from 0x20218E640 gives 0x10218E640. (This is the same as treating 0xFFFFAEF0 as a signed 32-bit offset, -0x5110, and adding it directly.)

  1. Through the steps above, we have verified that Swift class methods are indeed stored in the vTable.
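The final address calculation can be reproduced with a small Swift sketch. The addresses and the raw offset are the ones from the walkthrough; resolveRelativePointer is a helper name of our own, and the base 0x100000000 used in the second route is our assumption for the value written as 0x1000.....

```swift
// Resolves a 32-bit relative pointer: target = address of the offset field + signed offset.
func resolveRelativePointer(fieldAddress: UInt64, rawOffset: UInt32) -> UInt64 {
    let signed = Int64(Int32(bitPattern: rawOffset))
    return UInt64(Int64(fieldAddress) + signed)
}

let descriptorAddress: UInt64 = 0x1_0219_374C  // start of the teach entry at runtime
let implFieldAddress = descriptorAddress + 4   // skip the 4-byte Flags field
let imp = resolveRelativePointer(fieldAddress: implFieldAddress, rawOffset: 0xFFFF_AEF0)

// Route used in the text: unsigned addition, then subtract the assumed base.
let unsignedRoute: UInt64 = descriptorAddress + 4 + 0xFFFF_AEF0 - 0x1_0000_0000

print(String(imp, radix: 16))            // prints "10218e640"
print(String(unsignedRoute, radix: 16))  // prints "10218e640"
```

Both routes agree because subtracting 0x100000000 after an unsigned 32-bit addition has the same effect as sign-extending the offset first.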

So why does the assembly call methods at metadata offsets 0x50, 0x58, and 0x60?

initClassVTable():

While the program is running, a vtableOffset is obtained, and the vTable is then loaded into the metadata at that vtableOffset.
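The idea can be simulated in Swift with an array standing in for the metadata words. This is only a toy model of our understanding of initClassVTable, namely that each implementation is stored at slot vtableOffset + i. The function name buildMetadataSlots, the total slot count, the method names teach1 and teach2, and the value 10 for the offset (chosen because 10 words of 8 bytes is 0x50) are all assumptions for illustration.

```swift
typealias Impl = () -> String

// Toy stand-in for class metadata: an array of pointer-sized slots.
func buildMetadataSlots(totalWords: Int, vtableOffset: Int, methods: [Impl]) -> [Impl?] {
    var words = [Impl?](repeating: nil, count: totalWords)
    for (i, method) in methods.enumerated() {
        words[vtableOffset + i] = method  // one slot per method, in declaration order
    }
    return words
}

let wordSize = 8              // pointer size on a 64-bit platform
let assumedVTableOffset = 10  // in words; an assumption that matches 0x50
let sampleMethods: [Impl] = [{ "teach" }, { "teach1" }, { "teach2" }]
let slots = buildMetadataSlots(totalWords: 13,
                               vtableOffset: assumedVTableOffset,
                               methods: sampleMethods)

for i in 0..<sampleMethods.count {
    let byteOffset = (assumedVTableOffset + i) * wordSize
    print("method \(i) is at metadata + 0x" + String(byteOffset, radix: 16))
}
// prints offsets 0x50, 0x58 and 0x60
```

Under these assumptions, consecutive methods land in consecutive 8-byte slots, which is why the call sites differ by 8 from one method to the next.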

Struct method dispatch

Change the class to a struct:

Run the program and view the assembly instructions:

In a struct, methods are called directly by address, which means struct methods are statically dispatched: once compilation is done, the address of the function is fixed. This is because structs are value types that cannot be inherited, so their methods belong to them alone. There is no need to set aside extra memory for a function table, so the call is made statically.
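A minimal sketch of such a struct is shown below. The age property and the return values are our own additions for illustration. The size check shows that an instance is only as large as its stored property, so nothing in the instance points at a function table.

```swift
struct Teacher {
    var age: Int = 18

    // With no inheritance possible, the compiler can bind these calls at compile time.
    func teach() -> String { "teach" }
    func teach1() -> String { "teach1" }
}

let teacher = Teacher()
print(teacher.teach())                                           // prints "teach"
print(MemoryLayout<Teacher>.size == MemoryLayout<Int>.size)      // prints "true"
```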

We can also verify this in the Swift source code, in StructContextDescriptorBuilder:

You can see that nothing related to the vTable is called inside it.

Method calls in extensions

Calling a method in a struct's extension

Add an extension to the struct and create a teach3 method:

View the assembly instructions:

Struct extension methods are still called directly (static dispatch).
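A sketch of the struct with an extension might look like this. The method bodies are placeholders of our own; only the names Teacher, teach, and teach3 follow the text.

```swift
struct Teacher {
    func teach() -> String { "teach" }
}

extension Teacher {
    // Declared in an extension, but called the same way as teach():
    // the target is known at compile time.
    func teach3() -> String { "teach3" }
}

let teacher = Teacher()
print(teacher.teach3())  // prints "teach3"
```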

Calling a method in a class's extension

Let's change the struct to a class and look at how teach3 is called:

As can be seen from the assembly instructions:

The class extension method is also called directly (static dispatch).

We create a subclass of Teacher, SubTeacher, and add a teach4 method:

Let's generate the SIL file and look at its vTable:

The methods of Teacher also appear in the vTable of the SubTeacher class, but the extension method does not.
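The difference can be observed from Swift code as well. In the sketch below, the override of teach in SubTeacher and the returned strings are our own additions, included so that the two kinds of dispatch produce visibly different results.

```swift
class Teacher {
    func teach() -> String { "Teacher.teach" }
}

extension Teacher {
    // Not part of the vTable, so subclasses cannot override it.
    func teach3() -> String { "Teacher.teach3" }
}

class SubTeacher: Teacher {
    override func teach() -> String { "SubTeacher.teach" }
    func teach4() -> String { "SubTeacher.teach4" }
    // Trying to override teach3 here is rejected by the compiler.
}

let teacher: Teacher = SubTeacher()
print(teacher.teach())   // prints "SubTeacher.teach"  (looked up through the vTable)
print(teacher.teach3())  // prints "Teacher.teach3"    (bound statically)
```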

Summary of method dispatch

Type                   Dispatch                   Extension
Value type (struct)    Static dispatch            Static dispatch
Class                  Function table dispatch    Static dispatch
NSObject subclass      Function table dispatch    Static dispatch

Keywords that affect how functions are dispatched

  • final: a function marked with the final keyword cannot be overridden. It uses static dispatch, no longer appears in the vtable, and is not visible to the objc runtime.
  • dynamic: any function can be marked with the dynamic keyword. It gives dynamic behavior to functions of non-objc classes and value types, but they are still dispatched through the function table.
  • @objc: this keyword exposes a Swift function to the objc runtime, but the function is still dispatched through the function table.
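The final and dynamic keywords can be tried out in pure Swift, as sketched below. The class and method names reuse the Teacher example, while the method bodies and the override in SubTeacher are our own illustration. @objc is left out here because it needs the Objective-C runtime.

```swift
class Teacher {
    // final: cannot be overridden, so the call can be bound statically.
    final func teach() -> String { "Teacher.teach" }

    // dynamic: marks the method as dynamically replaceable.
    dynamic func teach1() -> String { "Teacher.teach1" }

    // No modifier: an ordinary overridable method.
    func teach2() -> String { "Teacher.teach2" }
}

class SubTeacher: Teacher {
    // Overriding teach() here would not compile, because it is final.
    override func teach2() -> String { "SubTeacher.teach2" }
}

let teacher: Teacher = SubTeacher()
print(teacher.teach())   // prints "Teacher.teach"
print(teacher.teach1())  // prints "Teacher.teach1"
print(teacher.teach2())  // prints "SubTeacher.teach2"
```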

In actual development, @objc is often used together with dynamic. In that case the method is dispatched through objc_msgSend message sending, and method swizzling can be used. However, for the method to be callable from OC, the class must inherit from NSObject.
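A minimal sketch of that combination follows. It only compiles its body on platforms with the Objective-C runtime, which is why it is wrapped in a canImport check; the method body is a placeholder of our own.

```swift
#if canImport(ObjectiveC)
import Foundation

// Inheriting from NSObject makes the class usable from OC code.
class Teacher: NSObject {
    // @objc exposes the method to the objc runtime; dynamic makes calls go
    // through message sending, which is what method swizzling relies on.
    @objc dynamic func teach() -> String { "teach" }
}

let teacher = Teacher()
// The method can now be found by selector at runtime.
print(teacher.responds(to: #selector(Teacher.teach)))
#endif
```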