• The Self-Cultivation of an iOS Programmer (1): Compiling and Linking
• What’s in Mach-O
• The Self-Cultivation of an iOS Programmer (3): Static Linking of Mach-O Files
• The Self-Cultivation of an iOS Programmer (4): Loading Executable Files
• The Self-Cultivation of an iOS Programmer (5): Dynamic Linking of Mach-O Files
• The Self-Cultivation of an iOS Programmer (6): Dynamic Linking in Practice: How fishhook Works
• The Self-Cultivation of an iOS Programmer (7): Static Linking in Practice: How Static Library Instrumentation Works
• The Self-Cultivation of an iOS Programmer (8): Memory

The overall structure of a Mach-O file was described earlier in this series. One of the most important steps in producing one is static linking: the linker combines all the input “.o” files into the output executable, which can simply be thought of as a Mach-O file.

Suppose we have only two modules, “a.c” and “b.c”, with the following code:

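/* a.c */
extern int shared;            /* defined in b.c */
void swap(int *a, int *b);    /* defined in b.c */

int main() {
    int a = 100;
    swap(&a, &shared);
    return 0;
}

/* b.c */
int shared = 1;

/* XOR swap; a and b must point to different objects */
void swap(int *a, int *b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}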

Taking the x86 architecture as an example, first use the clang command to compile “a.c” and “b.c” into the object files “a.o” and “b.o” respectively:

clang -c a.c b.c

After compiling, we have the two object files a.o and b.o. As the code shows, b.c defines two global symbols, “shared” and “swap”, and a.c defines one global symbol, “main”. a.c references “shared” and “swap” from b.c. The next step is to link a.o and b.o into the executable “ab”.

Functions and variables are collectively referred to as symbols.

Space and address allocation

The linker first scans all the input object files, obtains the length, attributes and position of each section, and collects every symbol table into one global symbol table. In this step the linker merges the sections of all the object files, calculates the merged lengths and positions, and establishes the mapping between input and output.

In fact, an object file uses the same Mach-O structure as an executable file.

Symbol resolution and relocation

Before analyzing symbol resolution and relocation, let’s look at how the two external symbols “shared” and “swap” are used in a.o. MachOView shows the disassembly of a.o:

The leftmost column is the offset of each instruction in virtual memory, and each row represents one instruction. The red box marks the two references to “shared” and “swap”. The reference to shared uses a mov instruction whose opcode, 0x488B35, takes up 3 bytes and is followed by a 4-byte address field. The call to swap uses a call instruction whose opcode, 0xE8, marks a near call with relative displacement; the four bytes after it are the offset of the called function relative to the instruction that follows the call. The fields at 0x1E3 and 0x1F5 hold only placeholder addresses for “shared” and “swap”: the compiler does not know their real addresses at compile time, so it temporarily fills both fields with 0x00000000. After space and address allocation the linker knows the virtual address of every symbol and can then correct each instruction that needs to be relocated. Next, link a.o and b.o into the executable ab:

clang a.o b.o -o ab

Disassembling ab shows the result after relocation:

After correction, the address fields for “shared” and “swap” hold 0x000000A1 and 0x0000000F respectively (stored in little-endian byte order). Take swap as an example: the call is a near call with relative displacement, so the value after the opcode is an offset relative to the next instruction, the xor. That instruction is at 0xF71, and 0xF71 + 0x0F = 0xF80, which is exactly the address of swap.

Relocation table

So how does the linker know which instructions need to be adjusted? The object file contains a relocation table that records the information needed for relocation, stored in the file’s Relocations section. The Relocations section of a.o looks like this:

Each location to be relocated is described by a relocation entry. As shown, a.o has two relocation entries, both in the (__TEXT,__text) section. Compared with the disassembly of a.o, 0x1D and 0xB are exactly the address fields of the call instruction and the mov instruction in the code section.

The relocation table can be understood as an array of relocation entries. The structure of a relocation entry is defined in the Mach-O header <mach-o/reloc.h>:

struct relocation_info {
   int32_t	r_address;	/* offset in the section to what is being
				   relocated */
   uint32_t     r_symbolnum:24,	/* symbol index if r_extern == 1 or section
				   ordinal if r_extern == 0 */
		r_pcrel:1, 	/* was relocated pc relative already */
		r_length:2,	/* 0=byte, 1=word, 2=long, 3=quad */
		r_extern:1,	/* does not include value of sym referenced */
		r_type:4;	/* if not 0, machine specific relocation type */
};

The two important fields are r_address and r_symbolnum. r_address is the offset, within the section, of the location to be relocated. r_symbolnum is the index used to find the symbol in the symbol table (when r_extern is 1).

Symbol resolution

In general, linking is needed because symbols used in one object file are defined in other object files, so the files must be linked together. For example, if we link a.o on its own, the linker finds that the symbols shared and swap are undefined and cannot complete the link:

Undefined symbols for architecture x86_64:
  "_shared", referenced from:
      _main in a.o
  "_swap", referenced from:
      _main in a.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This is one of the most common errors programmers run into: a symbol is undefined at link time. From a programmer’s point of view, symbol resolution makes up the bulk of the linking process.

In fact, relocation goes hand in hand with symbol resolution. Each object file may define symbols or reference symbols defined in other object files; for example, a.o references “shared” and “swap” from b.o. Symbols are stored in the symbol table, an array whose entry structure is defined in the Mach-O header <mach-o/nlist.h>:

struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
  • n_strx: index into the string table.
  • n_sect: section number.
  • n_value: the address of the symbol.

For example, here is a.o’s symbol table:

You can see that every symbol except main has the type N_UNDF, meaning “undefined”; these are the same symbols that appear in the relocation table. After the linker has scanned all the input files, each of these undefined symbols must be found in the global symbol table, otherwise the linker reports an undefined symbol error. The following figure compares the “shared” and “swap” entries in the symbol table before and after linking:

Symbols that are undefined in the object file a.o have concrete values once they are linked into the executable ab.

Relocation

During relocation, the r_address of each relocation entry in the relocation table marks a reference to a symbol. To relocate such a reference, the linker must determine the address of the target symbol: it uses the entry’s r_symbolnum index to look the symbol up in the global symbol table. Once found, the symbol’s address is written back into the referencing location according to the applicable rule (for example, as a relative displacement for a call instruction), which completes the relocation.

Hooking objc_msgSend with static instrumentation

Static instrumentation replaces objc_msgSend during static linking. The concrete approach is to implement a hook_msgSend function in assembly in the main project, and then replace objc_msgSend with hook_msgSend in the string table of a static library (for example, a Pod library) in order to monitor Objective-C method calls.

String table

Each object file or static library has a string table that serves the symbol table, storing names such as segment names, variable names and function names. Because strings vary in length, they cannot be represented by a fixed-size structure the way symbols are. Instead, all strings are stored together in one table and each string is referenced by its offset in that table. This offset is the n_strx value through which a symbol table entry finds its symbol name.

Let’s take the main function of the ab executable as an example and use MachOView to trace how a symbol table entry finds its symbol name through this index:

The value in the red box is n_strx from the symbol table: the name of main is at offset 0x16 in the string table, or 22 in decimal. Now look at the contents of the string table: the hexadecimal bytes in the red box translate to the ASCII string _main, and its offset in the string table is exactly 22 bytes, matching the value stored in the symbol table.

Because main is defined in the file itself rather than externally, its symbol has a value. For an external function such as objc_msgSend the value would be 0: objc_msgSend is a library function that belongs to the Objective-C runtime, which is a dynamic library, and function addresses in dynamic libraries are bound during dynamic linking, which is discussed in a later article. The actual address of objc_msgSend is unknown when the Mach-O executable is generated, so this symbol is not relocated during static linking. If objc_msgSend is renamed to hook_msgSend before the main project and the static library are linked, the value in the symbol table after linking becomes the address of hook_msgSend.

Since a static library is just a collection of object files, the linker treats it no differently from object files. From the analysis above: when an object file (.o file) references an external symbol, that symbol’s state in the symbol table is N_UNDF and a relocation entry for it is added to the relocation table. After space and address allocation, the virtual addresses of the previously unknown symbols are determined. The linker then iterates over all relocation entries and corrects each location that needs relocation. With the symbol renamed, every call instruction that targeted objc_msgSend is corrected into a call to hook_msgSend.

For a concrete implementation, see the open source tool KKMagicHook.

References

Self-cultivation of the Programmer

juejin.cn/post/684490…

github.com/maniackk/KK…