This article is extended reading for fishhook: a consolidated study of the background knowledge that fishhook relies on. In practice, that means it is still a study of the Mach-O format and the dyld loading process. Once these fundamentals are solid, everything built on top of them becomes much easier to understand.

Overview of hooks

Hook: intercepting a function call (or other step) in a process in order to extend the functionality of a program or change its control flow. Typical uses of hooks on iOS include event tracking (instrumentation), crash protection, application hardening, and application isolation.

Why fishhook cannot hook custom functions (C functions)

Run the following test and you will see that the func function we define is not hooked by fishhook. (This is also a chance to learn how functions end up in different sections.)

#import "ViewController.h"
#import "fishhook.h"

// The original function
static void func(void) {
    printf("♻️♻️♻️ %s\n", __func__);
}

// The replacement function
static void hook_func(void) {
    printf("♻️♻️♻️ %s\n", __func__);
}

@interface ViewController ()

@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    
    NSLog(@"✳️✳️✳️ NSLog: %p", NSLog);
    
    func();

    // Try to hook func
    struct rebinding func_reb;
    func_reb.name = "func";
    func_reb.replacement = hook_func;
    func_reb.replaced = NULL;

    // Array of rebinding structs describing the functions to hook
    struct rebinding rebs[] = {func_reb};
    
    // Pass the array and the number of elements in it
    rebind_symbols(rebs, 1);
    
    func();
}

@end

// Console output:
// ♻️♻️♻️ func
// ♻️♻️♻️ func

fishhook can only hook functions that are reached through lazy and non-lazy symbol pointers, that is, functions imported from dynamically linked libraries. These pointers live in the (__DATA, __got)/(__DATA_CONST, __got), (__DATA, __la_symbol_ptr), and (__DATA, __nl_symbol_ptr) sections. Our custom functions, by contrast, live in the (__TEXT, __text) section, which holds the function definitions themselves rather than symbol pointers. The __DATA segment is readable and writable, whereas the __TEXT segment is readable and executable only. So a custom function in (__TEXT, __text) can be read and executed (called) but not overwritten, and it is called directly at its address instead of being bound dynamically through a symbol pointer. fishhook therefore cannot hook our custom function for two reasons: the lack of write permission and the direct call by address.

Let's verify in two different ways that our custom function really is in the __TEXT segment.

  • Use image list to get the load address of the current process's executable, then subtract it from the address of func to obtain the offset of func within the Mach-O executable. Then locate that offset in the Mach-O binary with MachOView and inspect its contents.

Set a breakpoint at the func(); call and run the program until it stops there. p func prints (void (*)()) $0 = 0x000000010027d520 (TEST_Fishhook`func at ViewController.m:12), so func is at 0x000000010027d520. For comparison, p NSLog prints the address of NSLog in the Foundation.framework dynamic library: (void (*)(NSString * _Nonnull __strong, ...)) $1 = 0x00007fff20805d0d (Foundation`NSLog). Next, image list -h prints a list of load addresses for the current process's executable and the libraries it depends on. The first entry belongs to the executable itself: [0] 0x000000010027c000. Evaluating p/x 0x000000010027d520-0x000000010027c000 in LLDB gives 0x0000000000001520, so func sits at offset 0x1520 in the Mach-O executable. MachOView confirms that this offset falls inside the (__TEXT, __text) section.

Going one step further and reading the instructions from offset 0x0000000000001520 onward, we find a call instruction: call 0x10000238c. This is the call to printf inside func. Offset 0x238c lies in the (__TEXT, __stubs) section, where MachOView shows the value _printf. (One takeaway here: functions that the current process uses from system dynamic libraries are bound dynamically, either when the process starts or when they are first called.)

  • Use Hopper Disassembler to disassemble the Mach-O executable of the current process and verify the function address found above. The address of the instructions matches the offset we calculated exactly. (Lazy binding of symbol pointers also comes into play here; we discuss it in the next section.)

The call to _func is identical before and after the fishhook rebinding, which confirms that fishhook cannot hook our custom function.

Next, double-click the _func label to jump to the _func function. Its offset is 0x1520, as we calculated by hand above, and its call to printf is actually a call to imp___stubs__printf. printf comes from a system dynamic library (MachOView lists it under libSystem.B.dylib, while at run time it is reported in libsystem_c.dylib, which libSystem re-exports). As you can see, printf is called in a completely different way from func. Our custom func can be located directly in the Mach-O binary of the current process, whereas a function that the process uses from a system dynamic library has only a stub in that binary, which is bound through dyld_stub_binder after the program starts.

Double-click imp___stubs__printf to jump to its definition and see that its offset is indeed 0x238c. Then double-click _printf_ptr.

This takes us to (__DATA, __la_symbol_ptr), where _printf_ptr is the pointer for the lazy symbol _printf. So the call to printf goes through the _printf lazy symbol pointer, unlike our custom function, which is called directly at its address. The _printf symbol pointer is located in the __DATA segment, whose contents can be both read and modified, and this is an important reason why fishhook can hook functions from dynamically linked libraries, such as printf.

Default targets of lazy and non-lazy symbol pointers

Let's continue where the previous section left off. The real printf function is defined in a system dynamic library (libSystem.B.dylib), so printf is an external symbol for the current process. Because of ASLR, a dynamic library is loaded at a different address on each launch, so the address of printf cannot be stored in the executable ahead of time. In the executable's symbol table, the nlist_64 entry of such an undefined external symbol has an n_value of 0, and when the executable is loaded dyld must bind the symbol: it finds the address of printf in memory and writes it into the corresponding symbol pointer. Lazy symbol pointers are bound the first time they are used, while non-lazy symbol pointers are bound at launch. In the Mach-O binary on disk, a lazy symbol pointer initially leads to dyld_stub_binder, while a non-lazy symbol pointer is initially 0x0. Let's confirm this with MachOView.

In MachOView, every non-lazy symbol pointer is 0000000000000000 by default, and the n_value of the objc_msgSend symbol is 0000000000000000 as well. When the program starts, dyld binds all non-lazy symbol pointers of the current process so that they hold the correct addresses.

By default, the lazy symbol pointers lead to dyld_stub_binder.

In MachOView, the _printf lazy symbol pointer holds 0x10000246E, and the pointers before and after it hold nearby values such as 0x100002400, 0x100002478, and 0x100002482.

0x10000246E lies in the (__TEXT, __stub_helper) section. Viewing the bytes there as assembly reveals a jump instruction: jmp 0x1000023A0. Likewise, 0x2478 is the target of the _strcmp symbol pointer, 0x245A of the _malloc symbol pointer, and 0x2450 of the _free symbol pointer. Scrolling up and down in the (__TEXT, __stub_helper) section shows that every entry a lazy symbol pointer initially targets ends with a jump to 0x1000023A0.

At 0x1000023A0 there is a jmp qword ptr [rip + 0xEC71] instruction whose target is dyld_stub_binder. dyld_stub_binder is the dyld function that performs stub binding, that is, symbol binding.

Dynamic binding of lazy symbol pointers (dyld_stub_binder)

As noted above, the symbol pointers live in the (__DATA, __got)/(__DATA_CONST, __got), (__DATA, __la_symbol_ptr), and (__DATA, __nl_symbol_ptr) sections. __la_symbol_ptr is the lazy binding pointer table, while __got and __nl_symbol_ptr form the non-lazy binding pointer table. The lazy binding pointer table is special, as its name suggests: when dyld loads the Mach-O executable, the symbol pointers in this table are not bound immediately (their target addresses are not yet determined). Instead, each one is bound the first time the function it belongs to is called, a mechanism analogous to the Procedure Linkage Table (PLT). Let's verify this with the following sample code.

#include <stdio.h>

int main(int argc, char * argv[]) {
    
    printf("♻️♻️♻️ %s\n", "hello world");
    printf("♻️♻️♻️ %s\n", "hello desgard");
    
    return 0;
}

Then we use Hopper Disassembler to look at its assembly implementation as follows:

_main:
0000000100002160         push       rbp
0000000100002161         mov        rbp, rsp
0000000100002164         sub        rsp, 0x20
0000000100002168         mov        dword [rbp+var_4], 0x0
000000010000216f         mov        dword [rbp+var_8], edi
0000000100002172         mov        qword [rbp+var_10], rsi
0000000100002176         lea        rdi, qword [aXe2x99xbbxefxb] ; argument "format" for method imp___stubs__printf, "\\xE2\\x99\\xBB\\xEF\\xB8\\x8F\\xE2\\x99\\xBB\\xEF\\xB8\\x8F\\xE2\\x99\\xBB\\xEF\\xB8\\x8F %s \\n"
000000010000217d         lea        rsi, qword [aHelloWorld]     ; "hello world"
0000000100002184         mov        al, 0x0
0000000100002186         call       imp___stubs__printf          ; printf
000000010000218b         lea        rdi, qword [aXe2x99xbbxefxb] ; argument "format" for method imp___stubs__printf, "\\xE2\\x99\\xBB\\xEF\\xB8\\x8F\\xE2\\x99\\xBB\\xEF\\xB8\\x8F\\xE2\\x99\\xBB\\xEF\\xB8\\x8F %s \\n"
0000000100002192         lea        rsi, qword [aHelloDesgard]   ; "hello desgard"
0000000100002199         mov        dword [rbp+var_14], eax
000000010000219c         mov        al, 0x0
000000010000219e         call       imp___stubs__printf          ; printf
00000001000021a3         xor        ecx, ecx
00000001000021a5         mov        dword [rbp+var_18], eax
00000001000021a8         mov        eax, ecx
00000001000021aa         add        rsp, 0x20
00000001000021ae         pop        rbp
00000001000021af         ret
   ; endp

You can see that each printf call is compiled into call imp___stubs__printf. Double-click imp___stubs__printf to see its definition:

imp___stubs__printf:        // printf
0000000100002452         jmp        qword [_printf_ptr] ; _printf_ptr, _printf_ptr,_printf, CODE XREF=_main+38, _main+62
   ; endp

Then double click on _printf_ptr to see the following:

_printf_ptr:
0000000100008028         extern     _printf ; DATA XREF=imp___stubs__printf

From the listings above, the stub imp___stubs__printf jumps through the pointer stored at 0x0000000100008028, so the _printf pointer in the (__DATA, __la_symbol_ptr) section is at offset 0x8028 in the Mach-O executable. Now set a breakpoint on each of the two printf calls and debug with LLDB:

First, use the image list command to get the load address of the current process's executable in memory: 0x000000010b941000.

(lldb) image list
[  0] 7FE17B7A-D271-3133-9A56-A92A3780D8BC 0x000000010b941000 /Users/hmc/Library/Developer/Xcode/DerivedData/TEST_Fishhook-guvclcyaszalpmdldofnoxqksebw/Build/Products/Debug-iphonesimulator/TEST_Fishhook.app/TEST_Fishhook
...

Next, use memory read 0x000000010b941000+0x8028 to see what the _printf symbol pointer holds. Since the platform is little-endian, the bytes are read in reverse order, so the _printf symbol pointer points to 0x010b94349a.

(lldb) memory read 0x000000010b941000+0x8028
0x10b949028: 9a 34 94 0b 01 00 00 00 81 00 00 00 28 00 00 00  .4..........(...
0x10b949038: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  (...............
(lldb)

We then use dis -s 0x010b94349a to disassemble the bytes at that address and find a jump instruction: jmp 0x10b943458.

(lldb) dis -s 0x010b94349a
    0x10b94349a: pushq  $0x91
    0x10b94349f: jmp    0x10b943458
    0x10b9434a4: jbe    0x10b94350f               ; ""
    0x10b9434a6: ja     0x10b9434ed               ; "ector:"
    0x10b9434a9: imull  $0x72006461, 0x6f(%rsp,%rcx,2), %esp ; imm = 0x72006461
    0x10b9434b1: outsl  (%rsi), %dx
    0x10b9434b2: insb   %dx, %es:(%rdi)
    0x10b9434b3: addb   %ch, %gs:0x6e(%rcx)
(lldb) 

Disassembling 0x10b943458 in turn shows a jump to dyld_stub_binder, which binds the symbol and then calls the corresponding function.

(lldb) dis -s 0x10b943458
    0x10b943458: leaq   0x7019(%rip), %r11        ; _dyld_private
    0x10b94345f: pushq  %r11
    0x10b943461: jmpq   *0x1ba9(%rip)             ; (void *)0x00007fff2025cbb4: dyld_stub_binder
    0x10b943467: nop    
    0x10b943468: pushq  $0x0
    0x10b94346d: jmp    0x10b943458
    0x10b943472: pushq  $0x12
(lldb) 

Now continue execution until the second breakpoint is hit. The console shows ♻️♻️♻️ hello world, which means the first printf call has completed. Run memory read 0x000000010b941000+0x8028 again to see what the _printf symbol pointer holds now.

(lldb) memory read 0x000000010b941000+0x8028
0x10b949028: e8 f4 0b 20 ff 7f 00 00 81 00 00 00 28 00 00 00  ... ........(...
0x10b949038: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  (...............
(lldb)

The _printf symbol pointer now holds 0x7fff200bf4e8. Disassembling that address with dis -s 0x7fff200bf4e8 shows that the pointer now refers to the printf function in libsystem_c.dylib.

(lldb) dis -s 0x7fff200bf4e8
libsystem_c.dylib`printf:
    0x7fff200bf4e8 <+0>:  pushq  %rbp
    0x7fff200bf4e9 <+1>:  movq   %rsp, %rbp
    0x7fff200bf4ec <+4>:  pushq  %r14
    0x7fff200bf4ee <+6>:  pushq  %rbx
    0x7fff200bf4ef <+7>:  subq   $0xd0, %rsp
    0x7fff200bf4f6 <+14>: movq   %rdi, %r14
    0x7fff200bf4f9 <+17>: testb  %al, %al
    0x7fff200bf4fb <+19>: je     0x7fff200bf526            ; <+62>
    0x7fff200bf4fd <+21>: movaps %xmm0, -0xb0(%rbp)
(lldb) 

In short, a lazy symbol pointer is bound through dyld_stub_binder on the first call, and every later call goes directly to the function that the lazy symbol pointer now refers to.

A brief look at ASLR

ASLR (Address Space Layout Randomization) is a security technique that protects against buffer overflow attacks. With ASLR, the starting address of an executable image (a PE file on Windows, a Mach-O file on iOS and macOS) changes randomly each time it is loaded into memory. ASLR has been implemented in most major operating systems, such as Windows Vista, Linux 2.6.12, Mac OS X 10.7, iOS 4.3, and Android 4.0.

Simply put, ASLR makes exploits based on buffer overflows significantly harder and so improves system security. For programmers who do not work in security or reverse engineering, however, it adds some friction when debugging, because addresses differ from run to run, and keeping variables under control is one of the basic principles of debugging.
