1. Hook classification

runtime

Using the Runtime feature of OC, the mapping between SEL (method selector) and IMP (method implementation) is changed dynamically, which changes what an OC method call actually executes. Mainly used for OC methods.
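To make the SEL-to-IMP idea concrete, here is a minimal sketch in plain C. It does not use the real Objective-C runtime: the names toy_method, toy_send and toy_exchange are hypothetical stand-ins, and a selector is reduced to a string. It only shows that swapping the stored function pointers changes what a call by name executes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's types: a selector is just a name,
   an IMP is a plain function pointer. */
typedef int (*toy_imp)(int);

struct toy_method {
    const char *sel;  /* selector name */
    toy_imp imp;      /* implementation currently bound to it */
};

static int original_impl(int x) { return x + 1; }
static int hooked_impl(int x)   { return x * 100; }

static struct toy_method toy_methods[] = {
    { "compute:",        original_impl },
    { "hooked_compute:", hooked_impl   },
};

/* Look up a method entry by selector name; NULL if absent. */
static struct toy_method *toy_find(const char *sel) {
    for (size_t i = 0; i < sizeof toy_methods / sizeof toy_methods[0]; i++)
        if (strcmp(toy_methods[i].sel, sel) == 0)
            return &toy_methods[i];
    return NULL;
}

/* Dispatch: resolve the selector to its current IMP, then call it. */
static int toy_send(const char *sel, int arg) {
    struct toy_method *m = toy_find(sel);
    assert(m != NULL);
    return m->imp(arg);
}

/* Swap the implementations of two selectors, in the spirit of
   method_exchangeImplementations. */
static void toy_exchange(const char *a, const char *b) {
    struct toy_method *ma = toy_find(a), *mb = toy_find(b);
    assert(ma != NULL && mb != NULL);
    toy_imp tmp = ma->imp;
    ma->imp = mb->imp;
    mb->imp = tmp;
}
```

After toy_exchange("compute:", "hooked_compute:"), a call to toy_send("compute:", 2) runs hooked_impl, while the old implementation stays reachable under the other selector.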

fishHook

fishHook is a tool that dynamically rebinds symbols in a linked MachO file. It relies on how MachO files are loaded: by modifying the pointers in the lazy and non-lazy symbol pointer tables, it hooks C functions.

Cydia Substrate

Formerly known as Mobile Substrate, it is mainly used to hook OC methods, C functions and function addresses. It is not limited to iOS and can also be used on Android.

2. fishHook

  • Static: C is static; the address of a function is determined at compile time.
  • Dynamic: OC is dynamic; the address of the function is only looked up at run time, when the method is actually called.

Let’s take a look at fishHook. First, there’s a structure

struct rebinding {
    const char *name;    // Name of the function to hook, as a C string
    void *replacement;   // Address of the new function (the name of a C function is a function pointer)
    void **replaced;     // Pointer to the variable that receives the original function's address
};

There is also a function that performs the rebinding for any number of C functions:

  • Parameter 1: an array of rebinding structures, so several functions can be swapped at once. Each structure carries the name of the function to swap, the address of the new function, and where to store the address of the old one.
  • Parameter 2: the length of the array
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);

Basic usage – Hook system function NSLog

//--------------- Hook NSLog --------------------
// Function pointer that will hold the address of the original NSLog
static void (*sys_nslog)(NSString *format, ...);

// Define a new function
void myNSLog(NSString *format, ...) {
    format = [format stringByAppendingString:@"\n "];
    // Call the original NSLog (variadic arguments are not forwarded in this simple demo)
    sys_nslog(format);
}

+ (void)handlNSLog {
    struct rebinding nslog;
    nslog.name = "NSLog";
    // The name of the function is a pointer to the function
    nslog.replacement = myNSLog;
    // When fishHook runs, it obtains the NSLog address dynamically and stores it in sys_nslog
    nslog.replaced = (void *)&sys_nslog;
    struct rebinding rebs[1] = {nslog};
    // arg1: array of rebinding structures; arg2: length of the array
    rebind_symbols(rebs, 1);
}

Why is replaced a pointer to a pointer, and why is it given the value (void *)&sys_nslog?

  • NSLog lives in the Foundation framework, and Foundation is loaded at a different memory address on each device, so the real address of the function cannot be known at compile time.
  • fishHook, however, can look up the address of NSLog at run time.
  • fishHook then needs somewhere to store the address it finds. That is why replaced is a pointer to a pointer: it points at our function pointer variable, and fishHook writes the original function's address into that variable.
  • When we hook a function, we often want the new function to call the original one as well; that is when the saved function pointer is used.
  • OC is object-oriented: data is passed around as objects, and an object reference is itself a pointer, so the callee can change the object it refers to.
  • C passes arguments by value, so a function that receives a function pointer only gets a copy and cannot change the caller's variable. To allow that, you pass the address of the variable.
  • sys_nslog is a function pointer, so passing it directly would only hand over a copy of its value. Passing its address, (void *)&sys_nslog, lets fishHook store the address of NSLog in it.
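The pass-by-value point in the list above can be checked with a few lines of plain C. This is only a sketch: int_fn, store_by_value and store_by_address are hypothetical names, and store_by_address plays the role that the replaced field plays for fishHook.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(void);

static int real_function(void) { return 42; }

/* Receives the pointer by value: assigning to `out` only changes the local copy. */
static void store_by_value(int_fn out) {
    out = real_function;
    (void)out;  /* silence the unused-value warning; the caller sees nothing */
}

/* Receives the address of the caller's pointer, so the write is visible
   to the caller. This mirrors the role of `replaced` in struct rebinding. */
static void store_by_address(int_fn *out) {
    assert(out != NULL);
    *out = real_function;
}
```

Starting from int_fn saved = NULL, the variable is still NULL after store_by_value(saved), and only store_by_address(&saved) makes it callable.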

Hook custom functions

void func(const char *str) {
    NSLog(@"%s", str);
}

// Function pointer that will hold the address of the original func
static void (*old_func)(const char *str);

// Define a new function
void new_func(const char *str) {
    NSLog(@"%s + 1", str);
}

- (void)viewDidLoad {
    [super viewDidLoad];
    struct rebinding nslog;
    nslog.name = "func";
    nslog.replacement = new_func;
    nslog.replaced = (void *)&old_func;
    struct rebinding rebs[1] = {nslog};
    rebind_symbols(rebs, 1);
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    // Call the custom function we tried to hook
    func("ha ha!");
}

Print:

2020-11-05 20:59:26.343317+0800 001--fishHookDemo[33081:7724521] ha ha!

The hook did not take effect. Why?

  • Only dynamically bound functions can be swapped at run time. A custom function is static: its memory address is determined at compile time, inside our own MachO file.
  • System functions are dynamic, because iOS puts all the system libraries in the dyld shared cache to save memory.
  • The system dynamic libraries are loaded at a different memory address on each device, so the compiler cannot know the real address of a system function at compile time and cannot call it directly.
  • This is where dyld comes in: it links the MachO file when the app is loaded.

PIC technology

  • When the compiler builds a program, it produces a binary file, and the code of every C function ends up as machine code in that binary.
  • Because C is static, every function call in the binary has to resolve to a memory address, but the addresses of system functions such as NSLog cannot be known at compile time. Apple solves this with PIC (position-independent code), which makes calls to system functions dynamic.

When we analyze the MachO file, we find a region after the Load Commands, in the data part of the file, that is filled with one symbol after another. It sits in the data segment because the data segment is readable and writable.

At compile time the real address of a system function is not available, so each system function is compiled into a symbol in the MachO file, and the call to the function is turned into a call through that symbol.

When dyld loads the application into memory, it reads the MachO file, reads the system dynamic libraries it depends on from the Load Commands, and then performs a link step: the real address of each function in the system dynamic library is assigned to the corresponding symbol in the MachO file. This is called binding.

After binding, a call to a system function first goes to the symbol, reads the real function address from it, and then calls the system function.

Custom functions do not need this linking step because they are already inside the MachO file at compile time, so PIC is not used for them.

fishHook hooks system functions by rebinding these symbols. The symbol in MachO originally points to the system function; fishHook changes the symbol to point to the custom function and copies the system function's address into our function pointer variable.
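The rebinding idea can be simulated without fishHook or MachO at all. In the sketch below, toy_slots stands in for the writable symbol area and toy_rebind for the rebinding step; both are hypothetical names invented for illustration and are not fishHook's implementation. The point is that call sites go through a slot, so rewriting the slot redirects every call, and a name with no slot (like a custom function) cannot be rebound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Stand-in for a function living in a system library. */
static int system_twice(int x) { return 2 * x; }

/* Toy "symbol area": one writable slot per imported name. */
struct toy_slot { const char *name; fn_t target; };
static struct toy_slot toy_slots[] = {
    { "_twice", system_twice },
};

/* Call sites never call system_twice directly; they go through the slot. */
static int call_twice(int x) {
    assert(toy_slots[0].target != NULL);
    return toy_slots[0].target(x);
}

/* Toy rebinding: find the slot by name, hand the old target back through
   `replaced`, then point the slot at `replacement`. Returns 0 on success. */
static int toy_rebind(const char *name, fn_t replacement, fn_t *replaced) {
    for (size_t i = 0; i < sizeof toy_slots / sizeof toy_slots[0]; i++) {
        if (strcmp(toy_slots[i].name, name) == 0) {
            if (replaced != NULL) *replaced = toy_slots[i].target;
            toy_slots[i].target = replacement;
            return 0;
        }
    }
    return -1;  /* no slot with that name, e.g. a function that was never imported */
}

static fn_t original_twice;  /* filled in by toy_rebind */

/* Replacement that forwards to the original and then adds its own work. */
static int my_twice(int x) { return original_twice(x) + 1; }
```

Calling toy_rebind("_twice", my_twice, &original_twice) makes call_twice(5) go through my_twice, which still reaches system_twice through the saved pointer.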

We can look at the MachO file; these are all symbols in MachO.

We can see that a symbol is essentially the name of the function, so with fishHook we pass in the function name and it finds the corresponding symbol by that name.

Analysis of symbol binding process

First we’ll write the following code in our Demo

Then we compile it, take the resulting executable file and open it with MachOView.

We see that the symbol table is divided into Lazy Symbol Pointers and Non-Lazy Symbol Pointers, and that NSLog is inside the Lazy Symbol Pointers. Our guess is that the symbol is not yet linked to the NSLog function in the system library before the first NSLog call.

  • The Offset of the NSLog symbol is 00008010, that is, position 0x8010 counted from the start of the MachO file.
  • We print all loaded images with image list and find the start address of our own image in memory, which is 0x0000000102960000.

  • We now know the start address of our MachO file and the offset of the NSLog symbol, so the memory address of the symbol is 0x0000000102960000 + 0x8010 = 0x102968010.

  • Then, in Xcode's LLDB console, run memory read 0x102968010 to see the address stored in this symbol:

(lldb) memory read 0x102968010
0x102968010: 14 69 96 02 01 00 00 00 78 83 ac 99 01 00 00 00  .i......x.......
0x102968020: 68 1d 0d 9b 01 00 00 00 98 69 96 02 01 00 00 00  h........i......
(lldb)
  • Pointers are 8 bytes and iOS is little-endian, so the bytes are read from right to left: the address stored in the symbol is 0x0000000102966914.
  • Let's look at what is at this address using the disassemble command dis -s.

It is obviously not the NSLog function: the address lies inside our own image. In the MachO file NSLog is in the lazy symbol table, which means the real NSLog function is not bound to the symbol until NSLog is called for the first time.
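The address arithmetic and the little-endian read from the steps above can be double-checked with a small C sketch. The helper names slot_address and read_le are hypothetical; the constants are the ones from the LLDB session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of a symbol slot = load address of the image + offset of the slot. */
static uint64_t slot_address(uint64_t image_base, uint64_t offset) {
    return image_base + offset;
}

/* Decode up to 8 bytes as a little-endian integer:
   the first byte in memory is the least significant one. */
static uint64_t read_le(const uint8_t *bytes, size_t n) {
    assert(n <= sizeof(uint64_t));
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}
```

With the base 0x102960000 and the offset 0x8010, slot_address gives 0x102968010, and decoding the first eight bytes of the dump (14 69 96 02 01 00 00 00) gives 0x102966914, which is 0x6914 bytes into our own image.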

  • We continue past the first breakpoint, so NSLog has now executed once, and read the value stored in the symbol again:
(lldb) memory read 0x102968010
0x102968010: bc 46 ad 99 01 00 00 00 78 83 ac 99 01 00 00 00  .F......x.......
0x102968020: 68 1d 0d 9b 01 00 00 00 98 69 96 02 01 00 00 00  h........i......
(lldb)

Notice that the value has changed. We disassemble the new address with dis -s:

(lldb) dis -s 0x0199ad46bc
Foundation`NSLog:
    0x199ad46bc <+0>:  pacibsp
    0x199ad46c0 <+4>:  sub    sp, sp, #0x20             ; =0x20
    0x199ad46c4 <+8>:  stp    x29, x30, [sp, #0x10]
    0x199ad46c8 <+12>: add    x29, sp, #0x10            ; =0x10
    0x199ad46cc <+16>: adrp   x8, 321987
    0x199ad46d0 <+20>: ldr    x8, [x8, #0x80]
    0x199ad46d4 <+24>: ldr    x8, [x8]
    0x199ad46d8 <+28>: str    x8, [sp, #0x8]
(lldb)

The address stored in the symbol is now the NSLog function in Foundation: once NSLog has been executed, the symbol is bound to the real NSLog function address.
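The behavior observed above, where the slot changes only after the first call, can be modeled in a few lines of C. This is a toy model under stated assumptions, not dyld's actual mechanism: resolver_stub, lazy_slot and system_square are hypothetical names, and the "lookup" is hard-coded.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

/* Stand-in for the real function in a system library. */
static int system_square(int x) { return x * x; }

static int resolver_stub(int x);        /* forward declaration */
static fn_t lazy_slot = resolver_stub;  /* before the first call, the slot targets the stub */
static int bind_count = 0;              /* how many times binding ran */

/* The first call lands here: bind the slot to the real function,
   then forward the call so the caller still gets its result. */
static int resolver_stub(int x) {
    bind_count++;
    lazy_slot = system_square;
    return lazy_slot(x);
}

/* Every call site goes through the slot. */
static int call_square(int x) {
    assert(lazy_slot != NULL);
    return lazy_slot(x);
}
```

Before the first call_square the slot holds resolver_stub; afterwards it holds system_square, and later calls skip the stub entirely, so binding runs exactly once.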

  • Now let's look at the fishHook rebinding. We continue past the next breakpoint, after rebind_symbols has run, and read the address stored in the symbol again:
(lldb) memory read 0x102968010
0x102968010: a8 5b 96 02 01 00 00 00 78 83 ac 99 01 00 00 00  .[......x.......
0x102968020: 68 1d 0d 9b 01 00 00 00 a4 2d 8b a1 01 00 00 00  h........-......
(lldb) dis -s 0x0102965ba8
fishHookDemo`myNSLog:
    0x102965ba8 <+0>:  sub    sp, sp, #0x30             ; =0x30 
    0x102965bac <+4>:  stp    x29, x30, [sp, #0x20]
    0x102965bb0 <+8>:  add    x29, sp, #0x20            ; =0x20 
    0x102965bb4 <+12>: sub    x8, x29, #0x8             ; =0x8 
    0x102965bb8 <+16>: mov    x9, #0x0
    0x102965bbc <+20>: stur   x9, [x29, #-0x8]
    0x102965bc0 <+24>: str    x0, [sp, #0x10]
    0x102965bc4 <+28>: mov    x0, x8
(lldb) 

The address stored in the symbol has changed to myNSLog, which completes the symbol rebinding. This is also why fishHook can only hook system functions and not custom functions: a custom function is called directly and has no symbol to rebind.

How does fishHook find symbols through strings

But how does fishHook find the symbol for the corresponding function? We only give fishHook a function name. The fishHook repository on GitHub includes a diagram of this.

That diagram shows how fishHook finds a symbol from a string, but it is rather abstract, so let's use MachOView to walk through it visually:

  • In the MachO file we see that NSLog is the first entry of the Lazy Symbol Pointers table.

  • The Lazy Symbol Pointers table corresponds entry by entry to the Dynamic Symbol Table, and the first entry we see in that table is also NSLog.

  • The Dynamic Symbol Table in turn refers to another table, the Symbol Table. In the Dynamic Symbol Table the Data for NSLog is 8A, which is 138 in decimal. Let's look at item 138 of the Symbol Table:

  • Item 138 is indeed NSLog. Its Data in the Symbol Table is A1, and the description says String Table Index, so A1 is an offset into the String Table. Let's look at the String Table next.

  • The String Table holds string constants and starts at 0000CEE8. The offset of NSLog is A1, so the name is at 0xCEE8 + 0xA1 = 0xCF89. Each row shows 16 bytes, so we find the row that starts at CF88; its second byte is where the name begins: 5F, 4E, 53, 4C, 6F, ..., the ASCII codes of the string _NSLog.

  • This is how fishHook finds the symbol when we pass in a string: for each entry in the symbol pointer tables it follows this chain to the name in the String Table and compares that name with the one we passed in.
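The lookup chain in the list above can be sketched with three small arrays. The layouts below are deliberately reduced and hypothetical (real MachO uses struct nlist_64 entries and section headers), and skipping the leading underscore before comparing is an assumption about how C symbol names are matched. The sketch only shows the direction of the walk: pointer slot, then Dynamic Symbol Table entry, then Symbol Table entry, then String Table offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy Symbol Table entry: only the offset of the name in the String Table. */
struct toy_symbol { uint32_t strx; };

/* Toy String Table. Offsets: 1 -> "_printf", 9 -> "_NSLog", 16 -> "_malloc". */
static const char toy_strtab[] = "\0_printf\0_NSLog\0_malloc";

/* Toy Symbol Table: index 0 = malloc, 1 = printf, 2 = NSLog. */
static const struct toy_symbol toy_symtab[] = { {16}, {1}, {9} };

/* Toy Dynamic Symbol Table: entry i is the Symbol Table index for pointer slot i. */
static const uint32_t toy_indirect[] = { 2, 0 };  /* slot 0 -> NSLog, slot 1 -> malloc */

/* The pointer slots themselves (the lazy symbol pointers). */
static void *toy_lazy_pointers[2];

#define TOY_SLOT_COUNT (sizeof toy_indirect / sizeof toy_indirect[0])

/* Name of the symbol behind a slot: slot -> indirect -> symtab -> strtab. */
static const char *slot_name(size_t slot) {
    assert(slot < TOY_SLOT_COUNT);
    uint32_t sym_index = toy_indirect[slot];
    uint32_t strx = toy_symtab[sym_index].strx;
    return &toy_strtab[strx];
}

/* Find the slot whose symbol matches `name` (without the leading underscore).
   Returns NULL if no pointer slot refers to that name. */
static void **find_slot(const char *name) {
    for (size_t i = 0; i < TOY_SLOT_COUNT; i++) {
        const char *sym = slot_name(i);
        if (sym[0] == '_' && strcmp(sym + 1, name) == 0)
            return &toy_lazy_pointers[i];
    }
    return NULL;
}
```

Here find_slot("NSLog") returns the address of slot 0, which is the location a rebinding would overwrite, while find_slot("printf") returns NULL because no pointer slot refers to that symbol.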

Preliminary protection against hooks

  • As we learned with dynamic injection, functions in an IPA package can be hooked by injecting a framework into it.
  • We know that fishHook can hook system functions.
  • So we can hook the runtime functions that hooks rely on. That lets us detect who is hooking our methods and apply some protection.

Code:

// Function pointer that will hold the address of the original method_exchangeImplementations
void (*exchangeP)(Method _Nonnull m1, Method _Nonnull m2);

// Replacement function: report the hook attempt instead of exchanging
void myExchange(Method _Nonnull m1, Method _Nonnull m2) {
    NSLog(@"HOOK detected!!");
}

+ (void)load {
    // Protection code
    struct rebinding bd;
    bd.name = "method_exchangeImplementations";
    bd.replacement = myExchange;
    bd.replaced = (void *)&exchangeP;
    struct rebinding rebs[1] = {bd};
    rebind_symbols(rebs, 1);

    // Attack code!
    // getIMP / setIMP
    Method old = class_getInstanceMethod(self, @selector(btnClick1:));
    Method newMethod = class_getInstanceMethod(self, @selector(click1Hook:));
    method_exchangeImplementations(old, newMethod);
}

- (void)click1Hook:(id)sender {
    NSLog(@"HOOK successfully!!");
}

Note:

  • First, the protection code must run before the hook code, because system functions can only be monitored once the protection code has been installed. For example, we can put the protection code in a framework: an injected framework is loaded after the frameworks that the MachO itself links, so the protection code runs earlier.
  • Once we hook a system function with fishHook, calls through its symbol no longer reach the original. If we still need the original after hooking, we call it through the saved function pointer, such as exchangeP in the example above.
  • System functions are also called by the system frameworks, so blocking one outright may make the app unstable. We therefore cannot simply disable a system function for good. WeChat, for example, does not block hooks at the source; it only detects them, and if a hook is detected it bans the current account.
  • The protection code can itself be disabled by modifying the MachO file. We hook system functions by function name, so if someone takes the executable and changes a character of that name in the file, the protection code will fail.

Let's try it:

We open the executable with MachOView and go to the string area that holds the function name, for example the characters of method_exchangeImplementations.

Then we change the ASCII code of one character, changing 6E to 6F, which changes n to o.

That turns method_exchange... into method_exchaoge..., so the protection code can no longer find the symbol for method_exchangeImplementations and the protection is cracked. There are many similar techniques, but in essence they all require understanding the underlying principles.
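The effect of this one-byte patch on a name-based lookup can be shown in plain C. The buffer toy_name stands in for the name as stored in the executable, and guard_finds and patch_byte are hypothetical helpers; the leading underscore is an assumption about how the name is stored. Nothing here touches a real MachO file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy copy of the name as stored in the file, writable so it can be patched. */
static char toy_name[] = "_method_exchangeImplementations";

/* The guard's lookup: does the stored name still match the one it asks for? */
static int guard_finds(const char *wanted) {
    return toy_name[0] == '_' && strcmp(toy_name + 1, wanted) == 0;
}

/* Replace the first byte equal to `from` at or after `start` with `to`.
   Returns the patched index, or -1 if no such byte exists. */
static long patch_byte(size_t start, char from, char to) {
    assert(start < sizeof toy_name);
    for (size_t i = start; toy_name[i] != '\0'; i++) {
        if (toy_name[i] == from) {
            toy_name[i] = to;
            return (long)i;
        }
    }
    return -1;
}
```

Patching the first 0x6E ('n') to 0x6F ('o') turns the stored name into _method_exchaogeImplementations, after which guard_finds("method_exchangeImplementations") no longer matches.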