• fishhook Github

1. Fishhook principle

dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook rebinds these symbols by determining, for each symbol name passed to rebind_symbols, which locations need to be updated, and then writing the corresponding replacements there.

For a given image, the __DATA segment can contain two sections related to dynamic symbol binding: __nl_symbol_ptr and __la_symbol_ptr.

  • __nl_symbol_ptr is an array of pointers to non-lazily bound data (these pointers are bound when the library is loaded).

  • __la_symbol_ptr is an array of pointers to imported functions, usually filled in by a routine named dyld_stub_binder the first time the symbol is called (dyld can also be told to bind these pointers at launch).

To find the name of a symbol that corresponds to a particular location in one of these sections, we need to look through several layers of indirection.

  • For both sections, the section header (struct section, declared in <mach-o/loader.h>) provides an offset (in the reserved1 field) into the so-called indirect symbol table.

  • The indirect symbol table, located in the __LINKEDIT segment of the binary, is simply an array of indexes into the symbol table (also in __LINKEDIT), in the same order as the pointers in the non-lazy and lazy symbol sections. So, given struct section nl_symbol_ptr, the symbol table index for the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1].

  • The symbol table itself is an array of struct nlist (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT, where the actual symbol names are stored. Therefore, for each pointer in __nl_symbol_ptr and __la_symbol_ptr, we can find the corresponding symbol, then the corresponding string, compare it with the requested symbol names, and overwrite the pointer in the section with the replacement if there is a match.

2. Test code

#import "fishhook.h"

// ---------------- Hook NSLog ----------------
// Function pointer that will receive the original NSLog implementation
static void (*sys_nslog)(NSString *format, ...);

// The replacement function
void my_nslog(NSString *format, ...) {
    format = [format stringByAppendingString:@" Why are you here again \n"];
    // Call the original (the variadic arguments are not forwarded in this demo)
    sys_nslog(format);
}

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    
    NSLog(@"Here comes the log, buddy.");
    
    struct rebinding nslog;
    nslog.name = "NSLog";
    nslog.replacement = my_nslog;
    nslog.replaced = (void *)&sys_nslog;
    struct rebinding rebs[1] = {nslog};
    rebind_symbols(rebs, 1);
    
    NSLog(@"Here comes the log, buddy.");
}

@end

Running results:

2020-03-16 09:47:38.536892+0800 Demo[28657:5210895] log [28657:5210895] log [28657:5210895] Dude, you're back in town

3. Attaching MachOView to the process

MachOView brings up an input box where you enter the PID of the process to attach to.

You can find the PID quickly in Xcode's Debug Navigator (⌘ + 7). The PID of the process is shown there; enter it in the box above.

4. Reading the source code and verifying it with MachOView

Data definitions and initialization

struct rebindings_entry {
    struct rebinding *rebindings;
    size_t rebindings_nel;
    struct rebindings_entry *next;
};

static struct rebindings_entry *_rebindings_head;

// Allocate storage for the rebindings that were passed in
// and link it into the list as a new rebindings_entry node
static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
    // Allocate one rebindings_entry node
    struct rebindings_entry *new_entry = (struct rebindings_entry *) malloc(sizeof(struct rebindings_entry));
    if (!new_entry) {
        return -1;
    }
    // Allocate room for nel rebinding structs
    new_entry->rebindings = (struct rebinding *) malloc(sizeof(struct rebinding) * nel);
    if (!new_entry->rebindings) {
        free(new_entry);
        return -1;
    }
    // Copy the caller's rebindings into new_entry->rebindings
    memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
    // Record the element count
    new_entry->rebindings_nel = nel;
    // The new entry always goes to the front of the list
    new_entry->next = *rebindings_head;
    // Point rebindings_head at the new front
    *rebindings_head = new_entry;
    return 0;
}

This defines the rebindings_entry linked list. Each time a rebinding is requested, the struct rebinding rebindings[] array that was passed in is copied into a newly created rebindings_entry, and that entry is inserted at the head of the list.

Two public functions

static void _rebind_symbols_for_image(const struct mach_header *header, intptr_t slide) {
    // Find the corresponding symbol and rebind it
    rebind_symbols_for_image(_rebindings_head, header, slide);
}

// Use this function when the target Mach-O image is already known
int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel) {
    struct rebindings_entry *rebindings_head = NULL;
    int retval = prepend_rebindings(&rebindings_head, rebindings, rebindings_nel);
    rebind_symbols_for_image(rebindings_head, (const struct mach_header *) header, slide);
    if (rebindings_head) {
        free(rebindings_head->rebindings);
    }
    free(rebindings_head);
    return retval;
}

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
    int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
    if (retval < 0) {
        return retval;
    }
    // On the first call, register a callback for image loading (dyld also invokes it
    // right away for every image that is already loaded); on later calls, just walk
    // the images that are already loaded
    if (!_rebindings_head->next) {
        // Register _rebind_symbols_for_image as the add-image callback
        _dyld_register_func_for_add_image(_rebind_symbols_for_image);
    } else {
        // _dyld_image_count() returns the number of loaded images
        uint32_t c = _dyld_image_count();
        for (uint32_t i = 0; i < c; i++) {
            // _dyld_get_image_header(i) returns the header pointer of the i-th image
            // _dyld_get_image_vmaddr_slide(i) returns the slide of the i-th image
            _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
        }
    }
    return retval;
}

rebind_symbols_image and rebind_symbols are the two public functions for rebinding symbols. rebind_symbols_image rebinds symbols in one specified image only; rebind_symbols processes all images.

Either way, both end up calling rebind_symbols_for_image, which locates the relevant sections.

Locating the relevant sections

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
    Dl_info info;
    // Check that header belongs to an image loaded in this process; if not, return
    if (dladdr(header, &info) == 0) {
        return;
    }
    
    // Working variables
    segment_command_t *cur_seg_cmd;
    // The __LINKEDIT segment command from the Mach-O load commands
    segment_command_t *linkedit_segment = NULL;
    // The LC_SYMTAB load command
    struct symtab_command* symtab_cmd = NULL;
    // The LC_DYSYMTAB load command
    struct dysymtab_command* dysymtab_cmd = NULL;
    
    // Header address + sizeof(mach_header_t):
    // skip the mach_header to reach the first load command
    uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
    // Walk the load commands to find the three structures above
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        // A segment command (LC_SEGMENT_64 on 64-bit)
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            // Found __LINKEDIT
            if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
                linkedit_segment = cur_seg_cmd;
            }
        }
        // LC_SYMTAB: record symtab_cmd
        else if (cur_seg_cmd->cmd == LC_SYMTAB) {
            symtab_cmd = (struct symtab_command*)cur_seg_cmd;
        }
        // LC_DYSYMTAB: record dysymtab_cmd
        else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
            dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
        }
    }
    // Bail out if any of these is missing or the image has no indirect symbols:
    // there is nothing to rebind in this image
    if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment || !dysymtab_cmd->nindirectsyms) {
        return;
    }
    
    // Find base symbol/string table addresses
    // Base address against which the __LINKEDIT file offsets are resolved
    uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
    // Address of the symbol table in memory
    nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
    // Address of the string table in memory
    char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

    // Get indirect symbol table (array of uint32_t indices into symbol table)
    uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);
    // Again skip the mach_header to reach the first load command
    cur = (uintptr_t)header + sizeof(mach_header_t);
    // Walk the load commands again to find the sections whose symbols get rebound
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            // Skip every segment other than __DATA and __DATA_CONST
            if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
                strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
                continue;
            }
            // Iterate over all sections
            for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
                section_t *sect = (section_t *)(cur + sizeof(segment_command_t)) + j;
                // Lazy symbol pointer section (S_LAZY_SYMBOL_POINTERS)
                if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                    // Rebind the symbols in this section
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
                // Non-lazy symbol pointer section (S_NON_LAZY_SYMBOL_POINTERS)
                if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                    // Rebind the symbols in this section
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
            }
        }
    }
}

At the top, the address of the first load command is computed from the header pointer plus the header size. The loop then collects three data structures:

// The __LINKEDIT segment command from the Mach-O load commands
segment_command_t *linkedit_segment = NULL;
// The LC_SYMTAB load command
struct symtab_command* symtab_cmd = NULL;
// The LC_DYSYMTAB load command
struct dysymtab_command* dysymtab_cmd = NULL;

Here is the core code:

// Base address against which the __LINKEDIT file offsets are resolved
uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;

Here linkedit_segment->vmaddr = 4294995968 and linkedit_segment->fileoff = 28672. It may not be obvious that this yields a base address, so let's print the values in hexadecimal:

(lldb) p/x 4294995968
(long) $0 = 0x0000000100007000
(lldb) p/x 28672
(int) $1 = 0x00007000
(lldb) p/x 4294995968 - 28672
(long) $2 = 0x0000000100000000

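We can see that this expression simply computes the base address of the image in memory: vmaddr - fileoff is the base address before ASLR, and slide shifts it to where the image was actually loaded.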

// Get the real address for symbol_table
nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
// Get the real address of string_table
char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

The offsets of the symbol table and the string table come from struct symtab_command; adding the base address gives the addresses of the two tables in memory.

(lldb) p/x 0x0000000100000000 + 30200
(long) $3 = 0x00000001000075f8
(lldb) p/x 0x0000000100000000 + 33408
(long) $4 = 0x0000000100008280

MachOView confirms that both addresses are correct.

// Get the real address of indirect_symtab
uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

The offset of the indirect symbol table comes from struct dysymtab_command.

(lldb) p/x 0x0000000100000000 + 33224
(long) $5 = 0x00000001000081c8

The address of the indirect symbol table is also obtained.

The two sections related to dynamic symbol binding

// Again skip the mach_header to reach the first load command
cur = (uintptr_t)header + sizeof(mach_header_t);
// Walk the load commands again to find the sections whose symbols get rebound
for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
        // Skip every segment other than __DATA and __DATA_CONST
        if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
            strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
            continue;
        }
        // Iterate over all sections
        for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
            section_t *sect = (section_t *)(cur + sizeof(segment_command_t)) + j;
            // Lazy symbol pointer section (S_LAZY_SYMBOL_POINTERS)
            if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                // Rebind the symbols in this section
                perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
            }
            // Non-lazy symbol pointer section (S_NON_LAZY_SYMBOL_POINTERS)
            if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                // Rebind the symbols in this section
                perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
            }
        }
    }
}

For a given image, the __DATA segment contains two sections related to dynamic symbol binding: __nl_symbol_ptr and __la_symbol_ptr (the code also checks the __DATA_CONST segment). The loop finds these two sections and then rebinds their symbols.

Symbol rebinding

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
    // reserved1 is this section's starting index in the indirect symbol table,
    // so indirect_symtab + reserved1 points at the section's first entry
    uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
    // The section's pointer array in memory: slide + section->addr
    void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
    // Iterate over every pointer slot in the section
    for (uint i = 0; i < section->size / sizeof(void *); i++) {
        // Read the symbol table index stored in the indirect symbol table
        uint32_t symtab_index = indirect_symbol_indices[i];
        if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        // Look up the symbol table entry to get the offset of the name in the string table
        uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
        // Read the symbol name from the string table
        char *symbol_name = strtab + strtab_offset;
        // C symbol names in the string table start with "_", so a usable name has at least two characters
        bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
        struct rebindings_entry *cur = rebindings;
        // Walk the linked list of registered rebindings_entry nodes
        while (cur) {
            // Check every rebinding in this entry
            for (uint j = 0; j < cur->rebindings_nel; j++) {
                // Compare the requested name with symbol_name, skipping the leading "_"
                if (symbol_name_longer_than_1 &&
                    strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                    // If the caller wants the original and the slot does not already
                    // hold the replacement, save the current pointer into *replaced
                    if (cur->rebindings[j].replaced != NULL &&
                        indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                        *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                    }
                    // Write the replacement into the slot
                    indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                    goto symbol_loop;
                }
            }
            // Move on to the next entry in the list
            cur = cur->next;
        }
    symbol_loop:;
    }
}

This part follows the fishhook principle described in section 1:

  1. indirect_symtab + section->reserved1 gives the start of this section's entries in the indirect symbol table (indirect_symbol_indices).
  2. indirect_symbol_bindings is the array of function pointers that belongs to the section (for example __nl_symbol_ptr).
  3. Walk the indirect symbol table entries in order to get each symbol table index, read the corresponding symbol table structure, and take from it the offset into the string table.
  4. The string table address plus that offset is the address of the first character of the function name.
  5. Every function name in the string table starts with "_", so a name has at least 2 characters; &symbol_name[1] is the name with the leading "_" removed.
  6. Loop through the linked list of rebindings and compare each requested name with &symbol_name[1]. If they are equal, store the original function address in replaced and overwrite the pointer slot with the address of our replacement function.
