One, foreword

I have read many articles about lazy (delayed) binding, so I knew the roles of stub, stub_helper and la_symbol_ptr in broad strokes. For a call to an externally defined function, the first call goes through stub_helper, which looks up the function's address at runtime. That address is then stored in the function's slot in the __la_symbol_ptr data section. On the second call, execution jumps straight to the stored address. I did not know the details, though, so today I set out to work them out myself.

Two, what is lazy bind?

When our program is built, the compiler cannot emit direct calls to some externally defined functions, such as functions in system libraries or in the app's own dynamic libraries. The reason is that the address at which a dependent dynamic library is loaded at runtime is not known in advance.

Note that "externally defined" here means C/C++ functions that are called statically. Objective-C methods do not need lazy binding, because an OC method call goes through objc_msgSend, which searches the class's method list for the implementation address at runtime. So even an externally defined OC method can be called through the OC runtime, as long as its class has been loaded.

Externally defined C functions, by contrast, are recorded in the binary at build time. The symbols that require lazy binding are listed under Dynamic Loader Info -> Lazy Binding Info. We can inspect this information with MachOView, as shown in Figure 1 below.

printf is a function called by the app, and its implementation lives in a system library. At build time it is known that printf comes from libSystem.B.dylib. However, the address of printf only becomes known once the program is running and libSystem.B.dylib has been loaded.

While the program runs, the library's load address does not change, and neither does printf's offset within the library. So printf's address stays fixed for the lifetime of the process. The lookup therefore only needs to happen once, the first time printf is called. Deferring it to that first call is what is called lazy binding.
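To make the "resolve once, then call directly" idea concrete, here is a minimal C sketch that imitates the mechanism with an ordinary function pointer. It is an illustration under stated assumptions, not how dyld is implemented. All names are hypothetical:

- `real_add` stands in for printf; a non-variadic signature keeps the sketch simple.
- `lazy_ptr_add` plays the role of printf's __la_symbol_ptr slot.
- `add_helper` plays __stub_helper plus dyld_stub_binder.
- `lookup_symbol` stands in for dyld's symbol search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical library function standing in for printf. */
static int real_add(int a, int b) { return a + b; }

typedef int (*add_fn)(int, int);

static int resolve_count = 0;         /* how many times the slow path ran */
static int add_helper(int a, int b);  /* stand-in for the stub_helper path */

/* Stand-in for the lazy pointer slot: it initially points at the helper. */
static add_fn lazy_ptr_add = add_helper;

/* Hypothetical stand-in for dyld's lookup of a symbol in a loaded library. */
static add_fn lookup_symbol(const char *name) {
    return strcmp(name, "_add") == 0 ? real_add : NULL;
}

/* Slow path: resolve once, patch the lazy pointer, then forward the call. */
static int add_helper(int a, int b) {
    resolve_count++;
    lazy_ptr_add = lookup_symbol("_add");  /* write the real address back */
    return lazy_ptr_add(a, b);
}

/* Stand-in for the stub: always an indirect call through the lazy pointer. */
static int add_stub(int a, int b) { return lazy_ptr_add(a, b); }
```

Calling `add_stub` twice runs the slow path only once. After the first call, the slot points straight at `real_add`, which mirrors how the second printf call skips binding.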

Three, lazy bind

To explore this process, I wrote a Demo that calls printf twice in the main function.

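```objc
#import <UIKit/UIKit.h>

int main(int argc, char * argv[]) {
    printf("1");    // first call: resolved lazily via the stub and __stub_helper
    printf("2");    // second call (argument assumed): uses the cached address
    return 0;
}
```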

What we want to trace is how printf actually gets called. I debugged on a real ARM64 device. First, set a breakpoint at printf("1"). Then switch to the assembly view via Debug -> Debug Workflow -> Always Show Disassembly so that the call addresses are visible, as shown in Figure 2.

Figure 2 shows the assembly of main. In main, printf is called twice, both times via bl 0x10464a624. Xcode has already symbolized 0x10464a624 as "symbol stub for: printf", that is, printf's stub function. Leave the stub's implementation aside for a moment and look at the call address itself.

Because of ASLR (Address Space Layout Randomization), the address at which the binary is loaded is randomized every time the app starts; this is one of the measures iOS uses to make attacks harder. In our run, main starts at 0x10464a300. Searching for main in the __TEXT,__text section in MachOView (Figure 3) shows a start address of 0x100006300. That is main's offset in the binary, 0x00006300, plus 0x100000000, the virtual memory size of the __PAGEZERO segment (Figure 4). In other words, without ASLR main would be at 0x100006300.

The difference gives the ASLR offset (slide) of the current run: 0x10464a300 - 0x100006300 = 0x4644000. Remember this offset; we will use it later. It changes every time the app is relaunched or debugged.
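The slide arithmetic is easy to get wrong by hand, so here it is as a small C sketch. The input values are the ones from this debugging session, and the function names are my own. On a device, the slide of a loaded image can also be read at runtime with `_dyld_get_image_vmaddr_slide()` from `<mach-o/dyld.h>`. This sketch does not call that API; it only does the subtraction.

```c
#include <stdint.h>

/* Link-time address = __TEXT vmaddr (0x100000000 here, right after
 * __PAGEZERO) + the symbol's offset in the binary. */
static uint64_t unslid_from_offset(uint64_t text_vmaddr, uint64_t offset) {
    return text_vmaddr + offset;
}

/* Slide = runtime address - link-time address of the same symbol. */
static uint64_t aslr_slide(uint64_t runtime_addr, uint64_t unslid_addr) {
    return runtime_addr - unslid_addr;
}

/* Map any runtime address in this image back to the address MachOView shows. */
static uint64_t unslide(uint64_t runtime_addr, uint64_t slide) {
    return runtime_addr - slide;
}
```

Worked through with the numbers above:

- `unslid_from_offset(0x100000000, 0x6300)` gives 0x100006300.
- `aslr_slide(0x10464a300, 0x100006300)` gives 0x4644000.
- `unslide(0x104650038, 0x4644000)` gives 0x10000c038, which is the check used a little further on.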

Back to the printf call. Use the LLDB command breakpoint set -a 0x10464a624 to set a breakpoint on printf's stub, then continue the program. Execution stops in the stub, as shown in Figure 5.

The stub has only three instructions. nop is a no-op; the last two do the real work. The second instruction adds the immediate 0x5a10 to the current PC to form an address, and loads the value stored at that address into register x16. The third instruction jumps to the address held in x16. So the questions are: where does that address come from, and what does the value stored there (the address the br finally jumps to) represent?

```asm
0x10464a624 <+0>: nop                // no-op, ignore
0x10464a628 <+4>: ldr x16, #0x5a10   // x16 = *(pc + 0x5a10) = *(0x10464a628 + 0x5a10) = *(0x104650038)
0x10464a62c <+8>: br  x16            // jump to the address stored in x16
```
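This ldr is a PC-relative "load register (literal)" instruction. Its signed 19-bit immediate counts 4-byte words from the instruction's own address. A minimal C decoder, assuming the standard AArch64 encoding, shows where 0x104650038 comes from. The instruction word used with it (0x5802d090 for ldr x16, #0x5a10) was assembled by hand from the disassembly, not dumped from the demo binary.

```c
#include <stdint.h>

/* Decode AArch64 LDR (literal) into a general-purpose register.
 * Layout: opc[31:30] 011 V=0 00 imm19[23:5] Rt[4:0]; opc=00 -> Wt, opc=01 -> Xt.
 * Returns 1 and fills target/rt/is64 on success, 0 if insn is not that form. */
static int decode_ldr_literal(uint32_t insn, uint64_t pc,
                              uint64_t *target, unsigned *rt, int *is64) {
    if ((insn & 0xBF000000u) != 0x18000000u)
        return 0;
    int32_t imm19 = (int32_t)((insn >> 5) & 0x7FFFFu);
    if (imm19 & 0x40000)                 /* sign-extend the 19-bit field */
        imm19 -= 0x80000;
    *target = pc + (uint64_t)((int64_t)imm19 * 4);  /* offset is in words */
    *rt = insn & 0x1Fu;
    *is64 = (int)((insn >> 30) & 1u);    /* 1 for Xt, 0 for Wt */
    return 1;
}
```

Decoding 0x5802d090 at 0x10464a628 yields target 0x104650038, with rt = 16 and a 64-bit load. The same function also handles the 32-bit form ldr w16, #8 (hand-assembled as 0x18000050), which shows up in __stub_helper later.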

So the address is 0x104650038. We can inspect it with LLDB, as shown in Figure 6.

The address 0x104650038 holds the value 0x10464a69c, which matches the jump target Xcode resolved in Figure 5. The address 0x104650038 lies at offset 56 within the binary's __DATA.__la_symbol_ptr section. Locating it in MachOView shows that this slot holds the lazy pointer for the _printf symbol, as shown in Figure 7. We can double-check by removing the slide: 0x104650038 - 0x4644000 = 0x10000c038, which matches the address displayed in MachOView.
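One way to map a lazy pointer slot back to a symbol name uses two tables:

1. A Mach-O lazy pointer section does not store names. The reserved1 field of its section header gives the index of its first entry in the indirect symbol table.
2. Each indirect entry is, in turn, an index into the symbol table.

The C sketch below walks that chain under simplifying assumptions:

- `ptr_section` is a hypothetical cut-down stand-in for `section_64`.
- Symbol names are passed as a plain array instead of `nlist_64` entries plus a string table.
- Special indirect values such as `INDIRECT_SYMBOL_LOCAL` are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant section_64 fields. */
struct ptr_section {
    uint64_t addr;       /* section start (use unslid or slid consistently) */
    uint64_t size;       /* bytes; 8 per pointer on arm64 */
    uint32_t reserved1;  /* first index into the indirect symbol table */
};

/* Return the symbol name for the pointer slot at `addr`, or NULL if the
 * address is outside the section or not 8-byte aligned within it. */
static const char *slot_symbol(const struct ptr_section *sec, uint64_t addr,
                               const uint32_t *indirect_syms,
                               const char *const *symbol_names) {
    if (addr < sec->addr || addr >= sec->addr + sec->size)
        return NULL;
    if ((addr - sec->addr) % 8 != 0)
        return NULL;
    uint64_t slot = (addr - sec->addr) / 8;          /* offset 56 -> slot 7 */
    uint32_t sym_index = indirect_syms[sec->reserved1 + slot];
    return symbol_names[sym_index];
}
```

Offset 56 means the section starts at 0x10000c000 (unslid), and 0x10000c038 is slot 7 (56 / 8). The section size, reserved1 value and table contents used to exercise this sketch are made up. In a real binary they come from the section header and the LC_DYSYMTAB load command.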

The stub reads the symbol's address from __la_symbol_ptr and jumps to it. But the address we just read, 0x10464a69c, is not printf's real address. Following it, we see that it lies at offset 108 within the binary's __TEXT.__stub_helper section. Set a breakpoint at this address and continue running the program.
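The offset 108 can be decoded by hand. Judging from the disassembly in this article, __stub_helper is laid out as follows:

- It starts at 0x10464a69c - 108 = 0x10464a630.
- It begins with a 24-byte generic header: the six instructions from 0x10464a630 to 0x10464a644 shown further on.
- The header is followed by 12-byte per-symbol entries (ldr w16, b, and a data word).

The C sketch below assumes exactly that layout and turns a section offset into an entry index.

```c
#include <stdint.h>

/* __stub_helper layout as observed in this demo (arm64):
 *   24-byte shared header (adr, nop, stp, nop, ldr, br),
 *   then 12-byte entries (ldr w16, b, .long lazy_bind_offset). */
enum { STUB_HELPER_HEADER = 24, STUB_HELPER_ENTRY = 12 };

/* Index of the entry that starts `offset` bytes into __stub_helper, or -1
 * if the offset falls in the header or is not on an entry boundary. */
static long stub_helper_entry_index(uint64_t offset) {
    if (offset < STUB_HELPER_HEADER)
        return -1;
    if ((offset - STUB_HELPER_HEADER) % STUB_HELPER_ENTRY != 0)
        return -1;
    return (long)((offset - STUB_HELPER_HEADER) / STUB_HELPER_ENTRY);
}
```

Offset 108 gives entry 7, and printf's __la_symbol_ptr slot at offset 56 is also slot 7 (56 / 8). The match suggests the helper entries are emitted in the same order as the lazy pointers. This sketch only shows it for one symbol, though, and does not prove the ordering in general.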

As shown in Figure 9, only the first three lines matter:

1. The first line loads the 32-bit value stored at 0x10464a6a4 into register w16. 0x10464a6a4 is exactly where the third line sits, and that "line" is really a hard-coded data word, 0xb9. So w16 = 0xb9.
2. The second line branches directly to 0x10464a630.

So the trail now continues at 0x10464a630. The value 0xb9 in w16 is clearly a parameter being passed along, and what it means is the next question. Set a breakpoint at 0x10464a630 and continue running the program.
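Each helper entry can be decoded mechanically as well. The sketch below does three things:

1. It checks that the first word is ldr w16, #8.
2. It decodes the unconditional b in the second word. The b carries a signed 26-bit word offset from the b's own address.
3. It returns the third word as the lazy-binding offset.

The three words used with it (0x18000050, 0x17ffffe4, 0x000000b9) are my hand-assembled reconstruction of what Figure 9 shows, not bytes read from the demo binary. The fixed first-word check assumes this demo's layout.

```c
#include <stdint.h>

/* Decode AArch64 unconditional B: 000101 imm26; target = pc + imm26 * 4. */
static int decode_b(uint32_t insn, uint64_t pc, uint64_t *target) {
    if ((insn & 0xFC000000u) != 0x14000000u)
        return 0;
    int32_t imm26 = (int32_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000)              /* sign-extend 26 bits */
        imm26 -= 0x04000000;
    *target = pc + (uint64_t)((int64_t)imm26 * 4);
    return 1;
}

/* Parse one 12-byte __stub_helper entry starting at address `entry`:
 *   +0 ldr w16, #8   loads the data word at +8 into w16
 *   +4 b   <header>  jumps to the shared binding code
 *   +8 .long         offset into the lazy binding info
 * Returns 1 on success, 0 if the words do not match this shape. */
static int parse_stub_helper_entry(const uint32_t w[3], uint64_t entry,
                                   uint32_t *lazy_bind_off, uint64_t *header) {
    if (w[0] != 0x18000050u)             /* ldr w16, #8 */
        return 0;
    if (!decode_b(w[1], entry + 4, header))
        return 0;
    *lazy_bind_off = w[2];               /* the value ldr puts into w16 */
    return 1;
}
```

Parsing those words at 0x10464a69c gives a lazy-binding offset of 0xb9 and a branch target of 0x10464a630, matching the values traced above.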

See Figure 10

```asm
0x10464a630: adr x17, #0x6e38             // x17 = pc + 0x6e38 = 0x104651468 (see Figure 11)
0x10464a634: nop
0x10464a638: stp x16, x17, [sp, #-0x10]!  // move sp down 16 bytes and push x16 and x17 onto the stack
0x10464a63c: nop
0x10464a640: ldr x16, #0x19c0             // x16 = *(pc + 0x19c0) = *(0x10464c000) = 0x000000019f3dfb44, the address of dyld_stub_binder
0x10464a644: br  x16                      // jump to dyld_stub_binder (libdyld.dylib __TEXT.__text + 11568)
```
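The adr at 0x10464a630 is PC-relative like the ldr in the stub, but it computes an address instead of loading from one. Its 21-bit signed immediate is split into immlo (bits 30-29) and immhi (bits 23-5). Below is a minimal C decoder, again assuming the standard AArch64 encoding. It uses a hand-assembled instruction word (0x100371d1 for adr x17, #0x6e38) rather than bytes read from the binary.

```c
#include <stdint.h>

/* Decode AArch64 ADR: 0 immlo[30:29] 10000 immhi[23:5] Rd[4:0].
 * target = pc + SignExtend(immhi:immlo, 21). Returns 0 if not an ADR. */
static int decode_adr(uint32_t insn, uint64_t pc, uint64_t *target, unsigned *rd) {
    if ((insn & 0x9F000000u) != 0x10000000u)
        return 0;
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;
    int64_t imm = (int64_t)((immhi << 2) | immlo);
    if (imm & (1 << 20))                 /* sign-extend 21 bits */
        imm -= (1 << 21);
    *target = pc + (uint64_t)imm;
    *rd = insn & 0x1Fu;
    return 1;
}
```

Decoding 0x100371d1 at 0x10464a630 yields x17 = 0x104651468, the value that the stp then pushes alongside x16.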

dyld_stub_binder is dyld's lazy-binding routine. The generic code above hands it two values, pushed onto the stack by stp: x16 (0xb9) and x17 (the address of _dyld_private). So what exactly does x16 carry? To perform the binding, dyld_stub_binder must at least be told which symbol to look up (printf here). The second value does not carry that information, so the first one must.

0xb9 is the offset of printf's record within Dynamic Loader Info -> Lazy Binding Info -> Actions, as shown in Figure 1: 0x10351 - 0x10298 = 0xb9. That record contains not only the symbol name _printf but also the name of the library it lives in, libSystem.B.dylib. With these two pieces of information, dyld can look the symbol up in the right library, obtain its address, and write it back into printf's __la_symbol_ptr slot.

Looking at the __stub_helper section in MachOView makes the connections between these sections clearer. At the top is a generic binding routine that obtains a symbol's address by calling dyld_stub_binder. Below it come entries of three lines each. Each entry acts as an intermediate springboard, and each symbol's __la_symbol_ptr slot initially points to its own springboard. The springboard supplies the symbol information; here it passes the offset 0xb9, from which dyld_stub_binder obtains the symbol name and library name. It then jumps to the generic binding routine, which performs the actual binding.
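The lazy-binding info that 0xb9 points into is a stream of bind opcodes. A typical lazy record does four things in order:

1. Sets the segment and offset of the lazy pointer.
2. Sets the dylib ordinal.
3. Sets the symbol name.
4. Performs the bind.

The C sketch below decodes such a record starting at a given offset. The opcode values are the ones defined in `<mach-o/loader.h>`, redefined here so the sketch compiles anywhere, and only the opcodes a simple lazy record needs are handled. The sketch only decodes the record. It does not look the symbol up or write the lazy pointer, which is the work dyld_stub_binder goes on to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from <mach-o/loader.h>; only those a simple lazy record uses. */
#define BIND_OPCODE_MASK                          0xF0
#define BIND_IMMEDIATE_MASK                       0x0F
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         0x10
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM 0x40
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   0x70
#define BIND_OPCODE_DO_BIND                       0x90

struct lazy_bind_record {
    unsigned dylib_ordinal;   /* 1-based index into the LC_LOAD_DYLIB list */
    unsigned segment_index;   /* segment holding the lazy pointer */
    uint64_t segment_offset;  /* offset of the lazy pointer in that segment */
    const char *symbol;       /* points into the opcode stream */
};

static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end) {
    uint64_t value = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            value |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
        if (!(byte & 0x80))   /* high bit clear: last byte */
            break;
    }
    return value;
}

/* Decode the record starting at `offset` (the value the helper put in w16).
 * Returns 1 once DO_BIND is reached, 0 on DONE, end of data, or an
 * opcode this sketch does not handle. */
static int parse_lazy_bind(const uint8_t *info, size_t size, size_t offset,
                           struct lazy_bind_record *rec) {
    if (offset >= size)
        return 0;
    const uint8_t *p = info + offset;
    const uint8_t *end = info + size;
    while (p < end) {
        uint8_t byte = *p++;
        uint8_t imm = byte & BIND_IMMEDIATE_MASK;
        switch (byte & BIND_OPCODE_MASK) {
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
            rec->dylib_ordinal = imm;
            break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            rec->symbol = (const char *)p;    /* NUL-terminated name follows */
            while (p < end && *p)
                p++;
            if (p < end)
                p++;                          /* step over the NUL */
            break;
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            rec->segment_index = imm;
            rec->segment_offset = read_uleb128(&p, end);
            break;
        case BIND_OPCODE_DO_BIND:
            return 1;                         /* record complete */
        default:                              /* DONE or unhandled opcode */
            return 0;
        }
    }
    return 0;
}
```

For example, the synthetic bytes 0x73 0x38 0x12 0x40 "_printf" 0x00 0x90, placed at offset 0xb9, decode to segment 3, offset 0x38, dylib ordinal 2 and symbol _printf. Those byte values are invented for illustration; the real record for printf is the one MachOView shows in Figure 1.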

On the second call to printf, the value read from __la_symbol_ptr shows that no further binding is needed. The slot now holds 0x000000019f24c978, the actual address of printf.

Four, some questions

The process above explains how printf's symbol is resolved, but a few questions have not been covered yet:

1. After the binding obtains printf's address for the first time, how is that address written back into printf's __la_symbol_ptr slot?

The answer can be found by tracking writes to printf's __la_symbol_ptr slot, as shown in the diagram below:

2. dyld_stub_binder, the routine that actually finds a symbol's address, is itself an externally defined symbol. How is its own address determined?

Guess: dyld_stub_binder does not appear in the lazy binding info, so its address is not determined by the lazy-binding mechanism. The Dynamic Symbol Table shows that, unlike printf, its pointer lives in the __DATA_CONST,__got section, which holds non-lazy symbol pointers, as shown in Figure 16. So the address of dyld_stub_binder must already be bound when the binary is loaded, rather than through lazy binding.
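One way to check this guess is to look at section types. In Mach-O, the low byte of a section's flags field records what kind of section it is, and that is where __got and __la_symbol_ptr differ. The C sketch below classifies a flags value. The constants are the values from `<mach-o/loader.h>`, redefined locally; the enum and function name are my own.

```c
#include <stdint.h>

/* Values from <mach-o/loader.h>: the low 8 bits of section flags hold the type. */
#define SECTION_TYPE                0x000000FFu
#define S_NON_LAZY_SYMBOL_POINTERS  0x6u  /* e.g. __got: bound when the image loads */
#define S_LAZY_SYMBOL_POINTERS      0x7u  /* e.g. __la_symbol_ptr: bound on first call */
#define S_SYMBOL_STUBS              0x8u  /* e.g. __stubs */

enum ptr_kind { KIND_OTHER, KIND_NON_LAZY, KIND_LAZY, KIND_STUBS };

static enum ptr_kind pointer_section_kind(uint32_t flags) {
    switch (flags & SECTION_TYPE) {   /* ignore the attribute bits */
    case S_NON_LAZY_SYMBOL_POINTERS: return KIND_NON_LAZY;
    case S_LAZY_SYMBOL_POINTERS:     return KIND_LAZY;
    case S_SYMBOL_STUBS:             return KIND_STUBS;
    default:                         return KIND_OTHER;
    }
}
```

On the binary in this article, I would expect __DATA_CONST,__got to classify as non-lazy and __DATA,__la_symbol_ptr as lazy. I have not dumped the actual flag values here, so the section headers in MachOView are the place to confirm them.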

3. What is _dyld_private for? I am not sure yet, and I would welcome help from anyone who has studied this in depth.

Five, the conclusion

Lazy binding is a very common technique, but its implementation differs from platform to platform. I had read many articles on it by well-known authors before, yet most of the time I came away with only a partial understanding. Actually working through it by hand taught me a lot of new things and deepened my understanding considerably. The answers above reflect only my current understanding; if anything is wrong, corrections are welcome.
