Preface

This article is the final chapter of this assembly series. It covers four topics:

  1. Compiler optimization
  2. Pointers
  3. OC disassembly
  4. Block disassembly

1. Compiler optimization

Let's start with compiler optimization. This is the optimization performed by Xcode's compiler itself: it automatically simplifies the logic of the generated assembly code. It is enough to be aware of this topic; there is no need to dig too deep.

As usual, let's use an example to see how the compiler optimizes. Sample 👇

int global = 10;

int main(int argc, char * argv[]) {
    int a = 20;
    int b = global + 1;
    return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
}

With Xcode's default setting (no optimization), the assembly looks like this 👇

Next, we change the compiler's optimization level 👇

The following table describes the optimization levels 👇

| Configuration option | Description | Specific meaning |
| --- | --- | --- |
| None [-O0] | Don't optimize | The compiler's goal is to reduce compilation cost and make sure debugging produces the expected results. Program statements are independent: if the program stops at a breakpoint on a line, we can assign a new value to any variable or point the program counter at any statement in the method, and get a run result that matches the source code exactly. |
| Fast [-O, -O1] | Large functions require slightly more compile time and memory | The compiler tries to reduce code size and execution time, but does not perform optimizations that need a lot of compile time. In Apple's compiler, strict aliasing, block reordering, and inter-block scheduling are disabled by default during optimization. This level provides a good debugging experience, improved stack utilization, and better code quality than None [-O0]. |
| Faster [-O2] | The compiler performs all supported optimizations that do not involve a space-time tradeoff | A higher-performance optimization than Fast [-O1]. In this setting, the compiler does not perform loop unrolling, function inlining, or register renaming. Compared to Fast [-O1], this setting increases compilation time and the performance of the generated code. |
| Fastest [-O3] | Enables all optimizations supported by Fast [-O1], plus function inlining and register renaming | A higher-performance optimization than Faster [-O2]. It instructs the compiler to optimize the performance of the generated code while ignoring its size, which may result in larger binaries. It also degrades the debugging experience. |
| Fastest, Smallest [-Os] | Maximize performance without significantly increasing code size | Enables all optimizations in Fast [-O1] that do not increase code size, and performs further optimizations that reduce code size. The resulting code is smaller than with Fastest [-O3]. The debugging experience is worse than with Fast [-O1]. |
| Fastest, Aggressive Optimizations [-Ofast] | Also performs other, more aggressive optimizations beyond Fastest, Smallest [-Os] | Enables all optimizations in Fastest [-O3], plus aggressive optimizations that may violate strict language standards but do not affect well-behaved code. This level degrades the debugging experience and can increase code size. |
| Smallest, Aggressive Size Optimizations [-Oz] | Reduce code size without using LTO | Similar to -Os: instructs the compiler to optimize only for code size and ignore performance tuning, which can make the code slower. |
Usage guidelines

In short, the usage guideline is: Debug mode uses None [-O0], and Release mode uses Fastest, Smallest [-Os].
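To make the effect of the optimization level concrete, here is a minimal sketch that mirrors the unused locals a and b from the main example. The function names are hypothetical and not part of the article's project. Compiling the file once with clang -S -O0 and once with clang -S -Os and comparing the output should show the difference: at -O0 each local is typically stored to the stack, while at -Os values that nothing observes are typically removed. The exact instructions depend on the compiler version.

```c
int global = 10;

// Hypothetical helper: 'a' and 'b' are never observed, like in main.
// At -O0 the compiler typically emits a store for each local; at -Os
// the optimizer is free to drop both of them.
int unused_locals(void) {
    int a = 20;
    int b = global + 1;
    (void)a;  // silence unused-variable warnings without changing behaviour
    (void)b;
    return 0;
}

// Hypothetical helper: the result is returned, so the load of 'global'
// and the addition have to survive optimization.
int used_local(void) {
    int b = global + 1;
    return b;
}
```

Only used_local has an observable result that depends on global, so it is the one whose arithmetic should still be visible in the optimized assembly.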

2. Pointers

Now let's look at pointers. How are they read and written in assembly? As we all know, a pointer holds an address, so let's first go over the basics of pointers 👇

2.1 Pointer basics

2.1.1 Pointer width

The width of a pointer (also called its stride, as in Swift) is 8 bytes on a 64-bit platform such as arm64, which means a pointer takes up 8 bytes in memory. For example, the following code prints the pointer size 👇

void function() {
    int *a;
    printf("%lu", sizeof(a));
}

int main(int argc, char * argv[]) {
    function();
    return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
}

Look at the assembly of function() 👇

The sizeof operator yields a constant 0x8 in assembly, which is 8 in decimal.

2.1.2 Pointer arithmetic

Pointer ++

  • int*++
int *a;
a = (int *)100;
a++;

The result is 👉 104: since int takes up 4 bytes, incrementing the pointer moves it 4 bytes at a time.

  • char*++
char *a;
a = (char *)100;
a++;

The result is 👉 101: same rule as above, but char is only 1 byte.

  • Double pointer ++:
int **a;
a = (int **)100;
a++;

The result is 👉 108. a is a double pointer, i.e. a pointer to a pointer, which can be read as int * (*a). The type it points to, int *, is a pointer type of 8 bytes, so a++ moves by 8, giving 108.

Pointer +

int **a;
a = (int **)100;
a = a + 1;

Here a is a double pointer that points to an int *, so +1 moves it by 8 bytes, resulting in 108.

⚠️ Note: as a standalone statement, a = a + 1 has the same effect as a++. The exact instructions generated for the ++ (increment) and -- (decrement) operators depend on the compiler.

Pointer -

int *a;
a = (int *)100;
int *b;
b = (int *)200;
int x = a - b;

The result of x is 👉 -25. Why?

  • Pointer arithmetic depends on the width (stride) of the pointed-to data type
  • The unit of a pointer operation is the width of the pointed-to data type
  • A struct and a basic type cannot be cast to each other directly; ordinary types can go through & (taking the address)

Based on these 3 points, the difference is counted in units of int (4 bytes): a is 100 / 4 = 25 units and b is 200 / 4 = 50 units, so x = a - b = 25 - 50 = -25. Equivalently, (100 - 200) / 4 = -25.
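The stride rules above can be checked in plain C. The sketch below is illustrative and not the article's own code: it uses real arrays instead of the literal addresses 100 and 200, because arithmetic on pointers that do not point into an array is undefined in standard C. All helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Byte distance covered by "p + 1" for an int pointer.
size_t int_ptr_step(void) {
    int values[2] = {0, 0};
    int *p = values;
    return (size_t)((uintptr_t)(p + 1) - (uintptr_t)p);  // sizeof(int)
}

// Byte distance covered by "p + 1" for a char pointer.
size_t char_ptr_step(void) {
    char values[2] = {0, 0};
    char *p = values;
    return (size_t)((uintptr_t)(p + 1) - (uintptr_t)p);  // sizeof(char) == 1
}

// Byte distance covered by "p + 1" for a pointer to a pointer.
size_t double_ptr_step(void) {
    int *values[2] = {NULL, NULL};
    int **p = values;
    return (size_t)((uintptr_t)(p + 1) - (uintptr_t)p);  // sizeof(int *)
}

// Pointer subtraction yields a count of elements, not bytes:
// the two addresses are 25 ints apart, so the result is -25.
ptrdiff_t int_ptr_difference(void) {
    int values[26] = {0};
    int *a = &values[0];
    int *b = &values[25];
    return a - b;
}
```

On arm64 the three step functions return 4, 1, and 8 respectively, matching the results 104, 101, and 108 in the examples above.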

2.2 Disassembly of Pointers

Take a look at the following example 👇

void func() {
    int* a;
    int b = 10;
    a = &b;
}

Assembly 👇

Here a is stored at [sp, #0x8]: a pointer takes 8 bytes in memory, so it occupies the 8 bytes from sp + 0x8 up to sp + 0x10.

Arrays and Pointers

Finally, let's look at a common scenario 👉 arrays and pointers 👇

void function() {
    int arr[5] = {1, 2, 3, 4, 5};
    //int *a == &arr[0] == arr
    int *a = arr;
    for (int i = 0; i < 5; i++) {
        printf("%d\n", arr[i]);
        printf("%d\n", *(arr + i));
        printf("%d\n", *(arr++));   // compile error: arr is an array name and cannot be modified
        printf("%d\n", *(a++));
    }
}

Obviously, *(arr++) produces a compile error, while int *a = arr; followed by a++ does not.

An array name behaves like a pointer variable name; the only difference is that the array name is a constant (it cannot be modified) while the pointer is a variable.

So a == &arr[0] == arr.
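The relationship between an array name and a pointer variable can be checked with a compilable variant of the example. This is a minimal sketch with a hypothetical function name; it leaves out the arr++ line, since that line does not compile.

```c
// Returns 1 when arr[i], *(arr + i) and *(a++) all read the same element
// for every index, and the pointer ends one past the last element.
int indexing_forms_agree(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *a = arr;  // arr decays to &arr[0]; 'a' is an ordinary variable

    if (a != &arr[0]) {
        return 0;
    }
    for (int i = 0; i < 5; i++) {
        int by_subscript = arr[i];
        int by_offset = *(arr + i);
        int by_walk = *(a++);  // legal: 'a' can be modified, 'arr' cannot
        if (by_subscript != by_offset || by_subscript != by_walk) {
            return 0;
        }
    }
    return a == arr + 5;
}
```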

2.3 Basic usage of Pointers

Look at the following example. What is the problem? 👇

void function() {
    char *p1;
    char c = *p1;
    printf("%c",c);
}

Assembly 👇

The pointer p1 is never initialized. This does not produce a compile-time error, but it crashes at run time. In this iOS example the uninitialized p1 happens to hold 0, so dereferencing it reads from address 0; C itself gives no guarantee about the value of an uninitialized local.
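One common way to avoid this crash is to initialize the pointer explicitly and check it before dereferencing. The following is a minimal sketch of that pattern, not something taken from the article; the helper names are hypothetical.

```c
#include <stddef.h>

// Hypothetical helper: read through a pointer only when it is non-NULL.
char read_char_or(const char *p, char fallback) {
    return (p != NULL) ? *p : fallback;
}

// The pointer is given a valid target before it is used.
char demo_initialized(void) {
    char value = 'x';
    char *p1 = &value;
    return read_char_or(p1, '?');
}

// An explicit NULL replaces the indeterminate value, so the check
// above can catch it instead of the program crashing.
char demo_null(void) {
    char *p1 = NULL;
    return read_char_or(p1, '?');
}
```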

Pointer to char +0

Look at this example 👇

void func() {
    char *p1;
    char c = *p1;
    char d = *(p1 + 0);
}

Assembly 👇

ASMPrj`func:
    0x104b661bc <+0>:  sub    sp, sp, #0x10             ; =0x10 
    0x104b661c0 <+4>:  ldr    x8, [sp, #0x8]
    //c = [x8] to w9
->  0x104b661c4 <+8>:  ldrb   w9, [x8]
    0x104b661c8 <+12>: strb   w9, [sp, #0x7]
    0x104b661cc <+16>: ldr    x8, [sp, #0x8]
    //d = [x8] to w9
    0x104b661d0 <+20>: ldrb   w9, [x8]
    0x104b661d4 <+24>: strb   w9, [sp, #0x6]
    0x104b661d8 <+28>: add    sp, sp, #0x10             ; =0x10 
    0x104b661dc <+32>: ret

Each time, ldr x8, [sp, #0x8] loads the pointer stored 0x8 bytes above sp (the top of the stack) into x8. Then ldrb w9, [x8] loads the byte at the address held in x8 into w9. Both c and d are read from the same address, so they are the same.

Pointer to char +1

Now change the code to +1 👇

void func() {
    char *p1;
    char c = *p1;
    char d = *(p1 + 1);
}

Assembly 👇

ASMPrj`func:
    0x1041f21bc <+0>:  sub    sp, sp, #0x10             ; =0x10 
    //p1
    0x1041f21c0 <+4>:  ldr    x8, [sp, #0x8]
    //c
->  0x1041f21c4 <+8>:  ldrb   w9, [x8]
    0x1041f21c8 <+12>: strb   w9, [sp, #0x7]
    0x1041f21cc <+16>: ldr    x8, [sp, #0x8]
    //d
    0x1041f21d0 <+20>: ldrb   w9, [x8, #0x1]
    0x1041f21d4 <+24>: strb   w9, [sp, #0x6]
    0x1041f21d8 <+28>: add    sp, sp, #0x10             ; =0x10 
    0x1041f21dc <+32>: ret  

The line for d becomes ldrb w9, [x8, #0x1]. 0x1 is 1 in decimal: because char is 1 byte in size, d is offset by 1 byte relative to c.

Pointer to int +1

Next we change the char type to int 👇

void func() {
    int *p1;
    int c = *p1;
    int d = *(p1 + 1);
}

Assembly 👇

ASMPrj`func:
    0x1040e61bc <+0>:  sub    sp, sp, #0x10             ; =0x10 
    //p1 [x8]
    0x1040e61c0 <+4>:  ldr    x8, [sp, #0x8]
    //c
->  0x1040e61c4 <+8>:  ldr    w9, [x8]
    0x1040e61c8 <+12>: str    w9, [sp, #0x4]
    0x1040e61cc <+16>: ldr    x8, [sp, #0x8]
    //d
    0x1040e61d0 <+20>: ldr    w9, [x8, #0x4]
    0x1040e61d4 <+24>: str    w9, [sp]
    0x1040e61d8 <+28>: add    sp, sp, #0x10             ; =0x10 
    0x1040e61dc <+32>: ret    

Because int is 4 bytes (where char was 1 byte), the load becomes ldr w9, [x8, #0x4]. 0x4 is 4 in decimal, so d is offset by 4 bytes relative to c.

Pointer to a pointer to int +1

Let's raise the difficulty 👇

void func() {
    int **p1;
    int *c = *p1;
    int *d = *(p1 + 1);
}

Assembly 👇

ASMPrj`func:
    0x1041821b8 <+0>:  sub    sp, sp, #0x20             ; =0x20 
    //p1 [x8]
    0x1041821bc <+4>:  ldr    x8, [sp, #0x18]
    //c
->  0x1041821c0 <+8>:  ldr    x8, [x8]
    0x1041821c4 <+12>: str    x8, [sp, #0x10]
    0x1041821c8 <+16>: ldr    x8, [sp, #0x18]
    //d
    0x1041821cc <+20>: ldr    x8, [x8, #0x8]
    0x1041821d0 <+24>: str    x8, [sp, #0x8]
    0x1041821d4 <+28>: add    sp, sp, #0x20             ; =0x20 
    0x1041821d8 <+32>: ret  

The line for d becomes ldr x8, [x8, #0x8]. Why 0x8? 👉 As analyzed earlier, p1 is a double pointer, so what it points to is itself a pointer, which is 8 bytes in size. p1 + 1 is therefore offset by the width of the type p1 points to, i.e. 8 bytes.

⚠️ Note: here the stack is grown by #0x20. The three pointers p1, c, and d need 24 bytes, and since sp must stay 16-byte aligned, the frame is rounded up to 32 bytes, hence #0x20.

Pointer to pointer

Next, an example of dereferencing a pointer to a pointer 👇

void func() {
    char **p1;
    char c = **p1;
}

Assembly 👇

ASMPrj`func:
    0x102cf61c4 <+0>:  sub    sp, sp, #0x10             ; =0x10 
    // initial value
    0x102cf61c8 <+4>:  ldr    x8, [sp, #0x8]
    ldrb   w9, [x8]
    0x102cf61d4 <+16>: strb   w9, [sp, #0x7]
    0x102cf61d8 <+20>: add    sp, sp, #0x10             ; =0x10 
    0x102cf61dc <+24>: ret

For a double pointer, the assembly performs ldr twice, i.e. two levels of addressing, before the char itself is read.

Pointer & pointer mixed offset

The last one 👇

void func() {
    char **p1;
    char c = *(*(p1 + 2) + 2); 
}

Does it run without problems? See 👇

It crashes, for the reason analyzed above: p1 is uninitialized. Note that char c = *(*(p1 + 2) + 2) corresponds to the following assembly 👇

ldr    x8, [x8, #0x10]
ldrb   w9, [x8, #0x2]

👉 p1 is offset by 2 * 8 = 0x10 bytes (pointer width), and the pointer loaded from there is offset by 2 * 1 = 0x2 bytes (char width). Now add one more line 👇

void func() {
    char **p1;
    char c = *(*(p1 + 2) + 2); 
    char c2 = p1[1][2];
}

We already know how far c is offset. What about c2? That is left for you to analyze!

⚠️ Tip: p1[1][2] is equivalent to *(*(p1 + 1) + 2)

3. OC disassembly

Now for the third topic 👉 OC disassembly. As usual, here is the example code first 👇

//LGPerson.h
@interface LGPerson : NSObject

@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) int age;

+ (instancetype)person;

@end

//LGPerson.m
#import "LGPerson.h"

@implementation LGPerson

+ (instancetype)person {
    return [[self alloc] init];
}

@end

Call it in main.m 👇

#import "LGPerson.h"

int main(int argc, char * argv[]) {
    LGPerson *person = [LGPerson person];
    return 0;
}

Then look at the assembly 👇

As we all know, objc_msgSend takes two parameters by default, self and _cmd, of type id and SEL respectively. Next, we verify this against the addresses in the assembly 👇

  • 0x1006ea000 <+24>: adrp x8, 3: when this instruction executes, 3 is shifted left by 12 bits (three hex digits) -> 0x3000, and added to the pc with its low 12 bits cleared (0x1006ea000), so x8 becomes 0x1006ed000
  • add x8, x8, #0x6a0: after this instruction executes, x8 holds 0x1006ed6a0
  • Then look at the first 8 bytes at 0x1006ed6a0 (the first argument is of type id, which is a pointer, so it takes 8 bytes) 👇

Sure enough, it is LGPerson: because [LGPerson person] is a class method, the first argument is the class LGPerson 👍👍👍. Similarly, look at the second argument, the SEL 👇

Sure enough, the method name is person. The LLDB commands used for this check are listed below 👇 (you can verify it again by debugging manually)

(lldb) x 0x1006ed6a0
0x1006ed6a0: 90 d7 6e 00 01 00 00 00 40 d7 6e 00 01 00 00 00  ..n.....@.n.....
0x1006ed6b0: c8 d6 6e 00 01 00 00 00 08 00 00 00 08 00 00 00  ..n.............
(lldb) po 0x01006ed790
LGPerson

(lldb) x 0x1006ed670
0x1006ed670: fc cb 9c 64 02 00 00 00 da d8 8c 64 02 00 00 00  ...d.......d....
0x1006ed680: 40 91 fb 70 02 00 00 00 50 40 fb 70 02 00 00 00  @..p....P@.p....
(lldb) po 0x02649ccbfc
10277932028

(lldb) po (SEL)0x02649ccbfc
"person"
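The adrp/add address calculation used in this verification can be written out as plain arithmetic. The following sketch models only the arithmetic of the two instructions; the function names are hypothetical.

```c
#include <stdint.h>

// Result of "adrp xN, imm": the 4 KiB page of the current pc (low 12 bits
// cleared) plus imm pages. Shifting imm left by 12 bits is the same as
// multiplying it by 4096.
uint64_t adrp_result(uint64_t pc, int64_t imm_pages) {
    uint64_t page_base = pc & ~(uint64_t)0xFFF;
    return page_base + (uint64_t)(imm_pages * 4096);
}

// adrp followed by "add xN, xN, #offset" yields the full address.
uint64_t adrp_add_result(uint64_t pc, int64_t imm_pages, uint64_t offset) {
    return adrp_result(pc, imm_pages) + offset;
}
```

For the instruction at 0x1006ea000 with immediate 3 and the following add of 0x6a0, this reproduces the address 0x1006ed6a0 that was examined in LLDB.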

Moving on, let's step into the person method and look at its assembly 👇

As the assembly shows, objc_alloc is called first, followed by objc_msgSend.

⚠️ Note: which functions are called here depends on the minimum version the app supports.

  • iOS 9 👉 objc_msgSend and objc_msgSend, corresponding to alloc and init.
  • iOS 11 👉 objc_alloc and objc_msgSend. Here alloc is optimized: objc_alloc is called directly instead of going through objc_msgSend.
  • iOS 13 👉 objc_alloc_init. Here both alloc and init are optimized.

Next, let's look at the assembly that handles the return value of [LGPerson person] 👇

objc_storeStrong

Continuing from the breakpoint 👇

Above we notice the objc_storeStrong function, which is called in OC for variables with the strong modifier: if the object is still referenced externally, its reference count is incremented by 1; otherwise it is destroyed.

The source of objc_storeStrong in objc4-818.2 (in NSObject.mm) 👇

void
objc_storeStrong(id *location, id obj)
{
    id prev = *location;
    if (obj == prev) {
        return;
    }
    objc_retain(obj);
    *location = obj;
    objc_release(prev);
}

This function takes two arguments, id * and id. Its purpose is to retain (+1) the object being stored strongly and to release the old object.
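The retain-store-release order can be tried out with a small plain C model. This is only a sketch: FakeObject and the fake_ functions are hypothetical stand-ins, not runtime APIs, and the model only counts references instead of freeing memory.

```c
#include <stddef.h>

// Hypothetical stand-in for an Objective-C object: only a retain count.
typedef struct {
    int retain_count;
} FakeObject;

// Simplified models of objc_retain / objc_release. NULL is ignored,
// like nil in Objective-C.
static void fake_retain(FakeObject *obj) {
    if (obj != NULL) {
        obj->retain_count++;
    }
}

static void fake_release(FakeObject *obj) {
    if (obj != NULL) {
        obj->retain_count--;
    }
}

// Same control flow as objc_storeStrong: retain the new value,
// store it, then release the previous value.
void fake_store_strong(FakeObject **location, FakeObject *obj) {
    FakeObject *prev = *location;
    if (obj == prev) {
        return;
    }
    fake_retain(obj);
    *location = obj;
    fake_release(prev);
}
```

Storing NULL through fake_store_strong retains nothing and releases the previous value, which is the case that matters when a strong variable goes out of scope.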

Continuing the assembly analysis of the LGPerson example, we now call it in viewDidLoad 👇

- (void)viewDidLoad {
    [super viewDidLoad];
    
    LGPerson *person = [LGPerson person];
}

This assembly code is much simpler than that of main 👇

The code in the red box above is a call to objc_storeStrong 👇

    0x100f99a8c <+104>: add    x8, sp, #0x8
    str    x0, [sp, #0x8]
    0x100f99a94 <+112>: mov    x0, x8
    0x100f99a98 <+116>: mov    x8, #0x0
    mov    x1, x8
    //objc_storeStrong: the first argument is x0, whose value is &person; the second argument is x1, whose value is 0x0
    0x100f99aa0 <+124>: bl     0x100f9a450               ; symbol stub for: objc_storeStrong

Based on this analysis of the assembly, the call to objc_storeStrong is equivalent to 👇

// called with &person and 0x0
void objc_storeStrong(id *location, id obj) {
    id prev = *location;    // prev = person
    if (obj == prev) {
        return;
    }
    objc_retain(obj);       // obj is nil
    *location = obj;        // *location is set to the second argument obj, i.e. person = nil
    objc_release(prev);     // release person, freeing its heap space
}

So here objc_storeStrong is called to free the object.

Tool disassembly

Because OC code is complex in most cases, it can be difficult to analyze by hand. We usually use tools to assist with disassembly, such as MachOView, Hopper, and IDA.

Modify the code slightly 👇

#import "LGPerson.h"

int main(int argc, char * argv[]) {
    LGPerson *person = [LGPerson person];
    person.name = @"cat";
    person.age = 1;
    return 0;
}

Open the Mach-O file in Hopper 👇

You can see that Hopper has automatically resolved the method names and arguments. How does it do that? Double-click objc_cls_ref_LGPerson to jump to the corresponding address 👇

Then find the corresponding address 00000001000096b0 in MachOView 👇

Similarly, check setName: and setAge: 👇

The corresponding values in MachOView are 👇

You can see that all the method names are here. When analyzing assembly code, these strings can be found by address, which is why the tools can restore them; this is the so-called decompilation.

4. Block disassembly

Finally, let's look at Block disassembly. Sample 👇

int main(int argc, char * argv[]) {
    void(^block)(void) = ^() {
        NSLog(@"block test");
    };
    block();
    return 0;
}

View the assembly 👇

The block's implementation is invoke, at address 0x102c4e160. The block structure is defined in the source as follows (Block_private.h) 👇

struct Block_layout {
    void *isa;                              // 8 bytes
    volatile int32_t flags;                 // contains ref count, 4 bytes
    int32_t reserved;                       // 4 bytes
    BlockInvokeFunction invoke;
    struct Block_descriptor_1 *descriptor;
    // imported variables
};

Offsetting 16 bytes from isa gives invoke, which we can inspect with LLDB commands 👇
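The 16-byte offset can be confirmed with offsetof. The struct below is a hypothetical mirror of Block_layout written only for this offset arithmetic; the type and function names are assumptions, and the descriptor is reduced to a plain void pointer.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical function-pointer type standing in for BlockInvokeFunction.
typedef void (*SketchInvokeFunction)(void *block);

// Hypothetical mirror of Block_layout, used only to compute offsets.
struct SketchBlockLayout {
    void *isa;                    // 8 bytes on a 64-bit target
    volatile int32_t flags;       // 4 bytes
    int32_t reserved;             // 4 bytes
    SketchInvokeFunction invoke;  // follows isa + flags + reserved
    void *descriptor;             // stands in for struct Block_descriptor_1 *
};

// Offset of invoke from the start of the block (that is, from isa).
size_t invoke_offset(void) {
    return offsetof(struct SketchBlockLayout, invoke);
}

// Offset of descriptor, which directly follows invoke.
size_t descriptor_offset(void) {
    return offsetof(struct SketchBlockLayout, descriptor);
}
```

On a 64-bit target invoke_offset() is 8 + 4 + 4 = 16, so reading 8 bytes at block address + 0x10 gives the address of the invoke function.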

Then let's look at it in Hopper 👇

Double-click 0x00000001000060cc to jump to the invoke implementation 👇

StackBlock

The example above is a global block (GlobalBlock). Now let's look at a stack block (StackBlock) 👇

int main(int argc, char * argv[]) {
    int a = 10;
    void(^block)(void) = ^() {
        NSLog(@"block test:%d",a);
    };
    block();
    return 0;
}

Assembly 👇

Check isa and invoke in LLDB 👇

(lldb) po 0x100a8c000
<__NSStackBlock__: 0x100a8c000>
 signature: "<unknown signature>"
(lldb) x 0x100a8c000
0x100a8c000: 30 88 ae df 01 00 00 00 94 3f c5 89 01 00 00 00  0........?......
0x100a8c010: 00 00 00 00 00 00 00 00 24 00 00 00 00 00 00 00  ........$.......
(lldb) po 0x01dfae8830
__NSStackBlock__

(lldb) dis -s 0x100a8a140
TestOC&BlockASM`__main_block_invoke:
    0x100a8a140 <+0>:  sub    sp, sp, #0x30             ; =0x30 
    0x100a8a144 <+4>:  stp    x29, x30, [sp, #0x20]
    0x100a8a148 <+8>:  add    x29, sp, #0x20            ; =0x20 
    0x100a8a14c <+12>: stur   x0, [x29, #-0x8]
    0x100a8a150 <+16>: str    x0, [sp, #0x10]
    0x100a8a154 <+20>: ldr    w8, [x0, #0x20]
    0x100a8a158 <+24>: mov    x0, x8
    0x100a8a15c <+28>: adrp   x9, 2

The assembly of invoke's implementation (IMP) can be viewed with dis -s.

In Hopper 👇

Look at the implementation of the block 👇

Differences from a global block:

  • global block: the block and its descriptor are stored together
  • stack block: the block and its descriptor are not stored together

Conclusion

  • Compiler optimization
    • Debug mode uses None [-O0]
    • Release mode uses Fastest, Smallest [-Os]
  • Pointers
    • The width of a pointer (also called its stride) is 8 bytes
    • Pointer arithmetic
      • Pointer arithmetic depends on the width (stride) of the pointed-to data type
        • ++ and -- (increment and decrement) move by the width of the type the pointer points to
        • A double pointer + 1 moves by the width of a pointer (8 bytes)
      • The unit of a pointer operation is the width of the pointed-to data type
      • A struct and a basic type cannot be cast to each other directly; ordinary types can go through &
    • Disassembly of pointers
      • A pointer takes 8 bytes in memory; for example, [sp, #0x8] holds a pointer, stored in the 8 bytes from 0x8 to 0x10
      • An array name and a pointer variable name behave the same; the only difference is that one is a constant and the other is a variable
      • Basic usage of pointers
        • The offsets for pointer + 0 and + 1 are determined by the width of the type the pointer points to
        • For a double pointer, the assembly performs ldr twice -> two levels of addressing
  • OC disassembly
    • objc_msgSend has two default parameters, self and _cmd, stored in registers x0 and x1 respectively
    • How alloc and init are called depends on the minimum version the app supports
      • iOS 9 👉 objc_msgSend and objc_msgSend, corresponding to alloc and init
      • iOS 11 👉 objc_alloc and objc_msgSend. Here alloc is optimized: objc_alloc is called directly instead of going through objc_msgSend
      • iOS 13 👉 objc_alloc_init. Here both alloc and init are optimized
    • objc_storeStrong
      • Takes two parameters, id * and id
      • Its purpose is to retain (+1) the object being stored strongly and to release the old object
  • Block disassembly
    • Underneath, a block is a Block_layout structure
    • The first member of Block_layout is the isa pointer; offsetting 16 bytes from it gives the invoke member, which is the block's implementation (IMP)
      • The assembly of the IMP can be viewed with the LLDB command dis -s
    • Differences between GlobalBlock and StackBlock
      • global block: the block and its descriptor are stored together
      • stack block: the block and its descriptor are not stored together