Conditional statements

We will learn the assembly directly from what the compiler generates for this code:

int g = 12;

void func(int a, int b){
    if (a > b) {
        g = a;
    } else {
        g = b;
    }
}

func(1, 2);

Let’s look at assembly code:

There are a few unfamiliar assembly instructions here. Let's go through them one by one:

adrp

Let’s take a look at adr’s explanation:

adr is a short-range address instruction: it computes an address from an offset relative to the PC and writes it into the destination register. Format: adr register, expr

Adding p makes the offset page-based. A page here is 0x1000 bytes, i.e. 4 KB. This 4 KB is a fixed unit used by the instruction; it does not mean that the memory page size of the current device is 4 KB. To check the real page size, we run the pagesize command in the Mac terminal:

On this Mac a memory page is exactly 4 KB, but on the phone it is 16 KB, which you can try on a jailbroken phone.

So how does ADRP calculate the address?

  • First we take the address of the memory page that contains the PC. The adrp instruction is at 0x10472ddf4, so the current page address is 0x10472d000: set the last three hex digits to 0 and you have the page address.
  • The second step is the page offset, which is 8 * 0x1000.
  • Finally, we add the offset to the page base address to get the result of the instruction: 0x10472d000 + 8 * 0x1000 = 0x104735000.

Adding the offset within the page, 0x3d0, gives the address of the global variable: 0x104735000 + 0x3d0 = 0x1047353d0.

To verify this, we first subtract the ASLR offset 0x0000000104728000 from that address:

The result is 0xd3d0.

At that offset we find the global variable's value, 12, so the calculation is verified.
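The address arithmetic above can be reproduced with a few lines of C. This is only a sketch of the calculation, not how the hardware is implemented; the function names adrp_target and image_offset are made up for illustration, and image_base stands for the ASLR value subtracted above.

```c
#include <stdint.h>

/* adrp works in units of 0x1000 bytes (4 KB). */
#define ADRP_PAGE_SIZE 0x1000ULL

/* Hypothetical helper: page base of pc (last three hex digits cleared)
 * plus page_count pages, as in the three steps listed above. */
uint64_t adrp_target(uint64_t pc, int64_t page_count) {
    uint64_t page_base = pc & ~(ADRP_PAGE_SIZE - 1);
    return page_base + (uint64_t)page_count * ADRP_PAGE_SIZE;
}

/* Hypothetical helper: remove the ASLR base to get the offset that can be
 * looked up in the binary. */
uint64_t image_offset(uint64_t runtime_address, uint64_t image_base) {
    return runtime_address - image_base;
}
```

With the numbers from this example, adrp_target(0x10472ddf4, 8) gives 0x104735000, adding 0x3d0 gives 0x1047353d0, and image_offset(0x1047353d0, 0x104728000) gives 0xd3d0.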

cmp

cmp is the comparison instruction. In essence it performs a subtraction: it does not change the values being compared, but it updates the condition flags in the status register (CPSR), which the following B instruction uses to decide whether to jump.

cmp compares the contents of one register with the contents of another register or with an immediate value. No result is stored; only the flags change. A cmp is normally followed by a jump that depends on the outcome, usually a B instruction!
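To make "subtract, but keep only the flags" concrete, here is a small C model of a 32-bit cmp. It is a sketch under the usual ARM flag definitions (N, Z, C, V); the Flags struct and the cmp32 function are illustrative names, not part of any real API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n; /* result is negative (bit 31 set) */
    bool z; /* result is zero */
    bool c; /* no borrow occurred: a >= b when read as unsigned */
    bool v; /* signed overflow occurred */
} Flags;

/* Model of `cmp a, b`: compute a - b, discard the result, keep the flags. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t result = a - b; /* wraps modulo 2^32, like the hardware */
    Flags f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (a >= b);
    /* Overflow: operands have different signs and the sign of the result
     * differs from the sign of a. */
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

For cmp32(1, 2) the result wraps to 0xffffffff, so N is set and Z, C and V are clear. For equal operands, Z and C are set.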

B instruction

B itself means jump. With a suffix it does something more specific:

  • bl: jump to the label and store the address of the next instruction in the LR register so that the function can return
  • b.gt: jump to the label if the comparison result is greater than; otherwise do not jump
  • b.ge: jump to the label if the comparison result is greater than or equal; otherwise do not jump
  • b.lt: jump to the label if the comparison result is less than; otherwise do not jump
  • b.le: jump to the label if the comparison result is less than or equal; otherwise do not jump
  • b.eq: jump to the label if the comparison result is equal; otherwise do not jump
  • b.hi: jump to the label if the comparison result is unsigned greater than; otherwise do not jump
  • b.hs: jump to the label if the comparison result is unsigned greater than or equal; otherwise do not jump
  • b.lo: jump to the label if the comparison result is unsigned less than; otherwise do not jump
  • b.ls: jump to the label if the comparison result is unsigned less than or equal; otherwise do not jump
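The difference between the signed conditions (gt, lt) and the unsigned ones (hi, lo) only shows up when one operand is negative. The sketch below models four of the conditions as C comparisons; the taken_* function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would `cmp a, b` followed by b.gt jump? Signed comparison. */
bool taken_gt(int32_t a, int32_t b) { return a > b; }

/* Would b.lt jump? Signed comparison. */
bool taken_lt(int32_t a, int32_t b) { return a < b; }

/* Would b.hi jump? Same bits, read as unsigned numbers. */
bool taken_hi(int32_t a, int32_t b) { return (uint32_t)a > (uint32_t)b; }

/* Would b.lo jump? Same bits, read as unsigned numbers. */
bool taken_lo(int32_t a, int32_t b) { return (uint32_t)a < (uint32_t)b; }
```

Comparing -1 with 1, the signed view says "less than", so b.lt would jump. Read as unsigned, -1 is 0xffffffff, so b.hi would jump instead.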

The while loop

The do while statement

Let’s write a simple code:

void func() {
    int sum = 0;
    int i = 0;
    do {
        sum += 1;
        i++;
    } while (i < 100);
}

Take a look at the assembly code:

This one is fairly easy to follow: the loop body runs first, then the cmp instruction at the while condition compares i with 0x64, which is 100. If i is less than 100, execution jumps back to the top and runs the body again.

There is a new register here, WZR. It is special: it is the zero register and always reads as 0.
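The shape of the generated code can be written back in C with a label and a goto. This is a sketch of the control flow only; the function returns sum so the result can be checked, and the exact branch instruction named in the comment (b.lt) is an assumption, since it depends on the compiler output.

```c
/* do { ... } while (i < 100); written the way the assembly is laid out. */
int do_while_lowered(void) {
    int sum = 0; /* the zero stored here would come from wzr */
    int i = 0;
loop_body:
    sum += 1;
    i++;
    /* cmp i, #0x64, then a conditional jump back to the top
     * (assumed to be b.lt). */
    if (i < 100) goto loop_body;
    return sum;
}
```

The body always runs at least once, because the comparison only happens after it.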

While statement

Change the above code to a while structure:

void func() {
    int sum = 0;
    int i = 0;
    while (i < 100) {
        sum += 1;
        i++;
    }
}

Take a look at the assembly code:

This is also easy to follow. The cmp instruction runs first, and if its condition is met, execution jumps to the line after the b instruction. Our code says i < 100, but the condition after cmp is b.ge, greater than or equal, which amounts to breaking out of the loop. The looping itself relies on the b instruction that follows the body.
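Written back in C with labels, the while loop looks roughly like this. It is a sketch of the control flow described above; the limit parameter and the return value are added here only so the behavior can be checked.

```c
/* while (i < limit) { ... } with the condition inverted, as the compiler
 * does with b.ge. */
int while_lowered(int limit) {
    int sum = 0;
    int i = 0;
loop_check:
    if (i >= limit) goto loop_end; /* cmp, then b.ge past the loop */
    sum += 1;
    i++;
    goto loop_check;               /* unconditional b back to the cmp */
loop_end:
    return sum;
}
```

Unlike the do while version, the comparison comes first, so with a limit of 0 the body never runs.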

The for loop

Let’s change this to a for loop:

void func() {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        sum += 1;
    }
}

See assembly:

It’s basically the same as the while loop.

A switch statement

The compiler generates switch code in one of two forms. The first is an if else implementation, which is used in the following cases:

  • If there are 3 or fewer case statements, it becomes if else.
  • If the case values are too far apart, it becomes if else. How far is too far? Call it dispersion. In statistics, variance is one way to measure dispersion, and my guess is that the compiler uses something similar.

Let’s write a demo:

void func(int a) {
    switch (a) {
        case 1:
            printf("aaa");
            break;
        case 2:
            printf("bbb");
            break;
        case 3:
            printf("ccc");
            break; 
        default:
            printf("default");
            break;
    }
}

Or:

void func(int a) {
    switch (a) {
        case 1:
            printf("aaa");
            break;
        case 10:
            printf("bbb");
            break;
        case 100:
            printf("ccc");
            break;
        case 1000:
            printf("ddd");
            break;
        default:
            printf("default");
            break;
    }
}

Let’s look at the assembly:

The red box is the comparison for each case statement, the blue box is the jump to the default branch, and the orange boxes are the jumps to the end of the function at the end of each branch.

It is not hard to see that this is essentially an if else implementation, and both of the previous demos compile this way.
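In C terms, the second demo behaves like the chain of comparisons below. This is a sketch of the equivalent logic, not the compiler's actual output; it returns the string instead of calling printf so the result can be inspected, and the function name is made up.

```c
/* The switch with cases 1, 10, 100, 1000 written as sequential comparisons. */
const char *switch_as_if_else(int a) {
    if (a == 1) return "aaa";
    if (a == 10) return "bbb";
    if (a == 100) return "ccc";
    if (a == 1000) return "ddd";
    return "default"; /* no case matched */
}
```

Each call walks the comparisons from top to bottom until one matches, so the cost grows with the number of cases.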

Let’s look at another assembly implementation:

void func(int a) {
    switch (a) {
        case 1:
            printf("aaa");
            break;
        case 4:
            printf("bbb");
            break;
        case 3:
            printf("ccc");
            break;
        case 6:
            printf("ddd");
            break;
        default:
            printf("default");
            break;
    }
}

Looking at the calls to printf in the assembly, we find that the implementations of the cases are laid out one after another (red box). There are two jumps: one is b.hi and the other is br. The b.hi is clearly the jump to default, while br jumps to the address held in a register, here x9. What is this address?

The principle makes this easier to understand. The minimum case value in the switch is 1 and the maximum is 6, so the compiler creates a table with one entry for each value from 1 to 6. Each entry locates the assembly that implements that value's branch, which is the red box in the screenshot. Because ASLR changes the load address on every launch, the table stores offsets relative to the table header rather than absolute addresses.
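The table idea can be sketched in portable C. Standard C cannot store the addresses of code labels, so in this sketch each table entry holds an index into an array of results instead of a code offset; that substitution, and all the names used, are assumptions made for illustration.

```c
#include <stdint.h>

enum { BRANCH_DEFAULT = 4 };

/* Stand-ins for the branch implementations laid out one after another. */
static const char *const branches[] = { "aaa", "bbb", "ccc", "ddd", "default" };

/* One entry per value from 1 to 6. Values without a case (2 and 5)
 * point at the default branch. */
static const int32_t table[6] = {
    0,              /* a == 1 -> "aaa" */
    BRANCH_DEFAULT, /* a == 2 */
    2,              /* a == 3 -> "ccc" */
    1,              /* a == 4 -> "bbb" */
    BRANCH_DEFAULT, /* a == 5 */
    3               /* a == 6 -> "ddd" */
};

const char *switch_as_table(int a) {
    /* Subtract the minimum case value to get the table index. */
    uint32_t index = (uint32_t)a - 1u;
    /* A single unsigned check rejects both a < 1 and a > 6. */
    if (index > 5u) return branches[BRANCH_DEFAULT];
    return branches[table[index]];
}
```

However many cases there are, a lookup costs one subtraction, one comparison and one table read.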

Let’s look at the position of the table header from the above assembly, there are two strange instructions:

  • ubfx x9, x9, #0, #32: extracts 32 bits of x9 starting at bit 0 (bits 0-31), writes them to x9, and fills the remaining bits with 0.
  • ldrsw x10, [x8, x11, lsl #2]: lsl is a logical shift left, so x11 is shifted left by 2 bits and added to x8; the signed 32-bit value at that address is loaded into x10 and sign-extended.
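What these two instructions compute can be modeled in C as follows. This is a sketch of their effect as used in this example, with hypothetical function names; it is not a full model of every encoding of the instructions.

```c
#include <stdint.h>

/* ubfx xd, xn, #lsb, #width: take `width` bits starting at bit `lsb`
 * and zero the rest. Assumes 1 <= width and lsb + width <= 64. */
uint64_t ubfx64(uint64_t value, unsigned lsb, unsigned width) {
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (value >> lsb) & mask;
}

/* ldrsw xd, [base, index, lsl #2]: read the 32-bit signed entry at
 * base + index * 4 and sign-extend it to 64 bits. Indexing an int32_t
 * array performs the same "shift left by 2" scaling. */
int64_t ldrsw_scaled(const int32_t *base, uint64_t index) {
    return (int64_t)base[index];
}
```

For example, ubfx64(0xffffffff00000004, 0, 32) yields 4, and reading a table entry that holds -88 yields -88 as a 64-bit value.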

Let’s analyze it step by step:

    // Store the argument w0 on the stack at [x29, #-0x4], then load it back into w8
    0x100251d50 <+12>:  stur   w0, [x29, #-0x4]
    0x100251d54 <+16>:  ldur   w8, [x29, #-0x4]
    // Subtract 1 from the argument. 1 is the smallest case value, so w8 now holds the index into the table
    0x100251d58 <+20>:  subs   w8, w8, #0x1              ; =0x1
    // Copy the index to x9 and keep only its lower 32 bits
    0x100251d5c <+24>:  mov    x9, x8
    0x100251d60 <+28>:  ubfx   x9, x9, #0, #32
    // Compare the index with 5, the difference between the largest and smallest case values, to see whether it is in the range 0-5; if not, go to the default logic. An argument below 1 gives a negative index, but a negative number read as unsigned is a very large value, so a single unsigned "greater than 5" check is enough. It's very clever
    0x100251d64 <+32>:  cmp    x9, #0x5                  ; =0x5
    // Save the index on the stack at [sp]
    0x100251d68 <+36>:  str    x9, [sp]
    // If the index is unsigned greater than 5, jump to the default branch
    0x100251d6c <+40>:  b.hi   0x100251dc8               ; <+132> at ViewController.m
    // adrp and add compute the table header address: 0x100251000 + 0xde0 = 0x100251de0. The function's ret is at 0x100251ddc, so when a switch is compiled to a table, the table sits immediately after the function's code
    0x100251d70 <+44>:  adrp   x8, 0
    0x100251d74 <+48>:  add    x8, x8, #0xde0            ; =0xde0
    // Load the index back into x11
    0x100251d78 <+52>:  ldr    x11, [sp]
    // lsl #2 multiplies the index in x11 by 4, because each table entry is 4 bytes; the entry at x8 + index * 4 is loaded into x10
    0x100251d7c <+56>:  ldrsw  x10, [x8, x11, lsl #2]
    // The value in x10 is an offset relative to the table header, so adding the header address gives the address of the branch implementation
    0x100251d80 <+60>:  add    x9, x8, x10
    // Jump to the branch implementation
    0x100251d84 <+64>:  br     x9
    ...
    0x100251ddc <+152>: ret  

Let’s get the table header address and see what it looks like:

The table holds 6 negative numbers, one for each index from 0 to 5, so we can calculate the targets (the first cell, at the lowest address, is 0xffffffa8).

The result lands exactly on the implementation of a branch in the assembly above.
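The calculation for the first cell can be checked with a short C sketch, using the table header address 0x100251de0 from the listing and the entry 0xffffffa8. The function name is hypothetical.

```c
#include <stdint.h>

/* Sign-extend a raw 32-bit table entry and add it to the table header
 * address, as the ldrsw and add instructions do together. */
uint64_t branch_address(uint64_t table_address, uint32_t raw_entry) {
    int64_t offset = (int64_t)raw_entry;
    if (raw_entry & 0x80000000u) {
        offset -= 0x100000000LL; /* top bit set: the entry is negative */
    }
    return table_address + (uint64_t)offset;
}
```

0xffffffa8 is -0x58 as a signed 32-bit value, so branch_address(0x100251de0, 0xffffffa8) gives 0x100251d88. That is the address right after the br x9 instruction at 0x100251d84, which fits the first branch's code starting immediately after the jump.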

Let's compare the two implementations of switch. The first is if else: the value is compared against each case from the first to the last, so the complexity is clearly O(n). In the second, you subtract the minimum case value from the value being tested and look up the branch directly, which is O(1). If n is very large, the second is much faster than the first, but it also takes more space: it trades space for time.

Now we can see how the compiler balances the two. When n is small, here 3 or fewer, it uses method 1: with so few cases the second method is hardly any faster, and method 1 saves a little space. Method 1 is also used when the case values are widely scattered, because a table covering them would take up a lot of space just to make the lookup faster.

The rawValue of a Swift enum is a computed property that Swift generates automatically. So whether rawValue is an integer, a string, or a floating-point type, the underlying storage is still 0, 1, 2, 3, 4, 5, ... If we switch over a Swift enum, the generated assembly is therefore most likely method 2.