
iOS underlying principles + reverse engineering: article summary

The main purpose of this article is to understand how global variables and constants are stored at the assembly level, and how to restore assembly code for constructs such as if and while back into high-level code.

Global variables

Before going further, you need to understand how memory is partitioned. If this is not clear, see the article "iOS underlying principles 24: the five memory areas". A brief summary follows:

  • Code area: store code, readable, executable

  • Stack: store parameters, local variables, temporary data, read and write

  • Heap area: dynamically applied by developers, variable size, readable and writable

  • Global area: stores global variables, readable and writable

  • Constant area: stores constants, read-only

Case analysis

Define a function and a global variable in main.m

#include <stdio.h>

int g = 12;

int func(int a, int b){
    printf("haha");
    int c = a + g;
    return c;
}


int main(int argc, char * argv[]) {
    
    func(1, 2);
}
  • Set a breakpoint on func and run the program. The debugger first shows the assembly code of the main function

  • Step into func, look at its assembly code, and analyze it as follows

    • Check whether x0 holds "haha". This can be verified in the debugger: x0 holds the address of "haha"
    • View the memory at that address: x 0x000000010098bf9f. It belongs to the string constant area (in the output, the left column is the ASCII code of the string shown on the right)

Explanation of adrp x0, 1 and add x0, x0, #0xf9f

  • adrp instruction (address of page):
    • Shift the immediate 1 left by 12 bits: binary 1 0000 0000 0000, that is, 0x1000
    • Clear the low 12 bits of the PC register value
    • Add the two results and store the sum in x0

<!-- adrp: address of page (addressing by page) -->
<!-- adrp -->
0x10098a824 <+20>: adrp x0, 1
- 1) Shift 1 left by 12 bits: 0x1000
- 2) Clear the low 12 bits of the PC: 0x10098a824 -> 0x10098a000
- 3) Add the two values: 0x10098a000 + 0x1000 = 0x10098b000

<!-- add -->
0x10098a828 <+24>: add x0, x0, #0xf9f ; =0xf9f
- Add the offset to the address obtained by adrp: 0x10098b000 + 0xf9f = 0x10098bf9f
===> adrp yields the base address of a page and add supplies the offset within that page, so x0 now holds the address of "haha"

The result of this calculation matches the x0 address seen in the debugger above.

Why 12 bits? A page here is 4096 bytes: 0xFFF is 4095, and adding 1 gives 0x1000 (4096). Shifting 1 left by 12 bits therefore gives the size of one page, and clearing the low 12 bits of the PC gives the start address of the page that contains the current instruction. The page size is 4K (0x1000) on macOS and 16K (0x4000) on iPhone, but 16K is still a multiple of 4K, so addressing by 4K pages with adrp works on both Mac and iPhone.
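The adrp + add arithmetic described above can be sketched in C. This is only an illustration of the address calculation, not how the CPU is implemented; the function names adrp_target and adrp_add are hypothetical and chosen for this example.

```c
#include <stdint.h>

/* Hypothetical helper: the value that `adrp xN, imm` writes to xN
 * when the instruction itself is located at address pc. */
uint64_t adrp_target(uint64_t pc, int64_t imm) {
    uint64_t page = pc & ~(uint64_t)0xFFF;   /* clear the low 12 bits of the PC */
    return page + ((uint64_t)imm << 12);     /* add the immediate shifted left by 12 bits */
}

/* Hypothetical helper: adrp followed by `add xN, xN, #offset`. */
uint64_t adrp_add(uint64_t pc, int64_t imm, uint64_t offset) {
    return adrp_target(pc, imm) + offset;    /* page base + offset inside the page */
}
```

With the values from the debugger session, adrp_target(0x10098a824, 1) gives 0x10098b000 and adrp_add(0x10098a824, 1, 0xf9f) gives 0x10098bf9f, the address of "haha".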

  • Continue analyzing the assembly code that follows bl printf

    • ldur w8, [x29, #-0x4]: loads a value from the stack, that is, the parameter a (1)

    • adrp + add + ldr: adrp and add compute the memory address 0x10098ce98, and ldr loads the value stored at that address, which is the global variable g

Disassembly analysis

The sample code is as follows

#include <stdio.h>

int g = 12;

int func(int a, int b){
    printf("haha");
    int c = a + g + b;
    return c;
}


int main(int argc, char * argv[]) {
    func(10, 20);
}

Hopper is used for the disassembly analysis:

  • First build the project: CMD+B
  • Open the app bundle (Show Package Contents)

  • Drag the executable from the bundle into Hopper for analysis

  • Search for func in Hopper

  • Copy the assembly code of func and restore it to high-level language code by hand (that is, decompile it)
<!-- 1. Restore -->
int gl = 12;
int func2(int a, int b){
    /*
     // ... (function prologue: stack setup)
     //===> Store the parameters on the stack
     0000000100006814 stur w0, [x29, #-0x4]
     0000000100006818 str w1, [sp, #0x8]
     //===> Get the string
     000000010000681c adrp x0, #0x100007000
     0000000100006820 add x0, x0, #0xf9f ; "haha"
     0000000100006824 bl imp___stubs__printf
     */
    printf("haha");
    /*
     0000000100006828 ldur w8, [x29, #-0x4]
     */
    int w8 = a;
    /*
     //===> Get the global variable
     000000010000682c adrp x9, #0x100008000
     0000000100006830 add x9, x9, #0xe98 ; _g
     */
    // int gl = 12;
    /*
     0000000100006834 ldr w10, x9
     */
    int w10 = gl;
    /*
     0000000100006838 add w8, w8, w10
     */
    w8 += w10;
    /*
     000000010000683c ldr w10, [sp, #0x8]
     */
    w10 = b;
    /*
     0000000100006840 add w8, w8, w10
     */
    w8 += w10;
    /*
     0000000100006844 str w8, [sp, #0x4]
     0000000100006848 ldr w8, [sp, #0x4]
     000000010000684c mov x0, x8
     */
    return w8;
    /*
     // End of the function
     0000000100006850 ldp x29, x30, [sp, #0x10]
     0000000100006854 add sp, sp, #0x20
     0000000100006858 ret
     */
}

<!-- 2. Remove the assembly comments -->
int gl = 12;
int func2(int a, int b){
    printf("haha");
    int w8 = a;
    int w10 = gl;
    w8 += w10;
    w10 = b;
    w8 += w10;
    return w8;
}

<!-- 3. Simplify -->
int gl = 12;
int func2(int a, int b){
    printf("haha");
    return a + b + gl;
}

Note on the simplification process: restore the code from the bottom up rather than from the top down (even though the business logic itself executes from top to bottom).

In the listing above, the two instructions

000000010000681c adrp x0, #0x100007000
0000000100006820 add x0, x0, #0xf9f

produce the address 0x100007f9f. This is the address in the file, without the ASLR offset.

  • In Hopper, press G and go to 0x100007f9f to view the corresponding data

The same goes for getting the global variable g:

//===> Get the data at 0x100008e98
000000010000682c adrp x9, #0x100008000
0000000100006830 add x9, x9, #0xe98 ; _g
0000000100006834 ldr w10, x9
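The relationship between an address in Hopper (no ASLR) and the address seen in the debugger can be sketched as simple arithmetic. This is a minimal sketch, assuming that the debugger session from the first example and the Hopper listing refer to the same executable; the function names aslr_slide and runtime_address are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: ASLR slide = address at run time - address in the file (as shown in Hopper). */
uint64_t aslr_slide(uint64_t runtime_addr, uint64_t file_addr) {
    return runtime_addr - file_addr;
}

/* Hypothetical helper: address at run time = address in the file + ASLR slide. */
uint64_t runtime_address(uint64_t file_addr, uint64_t slide) {
    return file_addr + slide;
}
```

Under that assumption, the string "haha" gives aslr_slide(0x10098bf9f, 0x100007f9f) = 0x984000, and applying the same slide to the file address of g, runtime_address(0x100008e98, 0x984000), gives 0x10098ce98, which is the address of g seen in the debugger.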

Summary

  • To access global variables and constants, the adrp and add instructions are used together to obtain an address

  • adrp (address of page)

    • adrp x0, 1
      • Clear the low 12 bits of the PC register value

      • Shift the immediate 1 left by 12 bits, which is 0x1000 in hexadecimal

      • Add the two results and store the sum in the x0 register

  • The add instruction then adds the offset within that page

Conditionals

Given the following code, look at its assembly

int g = 12;
void func(int a, int b){
    if (a > b) {
        g = a;
    }else{
        g = b;
    }
}

int main(int argc, char * argv[]) { 
    func(1, 2);
}

View the assembly in Hopper, which looks like this

_func:
// ===> Extend the stack space
0000000100006828 sub sp, sp, #0x10
// ===> Store the parameters on the stack
000000010000682c str w0, [sp, #0xc]
0000000100006830 str w1, [sp, #0x8]
// ===> Load the parameters
0000000100006834 ldr w8, [sp, #0xc]
0000000100006838 ldr w9, [sp, #0x8]
// ===> Compare w8 and w9
000000010000683c cmp w8, w9
// If less than or equal, jump to loc_100006858; if greater than, continue with the next instruction
0000000100006840 b.le loc_100006858
0000000100006844 ldr w8, [sp, #0xc]
0000000100006848 adrp x9, #0x100008000
000000010000684c add x9, x9, #0xe90 ; _g
0000000100006850 str w8, x9
// Unconditional jump to loc_100006868, skipping the less-than-or-equal code
0000000100006854 b loc_100006868
loc_100006858:
0000000100006858 ldr w8, [sp, #0x8] ; CODE XREF=_func+24
000000010000685c adrp x9, #0x100008000
0000000100006860 add x9, x9, #0xe90 ; _g
0000000100006864 str w8, x9
loc_100006868:
0000000100006868 add sp, sp, #0x10 ; CODE XREF=_func+44
000000010000686c ret

This is a typical if-else structure as it appears in Hopper.

Restore the above assembly code:

<!-- 1. Restore -->
int cc = 12;
void func2(int a, int b){
    // 0000000100006828 sub sp, sp, #0x10
    // 000000010000682c str w0, [sp, #0xc]
    // 0000000100006830 str w1, [sp, #0x8]
    // 0000000100006834 ldr w8, [sp, #0xc]
    // 0000000100006838 ldr w9, [sp, #0x8]
    int w8 = a;
    int w9 = b;
    // 000000010000683c cmp w8, w9
    // If less than or equal, jump to loc_100006858; if greater than, continue
    // 0000000100006840 b.le loc_100006858
    if (w8 > w9) { // greater than
        // 0000000100006844 ldr w8, [sp, #0xc]
        // 0000000100006848 adrp x9, #0x100008000
        // 000000010000684c add x9, x9, #0xe90 ; _g
        // 0000000100006850 str w8, x9
        cc = w8; // here w8 is a
        // Unconditional jump to loc_100006868, skipping the less-than-or-equal code
        // 0000000100006854 b loc_100006868
    } else { // less than or equal
        // loc_100006858:
        // 0000000100006858 ldr w8, [sp, #0x8]
        // 000000010000685c adrp x9, #0x100008000
        // 0000000100006860 add x9, x9, #0xe90 ; _g
        // 0000000100006864 str w8, x9
        cc = w8;
    }
    // loc_100006868:
    // 0000000100006868 add sp, sp, #0x10
    // 000000010000686c ret
}

<!-- 2. Simplify -->
int cc = 12;
void func2(int a, int b){
    if (a > b) { // greater than
        cc = a;
    } else { // less than or equal
        cc = b;
    }
}
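One point worth making explicit: the source says if (a > b), but the assembly branches on the opposite condition (b.le) to skip over the if body. The control flow can be written with goto so that each jump corresponds to one branch instruction. This is an illustrative sketch only; the function name func_goto and the label names are hypothetical.

```c
int g = 12;

/* Hypothetical goto version of func, mirroring the branch layout of the assembly. */
void func_goto(int a, int b) {
    if (a <= b) goto else_branch;   /* cmp w8, w9 ; b.le loc_100006858 */
    g = a;                          /* if body: falls through when a > b */
    goto done;                      /* b loc_100006868: skip the else body */
else_branch:
    g = b;                          /* else body */
done:
    return;                         /* add sp, sp, #0x10 ; ret */
}
```

Reading the branch condition and negating it (b.le becomes "greater than") is what recovers the original if condition.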

CMP (compare) instruction

  • cmp compares the contents of one register with the contents of another register or with an immediate value. The result is not stored; only the condition flags are updated (in the example above, cmp is followed by b.le, the else condition)
  • A cmp is generally followed by a jump, that is, a b instruction with a condition suffix
    • bl label: jump to label (branch with link, used for function calls)

    • b.lt label: jump to label if the comparison result is less than; otherwise do not jump

    • b.le label: jump to label if the comparison result is less than or equal; otherwise do not jump

    • b.gt label: jump to label if the comparison result is greater than; otherwise do not jump

    • b.ge label: jump to label if the comparison result is greater than or equal; otherwise do not jump

    • b.eq label: jump to label if the comparison result is equal; otherwise do not jump

    • b.ne label: jump to label if the comparison result is not equal; otherwise do not jump

    • b.hi label: jump to label if the comparison result is unsigned greater than; otherwise do not jump

    • b.hs label: jump to label if the comparison result is unsigned greater than or equal; otherwise do not jump
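To make "only the flags are updated" concrete, here is a small C model of a 32-bit cmp and of the conditions listed above. It is a simplified sketch of the ARM64 N, Z, C, V flags for illustration; the type Flags and the function names (cmp32, cond_lt, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition flags set by cmp: Negative, Zero, Carry, oVerflow. */
typedef struct { bool n, z, c, v; } Flags;

/* Model of `cmp wA, wB`: compute a - b, keep only the flags. */
Flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                           /* result is discarded after this function */
    Flags f;
    f.n = (r >> 31) != 0;                           /* result is negative */
    f.z = (r == 0);                                 /* result is zero */
    f.c = (ua >= ub);                               /* no unsigned borrow */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;      /* signed overflow */
    return f;
}

bool cond_eq(Flags f) { return f.z; }               /* b.eq */
bool cond_ne(Flags f) { return !f.z; }              /* b.ne */
bool cond_lt(Flags f) { return f.n != f.v; }        /* b.lt (signed) */
bool cond_le(Flags f) { return f.z || f.n != f.v; } /* b.le (signed) */
bool cond_gt(Flags f) { return !f.z && f.n == f.v; }/* b.gt (signed) */
bool cond_ge(Flags f) { return f.n == f.v; }        /* b.ge (signed) */
bool cond_hi(Flags f) { return f.c && !f.z; }       /* b.hi (unsigned) */
bool cond_hs(Flags f) { return f.c; }               /* b.hs (unsigned) */
```

For example, after cmp32(-1, 1) the signed condition cond_lt holds, while the unsigned condition cond_hi also holds, because -1 is 0xffffffff when read as an unsigned value. This is why the signed and unsigned suffixes must not be mixed up when restoring code.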

Loops

The commonly used loops are for, while, and do-while. They are analyzed one by one below.

do-while analysis

Examine the code for the following do-while

int main(int argc, char * argv[]) {
    int sum = 0;
    int i = 0;
    do{
        sum += 1;
        i++;
    }while (i<100);
}
  • View the assembly through Hopper

  • The assembly ends as shown below

Conclusion: do-while loop: the condition is checked after the loop body; if the condition is satisfied, execution jumps back to the start of the body
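The shape of a do-while loop at the assembly level can be sketched in C with goto. This is an illustration under the assumption that the compiler emits the body first and a compare plus conditional branch at the end (typical for an unoptimized build); the function name sum_do_while, the parameter limit, and the label are hypothetical.

```c
/* Hypothetical goto version of: do { sum += 1; i++; } while (i < limit); */
int sum_do_while(int limit) {
    int sum = 0;
    int i = 0;
loop_body:
    sum += 1;                        /* the body always runs at least once */
    i++;
    if (i < limit) goto loop_body;   /* cmp + conditional branch back to the body */
    return sum;
}
```

Because the check comes after the body, sum_do_while(0) still returns 1, while sum_do_while(100) returns 100.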

While loop analysis

int main(int argc, char * argv[]) {
    int sum = 0;
    int i = 0;
    while (i<100){
        sum += 1;
        i++;
    }
}

The assembly is shown in the figure

Conclusion: while loop: the condition is checked at the top, inside the loop; if the condition is not satisfied, execution jumps out of the loop

For loop analysis

int main(int argc, char * argv[]) {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        sum += 1;
    }
}

The assembly is the same as for the while loop.

Conclusion: a for loop is like a while loop: the condition is checked at the top, inside the loop; if the condition is not satisfied, execution jumps out of the loop
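The shared shape of while and for loops can be sketched in C with goto. This is an illustration under the assumption that the compiler checks the condition at the top and branches out on the inverted condition (typical for an unoptimized build); the function name sum_while, the parameter limit, and the labels are hypothetical.

```c
/* Hypothetical goto version of: while (i < limit) { sum += 1; i++; } */
int sum_while(int limit) {
    int sum = 0;
    int i = 0;
loop_check:
    if (i >= limit) goto loop_exit;  /* cmp + branch out on the inverted condition */
    sum += 1;                        /* loop body */
    i++;
    goto loop_check;                 /* unconditional branch back to the check */
loop_exit:
    return sum;
}
```

Because the check comes before the body, sum_while(0) returns 0 and the body never runs, while sum_while(100) returns 100.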

Summary

Global variables and constants

  • To access global variables and constants, the adrp and add instructions are used together to obtain an address

  • adrp (address of page)

    • adrp x0, 1
      • Clear the low 12 bits of the PC register value

      • Shift the immediate 1 left by 12 bits

      • Add the two results and store the sum in the x0 register

  • The add instruction then adds the offset within that page

Conditionals

  • cmp compares the contents of one register with the contents of another register or with an immediate value. The result is not stored; only the condition flags are updated (in the example, cmp is followed by b.le, the else condition)
  • A cmp is generally followed by a jump, that is, a b instruction with a condition suffix
    • bl label: jump to label (branch with link, used for function calls)

    • b.lt label: jump to label if the comparison result is less than; otherwise do not jump

    • b.le label: jump to label if the comparison result is less than or equal; otherwise do not jump

    • b.gt label: jump to label if the comparison result is greater than; otherwise do not jump

    • b.ge label: jump to label if the comparison result is greater than or equal; otherwise do not jump

    • b.eq label: jump to label if the comparison result is equal; otherwise do not jump

    • b.ne label: jump to label if the comparison result is not equal; otherwise do not jump

    • b.hi label: jump to label if the comparison result is unsigned greater than; otherwise do not jump

    • b.hs label: jump to label if the comparison result is unsigned greater than or equal; otherwise do not jump

Loops

  • do-while loop: the condition is checked after the loop body; if the condition is satisfied, execution jumps back to the start of the body

  • A for loop is like a while loop: the condition is checked at the top, inside the loop; if the condition is not satisfied, execution jumps out of the loop