iOS assembly

There are many kinds of assembly language. Common ones include 8086 assembly, ARM assembly, x86 assembly, and so on.

ARM assembly

iOS devices have moved from ARMv6 to ARMv7 and ARMv7s, and finally to today's ARM64. ARMv6, ARMv7, ARMv7s, and ARM64 are all instruction set architectures of ARM processors. ARMv7 and ARMv7s are the 32-bit architectures used on real devices, while ARM64 is the 64-bit architecture used on real devices. The iPhone 5c was the last 32-bit iPhone; starting with the iPhone 5s, all iPhones use the ARM64 architecture. On a real device, ARM64 assembly looks like this:

TestFont`-[ViewController test]:
    0x10286e574 <+0>:  sub    sp, sp, #0x20 ; =0x20
    0x10286e578 <+4>:  mov    w8, #0x14
    0x10286e57c <+8>:  mov    w9, #0xa
    0x10286e580 <+12>: str    x0, [sp, #0x18]
    0x10286e584 <+16>: str    x1, [sp, #0x10]
->  0x10286e588 <+20>: str    w9, [sp, #0xc]
    0x10286e58c <+24>: str    w8, [sp, #0x8]
    0x10286e590 <+28>: add    sp, sp, #0x20 ; =0x20
    0x10286e594 <+32>: ret  


x86 assembly

x86 assembly is the assembly language used by the simulator on an Intel Mac. Both its instructions and its syntax differ from ARM64 assembly, as follows:

TestFont`-[ViewController test]:
    0x10b089520 <+0>:  pushq  %rbp
    0x10b089521 <+1>:  movq   %rsp, %rbp
    0x10b089524 <+4>:  movq   %rdi, -0x8(%rbp)
    0x10b089528 <+8>:  movq   %rsi, -0x10(%rbp)
->  0x10b08952c <+12>: movl   $0xa, -0x14(%rbp)
    0x10b089533 <+19>: movl   $0x14, -0x18(%rbp)
    0x10b08953a <+26>: popq   %rbp
    0x10b08953b <+27>: retq   


Why learn ARM64 assembly?

Code debugging

In everyday development, when a program crashes under the debugger, the debugger usually points to the exact line of code that crashed. Some crashes are stranger, though, such as a crash inside a system library, and then it is very hard to pin down the cause. Debugging at the assembly level can often find the cause with far less effort.

Reverse debugging

When reverse engineering someone else's app, we can use LLDB to set breakpoints on memory addresses, but when a breakpoint is hit, LLDB shows assembly code rather than Objective-C code. To reverse engineer and dynamically debug other people's apps, you therefore need to know assembly.

Introduction to ARM64 assembly

To learn ARM64 assembly, start with three topics: registers, instructions, and the stack.

Registers

ARM64 has 34 registers, as follows:

General-purpose registers

  • There are 29 64-bit general-purpose registers, x0 ~ x28
  • w0 ~ w28 are the lower 32 bits of x0 ~ x28

  • x0 ~ x7 are usually used to pass function arguments. If there are more arguments, the stack is used to pass them
  • x0 usually holds the return value of a function
Some people count x0 ~ x30 as general-purpose registers, and architecturally they are. However, x29 and x30 have special purposes, and in practice (for example in LLDB's register listing) they have no corresponding w29 and w30 entries, so this article treats only x0 ~ x28 as general-purpose registers.
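
The relationship between an x register and its w view can be sketched in C. This is only an illustration, not ARM code: the helper names w_view and write_w are hypothetical, and the sketch assumes the ARM64 rule that a write through a w register clears the upper 32 bits of the corresponding x register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value read through wN for a given xN. */
uint32_t w_view(uint64_t x) {
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Hypothetical helper: the new xN after writing `w` through wN.
   The old upper 32 bits are not kept; they become zero. */
uint64_t write_w(uint64_t old_x, uint32_t w) {
    (void)old_x;
    return (uint64_t)w;
}

/* Usage example with made-up values. */
void w_view_example(void) {
    uint64_t x0 = 0x1122334455667788ULL;
    assert(w_view(x0) == 0x55667788u);   /* w0 is the low half of x0 */
    x0 = write_w(x0, 0x14u);             /* like: mov w0, #0x14 */
    assert(x0 == 0x14ULL);               /* upper half is now zero */
}
```

This matches the disassembly earlier in the article, where the 32-bit constants are moved through w8 and w9 rather than x8 and x9.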

Program counter

The PC (Program Counter) register records the address of the instruction the CPU is currently executing. In LLDB, use register read pc to view the value stored in the register

(lldb) register read pc
      pc = 0x000000010286e588  TestFont`-[ViewController test] + 20 at ViewController.m:28
(lldb) 


Stack pointer

  • sp (Stack Pointer)
  • fp (Frame Pointer), also known as x29

Link register

The LR (Link Register), also known as x30, stores the return address of the function.

Program status register

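The ARM architecture has a Current Program Status Register (CPSR) and five Saved Program Status Registers (SPSR). The saved program status registers are used for exception handling. On ARM64 the condition flags formally belong to the processor state (PSTATE), but LLDB still displays them under the name cpsr.

  • Each bit of the program status register has a specific purpose; only the most commonly used flag bits are described here

  • N, Z, C, and V are the condition code flags. Their values are changed by the results of arithmetic or logical operations, and they can determine whether a given instruction is executed. Their meanings are as follows
  • N (bit 31): set when the result is negative
  • Z (bit 30): set when the result is zero
  • C (bit 29): set when an addition produces a carry, or when a subtraction needs no borrow
  • V (bit 28): set when a signed overflow occurs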
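
To make the flag layout concrete, here is a minimal C sketch that splits a 32-bit status value into the four condition flags. It assumes the standard ARM layout in which N, Z, C, and V occupy bits 31, 30, 29, and 28; the type Flags and the function decode_flags are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition flags held in the top four bits of the status register. */
typedef struct {
    bool n;  /* bit 31: result was negative */
    bool z;  /* bit 30: result was zero */
    bool c;  /* bit 29: carry (for a subtraction: no borrow) */
    bool v;  /* bit 28: signed overflow */
} Flags;

/* Hypothetical helper: split a 32-bit cpsr value into N, Z, C and V. */
Flags decode_flags(uint32_t cpsr) {
    Flags f;
    f.n = (cpsr >> 31) & 1u;
    f.z = (cpsr >> 30) & 1u;
    f.c = (cpsr >> 29) & 1u;
    f.v = (cpsr >> 28) & 1u;
    return f;
}

/* Usage: the top hex digit 6 is binary 0110, so Z and C are set. */
void decode_flags_example(void) {
    Flags f = decode_flags(0x60000000u);
    assert(!f.n && f.z && f.c && !f.v);
}
```

The same helper can be applied to any value printed by register read cpsr in LLDB.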

Instructions

MOV instruction

The mov instruction loads another register, a shifted register, or an immediate value into the destination register.

Using the mov instruction in ARM64 assembly

  • Create a new test.s file in Xcode and add the following code to it
; .text places the code in the text segment
.text
; .global exposes the following symbol so that it can be called externally; the symbol name starts with _
.global _test

; The _test function
_test:
; mov instruction: load the immediate value 4 into register x0
mov x0, #0x4
mov x1, x0
; ret returns from the function
ret

  • Create a new test.h header file in Xcode to expose the _test function defined in test.s
#ifndef test_h
#define test_h

void test(void);

#endif /* test_h */

  • Call test() in viewDidLoad, then single-step with si and use register read x0 in LLDB to read the value stored in the register
(lldb) register read x0
      x0 = 0x000000010320c980
(lldb) si
(lldb) register read x0
      x0 = 0x0000000000000004
(lldb) register read x1
      x1 = 0x00000001e60f3bc7  "viewDidLoad"
(lldb) si
(lldb) register read x1
      x1 = 0x0000000000000004


Setting a breakpoint on the assembly code and stepping through it one instruction at a time shows that the values of registers x0 and x1 change after each mov instruction executes.

ret

The ret instruction returns from a function. Its key effect is to assign the value of the LR (x30) register to the PC register.

  • Call test() in viewDidLoad, set a breakpoint on the call to test(), and run the program

  • Use register read to view the values of the LR and PC registers
(lldb) register read lr
      lr = 0x00000001021965a4  TestFont`-[ViewController viewDidLoad] + 68 at ViewController.m:23
(lldb) register read pc
      pc = 0x00000001021965a4  TestFont`-[ViewController viewDidLoad] + 68 at ViewController.m:23
(lldb) 


At this point, LR and PC hold the same address, 0x00000001021965a4 (viewDidLoad + 68). This is the address of the instruction in viewDidLoad that calls test(), not the start of test() itself.

  • Step into the test() function using the si command

  • Reading the LR and PC registers again shows that LR now holds the address of the instruction that follows the call to test(), that is, the next instruction the caller executes once test() has finished. The PC register holds the address of the instruction currently being executed, as follows
(lldb) register read lr
      lr = 0x00000001021965a8  TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb) register read pc
      pc = 0x0000000102196abc  TestFont`test

  • After test() finishes executing, the program jumps to the address stored in the LR register, 0x00000001021965a8. Reading the LR and PC registers again shows that the PC register now holds the address that was stored in LR
(lldb) register read lr
      lr = 0x00000001021965a8  TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb) register read pc
      pc = 0x00000001021965a8  TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb) 

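
The register changes observed in this session can be replayed with a toy model in C. This is a sketch, not a real CPU: the Cpu struct and the exec_bl and exec_ret functions are hypothetical, and the model assumes that the instruction at viewDidLoad + 68 is the bl that calls test(), which is consistent with the addresses LLDB printed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two registers involved in a call. */
typedef struct {
    uint64_t pc;  /* address of the instruction being executed */
    uint64_t lr;  /* x30: return address */
} Cpu;

/* bl target: store the address of the following instruction in lr,
   then jump. Every ARM64 instruction is 4 bytes long, hence pc + 4. */
void exec_bl(Cpu *cpu, uint64_t target) {
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* ret: continue at the address held in lr. */
void exec_ret(Cpu *cpu) {
    cpu->pc = cpu->lr;
}

/* Replays the addresses printed by LLDB in the session above. */
void call_and_return_example(void) {
    Cpu cpu = { .pc = 0x1021965a4ULL, .lr = 0x1021965a4ULL };
    exec_bl(&cpu, 0x102196abcULL);       /* call test() */
    assert(cpu.pc == 0x102196abcULL);    /* now inside test() */
    assert(cpu.lr == 0x1021965a8ULL);    /* viewDidLoad + 72 */
    exec_ret(&cpu);                      /* ret at the end of test() */
    assert(cpu.pc == 0x1021965a8ULL);    /* back in viewDidLoad */
}
```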

The add instruction

The add instruction adds two operands and stores the result in the target register.

In ARM64 assembly, the operands are typically registers from x0 ~ x28. Execute the following assembly code:

.text
.global _test

_test:

mov x0, #0x4
mov x1, #0x3

add x0, x1, x0

ret


After the add instruction executes, x0 = 7:

(lldb) register read x0
      x0 = 0x0000000000000004
(lldb) si
(lldb) register read x1
      x1 = 0x0000000000000003
(lldb) si
(lldb) register read x0
      x0 = 0x0000000000000007


Sub instruction

The sub instruction subtracts operand 2 from operand 1 and stores the result in the target register. Additionally subtracting the inverse of the C condition flag in CPSR is the behavior of the related sbc (subtract with carry) instruction, not of sub.
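
For comparison, here is a minimal C sketch of the two subtraction forms on ARM: sub computes operand 1 minus operand 2, while sbc additionally subtracts the inverse of the C flag. The function names sub64 and sbc64 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sub: destination = operand1 - operand2 */
uint64_t sub64(uint64_t op1, uint64_t op2) {
    return op1 - op2;
}

/* sbc: destination = operand1 - operand2 - NOT(C) */
uint64_t sbc64(uint64_t op1, uint64_t op2, bool carry) {
    return op1 - op2 - (carry ? 0u : 1u);
}

void sub_example(void) {
    assert(sub64(4, 3) == 1);
    assert(sbc64(4, 3, true) == 1);   /* C set: same result as sub */
    assert(sbc64(4, 3, false) == 0);  /* C clear: one more is subtracted */
}
```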

CMP instruction

The cmp instruction compares the contents of a register with the contents of another register or with an immediate value, and updates the condition flags in the CPSR register. It does this by subtracting the second operand from the first and discarding the result.

  • Execute the following assembly code
.text
.global _test

_test:

mov x0, #0x4
mov x1, #0x3

cmp x0, x1

ret

  • The CPSR register values before and after the cmp instruction executes are as follows
(lldb) register read cpsr
    cpsr = 0x60000000
(lldb) si
(lldb) si
(lldb) si
(lldb) register read cpsr
    cpsr = 0x20000000
(lldb) 

After the cmp operation, the CPSR register value is 0x20000000. In binary, the top four bits (N, Z, C, V) are 0010: only C is set, because 4 - 3 is positive, nonzero, and needs no borrow.

  • Modify the assembly code by swapping the x0 and x1 registers in the cmp instruction, as follows
_test:

mov x0, #0x4
mov x1, #0x3

cmp x1, x0

ret

  • Read the value of the CPSR register again before and after the cmp instruction executes
(lldb) register read cpsr
    cpsr = 0x60000000
(lldb) s
(lldb) register read cpsr
    cpsr = 0x80000000
(lldb) 

At this point, the CPSR register value becomes 0x80000000. In binary, the top four bits (N, Z, C, V) are 1000: only N is set, because 3 - 4 is negative, and C is clear because the subtraction needs a borrow.
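
The two CPSR values observed above can be reproduced with a short C sketch of how a comparison sets the flags. It assumes the standard ARM rules for a subtraction (N is the sign of the result, Z means a zero result, C means no borrow, V means signed overflow); the function name cmp_flags is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the NZCV bits, in cpsr bit positions 31..28,
   that cmp a, b produces for 64-bit operands. */
uint32_t cmp_flags(uint64_t a, uint64_t b) {
    uint64_t result = a - b;                 /* cmp computes a - b and discards it */
    uint32_t n = (uint32_t)(result >> 63);   /* sign bit of the result */
    uint32_t z = (result == 0);
    uint32_t c = (a >= b);                   /* set when no borrow occurs */
    /* Signed overflow: the operands differ in sign and the sign of the
       result differs from the sign of a. */
    uint32_t v = (uint32_t)(((a ^ b) & (a ^ result)) >> 63);
    return (n << 31) | (z << 30) | (c << 29) | (v << 28);
}

void cmp_flags_example(void) {
    assert(cmp_flags(4, 3) == 0x20000000u);  /* cmp x0, x1 with x0 = 4, x1 = 3 */
    assert(cmp_flags(3, 4) == 0x80000000u);  /* cmp x1, x0 */
}
```

Comparing two equal values would set both Z and C, giving 0x60000000.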

B instruction

The b instruction is the simplest jump instruction. When a b instruction is encountered, the program jumps unconditionally to the target address specified after b and continues executing there.

BL instruction

The bl instruction is another jump instruction. Before jumping, it stores the address of the instruction that follows the bl in the LR (x30) register, then jumps to the label and starts executing the code there. When ret is encountered, the address stored in LR (x30) is loaded back into the PC register, so the program resumes at the instruction that follows the bl.

  • Start by executing the following assembly code
.text
.global _test

label:
mov x0, #0x1
mov x1, #0x8
ret

_test:
mov x0, #0x4
bl label
mov x1, #0x3
cmp x1, x0
ret

  • Set a breakpoint on the bl label instruction and read the values of the LR and PC registers

  • Execute bl label to jump to label, then read the LR (x30) and PC registers again. The address stored in the LR (x30) register is now the memory address of mov x1, #0x3

  • After all the code in label has executed, the program returns to the address stored in the LR register, that is, to mov x1, #0x3, and the PC register also holds the address of mov x1, #0x3.
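
One consequence of the steps above is worth spelling out: bl label overwrites the return address that the caller of _test left in LR, so the final ret in _test jumps back to mov x1, #0x3 instead of returning to the caller. The following C sketch models that, along with the usual remedy of saving LR before the call and restoring it afterwards. All addresses and the function run_test are hypothetical; real code would save x30 on the stack.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CALLER_RETURN = 0x1000,  /* hypothetical return address in the caller */
    TEST_BL       = 0x2004,  /* hypothetical address of "bl label" */
    TEST_AFTER_BL = 0x2008   /* address of "mov x1, #0x3" */
};

/* Returns the address that the final ret in _test jumps to.
   save_lr != 0 models saving x30 before the bl and restoring it after. */
uint64_t run_test(int save_lr) {
    uint64_t lr = CALLER_RETURN;   /* set by the caller's bl _test */
    uint64_t saved = 0;
    if (save_lr) saved = lr;       /* save x30 */
    lr = TEST_BL + 4;              /* bl label overwrites lr */
    uint64_t pc = lr;              /* ret inside label: pc = lr */
    assert(pc == TEST_AFTER_BL);   /* execution resumes after the bl */
    if (save_lr) lr = saved;       /* restore x30 */
    return lr;                     /* ret at the end of _test: pc = lr */
}
```

With save_lr set to 0 the function returns TEST_AFTER_BL, which models the endless loop; with save_lr set to 1 it returns CALLER_RETURN.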

Addressing mode

An addressing mode is the way the processor finds the address of an operand from the address information given in an instruction. ARM supports the following common addressing modes.

Immediate addressing

Register addressing

Register indirect addressing

Base-plus-offset (indexed) addressing

Multi-register addressing

Relative addressing

Stack addressing

Stack operation

Types of functions

  • A leaf function is one that does not call any other function
  • A non-leaf function is one that calls other functions
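
In C terms the distinction looks like this. The function names are hypothetical, and an optimizing compiler may inline the call, so the classification really applies to the code the compiler generates.

```c
#include <assert.h>

/* Leaf function: it calls no other function. */
int add_two(int a, int b) {
    return a + b;
}

/* Non-leaf function: it calls add_two. */
int add_three(int a, int b, int c) {
    return add_two(add_two(a, b), c);
}

void leaf_example(void) {
    assert(add_three(1, 2, 3) == 6);
}
```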

Leaf function

Non-leaf function