Preface

In daily application development, the main languages are Objective-C and Swift. In some special scenarios, C/C++, JavaScript, Shell, Python, and others may also be used.

So why would an iOS developer need to know such a low-level language as assembly?

Because understanding assembly improves your ability to debug code and to reverse engineer it.

This article collects the notes the author took while learning assembly. It is divided into two parts: the first covers basic groundwork, and the second introduces Objective-C assembly and some reverse-engineering demos.

The command line

Objective-C source files (.m) are compiled by Clang + LLVM; Swift source files are compiled by the Swift compiler + LLVM.

So with the clang command, we can view the assembly generated for a .c or .m source file:

clang -S Demo.m


By default this generates assembly for the host architecture (x86 on an Intel Mac). For ARM64, we can use xcrun:

xcrun --sdk iphoneos clang -S -arch arm64 Demo.m


All of the assembly code in this article targets CPUs with the ARM64 architecture.

What is assembly?

Assembly language is a low-level programming language that operates directly on hardware (CPU, registers, memory, etc.), unlike higher-level languages such as Objective-C.

Assembly language is still not the lowest-level language and cannot be executed directly by the CPU. It must be converted into machine language (binary 0s and 1s) by the assembler.

Assembly language consists of assembly instructions, each of which directs the CPU to perform an operation.

A typical assembly statement:

// store integer 0 in register x0
mov x0, #0


Registers

ARM stands for Advanced RISC Machine, where RISC means Reduced Instruction Set Computer.

The CPU architecture of iOS devices is based on ARM. You may have heard the terms arm64, armv7, and so on; they all refer to CPU instruction sets. The CPUs in the iPhone 5s and later iOS devices are ARM64.

ARM64 has 31 general-purpose 64-bit registers, named x0 to x30.

The x registers are 64 bits wide; when only the low 32 bits are used, they are called w0 to w30.

Some registers have dedicated roles:

  • SP (Stack Pointer) holds the address of the top of the stack. If you think of the stack as a container, the stack pointer tells you how full it is; moving the stack pointer grows or shrinks the container.

  • LR (Link Register, x30) stores the return address: when a subroutine is called, it holds the address of the instruction to execute after the subroutine returns.

  • FP (Frame Pointer, x29) saves the base address of the current function's stack frame.

In addition, there are several special registers:

  • The PC register stores the address of the next instruction to be executed. Normally the PC advances by one instruction (4 bytes on ARM64), so instructions run sequentially. A branch can instead set the PC depending on a condition, so that instructions run out of order (for example instruction 1, then 5, then 3); this is the basis of conditional constructs such as if/while.

  • FLAGS is the program status register, which holds several flags. Data-processing instructions update the flags, and conditional branch instructions read them to decide whether to jump.

  • XZR and WZR are the zero registers (64-bit and 32-bit): reads always return 0.
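To make the relationship between the x and w names concrete, here is a minimal C sketch that models one general-purpose register as a 64-bit integer. The names `reg_x`, `read_w`, and `write_w` are hypothetical and exist only for illustration; the sketch assumes the ARM64 behavior that a write to a w register clears the upper 32 bits of the corresponding x register.

```c
#include <stdint.h>

/* Model of one general-purpose register: the x view is the full 64 bits. */
typedef uint64_t reg_x;

/* Reading wN returns only the low 32 bits of xN. */
uint32_t read_w(reg_x x) {
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Writing wN replaces the low 32 bits and clears the upper 32 bits of xN. */
reg_x write_w(reg_x old_x, uint32_t value) {
    (void)old_x;            /* the previous contents are discarded */
    return (reg_x)value;
}
```

For example, reading the w view of 0x1122334455667788 gives 0x55667788.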

Floating-point numbers

Because floating-point operations are special, ARM64 also has 32 floating-point registers, q0 to q31, which are named differently depending on the width being accessed: b, h, s, d, and q stand for byte (8 bits), half (16 bits), single (32 bits), double (64 bits), and quad (128 bits) respectively.

Hello world

As with any tutorial, we start with hello world, the simplest program, and look at its assembly. Create a new helloworld.c:

#include <stdio.h>

int main() {
    printf("hello, world\n");
    return 0;
}


Then generate the assembly file:

xcrun --sdk iphoneos clang -S -arch arm64 helloworld.c


Alternatively, in Xcode, Product -> Perform Action -> Assemble generates the assembly file.

    .ios_version_min 11, 2
    .globl _main
    .p2align 2
_main:                              ; @main
; BB#0:
    ; @1
    sub sp, sp, #32                 ; =32
    stp x29, x30, [sp, #16]         ; 8-byte Folded Spill
    add x29, sp, #16                ; =16
    ; @2
    stur wzr, [x29, #-4]
    ; @3
    adrp x0, l_.str@PAGE
    add x0, x0, l_.str@PAGEOFF
    bl _printf
    ; @4
    mov w8, #0
    str w0, [sp, #8]                ; 4-byte Folded Spill
    mov x0, x8
    ; @5
    ldp x29, x30, [sp, #16]         ; 8-byte Folded Reload
    add sp, sp, #32                 ; =32
    ret

    .section __TEXT,__cstring,cstring_literals
l_.str:                             ; @.str
    .asciz "hello, world\n"

.subsections_via_symbols


A few rules for reading assembly code:

  • Assembler directives begin with a . (dot)

Assembler directives tell the assembler how to generate machine code and can usually be ignored when reading assembly code.

.section __TEXT,__text,regular,pure_instructions


This directive says that, when the binary is generated, the code that follows goes into the __text section of the __TEXT segment of the Mach-O file.

  • A line that ends with a : (colon) is a label

Labels are necessary so that other code can locate a function by matching its name. Local labels, which start with the letter L (for example LBB0_2), are used only inside functions.

On ARM, the stack grows from high addresses toward low addresses, so moving the stack pointer toward lower addresses allocates temporary storage, and moving it toward higher addresses releases that storage.

Let us first look at @1 and @5. These two parts are symmetric and are called the function prologue and the function epilogue respectively.

  • In the prologue, memory is allocated on the stack by moving the stack pointer sp down, the caller's FP (x29) and LR (x30) are saved in the top 16 bytes of the new frame, and then FP (x29) is set to point at that saved pair, following the arm64 convention.

  • In the epilogue, FP and LR are restored from the stack into their registers, and then the stack pointer sp is moved up to free the stack memory.
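The symmetry between prologue and epilogue can be sketched in C by simulating the stack as a byte array. This is a toy model, not how a real CPU is programmed: the `machine` type and the functions `machine_init`, `prologue`, and `epilogue` are hypothetical names, and the frame size of 32 bytes is taken from the hello world listing.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

/* Toy machine state: a byte array as stack memory plus sp, fp (x29), lr (x30). */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint64_t sp;   /* offset into mem; the stack grows toward lower offsets */
    uint64_t fp;
    uint64_t lr;
} machine;

void machine_init(machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->sp = STACK_BYTES;   /* empty stack: sp sits at the highest address */
    m->fp = 0;
    m->lr = 0;
}

/* sub sp, sp, #32 ; stp x29, x30, [sp, #16] ; add x29, sp, #16 */
void prologue(machine *m) {
    m->sp -= 32;                               /* allocate 32 bytes */
    memcpy(&m->mem[m->sp + 16], &m->fp, 8);    /* save the caller's fp */
    memcpy(&m->mem[m->sp + 24], &m->lr, 8);    /* save the caller's lr */
    m->fp = m->sp + 16;                        /* fp points at the saved pair */
}

/* ldp x29, x30, [sp, #16] ; add sp, sp, #32 */
void epilogue(machine *m) {
    memcpy(&m->fp, &m->mem[m->sp + 16], 8);    /* restore fp */
    memcpy(&m->lr, &m->mem[m->sp + 24], 8);    /* restore lr */
    m->sp += 32;                               /* free the 32 bytes */
}
```

Calling `prologue` and then `epilogue` leaves sp, fp, and lr exactly as they were, even if lr was overwritten in between by a nested call.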

At @2, the value of the zero register (0) is stored at x29 - 4 bytes, reserving a stack slot for the return value.

At @3, the address of the string "hello, world\n" is computed as "address of the memory page + offset within the page" and written to x0, which passes it as the argument to the printf call that follows.

At @4, the return value 0 is first written to w8; then w0, which holds the value returned by printf, is spilled to memory at sp + 8 bytes (ARM is a load/store architecture, so memory is only ever written from a register); finally x8 (holding 0) is copied to x0 as the return value.
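The "page + offset" addressing used at @3 can be illustrated with a little arithmetic in C. This is a sketch under the assumption that adrp works with 4 KB pages; the function names and the sample address in the note below are hypothetical.

```c
#include <stdint.h>

#define ADRP_PAGE_BYTES 4096u   /* adrp works in 4 KB pages */

/* What sym@PAGE resolves to: the address with its low 12 bits cleared. */
uint64_t page_of(uint64_t address) {
    return address & ~(uint64_t)(ADRP_PAGE_BYTES - 1);
}

/* What sym@PAGEOFF resolves to: the low 12 bits of the address. */
uint64_t page_offset_of(uint64_t address) {
    return address & (uint64_t)(ADRP_PAGE_BYTES - 1);
}

/* adrp x0, sym@PAGE ; add x0, x0, sym@PAGEOFF */
uint64_t materialize_address(uint64_t symbol_address) {
    uint64_t x0 = page_of(symbol_address);       /* adrp */
    x0 = x0 + page_offset_of(symbol_address);    /* add  */
    return x0;
}
```

For a made-up string address of 0x100007FA4, the page is 0x100007000 and the offset is 0xFA4; adding them gives back the full address.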

Key points:

  • LDR (LoaD Register) reads data from memory into a register.

  • STR (STore Register) stores data from a register to memory.

Function calls

Create a new max.c:

#include <stdio.h>

int max(int a, int b) {
    if (a >= b) {
        return a;
    } else {
        return b;
    }
}

int main() {
    int a = 10;
    int b = 20;
    int c = max(a, b);
    return 0;
}


After the assembly is generated, look at the two functions separately

max

main

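Here we see that, before the call, the arguments are placed in x0 and x1 (their 32-bit views w0 and w1, since the arguments are ints):

ldr w0, [sp, #8]    ; load the value at sp + 8 bytes into w0
ldr w1, [sp, #4]    ; load the value at sp + 4 bytes into w1
bl _max             ; call max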


Inside the function, the parameters are read from x0/x1 and stored on the stack as temporaries:

str w0, [sp, #8]    ; write w0 (parameter a) to memory at sp + 8 bytes
str w1, [sp, #4]    ; write w1 (parameter b) to memory at sp + 4 bytes


Passing arguments to functions and returning values from them follows these rules:

  • When a function has eight or fewer parameters, x0-x7 hold them in order

  • When there are more than eight parameters, the extra parameters are passed on the stack

  • Functions typically return data in x0; if the returned data structure is large, the caller passes the address of a result buffer in x8 and the function writes the result there
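As a small illustration of the first two rules, here is a C sketch with a nine-parameter function. The helper `int_arg_register` is hypothetical and deliberately simplified: it only models integer arguments and ignores floating-point arguments, which use their own registers.

```c
/* Nine integer parameters: under the rules above, a..h travel in x0..x7
   (w0..w7 for 32-bit ints) and i is passed on the stack. */
int sum9(int a, int b, int c, int d, int e, int f, int g, int h, int i) {
    return a + b + c + d + e + f + g + h + i;
}

/* Hypothetical helper: the register index N (argument passed in xN) for the
   n-th integer argument counting from 0, or -1 when it goes on the stack. */
int int_arg_register(int n) {
    return n < 8 ? n : -1;
}
```

Compiling a call to `sum9` with the xcrun command shown earlier should show the ninth argument being written to the stack before the bl instruction, while the first eight are loaded into w0 to w7.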

Call Stack

Let’s look at the prologue and epilogue of the hello world assembly:

// prologue
sub sp, sp, #32             ; =32
stp x29, x30, [sp, #16]     ; 8-byte Folded Spill
add x29, sp, #16            ; =16
// subroutine call
bl _printf
// epilogue
ldp x29, x30, [sp, #16]     ; 8-byte Folded Reload
add sp, sp, #32             ; =32


Function prologue

In the figure, green represents free stack memory and orange represents the stack memory occupied by main. "Before main" is the state before main is executed.

As the figure shows, the prologue allocates stack space, saves the previous LR and FP, and then points FP at the high-address end of the current frame.

Subroutine (printf) call

When bl _printf is executed, the bl instruction first copies the address of the next instruction into LR, so that when the subroutine returns, the CPU knows at which instruction to continue.

As you can see, printf’s prologue is similar to main’s: it allocates space, saves main’s LR and FP, and resets FP to the high-address end of its own frame.

Function epilogue

When printf returns, main’s LR and FP are first restored from the stack, and then the stack space is freed. As the figure shows, FP and SP have the same values after the function returns as they had before printf was called.

After LR is restored from the stack, it holds the address of the instruction in main that follows the call to printf; the CPU jumps to the address in LR and continues executing main.

Stack backtrace

Stack backtraces matter a great deal for debugging and for locating crashes. With the diagrams from the previous steps, the principle behind a stack backtrace is fairly clear.

  • The stack frame of the current function can be obtained from SP and FP, and the address currently being executed from PC.

  • At the address that FP points to in the current frame, we find the caller’s saved FP and LR. From the offset we can also derive the caller’s SP. Since LR holds the address of the caller’s next instruction, we have effectively obtained the caller’s PC.

  • With the caller’s FP, SP, and PC, we have the caller’s stack frame, so we can repeat the process recursively to obtain every stack frame.
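The recursive walk described above can be sketched in C. This is a simulation over hand-built frame records, not a real unwinder: `frame_record` and `walk_frames` are hypothetical names, and the layout (saved FP followed by saved LR) is the arm64 convention seen in the prologue.

```c
#include <stddef.h>
#include <stdint.h>

/* On arm64 the frame pointer points at a pair: the caller's saved FP,
   followed by the saved LR (the return address into the caller). */
typedef struct frame_record {
    struct frame_record *caller_fp;
    uintptr_t            saved_lr;
} frame_record;

/* Follows the FP chain starting at fp and stores up to max return addresses
   in out. Returns the number of frames visited. */
size_t walk_frames(const frame_record *fp, uintptr_t *out, size_t max) {
    size_t count = 0;
    while (fp != NULL && count < max) {
        out[count++] = fp->saved_lr;   /* the caller's next instruction */
        fp = fp->caller_fp;            /* move up to the caller's frame */
    }
    return count;
}
```

In a live process the starting point would be the current value of x29; here the chain is assumed to end at a record whose `caller_fp` is NULL.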

During a backtrace we obtain function addresses. How do we get a function name and an offset from such an address?

  • For system libraries such as CoreFoundation, we can look them up directly in the system’s symbol table

  • For your own code, we rely on the dSYM files generated at compile time

This process is called symbolication. For crash logs from iOS devices, Xcode’s symbolicatecrash tool can be used directly:

cd /Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework/Versions/A/Resources
./symbolicatecrash ~/Desktop/1.crash ~/Desktop/1.dSYM > ~/Desktop/result.crash


You can also use the dwarfdump tool to look up a function address:

dwarfdump --lookup 0x000000010007528c  -arch arm64 1.dSYM
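Conceptually, turning an address into "name + offset" is a search for the last symbol that starts at or before the address. The following C sketch shows that idea on a tiny in-memory table; the `symbol` type, the `symbolicate` function, and the sample addresses are hypothetical and do not reflect the actual dSYM or DWARF format.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t    start;   /* address of the function's first instruction */
    const char *name;
} symbol;

/* Finds the symbol with the greatest start address <= address in a table
   sorted by start, and writes the distance from that start to *offset.
   Returns NULL if address lies before the first symbol. */
const symbol *symbolicate(const symbol *table, size_t count,
                          uint64_t address, uint64_t *offset) {
    const symbol *best = NULL;
    size_t lo = 0, hi = count;
    while (lo < hi) {                       /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= address) {
            best = &table[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (best != NULL && offset != NULL) {
        *offset = address - best->start;
    }
    return best;
}
```

With a table of _main at 0x1000, _max at 0x1040, and _logPoint at 0x1080, the address 0x1058 resolves to _max with offset 0x18, which a crash log would print as "_max + 24".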


Code snippets

In this section, we look at the assembly for code snippets commonly encountered in daily development. Once you have accumulated enough of these snippets, reading a program's assembly becomes easier.

if/else

C code

int a = 10;
if (a > 8) {
    a = a + 1;
}


Assembly

mov w8, #10
cmp w8, #8              ; =8
b.le LBB0_2
add w8, w8, #1          ; =1
str w8, [sp, #8]
LBB0_2:
; ...


a > 8 is implemented with the cmp instruction: the comparison result is stored in the PSR (program status register), and b.le (branch if less than or equal) then reads the PSR:

  • If a is less than or equal to 8, jump directly to LBB0_2, where the function returns

  • If a is greater than 8, execution continues sequentially into the if block and performs a = a + 1

As you can see, the PSR plus conditional branch instructions (such as b.le) give assembly great flexibility: code no longer has to execute in one fixed order.
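To show what cmp leaves behind and how b.le reads it, here is a C sketch that models the four ARM64 condition flags (N, Z, C, V) for a 32-bit comparison. The names `flags`, `cmp32`, and `branch_le_taken` are hypothetical; the rule that b.le is taken when Z is set or N differs from V is the standard signed "less than or equal" condition.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n;   /* result is negative */
    bool z;   /* result is zero */
    bool c;   /* carry: no unsigned borrow occurred */
    bool v;   /* signed overflow */
} flags;

/* cmp a, b: computes a - b, discards the result, keeps only the flags. */
flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t result = ua - ub;
    flags f;
    f.n = (result >> 31) != 0;
    f.z = result == 0;
    f.c = ua >= ub;
    /* overflow: the operands have different signs and the sign of the
       result differs from the sign of the first operand */
    f.v = (((ua ^ ub) & (ua ^ result)) >> 31) != 0;
    return f;
}

/* b.le is taken when Z == 1 or N != V (signed less than or equal). */
bool branch_le_taken(flags f) {
    return f.z || (f.n != f.v);
}
```

With a = 10, `branch_le_taken(cmp32(10, 8))` is false, so execution falls through into the if block, matching the walkthrough above.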

while

C code

int a = 3;
while (a < 10) {
    a = a + 1;
}


Assembly code

As you can see, a loop is similar to if: the condition is evaluated, and then execution jumps to the specified label.
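One way to see this without reading assembly is to rewrite the loop in C using only labels and goto, which is roughly the shape the compiler produces. This is an illustrative sketch; the function and label names are hypothetical.

```c
/* The loop as written in C. */
int loop_structured(void) {
    int a = 3;
    while (a < 10) {
        a = a + 1;
    }
    return a;
}

/* The same loop with the control flow spelled out as labels and jumps. */
int loop_lowered(void) {
    int a = 3;
loop_check:
    if (!(a < 10)) goto loop_end;   /* cmp + conditional branch out of the loop */
    a = a + 1;                      /* loop body */
    goto loop_check;                /* unconditional branch back to the test */
loop_end:
    return a;
}
```

Both functions return the same value; the second one simply makes the compare-and-branch structure visible.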

Pointers

What are pointers at the assembly level? A pointer is essentially the address of a variable. At the assembly level there are only registers, memory, and data.

C code

#include <stdio.h>

void hello_word(int *a) {
    *a = 10;
}

int main() {
    int b = 0;
    int *a = &b;    // a must point at valid storage before it is written through
    hello_word(a);
    printf("%d", *a);
    return 0;
}


Assembly:

Here are the few core lines:

mov w9, #11             ; write 11 to w9
add x10, sp, #8         ; write the address sp + 8 bytes to x10
mov w11, #10            ; write 10 to w11
str w11, [sp, #8]       ; write the contents of w11 to memory at sp + 8 bytes
str w9, [x10]           ; write the contents of w9 to the address held in x10 (sp + 8 bytes)

  • As you can see, writing the memory address (sp + 8 bytes) to x10 implements the pointer declaration and memory allocation

  • Writing data to the corresponding memory (sp + 8 bytes) implements assignment through the pointer

Structs

Structs are essentially contiguous memory laid out according to certain rules.

C code

#include <stdio.h>

struct point {
    float x;
    float y;
};

struct point makePoint(float x, float y) {
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

void logPoint(struct point p) {
    printf("(%.2f, %.2f)", p.x, p.y);
}

int main() {
    struct point p = makePoint(1.5, 2.3);
    logPoint(p);
    return 0;
}


makePoint assembly:

After a flurry of register and stack operations, the function essentially writes x to s0 and y to s1 as the return value.

logPoint assembly:

As you can see, when a struct is passed as a parameter, its fields are passed in via s0 and s1.

main assembly:

Note that when a struct is too large, arguments and return values are passed through the stack, similar to when a function has too many arguments.
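The claim that a struct is just contiguous memory can be checked from C with offsetof. The sketch below redeclares the same struct point and reads its fields back purely by byte offset; `read_fields_by_offset` is a hypothetical helper, and the expectation that y sits immediately after x assumes the usual 4-byte float with 4-byte alignment.

```c
#include <stddef.h>
#include <string.h>

struct point {
    float x;
    float y;
};

/* Copies the raw bytes of a point into two floats, relying only on the
   field offsets, to show that the fields sit side by side in memory. */
void read_fields_by_offset(const struct point *p, float *x, float *y) {
    const unsigned char *bytes = (const unsigned char *)p;
    memcpy(x, bytes + offsetof(struct point, x), sizeof *x);
    memcpy(y, bytes + offsetof(struct point, y), sizeof *y);
}
```

Because the whole struct is only two floats with no padding, it is small enough to travel in the two registers s0 and s1, as the assembly above shows.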

Arrays

C code

#include <stdio.h>

void logArray(int intArray[], size_t length) {
    for (int i = 0; i < length; i++) {
        printf("%d", intArray[i]);
    }
}

int main() {
    int arr[3] = {1, 2, 3};
    logArray(arr, 3);
    return 0;
}


main assembly:

Stack usage:

From the assembly, we can see several things:

  • Arrays are passed to functions as pointers. In this example, the address sp + 12 bytes is passed to logArray as its first argument, in x0.

  • After compilation, a ___stack_chk_guard value is inserted at the top and bottom of the variable area. After the function body executes, the ___stack_chk_guard value on the stack is checked to see whether it has been modified; if it has, an error is reported.

  • The values that initialize the array are stored in the constant section of the code segment:

.section __TEXT,__const
l_main.arr:
.long 1     ; 0x1
.long 2     ; 0x2
.long 3     ; 0x3
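The first observation, that an array argument is really a pointer, can also be seen from C itself. In this sketch the function names are hypothetical; it only demonstrates the standard array-to-pointer conversion, not the stack guard.

```c
#include <stddef.h>

/* The parameter is declared as an array but is really a pointer. */
int sum_array(int values[], size_t length) {
    int total = 0;
    for (size_t i = 0; i < length; i++) {
        total += values[i];      /* same as *(values + i) */
    }
    return total;
}

/* Returns 1 when the array name and the address of its first element are
   the same address, which is the value placed in x0 for the call. */
int decays_to_first_element(void) {
    int arr[3] = {1, 2, 3};
    int *as_pointer = arr;       /* array-to-pointer conversion */
    return as_pointer == &arr[0];
}
```

This is also why the length has to be passed as a separate argument: the callee receives only an address and cannot recover the size of the array from it.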


logArray assembly:

Next: iOS Assembler Quick Start part 2 (not yet completed)

