ARM64 assembly language

Registers

A CPU contains control units, arithmetic units, and registers. Registers hold data temporarily.

The CPU computes far faster than memory can be accessed. For performance, the CPU provides a small temporary storage area and copies data from memory into it before operating on the data. We call this small temporary storage area the registers.

On ARM64 CPUs, a register name beginning with X denotes a 64-bit register, and a name beginning with W denotes a 32-bit register. There are no 16-bit or 8-bit general-purpose registers to access. A 32-bit register is the lower 32 bits of the corresponding 64-bit register; it does not exist independently.

Cache

The A11 ARM processor in the iPhone X has a 64KB level 1 cache and an 8MB level 2 cache.

Before executing an instruction, the CPU must read it from memory. Registers are much faster than memory reads and writes, so the CPU also integrates a cache to narrow the gap. While a program runs, the code and data it is about to use are copied into the cache (the hardware does this automatically), and the CPU fetches instructions from the cache to execute them.

General-purpose registers

ARM64 has 31 64-bit general-purpose registers, x0 through x30. They usually hold ordinary data, although some of them also serve special purposes.
- w0 through w30 are the 32-bit forms. A 64-bit CPU is compatible with 32-bit operations, so an instruction can use only the lower 32 bits of a 64-bit register.
- For example, w0 is the lower 32 bits of x0.
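The relationship between an X register and its W view can be sketched in C. This is only a model, not real register access: `read_w` and `write_w` are hypothetical helper names. It also assumes the ARM64 rule that writing a W register clears the upper 32 bits of the matching X register.

```c
#include <stdint.h>

/* Read the W view of an X register: just its low 32 bits. */
uint32_t read_w(uint64_t x)
{
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Write the W view. The model assumes the ARM64 rule that a write to wN
 * zero-extends into xN, so the old upper half is discarded. */
uint64_t write_w(uint64_t old_x, uint32_t value)
{
    (void)old_x; /* the previous contents do not survive the write */
    return (uint64_t)value;
}
```

For instance, `read_w(0x1122334455667788)` yields `0x55667788`, and writing 1 through the W view of a register full of ones leaves the whole X register equal to 1.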

Data and address registers

Data and address registers are typically used for temporary storage, accumulation, counting, holding addresses, and similar tasks during computation. Their main purpose is to hold the operands of CPU instructions, so they act like ordinary variables inside the CPU. In ARM64 they are:

64-bit: X0-X30, XZR (the zero register, which always reads as 0)
32-bit: W0-W30, WZR (the zero register)

Note: the segment registers CS, DS, SS, and ES, which hold the base addresses of segments, belong to Intel-architecture CPUs. ARM has no such registers.

Floating-point and vector registers

Because floating-point numbers are stored and operated on differently from integers, the CPU provides dedicated floating-point registers to handle them.

Floating-point registers, 64-bit: D0-D31; 32-bit: S0-S31

Modern CPUs also support vector operations, which are used heavily in graphics processing, and provide a set of vector registers for them.

Vector registers, 128-bit: V0-V31

PC register (program counter)

The PC is the instruction pointer register. It holds the address of the instruction the CPU is currently fetching, similar to CS:IP in x86 assembly.

SP and FP registers

The SP register holds the address of the top of the stack at any given time.
The FP register, also known as X29, is a general-purpose register, but it is commonly used to hold the address of the bottom of the stack, that is, the base of the current function's stack frame.

Note: ARM64 drops the 32-bit LDM, STM, PUSH, and POP instructions and uses LDR/LDP and STR/STP instead. Stack operations in ARM64 are 16-byte aligned.

Memory read and write instructions

The stack pointer moves from high addresses toward low addresses, while the heap grows from low addresses toward high addresses. If the two collide, the stack has run out of memory.

STR (store register) instruction

Reads data from a register and stores it in memory.

LDR (load register) instruction

Reads data from memory and stores it in a register.

LDR and STR have pair variants, LDP and STP, which operate on two registers at once.

.text
.global _A

_A:
    sub sp, sp, #0x20          ; extend the stack by 32 bytes
    stp x0, x1, [sp, #0x10]    ; store x0 and x1 at [sp + 0x10]
    ldp x0, x1, [sp, #0x10]    ; load two registers back from [sp + 0x10]
    add sp, sp, #0x20          ; balance the stack, freeing the space
    ret                        ; return to the instruction after the call

STUR instruction: typically used when the offset is negative. For example, stur w0, [x29, #-0x8] stores the value of register w0 at the memory address x29 - 0x8.

[sp]: sp holds an address in stack space, and [sp] denotes the memory at that address. This is similar to [bx] on the 8086.

Assembly shorthand

stp x29, x30, [sp, #-0x10]!   ; note the trailing "!": sp itself is updated first
; equivalent to:
;     sub sp, sp, #0x10
;     stp x29, x30, [sp]

ldp x29, x30, [sp], #0x10     ; sp is updated after the load
; equivalent to:
;     ldp x29, x30, [sp]
;     add sp, sp, #0x10
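The two shorthand forms above can be modelled in C as a push and a pop of a register pair. This is a minimal sketch, not how hardware is implemented. The `Stack` type, `STACK_BYTES`, and the function names are hypothetical, and the stack is just a byte array with `sp` as an offset into it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256 /* hypothetical size of the modelled stack */

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* modelled stack memory */
    uint64_t sp;               /* byte offset into mem; starts at the top */
} Stack;

void stack_init(Stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_BYTES; /* empty stack: sp sits at the highest address */
}

/* Models "stp a, b, [sp, #-0x10]!": pre-index, so sp moves first, then the store. */
void stp_pre_index(Stack *s, uint64_t a, uint64_t b)
{
    assert(s->sp >= 0x10);             /* room for one more pair */
    s->sp -= 0x10;                     /* sub sp, sp, #0x10 */
    memcpy(&s->mem[s->sp], &a, 8);     /* first register at [sp] */
    memcpy(&s->mem[s->sp + 8], &b, 8); /* second register at [sp + 8] */
}

/* Models "ldp a, b, [sp], #0x10": post-index, so the load happens first, then sp moves. */
void ldp_post_index(Stack *s, uint64_t *a, uint64_t *b)
{
    assert(s->sp + 0x10 <= STACK_BYTES); /* something must have been pushed */
    memcpy(a, &s->mem[s->sp], 8);
    memcpy(b, &s->mem[s->sp + 8], 8);
    s->sp += 0x10;                       /* add sp, sp, #0x10 */
}
```

A push followed by a pop returns the same two values and leaves `sp` where it started, which is what "stack balance" means.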

BL instruction

The contents of the PC determine where the CPU fetches its next instruction, so we can steer the CPU to a target instruction by changing the PC.
ARM64 provides the MOV instruction (a data move instruction), which can modify most registers, for example:
- mov x0,#10 and mov x1,#20
However, MOV cannot set the value of the PC; ARM64 does not provide this.
ARM64 provides other instructions that modify the PC. They are collectively called branch instructions, and the simplest of them is BL.

BL is similar to call in x86 assembly.

bl label

Places the address of the next instruction in the LR (X30) register
Jumps to label and continues executing there

ret

By default, ret uses the value in the LR (X30) register. At the hardware level, the instruction tells the CPU to use that value as the address of the next instruction.

ret is specific to the ARM64 platform, and the hardware is optimized to handle it.
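To make the interplay of PC and LR concrete, here is a small C model of bl and ret. It is a sketch under simple assumptions: the `Cpu` struct and the names `exec_bl` and `exec_ret` are hypothetical, and every instruction is taken to be 4 bytes long, as A64 instructions are.

```c
#include <stdint.h>

typedef struct {
    uint64_t pc; /* address of the instruction being executed */
    uint64_t lr; /* x30, the link register */
} Cpu;

/* Models "bl target": remember where to come back to, then jump. */
void exec_bl(Cpu *cpu, uint64_t target)
{
    cpu->lr = cpu->pc + 4; /* address of the instruction after the bl */
    cpu->pc = target;
}

/* Models "ret": jump to the address held in lr. */
void exec_ret(Cpu *cpu)
{
    cpu->pc = cpu->lr;
}
```

If the bl sits at 0x1000 and targets 0x2000, the model sets lr to 0x1004 and pc to 0x2000. The later ret sets pc back to 0x1004, the instruction right after the call.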

X30 register

The X30 register holds the function's return address. When ret executes, the CPU jumps to the address saved in X30.

Note: before a function calls another function, it must push X30 onto the stack.

ARM code example

.text
.global _A, _B          ; define two global functions, A and B

_A:
    mov x0, #0xa0
    mov x1, #0x00
    add x1, x0, #0x14   ; x1 = x0 + 0x14
    ret                 ; return to the instruction after the bl that called _A

_B:
    add x0, x0, #0x10
    ret

Registers and stacks

Registers are global containers shared by all functions, whereas each function call gets its own stack space. With nested calls, a register is easily overwritten by the callee. To keep its value intact, a function usually saves the register on the stack and restores it before ret. This guarantees that the function's data is not changed and lets a register be used like a local variable.

Stack alignment

Stack operations in ARM64 are 16-byte aligned: stack space is allocated at least 16 bytes at a time and always in multiples of 16. Otherwise, accessing the stack raises an error.

.text
.global _A, _B              ; define two global functions, A and B

_A:
    mov x0, #0xaaaa
    str x30, [sp, #-0x10]!  ; save LR before calling the next function; bl will overwrite it
    bl _B
    mov x0, #0xcccc
    ldr x30, [sp], #0x10    ; restore LR
    ret                     ; return to the instruction after the bl that called _A

_B:
    add x0, x0, #0x10
    ret
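Why the save and restore of x30 matters can be traced with a small C model of the example above. It is only a sketch: `A_AFTER_BL` is a made-up address for the instruction after bl _B inside _A, and `a_return_target` is a hypothetical helper that reports where the final ret of _A would jump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address of the instruction that follows "bl _B" inside _A. */
#define A_AFTER_BL 0x2008u

/* Returns the address that the final ret of _A jumps to. */
uint64_t a_return_target(uint64_t caller_next, bool save_lr)
{
    uint64_t lr = caller_next; /* set by the caller's "bl _A" */
    uint64_t saved = 0;

    if (save_lr)
        saved = lr;            /* str x30, [sp, #-0x10]! */

    lr = A_AFTER_BL;           /* "bl _B" overwrites lr */
    /* _B runs; its ret jumps to A_AFTER_BL and leaves lr unchanged */

    if (save_lr)
        lr = saved;            /* ldr x30, [sp], #0x10 */

    return lr;                 /* the final ret of _A jumps here */
}
```

With the save, _A returns to its caller. Without it, ret jumps back to `A_AFTER_BL` inside _A itself, so the function keeps returning into its own body and never reaches the caller.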

Since sp moves by at least 16 bytes at a time, how much stack space does the following function allocate?

void sum(int a, int b){
    int c=3;
    int d=4;
    int e=5;
}

The stack holds 5 ints (two parameters and three local variables), which need 5*4=20 bytes. Rounded up to a multiple of 16, sp is lowered by 0x20 in one step, that is, 32 bytes of stack space.
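The rounding step in this calculation is easy to express in C. The helper name `round_up_16` is hypothetical. It only shows the arithmetic and does not claim to reproduce exactly what a particular compiler emits.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16. */
size_t round_up_16(size_t bytes)
{
    return (bytes + 15u) & ~(size_t)15u;
}
```

For the function above, `round_up_16(20)` gives 32 (0x20), while a request of exactly 16 bytes stays at 16.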

16-bit register --> holds at most 2 bytes of data --> 0xFFFF
32-bit register --> holds at most 4 bytes of data --> 0xFFFFFFFF
64-bit register --> holds at most 8 bytes of data --> 0xFFFFFFFFFFFFFFFF

If the function calls another function, how far does sp move?

void sum(int a, int b){
    int c=3;
    int d=4;
    int e=sumb(a,b);
}

Since bl overwrites x30 (LR) when it calls the function, the X29 and X30 registers must be saved first. These two registers occupy 16 bytes. The parameters and local variables of sum need 20 bytes, which rounds up to 32, so sp is lowered by 48 bytes (0x30) in total.

Leaf functions

A function whose body calls no other function is called a leaf function.

A leaf function written in assembly can skip the stack entirely. Stack space exists to protect data from being clobbered by the next function called. A leaf function has no such risk, so it needs no protection and can use the registers directly.

ARM64 function parameters and return values

Under ARM64, function parameters are usually passed in the eight registers X0 through X7 (W0 through W7). Parameters beyond the eighth are passed on the stack. Whether the stack is used depends not only on the number of parameters but also on their types: a pointer occupies 8 bytes and fits exactly one 64-bit register, whereas a value larger than 8 bytes, such as a large structure, may be stored elsewhere, for example on the stack. The return value of a function is normally placed in X0.
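The register-or-stack rule for simple arguments can be written as a tiny C helper. This is a deliberately simplified sketch. It assumes every argument is an integer or a pointer, ignores floating-point and structure arguments, and uses the hypothetical names `NUM_ARG_REGS` and `int_arg_register`.

```c
#include <assert.h>

#define NUM_ARG_REGS 8 /* x0 through x7 */

/* For the argument at position `index` (counting from 0), return the number
 * of the X register that carries it, or -1 if it is passed on the stack. */
int int_arg_register(int index)
{
    assert(index >= 0); /* positions are never negative */
    return (index < NUM_ARG_REGS) ? index : -1;
}
```

So the first argument travels in x0, the eighth in x7, and the ninth is the first one to go on the stack.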

ORR instruction

ORR performs a bitwise OR operation. See blog.csdn.net/qq_39416311…

orr w8, wzr, #0x1   ; OR the immediate 0x1 with zero and store the result in w8

Nested function reuse

Suppose there are two functions, A and B, and their call chain is A --> B --> A.

In a high-level language, function A is reused, but assembly has no such concept: every call to a function opens up a new stack space. So even when the same function is called, too many levels of recursive nesting will overflow the stack.

Status register (flags register)

CPSR (Current Program Status Register)

The CPSR differs from other registers. Other registers store data, and the whole register carries a single meaning. The CPSR works bit by bit: each bit has its own meaning and records specific information. In AArch64 these flags are formally part of the processor state (PSTATE), but debuggers such as LLDB still display them as cpsr.

For an arithmetic instruction to update the flags, an s suffix must be added to the instruction, as in:

add ---> adds
sub ---> subs

Note: the CPSR is 32 bits wide.

In the 32-bit ARM layout, the lower 8 bits of the CPSR (I, F, T, and M[4:0]) are called the control bits, and a program cannot modify them unless the CPU is running in a privileged mode.
N, Z, C, and V are the condition code flags. The results of arithmetic and logical operations change them, and they can determine whether an instruction is executed, which makes them very important.

N (Negative)

Bit 31 of the CPSR is N, the sign flag. It records whether the result of the relevant instruction is negative: N = 1 if the result is negative, N = 0 if it is non-negative.

Note that in the ARM64 instruction set, the instructions that affect the status register, such as the s forms of add, sub, and and, are mostly arithmetic or logical instructions.

Z (Zero)

Bit 30 of the CPSR is Z, the zero flag. It records whether the result of the relevant instruction is 0: Z = 1 if the result is 0, Z = 0 otherwise.

One way to remember the value of Z: the flag answers the question "is the result 0?". If the result is 0, Z must record the affirmative answer "yes". In computers, 1 means logical true, so Z = 1 means "the result is 0". If the result is not 0, Z records the negative answer. 0 means logical false, so Z = 0 means "the result is not 0".
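Both flags depend only on the result of the operation, which a short C sketch makes plain. The function names `flag_n` and `flag_z` are hypothetical, and the model assumes a 32-bit result, as produced by an instruction on W registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* N mirrors the sign bit (bit 31) of a 32-bit result. */
bool flag_n(uint32_t result)
{
    return ((result >> 31) & 1u) != 0;
}

/* Z is set exactly when the result is zero. */
bool flag_z(uint32_t result)
{
    return result == 0;
}
```

A result of 0x80000000 sets N and clears Z, and a result of 0 sets Z and clears N.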

C (Carry)

Bit 29 of the CPSR is C, the carry flag. It is generally meaningful for unsigned operations. Addition: C = 1 if the operation produces a carry (unsigned overflow), otherwise C = 0. Subtraction (including CMP): C = 0 if a borrow occurs (unsigned underflow), otherwise C = 1.

For an unsigned number N bits wide, the highest bit of its binary representation, bit N-1, is the most significant bit, and the imaginary bit N is the next higher bit beyond it.

Carry

When two values are added, the most significant bit may carry into the next higher bit. For example, adding two 32-bit values, 0xAAAAAAAA + 0xAAAAAAAA, produces a carry. The carry cannot be stored in 32 bits, so we loosely say that it is lost. In fact, the CPU does not discard the carry but records it in a special register. ARM uses the C bit to record it. For example:

mov  w0, #0xaaaaaaaa   ; 0xa is 1010 in binary
adds w0, w0, w0        ; like 1010 << 1: carries out 1 (unsigned overflow), so C = 1
adds w0, w0, w0        ; like 0101 << 1: carries out 0 (no unsigned overflow), so C = 0
adds w0, w0, w0        ; the same pattern repeats
adds w0, w0, w0
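The same sequence can be followed step by step with a C model of a 32-bit adds. This is a sketch with a hypothetical helper name, `adds32`. It widens the addition to 64 bits so that the "imaginary" higher bit described above becomes visible as bit 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a 32-bit adds: returns the 32-bit result and reports the C flag. */
uint32_t adds32(uint32_t a, uint32_t b, bool *carry)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b; /* bit 32 holds the carry out */
    *carry = (wide >> 32) != 0;                /* C = 1 on unsigned overflow */
    return (uint32_t)wide;                     /* keep only the low 32 bits */
}
```

Starting from 0xAAAAAAAA and repeatedly adding the value to itself gives 0x55555554 with C = 1, then 0xAAAAAAA8 with C = 0, then 0x55555550 with C = 1, so the flag alternates as the comments in the assembly describe.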

Borrow

When one value is subtracted from another, the operation may need to borrow from a higher bit. For example, the 32-bit subtraction 0x00000000 - 0x000000FF produces a borrow. After borrowing, it is equivalent to computing 0x100000000 - 0x000000FF, which gives 0xFFFFFF01. Because a bit was borrowed, the C bit marks the borrow: C = 0. For example:

mov w0, #0x0
subs w0, w0, #0xff   ; borrow: w0 = 0xFFFFFF01, C = 0
subs w0, w0, #0xff   ; no borrow: C = 1
subs w0, w0, #0xff   ; no borrow: C = 1
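The borrow rule for subtraction can be modelled the same way in C. The helper name `subs32` is hypothetical. The sketch relies on the fact that unsigned arithmetic in C wraps modulo 2^32, which matches what a 32-bit register does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a 32-bit subs: returns the 32-bit result and reports the C flag. */
uint32_t subs32(uint32_t a, uint32_t b, bool *carry)
{
    *carry = a >= b; /* C = 1 means no borrow; C = 0 means a borrow occurred */
    return a - b;    /* wraps modulo 2^32 when a < b */
}
```

Subtracting 0xFF from 0 yields 0xFFFFFF01 with C = 0. Subtracting 0xFF again yields 0xFFFFFE02 with C = 1, because no borrow is needed the second time.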

V (Overflow)

Bit 28 of the CPSR is V, the overflow flag. In a signed operation, a result that falls outside the range the machine can represent is called an overflow.

Positive + positive giving a negative result is an overflow
Negative + negative giving a positive result is an overflow
Positive + negative can never overflow
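The three rules above boil down to one test on the sign bits: an addition overflows when both operands share a sign and the result has the other sign. The C sketch below models that test for 32-bit values. The function name `adds32_overflow` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the V flag of a 32-bit adds on two's complement values. */
bool adds32_overflow(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b; /* wraps modulo 2^32 */
    /* ~(a ^ b): bit 31 is set when the operands have the same sign.
     * (a ^ sum): bit 31 is set when the result's sign differs from a's. */
    return (((~(a ^ b)) & (a ^ sum)) >> 31) != 0;
}
```

Here 0x7FFFFFFF + 1 overflows (positive + positive gives a negative result), 0x80000000 + 0x80000000 overflows (negative + negative gives a non-negative result), and a positive plus a negative value never does.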

ADRP instruction

ADRP (address of page) computes a page-aligned address relative to the current PC. It is typically the first step in computing the address of a constant or other data.

adrp x0, 1
; 1. Shift the immediate 1 left by 12 bits: 1 0000 0000 0000 == 0x1000
; 2. Clear the lower 12 bits of the PC value
; 3. Add 0x1000 to the cleared value and store the result in x0
; The value 1 after adrp is a hexadecimal number
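The three steps can be checked with a few lines of C. This is a sketch of the address arithmetic only. The function name `adrp` and the sample PC value used in the note below are made up for illustration.

```c
#include <stdint.h>

/* Models "adrp xN, page_imm" executed at address pc. */
uint64_t adrp(uint64_t pc, int64_t page_imm)
{
    uint64_t page_base = pc & ~(uint64_t)0xFFF; /* step 2: clear the low 12 bits */
    uint64_t offset = (uint64_t)page_imm << 12; /* step 1: immediate times 0x1000 */
    return page_base + offset;                  /* step 3: add (wraps modulo 2^64) */
}
```

With a hypothetical PC of 0x100004A3C, `adrp(pc, 1)` gives 0x100005000: the PC is rounded down to 0x100004000 and one page (0x1000) is added.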

Memory regions

Code area: readable and executable, but not writable
Stack area: readable and writable
Heap area: dynamic application, readable and writable
Global variable area: readable and writable
Constant area: read only