Learning to program really means learning high-level languages: computer languages designed for humans.

However, computers do not understand high-level languages. A program must be converted to binary code by a compiler before it can run. Learning a high-level language is therefore not the same as understanding how a computer actually works.

What computers do understand is low-level language, which is used specifically to control hardware. Assembly language is a low-level language that directly describes and controls CPU operations. If you want to understand what the CPU does and how code actually runs, learn assembly language.

Assembly language is not easy to learn, and even concise introductions are hard to find. Below I try to write an easy-to-understand assembly language tutorial that explains how the CPU executes code.

1. What is assembly language?

As we know, the CPU is only responsible for computation; it has no intelligence of its own. You give it an instruction, it executes that instruction once, and then it stops and waits for the next one.

These instructions are binary and are called opcodes. The addition instruction, for example, is 00000011. The job of a compiler is to translate a program written in a high-level language into a sequence of opcodes.

Binary is unreadable to humans, so it is impossible to tell at a glance what the machine is doing. Assembly language was born to solve this readability problem, and to meet the occasional need to edit the instructions by hand.

Assembly language is the text form of binary instructions, with a one-to-one correspondence between the two. For example, the addition instruction 00000011 is written in assembly language as ADD. Once it has been translated back into binary, assembly can be executed directly by the CPU, so it is the lowest level of the low-level languages.

2. Origin

In the earliest days, writing a program meant writing binary instructions by hand and feeding them into the computer through switches; to perform an addition, for example, you flipped the addition switch. Later, paper-tape punches were invented, and binary instructions were fed into the computer automatically by punching holes in tape.

To solve the readability problem of binary instructions, engineers wrote those instructions in octal. Converting binary to octal is easy, but octal is not readable either. So, naturally, instructions ended up being written as words: the addition instruction became ADD. Memory addresses were no longer referenced directly either, but were represented by labels.

This adds one more step: the textual instructions have to be translated into binary, and the program that does this is called an assembler. The text it processes is, naturally, called assembly code. After standardization, it became known as assembly language, abbreviated asm (汇编语言 in Chinese).

Each kind of CPU has its own machine instructions, so the corresponding assembly language differs as well. This article introduces the most common one, x86 assembly language, which is used by Intel CPUs.

3. Registers

To learn assembly language, we must first understand two concepts: registers and the memory model.

Look at the registers first. The CPU itself is only responsible for computing, not storing data. Data is usually stored in memory, and the CPU reads and writes data when it needs it. However, the CPU’s processing speed is much faster than the memory’s reading and writing speed. To avoid being slowed down, the CPU has its own level-1 cache and level-2 cache. Basically, a CPU cache can be thought of as memory with fast read and write speeds.

However, the CPU cache is still not fast enough. In addition, the location of data in the cache is not fixed, so the CPU has to look up an address on every read and write, which slows things down. Therefore, besides the cache, the CPU also has registers, which hold the most frequently used data. That is, the data read and written most often (such as loop variables) is kept in registers; the CPU reads and writes the registers first, and the registers then exchange data with memory.

Registers identify data by name rather than by address. Each register has its own name, and we tell the CPU which specific register to fetch data from. This is the fastest form of access. Some people compare registers to a level-0 cache for the CPU.

4. Types of registers

Early x86 CPUs had only eight registers, each with its own purpose. Today there are more than 100 registers, all of which have become general-purpose registers with no specially designated role, but the names of the early registers have been preserved.

  • EAX
  • EBX
  • ECX
  • EDX
  • EDI
  • ESI
  • EBP
  • ESP

Of the eight registers above, the first seven are general-purpose. The ESP register serves a specific purpose: it holds the address of the current top of the Stack (see the sections on the memory model below).

The familiar terms 32-bit CPU and 64-bit CPU refer to register size. The registers of a 32-bit CPU are 4 bytes wide.

5. Memory model: Heap

Registers can only hold a small amount of data. Most of the time, the CPU has to direct the registers to exchange data with memory. So, in addition to registers, you must also understand how memory stores data.

When a program is running, the operating system allocates it a block of memory to store the program and the data it generates. This memory segment has a start address and an end address, such as 0x1000 to 0x8000, where the start address is the smaller address and the end address is the larger address.

While the program runs, it may request memory dynamically (for example, by creating a new object or calling malloc). The system carves a piece out of the pre-allocated block and hands it to the program, allocating upward from the start address (in fact, a section of static data sits at the start address; it is ignored here). For example, if the program asks for 10 bytes, it is given the range starting at 0x1000 and ending just before 0x100A; if it then asks for another 22 bytes, the allocation extends to 0x1020.

This area of memory that is created by an active user request is called the Heap. It starts at the start address and grows from the low address to the high address. An important feature of Heap is that it does not disappear automatically and must be released manually or collected by a garbage collection mechanism.

6. Memory model: Stack

Apart from the Heap, the rest of the memory in use is called the Stack. Simply put, the Stack is the area of memory temporarily occupied by function calls.

Take a look at the following example.


int main() {
   int a = 2;
   int b = 3;
}

In the above code, when the system starts executing main, it creates a frame in memory for it. All the local variables of main (such as a and b) are stored in this frame. After main finishes, the frame is reclaimed, which frees all the local variables so that they take up no more space.

What happens if other functions are called inside a function?


int main() {
   int a = 2;
   int b = 3;
   return add_a_and_b(a, b);
}

In the above code, add_a_and_b is called inside main. When this line is executed, the system creates a new frame for add_a_and_b to store its local variables. That is, two frames exist at the same time: one for main and one for add_a_and_b. In general, there are as many frames as there are levels in the call stack.

When add_a_and_b finishes running, its frame is reclaimed, and the system returns to the point where main left off and continues. With this mechanism, functions can call one another layer upon layer, and each layer can use its own local variables.

All frames are stored in the Stack, which gets its name because the frames are stacked on top of one another. Creating a new frame is called pushing onto the stack (push); reclaiming a frame is called popping off the stack (pop). The defining feature of the Stack is that the last frame pushed is the first one popped (because the innermost function call finishes first). This is known as a “last in, first out” (LIFO) data structure. Each time a function finishes, one frame is released automatically, and once all functions have finished, the whole Stack is released.

The Stack starts at the end address of the memory area and is allocated from high addresses toward low addresses. For example, if the end address of the memory area is 0x8000 and the first frame is assumed to be 16 bytes, the next allocation starts at 0x7FF0; if the second frame is assumed to be 64 bytes, the address then moves to 0x7FB0.

7. CPU instructions

7.1 An example

With registers and memory models in mind, it’s time to see what assembly language really is. Here is a simple program example.c.


int add_a_and_b(int a, int b) {
   return a + b;
}

int main() {
   return add_a_and_b(2, 3);
}

GCC converts this program into assembly language.


$ gcc -S example.c

After executing the above command, a text file example.s is generated. It contains the assembly language version of the program, dozens of lines of instructions. Put another way, a single simple operation in a high-level language may correspond to several, or even dozens of, CPU instructions underneath. The CPU executes these instructions in turn to complete the operation.

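After simplification, example.s looks something like this. (The listing uses a simplified notation with the destination operand written first; actual GCC output looks different.)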


_add_a_and_b:
   push   %ebx
   mov    %eax, [%esp+8] 
   mov    %ebx, [%esp+12]
   add    %eax, %ebx 
   pop    %ebx 
   ret  

_main:
   push   3
   push   2
   call   _add_a_and_b 
   add    %esp, 8
   ret

As you can see, the two functions of the original program, add_a_and_b and main, correspond to the two labels _add_a_and_b and _main. Under each label is the sequence of CPU instructions that the function was converted into.

Each line is one operation performed by the CPU. A line has two parts; take one line as an example.


push   %ebx

In this line, push is the CPU instruction and %ebx is the operand that the instruction works on. A CPU instruction can have zero or more operands.

I’m going to go through this assembly code line by line, and I suggest you copy it into another window so that you don’t have to keep scrolling back up.

7.2 push instruction

By convention, the program starts at the _main label. A frame for main is created on the Stack, and the address the Stack currently points to is written to the ESP register. Any data subsequently written to the main frame goes to the address held in ESP.

Then, the first line of code is executed.


push   3

The push instruction places its operand on the Stack; here it writes 3 to the main frame.

Simple as it looks, the push instruction actually has a preliminary step. It first takes the address in the ESP register, subtracts 4 from it, and writes the new address back to ESP. Subtraction is used because the Stack grows from high addresses to low ones, and 4 bytes because 3 is an int, which occupies 4 bytes. With the new address in hand, 3 is written into the four bytes starting at that address.


push   2

In the second line, the push instruction writes 2 to the main frame, right next to the 3. The address in the ESP register is lowered by another 4 bytes (8 bytes in total).

7.3 call instruction

The call instruction on the third line calls a function.


call   _add_a_and_b

The above code calls the add_a_and_b function. Before jumping, call pushes the address of the next instruction (the 4-byte return address) onto the Stack. The program then looks up the _add_a_and_b label and creates a new frame for the function.

Let’s start executing the code under _add_a_and_b.


push   %ebx

This line writes the value of the EBX register to the _add_a_and_b frame. The register is about to be used, so its current value is saved first and restored once the function is done with it.

At this point, the push instruction lowers the address in the ESP register by another 4 bytes. Counting the 4-byte return address that the call instruction pushed, ESP is now 16 bytes below its starting value.

7.4 mov instruction

The MOV instruction is used to write a value to a register.


mov    %eax, [%esp+8] 

This line of code adds 8 bytes to the address in the ESP register to get a new address, and then reads the data on the Stack at that address. From the previous steps (ESP itself points at the saved EBX value, and ESP+4 at the return address pushed by call), you can deduce that the value fetched here is 2, which is then written to the EAX register.

The next line of code does the same thing.


mov    %ebx, [%esp+12] 

The code above adds 12 bytes to the ESP register and fetches data from the Stack at that address, this time 3, and writes it to the EBX register.

7.5 add instruction

The add instruction adds its two operands together and writes the result to the first operand.


add    %eax, %ebx

The code above adds the value of the EAX register (that is, 2) to the value of the EBX register (that is, 3), giving 5, which is then written to the first operand, the EAX register.

7.6 pop instruction

The pop instruction takes the value most recently written to the Stack (that is, the value at the lowest address) and writes it to the location specified by its operand.


pop    %ebx

The above code takes the value most recently written to the Stack (the original value of the EBX register) and writes it back to the EBX register. The addition is done, so the function no longer needs EBX and can restore it.

Note that the POP instruction also increments the address in the ESP register by 4, reclaiming 4 bytes.

7.7 ret instruction

The ret instruction ends the current function and hands control back to the calling function: it pops the return address that call pushed and jumps to it. The frame of the current function is thereby reclaimed.


ret

As you can see, this instruction has no operand.

As add_a_and_b terminates, the system returns to where main left off and continues.


add    %esp, 8 

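The code above manually adds 8 bytes to the address in the ESP register and writes the result back to ESP. This is needed because ESP holds the address where the next write to the Stack begins. The earlier pop reclaimed 4 bytes and ret reclaimed the 4-byte return address; adding 8 here reclaims the two arguments, so everything that was pushed has now been reclaimed.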


ret

Finally, the main function finishes, and its ret instruction ends the program.


(End)