Assumptions:

  • AMD64 Linux
  • C/C++

First of all, we don’t need to go through too many concepts. Just review a few basic registers:

  • %rsp: holds the address of the top of the stack
  • %rbp: holds the address of the bottom (base) of the current stack frame
  • The region from %rbp down to %rsp is the current stack frame.
  • %rip: holds the address of the next instruction
  • %rdi: holds the first argument to the function
  • %rsi: holds the second argument to the function
  • %rax: holds the return value
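
To make the roles of %rdi, %rsi and %rax concrete, here is a minimal sketch, assuming GCC or Clang on AMD64 Linux. The function asm_sum and the helper check_asm_sum are hypothetical names invented for this illustration. The function body is written directly in assembly so the registers are visible; on any other platform the sketch falls back to plain C and shows nothing about registers.

```c
#include <assert.h>

/* Hypothetical function whose body is written in assembly below. */
int asm_sum(int x, int y);

#if defined(__x86_64__) && defined(__linux__)
__asm__(
    ".pushsection .text\n"
    ".globl asm_sum\n"
    ".type asm_sum, @function\n"
    "asm_sum:\n"
    "    mov %edi, %eax\n"  /* x arrives in %edi, the low half of %rdi */
    "    add %esi, %eax\n"  /* y arrives in %esi, the low half of %rsi */
    "    ret\n"             /* the caller reads the result from %eax   */
    ".size asm_sum, .-asm_sum\n"
    ".popsection\n"
);
#else
/* Fallback so the sketch still builds elsewhere; no registers are shown. */
int asm_sum(int x, int y) { return x + y; }
#endif

/* The compiler puts 1 in %edi and 2 in %esi, then reads %eax after the call. */
void check_asm_sum(void)
{
    assert(asm_sum(1, 2) == 3);
}
```

The assembly never names x or y: the calling convention alone decides where the arguments are found and where the result is left.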

Then, go straight to the code!

The sample application

Suppose you have the following program:

int sum(int x, int y)
{
    return x + y;
}
int main(int argc, char const *argv[])
{
    int a = 1, b = 2;
    int c = sum(a, b);
    return 0;
}

Compile it with:
gcc -g prog.c -o prog

Its assembly code is as follows:

0000000000001125 <sum>:
int sum(int x, int y)
{
    1125:    55                       push   %rbp
    1126:    48 89 e5                 mov    %rsp,%rbp
    1129:    89 7d fc                 mov    %edi,-0x4(%rbp)
    112c:    89 75 f8                 mov    %esi,-0x8(%rbp)
    return x + y;
    112f:    8b 55 fc                 mov    -0x4(%rbp),%edx
    1132:    8b 45 f8                 mov    -0x8(%rbp),%eax
    1135:    01 d0                    add    %edx,%eax
}
    1137:    5d                       pop    %rbp
    1138:    c3                       retq

0000000000001139 <main>:
int main(int argc, char const *argv[])
{
    1139:    55                       push   %rbp
    113a:    48 89 e5                 mov    %rsp,%rbp
    113d:    48 83 ec 20              sub    $0x20,%rsp
    1141:    89 7d ec                 mov    %edi,-0x14(%rbp)
    1144:    48 89 75 e0              mov    %rsi,-0x20(%rbp)
    int a = 1;
    1148:    c7 45 fc 01 00 00 00     movl   $0x1,-0x4(%rbp)
    int b = 2;
    114f:    c7 45 f8 02 00 00 00     movl   $0x2,-0x8(%rbp)
    int c = sum(a, b);
    1156:    8b 55 f8                 mov    -0x8(%rbp),%edx
    1159:    8b 45 fc                 mov    -0x4(%rbp),%eax
    115c:    89 d6                    mov    %edx,%esi
    115e:    89 c7                    mov    %eax,%edi
    1160:    e8 c0 ff ff ff           callq  1125 <sum>
    1165:    89 45 f4                 mov    %eax,-0xc(%rbp)
    return 0;
    1168:    b8 00 00 00 00           mov    $0x0,%eax
}
    116d:    c9                       leaveq
    116e:    c3                       retq

The execution process

Let’s just start with main. Be sure to pay close attention to changes in the call stack.

0000000000001139 <main>:
int main(int argc, char const *argv[])
{
    1139:    55                       push   %rbp               # save the caller's %rbp
    113a:    48 89 e5                 mov    %rsp,%rbp          # set main's stack base
    113d:    48 83 ec 20              sub    $0x20,%rsp         # reserve 0x20 bytes for main's frame
    1141:    89 7d ec                 mov    %edi,-0x14(%rbp)   # save argc
    1144:    48 89 75 e0              mov    %rsi,-0x20(%rbp)   # save argv
    int a = 1;
    1148:    c7 45 fc 01 00 00 00     movl   $0x1,-0x4(%rbp)    # store a
    int b = 2;
    114f:    c7 45 f8 02 00 00 00     movl   $0x2,-0x8(%rbp)    # store b

This code actually saves the context of the main function into memory.

Why save it? At this point the arguments argc and argv live only in registers (%edi and %rsi), and the values of the local variables exist only as immediate operands in the instructions. If we did not save them to the stack, the registers could hold different values by the time sum returns, and the context would be lost.

    int c = sum(a, b);
    1156:    8b 55 f8                 mov    -0x8(%rbp),%edx    # \
    1159:    8b 45 fc                 mov    -0x4(%rbp),%eax    #  \ prepare the arguments
    115c:    89 d6                    mov    %edx,%esi          #  /
    115e:    89 c7                    mov    %eax,%edi          # /
    1160:    e8 c0 ff ff ff           callq  1125 <sum>
    # callq is conceptually equivalent to:
    #     pushq %rip
    #     jmpq  <sum>
    # and pushq %rip is conceptually equivalent to:
    #     sub   $0x8, %rsp
    #     movq  %rip, (%rsp)
    # So the stack now looks like this:
    #     +--------+
    #     |main_val| <--- rbp
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  | <--- rsp
    #     |        |
    # where 1165 is the address of the instruction that follows callq.

The code above prepares the call to sum. First it places the arguments in %edi and %esi; then callq saves the address of the next instruction (the value of %rip) on the stack. That way, once the call finishes, the saved %rip value can be read back from memory and the program continues where it left off.
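
The decomposition of callq into a push followed by a jump can be sketched as a tiny simulation. This is only a toy model under stated assumptions: struct toy_cpu, STACK_TOP, toy_callq and demo_callq are hypothetical names, and the stack address 0x7000 is made up. The addresses 0x1160, 0x1165 and 0x1125 come from the listing above.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };
#define STACK_TOP 0x7000u /* made-up address just above the toy stack */

/* Hypothetical toy machine: two registers and a small stack. */
struct toy_cpu {
    uint64_t rip;                /* address of the next instruction */
    uint64_t rsp;                /* address of the top of the stack */
    uint64_t stack[STACK_WORDS]; /* stack[0] sits at the highest address */
};

/* Map a toy address to its 8-byte slot, with a bounds check. */
static uint64_t *slot(struct toy_cpu *cpu, uint64_t addr)
{
    assert(addr % 8 == 0);
    assert(addr < STACK_TOP && addr >= STACK_TOP - 8u * STACK_WORDS);
    return &cpu->stack[(STACK_TOP - addr) / 8 - 1];
}

/* pushq: "sub $0x8, %rsp" followed by "movq value, (%rsp)". */
static void pushq(struct toy_cpu *cpu, uint64_t value)
{
    cpu->rsp -= 8;
    *slot(cpu, cpu->rsp) = value;
}

/* callq: push the return address (the current %rip), then jump. */
void toy_callq(struct toy_cpu *cpu, uint64_t target)
{
    pushq(cpu, cpu->rip);
    cpu->rip = target;
}

/* Replays the callq at 0x1160; returns the word left on top of the stack. */
uint64_t demo_callq(uint64_t *rip_after, uint64_t *rsp_after)
{
    /* While callq executes, %rip already holds 0x1165, the next instruction. */
    struct toy_cpu cpu = { .rip = 0x1165, .rsp = STACK_TOP };
    toy_callq(&cpu, 0x1125);
    *rip_after = cpu.rip;
    *rsp_after = cpu.rsp;
    return *slot(&cpu, cpu.rsp);
}
```

After demo_callq runs, the toy %rip is 0x1125, the toy %rsp has moved down by 8, and the word at the top of the stack is 0x1165, matching the diagram.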

Because of the callq instruction, execution jumps to 1125:

0000000000001125 <sum>:
int sum(int x, int y)
{
    # Before the function formally starts executing, the stack contains:
    #     +--------+
    #     |main_val| <--- rbp (main_rbp)
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  | <--- rsp
    #     |        |
    1125:    55                       push   %rbp
    # At this step %rbp still has the same value as in main, because nothing
    # has modified it. Remember that the value pushed onto the stack is
    # main's %rbp (main_rbp).
    # After %rbp is pushed, the stack becomes:
    #     +--------+
    #     |main_val|
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  |
    #     |main_rbp| <--- rsp
    1126:    48 89 e5                 mov    %rsp,%rbp
    # Here %rbp takes the value of %rsp, which marks the stack base of the
    # new function. The stack becomes:
    #     +--------+
    #     |main_val|
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  |
    #     |main_rbp| <--- rsp, rbp

These two instructions are routine:

  1. Save the stack base of the previous function to memory.
  2. Establish the stack base of the current function.

    # store the parameters (x, y)
    1129:    89 7d fc                 mov    %edi,-0x4(%rbp)
    112c:    89 75 f8                 mov    %esi,-0x8(%rbp)
    #     +--------+
    #     |main_val|
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  |
    #     |main_rbp| <--- rsp, rbp
    #     |   x    |
    #     |   y    |

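Again, this is routine: the values of the two parameters are stored on the stack. Note that %rsp does not move here. sum calls no other function, so the System V AMD64 ABI lets it use the 128-byte red zone below %rsp without reserving space.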

    return x + y;
    112f:    8b 55 fc                 mov    -0x4(%rbp),%edx    # \
    1132:    8b 45 f8                 mov    -0x8(%rbp),%eax    #  > load the operands, then add
    1135:    01 d0                    add    %edx,%eax          # /

The instructions above load the two operands into registers and then execute the add instruction, which leaves the result in %eax.

    # %eax is the return-value register specified by the Linux calling convention
}
    1137:    5d                       pop    %rbp
    # Here the value at the top of the stack is popped into %rbp. Since %rsp
    # points to main's saved %rbp, %rbp is restored to its original value
    # and is back at its original location:
    #     +--------+
    #     |main_val| <--- rbp (main_rbp)
    #     |  ...   |
    #     |  ...   |
    #     |  ...   |
    #     |main_val|
    #     |  1165  | <--- rsp
    #     |        |
    #     |        |   the values of x and y have been abandoned
    #     |        |
    1138:    c3                       retq
    # retq is equivalent to popq %rip, so 1165 is popped into %rip.
    # %rip now points to the instruction after callq <sum> in main,
    # so main resumes execution.

Next, we return to main:

    1165:    89 45 f4                 mov    %eax,-0xc(%rbp)
    return 0;
    1168:    b8 00 00 00 00           mov    $0x0,%eax
}
    116d:    c9                       leaveq
    116e:    c3                       retq
    116f:    90                       nop

Conclusion

While a program runs, every function call is recorded on a stack called the program stack, or simply the stack. The stack lives in memory and grows from high addresses toward low addresses (that is, the top of the stack is “down”).
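
The direction of growth can be observed from C. The following is a minimal sketch, assuming GCC or Clang, whose __builtin_frame_address(0) returns the frame address of the current function. The function names are hypothetical, and noinline is there only to make sure a real call with a real frame takes place.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the frame address of this (callee) function. */
__attribute__((noinline))
uintptr_t callee_frame_address(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame lies at a lower address than the caller's. */
__attribute__((noinline))
int callee_frame_is_lower(void)
{
    uintptr_t caller = (uintptr_t)__builtin_frame_address(0);
    uintptr_t callee = callee_frame_address();
    return callee < caller;
}

/* On a stack that grows downward, the deeper call has the lower address. */
void check_stack_direction(void)
{
    assert(callee_frame_is_lower());
}
```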

A stack frame is one unit of the stack.

You can think of a stack frame as holding all the information about one function call. If A calls B and B calls C, then B is the caller of C and A is the caller of B. The information in a stack frame includes:

  • The base pointer (%rbp) of the caller. The called function saves it as its very first step.
  • Saved registers and local variables. The called function is responsible for saving these.
  • Input parameters. The called function is responsible for saving these.
  • The return address. The caller saves it with callq, and retq later reads it back into %rip.
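
Two of these items, the saved base pointer and the return address, can be checked on a live frame. This is a rough sketch, assuming GCC or Clang on AMD64, where a function that sets up %rbp as in the listing above finds the caller's %rbp at (%rbp) and the return address at 8(%rbp). The function names are hypothetical; on other targets the check is skipped and simply reports success.

```c
#include <assert.h>

/*
 * Returns 1 if the two pointer-sized words at this function's frame base are
 * the caller's %rbp followed by the return address.
 */
__attribute__((noinline))
int frame_has_expected_layout(void *caller_frame)
{
#if defined(__x86_64__) && defined(__LP64__)
    void **frame = __builtin_frame_address(0);      /* current %rbp */
    return frame[0] == caller_frame                  /* saved by push %rbp */
        && frame[1] == __builtin_return_address(0);  /* saved by callq */
#else
    (void)caller_frame;
    return 1; /* the layout is only claimed for x86-64 */
#endif
}

__attribute__((noinline))
int check_frame_layout(void)
{
    /* The volatile store after the call keeps it from becoming a tail call. */
    volatile int ok = frame_has_expected_layout(__builtin_frame_address(0));
    assert(ok);
    return ok;
}
```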

The process of a function call is as follows:

1. Caller: Save the context

Save its arguments, local variables, caller-saved registers, and so on to the stack.

2. Caller: Execute the callq instruction

Save the return address (the value of %rip) on the stack and jump to the called function.

3. Callee: Replace the bottom of the stack

Save the caller's stack base (%rbp) to the stack, then set %rbp to the caller's current stack top (%rsp), which becomes the callee's stack base.

4. Callee: Save the context

Save its parameters, local variables, callee-saved registers, and so on to the stack.

5. Callee: Execute its own instructions

Execute the machine instructions of the function body.

6. Callee: Restore the bottom of the stack

Restore the caller's stack base, which was saved at the top of the stack, into %rbp.

7. Callee: Restore the return address

retq pops the return address off the stack into %rip.

8. Caller: Continue

The CPU continues executing the caller's instructions from the address in %rip.
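
The eight steps can be replayed on a toy machine. The sketch below is an illustration under stated assumptions, not how the CPU is implemented: struct toy_cpu, STACK_TOP, the helper names and the stack address 0x7000 are all hypothetical, while the offsets and code addresses are taken from the listing of main and sum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 256 };
#define STACK_TOP 0x7000u /* made-up address just above the toy stack */

/* Hypothetical toy machine: only the registers this walk-through touches. */
struct toy_cpu {
    uint64_t rip, rsp, rbp;
    uint32_t edi, esi, eax;
    uint8_t mem[STACK_BYTES]; /* covers [STACK_TOP - STACK_BYTES, STACK_TOP) */
};

/* Translate a toy address into a pointer into mem, with a bounds check. */
static uint8_t *at(struct toy_cpu *c, uint64_t addr, uint64_t len)
{
    assert(addr >= STACK_TOP - STACK_BYTES && addr + len <= STACK_TOP);
    return &c->mem[addr - (STACK_TOP - STACK_BYTES)];
}

static void store32(struct toy_cpu *c, uint64_t addr, uint32_t v)
{
    memcpy(at(c, addr, 4), &v, 4);
}

static uint32_t load32(struct toy_cpu *c, uint64_t addr)
{
    uint32_t v;
    memcpy(&v, at(c, addr, 4), 4);
    return v;
}

static void push64(struct toy_cpu *c, uint64_t v)
{
    c->rsp -= 8; /* the stack grows toward lower addresses */
    memcpy(at(c, c->rsp, 8), &v, 8);
}

static uint64_t pop64(struct toy_cpu *c)
{
    uint64_t v;
    memcpy(&v, at(c, c->rsp, 8), 8);
    c->rsp += 8;
    return v;
}

/* Replays main calling sum(a, b); returns the value main stores in c. */
uint32_t replay_sum_call(uint32_t a, uint32_t b)
{
    /* State right after main's prologue: base set, 0x20 bytes reserved. */
    struct toy_cpu c = { .rbp = STACK_TOP - 0x10 };
    c.rsp = c.rbp - 0x20;
    const uint64_t main_rbp = c.rbp, main_rsp = c.rsp;

    /* 1. Caller: save the context (locals a and b go into main's frame). */
    store32(&c, c.rbp - 0x4, a);
    store32(&c, c.rbp - 0x8, b);
    c.esi = load32(&c, c.rbp - 0x8); /* second argument */
    c.edi = load32(&c, c.rbp - 0x4); /* first argument */

    /* 2. Caller: callq pushes the return address and jumps. */
    c.rip = 0x1165;
    push64(&c, c.rip);
    c.rip = 0x1125;

    /* 3. Callee: push %rbp; mov %rsp,%rbp. */
    push64(&c, c.rbp);
    c.rbp = c.rsp;

    /* 4. Callee: save its parameters below its stack base. */
    store32(&c, c.rbp - 0x4, c.edi);
    store32(&c, c.rbp - 0x8, c.esi);

    /* 5. Callee: execute its own instructions (the add). */
    c.eax = load32(&c, c.rbp - 0x8) + load32(&c, c.rbp - 0x4);

    /* 6. Callee: pop %rbp restores the caller's stack base. */
    c.rbp = pop64(&c);

    /* 7. Callee: retq pops the return address into %rip. */
    c.rip = pop64(&c);

    /* 8. Caller: continue at 0x1165 with its registers restored. */
    assert(c.rip == 0x1165 && c.rbp == main_rbp && c.rsp == main_rsp);
    store32(&c, c.rbp - 0xc, c.eax); /* int c = sum(a, b); */
    return load32(&c, c.rbp - 0xc);
}
```

Calling replay_sum_call(1, 2) follows the same path as the sample program and yields 3. The assert in step 8 checks that %rip, %rbp and %rsp are back to the values main expects.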