Introduction to the

In languages like Python and Java, when an exception occurs, the call stack is printed as shown below. In C/C ++, we rely on libraries like libbackTrace to print the call stack.

How to implement a library like libbackTrace to print function call stacks in C/C ++? This article introduces a naive Backtrace implementation and explains how backTrace works.

Trace the base of the call stack – stack Frames

The stack frame holds the function call information, which is stored in the stack area of C memory layout as shown in the figure below.

The stack frame is the core of backTrace. It stores the local variables of function call and records the context of function call, such as the return address. Its specific layout is shown in the following figure.

The EBP register stores the stack base address pointer, and the ESP register stores the current stack top pointer. We call the memory between EBP-ESP as stack frame.

To see how stack frames change when a function is called we can look at assembly code, and we disassemble the following code, okay

// file:naive.c
int add(int a, int b){
    return a + b;
}

int main(int argc, char* argv[]){
    add(1.2);
}
GCC -m32-s-o0-masm = Intel naive. C
Copy the code

You get the following code

_add: ## @add push ebp mov ebp, esp mov eax, dword ptr [ebp + 12] add eax, dword ptr [ebp + 8] pop ebp ret _main: Mov eax, dword PTR [ebp + 12] mov ecx, Dword PTR [eBP + 8] mov dword PTR [esp] 2 mov dword ptr [esp + 4], 2 mov dword ptr [ebp - 4], eax ## 4-byte Spill mov dword ptr [ebp - 8], ecx ## 4-byte Spill call _add xor ecx, ecx mov dword ptr [ebp - 12], eax ## 4-byte Spill mov eax, ecx add esp, 24 pop ebp retCopy the code

When call _add is used in assembly, it pushes the next address onto the stack and jumps to the function location, i.e. call _add is equivalent to two instructions,push PC; JMP _add, after the call _add directive is used, the top of the stack (the address pointed to by ESP) stores xor ECx, the address of the ECX directive.

After entering _add, the current stack base address will be pushed into the stack, and a new stack frame will be formed through mov EBP and ESP.

As shown in the figure above, we can get the return address of the function by [ebp+4] in a 32-bit program, and the value of the eBP to the address is the saved EBP value.

For more details, please refer to the following article

journey-to-the-stack

How do stacks push and pop during function calls?

stack frameout layout

Get call stack information

How do we get the eBP value? It’s hard to do this in pure C code, so I ended up using inline assembly to do this

typedef void* ptr_t;

inline ptr_t* get_ebp(){
	ptr_t* reg_ebp;

  asm volatile(
          "movq %%rbp, %0 \n\t"
          : "=r" (reg_ebp)
  );
	return reg_ebp;
}
Copy the code

However, we still can’t get the complete call stack information through the function stack frame. We need to restore which function is called, so we need to use the return address to get the calling function.

All functions actually have a range, which is stored in the code area of the function. We record the address of the function and look for the function whose address is closest to the return address and lower than the return address. I originally recorded the function address with the following code

typedef struct {
        char *function_name;
        int *function_address;
}function_record_t;

typedef struct{
        int now_size;
        int max_size;
        function_record_t *records;
}function_record_vec_t;

function_record_vec_t vec;


int function_record_vec_init(function_record_vec_t *self){
        self->now_size = 0;
        self->max_size = 5;
        self->records = (function_record_t *)malloc(sizeof(function_record_t) * self->max_size);
        if(! self->records)return 0;
        return 1;
}

int function_record_vec_push(function_record_vec_t *self, function_record_t record){
        if(self->now_size == self->max_size){
                self->max_size = self->max_size << 1;
                self->records = (function_record_t *)realloc(self->records, sizeof(function_record_t) * self->max_size);                if(! self->records)return 0;
        }
        self->records[self->now_size++] = record;
        return 1;
}

// You need to manually record all function information
function_record_t * find_best_record(int *return_address){
        for(int i=0; i<vec.now_size; i++){
                if(vec.records[i].function_address < return_address)
                {
                        return vec.records+i; // Return the address of the function that best matches the requirements}}}int main(void){
        function_record_vec_init(&vec);
        function_record_t main_f = {"main", &main};
        function_record_vec_push(&vec, main_f);
  			// omit the process of recording all function addresses and names
        qsort(vec.records, vec.now_size, sizeof(function_record_t), compare_record);// Address sorted from lowest to highest
}
Copy the code

This approach was so stupid that I started looking for an API that could get the call information directly from the address. The DLADDR in Linux was exactly what I needed, so the code above could be simplified to the following version

void identify_function_ptr( void *func)  {
  Dl_info info;
  int rc;

  rc = dladdr(func, &info);

  if(! rc) {printf("Problem retrieving program information for %x: %s\n", func, dlerror());
  }

  printf("Address located in function %s within the program %s\n", info.dli_fname, info.dli_sname);
}
Copy the code

If you pass in an address, you can get which function the address is most likely to be in.

The final code is as follows

#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <dlfcn.h> void identify_function_ptr( void *func) { Dl_info info; int rc; rc = dladdr(func, &info); if (! rc) { printf("Problem retrieving program information for %x: %s\n", func, dlerror()); } printf("Address located in function %s within the program %s\n", info.dli_fname, info.dli_sname); } typedef void* ptr_t; typedef struct _frame_t{ ptr_t return_address; ptr_t ebp; struct _frame_t *next_frame; }frame_t; int frame_init(frame_t *self, ptr_t ebp, ptr_t return_address){ self->return_address = return_address; self->ebp = ebp; self->next_frame = NULL; } void back_trace(){ ptr_t* reg_ebp; asm volatile( "movq %%rbp, %0 \n\t" : "=r" (reg_ebp) ); frame_t* now_frame=NULL; while(reg_ebp){ frame_t *new_frame = (frame_t *) malloc(sizeof(frame_t)); frame_init(new_frame, (ptr_t)reg_ebp, (ptr_t)(*(reg_ebp+1))); new_frame->next_frame = now_frame; now_frame = new_frame; reg_ebp = (ptr_t)(*reg_ebp); } while(now_frame){ identify_function_ptr((ptr_t)now_frame->return_address); now_frame = now_frame->next_frame; } } void two(){ back_trace(); } void one(){ two(); } int main(void){ one(); }Copy the code

The results are as follows

Libbacktrace relies on libunwind to restore the call stack. To use the above code with c++, use demangle to restore the symbolic name of the c++ function.