The C++ object memory model from a reverse-engineering (disassembly) perspective

Disassembling an executable with IDA gives a deeper understanding of the C++ object memory model and of how C++ features are implemented. The following analysis is based on the ARM32 instruction set.

C++ classes are backward compatible with C structs

class MyClass
{
public:
    int x;
    int y;
    int z;
};
int main(int argc, char const *argv[])
{

    MyClass myclass;
    myclass.x = 0x11;
    myclass.y = 0x22;
    myclass.z = 0x33;
    return 0;
}

In the simple code above, the class has only three member variables: x, y, and z. IDA disassembler output:

You can see:
1. The x member variable, at the lowest of the three addresses ([sp, #0], the start of the object), is assigned 0x11.
2. The y member variable, at [sp, #0x18 + var_14] = [sp, #4], is assigned 0x22.
3. The z member variable, at [sp, #0x18 + var_10] = [sp, #8], is assigned 0x33.

So the memory layout of the object in this case is as follows:

The address of the object is the address of its x member variable, which can be verified by printing &myclass and &myclass.x and checking that they are equal. Another conclusion from the assembly is that the compiler did not generate a "nontrivial default constructor" (a synthesized constructor) for MyClass in this case.
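The layout claim can also be checked from the source side. The following is a minimal sketch, assuming a mainstream compiler where int members are laid out back to back; the helper names byte_offset and object_starts_at_x are made up for illustration and are not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same shape as the article's MyClass: three ints, no virtual functions.
class MyClass {
public:
    int x;
    int y;
    int z;
};

// Hypothetical helper: byte distance from the start of the object to a member.
std::size_t byte_offset(const MyClass& obj, const int& member) {
    auto base = reinterpret_cast<std::uintptr_t>(&obj);
    auto addr = reinterpret_cast<std::uintptr_t>(&member);
    return static_cast<std::size_t>(addr - base);
}

// Hypothetical helper: true when the object and its first member share an address.
bool object_starts_at_x(const MyClass& obj) {
    return static_cast<const void*>(&obj) == static_cast<const void*>(&obj.x);
}
```

Calling byte_offset(m, m.y) on a MyClass object m should give the same 4-byte step between members that the disassembly shows on ARM32.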

Member functions

Add a member function setZ:

class MyClass
{
public:
    int x;
    int y;
    int z;
    void setZ(int value)
    {
        z = value;
    }
};
int main(int argc, char const *argv[])
{
    MyClass myclass;
    myclass.x = 0x11;
    myclass.y = 0x22;
    myclass.z = 0x33;
    myclass.setZ(0x88);
    return 0;
}

You can see that the setZ call is compiled to a branch-with-link instruction, BL _ZN7MyClass4setZEi:

The code for the _ZN7MyClass4setZEi function is:

You can see that the difference between a member function and a normal function is that a member function receives the this pointer as its first argument, in R0.
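The same idea can be written out in source form. This is a sketch only: setZ_explicit and both_forms_agree are hypothetical names, and the free function merely imitates what the compiler does with the hidden pointer.

```cpp
#include <cassert>

class MyClass {
public:
    int x;
    int y;
    int z;
    void setZ(int value) { z = value; }
};

// Hypothetical free-function equivalent: the object pointer that the
// compiler passes implicitly (in R0 on ARM32) is written out by hand.
void setZ_explicit(MyClass* self, int value) {
    self->z = value;
}

// Runs both forms on separate objects and reports whether they agree.
bool both_forms_agree(int value) {
    MyClass a{};
    MyClass b{};
    a.setZ(value);             // implicit this == &a
    setZ_explicit(&b, value);  // explicit pointer plays the role of this
    return a.z == b.z && a.z == value;
}
```

Both calls take two arguments at the machine level: the object address and the value.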

Static member variables and static member functions:

Add a static member variable and a static member function:

class MyClass
{
public:
    int x;
    int y;
    int z;
    static int s_value;
    void setZ(int value)
    {
        z = value;
    }
    static void set_s_value(int value)
    {
        s_value = value;
    }
};
int MyClass::s_value;
int main(int argc, char const *argv[])
{
    MyClass myclass;
    myclass.x = 0x11;
    myclass.y = 0x22;
    myclass.z = 0x33;
    myclass.setZ(0x88);
    MyClass::set_s_value(0x99);
    return 0;
}

You can see that the call to set_s_value is compiled as follows:

You can see that the set_s_value argument value (0x99) is placed in R0 and then the function _ZN7MyClass11set_s_valueEi is called:

Static functions have no this pointer: R0 is the first argument, R1 the second, R2 the third, and R3 the fourth. Here the first argument is stored to the memory address held in register R1, which lies in the .bss section. The static member s_value occupies space in .bss because it is not explicitly initialized.

This shows that static member variables are not part of the memory of any specific object, much like global variables in C.

Why can't a static member function access the non-static member functions and non-static member variables of its class? Those accesses need the this pointer in R0 to reach the object's memory, and a static function has no such pointer.
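A small source-level check makes the same point without a disassembler. This is a sketch; the class names Plain and WithStatic and the two helper functions are invented for the comparison.

```cpp
#include <cassert>
#include <cstddef>

class Plain {
public:
    int x, y, z;
};

class WithStatic {
public:
    int x, y, z;
    static int s_value;
    static void set_s_value(int value) { s_value = value; }  // no this pointer
};
int WithStatic::s_value;  // zero-initialized, one copy for the whole program

// The static member takes no space inside each object.
bool static_is_outside_object() {
    return sizeof(WithStatic) == sizeof(Plain);
}

// Every object sees the same single s_value.
bool static_is_shared() {
    WithStatic a{};
    WithStatic b{};
    WithStatic::set_s_value(0x99);
    return &a.s_value == &b.s_value && a.s_value == 0x99 && b.s_value == 0x99;
}
```

Adding the static member leaves sizeof unchanged, which matches the observation that s_value lives in a data section rather than in the object.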

User-defined constructors and destructors:

Define a constructor in a class that has no virtual functions and no inheritance:

class MyClass
{
public:
    int x;
    int y;
    int z;
    MyClass(int x_value, int y_value, int z_value)
    {
        x = x_value;
        y = y_value;
        z = z_value;
    }
};
int main(int argc, char const *argv[])
{
    MyClass myclass(0x11, 0x22, 0x33);
    myclass.x = 0x111;
    return 0;
}

The constructor is called with four arguments in R0, R1, R2, and R3, which hold this and the first, second, and third source-level arguments respectively:

You can see from this that the object's memory is allocated before the constructor is called. The constructor itself only stores the passed arguments at their offsets from the this pointer in R0, and implicitly returns this in R0.
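Placement new expresses the same separation in source code: storage first, constructor second. This is a minimal sketch, with construct_in and constructed_in_place as hypothetical helper names.

```cpp
#include <cassert>
#include <new>

class MyClass {
public:
    int x, y, z;
    MyClass(int x_value, int y_value, int z_value) {
        x = x_value;
        y = y_value;
        z = z_value;
    }
};

// Runs the constructor on storage the caller has already allocated and
// returns the resulting object pointer.
MyClass* construct_in(void* storage) {
    return new (storage) MyClass(0x11, 0x22, 0x33);
}

// Allocates raw stack storage, constructs into it, and checks the result.
bool constructed_in_place() {
    alignas(MyClass) unsigned char buffer[sizeof(MyClass)];
    MyClass* obj = construct_in(buffer);
    bool same_address = static_cast<void*>(obj) == static_cast<void*>(buffer);
    bool fields_ok = obj->x == 0x11 && obj->y == 0x22 && obj->z == 0x33;
    obj->~MyClass();  // trivial here, shown for symmetry
    return same_address && fields_ok;
}
```

The constructor never allocates anything; it only fills in memory whose address it is given.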

Let’s look at the destructor:

class MyClass
{
public:
    MyClass() {}
    ~MyClass() {}
};
MyClass myclass2;
int main(int argc, char const *argv[])
{

    MyClass myclass;
    return 0;
}

To see when the destructor is called, two MyClass objects are created: one global and one on the stack. The destructor for myclass, whose scope is the body of main, is called just before main returns. The disassembly confirms this:

One difference between C++ and C is that in C++ you can create global variables and call a function to initialize them, like this:

int foo(){
    return 1;
}
int bar = foo();

Compiled as C with GCC, however, the same code fails with the error "initializer element is not constant". Now look at how the global object myclass2 is created:

You can see that the compiler generates a function, __cxx_global_var_init, and places its address in the .init_array section, which ensures that __cxx_global_var_init runs when the executable is loaded. This function initializes global variables in C++. It first calls the MyClass constructor to construct myclass2:

It then calls __cxa_atexit to register the address of the destructor so that it can be called when the executable exits. The __cxa_atexit function prototype is: int __cxa_atexit(void (*destructor)(void *), void *arg, void *__dso_handle); As IDA shows, the first parameter is the destructor address, the second is the object address, and the third is the handle of the dynamic shared object.
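The ordering described above can be observed with a small logging class. This is a sketch under assumptions: Tracked, event_log, and scope_demo are hypothetical names, and the log is a function-local static so that it exists before the global object's constructor uses it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log shared by all Tracked objects.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

class Tracked {
public:
    explicit Tracked(const char* name) : name_(name) {
        event_log().push_back("ctor " + name_);
    }
    ~Tracked() { event_log().push_back("dtor " + name_); }

private:
    std::string name_;
};

// Constructed by the compiler-generated startup code before main begins;
// its destructor is registered to run at program exit.
Tracked global_object("global");

void scope_demo() {
    Tracked local_object("local");
}  // the destructor of local_object runs here, before the function returns
```

By the time main starts, the log already contains the global object's constructor entry. The global destructor entry is only appended after main returns, so it cannot be checked from inside main.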

Virtual functions, virtual tables, and synthesized constructors:

When a class has virtual functions, the memory layout is slightly different. Code:

class MyClass
{
public:
    int x;
    int y;
    int z;
    virtual void setX(int x_value)
    {
        x = x_value;
    }
};
int main(int argc, char const *argv[])
{
    MyClass myClass;
    myClass.x = 0x11;
    myClass.y = 0x22;
    myClass.z = 0x33;
    myClass.setX(0x44);
    MyClass *ptr_class = &myClass;
    ptr_class->setX(0x55);
    return 0;
}

MyClass now has a virtual function. Look at the corresponding disassembly:

Conclusion:

When a class defines a virtual function and the programmer does not define a constructor, the compiler generates a default constructor.
When a class defines a virtual function, the memory layout of an object no longer matches that of a C structure; it changes somewhat.
Calling a virtual function directly on an object works the same way as calling a non-virtual function: a BL to the function's address. Calling a virtual function through a pointer requires first computing the function's address from the virtual table and then branching to it, which is the BLX R2 in the listing above.

Let’s look again at what the constructor generated by the compiler does:

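The address _ZTV7MyClass + 8 is the virtual table slot that holds the address of the virtual function setX, and the constructor stores this address at the start of the object. From this we get the memory layout of the object, which also introduces the concept of the virtual table (v-table).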
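The hidden pointer also shows up in sizes and member offsets. The sketch below assumes a mainstream ABI in which the virtual table pointer sits at the start of the object; PlainClass, VirtualClass, and offset_of_x are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class PlainClass {
public:
    int x, y, z;
};

class VirtualClass {
public:
    int x, y, z;
    virtual void setX(int x_value) { x = x_value; }
};

// Hypothetical helper: byte distance from the start of the object to x.
template <typename T>
std::size_t offset_of_x(const T& obj) {
    auto base = reinterpret_cast<std::uintptr_t>(&obj);
    auto addr = reinterpret_cast<std::uintptr_t>(&obj.x);
    return static_cast<std::size_t>(addr - base);
}
```

With a virtual function present, x is no longer at offset 0, because the virtual table pointer occupies the first word of the object.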

Inheritance and overriding:

When a class with no virtual functions inherits from another class with no virtual functions and the subclass redefines nothing, the memory layout is simple: the subclass object contains the variables defined by the parent class, followed by its own. I won't go into detail here.

Take a look at the following example:

class BaseClass
{
public:
    int base_x;
    int base_y;
    int base_z;
    void set_base_x(int value)
    {
        base_x = value;
    }
    void set_base_y(int value)
    {
        base_y = value;
    }
};
class MyClass : public BaseClass
{
public:
    int base_y;
    int base_z;
    int sub_a;
    int sub_b;
    void set_base_x(int value)
    {
        base_x = value + 1;
    }
};
int main(int argc, char const *argv[])
{
    MyClass myClass;
    myClass.BaseClass::base_x = 0x11;
    myClass.BaseClass::base_y = 0x22;
    myClass.BaseClass::base_z = 0x33;
    myClass.base_y = 0x44;
    myClass.base_z = 0x55;
    myClass.sub_a = 0x66;
    myClass.sub_b = 0x77;
    myClass.set_base_x(0x88);
    return 0;
}

MyClass inherits base_x, base_y, and base_z from BaseClass, redeclares two members with the same names (base_y and base_z), which hide the inherited ones, and defines sub_a and sub_b.
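That the redeclared members hide rather than replace the inherited ones can be checked directly. This sketch assumes the usual layout on mainstream compilers, with the base subobject first and the subclass members after it; offset_in is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class BaseClass {
public:
    int base_x, base_y, base_z;
};

class MyClass : public BaseClass {
public:
    int base_y, base_z;  // hide the inherited names, do not replace them
    int sub_a, sub_b;
};

// Hypothetical helper: byte distance from the start of the object to a member.
std::size_t offset_in(const MyClass& obj, const int& member) {
    auto base = reinterpret_cast<std::uintptr_t>(&obj);
    auto addr = reinterpret_cast<std::uintptr_t>(&member);
    return static_cast<std::size_t>(addr - base);
}
```

The object holds seven ints in total: both copies of base_y and base_z exist side by side, and the qualified name BaseClass::base_y selects the inherited copy.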

Compilation results:

The result shows that the memory layout of the object’s member variables is:

Virtual functions without overriding:

The subclass does not override any of the parent class's virtual functions:

#include <iostream>
using namespace std;

class BaseClass
{
public:
    virtual void f() { cout << "f()" << endl; }
    virtual void g() { cout << "g()" << endl; }
    virtual void h() { cout << "h()" << endl; }
};
class MyClass : public BaseClass
{
public:
    virtual void f1() { cout << "f1()" << endl; }
    virtual void g1() { cout << "g1()" << endl; }
    virtual void h1() { cout << "h1()" << endl; }
};
int main(int argc, char const *argv[])
{
    MyClass myClass;
    return 0;
}

From the disassembly we can see that the compiler again generates a constructor: since both the subclass and the superclass have virtual functions, the compiler has to synthesize one. It does two things:
1. Calls the superclass constructor, which sets the virtual table pointer to the superclass's virtual table.
2. Sets the virtual table pointer to the subclass's virtual table.

Let’s look at the structure of virtual table:

The code above first calls the parent class's constructor, which stores the parent's virtual table pointer in the object; the subclass's constructor then overwrites that pointer with the address of its own virtual table, which is exactly what we want.
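This two-step setup has a visible consequence at the language level: a virtual call made inside the parent constructor reaches the parent's version, because the object is still a parent object at that point. A minimal sketch, with name() and the two recording members added purely for illustration:

```cpp
#include <cassert>
#include <string>

class BaseClass {
public:
    std::string seen_in_base_ctor;
    // While this body runs, virtual calls dispatch to BaseClass.
    BaseClass() { seen_in_base_ctor = name(); }
    virtual ~BaseClass() = default;
    virtual std::string name() const { return "BaseClass"; }
};

class MyClass : public BaseClass {
public:
    std::string seen_in_sub_ctor;
    // By now the object dispatches to MyClass.
    MyClass() { seen_in_sub_ctor = name(); }
    std::string name() const override { return "MyClass"; }
};
```

On typical implementations this behavior falls out of the constructor order seen in the disassembly: the pointer to the subclass's table is written only after the parent constructor has finished.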

The memory structure is:

Virtual functions and polymorphism with overriding:

The code is as follows:

#include <iostream>
using namespace std;

class BaseClass
{
public:
    virtual void f() { cout << "f()" << endl; }
    virtual void g() { cout << "g()" << endl; }
    virtual void h() { cout << "h()" << endl; }
};
class MyClass : public BaseClass
{
public:
    virtual void f() { cout << "sub f()" << endl; }
    virtual void g1() { cout << "g1()" << endl; }
    virtual void h1() { cout << "h1()" << endl; }
};
int main(int argc, char const *argv[])
{
    BaseClass *cls = new MyClass();
    cls->f();
    return 0;
}

In the code above, MyClass overrides only the f() function. When the assembly code calls the virtual function, it looks up the function's address in the virtual table and branches to it, which is how polymorphism is implemented.
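The table lookup can be imitated by hand with function pointers. This is an illustration of the mechanism, not the compiler's actual data structures: VTable, Object, and the call_f and call_g helpers are all hypothetical.

```cpp
#include <cassert>
#include <string>

struct Object;  // forward declaration for the function pointer types

// Hypothetical hand-written virtual table: one slot per virtual function.
struct VTable {
    std::string (*f)(const Object*);
    std::string (*g)(const Object*);
};

struct Object {
    const VTable* vptr;  // first word of the object, like the hidden pointer
};

std::string base_f(const Object*) { return "f()"; }
std::string base_g(const Object*) { return "g()"; }
std::string sub_f(const Object*) { return "sub f()"; }

const VTable base_table = {base_f, base_g};
const VTable sub_table = {sub_f, base_g};  // only the f slot is replaced

// Load the table pointer, load the slot, branch to it with the object
// pointer as first argument.
std::string call_f(const Object* obj) { return obj->vptr->f(obj); }
std::string call_g(const Object* obj) { return obj->vptr->g(obj); }
```

The caller never names sub_f; which function runs depends only on the table the object points to.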

The memory layout of the object is:

Runtime type identification (RTTI):

How RTTI is implemented is a compiler detail rather than something the language specifies, and there is no standard way for compilers to implement it. The implementation discussed here is GNU g++. Let's start with some code that uses RTTI:

#include <iostream>
#include <typeinfo>
using namespace std;

class BaseClass
{
    virtual int vfunc() = 0;
    virtual int base_func() { return 0; }
};
class MyClass : public BaseClass
{
    int vfunc() { return 1; }
};
void print_type(BaseClass *p)
{
    cout << typeid(*p).name() << endl;
}
int main(int argc, char const *argv[])
{
    BaseClass *base = new MyClass();
    print_type(base);
    return 0;
}

Look at the assembly code for the print_type function:

R0 holds the BaseClass* parameter. After a series of address calculations, the code calls the function _ZNKSt9type_info4nameEv, which demangles to std::type_info::name() const.

You can see that the pointer to the RTTI type information is stored 4 bytes before the address that the virtual table pointer points to. Every polymorphic object contains a pointer to a virtual table, and the compiler stores the class's type information alongside that table. Specifically, the compiler places a pointer just before the class's virtual function entries; it points to a structure containing the information needed to determine the name of the class that owns the virtual table. The internal structure varies from compiler to compiler; in g++ it is a type_info object. type_info has a name() function that returns a printable form of the type name.

The relationship between type_info and class virtual tables is shown in the example above:

When reversing C++ code, looking for RTTI data in the .data.rel.ro section can help you analyze the program logic.
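From the source side, the effect of that lookup is that typeid applied to a dereferenced base pointer reports the dynamic type. A minimal sketch follows; the helper names are hypothetical, and it compares type_info objects instead of printing name(), whose exact string is implementation-defined.

```cpp
#include <cassert>
#include <typeinfo>

class BaseClass {
public:
    virtual ~BaseClass() = default;
    virtual int vfunc() = 0;
};

class MyClass : public BaseClass {
public:
    int vfunc() override { return 1; }
};

// typeid on a dereferenced polymorphic pointer is resolved at run time,
// through the object's virtual table on typical implementations.
bool is_my_class(const BaseClass* p) {
    return typeid(*p) == typeid(MyClass);
}

// typeid on the pointer itself only sees the static type.
bool pointer_type_is_static(const BaseClass* p) {
    return typeid(p) == typeid(const BaseClass*);
}
```

The first check needs the object's memory to answer; the second is decided at compile time.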

Becoming familiar with how C++ objects look and behave in disassembly, as walked through above, helps us understand the structure and logic of the original program when reverse engineering.