The memory model of virtual functions and virtual inheritance is a classic topic in C++. Its implementation depends on the compiler, but the main principles are much the same. This article describes in detail the memory model and the underlying mechanism of virtual functions and virtual inheritance in g++.

1 Polymorphic type

In C++, a type is polymorphic if and only if it declares or inherits at least one virtual function.

For variables of non-polymorphic type, the type information can be determined at compile time. For example:

struct A { void foo() {} };
...
A a;
std::cout << typeid(a).name(); // the type of a is determined at compile time: A
a.foo();                       // the address of A::foo is determined at compile time
sizeof(A);                     // determined at compile time; although A is empty, an object of type A
                               // occupies 1 byte, because the object needs its own address in memory

With polymorphic types, some information must be deferred until runtime, such as its actual type, the address of the virtual function being called, and so on. In the following example, type B inherits type A, which declares virtual functions, so both A and B are polymorphic.

struct A { virtual void foo() {} };
struct B : public A { /* implicitly inherits the virtual function */ };
...
B b{};
A& a_rb = b;
typeid(decltype(a_rb)).name(); // decltype yields the declared type, known at compile time: A
typeid(a_rb).name();           // a_rb is a glvalue of polymorphic type, so typeid is evaluated at run time: B
a_rb.foo();                    // calls the foo found through B's virtual table; its address is determined at run time
sizeof(B);                     // fixed at compile time by the implementation, usually 8 on 64-bit targets
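The distinction between the two kinds of type can be checked with standard facilities. The following is a minimal sketch; the names Plain, Base, Derived and the two helper functions are hypothetical and not part of the original examples.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

struct Plain { void foo() {} };          // no virtual functions: non-polymorphic
struct Base { virtual void foo() {} };   // declares a virtual function
struct Derived : Base {};                // inherits it, so it is polymorphic too

// Facts the compiler can verify without running anything.
static_assert(!std::is_polymorphic<Plain>::value, "Plain is not polymorphic");
static_assert(std::is_polymorphic<Base>::value, "Base is polymorphic");
static_assert(std::is_polymorphic<Derived>::value, "Derived is polymorphic");
static_assert(sizeof(Plain) >= 1, "an empty class still has a nonzero size");

// typeid on a glvalue of polymorphic type is evaluated at run time,
// so this reports the actual type of the object behind the reference.
bool dynamic_type_is_derived(const Base& ref) {
    return typeid(ref) == typeid(Derived);
}

// decltype yields the declared type, so this comparison never looks
// at the object itself and always answers "Base".
bool static_type_is_base(const Base& ref) {
    return typeid(decltype(ref)) == typeid(Base);
}
```

Passing a Derived object to both helpers should give true in each case: the declared type stays Base while the run-time type is Derived.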

2 Virtual function memory model

We can use a reference or pointer to base type A to hold an object whose actual type is the derived class B. This means that at compile time we cannot determine its actual type from its declared type, and therefore cannot determine which specific virtual function should be called. Since each function in the program has a unique address in memory, we can store the address of the specific function in the object as a member variable, and read that member at run time to obtain the address of the virtual function of the actual type.

2.1 Single inheritance memory model

Modern C++ compilers use a table-driven object model. Specifically, for each polymorphic type, the addresses of all its virtual functions are stored together as a table, and each function has the same offset in the tables of the base and derived types, so the offset of a virtual function relative to the start of the table can be determined at compile time. The start address of the virtual function table is stored in each object and is called the virtual table pointer (vptr) or virtual function pointer (vfptr). In the layouts discussed here, this virtual pointer sits at the start address of the object. When a virtual function is called through a reference or pointer of polymorphic type, the address of the function is first computed from the virtual pointer and the offset, and then the function is called.

For example, there are types A and B as follows:

struct A
{
    int ax;
    virtual void f0() {}
    virtual void f1() {}
};

struct B : public A
{
    int bx;
    void f0() override {} // overrides f0
};

Their object model and virtual table model are as follows:

                                        A VTable (incomplete)
struct A
 object
     0 - vptr_A --------------------->  +--------------+
     8 - int ax                         |   A::f0()    |
sizeof(A): 16    align: 8               +--------------+
                                        |   A::f1()    |
                                        +--------------+

                                        B VTable (incomplete)
struct B
 object
     0 - struct A
     0 -   vptr_A ------------------->  +--------------+
     8 -   int ax                       |   B::f0()    |
    12 - int bx                         +--------------+
sizeof(B): 16    align: 8               |   A::f1()    |
                                        +--------------+

Note that since B overrides the method f0(), its virtual table replaces A::f0() with B::f0() at the same position. When f0() is called, the entry at offset offset0 in the virtual table is A::f0() for an object of actual type A and B::f0() for an object of actual type B, so the correct virtual function address is selected at run time.

A a;
B b;
A &a_ra = a;
A &a_rb = b;
a_ra.f0(); // call (a_ra->vptr_A + offset0) --> A::f0()
a_rb.f0(); // call (a_rb->vptr_A + offset0) --> B::f0()
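The table-driven dispatch described above can be imitated by hand with plain function pointers. This is only a sketch of the idea, not what g++ actually emits; ObjA, ObjB, Fn, the vtable_ arrays and the call_ helpers are all hypothetical names.

```cpp
#include <cassert>

struct ObjA;                       // forward declaration for the function type
using Fn = int (*)(ObjA*);         // every emulated virtual function receives "this"

struct ObjA {
    const Fn* vptr;                // emulated virtual pointer, first member of the object
    int ax;
};

struct ObjB {
    ObjA base;                     // base part at offset 0, as in single inheritance
    int bx;
};

int A_f0(ObjA*) { return 10; }
int A_f1(ObjA*) { return 11; }
int B_f0(ObjA* self) {
    // self really points at an ObjB: its first member is the ObjA part
    return 20 + reinterpret_cast<ObjB*>(self)->bx;
}

// Slot 0 is f0 and slot 1 is f1 in both tables; B only replaces slot 0.
const Fn vtable_A[] = { A_f0, A_f1 };
const Fn vtable_B[] = { B_f0, A_f1 };

ObjA make_a() { return ObjA{ vtable_A, 0 }; }
ObjB make_b(int bx) { return ObjB{ ObjA{ vtable_B, 0 }, bx }; }

// "Virtual call": load the table through vptr and index it by the fixed slot.
int call_f0(ObjA* obj) { return obj->vptr[0](obj); }
int call_f1(ObjA* obj) { return obj->vptr[1](obj); }
```

The caller only ever sees an ObjA*, yet call_f0 reaches B_f0 when the object was created by make_b, because the slot index is fixed at compile time while the table is chosen at run time.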

In the above example, all virtual functions of type B have already been declared in A. If a virtual function of type B does not exist in base A, the new virtual function is appended to the end of the virtual function table, so the part shared with the base class is unaffected. For example, if B adds the new function f2(), the virtual function table changes as follows:

                                        B VTable (incomplete)
struct B
 object
     0 - struct A
     0 -   vptr_A ------------------->  +--------------+
     8 -   int ax                       |   B::f0()    |
    12 - int bx                         +--------------+
sizeof(B): 16    align: 8               |   A::f1()    |
                                        +--------------+
                                        |   B::f2()    |
                                        +--------------+

For polymorphic types, run-time type identification (RTTI) must be supported in addition to determining virtual function addresses at run time. One obvious solution is to add the address of the type information to the virtual table. So that its position does not depend on the length of the table, g++ places it in front of the function entries, as follows:

struct B                                B VTable (incomplete)
 object                                 +--------------+
     0 - struct A                       |  RTTI for B  |
     0 -   vptr_A ------------------->  +--------------+
     8 -   int ax                       |   B::f0()    |
    12 - int bx                         +--------------+
sizeof(B): 16    align: 8               |   A::f1()    |
                                        +--------------+
                                        |   B::f2()    |
                                        +--------------+

The virtual table now contains not only function addresses but also RTTI addresses, and many new items will be added later. Each entry in a virtual table is called an entity.

The above solution can handle the case of single-chain inheritance well. In single-chain inheritance, each derived type contains the data of its base type and the virtual functions, which can be arranged in the same virtual table in inheritance order, so only a single virtual pointer is required. And because every derived class contains its immediate base class and has no second immediate base class, its data is also linearly distributed in memory, meaning that the actual type has the same starting address as all of its base types. For example, B inherits A, C inherits B, and their definitions and memory models are as follows:

struct A
{
    int ax;
    virtual void f0() {}
};

struct B : public A
{
    int bx;
    virtual void f1() {}
};

struct C : public B
{
    int cx;
    void f0() override {}
    virtual void f2() {}
};

The memory model is as follows:

                                        C VTable (incomplete)
struct C                                +--------------+
 object                                 |  RTTI for C  |
     0 - struct B            +--------> +--------------+
     0 -   struct A          |          |   C::f0()    |
     0 -     vptr_A ---------+          +--------------+
     8 -     int ax                     |   B::f1()    |
    12 -   int bx                       +--------------+
    16 - int cx                         |   C::f2()    |
sizeof(C): 24    align: 8               +--------------+

As shown above, when a reference of type A or B holds an object whose actual type is C, its starting address is still the starting address of the C object. This means that under single inheritance, upcasts and dynamic downcasts do not need to change the address held in the this pointer; they only need to "reinterpret" it.
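The claim that all views of a single-chain object share one address can be observed directly. The sketch below reuses the A, B, C definitions above; the two helper functions are hypothetical, and the result reflects common implementations such as g++ rather than a guarantee of the C++ standard.

```cpp
#include <cassert>

struct A { int ax = 0; virtual void f0() {} };
struct B : A { int bx = 0; virtual void f1() {} };
struct C : B { int cx = 0; void f0() override {} virtual void f2() {} };

// True when the A, B and C views of the same object start at one address.
bool same_start_address(C& c) {
    const void* as_c = &c;
    const void* as_b = static_cast<B*>(&c);
    const void* as_a = static_cast<A*>(&c);
    return as_a == as_b && as_b == as_c;
}

// A dynamic downcast from the A view yields the identical pointer value.
bool downcast_keeps_address(C& c) {
    A* pa = &c;
    C* pc = dynamic_cast<C*>(pa);
    return static_cast<const void*>(pc) == static_cast<const void*>(pa);
}
```

With g++ both helpers are expected to return true for any C object, which is exactly why no pointer adjustment is needed in this case.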

However, not all derived classes are single-chained, and their starting address is not always the same as the starting address of their base class.

2.2 Multiple inheritance memory model

Suppose that type C inherits two independent base classes, A and B, which are defined as follows:

struct A
{
    int ax;
    virtual void f0() {}
};

struct B
{
    int bx;
    virtual void f1() {}
};

struct C : public A, public B
{
    int cx;
    void f0() override {}
    void f1() override {}
};

Unlike single-chain inheritance, A and B are completely independent, so their virtual functions have no sequential relation: f0 and f1 have the same offset from the start of their respective virtual tables and cannot simply be arranged one after the other. The member variables of A and B are also independent, so neither base class contains the other. As a result, A and B must occupy two disjoint regions of C, and two virtual pointers are required to index their virtual functions. The memory layout is as follows:

                                                C Vtable (7 entities)
                                                +--------------------+
struct C                                        | offset_to_top (0)  |
 object                                         +--------------------+
     0 - struct A (primary base)                |     RTTI for C     |
     0 -   vptr_A --------------------------->  +--------------------+
     8 -   int ax                               |      C::f0()       |
    16 - struct B                               +--------------------+
    16 -   vptr_B ---------------------+        |      C::f1()       |
    24 -   int bx                      |        +--------------------+
    28 - int cx                        |        | offset_to_top (-16)|
sizeof(C): 32    align: 8              |        +--------------------+
                                       |        |     RTTI for C     |
                                       +------> +--------------------+
                                                |    Thunk C::f1()   |
                                                +--------------------+

In the layout shown above, C uses A as its primary base class, that is, it “merges” its virtual functions into A’s virtual function table, and uses A’s virtual pointer as C’s memory starting address.

The virtual pointer vptr_B of type B cannot directly point to the fourth entity in the virtual table, because the region vptr_B points to must itself be a complete virtual table in format. Therefore, a corresponding virtual table for vptr_B has to be created after the part of the virtual table that belongs to A.

In the figure above, there are two “new” entities, offset_to_top and Thunk.

In multiple inheritance, different base classes may start at different locations, so the this pointer needs a different adjustment for each of them when it is converted to the actual type. Since the actual type is unknown at compile time, the offsets must be available at run time. The entity offset_to_top is the offset from the start address of the subobject of the current static (declared) type to the start address of the object of the actual type. Adding this offset to the this pointer gives the address of the actual object when casting dynamically to the actual type. Note that any type may later be used as one of several bases, so offset_to_top exists in the virtual table of every polymorphic type, even one that only uses single inheritance.
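The effect of offset_to_top can be observed without reading the virtual table itself. In the sketch below, offset_of_B_in_C and most_derived_address are hypothetical helpers; the numeric offset is implementation-specific (16 would be expected for the layout drawn above), while dynamic_cast<void*> returning the address of the most derived object is standard behaviour.

```cpp
#include <cassert>
#include <cstddef>

struct A { int ax = 0; virtual void f0() {} };
struct B { int bx = 0; virtual void f1() {} };
struct C : A, B { int cx = 0; void f0() override {} void f1() override {} };

// Byte distance from the start of the C object to its B subobject.
std::ptrdiff_t offset_of_B_in_C(C& c) {
    const char* top = reinterpret_cast<const char*>(&c);
    const char* sub = reinterpret_cast<const char*>(static_cast<B*>(&c));
    return sub - top;
}

// dynamic_cast<void*> yields the address of the most derived object.
// Under the Itanium ABI this is done by adding offset_to_top to the pointer.
void* most_derived_address(B* pb) {
    return dynamic_cast<void*>(pb);
}
```

Starting from a B* that points into a C object, most_derived_address walks back to the start of that C, even though the static type B says nothing about C.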

And what about the entity Thunk? Without the thunk, one would expect the address of the function C::f1() to be stored here. However, Thunk C::f1() and C::f1() have different addresses.

To figure out what a thunk is, first notice that if a reference of type B holds an object of actual type C, the reference points to the address C+16. When it calls the function f1() overridden by type C, passing this pointer directly to C::f1() would be wrong, because the pointer is 16 bytes past the address that C::f1() expects. Therefore, before the call, the this pointer must be adjusted to the correct position. The thunk does just that: it first adjusts the this pointer by subtracting the 16-byte offset, and then calls C::f1().
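What the thunk does can be written out by hand. The following is a sketch under stated assumptions: C_f1_body and thunk_like_C_f1 are hypothetical free functions standing in for the real function body and the compiler-generated thunk, and the return values were added only to make the effect visible.

```cpp
#include <cassert>

struct A { int ax = 1; virtual int f0() { return ax; } };
struct B { int bx = 2; virtual int f1() { return bx; } };
struct C : A, B {
    int cx = 3;
    int f1() override { return cx; }   // needs this to point at the whole C object
};

// Stand-in for the real function body: expects a pointer to the C object.
int C_f1_body(C* self) { return self->cx; }

// Stand-in for the thunk: receives the B view of the object, shifts the
// pointer back to the enclosing C, then forwards to the real body.
int thunk_like_C_f1(B* b_view) {
    C* self = static_cast<C*>(b_view); // the compiler subtracts B's offset within C
    return C_f1_body(self);
}
```

Calling thunk_like_C_f1 with a B* taken from a C object gives the same result as the genuine virtual call through that B*, which goes through the compiler's own thunk.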

2.3 Construction and destruction process

During the construction and destruction of polymorphic types, a virtual call does not reach the function of the final actual type, but the function of the type currently being constructed (or not yet destroyed). Let’s use an example to illustrate this. The two types A and B shown below call their virtual function f0() in their constructors and destructors:

#include <iostream>

struct A
{
    virtual void f0() { std::cout << "A\n"; }
    A() { this->f0(); }
    virtual ~A() { this->f0(); }
};

struct B : public A
{
    virtual void f0() { std::cout << "B\n"; }
    B() { this->f0(); }
    ~B() override { this->f0(); }
};

int main()
{
    B b;
    return 0;
}
// Output (one letter per line): A B B A

Running the program prints "ABBA", which shows that f0() is called in turn from A::A(), B::B(), B::~B() and A::~A(), each time resolving to the version of the class whose constructor or destructor is running. Intuitively, when A is constructed, the data in B has not yet been created, so the virtual function overridden by B is not available and the version in A must be called. Conversely, during destruction the B part is destroyed first, so by the time the destructor of A runs, the function in B is no longer available and the version in A must be called again.

During program execution, this process is achieved by dynamically modifying the virtual pointer of the object.

According to the construction order of inherited classes in C++, the base class A is constructed first. When A is constructed, the object’s own virtual pointer points to the virtual table of A. A::f0() is called because the location of f0() in A’s virtual table holds the address of A::f0(). After the construction of A finishes, the construction of B starts, at which point the virtual pointer is modified to point to the virtual table of B. The destruction process is the opposite.
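The change of the virtual pointer during an object's lifetime shows up in typeid as well as in virtual calls. The sketch below is illustrative: the global string stages and the functions record and trace_lifetime are hypothetical helpers used only to log what each stage observes.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct A;
void record(const A& obj);   // defined below, once B is a complete type

std::string stages;          // one letter per constructor or destructor that ran

struct A {
    A() { record(*this); }
    virtual ~A() { record(*this); }
};

struct B : A {
    B() { record(*this); }
    ~B() override { record(*this); }
};

// Appends 'B' if the object's current dynamic type is B, otherwise 'A'.
// During A's constructor and destructor the dynamic type is A, not B.
void record(const A& obj) {
    stages += (typeid(obj) == typeid(B)) ? 'B' : 'A';
}

// Creates and destroys one B, returning what each stage observed.
std::string trace_lifetime() {
    stages.clear();
    { B b; }
    return stages;
}
```

trace_lifetime() returns "ABBA": the same object reports type A while the A part is being built or torn down, and type B in between.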

3 Virtual inheritance memory model

In the above model, for a derived object, the offset of its base class with respect to it is always determined, so dynamic downcasting does not rely on additional runtime information.

Virtual inheritance breaks this condition. It means that the offset of a virtual base class relative to a derived class can vary with the actual type, and that only one copy of the virtual base exists, so the offset of a virtual base class can only be determined at run time. Therefore, the virtual table of a type that inherits from a virtual base class has to be extended with information about the virtual base class offset.

3.1 Diamond inheritance memory model

A classic diamond virtual inheritance relationship is shown below. To avoid including the members of A twice, types B and C each inherit virtually from A. Type D inherits from B and C. Given this inheritance, the offsets of B and C in D can be determined at compile time, while the offset of A can only be determined at run time.

struct A
{
    int ax;
    virtual void f0() {}
    virtual void bar() {}
};

struct B : virtual public A           /****************************/
{                                     /*                          */
    int bx;                           /*             A            */
    void f0() override {}             /*           v/ \v          */
};                                    /*           /   \          */ 
                                      /*          B     C         */
struct C : virtual public A           /*           \   /          */
{                                     /*            \ /           */
    int cx;                           /*             D            */ 
    void f0() override {}             /*                          */
};                                    /****************************/

struct D : public B, public C
{
    int dx;
    void f0() override {}
};

First, the memory model of type A is analyzed. Since virtual inheritance affects subclasses, not superclasses, neither the memory layout of A nor the virtual table changes.

                                                   A VTable
                                                   +------------------+
                                                   | offset_to_top(0) |
struct A                                           +------------------+
 object                                            |    RTTI for A    |
     0 - vptr_A -------------------------------->  +------------------+
     8 - int ax                                    |      A::f0()     |
sizeof(A): 16    align: 8                          +------------------+
                                                   |      A::bar()    |
                                                   +------------------+

There is no essential difference between type B and type C, so only type B is analyzed. The following is the memory model of type B:

                                        B VTable
                                        +---------------------+
                                        |   vbase_offset(16)  |
                                        +---------------------+
                                        |   offset_to_top(0)  |
struct B                                +---------------------+
 object                                 |      RTTI for B     |
     0 - vptr_B --------------------->  +---------------------+
     8 - int bx                         |       B::f0()       |
    16 - struct A                       +---------------------+
    16 -   vptr_A ------------+         |   vcall_offset(0)   |x--------+
    24 -   int ax             |         +---------------------+         |
                              |         |  vcall_offset(-16)  |o----+   |
                              |         +---------------------+     |   |
                              |         |  offset_to_top(-16) |     |   |
                              |         +---------------------+     |   |
                              |         |      RTTI for B     |     |   |
                              +-------> +---------------------+     |   |
                                        |    Thunk B::f0()    |o----+   |
                                        +---------------------+         |
                                        |       A::bar()      |x--------+
                                        +---------------------+

For a reference whose static type is B, the memory offset of its base class A cannot be determined at compile time. Therefore, the virtual table needs an additional entity that gives the location of the base class at run time. This entity is called vbase_offset and is placed above offset_to_top.

In addition, when a virtual function is called through a reference to the virtual base A, the adjustment of the this pointer cannot be fixed at compile time, because the offset of A within the complete object is only known at run time. Each virtual function may need a different adjustment of the this pointer, and these adjustments are recorded in vcall_offset entities at mirrored positions in the virtual table. For example, when A::bar() is called, the this pointer points to vptr_A, which is already the position of the class A that implements the function, so no adjustment is needed, i.e. vcall_offset(0). B::f0() is implemented by type B, so the this pointer must be moved back by 16 bytes to the start of B, i.e. vcall_offset(-16).

For type D, the virtual table is more complex, but the entities in the virtual table are familiar. The following is the memory model of D:

                                                D VTable
                                                +---------------------+
                                                |   vbase_offset(32)  |
                                                +---------------------+
struct D                                        |   offset_to_top(0)  |
 object                                         +---------------------+
     0 - struct B (primary base)                |      RTTI for D     |
     0 -   vptr_B --------------------------->  +---------------------+
     8 -   int bx                               |       D::f0()       |
    16 - struct C                               +---------------------+
    16 -   vptr_C ----------------------+       |   vbase_offset(16)  |
    24 -   int cx                       |       +---------------------+
    28 - int dx                         |       |  offset_to_top(-16) |
    32 - struct A (virtual base)        |       +---------------------+
    32 -   vptr_A ------------------+   |       |      RTTI for D     |
    40 -   int ax                   |   +-----> +---------------------+
sizeof(D): 48    align: 8           |           |       D::f0()       |
                                    |           +---------------------+
                                    |           |   vcall_offset(0)   |x--------+
                                    |           +---------------------+         |
                                    |           |  vcall_offset(-32)  |o----+   |
                                    |           +---------------------+     |   |
                                    |           |  offset_to_top(-32) |     |   |
                                    |           +---------------------+     |   |
                                    |           |      RTTI for D     |     |   |
                                    +---------> +---------------------+     |   |
                                                |    Thunk D::f0()    |o----+   |
                                                +---------------------+         |
                                                |       A::bar()      |x--------+
                                                +---------------------+
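That D contains exactly one A subobject, reachable through both B and C, can be checked from ordinary code. This is a minimal sketch reusing the diamond defined in this section; shares_single_A and down_from_A are hypothetical helpers.

```cpp
#include <cassert>

struct A { int ax = 0; virtual void f0() {} virtual void bar() {} };
struct B : virtual A { int bx = 0; void f0() override {} };
struct C : virtual A { int cx = 0; void f0() override {} };
struct D : B, C { int dx = 0; void f0() override {} };

// Both paths D -> B -> A and D -> C -> A must arrive at the same A subobject.
bool shares_single_A(D& d) {
    A* via_b = static_cast<B*>(&d);
    A* via_c = static_cast<C*>(&d);
    return via_b == via_c;
}

// static_cast cannot convert from a virtual base down to D, because the
// offset is not a compile-time constant; dynamic_cast uses run-time data.
D* down_from_A(A* pa) {
    return dynamic_cast<D*>(pa);
}
```

A write to ax through the B view is visible through the C view, since both views refer to the one shared A. For an object whose actual type is only B, down_from_A returns a null pointer.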

3.2 Construction and destruction process

As with non-virtual inheritance, when a class that uses virtual inheritance is constructed or destroyed, a virtual call only reaches the function found in the virtual table of the current stage. A new problem arises, however: the offset of the virtual base class relative to a given type can differ depending on which derived class the object finally belongs to. If the method of 2.3 were applied directly, that is, if the type's own virtual table were used while constructing that type as part of a class that inherits the virtual base, the different offset would make it impossible to locate the virtual base class subobject correctly.

This description is rather abstract, so it is explained with a diamond inheritance example like the one in 3.1. The inheritance relationship of the four types A, B, C and D is as follows:

struct A
{
    int ax;
    virtual void f0() {}
    virtual void bar() {}
};

struct B : virtual public A           /****************************/
{                                     /*                          */
    int bx;                           /*             A            */
    void f0() override {}             /*           v/ \v          */
};                                    /*           /   \          */
                                      /*          B     C         */
struct C : virtual public A           /*           \   /          */
{                                     /*            \ /           */
    int cx;                           /*             D            */
    virtual void f1() {}              /*                          */
};                                    /****************************/


struct D : public B, public C
{
    int dx;
    void f0() override {}
};

Comparing the memory layout of an object of actual type B with that of an object of actual type D shows that if the actual type is B, the offset from the start of B to the virtual base class A is 16, whereas if the actual type is D, the offset from the start of the B subobject to A is 32. This obviously conflicts with B’s own virtual table. If the B subobject of D (D::B) were constructed with B’s own virtual table, the different offset would cause errors.
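The two different offsets can be measured at run time. In the sketch below, offset_B_to_A is a hypothetical helper; the concrete values are implementation-specific (16 and 32 would be expected for the g++ layouts drawn in this article), so only the fact that they differ is relied upon.

```cpp
#include <cassert>
#include <cstddef>

struct A { int ax = 0; virtual void f0() {} virtual void bar() {} };
struct B : virtual A { int bx = 0; void f0() override {} };
struct C : virtual A { int cx = 0; virtual void f1() {} };
struct D : B, C { int dx = 0; void f0() override {} };

// Byte distance from a B object, or a B subobject, to its virtual base A.
// The conversion to A* looks the offset up at run time.
std::ptrdiff_t offset_B_to_A(B& b) {
    const char* from = reinterpret_cast<const char*>(&b);
    const char* to   = reinterpret_cast<const char*>(static_cast<A*>(&b));
    return to - from;
}
```

Passing a standalone B and then a D (which binds to the B& parameter as its B subobject) yields two different distances from the same function, which is why the code for B cannot hard-code the position of A.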

The solution to this problem is actually quite blunt: for the construction and destruction stages, as many virtual tables and virtual pointers are generated as will be needed, and they are handed out “on demand” during construction or destruction.

For example, here type D is a subclass of types B and C, and B and C both inherit virtually from A. This inheritance relationship makes the virtual tables of the B contained in D (called B-in-D) and of the C contained in D (called C-in-D) different from the virtual tables of B and C themselves. Therefore, two new virtual tables have to be generated, the B-in-D and C-in-D virtual tables.

B-in-D is still a virtual table laid out for type B. Since the virtual table of B is referenced by two virtual pointers, vptr_B and vptr_A, B-in-D is referenced by two virtual pointers as well. When D::B is being constructed or destroyed, the memory layout of the object and the layout of its virtual table are as follows:

                                                B-in-D VTable
                                                +---------------------+
                                                |   vbase_offset(32)  |
                                                +---------------------+
struct D (Constructing/Destructing B)           |   offset_to_top(0)  |
 object                                         +---------------------+
     0 - struct B (primary base)                |      RTTI for B     |
     0 -   vptr_B --------------------------->  +---------------------+
     8 -   int bx                               |       B::f0()       |
    16 - struct C                               +---------------------+
    16 -   vptr_C                               |   vcall_offset(0)   |x--------+
    24 -   int cx                               +---------------------+         |
    28 - int dx                                 |  vcall_offset(-32)  |o----+   |
    32 - struct A (virtual base)                +---------------------+     |   |
    32 -   vptr_A ------------------+           |  offset_to_top(-32) |     |   |
    40 -   int ax                   |           +---------------------+     |   |
sizeof(D): 48    align: 8           |           |      RTTI for B     |     |   |
                                    +---------> +---------------------+     |   |
                                                |    Thunk B::f0()    |o----+   |
                                                +---------------------+         |
                                                |       A::bar()      |x--------+
                                                +---------------------+

Similarly, C-in-D has two virtual pointers, vptr_C and vptr_A. In addition, the final D has three virtual pointers, giving a total of seven different virtual pointers that point to seven different locations in three virtual tables. So for type D the compiler generates three different virtual tables and seven different virtual pointers. These seven virtual pointers are collected into one table, the Virtual Table Table (VTT). The compiler creates a VTT only for a class that has virtual base classes, directly or indirectly.

During construction and destruction, the constructor or destructor of the subclass passes a pointer to the appropriate part of the VTT to the base class, so that the constructor or destructor of the base class obtains the correct virtual table.
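The VTT itself is not visible from C++ code, but the behaviour it enables is. The sketch below only demonstrates that observable behaviour; the constructors, the seen_ax member, the global calls log and trace_D_construction are hypothetical additions to the example types.

```cpp
#include <cassert>
#include <string>

std::string calls;   // which f0 ran at each stage

struct A {
    int ax;
    explicit A(int v) : ax(v) {}
    virtual void f0() { calls += 'A'; }
};

struct B : virtual A {
    int seen_ax;
    B() : A(0), seen_ax(0) {   // A(0) is used only when B is the complete object
        f0();                  // resolves to B::f0, even while a D is being built
        seen_ax = ax;          // must find A at its offset inside the real object
    }
    void f0() override { calls += 'B'; }
};

struct C : virtual A {
    int cx = 0;
    C() : A(0) {}
    virtual void f1() {}
};

struct D : B, C {
    D() : A(42) { f0(); }      // the most derived class initialises the virtual base
    void f0() override { calls += 'D'; }
};

// Builds a D, returns the call log, and reports what B's constructor read.
std::string trace_D_construction(int& seen) {
    calls.clear();
    D d;
    seen = d.seen_ax;
    return calls;
}
```

While the B part of a D is under construction, the virtual call reaches B::f0 rather than D::f0, yet the read of ax still finds the A that D initialised with 42. Under the Itanium ABI, this combination is what the B-in-D construction virtual table provides.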

4 Further exploration

Seeing is better than hearing, and practicing is better than seeing. The memory model of C++ run-time polymorphism is a relatively complex subject that is hard to grasp after only one or two readings. The best way to understand it is to dump the memory layout of the objects and the structure of the virtual tables of the types.

Using the Clang++ compiler, you can export the memory model and virtual table model for the types in main.cpp by using the following command.

clang++ -cc1 -emit-llvm -fdump-record-layouts -fdump-vtable-layouts  main.cpp

Note that at least one variable of the type must be defined, otherwise the compiler leaves the type out of the dump. For example, given the inheritance chain A <- B <- C, you need to define at least one object of type C.

The command to export the inheritance structure using g++ is as follows (GCC 8 and later spell the option -fdump-lang-class):

g++ -fdump-class-hierarchy -c main.cpp

Since the names in the g++ dump are in their internal (mangled) representation, you also need c++filt to produce a reasonably readable document.

cat [document exported by g++] | c++filt -n > [readable output document]

You can also use GDB to track changes in memory, registers, virtual functions, Thunk addressing, and this pointer changes.

g++ uses the Itanium C++ ABI (Application Binary Interface); see the Itanium C++ ABI documentation for a more in-depth look at its memory layout.

For VC++, the memory layout is slightly different. The offset of the virtual base class is indexed through a separate pointer, so for classes that use virtual inheritance, the vfptr pointing to the virtual function table is followed by a vbptr pointing to the virtual base class offset table. In addition, when a derived class is empty or has the same virtual function interface as its base class, VC++ merges the virtual pointer of the derived class with the virtual pointer of the virtual base class. This means that sometimes an object starts with the vbptr instead of the vfptr.

5 Conclusion

5 concludes

  • The address of the virtual function is determined at run time by the virtual function table indexed by the virtual pointer.
  • The virtual table stores not only the addresses of the virtual functions, but also the address of the RTTI information and the offset to the actual type.
  • Calling a virtual function may require adjusting the this pointer, which is done by means such as a thunk.
  • For a class derived from a virtual base class, the offset of the virtual base class depends on the actual type, so the address of the virtual base class can only be determined at run time.
  • During the construction and destruction of polymorphic types, different virtual functions are called at different stages by modifying the virtual pointer to point to different virtual tables.
  • With virtual inheritance, the virtual table of the same type can differ between the types that contain it. Therefore, the correct virtual table must be passed through the VTT.