By: Feng Pan, Byte Games China Client Team

The introduction

When Unity3D engine manages managed heap memory, it does not directly use the operating system’s own memory allocation functions such as MALloc and free to directly allocate and release each object, but uses a set of memory allocation GC algorithm to manage the allocation and recycling of all memory objects on managed heap. In the 1.x to 2.x versions of Unity, it is based on the open source C++BoehmGC algorithm, while in the 3.x and later versions, it starts to enable the SGEN algorithm. In the Unity project generated by il2Cpp, BoehmGC algorithm is used.

BoehmGC algorithm idea

BoehmGC is a Mark-sweep algorithm, which is similar to Java GC algorithm in general idea and mainly contains 4 stages:

  • Preparation phase: Each managed heap memory object is created with an associated marker bit that indicates whether the current object is referenced or not. The marker bit is 0 before the marker.

  • Mark phase: Starting from the root memory nodes (static variables, stacks, registers), traverses and scans the managed heap’s memory nodes, marking the referenced memory nodes as 1.

  • Sweep phase: Traverses all nodes, empties memory data of unmarked nodes, and releases data based on certain conditions.

  • Finalization phase: Callback logic that triggers registration such as GC_register_finalizer.

Characteristics of BoehmGC algorithm

BoehmGC is a conservative GC algorithm, and its counterpart is accurate GC. A conservative GC algorithm is one that can’t tell whether a memory value is a pointer or a non-pointer, so it tries to think of all memory as a pointer and therefore can’t use Copying algorithms. An exact GC is a GC that can directly tell whether it is a pointer or a non-pointer. Also, you can use the Copying algorithm to optimize memory usage.

This paper mainly introduces the implementation logic of BoehmGC algorithm on memory allocation, and the next chapter mainly introduces the implementation logic of this algorithm on garbage collection.

Memory Allocation Process

Memory type

First of all, GC_malloc is the API used in BoehmGC to replace malloc() function, and receives a parameter lb, that is, the size of memory to be allocated. GC_malloc_kind(size_t lb, int k) takes two arguments, including the size of memory to allocate, and the type.

void * GC_malloc(size_t lb)
{
    return GC_malloc_kind(lb, NORMAL);
}

void * GC_malloc_kind(size_t lb, int k)
{
    return GC_malloc_kind_global(lb, k);
}
Copy the code

The parameter k is used to distinguish between different memory allocation types. There are three types of k: PTRFREE, NORMAL, and UNCOLLECTABLE. The instructions are as follows:

  • NORMAL: Untyped memory allocation. For GC, the type metadata of the object cannot be obtained. Therefore, GC will scan the memory block according to the way of pointer alignment.

  • PTRFREE: cursorless memory allocation, which explicitly tells the GC that there is no pointer information in the object, that is, there is no need to look for references to other objects in the object during GC. Mono/IL2CPP uses this type of memory allocation for arrays of int, byte, string, etc.

  • UNCOLLECTABLE: Memory allocated by BOEHM to assist in memory management, which does not need to be marked or reclaimed.

Large memory or small memory

First, determine whether the size is larger than 2048 bytes. If it is smaller than 2048 bytes, a small memory allocation process is used to deal with it; otherwise, a large memory allocation process is used.

void * GC_malloc_kind_global(size_t lb, int k)
{
    if (SMALL_OBJ(lb)) {
        // Small memory allocation process. }else {
        // Large memory allocation process}}Copy the code

Small object memory allocation

Size alignment -GRANULE

For small memory allocations, we do not allocate the actual memory size directly, but allocate memory multiples of 16 bytes to ensure that memory is managed in a pointer aligned manner. Therefore, the first step is to figure out the actual size of bytes that need to be allocated. The idea is to come up with a concept for GRANULES, that is, one GRANULE is 16 bytes. Allocating memory according to basic units of aggregates, such as allocating 2 aggregates, allocating 32 bytes. In the allocation process, according to the original size needed, calculate and map to get the actual number of GRANULE needed to be allocated, and the code is as follows:

// Lb is the original allocation size, and LG is that size (1-128).
size_t lg = GC_size_map[lb];
Copy the code

For example, one needs 18 bytes of memory, then LG =2 that means allocating 2 GRANULE (32 bytes), and one GRANULE (16 bytes) that means allocating one GRANULE (16 bytes).

GC_size_map[] is a “size table” that maintains the mapping between LB -> LG. GC_size_map can return up to 128 GRANULE, which is the size limit of small memory (128 * 16 = 2048). The GC_size_map array itself expands lazily. The following shows the correlation between GRANULE and memory size:

Free linked list -ok_freeList

After determining the size of a GRANULE, I first check the ok_freelist linked list for any free memory blocks, and if I find any, I return them to allocate them. The algorithm maintains a data structure, obj_KIND, as follows:

struct obj_kind {
    void **ok_freelist;
    struct hblk民运分子ok_reclaim_list;. } GC_obj_kinds[3];
Copy the code

GC_obj_kinds[3] corresponds to the three memory types mentioned above: PTRFREE, NORMAL, and UNCOLLECTABLE. That is, each type has an obj_kind structure information.

Each obj_KIND structure maintains a linked list of ok_freelist two-dimensional Pointers to free memory blocks. Ok_freelist maintains 128 linked lists, i.e., lists 0 to 127. Where, the size of each memory block in list N is (N+1) * 16, which means that N is equivalent to the number of GRANULE GRANULE before. The data structure for OK_freelist is shown below:

First, according to the calculated GRANULE=N number, find the linked list ok_freelist[N] of the corresponding size. If any exists and the number of elements is greater than 0, retrieve the first memory block address ok_freelist[N][0] and remove it from the linked list. The operation process is shown as follows:

The corresponding logic is as follows:

GC_INNER void * GC_generic_malloc_inner(size_t lb, int k)
{
    struct obj_kind * kind = GC_obj_kinds + k;
    size_t lg = GC_size_map[lb];
    void** opp = & (kind -> ok_freelist[lg]); . lg = GC_size_map[lb];// View the memory block of freelistopp = & (kind -> ok_freelist[lg]); op = *opp;// No memory block was found
    if (0 == op) {
        ...
        // Find or create from the underlying memory block linked list
        op = GC_allocobj(lg, k);
        if (0== op) returnNULL; }}// Remove the found memory block from the free list
    *opp = obj_link(op);
    obj_link(op) = 0; .// Returns the memory block address
    return op;
}
Copy the code

The ok_freelist list is initially empty, that is, the chunk of free memory available for ok_freelist at the beginning. If there is no corresponding free block in ok_freelist, then GC_allocobj(lg, k) is called to look for available memory at the bottom level. The code logic is as follows:

GC_INNER ptr_t GC_allocobj(size_t gran, int kind)
{
    void** flh = & (GC_obj_kinds[kind].ok_freelist[gran]);while (*flh == 0) {
        ENTER_GC();
        GC_collect_a_little_inner(1);
        EXIT_GC();
        if (*flh == 0) {
            GC_new_hblk(gran, kind);
            if (*flh == 0) { ENTER_GC(); . GC_collect_a_little_inner(1); . EXIT_GC(); }}}return (ptr_t)(*flh);
}
Copy the code

The core logic of GC_allocobj is to call GC_new_hblk(GRan, KIND) to obtain memory from the underlying memory pool and check whether there is any free memory block allocated in the underlying memory pool. If there is no free memory block allocated to the underlying memory pool, the system functions such as malloc are used to allocate memory to the underlying memory pool. Just take a piece and return it. The code logic for GC_new_hblk is as follows:

GC_INNER void GC_new_hblk(size_t gran, int kind)
{
    struct hblk *h; /* the new heap block */
    GC_bool clear = GC_obj_kinds[kind].ok_init;

    /* Allocate a new heap block */
    h = GC_allochblk(GRANULES_TO_BYTES(gran), kind, 0);
    if (h == 0) return;

    /* Build the free list */
    GC_obj_kinds[kind].ok_freelist[gran] =
    GC_build_fl(h, GRANULES_TO_WORDS(gran), clear,(ptr_t)GC_obj_kinds[kind].ok_freelist[gran]);
}
Copy the code

It can be found that the main logic of GC_new_hblk has two steps. First, the GC_allochblk method is called to further obtain the available memory blocks in the memory pool. Second, the GC_build_fl method is called to build ok_FREEList from the memory block returned from the memory pool for upper level use.

Linked list of core memory blocks -GC_hblkfreelist

The underlying memory pool implementation logic is similar to ok_freelist, maintaining a pointer linked list of free memory blocks GC_hblkfreelist. Unlike freeList, the basic unit of memory blocks in this list is 4K (4096) bytes, i.e. the size of a page_size.

GC_hblkfreelist has a total of 60 elements, and each element is a linked list. For example, GC_hblkfreelist[N] is a linked list. The size of each memory block in the linked list is a multiple of 4K, but each memory block in the linked list[N] is not necessarily equal in size, as shown below:

Memory block -hBLk, header information -hBLKHdr

First, the basic unit of each memory block in the linked list is 4096 (4KB) memory block. A memory block of 4096 size is called HBLK, and the data definition is as follows:

struct hblk {
    char hb_body[HBLKSIZE]; //HBLKSIZE=4096
};
Copy the code

Each HBLK has a header that describes the state of the memory block. The data is defined as follows:

// Header information
struct hblkhdr {
    struct hblk * hb_next; // Point to the next HBLK
    struct hblk * hb_prev; // point to the previous HBLK
    struct hblk * hb_block; // Corresponding HBLK
    unsigned char hb_obj_kind; / / kink type
    unsigned char hb_flags; / / tag
    word hb_sz; // If used by the upper layer, the actual allocated unit, if free, the size of the memory block
    word hb_descr; 
    size_t hb_n_marks;// Mark the number of bits for GC
    word hb_marks[MARK_BITS_SZ]; // marked as, for GC
}
Copy the code

As each memory block is created, header information is generated and stored in a global queue using a hash algorithm. Header information is the key data structure that determines whether GC can run properly. The access logic for the header will be explained later.

Hb_next indicates the next memory block in hBLkFREEList, hb_block points to the corresponding memory block, hb_obj_KIND indicates the memory block type, and HB_flags is the flag bit used to record some information. Hb_sz represents the size of the memory block itself.

The memory block and header information are shown below:

HBLK memory block lookup

Now that you know the data structure, look at the code for the GC_allochblk method. The logic is as follows:

structh blk *GC_allochblk(size_t sz, int kind, unsigned flags/* IGNORE_OFF_PAGE or 0 */){...//1. Calculate the required memory block size
    blocks_needed = OBJ_SZ_TO_BLOCKS_CHECKED(sz);
    start_list = GC_hblk_fl_from_blocks(blocks_needed);

    //2. Find the exact HBLK memory block
    result = GC_allochblk_nth(sz, kind, flags, start_list, FALSE);
    if (0! = result)returnresult; may_split = TRUE; .if (start_list < UNIQUE_THRESHOLD) {
        ++start_list;
    }
    //3. Search from the linked list of larger memory blocks
    for(; start_list < = split_limit; ++start_list) { result = GC_allochblk_nth(sz, kind, flags, start_list, may_split);if (0! = result)break;
    }
    return result;
}

STATIC int GC_hblk_fl_from_blocks(word blocks_needed)
{
    if(blocks_needed < =32) return blocks_needed;
    if(blocks_needed > =256) return (256-32) /8+32;
    return (blocks_needed-32) /8+32;
}
Copy the code

The steps are as follows:

  1. For example, if 16 bytes need to be allocated according to GC_size_map, sz=16 and OBJ_SZ_TO_BLOCKS_CHECKED are calculated as follows:

    OBJ_SZ_TO_BLOCKS_CHECKED(lb) = divHBLKSZ(lb + HBLKSIZE - 1)
                             = divHBLKSZ(lb + 4096 - 1) / 4096
    Copy the code

    The result is the number of HBLK memory blocks that need to be allocated, since the GC_hblkfreelist list is 4096 in size as its base unit. For example, less than 4096 bytes, the result is 1. For small objects (less than 2048 bytes), the number of memory blocks is 1.

  2. GC_hblk_fl_from_blocks is a key method to determine which GC_hblkfreelist to look up based on the actual number of memory blocks (blocks_needed). Start_list is the list index to start the search, starting with GC_hblkfreelist[start_list]. GC_hblkfreelist[blocks] = GC_hblkfreelist[blocks] = GC_hblkfreelist

    1. If blocks_needed is smaller than 32, startList =blocks_needed, go to GC_hblkfreelist[blocks_needed].

    2. If blocks_needed ranges from 32 to 256, startList =(blocks_needed-32)/8+32. If blocks_needed increases by 8, the index of GC_hblkfreelist[index] increases by 1.

    3. If blocks_needed is greater than 256, it is all looked up from the GC_hblkfreelist[60] linked list.

    The reason for this rule is that there are a limited number of linked lists, only 60 in all. Similarly, memory blocks of the corresponding size are added to the GC_hblk_fl_from_blocks linked list of the corresponding level when they are created. The structure of the linked list is shown below:

    The number on the left is the level of the linked list. It can be seen that the size of memory blocks stored varies from GC_hblkfreelist[32]. For example, the size of memory blocks stored by GC_hblkfreelist[32] ranges from 128KB to 156KB.

  3. After deciding which list to start the search from, an exact search is performed first, and if found directly, the found memory block is returned directly.

  4. If an accurate lookup fails, the start_list is incrementally increased to look up from a larger linked list of memory blocks.

    GC_allochblk_nth method is the core memory block lookup method, the code is more complex, the main logic is as follows:

    STATIC struct hblk *GC_allochblk_nth(size_t sz, int kind, unsigned flags, int n, int may_split)
    {
        struct hblk *hbp;
        hdr * hhdr;
        struct hblk *thishbp;
        hdr * thishdr;/* Header corr. to thishbp */
        // Calculate the size of memory blocks to be allocated
        signed_word size_needed = HBLKSIZE * OBJ_SZ_TO_BLOCKS_CHECKED(sz);
    
        // Find the appropriate memory block from the linked list
        for (hbp = GC_hblkfreelist[n];; hbp = hhdr -> hb_next) {
            signed_word size_avail;
            if (NULL == hbp) return NULL;
            // Get the header information of the memory block
            GET_HDR(hbp, hhdr);
            // Memory block sizesize_avail = (signed_word)hhdr-> hb_sz;if (size_avail < size_needed) continue;
            // The available memory is larger than the required allocation
            if(size_avail ! = size_needed) {// Exit the loop and return null
                if(! may_split)continue; .if( size_avail > = size_needed ) { ...// Split memory blocks and modify linked lists
                    hbp = GC_get_first_part(hbp, hhdr, size_needed, n);
                    break; }}}if (0 == hbp) return 0; .// Modify header information
        setup_header(hhdr, hbp, sz, kind, flags)
        ...
        return hbp;
    }
    Copy the code

    The main logic is divided into the following steps:

    1. The input parameter sz is the size that the upper layer needs to allocate, for example 16 bytes, and king is the memory type. For example, NORMAL, n is the index of the linked list GC_hblkfreelist, that is, the index of the memory block calculated previously. May_split indicates whether splitting is allowed. If splitting is not allowed, it is not allowed to search for larger memory blocks.

    2. Calculate the memory to be allocated. For example, if sz is 16 bytes, size_needed=4096.

    3. Sz =16, start_list =1, size_needed=4096, sz=16, start_list =1, size_needed=4096 The linked list of GC_hblkfreelist[1] is traversed, and the size of the memory block in the linked list is 4096. If yes, stop traversal.

    4. If there is no free 4KB memory block in the linked list of GC_hblkfreelist[1], search from the larger linked list step by step. If there is, stop traversal, and split the large memory block into two parts, and take away the desired memory block in the first half. For example, if an 8KB memory block is found, split the memory block, as shown in the following figure:Find an 8KB memory block and first remove it from the 8KB linked list. Since only a 4KB memory block is needed, the 8KB memory block is divided into two parts, and the first part is used for return. By changing the flag ~FREE_BLK, the state is marked as non-idle. The last part is newly generated, and the corresponding header information will be generated and added to the linked list of 4KB memory blocks, namely GC_hblkfreelist[1]. The code logic is as follows:

      STATIC struct hblk *GC_get_first_part(struct hblk *h, hdr *hhdr, size_t bytes, int index) {
          word total_size = hhdr -> hb_sz;
          struct hblk * rest;
          hdr * rest_hdr;
          // Delete from the free list
          GC_remove_from_fl_at(hhdr, index);
          if (total_size == bytes) return h;
          // The second half
          rest = (struct hblk *)((word)h + bytes);
          // Generate header information
          rest_hdr = GC_install_header(rest);
          // Memory block size
          rest_hdr -> hb_sz = total_size - bytes;
          rest_hdr -> hb_flags = 0; .// Add the corresponding free linked list
          GC_add_to_fl(rest, rest_hdr);
      }
      Copy the code
    5. Modify the header information corresponding to the memory block. Reset the header information for subsequent judgment. For example, if the hb_sz field is changed to 16 bytes, the HB_sz field is changed to 16 bytes, the HB_block field is set to the corresponding memory block address, and the HB_marks and HB_N_marks fields are cleared for subsequent GC tags.

      static GC_bool setup_header(hdr * hhdr, struct hblk *block, size_t byte_sz,
      int kind, unsigned flags) {
          hhdr -> hb_sz = byte_sz;
          hhdr -> hb_obj_kind = (unsignedchar)kind;
          hhdr -> hb_flags = (unsignedchar)flags;
          hhdr -> hb_block = block;
          descr = GC_obj_kinds[kind].ok_descriptor;
          if (GC_obj_kinds[kind].ok_relocate_descr) descr += byte_sz;
          hhdr -> hb_descr = descr; . GC_clear_hdr_marks(hhdr); hhdr -> hb_last_reclaimed = (unsignedshort)GC_gc_no;
          return(TRUE);
      }
      Copy the code

Memory block allocation

If the GC_hblkfreelist freelist does not find a suitable block of memory, consider cutting out a new segment of memory from the system and adding it to the GC_hblkfreelist list. GC_expand_hp_inner

GC_INNER GC_bool GC_expand_hp_inner(word n)
{...// Call the system to open up memory
    space = GET_MEM(bytes);
    // Record memory address and size
    GC_add_to_our_memory((ptr_t)space, bytes); .// Add to GC_hblkfreelist
    GC_add_to_heap(space, bytes); . }Copy the code
  1. GET_MEM is a macro. The interface used to allocate memory varies depending on the operating system. This method calls the corresponding system interface, such as malloc, to allocate memory.

  2. The GC_add_to_our_memory method maintains the allocated memory in a global array through which information about the allocated memory blocks can be queried.

    # define GC_our_memory GC_arrays._our_memory
    Copy the code
  3. The GC_add_to_heap method adds the created memory block to the corresponding GC_hblkfreelist list. It also adds a global array of heap memory information.

    GC_INNER void GC_add_to_heap(struct hblk *p, size_t bytes)
    {...// Generate header information
        phdr = GC_install_header(p);
        // Heap memory information record
        GC_heap_sects[GC_n_heap_sects].hs_start = (ptr_t)p;
        GC_heap_sects[GC_n_heap_sects].hs_bytes = bytes;
        GC_n_heap_sects++;
        // Update hb_sz and HB_flags
        phdr -> hb_sz = bytes;
        phdr -> hb_flags = 0;
        // Add to the list
        GC_freehblk(p); .// Update the range of heap memory addresses for subsequent GC
        GC_collect_at_heapsize += bytes;
        GC_least_plausible_heap_addr = (void((*)ptr_t)p - sizeof(word));
        GC_greatest_plausible_heap_addr = (void *)endp;
    }
    Copy the code

    The GC_freehblk method has some special logic. If the GC_freehblk method finds whether the contiguous memory blocks exist and are free, the GC_freehblk method combines the contiguous memory blocks to generate a larger memory block.

    GC_INNER void GC_freehblk(struct hblk *hbp)
    {
        struct hblk *next, *prev;
        hdr *hhdr, *prevhdr, *nexthdr;
        word size;
    
        GET_HDR(hbp, hhdr);
        size = HBLKSIZE * OBJ_SZ_TO_BLOCKS(hhdr-> hb_sz); hhdr-> hb_sz = size; hhdr -> hb_flags |= FREE_BLK;
        next = (struct hblk *)((ptr_t)hbp + size);
        GET_HDR(next, nexthdr);
        prev = GC_free_block_ending_at(hbp);
        //mergenext
        if (0! = nexthdr & & HBLK_IS_FREE(nexthdr)) { GC_remove_from_fl(nexthdr); hhdr -> hb_sz += nexthdr -> hb_sz;
            GC_remove_header(next); } // Merge previf (0! = prev) { prevhdr = HDR(prev); GC_remove_from_fl(prevhdr); prevhdr -> hb_sz += hhdr -> hb_sz; GC_remove_header(hbp); hbp = prev; hhdr = prevhdr; }... GC_add_to_fl(HBP, HHDR); }Copy the code

    Where the GC_add_to_fl method is responsible for adding the newly allocated memory block to the corresponding free linked list GC_hblkfreelist, the code is as follows:

    STATIC void GC_add_to_fl(struct hblk *h, hdr *hhdr)
    {
        intindex = GC_hblk_fl_from_blocks(divHBLKSZ(hhdr -> hb_sz)); . GC_hblkfreelist[index] = h; . }Copy the code

    The GC_hblk_fl_from_blocks method, as described above, determines which GC_hblkfreelist list to manage the created memory block.

HBLK memory block ->ok_freeList

The appropriate block of memory is obtained and returned to the upper layer, which is used to build the upper layer free linked list OK_FREEList. Call the GC_build_fl method in GC_new_hblk to build the linked list.

/ / build ok_freelist [gran]
GC_obj_kinds[kind].ok_freelist[gran] = GC_build_fl(h, GRANULES_TO_WORDS(gran), clear,(ptr_t)GC_obj_kinds[kind].ok_freelist[gran]);

GC_INNER ptr_t GC_build_fl(struct hblk *h, size_t sz, GC_bool clear,
ptr_t list) {
    word *p, *prev;
    word *last_object;/* points to last object in new hblk*/.// Build the linked list
    p = (word *)(h -> hb_body) + sz;/* second object in *h*/
    prev = (word *)(h -> hb_body);/* One object behind p*/
    last_object = (word *)((char *)h + HBLKSIZE);
    last_object -= sz;
    while((word)p < = (word)last_object) {/* current object's link points to last object */
        obj_link(p) = (ptr_t)prev;
        prev = p;
        p += sz;
    }
    p -= sz;

    // Concatenate the previous list* (ptr_t *)h = list;
    // return the entry address
    return ((ptr_t)p);
}
Copy the code

The split process for building OK_FREEList looks like this:

For example, if a 4096-byte memory block is divided into 16-byte units, the steps are as follows:

  1. 4096 bytes are divided into 256 memory blocks based on 16 bytes. The number of memory blocks is 0 to 255. The last memory block (255) is used as the first node of the new linked list.

  2. The memory address is traversed forward, creating a linked list where the next node of 255 is 254 and the last node is 0.

  3. The next node of the tail node points to the first address of the original list.

  4. Take the address of the first node of the new list as ok_freelist[N], and N is the GRANULE mentioned above, for example, 16 bytes corresponds to 1.

Rebuild the freeList and make the first node available for upper level use.

Large object memory allocation

Instead of allocating small memory objects (2048 bytes or less), allocating large memory objects means allocating more than 2048 bytes of memory. Return to GC_generic_malloc:

GC_API GC_ATTR_MALLOC void * GC_CALLGC_generic_malloc(size_t lb, int k)
{...if (SMALL_OBJ(lb)) {
    }
    else {
        lg = ROUNDED_UP_GRANULES(lb);
        lb_rounded = GRANULES_TO_BYTES(lg);
        // Calculate the number of HBLK memory blocks
        n_blocks = OBJ_SZ_TO_BLOCKS(lb_rounded);
        result = (ptr_t)GC_alloc_large(lb_rounded, k, 0); . } retrun op; }Copy the code

OBJ_SZ_TO_BLOCKS is used to calculate the number of HBLK memory blocks required. For large memory blocks, the number of HBLK memory blocks required is greater than or equal to 1. For example, if 9000 bytes of memory need to be allocated, three HBLK memory blocks are required, and then GC_alloc_large is called to allocate memory.

The GC_alloc_large method and comments are as follows:

GC_INNER ptr_t GC_alloc_large(size_t lb, int k, unsigned flags)
{
    struct hblk * h;
    word n_blocks;
    ptr_tresult; . n_blocks =OBJ_SZ_TO_BLOCKS_CHECKED(lb); .// Allocate memory
    h = GC_allochblk(lb, k, flags); .// Failed to allocate the memory block
    while (0== h & &GC_collect_or_expand(n_blocks, flags ! =0, retry)) {
        h = GC_allochblk(lb, k, flags);
        retry = TRUE;
    }
    // Record the large memory creation size
    size_ttotal_bytes = n_blocks * HBLKSIZE; . GC_large_allocd_bytes += total_bytes; . result = h -> hb_body;// Returns the memory address
    return result;
}
Copy the code

Lb is the size of memory that needs to be allocated and then goes back to the GC_allochblk core method call.

Find free memory fast

Look again at the implementation of the GC_allochblk method.

struct hblk * GC_allochblk(size_t sz, int kind, unsigned flags)
{
    word blocks;
    int start_list;
    struct hblk *result;
    intmay_split; .// Calculate the number of HBLK memory blocks required
    blocks = OBJ_SZ_TO_BLOCKS_CHECKED(sz); .// Number of processes
    start_list = GC_hblk_fl_from_blocks(blocks);
    // Exact allocation
    result = GC_allochblk_nth(sz, kind, flags, start_list, FALSE);
    if (0! = result)returnresult; .// Search from larger memory
    for(; start_list < = split_limit; ++start_list) { result =GC_allochblk_nth(sz, kind, flags, start_list, may_split);
        if (0! = result)break;
    }
    return result;
}
Copy the code

GC_hblk_fl_from_blocks = GC_hblk_fl_from_blocks = start_list = GC_hblkfreelist[start_list] = GC_hblkfreelist Look it up in a larger linked list.

The logic of GC_allochblk_nth is the same as that of a small memory lookup, except that once a large memory block is allocated, it is not further used to build an ok_freeList linked list, but instead returns the address of the large memory block directly.

Overall process of memory allocation

In summary, the flow chart of memory allocation is as follows:

Header Information storage logic

As mentioned above, for each memory block from GC_hblkfreelist, there is a corresponding Header data that describes the information about the memory block, and this Header information is also key to the logical implementation of the GC algorithm. After allocating the memory block and the Header, the SET_HDR method stores the Header information. The corresponding GET_HDR method finds the Header information.

A two-dimensional structure

Header information is stored in a global two-level array. The data structure of the global information is as follows:

#define GC_top_index GC_arrays._top_index

#ifndef HASH_TL
    # define LOG_TOP_SZ (WORDSZ - LOG_BOTTOM_SZ - LOG_HBLKSIZE)
#else
    # define LOG_TOP_SZ 11
#endif

struct GC_arrays {. bottom_index *_top_index[TOP_SZ]; };typedef struct bi {
    hdr * index[BOTTOM_SZ];
    structbi * asc_link; /* All indices are linked in*/
    /* ascending order... * /
    structbi * desc_link;/* ... and in descending order. */
    wordkey; /* high order address bits. */
    # ifdef HASH_TL
        structbi * hash_link;/* Hash chain link. */
    # endif
} bottom_index;
Copy the code

The global GC_arrays object maintains the array _top_index, which contains a set of bottom_INDEX Pointers. Each bottom_INDEX maintains an array index containing each Header data.

Calculation rules

For a memory block address P, the hash algorithm is used to calculate the location for storing the header. The process is as follows:

  1. Given a 64-bit memory block address P, divide it into three segments, and use the information of each segment to determine its location. As shown in the figure:The memory address is 42 bits high and 22 to 63 bits high, respectively. It is used to calculate the index of the top_index array.

    After bottom_index is determined, the memory address is 12 to 22 digits, which is used to calculate the index array in Bottom_index.

    Memory addresses 0 to 12 are used to calculate the index of the memory address in the memory block.

    In addition, the algorithm set a macro HASH_TL, turn this macro on and off, the algorithm logic is slightly different, take HASH_TL macro for example, the logic is as follows:

    #ifndef HASH_TL
        # define LOG_TOP_SZ (WORDSZ - LOG_BOTTOM_SZ - LOG_HBLKSIZE)
    #else
        # define LOG_TOP_SZ 11
    #defineTOP_SZ (1 < < LOG_TOP_SZ)
    
    #defineHDR_FROM_BI(bi, p) ((bi)-> index[((word)(p) >> LOG_HBLKSIZE) & (BOTTOM_SZ - 1)])
    
    # define TL_HASH(hi) ((hi) & (TOP_SZ - 1))
    
    # define GET_BI(p, bottom_indx) \
    do{ \ REGISTER word hi = (word)(p) > > (LOG_BOTTOM_SZ + LOG_HBLKSIZE); \ REGISTER bottom_index * _bi = GC_top_index[TL_HASH(hi)]; \while(_bi -> key ! = hi & & _bi ! = GC_all_nils) \ _bi = _bi -> hash_link; \ (bottom_indx) = _bi; The \}while (0)
    
    # define GET_HDR_ADDR(p, ha) \
        do{ \ REGISTER bottom_index * bi; \ GET_BI(p, bi); \ (ha) = & HDR_FROM_BI(bi, p); The \}while (0)
    
    # define GET_HDR(p, hhdr) \
        do{ \ REGISTER hdr ** _ha; \ GET_HDR_ADDR(p, _ha); \ (hhdr) = *_ha; The \}while (0)
    
    # define SET_HDR(p, hhdr) \
        do{ \ REGISTER hdr ** _ha; \ GET_HDR_ADDR(p, _ha); \ *_ha = (hhdr); The \}while (0)
    Copy the code
    1. Where GET_BI (p, bi) is used to calculate the subscript in the topIndex array, that is, the corresponding bottom_indx object is calculated. Combining with the figure above, it can be seen that the value is shifted 22 bits to the right by (p) >> (LOG_BOTTOM_SZ + LOG_HBLKSIZE). To obtain the high 42 bits, use TL_HASH(hi) to take 11 bits of the hash value to locate GC_top_index[hash]. Since the middle 11 bits of different memory addresses may be the same, maintain a hash list and use GC_top_index[hash] as the entry point of the list. Bi ->key, which stores the 42 bits of information before TL_HASH. Once matched, locate the specific bottom_indx information.

    2. Call the HDR_FROM_BI(bi, p) method, use the memory address with 12 to 22 digits as the hash value, and locate bi->index[hash] as the address for storing header information.

    3. It also provides two methods,

      # define HBLKSIZE 4096
      # define HBLKPTR(objptr) ((struct hblk *)(((word)(objptr)) & ~(word)(HBLKSIZE-1)))
      # define HBLKDISPL(objptr) (((size_t) (objptr)) & (HBLKSIZE-1))
      Copy the code
      1. HBLKPTR is used to return the first address of the HBLK memory block corresponding to the current object memory address, because according to the above, the basic unit of memory allocation is HBLK, the size is 4096 bytes, that is, the allocated memory must be a multiple of 4096. Therefore, the bit and operation with “~(hblksize-1)” can obtain the HBLK address that the memory object belongs to. The address of object P is located in HBLK.

      2. HBLKDISPL is used to return the offset of the current object’s memory address in the HBLK memory block. Since the valid object memory address must be in an HBLK, the memory size of an HBLK is 0 to 406, 12 bits, so the bit and operation with “hBLksize-1” can get the offset.

Take 16-byte small memory objects as an example. If a 4KB HBLK is deleted, 256 such objects can be allocated. Then the memory address of object N is the first address of the HBLK +16 x N.

Auxiliary list

In addition, a one-dimensional bidirectional linked list is maintained to assist in the management of all bottom_index data. When allocating memory blocks and generating header information, a new Bottom_index node is inserted into an appropriate position in the linked list. The linked list is ordered, and the value of the bottom_index node increases according to the value of the bottom_index->key field, which stores the first 42 bits of the address. Therefore, the larger the memory address is, the further the position in the linked list is. As shown in the figure:

Insert a new Bottom_index node into the list as follows:

static GC_bool get_index(word addr) { word hi = (word)(addr) > > (LOG_BOTTOM_SZ + LOG_HBLKSIZE); bottom_index * r; bottom_index * p; bottom_index ** prev; bottom_index *pi; /* old_p */ word i; # ifdef HASH_TL i = TL_HASH(hi); pi = p = GC_top_index[i]; while(p ! = GC_all_nils) { if (p -> key == hi) return(TRUE); p = p -> hash_link; } # else if (GC_top_index[hi] ! = GC_all_nils) return TRUE; i = hi; R = (bottom_index *)GC_scratch_alloc(sizeof(bottom_index)); if (EXPECT(NULL == r, FALSE)) return FALSE; // initialize the empty data BZERO(r, sizeof(bottom_index)); r -> key = hi; # ifdef HASH_TL //r placed in the first position of GC_top_index[I] r - >; hash_link = pi; # endif prev = & GC_all_bottom_indices; pi = 0; P while ((p = *prev)! = 0 & & p -> key < hi) { pi = p; prev = & (p -> asc_link); } // insert r into the list r - >; desc_link = pi; if (0 == p) { GC_all_bottom_indices_end = r; } else { p -> desc_link = r; } r -> asc_link = p; *prev = r; GC_top_index[I] = r; GC_top_index[I] = r; return(TRUE); }Copy the code

The cache structure

A cache structure is maintained to store a memory block and its corresponding header information. The data structure is as follows:

typedef struct hce {
    word block_addr; // Memory block address
    hdr * hce_hdr; / / the header address
} hdr_cache_entry;
Copy the code

When searching for header information, you can also increase efficiency by searching the cache first. Encapsulation using the macro HC_GET_HDR.

#define HC_GET_HDR(p, hhdr, source) \
{ /* cannot use do-while(0) here */ \
    hdr_cache_entry * hce = HCE(p); \
    if (EXPECT(HCE_VALID_FOR(hce, p), TRUE)) { \ HC_HIT(); \ hhdr = hce -> hce_hdr; The \}else { \
        hhdr = HEADER_CACHE_MISS(p, hce, source); \
    if (NULL == hhdr) break; /* go to the enclosing loop end */\} \}Copy the code

If yes, return to GC_header_cache_miss. If no, use GC_header_cache_miss. The GC_header_cache_miss method looks for header information using the HDR(p) method.

conclusion

This article roughly analyzes the implementation ideas of BoehmGC algorithm in Unity on object memory allocation, and the next article will learn and analyze the implementation of GC. Many of these details need to be further studied and explored. In addition, I hope we can correct the misunderstanding in the article.