By: Feng Pan, Byte Games China Client Team

The introduction

In the previous Unity3D managed heap BoehmGC algorithm learning-Memory allocation, the relevant mechanism of GC memory allocation is discussed. This paper mainly analyzes the specific implementation mechanism of garbage collection in BoehmGC algorithm.

Description of the process

First, BoehmGC algorithm provides an entry API for garbage collection, GC_gcollect, which calls GC_try_to_collect_general. GC_try_to_collect_general then calls GC_try_to_collect_inner as follows:

void GC_CALL GC_gcollect(void){(void)GC_try_to_collect_general(0, FALSE); . } GC_boolGC_try_to_collect_general(GC_stop_func stop_func, GC_bool force_unmap GC_ATTR_UNUSED){... ENTER_GC();// Call GC_try_to_collect_inner to start GCresult = GC_try_to_collect_inner(stop_func ! =0? stop_func:GC_default_stop_func); EXIT_GC(); . }Copy the code

The GC_try_to_collect_inner method starts the actual GC operation. Here is the code and the main steps:

GC_INNER GC_bool GC_try_to_collect_inner(GC_stop_func stop_func){...// Control stop
    if (GC_dont_gc || (*stop_func)()) return FALSE;
    // Throws an event
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_START); .// Clear the flag bitGC_clear_marks(); .// Start tracing memory nodes, marking the "reference" flag bit
    if(! GC_stopped_mark(stop_func)) { ... }// The closing flag begins to reclaim memory nodes that are not referencedGC_finish_collection(); .// Throws an event
    if (GC_on_collection_event) GC_on_collection_event(GC_EVENT_END);
}
Copy the code

First, explain the concept of an “object memory node,” the block of object memory allocated to the upper layer described earlier. For example, an object node is allocated 16 bytes, and an HBLK memory block is 4096 bytes in size, which can be equally divided into 256 object nodes for upper layer use. For small memory (less than 2048 bytes), memory nodes are fetched from ok_freelist, aligned to 16-byte memory, and there are 128 size classifications. For large memory (greater than 2048 bytes), fetch it directly from hBLkFreelist. The size of the object memory node is recorded in the HB_sZ field of the Header data structure.

GC is divided into two phases:

  • The first stage: the Mark stage, in which the Mark bits of the memory nodes are reset, and then the memory space is scanned starting from the root of the memory, marking the referenced memory nodes on the managed heap as “references”.

  • The second stage: Reclaim (Reclaim) stage, this stage traverses all memory nodes on the managed heap, and then Reclaim the memory nodes that are not marked as “references”.

In addition, the GC provides an event mechanism for notifying the outside world of the GC phase and status. The state is defined as follows:

typedef enum {
    GC_EVENT_START, //开始GC
    GC_EVENT_MARK_START, // Start the tag
    GC_EVENT_MARK_END,  // End tag
    GC_EVENT_RECLAIM_START, // Start recycling
    GC_EVENT_RECLAIM_END, // The collection is complete
    GC_EVENT_END / / end of GC. }Copy the code

The external world can set the event callback method as follows:

void GC_CALL GC_set_on_collection_event(GC_on_collection_event_proc fn){... GC_on_collection_event = fn; . }Copy the code

During the GC process, the GC_on_collection_event method is called for notifications at various stages.

Managed heap memory layout

Before we get into the garbage collection process, let’s look at the layout and structure of managed heap memory.

As mentioned above, HBLK memory block is the basic unit for Boehm algorithm to allocate memory in managed heap. All object nodes are allocated on the basis of HBLK regardless of large or small memory. When an HBLK block is allocated, the corresponding header information is created. The header information is added to the global array _top_index:

struct GC_arrays {. bottom_index *_top_index[TOP_SZ]; };typedef struct bi {
    hdr * index[BOTTOM_SZ]; // The data structure that stores Header information
    structbi * asc_link; // in ascending order
    structbi * desc_link; // in descending order
    word key; //key, the first 42 bits high address
    # ifdef HASH_TL
        structbi * hash_link;
    # endif
} bottom_index;
Copy the code

The bottomIdex structure is an element of the array, which stores specific header information. The logical steps for storing header information are as follows:

  1. Based on the address of the HBLK memory block corresponding to the header, a hash value is obtained. The hash value is the index subscript of _top_index. If the calculated hash values are the same, then the 0 to 42 bit values are further compared. The header is considered to have found the corresponding bottomIndex block, and the corresponding bottomIndex block is returned directly.

  2. If not, create a new bottomIndex block and insert it into the second-level linked list. The reference relationship of all bottomIndex blocks is shown in the following figure. Blocks with the same hash value are maintained using a hash_link linked list.

  3. At the same time, a secondary ordered linked list is built to store all the bottomIndex blocks. The linked list is one-dimensional, and the hash value is calculated according to certain rules, and the hash value is constructed in order. This list converts the two-dimensional structure of the global array _top_index into a one-dimensional ordered list for subsequent traversal. The diagram below:

  4. After determining the bottomIndex block, calculate the hash value based on the 42 to 64 bits of the HBLK address as the subscript of the index array in bottomIndex, and store the header information in the corresponding storage location.

To sum up, traversing the linked list can find the address and header information of all HBLK memory blocks, further traversing all memory nodes in HBLK memory blocks, traversing all HBLK memory blocks in managed heap logic is as follows:

void GC_apply_to_all_blocks(void (*fn)(struct hblk *h, word client_data),
              word client_data)
{
  signed_word j;
  bottom_index * index_p;
  for(index_p = GC_all_bottom_indices; index_p ! =0;
     index_p = index_p -> asc_link) {
    for (j = BOTTOM_SZ-1; j > =0;) {
      if(! IS_FORWARDING_ADDR_OR_NIL(index_p->index[j])) {
        if(! HBLK_IS_FREE(index_p->index[j])) {
          (*fn)((struct hblk *)(((index_p-> key < < LOG_BOTTOM_SZ) + (word)j)
                < < LOG_HBLKSIZE))
        }
        j--;
       } else if (index_p->index[j] == 0) {
        j--;
       } else {
        j -= (signed_word)(index_p->index[j]); }}}}Copy the code

GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices > GC_all_bottom_indices The HBLK address corresponding to the header is returned to the upper layer for further processing. Once you understand the core structure and logic, you can analyze the subsequent operations.

Mark

The purpose of the Mark phase is to Mark objects on managed heap memory as “references” or “non-references” for subsequent reclamation phases.

Flag bit reset

First, header information records whether all object memory nodes in the corresponding memory block are referenced. The structure is as follows:

// Header information
struct hblkhdr {.size_t hb_n_marks;// Mark the number of bits for GC
    word hb_marks[MARK_BITS_SZ]; // marked as, for GC
}
Copy the code

Where HB_n_marks represents the number of “referenced” objects in the HBLK memory block, the HB_marks array maintains reference marker bits for all objects, with 0 and 1 for “referenced” and “unreferenced,” and MARK_BITS_SZ is a large enough number. For example, if the 16th object node is referenced, hb_marks[15] has the value 1, and if the 17th object node is not referenced, hb_marks[16] has the value 0.

The GC_clear_marks method is first called to reset all the tag bits. GC_apply_to_all_blocks iterates over the addresses of all HBLK memory blocks in the managed heap and finds the corresponding headers as described above. The GC_clear_hdr_marks method is responsible for zeroing out the bit and number of tokens in the header for memory nodes.

GC_INNER void GC_clear_marks(void)
{
    GC_apply_to_all_blocks(clear_marks_for_block, (word)0); . }static void clear_marks_for_block(struct hblk *h, word dummy GC_ATTR_UNUSED){ hdr * hhdr = HDR(h); . GC_clear_hdr_marks(hhdr); } GC_INNERvoid GC_clear_hdr_marks(hdr *hhdr){ BZERO(hhdr -> hb_marks, sizeof(hhdr-> hb_marks)); . hhdr -> hb_n_marks =0;
}
Copy the code

Mark main process

After resetting the tag information, the process of scanning the tag is officially started. The GC_stopped_mark method implements the main logic:

  1. Get ready, pause the thread,

  2. The scan starts from the root node, including stack memory, register variables, static data area memory traversal, determine whether the memory space pointed to by the root node memory address is located in a valid HBLK memory block. Mark up these valid memory Spaces and store them in a temporary array.

  3. The memory space of all objects in the temporary array is further scanned to determine whether the memory space it points to is in a valid HBLK memory block and marked. Until the entire iteration is complete.

  4. End the marking process and resume the thread.

The main logic of the GC_stopped_mark method is as follows:

STATIC GC_bool GC_stopped_mark(GC_stop_func stop_func)
{
    // Event notification to suspend the thread
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_POST_STOP_WORLD); .// Suspend the thread
    STOP_WORLD();
    // Event notification to suspend the thread
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_POST_STOP_WORLD); .// Event notification, start scanning
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_MARK_START); .// Determine the gc initialization statusGC_initiate_gc(); .// Start from the root
    if (GC_mark_some(GC_approx_sp())) break; .// Event notification, end scan
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_MARK_END); .// event notification to start thread recovery
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_PRE_START_WORLD); .// Restore the threadSTART_WORLD(); .// event notification to resume thread completion
    if(GC_on_collection_event) GC_on_collection_event(GC_EVENT_POST_START_WORLD); . }Copy the code

The implementation logic of the sub-phase is described in detail below:

Thread to suspend

To accurately capture and track the transient state of managed heap memory, the current thread needs to be suspended first.

GCThread

First, GC encapsulates a GC_THREAD structure for each thread, which stores thread-related information, including the current thread ID, status, stack (bottom pointer, top pointer) information, structure is as follows:

 typedef struct GC_Thread_Rep {.struct GC_Thread_Rep * next;
     pthread_t id; / / thread id
     struct thread_stop_info stop_info; // Thread pause information.ptr_tstack_end; word stack_size; . } * GC_thread;struct thread_stop_info {.ptr_tstack_ptr; . }Copy the code

The bottom of the stack pointer is retrieved when each thread is created, and the current top of the stack pointer is saved into the GC_THREAD structure when the thread processes the SUSPEND signal. Stop_info records information about the pause time of a thread. The global array GC_threads is used to maintain all thread information.

suspendAll

Thread pause macro definition is STOP_WORLD, method implementation is GC_stop_world, internal call GC_suspend_all to suspend all threads (except the current thread), the main logic is as follows

GC_INNER void GC_stop_world(void){... (void)GC_suspend_all(); . } STATIC intGC_suspend_all(void)
{
    int n_live_threads = 0;
    int i;
    pthread_t self = pthread_self();
    // Iterate over all thread information
    for (i = 0; i < THREAD_TABLE_SZ; i++) {
        for(p = GC_threads[i]; p ! =0; p = p -> next) {
            // Not the current thread
            if(! THREAD_EQUAL(p -> id, self)) {// Filter terminated and blocked threads
                 if((p -> flags & FINISHED) ! =0) continue;
                 if (p -> thread_blocked) /* Will wait */ continue; . n_live_threads++; .// Send the suspend signal to suspend the system threadresult = RAISE_SIGNAL(p, GC_sig_suspend); . }}}}Copy the code

Because the current thread is needed for subsequent logic, all threads except the current thread are suspended. It also filters terminated and blocked threads. It then initiates a thread interrupt signal.

suspend_handler

When an interrupt signal is received, the GC_suspend_handler callback is triggered, which in turn triggers the GC_suspend_handler_inner method:

STATIC void GC_suspend_handler_inner(ptr_t dummy GC_ATTR_UNUSED,
                   void * context GC_ATTR_UNUSED)
{
    pthread_t self = pthread_self(a); GC_thread me; .// Find the current thread
    me = GC_lookup_thread_async(self); .// Store the top pointer
    GC_store_stack_ptr(me); . }Copy the code

After pausing the thread, scan from the root node.

Memory scanning

Execute the GC_mark_some method to begin scanning from the root node as follows

GC_INNER GC_bool GC_mark_some(ptr_t cold_gc_frame)
{
    switch(GC_mark_state) {
    case MS_NONE:
        break;
    case MS_PUSH_UNCOLLECTABLE: {
        ...
        //
        GC_push_roots(FALSE, cold_gc_frame);
        GC_objects_are_marked = TRUE;
        if (GC_mark_state != MS_INVALID) {
            GC_mark_state = MS_ROOTS_PUSHED;
        }
    }
    case MS_ROOTS_PUSHED: {
        ...
        MARK_FROM_MARK_STACK();
        break; }...case MS_INVALID: {
        ...
    }
    break;
}
Copy the code

GC_mark_state indicates the current scan status, which is divided into four states. The algorithm processes the scan status according to different states.MS_INVALID: indicates that the memory space is unmarked. In this case, the state changes to MS_PUSH_UNCOLLECTABLE. As shown in the figure:

The green squares in the figure represent the root node memory that has not been scanned, such as the stack area memory space. It then scans from the lower address to the higher address, offset by 1 byte.

MS_PUSH_UNCOLLECTABLE: Starts traversing the stub node and marks the managed heap memory node pointed to by the root node as “reference” because it is referenced by the root node and stores the node into the array. After completing the marking, switch to MS_ROOTS_PUSHED state at this time, as shown in the picture:

Starting from the starting address of the root node memory, offset bytes gradually to determine whether the address points to a managed heap memory address. For example, the address is PTR, such as * PTR is a valid address of managed heap memory whose node is “referenced” by the root node. In the figure, green indicates the unscanned memory space, and blue indicates the scanned memory space. After all scans are complete, all referenced heap memory nodes are marked blue (ABCDE) and all unreferenced heap memory nodes are marked white (nodes to be reclaimed).

MS_ROOTS_PUSHED: iterate over the previously “referenced” managed heap memory node and determine if the address pointed to is also managed heap memory node based on offset address. For example, node A’s address is PTR and determine if * PTR is A valid managed heap memory node address. Then PTR ++, scan A and push BCDE… All scanning judgments are completed.

When the scan is complete, all “referenced” memory nodes on managed heap memory have been marked, and all unmarked nodes are eventually reclaimed by GC. When done, switch to MS_NONE.

MS_NONE: indicates that all scans are completed and marked.

The above four states are classified into two basic logics, root node scan and managed heap memory recursive scan, to explore the specific implementation logic:

Root node scanning

The root node sweeps the GC_push_roots method corresponding to MS_PUSH_UNCOLLECTABLE as follows:

GC_INNER void GC_push_roots(GC_bool all, ptr_t cold_gc_frame GC_ATTR_UNUSED){...// Scan register, thread stack address space
    GC_push_regs_and_stack(cold_gc_frame);
    // Scan other root space
    if(GC_push_other_roots ! =0) (*GC_push_other_roots)();
}

STATIC void GC_push_regs_and_stack(ptr_t cold_gc_frame)
{
    GC_with_callee_saves_pushed(GC_push_current_stack, cold_gc_frame);
}
Copy the code

The GC_push_all_stack_sections and GC_push_all_register_sections methods scan stacks and registers, respectively, and will not be expanded here. The GC_push_all_eager method is eventually called to scan.

GC_API void GC_CALL GC_push_all_eager(void *bottom, void *top){... lim = t -1

    for (p = b; (word)p <= (word)lim;
        p = (word *)(((ptr_t)p) + ALIGNMENT)) {
    REGISTER word q = *p;
    GC_PUSH_ONE_STACK(q, p);
   }
}
Copy the code

Lim indicates the maximum value of the space. For example, in ascending ALIGNMENT from the bottom of the stack to the top of the stack, p is the current address and Q is * P, that is, the address pointed to by P. Then call GC_PUSH_ONE_STACK to check.

# define GC_PUSH_ONE_STACK(p, source) \
do{\if ((word)(p) > = (word)GC_least_plausible_heap_addr \ & & (word)(p) < (word)GC_greatest_plausible_heap_addr) { \ PUSH_ONE_CHECKED_STACK(p, source); \} \}while (0)
Copy the code

GC_least_plausible_heap_addr and GC_greatest_plausible_heap_addr record the lowest and highest addresses of managed heap memory space, which will be updated when the system allocates heap memory above. This is a rough range. If it is within this range, call GC_mark_and_push_stack to determine further.

GC_INNER void GC_mark_and_push_stack(ptr_t p, ptr_t source) {
    hdr * hhdr;
    ptr_t r = p;

    PREFETCH(p);
    GET_HDR(p, hhdr);

    // Get the header information
    if (NULL == hhdr|| (r = (ptr_t)GC_base(p)) == NULL|| (hhdr = HDR(r)) == NULL)) {
        return;
    }
    // Determine the idle state
    if (HBLK_IS_FREE(hhdr) {
        return; }...// mark and add to temporary array
    PUSH_CONTENTS_HDR(r, GC_mark_stack_top, GC_mark_stack_limit,
           source, hhdr, FALSE);
}
Copy the code

Source corresponds to the previous memory address of the root node, and variable R corresponds to the address value pointed to by the root node. The GET_HDR method is used to get the Header information corresponding to R. If it can be obtained, it means that R is in an effective HBLK memory block, and further determine whether the memory block is free. Note R is allocated to the upper layer as a valid object memory node, and r is a managed heap memory node that can be “referenced” by the root node.

Judgment logic

The logic for determining whether a memory address is a valid, referenced managed heap memory node can be summarized:

  1. Whether the value ranges from GC_least_plausible_heap_addr to GC_greatest_plausible_heap_addr.

  2. Check whether the header information can be obtained because each HBLK memory block has a header information. If the header information can be obtained, the address is in the HBLK memory block.

  3. Check whether the HBLK memory block is free. If the HBLK memory block is not free, the memory block is allocated for upper-layer use.

Operation flag bit

After determining that the node is “referable,” the PUSH_CONTENTS_HDR method is called to process the node.

# define PUSH_CONTENTS_HDR(current, mark_stack_top, mark_stack_limit, source, hhdr, do_offset_check) { \
    // Calculate the offset of the HBLK memory block address where the memory node resides
    size_t displ = HBLKDISPL(current); \
    size_t gran_displ = BYTES_TO_GRANULES(displ); \...// Set the corresponding flag bit in the header to 1, indicating that the node is "referenced".
    SET_MARK_BIT_EXIT_IF_SET(hhdr, gran_displ); \...// Increase the number of references hb_n_marks
    INCR_MARKS(hhdr); \...// Store the memory node into a temporary array
    PUSH_OBJ(base, hhdr, mark_stack_top, mark_stack_limit); \... }Copy the code
  1. Compute the offset of the memory node address in the HBLK memory block because, as described above, this offset is the subscript of hb_marks[], the token bit array in the Header information, which stores the token state information for each memory node. For example, if the address of the HBLK memory block is 0x122C86900, HB_sz =16 (node size 16 bytes), and the address of memory node A is 0x122C86920, then the offset is 32 bytes (2 grans), and Header-> HB_marks [1] stores the value 0 or 1 of the “reference” status flag bit of A. Call the INCR_MARKS method hb_n_marks plus 1 to indicate the total number of “referenced” nodes in this HBLK memory block. As shown in the figure. Blue indicates the two memory nodes that are “referenced”, the corresponding HB_marks [1] and HB_marks [2] are marked as 1, and hb_n_marks=2.

  2. Call PUSH_OBJ to add the “referenced” memory node address to a temporary array as a starting point for the subsequent recursive scan. Mark_stack_top is an array that holds the obJ addresses of the managed heap memory nodes that were “referenced” just scanned.

    #define PUSH_OBJ(obj, hhdr, mark_stack_top, mark_stack_limit) \
     do{\... mark_stack_top++; \... mark_stack_top = GC_signal_mark_stack_overflow(mark_stack_top); \ mark_stack_top -> mse_start = (obj); \
         mark_stack_top -> mse_descr.w = _descr; \} \}while (0)
    Copy the code

Heap memory is scanned recursively

Next, based on traversing mark_STACK_TOP, scan the value of the address to which its memory space points. First call the GC_mark_from method with the MARK_FROM_MARK_STACK macro as follows:

GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack, mse *mark_stack_limit) 
{
    current_p = mark_stack_top -> mse_start;
    descr = mark_stack_top ->mse_descr.w; . while ((word)current_p<= (word)limit) { current = *(word *)current_p; . if (current>= (word)least_ha && current <(word)greatest_ha) { PREFETCH((ptr_t)current); . PUSH_CONTENTS((ptr_t)current, mark_stack_top, mark_stack_limit, current_p); } current_p += ALIGNMENT; }... }Copy the code

Current_p = current; current_p = current; current_p = current; current_p = current; The judgment logic of current is:

  1. Whether the value ranges from GC_least_plausible_heap_addr to GC_greatest_plausible_heap_addr.

  2. Call the PUSH_CONTENTS method to further determine the logic as follows:

    #define PUSH_CONTENTS(current, mark_stack_top, mark_stack_limit, source) \
     do { \
         hdr * my_hhdr; \
         HC_GET_HDR(current, my_hhdr, source); /* contains "break" */ \
         PUSH_CONTENTS_HDR(current, mark_stack_top, mark_stack_limit, \
            source, my_hhdr, TRUE); The \}while (0)
    Copy the code
    • Call the HC_GET_HDR method to obtain the header information corresponding to the HBLK memory block where current resides. Unlike GET_HDR, the HC_GET_HDR method uses the cache to retrieve the header information. If it does, it returns the header directly. If it does not, it calls HEADER_CACHE_MISS to retrieve the header information and store it in the cache. Frequently obtaining header information can shorten the calculation time and improve performance. If no header information is obtained, exit the judgment.

      #define HC_GET_HDR(p, HHDR, source) \ {hdr_cache_entry * hce = hce (p); \ if (EXPECT(HCE_VALID_FOR(hce, p), TRUE)) { \ HC_HIT(); \ hhdr = hce -> hce_hdr; HHDR = HEADER_CACHE_MISS(p, hce, source); If (NULL == HHDR) break; \} \}Copy the code
    • If we get the header, we think current is also a managed heap memory node and call PUSH_CONTENTS_HDR.

    • The PUSH_CONTENTS_HDR method sets the flag bit for this object, header-> hb_marks[index], and increments hb_n_marks, as described above. Finally, it is added to the temporary array mark_STACK_TOP as an object for the next recursive scan.

After the scan is complete, the memory addresses of the nodes collected in the current mark_STACK_TOP array are scanned in the same way to determine which address space they point to.

When all scans are complete, “referenced” memory nodes on the managed heap have all been marked as 1 in HB_marks, and unreferenced nodes are set to 0 when the previous marker bit is reset. The GC then determines the garbage collection based on these markers.

The thread to restore

When the scan is complete, call START_WORLD to restore all threads, which is implemented in the GC_start_world method as follows:

GC_INNER void GC_start_world(void){... n_live_threads = GC_restart_all(); } STATIC intGC_restart_all(void){... pthread_t self = pthread_self();for (i = 0; i < THREAD_TABLE_SZ; i++) {
    for(p = GC_threads[i]; p ! = NULL; p = p -> next) {if(! THREAD_EQUAL(p -> id, self)) {if((p -> flags & FINISHED) ! =0) continue;
            if (p -> thread_blocked) continue; . result = RAISE_SIGNAL(p, GC_sig_thr_restart); . }}}Copy the code

Corresponding to the GC_suspend_all method, threads other than the current thread (as well as terminated and blocked) resume, calling RAISE_SIGNAL to send a resume signal.

Reclaim phase

After scanning and marking is complete, collection of un” referenced “managed heap memory nodes begins. The logic is fired in the GC_finish_collection method as follows:

STATIC void GC_finish_collection(void) { ... // Check memory leak logicif (GC_find_leak) {
        for (kind = 0; kind < GC_n_kinds; kind++) {
            for (size = 1; size< = MAXOBJGRANULES;size++) {
                q = (ptr_t)GC_obj_kinds[kind].ok_freelist[size];
                if (q ! = NULL)GC_set_fl_marks(q); } // Do not reclaim GC_start_reclaim(TRUE); }... // clear ok_freelist for (kind = 0; kind < GC_n_kinds; kind++) {
        for (size = 1; size< = MAXOBJGRANULES;size++) {
            q = (ptr_t)GC_obj_kinds[kind].ok_freelist[size];
            if (q ! = NULL)GC_clear_fl_marks(q); } // start reclaim GC_start_reclaim(FALSE); . }Copy the code

This method first determines whether the operation needs to detect memory leakage. If so, the detection logic of leakage will be triggered. The detected leaking memory will not actually trigger the reclamation logic, which will be analyzed later. Set the marker bit of the memory node in the OK_FREEList linked list to 0 with GC_clear_fl_marks. Then call the GC_start_reclaim method to start the reclaim.

Recovery logic

The main logic of the GC_start_reclaim method is as follows:

GC_INNER void GC_start_reclaim(GC_bool report_if_found)
{
    for (kind = 0; kind < GC_n_kinds; kind++) { struct hblk ** rlist = GC_obj_kinds[kind].ok_reclaim_list; .// Clear ok_freelist and ok_reclaim_list
        void**lim = & (GC_obj_kinds[kind].ok_freelist[MAXOBJGRANULES+1]);
        for (fop = GC_obj_kinds[kind].ok_freelist; (word)fop < (word)lim; (*(word **)&fop)++) {
            ...
            GC_clear_fl_links(fop);
        }
        ...
        BZERO(rlist, (MAXOBJGRANULES + 1) * sizeof(void*)); }...// join ok_reclaim_listGC_apply_to_all_blocks(GC_reclaim_block, (word)report_if_found); .// Reclaim memory
    GC_reclaim_all((GC_stop_func)0, FALSE); . }Copy the code
  1. Before recycling, clear memory nodes in the OK_FREEList linked list, because small memory nodes (less than 2048 bytes) that are not “referenced” will be added to the OK_FREEList linked list during recycling to meet subsequent memory allocation, so clear it first.

  2. Then clear the OK_reclaim_list list as well. Ok_reclaim_list is a Header field that is specifically set for small memory node reclamation. The reclaimed object is located in ok_reclaim_list, so you need to clear the linked list first.

  3. The GC_apply_to_all_blocks method traverses all HBLK blocks of managed heap memory. For each memory block, the GC_reclaim_block method is used to preprocess the memory block to be reclaimed. For large memory, it is directly reclaimed, and for small memory, it is temporarily added to the OK_REclaim_list.

  4. The GC_reclaim_all method actually recycles small memory objects.

2 ~ 4 are analyzed respectively in the following sections.

GC_reclaim_block

The GC_reclaim_block method looks like this:

STATIC void GC_reclaim_block(struct hblk *hbp, word report_if_found)
{
    hdr * hhdr = HDR(hbp);
    word sz = hhdr -> hb_sz; /* size of objects in current block */struct obj_kind * ok = & GC_obj_kinds[hhdr -> hb_obj_kind];if( sz > MAXOBJBYTES ) { /* 1 big object */
    if( !mark_bit_from_hdr(hhdr, 0)) {...// Release HBP directly to GC_hblkfreelist
        GC_freehblk(hbp);
    }
    else{... GC_bool empty = GC_block_empty(hhdr);if (empty) {
            GC_freehblk(hbp);
        } else if(! GC_block_nearly_full(hhdr)) { struct hblk **rlh = ok -> ok_reclaim_list;if(rlh ! = NULL) { rlh += BYTES_TO_GRANULES(sz); hhdr -> hb_next = *rlh; *rlh = hbp; }}}... }Copy the code

Large memory and small memory blocks are processed differently:

Large memory

Large memory is special because a large memory node is generally composed of 1 to N HBLK memory blocks, unlike small memory, which is divided into several small memory nodes by an HBLK. Therefore, the HB_marks array of Header information stores only one marker bit, hb_marks[0].

Mark_bit_from_hdr (header, bit_no) is used to obtain the value of header->hb_marks[bit_no]. If 0, the large memory node is not “referenced”. Call the GC_freehblk method to add it directly to the GC_hblkfreelist list, while clearing out the data in memory. For the next large memory allocation.

Small memory

For small memory nodes, because several nodes may not “reference” an HBLK memory block, a case-by-case processing strategy is adopted, which includes the following three cases:

  1. If the entire HBLK is empty, that is, header-> HB_n_marks =0, it indicates that there is no “referenced” memory node in the HBLK memory block. The entire memory block is directly released to GC_hblkfreelist to clear the data.

  2. Call the GC_block_nearly_full method to determine whether most of the nodes in the HBLK memory block are “referenced”. If not, add the HBLK memory block to the list to be reclaimed ok_REclaim_list. Ok_reclaim_list is the same as ok_freelist. 128 linked lists are provided, as shown in the figure. HBLK memory blocks are added to the corresponding OK_REclaiM_list according to the size of the internal evenly divided memory nodes. For example, if HBLK contains a 16-byte memory node, add ok_reclaiM_list [0].

  3. If GC_block_nearly_full, most memory nodes in the HBLK memory block are “referenced” and the block is not added to the OK_reclaim_list.

GC_reclaim_all

The GC_reclaim_all method processes all HBLK memory blocks that have just been added to ok_REclaim_list for small memory objects, and reclaims any memory nodes that are not “referenced” to ok_FREEList. The code logic is as follows:

GC_INNER GC_bool GC_reclaim_all(GC_stop_func stop_func, GC_bool ignore_old){...for (kind = 0; kind < GC_n_kinds; kind++) { ok = & (GC_obj_kinds[kind]); rlp = ok -> ok_reclaim_list;if (rlp == 0) continue;
    for (sz = 1; sz < = MAXOBJGRANULES; sz++) { rlh = rlp + sz;while((hbp = *rlh) ! =0) {... hhdr = HDR(hbp); *rlh = hhdr -> hb_next; GC_reclaim_small_nonempty_block(hbp, FALSE); }}}... }Copy the code

Go through all the ok_reclaim_list, retrieve the HBLK memory block, and the corresponding header information, call GC_reclaim_small_nonempty_block method to process, there are two parameters, the first parameter is passed HBLK memory block address, The second parameter report_if_found indicates whether the device is only used for discovery and not reclamation. False indicates that reclamation is required. The method is as follows:

STATIC void GC_reclaim_small_nonempty_block(struct hblk *hbp, GC_bool report_if_found)
{
  hdr *hhdr = HDR(hbp);
  word sz = hhdr -> hb_sz;
  struct obj_kind * ok =& GC_obj_kinds[hhdr -> hb_obj_kind];/ / ok_freelist linked list
  void**flh = & (ok -> ok_freelist[BYTES_TO_GRANULES(sz)]); .// It is not collected
  if (report_if_found) {
    GC_reclaim_check(hbp, hhdr, sz);
  } else {
    / / recycling
    *flh = GC_reclaim_generic(hbp, hhdr, sz, ok -> ok_init,
                 (ptr_t)(*flh), &GC_bytes_found);
  }
}
Copy the code

Enter the GC_reclaim_generic method and call the GC_reclaim_clear method to reclaim the memory. The main logic is as follows:

STATIC ptr_t GC_reclaim_clear(struct hblk *hbp, hdr *hhdr, word sz,
               ptr_t list, signed_word *count)
{
    word bit_no = 0;
    word *p, *q, *plim;
    p = (word*)(hbp-> hb_body); plim = (word *)(hbp-> hb_body + HBLKSIZE - sz);

    while ((word)p < = (word)plim) {
      if (mark_bit_from_hdr(hhdr, bit_no)) {
        p = (word *)((ptr_t)p + sz);
      } else {
          obj_link(p) = list;
          list = ((ptr_t)p);
          q = (word *)((ptr_t)p + sz);
          p++; /* Skip link field */
          while ((word)p < (word)q) {*p++ = 0; } } bit_no += MARK_BIT_OFFSET(sz); }... }Copy the code

Each memory node in HBLK memory is iterated, and if mark_bit_FROM_hDR is 1, it is marked “referenced” and does not need to be collected, otherwise it is added to the list, ok_FREEList. As shown in the figure:

This completes the collection of all memory nodes.

Memory leak detection

In addition to the reclaim logic, the algorithm can also detect memory leaks, which are managed heap memory nodes that are not “referenced.” In the GC_finish_collection method, there is also a logic to detect memory leaks. Finally, the GC_reclaim_small_nonempty_block method, report_if_found is called true, This logic is used only to mark unreferenced memory nodes, not actually reclaim them, and add them to the GC_leaked global array.

conclusion

This article roughly analyzes the implementation ideas of BoehmGC algorithm in Unity on object garbage collection. Some concepts have been described in the previous article, and many details remain to be further studied and excavated. In addition, I hope we can correct the misunderstanding in the article.