Make writing a habit together! This is the 8th day of my participation in the “Gold Digging Day New Plan · April More Text Challenge”. Click here for more details.

This paper briefly introduces a basic recycling algorithm: reference counting algorithm [Collins, 1960], English name reference counting.

The reference-counting method is very simple. The viability of objects can be determined directly by the creation and deletion of reference relationships, eliminating the need for a tracer to traverse the heap to find all live objects and then reversely identify garbage objects that are not traversed.

The reference counting algorithm is based on the idea of counting the number of pointer references to each allocated object. This is a direct approach and happens to be naturally incremental because it allocates memory management overhead throughout the program.

The algorithm relies on a very simple invariant: an object can be alive if and only if the number of references to it is greater than zero.

So how does the algorithm work?

How does the reference counting algorithm work?

Under the reference counting method, each allocated object contains a reference counting field.

The memory manager is responsible for maintaining invariants, that is, the reference count for each object is always equal to the number of direct pointer references to that object, and the reference count is increased or decreased when a reference to an object is created or deleted.

The basic version of the algorithm is given below:

  • New method: used to create an object, new() allocates a new object. For brevity, we ignore object types and assume that all objects are of the same type and size.
  • The delete method:Implement a reduction in the reference count, called when the client program no longer needs the objectdelete()
  • The update method: update()Is the only way to perform pointer assignment in the system. Implement an increment to the reference object count, which we increment before deleting, and this is handled correctlysource == targetIn the case.
def new() :
	obj = allocate_memory()
	obj.set_reference_count(1)
	return obj

def delete(obj) :
	obj.decrement_reference_count()
	if obj.get_reference_count() == 0:
		for child in children(obj):
			delete(child)
		release_memory(obj)

def update(source, target) :
	target.increment_reference_count()
	delete(source)
	source = target
Copy the code

Circular references cannot be resolved

Without a doubt, the biggest drawback of reference counting is its inability to reclaim circular storage. Under the simple reference counting method, cyclic data structures such as bidirectional linked lists or non-simple graphs cannot be effectively recycled and will leak memory. The following example shows the problem:

After Delete (A) and delete(C), we end up with an unreachable but connected component of an object subgraph that cannot be accessed from any root, but whose nodes we cannot reclaim due to non-zero references.

Fortunately, all the other garbage collection techniques (tag scanning, tag compression, replication, and so on) can handle the loop structure with ease. This is why it is not uncommon for systems that use reference counting as their primary garbage collection mechanism to utilize a trace collection algorithm after the heap is exhausted.

Advantages and disadvantages of reference counting algorithms

The memory management overhead of reference counting is spread over the course of the program and can be reclaimed as soon as an object becomes garbage.

Moreover, the algorithm operates directly on the source and target of Pointers, so it is no less local than the applications it serves, and is generally better than a trace GC that needs to track all live objects. The advantages of this algorithm are as follows:

  • Responsiveness: Memory management overhead is distributed throughout the program, which generally results in a more fluid and responsive system than a trace collector. Note that the processing overhead is related to the size of the subgraph to which the last pointer points, and may not be significant in some cases.
  • Immediate memory reuse: Unlike the trace collector, unreachable memory remains unallocated until the collector executes (usually when the heap is exhausted); The reference counting method allows immediate reuse of discarded memory. This immediate reuse leads to better time locality for caching, reducing page errors. It also simplifies resource cleaning because finalizers can be invoked immediately, freeing system resources more quickly. Immediate reuse of space also allows for optimizations, such as in-place updates of data structures.
  • Easy to implement: Reference count-based collection is the simplest garbage collection mechanism in terms of implementation details. Implementation is particularly easy if the language runtime does not allow pointer manipulation and/or if the programmer cannot determine/manipulate the object root.
  • Control vs. Correctness: Reference counting systems can give programmers complete control over object allocation and unallocation. It allows programmers to optimize reference counting overhead where they think it is safe. This does present correctness challenges and requires a higher level of coding discipline. Even without clever optimization, there is tight coupling between the interface and the reference-counting scheme of the client program. It requires the client to correctly invoke operations that increase/decrease the reference count.
  • Space overhead: The space overhead for each object that hosts the reference count field. In theory, for very small objects, this could amount to 50% overhead. This overhead needs to be balanced against the immediate reuse of memory units and the fact that reference counts do not depend on heap space during collection. Reference counting systems can reduce space overhead by using single bytes for reference counting instead of full words. Such a system increases the reference count by falling back on a trace scheme, such as a tag scan, to collect objects with maximum reference counts (and circular references).

Disadvantages are as follows:

  • Pointer update overhead: Unlike tracing schemes where pointer updates are free, reference counting can be expensive because two reference counts need to be updated for each pointer update to keep the program correct.
  • Atomized operations: In order to avoid premature object release caused by multithreaded contention, the operation of adding and subtracting reference counts, memory loading and pointer storage must be atomized, and the atomized operations need to solve many thread contention problems.
  • Loop structure: As we discussed earlier, the biggest disadvantage of reference counting is that it can’t reclaim circular storage. Under the simple reference counting method, cyclic data structures such as bidirectional linked lists or non-simple graphs cannot be effectively recycled and will leak memory.

In the worst case, the reference count of an object may be equal to the total number of objects in the heap, resulting in the reference count taking up the same space as a pointer field, which can be very expensive.

Finally, it is still possible for the reference-counting algorithm to pause. When deleting the last reference to a large structural root node, the algorithm recursively deletes every descendant node of the root node, and thread-safe reference count collection may result in a longer maximum pause time than a trace collector.

conclusion

Reference counting is the simplest garbage collection mechanism in terms of implementation details, and is therefore widely used in programming languages such as Lisp, Awk, Perl and Python, some applications such as Photoshop, the Real Network’s Rhapsody music service, Print, scan and document management system).

In addition to memory management, reference counting is widely used as a resource management mechanism in operating systems to manage system resources such as files and sockets.

There are many ways to improve the two problems of reference counting algorithm: the overhead problem of reference counting operation and the circular reference problem of circular structure. This will be covered in the next article, but thank you for reading it.

  • Reference Counting – A Quick Primer on Garbage Collection Algorithms (educative.io)
  • Handbook of Garbage Collection Algorithms: The Art of Automatic Memory Management: Chapter 5 reference counting