Author: Shi Jicheng


Memory management for lock-free data structures

Lock-free data structures generally offer better throughput and scalability under concurrent access than their lock-based counterparts. Their performance advantage comes from two aspects:

  1. Locking schemes tend to be coarse-grained: in many cases where accesses could safely proceed in parallel, a reader or writer is blocked by the lock and concurrency is lost.
  2. Accessing a lock-free data structure requires no context switches, whereas a contended lock-based structure triggers operating-system context switches under high concurrency.

However, lock-free data structures introduce a new problem: memory management. Suppose thread A is reading a block of data while thread B wants to free it. In a lock-based data structure these two operations are serialized; in a lock-free data structure, without lock protection, they may happen at the same time. For thread A to access the data safely, thread B must delay the free until thread A has finished reading. To achieve this deferred reclamation, the following methods are commonly used:

  1. Garbage collection provided by the language itself, e.g. languages with a virtual-machine runtime such as Java.
  2. Reference counting.
  3. Epoch-Based Reclamation, referred to as EBR in the rest of this article.

Relying on the language's GC ties the solution to a particular language, and a global GC often costs performance and makes execution latency unpredictable. The cost of reference counting itself cannot be ignored either, especially in read-heavy scenarios: to keep the data safe, the count must be incremented before every read and decremented afterwards, and under high concurrency the resulting contention hurts efficiency. EBR avoids both problems: it requires no language-level support and its implementation is comparatively efficient. Below is a brief introduction to EBR; for a more detailed treatment, see the paper Practical Lock-Freedom.
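
As a point of comparison, here is a minimal Rust sketch of the reference-counting approach, using Arc as a stand-in for a per-block counter; the data and the number of reader threads are illustrative assumptions. Every protected read pays an atomic increment up front and an atomic decrement afterwards, and all readers contend on the same counter.

use std::sync::Arc;
use std::thread;

fn main() {
    // Shared, read-mostly data protected only by a reference count.
    let shared = Arc::new(vec![1u64, 2, 3]);

    let readers: Vec<_> = (0..8)
        .map(|_| {
            // Every protected read starts with an atomic increment...
            let local = Arc::clone(&shared);
            thread::spawn(move || {
                let sum: u64 = local.iter().sum();
                sum
                // ...and ends with an atomic decrement when `local` is dropped.
            })
        })
        .collect();

    for r in readers {
        r.join().unwrap();
    }
}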

Epoch-Based Reclamation

The central notion in EBR is the epoch, a monotonically increasing number that represents the current generation. There is one Global Epoch, which represents the generation the whole data structure is currently in. In addition, each thread's access to the data structure carries its own epoch, the Local Epoch, which records the generation the thread is in. With these concepts in mind, the following example shows how EBR works.
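
To make these terms concrete, here is a rough sketch of the state EBR maintains; the names and layout are assumptions for illustration, not the internals of any particular implementation.

use std::sync::atomic::{AtomicBool, AtomicUsize};

/// Global state shared by all threads.
struct Global {
    /// The Global Epoch: the current generation, monotonically increasing.
    epoch: AtomicUsize,
    // A real implementation also keeps per-epoch lists of deferred frees here.
}

/// One record per participating thread.
struct Local {
    /// Whether the thread is currently accessing the data structure.
    active: AtomicBool,
    /// The Local Epoch: the generation the thread observed when it entered.
    epoch: AtomicUsize,
}

fn main() {
    let _global = Global { epoch: AtomicUsize::new(0) };
    let _thread_a = Local {
        active: AtomicBool::new(false),
        epoch: AtomicUsize::new(0),
    };
}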

In the example described below, threads A and B concurrently access memory blocks in a lock-free data structure, with time flowing from top to bottom. The Global Epoch is 0 before time node 1.

  • Time node 1: Thread A finds that no other thread is currently accessing the data structure, so it increments the Global Epoch from 0 to 1 and sets its own Local Epoch to 1.
  • Time node 2: Thread B wants to delete data block M. B finds that only thread A is accessing the data structure and that A's Local Epoch equals the Global Epoch (both are 1), so B increments the Global Epoch to 2 and sets its own Local Epoch to 2 as well. Because deletion in EBR is deferred, block M is not freed immediately but placed into a garbage collector and tagged with Epoch 1, meaning it may still be observed by threads running in Epoch 1. From Epoch 2 onwards, block M is no longer present in the data structure (it was unlinked by thread B).
  • Time node 3: Thread B accesses data block N. It sees that the Global Epoch is 2 while thread A's Local Epoch is still 1, so the Global Epoch cannot advance; thread B simply sets its Local Epoch to 2, in line with the Global Epoch.
  • Time nodes 4 and 5: Threads A and B each finish their access and are no longer tracked by the data structure.
  • Time node 6: Thread A starts to access data block N. The current Global Epoch is 2 and no other thread is accessing the data structure, so thread A increments the Global Epoch to 3 and sets its Local Epoch to 3. At the same time, thread A finds data block M tagged with Epoch 1 in the collector. That tag is two generations older than the current Global Epoch, so no thread can still observe the block, and M is released.
  • Time node 7: Thread A finishes its access and is no longer tracked by the data structure.

From this example it follows that data being accessed can live in at most two epochs: the current one (the Global Epoch) and the previous one (Global Epoch - 1). Anything tagged with an earlier epoch can be reclaimed, i.e. every block in the collector whose tag is less than (Global Epoch - 1).
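
The two rules just described can be written down as two small helper functions. This is an illustrative sketch with assumed names, not code from any real implementation.

/// The global epoch may advance only when every active thread has already
/// caught up to it, i.e. no thread is still running in an earlier epoch.
fn can_advance(global_epoch: usize, active_local_epochs: &[usize]) -> bool {
    active_local_epochs.iter().all(|&e| e == global_epoch)
}

/// A block retired in `retired_epoch` may be freed once the global epoch is
/// at least two generations ahead: no active thread can still observe it.
fn can_free(retired_epoch: usize, global_epoch: usize) -> bool {
    global_epoch >= retired_epoch + 2
}

fn main() {
    // Time node 2: only thread A is active and its Local Epoch equals the
    // Global Epoch (1), so thread B may advance the Global Epoch to 2.
    assert!(can_advance(1, &[1]));
    // Time node 6: block M was retired at Epoch 1 and the Global Epoch has
    // reached 3, so M can finally be released...
    assert!(can_free(1, 3));
    // ...while anything retired in the previous epoch must still be kept.
    assert!(!can_free(2, 3));
}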

Looking back at the algorithm, the root of EBR's performance advantage is that it manages reclamation at a coarse granularity. With reference counting, higher concurrency means more frequent updates to the counter, more contention, and worse performance. With EBR, high concurrency tends to keep almost all threads in the same epoch, so the Global Epoch rarely needs to be modified, which avoids that contention and performs better. EBR has its own weakness, though: if for some reason an access never ends, the Global Epoch can never advance, garbage collection is never triggered, and a memory leak is unavoidable.

All in all, despite this drawback, EBR's performance makes it the method of choice for high-performance lock-free data structure implementations.

Implementing EBR in Rust

From the analysis above, it is clear that EBR needs to know when each access to the data structure starts and ends, and to drive the epoch forward based on that information. Other languages have to build their own encapsulations for this, but Rust's lifetime concept provides help at the language level. This makes Rust a natural fit for implementing EBR, and a mature implementation already exists: crossbeam-epoch. Rather than analyzing its source code here, this article maps the framework's API onto the EBR concepts to help you understand them.

Here is the simplest way to use epoch in a lock-free data structure:

{
    // Register the current thread as an active accessor of the data structure.
    let guard = epoch::pin();
    // Defer freeing `mem` until no pinned thread can still observe it.
    guard.defer(move || mem.release());
}

The call to epoch::pin() marks the start of the current thread's access to the data structure, whether for reading or writing. The call to guard.defer() defers the release of a block of memory, and the EBR algorithm decides when it is actually freed. When the scope ends, the access is over, and the Guard's Drop implementation unregisters the current thread from the set of tracked threads.
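
For a slightly fuller picture, here is a hedged sketch of the same pattern using crossbeam-epoch's pointer types (Atomic and Owned). The slot and its values are illustrative assumptions, but the swap-and-defer shape is the one used inside lock-free containers.

use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::sync::atomic::Ordering;

fn main() {
    // A single atomic slot shared between readers and writers.
    let slot: Atomic<u64> = Atomic::new(1);

    // Reader side: pin the thread; pointers loaded through the guard stay
    // valid at least until the guard is dropped.
    {
        let guard = epoch::pin();
        let shared = slot.load(Ordering::Acquire, &guard);
        if let Some(value) = unsafe { shared.as_ref() } {
            println!("read {}", value);
        }
    } // guard dropped here: this thread no longer delays reclamation

    // Writer side: swap in a new value and defer freeing the old block until
    // every thread pinned in an earlier epoch has unpinned.
    {
        let guard = epoch::pin();
        let old = slot.swap(Owned::new(2), Ordering::AcqRel, &guard);
        unsafe { guard.defer_destroy(old) };
    }

    // The value still stored in `slot` is intentionally leaked in this sketch.
}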

Another example comes from cuckoohash, a lock-free hash map implemented and used in DatenLord.

{
    let guard = pin();
    let value = map.get(&key, &guard);
    // ... use the value
}

The first line is the same as in the previous example. The second line looks up the value associated with the key in the map and returns a reference to it, and that reference's lifetime may not exceed the lifetime of the guard. In this way Rust's lifetimes confine every use of the value reference to the period during which the guard, and therefore the epoch protection, is alive.
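
The signature that makes this possible looks roughly like the following. This is a hypothetical sketch, not the actual cuckoohash API: the returned reference borrows the guard, so the compiler rejects any use of the value after the guard has been dropped.

use crossbeam_epoch::{pin, Guard};
use std::marker::PhantomData;

// A hypothetical map type used only to show the lifetime relationship.
struct Map<K, V> {
    _marker: PhantomData<(K, V)>,
}

impl<K, V> Map<K, V> {
    // The returned reference borrows `'guard`, so it can never outlive the
    // guard and therefore never outlives the epoch protection.
    fn get<'guard>(&self, _key: &K, _guard: &'guard Guard) -> Option<&'guard V> {
        None // the actual lookup is elided in this sketch
    }
}

fn main() {
    let map: Map<String, u64> = Map { _marker: PhantomData };
    let guard = pin();
    let value = map.get(&"key".to_string(), &guard);
    assert!(value.is_none());
    // Once `guard` is dropped, `value` may no longer be used.
}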

Conclusion

This article has briefly introduced Epoch-Based Reclamation as a memory management method, presented its Rust implementation and usage at the API level, and analyzed both EBR's performance advantages and the advantages Rust brings to implementing it at the language level. A follow-up article will dive into the implementation details of crossbeam-epoch.