1. Background

Caching data by exploiting the locality principle is a common performance optimization in programming.

When it comes to caching, a cache component is indispensable. CCache is an excellent LRU cache library with many clever optimization strategies that reduce lock contention and deliver high performance.

Its strategies for reducing lock contention are:

  • An element is promoted only after it has been accessed several times (promotion means moving the element to the head of the LRU list)
  • Promotions are placed on a queue and processed by a separate thread
  • Eviction (garbage collection) is done in that same thread

So let’s see how that works.

2. LRU cache

Before analyzing the source code, let’s take a quick look at what the LRU cache does.

LRU is short for least recently used. When the cache is full, the contents that have not been accessed for the longest time are evicted to make room for newly cached contents.

To implement the LRU policy, a hashtable and a list are generally used together. The hashtable supports fast retrieval of a value by its key, while the list records the access order of elements and supports evicting the content that has not been accessed for the longest time, as shown in the following figure.

In the hashtable, the content corresponding to a key consists of two parts: the first is the actual content to be stored, defined here as value; the second is a pointer to the corresponding node in the list, defined as element.

In the list, each node also contains two parts: the first is a pointer back to the corresponding value in the hashtable, defined as node; the second is a next pointer, which strings the whole list together.

If we perform get(key2), we first find value2 and element2 through key2, then find node2 through element2, and finally move node2 to the top of the list. So after get(key2), the figure changes to

If we then perform set(key5, value5) and the cache can hold at most 4 items, what happens? key5 and value5 are first inserted into the hashtable and node5 is inserted at the top of the list; then the element at the bottom of the list, node4 in this case, is removed, along with the data corresponding to node4 in the hashtable. The figure above becomes
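
To make the walkthrough concrete, here is a minimal single-threaded sketch in Go, using a map as the hashtable and container/list as the list; the names lruCache and entry are illustrative, not CCache's actual types.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what the list nodes hold: the key (needed for eviction) and the value.
type entry struct {
	key   string
	value interface{}
}

type lruCache struct {
	maxItems int
	items    map[string]*list.Element // the "hashtable": key -> node in the list
	order    *list.List               // the "list": front = most recently used
}

func newLRUCache(maxItems int) *lruCache {
	return &lruCache{
		maxItems: maxItems,
		items:    make(map[string]*list.Element),
		order:    list.New(),
	}
}

func (c *lruCache) Get(key string) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // promote: move the node to the top of the list
	return el.Value.(*entry).value, true
}

func (c *lruCache) Set(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value // overwrite and promote an existing key
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.maxItems {
		oldest := c.order.Back() // the least recently used node
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRUCache(4)
	c.Set("key1", "value1")
	if v, ok := c.Get("key1"); ok {
		fmt.Println(v) // prints: value1
	}
}
```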

Through the above process, the LRU policy can be implemented well. However, neither the hashtable nor the list is thread-safe. To use them in a multi-threaded environment, both set and get operations have to be locked, which hurts performance badly, especially now that server CPU core counts keep growing and lock contention becomes ever more costly.

3. CCache's optimization strategies

In view of the above problems, CCache adopts the following optimization strategies, which are very clever.

3.1 Sharding the hashtable

This is a very common strategy.

The hashtable is divided into multiple hashtables by key, each with its own lock. The lock granularity becomes finer, so the probability of contention is lower.

As shown in the figure, the hashtable is split into three hashtables by key, protected by three locks, so that when hashtable1 and hashtable2 are accessed concurrently there is no conflict.
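
This idea can be sketched in Go roughly as follows, assuming FNV-1a hashing and a fixed shard count (CCache's actual shard selection may differ):

```go
package cache

import (
	"hash/fnv"
	"sync"
)

// shard is one slice of the hashtable with its own lock.
type shard struct {
	sync.RWMutex
	items map[string]interface{}
}

type shardedMap struct {
	shards []*shard
}

func newShardedMap(n int) *shardedMap {
	m := &shardedMap{shards: make([]*shard, n)}
	for i := range m.shards {
		m.shards[i] = &shard{items: make(map[string]interface{})}
	}
	return m
}

// shardFor hashes the key so that different keys usually land on
// different shards and therefore different locks.
func (m *shardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return m.shards[h.Sum32()%uint32(len(m.shards))]
}

func (m *shardedMap) Get(key string) (interface{}, bool) {
	s := m.shardFor(key)
	s.RLock()
	defer s.RUnlock()
	v, ok := s.items[key]
	return v, ok
}

func (m *shardedMap) Set(key string, value interface{}) {
	s := m.shardFor(key)
	s.Lock()
	defer s.Unlock()
	s.items[key] = value
}
```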

3.2 Promote only after multiple accesses

An access counter is added to each value. Every get operation increments the counter by 1; when the counter reaches a threshold, the element is moved to the head of the list and the counter is reset to 0.

If the threshold is 3, the number of writes to the list is cut to roughly a third, and so is the probability of lock contention.

This is a lossy strategy: the order of the list no longer exactly matches the chronological access order. However, given how frequent get operations are on an LRU cache, the resulting loss in hit ratio should be negligible.
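
A minimal sketch of the counting logic, assuming an atomic per-item counter and a threshold of 3; the names item and shouldPromote are assumptions, not CCache's real API:

```go
package cache

import "sync/atomic"

const promoteThreshold = 3

// item wraps a cached value with an access counter.
type item struct {
	value    interface{}
	accesses int32 // accesses since the last promotion
}

// shouldPromote is called on every get. It returns true only once every
// promoteThreshold accesses, so the list (and its lock) is touched far less often.
func shouldPromote(it *item) bool {
	if atomic.AddInt32(&it.accesses, 1) < promoteThreshold {
		return false
	}
	atomic.StoreInt32(&it.accesses, 0) // reset the counter before promoting
	return true
}
```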

3.3 Update the list from a single thread

Both get and set operations need to update the access-order list, but the update only needs to be done before the next set operation, not in real time. Based on this, a separate update thread can be started to maintain the list: get and set submit update tasks to a queue, and the update thread keeps fetching tasks from the queue and applying them.

This has two advantages:

  • The list is accessed by only one thread, so it needs no lock
  • get and set return as soon as the hashtable work is done, and the list is updated asynchronously, so the calls return faster

One problem with this is that the update thread can become a bottleneck when the machine has many CPU cores and the QPS of get and set is high. However, considering that list operations are very lightweight, and that a service will not spend all of its resources reading and writing the cache, this is also negligible in practice.
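
In Go the "thread" would typically be a goroutine fed by a buffered channel. A minimal sketch of the idea follows (the real CCache worker also handles inserts, deletes, and eviction):

```go
package cache

import "container/list"

// promotion is one update task; a fuller version would also carry the new
// entry to insert when el is nil.
type promotion struct {
	el *list.Element
}

type worker struct {
	order *list.List
	tasks chan promotion
}

func newWorker() *worker {
	w := &worker{order: list.New(), tasks: make(chan promotion, 1024)}
	go w.run() // the only goroutine that ever touches w.order, so no lock is needed
	return w
}

// run drains the queue and applies each update to the list.
func (w *worker) run() {
	for p := range w.tasks {
		if p.el != nil {
			w.order.MoveToFront(p.el) // promote an existing element
		}
		// a new element would be pushed to the front here, followed by a
		// size check and batch eviction when the cache is full (see 3.4)
	}
}

// enqueue is all that get and set do; they return without waiting for the list update.
func (w *worker) enqueue(p promotion) {
	w.tasks <- p
}
```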

3.4 Batch eviction

When the cache is full, a batch of elements is evicted at a time. This avoids the situation where, once the cache is full, every newly set element triggers an eviction.
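
A sketch of what such a batch eviction could look like, assuming the list elements store their keys and a fixed batch size (both are assumptions, not CCache's actual values):

```go
package cache

import "container/list"

const evictBatchSize = 100

// evictBatch drops up to evictBatchSize of the least recently used elements
// from the tail of the list and removes the matching entries from the lookup map.
func evictBatch(order *list.List, lookup map[string]*list.Element) {
	for i := 0; i < evictBatchSize; i++ {
		oldest := order.Back()
		if oldest == nil {
			return // the list is already empty
		}
		order.Remove(oldest)
		delete(lookup, oldest.Value.(string)) // assumes each element stores its key
	}
}
```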

3.5 Overall Process

After implementing the above strategies, the overall process looks roughly like this

3.5.1 The get operation
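
Roughly, a get goes through the steps below (a summary of the strategies above, not CCache's exact code):

  • Hash the key to pick a shard, and take only that shard's lock to look up the value
  • Increment the value's access counter; if it has reached the threshold, reset it and submit a promotion task to the queue
  • Return the value immediately, while the update thread moves the corresponding node to the head of the list asynchronously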

3.5.2 The set operation
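
Roughly, a set works as follows (again a summary, not CCache's exact code):

  • Hash the key to pick a shard, take that shard's lock, and insert the key and value into its hashtable
  • Submit an insert task to the queue and return
  • The update thread pushes the new node to the head of the list and, if the cache is over capacity, evicts a batch of nodes from the tail, removing them from the hashtable as well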