preface

Java handwriting Redis from scratch (1) How to implement a fixed-size cache?

Java handwriting Redis from scratch (2) The principle of Redis expire expiration

Java handwriting Redis from scratch (3) How to avoid losing in-memory data on restart?

Java handwriting Redis from scratch (4) Adding a listener

Java handwriting Redis from scratch (5) Another way to implement an expiration strategy

Java handwriting Redis from scratch (6) AOF persistence: principle and implementation

Java handwriting Redis from scratch (7) LRU cache eviction strategy in detail

In the earlier parts we implemented the common eviction strategies FIFO/LRU/LFU, but operating systems actually use the clock page replacement algorithm.

LRU does perform well, but it is memory-hungry and cumbersome to implement.

The clock page replacement algorithm is an approximation of LRU and can be regarded as an improvement on FIFO.

Clock page replacement algorithm

Why is the Clock algorithm needed?

The performance of LRU is close to that of OPT, but it is difficult and expensive to implement; FIFO is simple to implement but performs poorly.

So operating system designers have tried many variants of the CLOCK algorithm, attempting to approach the performance of LRU at relatively low overhead.

Because the algorithm cyclically checks the status of each page, it is called the CLOCK algorithm; it is also known as the Not Recently Used (NRU) algorithm.

The basic idea

Each page table entry needs an access bit, which is initialized to 0 when the page is loaded into memory and set to 1 whenever the page is accessed (read or written).

Organize the pages into a circular linked list (like the face of a clock), with the pointer pointing at the oldest page (the one that came in first).

When a page fault occurs, examine the oldest page, the one the pointer points to: if its access bit is 0, evict it immediately; if its access bit is 1, clear the bit to 0 and advance the pointer one position. Continue until a page to evict is found, then move the pointer to the slot after it.

Personal questions

(1) What if the pointer goes all the way around and every access bit is 1?

If we simply fall back to the first element, we are back to naive FIFO.

(2) Access performance problems

The structure being traversed here can be thought of as a circular linked list:

Content of each node:

K key;
boolean accessFlag;

Naive FIFO is very simple: just push elements into a queue and evict the oldest one.

So if we really used a plain linked list as the data structure, search and update would take O(N) time, which is clearly not great.

One possible solution is to store each key together with its doubly linked list node in a HashMap.

Compared with the performance-optimized LRU, an update then no longer needs to unlink and re-insert the node; it only updates the corresponding flag bit.

Simple CLOCK algorithm

Each page that is visited is associated with a reference bit, also called a use bit in some places.

The main idea: initialize the use bit to 0 when the page is loaded into main memory; when the page is accessed later, set its use bit to 1.

For the replacement algorithm, the set of candidate frames can be viewed as a circular buffer with an associated pointer. After a page is replaced, the pointer moves to the next frame in the buffer.

If a page must enter main memory and there are no free frames, that is, all pages have their use bit set to 1, the pointer loops once around the buffer, clearing each use bit to 0 along the way, returns to its original position, and swaps out the page in that frame.

PS: when there are no free frames, all use bits end up cleared.

example

Take the following page replacement process as an example. The pages are visited in the order 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.

Main memory has 4 free frames, and each page corresponds to the structure (page number, use bit).

At first, page 1 enters main memory. There is a free frame, so it is loaded and its use bit is set to 1; since page 1 was not in main memory before, a page fault occurs.

Similarly, pages 2, 3, and 4 then enter main memory, each with its use bit set to 1, each causing a page fault.

The subsequent accesses to pages 1 and 2 find them already in main memory, so no page fault occurs.

When page 5 then needs to enter main memory, there is no spare frame. The pointer loops around the buffer, clearing the use bits of pages 1, 2, 3, and 4 to 0, returns to its original position, and replaces page 1 with page 5, whose use bit is set to 1.

Continuing in this way, CLOCK produces 10 page faults in total.
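To check this count, here is a minimal, self-contained sketch of the simple CLOCK algorithm (a frame array plus a clock hand); the class and helper names are invented for illustration and are not part of the cache project. Run on the reference string above with 4 frames, it prints 10 page faults.

import java.util.Arrays;

public class SimpleClockDemo {

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int frameCount = 4;

        int[] pages = new int[frameCount];          // page held by each frame, -1 = free
        boolean[] useBits = new boolean[frameCount];
        Arrays.fill(pages, -1);
        int hand = 0;                               // clock hand
        int faults = 0;

        for (int page : refs) {
            int frame = indexOf(pages, page);
            if (frame >= 0) {
                useBits[frame] = true;              // hit: just set the use bit
                continue;
            }
            faults++;
            // advance the hand until a free frame or a frame with use bit 0 is found
            while (pages[hand] != -1 && useBits[hand]) {
                useBits[hand] = false;              // give the page a second chance
                hand = (hand + 1) % frameCount;
            }
            pages[hand] = page;
            useBits[hand] = true;
            hand = (hand + 1) % frameCount;         // move past the frame just filled
        }
        System.out.println("Page faults: " + faults);   // prints: Page faults: 10
    }

    private static int indexOf(int[] pages, int page) {
        for (int i = 0; i < pages.length; i++) {
            if (pages[i] == page) {
                return i;
            }
        }
        return -1;
    }
}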

GClock (generalized clock page replacement algorithm)

Algorithm idea

This algorithm is a variant of Clock.

In contrast to Clock's binary 0/1 flag, the GClock flag is an integer, which in theory could be incremented without bound.

How it works

(1) When the object to be cached is already in the cache, its counter is incremented by 1, and the pointer moves to the object after it.

(2) If it is not in the cache, check the counter of the object the pointer points to. If it is 0, replace that object with the object to be cached; otherwise decrement the counter by 1 and move the pointer to the next object, and so on until some object is evicted. Since the counter may be greater than 1, the pointer may loop around several times before an object is evicted.

PS: this is similar to a simplified LFU, in that it counts how often each object is used.
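A rough sketch of this idea (illustrative only and not part of the cache project; the class name and the initial counter value for a newly inserted object are assumptions):

public class GClockSketch {

    private final Object[] slots;   // cached objects
    private final int[] counters;   // GClock counter per slot
    private int hand = 0;           // clock hand

    public GClockSketch(int capacity) {
        this.slots = new Object[capacity];
        this.counters = new int[capacity];
    }

    /** Access an object: on a hit, bump its counter; on a miss, evict by GClock and insert. */
    public void access(Object obj) {
        for (int i = 0; i < slots.length; i++) {
            if (obj.equals(slots[i])) {
                counters[i]++;                    // hit: the counter may grow beyond 1
                hand = (i + 1) % slots.length;    // move the hand past the hit object
                return;
            }
        }
        // miss: scan from the hand until a slot with counter 0 (or a free slot) is found
        while (slots[hand] != null && counters[hand] > 0) {
            counters[hand]--;                     // weaken the object by 1 and move on
            hand = (hand + 1) % slots.length;
        }
        slots[hand] = obj;
        counters[hand] = 1;                       // assumed initial value for a new object
        hand = (hand + 1) % slots.length;
    }
}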

WSClock (working set clock page replacement algorithm)

Algorithm idea

This algorithm is also a variant of clock and may be the most widely used algorithm in practice.

It combines the principle of Clock with the working set (WS) algorithm and can be seen as an enhanced version of the latter.

The data structure is a circular linked list. Each cached object stores a last-used time RT and a referenced flag R; a periodic timer defines a window T, and an object's age is the difference between the current time and its RT.

How it works

(1) When the object to be cached already exists in the cache, update its RT to the current time, and move the pointer to the object after it.

(2) If it does not exist in the cache and the cache is not full, store it at the position the pointer points to, set its RT to the current time and R to 1, and move the pointer to the next object. If the cache is full, an object must be evicted; check the object the pointer points to:

  • If R is 1, the object is in the working set: reset R to 0 and move the pointer to the next object.

  • If R is 0 and its age is greater than T, the object is no longer in the working set: replace it, set R to 1 and its RT to the current time. If its age is not greater than T, keep searching for a victim. If the pointer returns to where it started without finding one, evict the first object encountered whose R is 0. (A rough sketch follows below.)
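Here is a rough sketch of WSClock along those lines (illustrative only; the class name, the use of System.currentTimeMillis(), the moment R is set on a hit, and the last-resort victim choice are all assumptions not pinned down above):

public class WsClockSketch {

    private final Object[] slots;
    private final long[] lastUsed;       // RT: last used time per slot
    private final boolean[] referenced;  // R flag per slot
    private final long tau;              // working-set window T, in milliseconds
    private int hand = 0;
    private int size = 0;

    public WsClockSketch(int capacity, long tauMillis) {
        this.slots = new Object[capacity];
        this.lastUsed = new long[capacity];
        this.referenced = new boolean[capacity];
        this.tau = tauMillis;
    }

    public void access(Object obj) {
        long now = System.currentTimeMillis();

        // hit: refresh RT and move the hand on
        for (int i = 0; i < size; i++) {
            if (obj.equals(slots[i])) {
                lastUsed[i] = now;
                referenced[i] = true;
                hand = (i + 1) % slots.length;
                return;
            }
        }

        // miss while the cache still has room: just fill the next free slot
        if (size < slots.length) {
            slots[size] = obj;
            lastUsed[size] = now;
            referenced[size] = true;
            size++;
            return;
        }

        // miss with a full cache: one sweep of the WSClock scan
        int victim = -1;
        int fallback = -1;                        // first slot seen with R == false
        for (int step = 0; step < slots.length && victim < 0; step++) {
            int i = (hand + step) % slots.length;
            if (referenced[i]) {
                referenced[i] = false;            // in the working set: clear R, move on
            } else {
                if (fallback < 0) {
                    fallback = i;
                }
                if (now - lastUsed[i] > tau) {    // old enough: outside the working set
                    victim = i;
                }
            }
        }
        if (victim < 0) {
            // nothing aged out: take the first R == false slot, else (assumption) the hand position
            victim = (fallback >= 0) ? fallback : hand;
        }

        slots[victim] = obj;
        lastUsed[victim] = now;
        referenced[victim] = true;
        hand = (victim + 1) % slots.length;
    }
}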

Second chance method (or enhanced Clock)

Improved CLOCK algorithm

Idea: reduce the page-fault handling cost caused by modified (dirty) pages.

Modify the Clock algorithm so that dirty pages tend to be kept through one more clock sweep, using both the dirty bit and the use bit to guide replacement.

Algorithm process

On top of the basic CLOCK algorithm, a modified bit, also called a dirty bit in some places, is added alongside the use bit.

Each page now carries a pair of states (use bit, modified bit), which gives the following four cases:

(0,0): not recently used and not modified, the best candidate for replacement!

(0,1): not recently used but modified; it must be written back before being replaced.

(1,0): recently used but not modified; it is likely to be used again soon.

(1,1): recently used and modified, the last choice for the next round of replacement.

example

Take the following page replacement process as an example:

The pages are visited in the order 0, 1, 3, 6, 2, 4, 5, 2, 5, 0, 3, 1, 2, 5, 4, 1, 0. In the original figure (not reproduced here), the numbers in red are write accesses, whose modified bit is set to 1; the italicized entries show each page's use and modified bits, and "Fault?" records how many frames were examined to find a victim when a page fault occurred.

Replacement order

  1. Starting from the current position of the pointer, search main memory for a page whose (use bit, modified bit) is (0,0);

  2. If step 1 finds nothing, search for a page in state (0,1);

  3. If still nothing is found, the pointer returns to its original position and the use bit of every page in the set is cleared to 0. Repeat step 1 and, if necessary, step 2; a page to replace is then guaranteed to be found. (A small sketch of this scan order follows below.)
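As a small illustrative sketch of that three-step scan (the class and method names are made up; only the victim-selection logic from the steps above is modeled):

public class EnhancedClockSketch {

    private final boolean[] useBits;
    private final boolean[] dirtyBits;
    private int hand = 0;

    public EnhancedClockSketch(int frames) {
        this.useBits = new boolean[frames];
        this.dirtyBits = new boolean[frames];
    }

    /** Record an access; write accesses also set the dirty bit. */
    public void touch(int frame, boolean write) {
        useBits[frame] = true;
        if (write) {
            dirtyBits[frame] = true;
        }
    }

    /** Pick the frame to replace, following the three-step order from the text. */
    public int chooseVictim() {
        while (true) {
            // Step 1: look for (use = 0, dirty = 0) starting at the hand
            int victim = scan(false, false);
            if (victim >= 0) {
                return victim;
            }
            // Step 2: look for (use = 0, dirty = 1)
            victim = scan(false, true);
            if (victim >= 0) {
                return victim;
            }
            // Step 3: clear all use bits and try again
            java.util.Arrays.fill(useBits, false);
        }
    }

    /** One full sweep from the hand looking for the given (use, dirty) state. */
    private int scan(boolean wantedUse, boolean wantedDirty) {
        for (int step = 0; step < useBits.length; step++) {
            int i = (hand + step) % useBits.length;
            if (useBits[i] == wantedUse && dirtyBits[i] == wantedDirty) {
                hand = (i + 1) % useBits.length;  // leave the hand just past the victim
                return i;
            }
        }
        return -1;
    }
}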

Java implementation of the clock algorithm

instructions

This article implements a simple version of the clock algorithm and adds some performance optimizations on top of the conventional implementation. (It may well be the first such write-up available online.)

The optimization is mainly for performance, similar to the earlier optimization of LRU, bringing the query operation down from O(N) to O(1).

Implementation approach

We define a circular linked list tailored to the current scenario (it could later be extracted into a separate, reusable data structure project).

Define a node that contains an accessFlag.

We use a doubly linked rather than singly linked list so that deletion performs best.

Use a map to record which keys exist, so we never need to traverse the list to check whether a key is present: a classic space-for-time trade.

All right, now comes the happy coding phase.

Code implementation

Node definition

/**
 * Circular linked list node
 * @author binbin.hou
 * @since 0.0.15
 * @param <K> key
 * @param <V> value
 */
public class CircleListNode<K, V> {

    /**
     * Key
     * @since 0.0.15
     */
    private K key;

    /**
     * Value
     * @since 0.0.15
     */
    private V value = null;

    /**
     * Whether the node has been accessed
     * @since 0.0.15
     */
    private boolean accessFlag = false;

    /**
     * Previous node
     * @since 0.0.15
     */
    private CircleListNode<K, V> pre;

    /**
     * Next node
     * @since 0.0.15
     */
    private CircleListNode<K, V> next;

    //getter & setter
}

There are a few simple elements here: key, value, accessFlag, plus pre and next, which are used to build the doubly linked list.

Doubly linked list implementation

Basic attributes

In keeping with the original LRU doubly linked list, we implement the same ILruMap interface as before.

public class LruMapCircleList<K, V> implements ILruMap<K, V> {

    private static final Log log = LogFactory.getLog(LruMapCircleList.class);

    /**
     * Head node
     * @since 0.0.15
     */
    private CircleListNode<K, V> head;

    /**
     * Index map from key to list node
     * @since 0.0.15
     */
    private Map<K, CircleListNode<K, V>> indexMap;

    public LruMapCircleList() {
        // doubly linked circular list: head points to itself at first
        this.head = new CircleListNode<>(null);
        this.head.next(this.head);
        this.head.pre(this.head);

        indexMap = new HashMap<>();
    }

}

The head node is initialized, and indexMap is used to store the mapping from a key to its node in the list.

Remove elements

/**
 * Remove an element.
 *
 * 1. If it does not exist, ignore it.
 * 2. If it exists, remove it from both the list and the map.
 *
 * head==>1==>2==>head, after removing 2: head==>1==>head
 *
 * @param key the element key
 * @since 0.0.15
 */
@Override
public void removeKey(final K key) {
    CircleListNode<K,V> node = indexMap.get(key);
    if(ObjectUtil.isNull(node)) {
        log.warn("The key to remove does not exist: {}", key);
        return;
    }

    CircleListNode<K,V> pre = node.pre();
    CircleListNode<K,V> next = node.next();

    // unlink the node by connecting its two neighbors directly
    pre.next(next);
    next.pre(pre);
    indexMap.remove(key);

    log.debug("Key: {} removed from the circular list", key);
}

Removing a node is not difficult: simply unlink it from the circular list and remove its entry from indexMap.

update

Here we use the same method for both put and get. Strictly speaking, to implement the enhanced version of the clock algorithm it would be better to separate the two, but since the principle is similar, they are kept together here.

/**
 * Put an element.
 *
 * Similar to FIFO, new elements go straight to the end of the queue:
 * head==>1==>head, after adding 2: head==>1==>2==>head
 *
 * (1) If the element does not exist, insert it directly with accessFlag = false;
 * (2) If it already exists, set accessFlag = true.
 *
 * @param key the element key
 * @since 0.0.15
 */
@Override
public void updateKey(final K key) {
    CircleListNode<K,V> node = indexMap.get(key);

    // already exists
    if(ObjectUtil.isNotNull(node)) {
        node.accessFlag(true);
        log.debug("Node already exists, set its access flag to true, key: {}", key);
    } else {
        // does not exist: insert at the tail
        node = new CircleListNode<>(key);

        CircleListNode<K,V> tail = head.pre();
        tail.next(node);
        node.pre(tail);
        node.next(head);
        head.pre(node);

        // record it in indexMap for O(1) lookup
        indexMap.put(key, node);
        log.debug("Node does not exist, added to the linked list: {}", key);
    }
}

The key point here is to distinguish whether the node already exists:

(1) If the node already exists, fetch it and set accessFlag = true;

(2) If not, insert a new node with accessFlag = false.

Evicting data

/**
 * Remove the "eldest" element.
 *
 * (1) Start from head.next; if accessFlag is false, evict the node directly.
 * (2) If accessFlag is true, reset it to false and move on to the next node.
 *
 * @return the evicted entry
 * @since 0.0.15
 */
@Override
public ICacheEntry<K, V> removeEldest() {
    //fast-fail
    if(isEmpty()) {
        log.error("The current list is empty, nothing can be removed.");
        throw new CacheRuntimeException("The list is empty, nothing to remove!");
    }

    // the oldest element sits right after head, so start from head.next
    CircleListNode<K,V> node = this.head;
    while (node.next() != this.head) {
        // move to the next element
        node = node.next();

        if(!node.accessFlag()) {
            // never accessed since insertion (or since the last sweep): evict it
            K key = node.key();
            this.removeKey(key);
            return CacheEntry.of(key, node.value());
        } else {
            // accessed: clear the flag and continue with the next node
            node.accessFlag(false);
        }
    }

    // if a full loop found nothing, fall back to the first element (plain FIFO)
    CircleListNode<K,V> firstNode = this.head.next();
    this.removeKey(firstNode.key());   // keep the list and indexMap consistent with the cache
    return CacheEntry.of(firstNode.key(), firstNode.value());
}

The nodes are traversed in order: if a node's accessFlag is false, it is evicted directly.

If its accessFlag is true, it is reset to false and the scan moves on to the next node (one access only buys one extra round).

If nothing is found after a full loop, we simply fall back to head.next(), degrading to FIFO. Of course, since every accessFlag has already been reset to false by then, we could also just keep looping.

Implementation deficiencies

There is room for improvement here: we should not always scan from the beginning. The drawback is obvious: the element at the front tends to be evicted again the next time, while other elements that have never been accessed may survive forever. We can keep a marker that remembers the position (the node after the one evicted last time) and resume scanning from there, which feels closer to the spirit of the clock algorithm. A sketch of this follows below.

Another option is not to clear accessFlag at all and simply degrade to FIFO when no candidate is found in a full loop, but that deteriorates once most elements have been accessed. So remembering where the last scan stopped is the better choice.
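A possible sketch of that improvement, built on the structures from this article but not part of the published code (the hand field and its handling are hypothetical):

// Hypothetical variant of removeEldest() inside LruMapCircleList: the scan resumes
// from where the previous eviction stopped instead of always starting at head.next.
// Note: if removeKey(key) is ever called externally for the node the hand points to,
// the hand should be advanced as well; that detail is omitted here for brevity.

/** Clock hand: the node at which the next eviction scan should start. */
private CircleListNode<K, V> hand;

@Override
public ICacheEntry<K, V> removeEldest() {
    if(isEmpty()) {
        throw new CacheRuntimeException("The list is empty, nothing to remove!");
    }

    // start where the last scan stopped; fall back to the first real node
    CircleListNode<K, V> node = (hand == null || hand == this.head) ? this.head.next() : hand;
    while (true) {
        if (node == this.head) {
            // skip the sentinel head node
            node = node.next();
            continue;
        }
        if (!node.accessFlag()) {
            // not accessed since the last sweep: evict it and remember the next position
            this.hand = node.next();
            K key = node.key();
            this.removeKey(key);
            return CacheEntry.of(key, node.value());
        }
        // accessed: clear the flag and move on; the loop terminates because
        // every flag it passes gets cleared
        node.accessFlag(false);
        node = node.next();
    }
}

With this change, the elements near the front of the list no longer bear the brunt of every eviction.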

call

When the cache is full, the eviction strategy calls into this list:

import com.github.houbb.cache.api.ICache;
import com.github.houbb.cache.api.ICacheEntry;
import com.github.houbb.cache.api.ICacheEvictContext;
import com.github.houbb.cache.core.model.CacheEntry;
import com.github.houbb.cache.core.support.struct.lru.ILruMap;
import com.github.houbb.cache.core.support.struct.lru.impl.LruMapCircleList;
import com.github.houbb.log.integration.core.Log;
import com.github.houbb.log.integration.core.LogFactory;

/**
 * Clock algorithm eviction strategy
 * @author binbin.hou
 * @since 0.0.15
 */
public class CacheEvictClock<K, V> extends AbstractCacheEvict<K, V> {

    private static final Log log = LogFactory.getLog(CacheEvictClock.class);

    /**
     * Circular linked list
     * @since 0.0.15
     */
    private final ILruMap<K,V> circleList;

    public CacheEvictClock() {
        this.circleList = new LruMapCircleList<>();
    }

    @Override
    protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
        ICacheEntry<K, V> result = null;
        final ICache<K,V> cache = context.cache();

        // when the limit is reached, remove the eldest element chosen by the clock list
        if(cache.size() >= context.size()) {
            ICacheEntry<K,V> evictEntry = circleList.removeEldest();

            // remove it from the cache itself
            final K evictKey = evictEntry.key();
            V evictValue = cache.remove(evictKey);

            log.debug("Evicted by the clock algorithm, key: {}, value: {}", evictKey, evictValue);
            result = new CacheEntry<>(evictKey, evictValue);
        }

        return result;
    }

    /**
     * Update key information
     * @param key the element key
     * @since 0.0.15
     */
    @Override
    public void updateKey(final K key) {
        this.circleList.updateKey(key);
    }

    /**
     * Remove an element
     * @param key the element key
     * @since 0.0.15
     */
    @Override
    public void removeKey(final K key) {
        this.circleList.removeKey(key);
    }
}

The calling code is nothing special: it simply delegates to the corresponding list methods.

test

All right, now that the code is written, let's verify it briefly.

The test code

ICache<String, String> cache = CacheBs.<String, String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>clock())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

// access A
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());

The log

[DEBUG] [2020-10-07 11:32:55.396] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - Node does not exist, added to the linked list: A
[DEBUG] [2020-10-07 11:32:55.398] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - Node does not exist, added to the linked list: B
[DEBUG] [2020-10-07 11:32:55.401] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - Node does not exist, added to the linked list: C
[DEBUG] [2020-10-07 11:32:55.403] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - Node already exists, set its access flag to true, key: A
[DEBUG] [2020-10-07 11:32:55.404] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.removeKey] - Key: B removed from the circular list
[DEBUG] [2020-10-07 11:32:55.406] [main] [c.g.h.c.c.s.e.CacheEvictClock.doEvict] - Evicted by the clock algorithm, key: B, value: world
[DEBUG] [2020-10-07 11:32:55.410] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[DEBUG] [2020-10-07 11:32:55.411] [main] [c.g.h.c.c.s.s.l.i.LruMapCircleList.updateKey] - Node does not exist, added to the linked list: D
[D, A, C]

In line with our expectations.

Comparison of LRU, FIFO and Clock

LRU and FIFO are both queue-based in nature, but LRU orders pages by their most recent access time, so the order has to be adjusted dynamically on every access (each access changes a page's last-access time), whereas FIFO orders pages by the time they entered memory, which never changes, so the order between pages stays fixed.

LRU works well when the program exhibits locality. If none of the pages in memory have been accessed since they were loaded, LRU degrades into FIFO (a page's last-access time then equals the time it entered memory).

The LRU algorithm performs well but has high overhead; FIFO has low overhead but can suffer from the Belady anomaly.

The compromise is therefore the Clock algorithm: it does not need to adjust a page's position in the list on every access; it merely marks the page, and only does the moving when a page fault occurs.

For pages in memory that have never been accessed, Clock performs just as well as LRU; for pages that have been accessed, it cannot remember the exact order of access the way LRU does.

Replacement algorithm supplement

We have now basically covered the common replacement algorithms.

There are, however, many variants and scenario-specific algorithms. The ones below are included as a brief supplement, without detailed analysis, and no implementation is given for them.

The goal is simply to round out our picture of eviction algorithms.

Optimal replacement algorithm (OPT)

The optimal (OPT) replacement algorithm evicts pages that will never be used again, or that will not be accessed for the longest time in the future, which guarantees the lowest possible page-fault rate.

However, this algorithm cannot actually be implemented, because no one can predict which of the pages in memory will go unaccessed for the longest time in the future.

The optimal replacement algorithm can still be used as a yardstick for other algorithms. Suppose the system allocates three physical frames to a process, and consider the following page-number reference string:

7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

As the process runs, pages 7, 0, and 1 are loaded into memory in turn.

When the process then accesses page 2, a page fault occurs; according to the optimal replacement algorithm, page 7 is evicted, because it will not be needed again until the 18th access.

When page 0 is accessed next, no page fault occurs because it is already in memory. When page 3 is accessed, page 1 is evicted according to the optimal algorithm, and so on.

Working through the whole reference string this way, there are 9 page faults and 6 page replacements in total.

Of course, this is a purely theoretical algorithm; it cannot be used in practice, because we cannot foresee how the data will be accessed in the future.
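The figures above (9 page faults, 6 replacements) can be double-checked with a small simulation. This sketch (class and method names invented for illustration) looks ahead in the reference string and evicts the resident page whose next use lies farthest in the future:

import java.util.ArrayList;
import java.util.List;

public class OptDemo {

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        int frames = 3;

        List<Integer> memory = new ArrayList<>();
        int faults = 0;
        int replacements = 0;

        for (int i = 0; i < refs.length; i++) {
            int page = refs[i];
            if (memory.contains(page)) {
                continue;                          // already resident, no fault
            }
            faults++;
            if (memory.size() < frames) {
                memory.add(page);                  // free frame available
                continue;
            }
            // evict the resident page whose next use is farthest away (or never comes)
            int victimIndex = 0;
            int farthest = -1;
            for (int m = 0; m < memory.size(); m++) {
                int nextUse = nextUse(refs, i + 1, memory.get(m));
                if (nextUse > farthest) {
                    farthest = nextUse;
                    victimIndex = m;
                }
            }
            memory.set(victimIndex, page);
            replacements++;
        }
        // prints: faults=9, replacements=6
        System.out.println("faults=" + faults + ", replacements=" + replacements);
    }

    /** Index of the next use of page at or after position start; Integer.MAX_VALUE if never used again. */
    private static int nextUse(int[] refs, int start, int page) {
        for (int i = start; i < refs.length; i++) {
            if (refs[i] == page) {
                return i;
            }
        }
        return Integer.MAX_VALUE;
    }
}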

PBA: Page Buffering Algorithm

Although the LRU and Clock replacement algorithms are both better than FIFO, they require hardware support and carry more overhead; moreover, replacing a modified page costs more than replacing an unmodified one.

The page buffering algorithm (PBA) improves the performance of the paging system while allowing a simpler replacement strategy.

The VAX/VMS operating system uses the page buffering algorithm. It adopts the variable-allocation, local-replacement approach mentioned earlier, with FIFO as the replacement algorithm.

The algorithm places an evicted page on one of two linked lists: if the page has not been modified, it goes directly onto the free-page list; otherwise it goes onto the modified-page list. Note that the page is not physically moved in memory; only its entry is moved onto one of the two lists.

The free-page list is really a list of free physical frames, each of which can hold a program or data page. When a page needs to be read in, it can be loaded into the first frame on that list. When an unmodified page is to be swapped out, it is not actually written out of memory; its frame is simply appended to the tail of the free-page list.

Similarly, when a modified page is replaced, its frame is appended to the tail of the modified-page list. In this way both modified and unmodified pages stay in memory, and when the process accesses one of them again, it costs very little to return the page to the process's resident set. When the number of modified pages reaches a certain threshold, for example 64 pages, they are written back to disk in one batch, significantly reducing the number of disk I/O operations.
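As a purely illustrative sketch of the bookkeeping just described (not tied to any real OS code; the class name, the frame representation, and the threshold handling are assumptions), the two lists might be managed like this:

import java.util.ArrayDeque;
import java.util.Deque;

public class PageBufferingSketch {

    private static final int FLUSH_THRESHOLD = 64;

    /** Frames holding evicted, unmodified pages: immediately reusable. */
    private final Deque<Integer> freeList = new ArrayDeque<>();

    /** Frames holding evicted, modified pages: written back in one batch later. */
    private final Deque<Integer> modifiedList = new ArrayDeque<>();

    /** Called when a page is evicted from the resident set. */
    public void onEvict(int frame, boolean dirty) {
        if (dirty) {
            modifiedList.addLast(frame);
            if (modifiedList.size() >= FLUSH_THRESHOLD) {
                flushModifiedPages();
            }
        } else {
            // the page stays in memory; its frame is just queued for reuse
            freeList.addLast(frame);
        }
    }

    /** Take a frame to load a new page into, if one is available. */
    public Integer takeFreeFrame() {
        return freeList.pollFirst();
    }

    /** Write all modified pages back to disk in one batch, then reuse their frames. */
    private void flushModifiedPages() {
        // (disk I/O omitted in this sketch)
        freeList.addAll(modifiedList);
        modifiedList.clear();
    }
}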

A simpler page buffering algorithm has been implemented in the MACH operating system, except that it does not distinguish between modified and unmodified pages.

Comparison of replacement algorithms

Optimal algorithm (OPT): not implementable, but useful as a benchmark
NRU (not recently used): a rough approximation of LRU
FIFO: may discard important (frequently used) pages
Second chance: a big improvement over FIFO
Clock: realistic and practical
LRU (least recently used): excellent, but hard to implement exactly
NFU (not frequently used): a rough approximation of LRU
Aging: an efficient algorithm that approximates LRU well
Working set: rather expensive to implement
Working set clock (WSClock): a good, efficient algorithm

summary

The clock algorithm is a tradeoff, and it is the one operating systems choose in practice.

As I understand it, the advantage of clock is that a frequently accessed element does not need its position updated on every access; the bookkeeping happens only once, at eviction time. Even though our LRU uses a doubly linked list so that each adjustment is O(1), doing that work on every access is still rather wasteful.

This basically wraps up the topic of cache eviction algorithms. Thank you for your support, and I hope you have gained something from it.

Open source address: github.com/houbb/cache

If you find this article helpful, please feel free to like it. Your encouragement is my biggest motivation.

What did you take away from it? Or do you have other ideas? Feel free to discuss with me in the comments section; I look forward to hearing your thoughts.