Preface

We often use a cache to speed up data queries. Because cache capacity is limited, once the cache is full some entries must be evicted to make room for new data. Cached data cannot be removed at random; in general, eviction follows some algorithm. Commonly used eviction algorithms include LRU, LFU, and FIFO. In this article we talk about the LRU algorithm.

LRU overview

LRU is short for Least Recently Used. The algorithm assumes that recently used data is hot data that will very likely be used again soon, while data that has not been used recently will most likely not be used again. When the cache is full, the least recently used data is evicted first.

Suppose the cache currently holds the data shown below:

Here we call the first node of the list the head node and the last node the tail node.

When the cache is queried for the data with key=1, the LRU algorithm moves node 1 to the head of the list and leaves the other nodes unchanged, as shown in the figure.

Next we insert a node with key=8. At this point the cache is at capacity, so some data must be deleted before the new node can be added. Since every query moves data to the head, data that is not queried sinks toward the tail, so the tail node can be regarded as the least recently used data and is the one deleted.

Then we add the new data directly at the head.

Here are the specific steps of the LRU algorithm:

  • New data is inserted directly at the head of the list.
  • On a cache hit, the accessed data is moved to the head of the list.
  • When the cache is full, the data at the tail of the list is removed.
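
As a side note, Java's standard library already encodes these three rules: a LinkedHashMap constructed with accessOrder=true keeps entries in access order, and overriding removeEldestEntry evicts the least recently used entry when the cache is full. A minimal sketch for reference:

import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache built on LinkedHashMap.
// accessOrder=true keeps iteration order by access recency;
// removeEldestEntry evicts the least recently used entry once full.
public class LinkedHashLRU<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LinkedHashLRU(int capacity) {
        super(capacity, 0.75f, true); // true = access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}

We will still build the structure by hand below to see how it works internally.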

LRU algorithm implementation

As the example above shows, the LRU algorithm needs to add nodes at the head and remove nodes at the tail. Adding and deleting nodes in a linked list takes O(1) time, which makes a linked list a good container for cached data. However, an ordinary singly linked list won't do; it has several disadvantages:

  1. Fetching any node requires traversing from the head, so looking up a node costs O(N).
  2. To move an intermediate node to the head, we need to know the node before it, and a singly linked list must be traversed again to find that predecessor.

To solve these problems, we can combine the linked list with other data structures.

Storing nodes in a hash table reduces the cost of locating a node to O(1). For the node-movement problem, we add a predecessor pointer to each node so that it records its previous node, turning the singly linked list into a doubly linked list.

To sum up, we use a doubly linked list combined with a hash table; the data structure is shown in the figure below:

Two sentinel nodes are added to the doubly linked list; they store no data. With sentinel nodes, we can add and remove nodes without special-casing the boundary nodes, which simplifies programming and reduces code complexity.
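
To see what the sentinels buy us, here is a rough sketch of what node deletion would look like without them (a hypothetical variant; the pre/next fields match the Entry class in the full code below):

// Without sentinels, head and tail point at real data nodes,
// so every unlink needs boundary checks:
private void deleteNodeWithoutSentinels(Entry node) {
    if (node.pre != null) {
        node.pre.next = node.next;
    } else {
        head = node.next; // node was the first element
    }
    if (node.next != null) {
        node.next.pre = node.pre;
    } else {
        tail = node.pre;  // node was the last element
    }
}

With sentinels, node.pre and node.next always exist, so deletion collapses to the two-line deleteNode method in the implementation below.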

The LRU algorithm implementation code follows. To keep things simple, key and value are both int.

import java.util.HashMap;
import java.util.Map;

public class LRUCache {

    Entry head, tail;
    int capacity;
    int size;
    Map<Integer, Entry> cache;


    public LRUCache(int capacity) {
        this.capacity = capacity;
        // Initialize the linked list
        initLinkedList();
        size = 0;
        cache = new HashMap<>(capacity + 2);
    }

    /**
     * If the node does not exist, return -1. If it does, move the node
     * to the head of the list and return the node's value.
     *
     * @param key
     * @return
     */
    public int get(int key) {
        Entry node = cache.get(key);
        if (node == null) {
            return -1;
        }
        // The node exists; move it to the head
        moveToHead(node);
        return node.value;
    }

    /**
     * Adds the node at the head of the list. If the capacity is full,
     * the tail node is deleted first.
     *
     * @param key
     * @param value
     */
    public void put(int key, int value) {
        Entry node = cache.get(key);
        if (node != null) {
            node.value = value;
            moveToHead(node);
            return;
        }
        // The key does not exist; if the cache is full, evict the tail node first
        if (size == capacity) {
            Entry lastNode = tail.pre;
            deleteNode(lastNode);
            cache.remove(lastNode.key);
            size--;
        }
        // Add the new node at the head of the list

        Entry newNode = new Entry(key, value);
        addNode(newNode);
        cache.put(key, newNode);
        size++;

    }

    private void moveToHead(Entry node) {
        // Unlink the node from its current position first
        deleteNode(node);
        addNode(node);
    }

    private void addNode(Entry node) {
        head.next.pre = node;
        node.next = head.next;

        node.pre = head;
        head.next = node;
    }

    private void deleteNode(Entry node) {
        node.pre.next = node.next;
        node.next.pre = node.pre;
    }


    public static class Entry {
        public Entry pre;
        public Entry next;
        public int key;
        public int value;

        public Entry(int key, int value) {
            this.key = key;
            this.value = value;
        }

        public Entry() {
        }
    }

    private void initLinkedList() {
        head = new Entry();
        tail = new Entry();

        head.next = tail;
        tail.pre = head;

    }

    public static void main(String[] args) {

        LRUCache cache = new LRUCache(2);

        cache.put(1, 1);
        cache.put(2, 2);
        System.out.println(cache.get(1));
        cache.put(3, 3);
        System.out.println(cache.get(2));
    }
}
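
Running the main method: get(1) returns 1 and moves key 1 to the head; put(3, 3) exceeds the capacity of 2 and evicts the tail node, which is key 2; get(2) therefore returns -1. The program prints 1 followed by -1.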

LRU algorithm analysis

The cache hit ratio is a very important metric for a cache system. If the hit ratio is too low, queries fall back to the database and the load on the database increases.

Based on the analysis above, we can summarize the advantages and disadvantages of the LRU algorithm.

The advantage of the LRU algorithm is that it is simple to implement and performs very well for hot data.

The disadvantage of the LRU algorithm is that occasional batch operations, such as a batch query of historical data, can push the hot data out of the cache and replace it with this historical data. This cache pollution lowers the cache hit ratio and slows down normal queries.
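
The effect is easy to reproduce with the LRUCache implemented above (an illustrative trace, assuming a capacity of 3):

LRUCache cache = new LRUCache(3);
// Hot data that is queried frequently
cache.put(1, 1);
cache.put(2, 2);
cache.put(3, 3);
cache.get(1);
cache.get(2);
cache.get(3);
// A one-off batch scan of historical data
cache.put(100, 100);
cache.put(101, 101);
cache.put(102, 102);
// The scan has evicted all of the hot keys
System.out.println(cache.get(1)); // -1
System.out.println(cache.get(2)); // -1
System.out.println(cache.get(3)); // -1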

LRU algorithm improvement scheme

The following scheme is based on the improved LRU algorithm used by MySQL InnoDB.

The linked list is divided into two parts: a hot data area and a cold data area, as shown in the figure.

After the improvement, the algorithm flow is as follows:

  1. If the accessed data is in the hot data area, it is moved to the head of the hot data area, just as in the plain LRU algorithm.
  2. When data is inserted and the cache is full, the data at the tail is evicted. The new data is then inserted at the head of the cold data area.
  3. Each time data in the cold data area is accessed, check how long it has been in the cache:
    • If it has been in the cache longer than a specified period, say 1 s, it is moved to the head of the hot data area.
    • If it has been in the cache for less than the specified time, its position stays unchanged.

For occasional batch queries, the data only falls into the cold data area and is quickly evicted, while the data in the hot data area is unaffected. This solves the drop in cache hit ratio seen with the plain LRU algorithm.
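
Here is a hedged sketch of this cold/hot variant, reusing the doubly linked list idea from the LRUCache above. The three-sentinel layout, the field names, and the 1-second promotion window are illustrative assumptions, not InnoDB's actual implementation:

import java.util.HashMap;
import java.util.Map;

// Sketch of an LRU cache split into a hot area and a cold area.
// List layout: head <-> ...hot... <-> mid <-> ...cold... <-> tail
public class ColdHotLRUCache {

    static class Entry {
        Entry pre, next;
        int key, value;
        long insertTime;  // when the entry entered the cold area
        boolean hot;      // which area the entry currently lives in
    }

    static final long PROMOTE_AFTER_MS = 1000; // assumed threshold (1 s)

    final int capacity;
    int size;
    final Map<Integer, Entry> cache = new HashMap<>();
    final Entry head = new Entry(), mid = new Entry(), tail = new Entry();

    public ColdHotLRUCache(int capacity) {
        this.capacity = capacity;
        head.next = mid; mid.pre = head;
        mid.next = tail; tail.pre = mid;
    }

    public int get(int key) {
        Entry node = cache.get(key);
        if (node == null) return -1;
        if (node.hot) {
            // Rule 1: a hot hit moves to the head of the hot area, as in plain LRU.
            unlink(node);
            insertAfter(head, node);
        } else if (System.currentTimeMillis() - node.insertTime > PROMOTE_AFTER_MS) {
            // Rule 3: a cold entry older than the window is promoted to the hot area.
            unlink(node);
            node.hot = true;
            insertAfter(head, node);
        }
        // A cold entry younger than the window stays where it is.
        return node.value;
    }

    public void put(int key, int value) {
        Entry node = cache.get(key);
        if (node != null) { node.value = value; return; }
        if (size == capacity) {
            // Rule 2: evict the overall tail; if the cold area is empty,
            // fall back to the tail of the hot area.
            Entry last = (tail.pre == mid) ? mid.pre : tail.pre;
            unlink(last);
            cache.remove(last.key);
            size--;
        }
        // Rule 2: new data enters at the head of the cold area.
        node = new Entry();
        node.key = key;
        node.value = value;
        node.insertTime = System.currentTimeMillis();
        insertAfter(mid, node);
        cache.put(key, node);
        size++;
    }

    private void insertAfter(Entry pos, Entry node) {
        node.next = pos.next;
        node.pre = pos;
        pos.next.pre = node;
        pos.next = node;
    }

    private void unlink(Entry node) {
        node.pre.next = node.next;
        node.next.pre = node.pre;
    }
}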

Other improved algorithms include LRU-K, 2Q, and LIRS; interested readers can look them up on their own.

Welcome to follow my WeChat public account for daily technical articles. If you are interested in my topics, you can also follow my blog: studyidea.cn