Redis source code analysis skiplist

Recently to change a new job, by the new and old work alternate this window to relax, so column delay more, but I do not feel guilty, after all, no one urges more. However, it is boring to watch TV series and variety shows every day. I envy those of you who work regularly and have serious work content every day. I feel that you live a full life. [head]

There are many kinds of data structures in the computer field. Data structures exist either to save time, or to save space, or both, so there are some data structures called time for space, space for time. There are also data structures that save time and space at the expense of accuracy, like the BloomFilter I talked about earlier. Skiplist is also a probabilistic data structure, organized into multiple levels of random probability data for quick lookup.

Jump table

What exactly is a skip watch? So let’s think about this scenario, if you have an ordered list, and you want to see if a particular value is in the list, do you have to go through the list once to know, order n time.

Some people might ask why we don’t just use continuous storage, we can still use binary search, and we want to keep the advantage of low change time with linked lists. So how do we optimize the speed of a single lookup? In fact, the idea is very much like binary search, but the characteristics of single linked list can not access randomly limited us, but the idea of binary gradually narrow the scope inspired us, can we think of any way to gradually narrow the scope?

Can I create a new list on top of the original list, where the new list takes one node from the original list? Suppose the original list is L0, the new list is L1, and the elements in L1 are the first, third, fifth, seventh, ninth of L0… And then create Pointers to each node in L1 and L0. In this way, L1 can reduce the range in L0 by half, and create a new linked list for L1 L2… , the linked list with a higher level is divided into a larger range, and the range is narrowed down step by step after the large range of range is determined, as shown in the figure below.

Suppose we want to find 13, we can determine the range of 2-14 in L3, 8-14 in L2, 10-14 in L1, and 13 in L0. The overall search path is shown in the following figure. Is the red path less nodes than the green path directly looking for 13 in L0?

In fact, this implementation is very similar to binary search, but the binary search in advance stored in the middle point, with the extra space in exchange for time, it is easy to think of its time complexity and binary search are O(logn).

Is this guy really cool? He invented a data structure that would reduce the search time of ordered lists from O(n) to O(logn), but I have a question,What if a node is inserted or deleted from the list?Is the entire data structure rebuilt every time the data changes?We don’t have to, we don’t have to be strict about half the relationship between two levels,You just have to have a probability of one in twoIt is easy to delete a node. Directly delete the corresponding modified node in a certain level. When inserting a node, the new node can be inserted into the upper linked list with an exponential decreasing probability. For example, L0 is 100% inserted, L1 is inserted with a probability of 1/2, and if L1 is inserted, L2 is inserted with a probability of 1/2… Pay attention to,As long as there are nodes in the high Level, there must be nodes in the low Level, but the probability of appearing in the high Level linked list will decrease with the Level indexEventually, the jumper might look something like this.

That’s how we reinvented Skiplist.

Skip tables in Redis

In order to provide sorted set related operations (such as zadd and zrange), the underlying Redis implementation is skiplist. Let’s take a look at how Redis implements Skiplist.

typedef struct zskiplist {
    struct zskiplistNode *header, *tail; // end pointer
    unsigned long length;   // Skiplist length
    int level;  // The maximum number of linked lists
} zskiplist;
Copy the code

Let’s take a look at the definition of Zskiplist in Redis. There’s nothing there, just Pointers, lengths, and series. The focus is on zskiplistNode. ZskiplistNode has forward Pointers, so Level[0] is a bidirectional list.

typedef struct zskiplistNode {
    sds ele;   // The specific value stored by the node
    double score;   // The score corresponding to the node
    struct zskiplistNode *backward; // Forward pointer
    struct zskiplistLevel {
        struct zskiplistNode *forward; // Back pointer for each layer
        unsigned long span;  // Span to the next node
    } level[];
} zskiplistNode;
Copy the code

The skiplist implementation in Redis is slightly different from what we mentioned above. It is not in the form of a simple multi-level linked list, but directly in level[] in zskiplistNode, the association of nodes of different levels is organized. The structure of Zskiplist is visualized as follows.

Operation of the hop table

With zskiplist constructed, let’s take a look at its main operations.

New jump table

/* Create a hop table */
zskiplist *zslCreate(void) {
    int j;
    zskiplist *zsl;

    zsl = zmalloc(sizeof(*zsl));
    zsl->level = 1;
    zsl->length = 0;
    zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0.NULL);  // Create a head node
    for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}
Copy the code

Creating a hop table is easier. Create an empty node as the head node.

/* Insert a new node in the hop table, */
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    unsigned int rank[ZSKIPLIST_MAXLEVEL];
    inti, level; serverAssert(! isnan(score)); x = zsl->header;for (i = zsl->level- 1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        rank[i] = i == (zsl->level- 1)?0 : rank[i+1];
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* There are no duplicate elements in skiplist, but we allow duplicate scores because there are no duplicate insertions of two * identical elements if zslInsert() is called, because zslInsert() already determines the presence of */ in the hash table
    level = zslRandomLevel();  // Generate a random value to determine the highest level of the list to be inserted
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    x = zslCreateNode(level,score,ele);  // Create a new node for the inserted data
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* After inserting a new node, you need to update the span value of the preceding and following nodes */
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* Increases the span value for other levels because a new node is inserted between the original two nodes */
    for (i = level; i < zsl->level; i++) {
        update[i]->level[i].span++;
    }

    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    zsl->length++;
    return x;
}
int zslRandomLevel(void) {
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF)) / / ZSKIPLIST_P = = 0.25
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
Copy the code

Data insertion is a little more complicated, requiring you to create new nodes, determine which levels you want to insert new nodes at, and update the span values for each level in the previous node. An extra note here is zslRandomLevel, which determines whether a single node is placed to the next level with a 25% probability, not 50%.

void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update) {
    int i;
    for (i = 0; i < zsl->level; i++) {
        if (update[i]->level[i].forward == x) {
            update[i]->level[i].span += x->level[i].span - 1;  // To delete a node, change the span value
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            update[i]->level[i].span -= 1; }}if (x->level[0].forward) {
        x->level[0].forward->backward = x->backward;
    } else {
        zsl->tail = x->backward;
    }
    while(zsl->level > 1 && zsl->header->level[zsl->level- 1].forward == NULL)
        zsl->level--;
    zsl->length--;
}

* * If the node is null, you need to call zslFreeNode() to release the node, otherwise you just null the pointer to the SDS, so that * other nodes can continue to use the SDS */
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node) {
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    int i;

    x = zsl->header;
    for (i = zsl->level- 1; i >= 0; i--) {
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                     sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* There may be multiple nodes with the same SOCre that must be found and removed */
    x = x->level[0].forward;
    if (x && score == x->score && sdscmp(x->ele,ele) == 0) {
        zslDeleteNode(zsl, x, update);
        if(! node) zslFreeNode(x);else
            *node = x;
        return 1;
    }
    return 0; /* not found */
}
Copy the code

Data deletion is also very simple, similar to single-linked list deletion, but at the same time need to update the data at each level.

The rest of the code is a lot more, knowing the skiplist implementation, other related operations code is easier to think of, I will not list here, if you are interested in t_zset.c

Why does Redis use Skiplist instead of a balanced tree

In Redis, skiplist is mainly used to achieve sorted set-related functions. Of course, red black trees can also achieve its functions. Why did Redis author use skiplist instead of red black trees, B trees and other balanced trees? And apparently, red-black trees save more memory than Skiplist! Redis antirez also once personally respond to the author of this problem, the original see news.ycombinator.com/item?id=117…

Let me translate roughly:

Skiplist is not particularly memory intensive and can consume less memory than b-trees by adjusting the probability of nodes reaching higher levels.
The sorted set may face a large number of Zrange and Zreverange operations, and the performance of a hoptable as a singly linked list traversal is no less than that of other balanced trees.
It’s easy to implement and debug. For example, implementing O(log(N)) time complexity with ZRANK requires a simple code change.

This article is a Redis source code analysis series of blog posts, but also with the corresponding Redis Chinese annotated version, there are students who want to further study Redis, welcome star and attention. Redis Chinese Annotated version warehouse: github.com/xindoo/Redi… IO /s/1h Redis source code analysis column: zxs. IO /s/1h If you find this article useful, welcome to three links.

Jump table

Skip tables in Redis

Operation of the hop table

New jump table

Why does Redis use Skiplist instead of a balanced tree

Related Posts

Jango says Java– enumeration classes

Notes on the Art of Concurrent Programming in Java (1) The challenges of concurrent programming

Database and table solution MyCat series route distribution