What is a jump list

Skiplist is also essentially a lookup structure. The general search problem is divided into two categories: one is based on various balance trees, one is based on hash table. Skiplist, however, is special and doesn’t fit into either of these categories.

For example, find 23

  • 23 is first compared to 7, then to 19, bigger than both of them, and then going backwards.
  • But 23 is smaller than 26 when compared to 26, so go back to the following linked list (the original linked list) and compare to 22.
  • Twenty-three is larger than twenty-two, so go back to twenty-six along the pointer below. 23 is smaller than 26, indicating that the data 23 does not exist in the original list, and its insertion position should be between 22 and 26.

It’s kind of like dichotomy actually

What about insertion

Skiplist avoids this problem by not requiring a strict relationship between the number of nodes in the next two levels of the linked list. Instead, it assigns a random level to each node. For example, if a node has a random number of layers 3, then it is linked to the list of layers 1 through 3.

This reduces the complexity of the insert operation. In fact, this is an important feature of Skiplist, which makes it significantly better at insert performance than balanced tree solutions. We’ll come back to that later.

Stochastic calculation method

  • First, each node must have a level 1 pointer (each node is in the level 1 list).
  • If a node has a pointer to level I (I >=1) (that is, the node is already in the list from level 1 to level I), then the probability that it has a pointer to level (I +1) is p.
  • The maximum number of layers of a node cannot exceed a maximum value, denoted as MaxLevel.

Skiplist and balance tree, hash table comparison

  • The elements of Skiplist and various balanced trees (AVL, red-black, etc.) are ordered, while hash tables are not. Therefore, the hash table can only do a single key lookup, not a range lookup. Range lookup refers to finding all nodes whose size is between two specified values.
  • Balanced trees are more complex than skiplist operations when it comes to range lookups. In the equilibrium tree, after we find the low value in the specified range, we need to continue to find other nodes that do not exceed the high value in a middle-order traversal order. The middle-order traversal here is not easy to achieve without some modification of the equilibrium tree. Range lookups on Skiplist are as simple as a few steps through the level 1 list after finding small values.
  • Balanced tree insertions and deletions can trigger complex logic adjustments to subtrees, whereas Skiplist insertions and deletions require simple and fast changes to adjacent nodes’ Pointers.
  • Skiplist is more flexible than balanced trees in terms of memory footprint. In general, a balanced tree contains 2 Pointers per node (one to the left and right subtrees), whereas skiplist contains an average of 1/(1-p) per node, depending on the size of the parameter p. If p=1/4, as in Redis, then on average each node contains 1.33 Pointers, which is better than a balanced tree.
  • The time complexity of finding a single key, skiplist, and balanced tree is O(log n), roughly the same; On the premise of keeping low probability of hash value conflict, the search time complexity of hash table is close to O(1), and the performance is higher. So most of the Map or Dictionary structures we use are based on hash tables.
  • Skiplist is much easier to implement than a balanced tree.

implementation

In Redis, skiplist is used to implement a data structure that is exposed externally: the sorted set. To be precise, the sorted set underlying uses not only skiplist, but also Ziplist and dict.

Each element in a sorted set presents three main attributes:

  • The data itself (we saved the name as data in the previous example).
  • Each data corresponds to a score.
  • Each data is ranked according to the score size and the dictionary of the data itself. Can be in positive or reverse order.

In fact, the sorted set implementation in Redis looks like this:

  • When the data is small, the sorted set is implemented by a Ziplist.
  • When there is a lot of data, the sorted set is implemented by a dict plus a skiplist. Simply put, dict is used to query data for scores, while Skiplist is used to query data (possibly range lookup) for scores.

In this case, we’ll discuss the sorted set in more detail in the next chapter. Now let’s focus on the sorted set relationship with Skiplist:

  • Zscore queries are not provided by Skiplist, but by Dict.
  • Skiplist has been extended in Redis to support ranking, making it quick to look up data by rank, or easy to rank data by score. Also, the time complexity is order log n according to the ranking lookup.
  • Zrevrange queries, based on ranking data, are provided by the expanded Skiplist.
  • Zrevrank looks up points from dict data, then takes them to Skiplist, and gets a ranking.

In summary, skiplist in Redis differs from the classic skiplist introduced earlier in the following ways:

  • Scores allow repetition, meaning skiplist keys allow repetition. This is not allowed in the classic Skiplist introduced at the beginning.
  • When you compare, you compare not only the scores (skiplist’s equivalent of keys), but also the data itself. In the Skiplist implementation of Redis, data is uniquely identified by its own content, not by a key. In addition, when multiple elements have the same score, you also need to sort the data according to the content of the dictionary.
  • The level 1 list is not a one-way list, but a bidirectional list. This is to facilitate retrieving a range of elements in reverse order.
  • You can easily calculate the rank of each element in Skiplist.

Please add the following text and link at the end of the article: This article is participating in the “Gold Digging Booklet free learning!” Event, click to view details of the event