Redis has five basic data structures: string, hash, set, zset, and list. But do you know which underlying data structures these five are built on? Today we’ll spend five minutes finding out. (This article covers Redis version 3.0.6.)

Dynamic string SDS

SDS is short for “Simple Dynamic String”. Strings in virtually every scenario in Redis are implemented with SDS:

  • All non-numeric keys. For example, in set msg "hello world", the key msg is an SDS.
  • Values of the string type. In the same set msg "hello world" example, the value "hello world" is also an SDS.
  • “String values” inside non-string types. For example, in RPUSH fruits "apple" "banana" "cherry", the elements "apple", "banana", and "cherry" are all SDS strings.

SDS looks like this:

  • free: how much unused space is left
  • len: the length of the string
  • buf: the character array that stores the string
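As a concrete reference, the SDS header in Redis 3.0 looks roughly like this (simplified from sds.h; the sds_from helper is hypothetical, added here just to show how the flexible array is laid out):

```c
#include <stdlib.h>
#include <string.h>

// Simplified sdshdr from Redis 3.0's sds.h. buf is a flexible array
// member, so the string bytes sit directly after the header and stay
// '\0'-terminated for compatibility with C string functions.
struct sdshdr {
    unsigned int len;  // bytes actually used in buf
    unsigned int free; // unused bytes still available in buf
    char buf[];        // the character data itself
};

// Hypothetical helper (not part of Redis) building an SDS from a C string.
static struct sdshdr *sds_from(const char *init) {
    size_t n = strlen(init);
    struct sdshdr *s = malloc(sizeof(struct sdshdr) + n + 1);
    s->len = (unsigned int)n;
    s->free = 0;
    memcpy(s->buf, init, n + 1);
    return s;
}
```

With this layout, reading the length is an O(1) field access instead of an O(n) strlen, and buf can still be passed to C string functions.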

Space preallocation

To reduce the number of memory reallocations caused by modifying a string, SDS adopts an “allocate once, enough for a while” strategy:

  • If the modified SDS is shorter than 1MB, allocate an extra len bytes of unused space (so free equals len)
  • If the modified SDS is 1MB or longer, allocate an extra 1MB of unused space
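The rule above can be sketched as a small helper (this mirrors the greedy growth in sdsMakeRoomFor, where SDS_MAX_PREALLOC is defined as 1MB; the function name here is ours, not Redis’s):

```c
#define SDS_MAX_PREALLOC (1024*1024) // 1MB, as defined in Redis sds.h

// Given the string length after a modification, return the total buf
// capacity to allocate: below 1MB, double it (free == len); at or
// above 1MB, add a flat extra 1MB.
static unsigned long sds_new_capacity(unsigned long newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;
    return newlen + SDS_MAX_PREALLOC;
}
```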

Lazy space release

To avoid reallocating memory when a string is shortened, SDS does not immediately release the freed space when the data shrinks; it records it in free for later reuse.

int

The int encoding covers the integers in Redis, including integers written inside quotes, such as "123".

Doubly linked list

It looks like this:

The diagram is divided into two parts: an orange “coordinating” part and a blue “implementation” part.

The “coordinating” part:

  • head: points to the head of the doubly linked list
  • tail: points to the tail of the doubly linked list
  • len: the length of the doubly linked list

The “implementation” part: a plain doubly linked list at a glance, with prev and next pointers on each node.

It consists of list and listNode data structures.
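Simplified from adlist.h (the real list struct also carries dup/free/match function pointers, omitted here), the two structures look roughly like this:

```c
// The blue "implementation" part: one node of the doubly linked list.
typedef struct listNode {
    struct listNode *prev; // previous node, NULL at the head
    struct listNode *next; // next node, NULL at the tail
    void *value;           // payload pointer
} listNode;

// The orange "coordinating" part: bookkeeping for the whole list.
typedef struct list {
    listNode *head;        // first node
    listNode *tail;        // last node
    unsigned long len;     // number of nodes, read in O(1) by LLEN
} list;
```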

ziplist

The compressed list. It is one of the underlying implementations of Redis’ list keys and hash keys, and was designed to save memory. Like arrays in most languages, it is made up of contiguous blocks of memory; that contiguity saves memory by reducing fragmentation and pointer overhead.

The structure of an entry looks like this:

Traversing elements

First, find the element at the tail of the list:

Then walk backwards element by element, using each entry’s previous_entry_length attribute:

Cascading updates

Looking again at the structure of an entry, the previous_entry_length field is either 1 byte or 5 bytes long:

  • If the previous node is shorter than 254 bytes, previous_entry_length takes 1 byte
  • If the previous node is 254 bytes or longer, previous_entry_length takes 5 bytes
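The rule can be expressed as a tiny helper (a sketch for illustration, not Redis code):

```c
// Bytes occupied by previous_entry_length for a previous node of the
// given length: 1 byte below 254, 5 bytes (a 0xFE marker byte plus a
// 4-byte length) at 254 and above.
static int prevlen_bytes(unsigned int prev_node_len) {
    return (prev_node_len < 254) ? 1 : 5;
}
```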

For example, suppose a compressed list contains consecutive nodes of 250 to 253 bytes each, and a new node of 254 bytes or more is inserted in front of them. Then:

The program has to keep reallocating space for the compressed list, node after node, all the way to the end.

Besides insertions, deletions can also trigger “cascading updates”. Suppose all entries in the ziplist are 250 to 253 bytes long; “big” denotes a node of 254 bytes or more, and “small” a node of fewer than 254 bytes.

Hash table

The hash table is a little more complicated. There are two common ways to build one: open addressing and separate chaining. Redis’ hash table uses separate chaining.

The overall structure is as follows:

It is also divided into two parts: the orange part on the left and the blue part on the right, again a “coordinating” vs. “implementation” relationship. The actual hash table implementation is all in the blue part, so let’s start there:

This, too, splits into a “coordinating” part and an “implementation” part, left and right.

The right-hand part is easy to understand: it is the usual separate-chaining hash table. The array cells are buckets; different keys normally land in different buckets, and when keys collide, a linked list strings the conflicting entries together.

The process of creating a key:

If keys collide:

rehash

Take a look at the orange “coordinating” section on the left of the hash table’s overall diagram, where there are two key properties: ht and rehashidx. ht is an array with exactly two elements, ht[0] and ht[1]: ht[0] stores the hash table Redis normally uses, while ht[1] works together with rehashidx when the hash table is rehashed.

Rehash refers to the process of recalculating the hash and index values of keys, and then rearranging the key-value pairs.

Load factor = ht[0].used / ht[0].size.
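Simplified from dict.h (the type/privdata and iterator fields are omitted), the structures and the load factor look roughly like this:

```c
// One key-value pair; next chains colliding keys in the same bucket.
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

// One hash table: an array of buckets plus bookkeeping.
typedef struct dictht {
    dictEntry **table;      // bucket array
    unsigned long size;     // number of buckets, always a power of 2
    unsigned long sizemask; // size - 1; bucket index = hash & sizemask
    unsigned long used;     // number of entries stored
} dictht;

// The dictionary: ht[0] is the live table, ht[1] exists only while
// rehashing; rehashidx is -1 when no rehash is in progress.
typedef struct dict {
    dictht ht[2];
    long rehashidx;
} dict;

// load factor = ht[0].used / ht[0].size
static double load_factor(const dict *d) {
    return (double)d->ht[0].used / (double)d->ht[0].size;
}
```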

Expansion and contraction criteria

Expansion:

  • If the server is not executing BGSAVE or BGREWRITEAOF, expand when the load factor is greater than or equal to 1.
  • If the server is executing BGSAVE or BGREWRITEAOF, expand when the load factor is greater than or equal to 5.

Contraction:

  • When the load factor drops below 0.1, the program automatically shrinks the hash table.
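The two expansion thresholds can be sketched as a predicate (this mirrors the logic of _dictExpandIfNeeded, where the forced ratio is 5; bgsave_running here is a stand-in for “a BGSAVE/BGREWRITEAOF child process exists”):

```c
// Decide whether the hash table should grow. While a child process
// exists, copy-on-write makes resizing expensive, so Redis raises the
// load-factor threshold from 1 to 5.
static int should_expand(unsigned long used, unsigned long size,
                         int bgsave_running) {
    double lf = (double)used / (double)size;
    if (!bgsave_running) return lf >= 1.0;
    return lf >= 5.0;
}
```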

Expansion and contraction sizes

Expansion:

  • The new size is the first 2^n that is greater than or equal to ht[0].used * 2.

Contraction:

  • The new size is the first 2^n that is greater than or equal to ht[0].used.

(The following section is a detailed analysis, so you can skip ahead to the expansion steps.) I had my doubts about shrinking. The table shrinks when the load factor is below 0.1, so with 4 elements that requires a table size greater than 40. If a hash table’s size is greater than 40 but it holds only 4 elements (ht[0].used is 4), what size does it shrink to?

I thought about it for a moment: based on the rule above, it should be 4. But if it shrinks to 4, then used equals size, the load factor is 1, and wouldn’t that immediately trigger an expansion again? Let’s open the source code and see:

Shrink specific functions:

int dictResize(dict *d)     // Shrink dictionary d
{
    int minimal;

    // If dict_can_resize is 0, or a rehash is already in progress,
    // return the error flag DICT_ERR
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;

    minimal = d->ht[0].used;            // Use the number of existing nodes as minimal
    if (minimal < DICT_HT_INITIAL_SIZE) // DICT_HT_INITIAL_SIZE is 4
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);      // Resize dictionary d to minimal
}
int dictExpand(dict *d, unsigned long size)     // Resize or create a hash table for dictionary d based on size
{
    dictht n;
    unsigned long realsize = _dictNextPower(size);  // Get the closest power of 2 >= size

    if (dictIsRehashing(d) || d->ht[0].used > size) // Rehashing, or size too small: return the error flag
        return DICT_ERR;

    if (realsize == d->ht[0].size) return DICT_ERR; // Return an error flag if realsize equals the current size
    /* Allocate the new hash table and initialize all pointers to NULL */
    // Initialize the members of the new hash table
    n.size = realsize;
    n.sizemask = realsize - 1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {   // If ht[0] is empty, set the new hash table n as ht[0]
        d->ht[0] = n;
        return DICT_OK;
    }

    d->ht[1] = n;           // If ht[0] is not empty, a rehash is required
    d->rehashidx = 0;       // Set rehashidx to 0 and start incremental rehashing
    return DICT_OK;
}
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;     // DICT_HT_INITIAL_SIZE is 4

    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}

As you can see from the code, the shrink target is clamped to a minimum of 4, and if the table is already that size, the call doesn’t just do nothing: it even returns an error flag. (😝)

Let’s go back and look at the setup again: can the scenario even occur? Hash table expansions always double the size, with a minimum of 4: 4 ==> 8 ==> 16 ==> 32 ==> 64 ==> 128 ...

That is to say, there is no size just above 40: the first size greater than 40 is 64. But at size 64, 64 x 0.1 (the shrink threshold) = 6.4, which means the table shrinks as soon as it drops to 6 elements. What does it shrink to? 8, the first 2^n greater than or equal to 6. And after that, by the time the element count falls toward 4, no further shrinking happens. Therefore a hash table with size greater than 40 holding only 4 elements cannot exist.

Expansion step

Contraction steps

Incremental rehash

In the two GIFs for the “expansion steps” and “contraction steps”, the fourth step of each, “recompute the hashes of the data in ht[0] and rehash it into ht[1]”, is not completed in one go. It is split into N small steps and carried out progressively. A hash may hold tens of millions or even hundreds of millions of keys; after all, Redis can store up to 2^32 - 1 key-value pairs per hash (more than 4 billion). Rehashing all of those keys at once could make the server stop serving for a while, since computing that many hashes takes time ((#^.^#)).

So the rehash of a hash table is performed repeatedly and progressively.

Progressive rehashing is closely tied to rehashidx in the orange “coordinating” section on the left of the image below:

  • The value of rehashidx is the bucket position currently being rehashed
  • When rehashidx equals -1, no rehash is in progress

While a rehash is in progress, every add, delete, update, and lookup on the hash table, in addition to doing its normal work, also moves key-value pairs of ht[0] to ht[1], one bucket at a time.
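A toy sketch of one such step over integer-keyed tables (Redis’ dictRehash does the same per bucket, with real hashing and a used counter; the names here are ours, for illustration only):

```c
#include <stddef.h>

typedef struct entry { unsigned long key; struct entry *next; } entry;

typedef struct toydict {
    entry **table[2];      // table[0] is ht[0], table[1] is ht[1]
    unsigned long size[2]; // bucket counts, powers of 2
    long rehashidx;        // next ht[0] bucket to move; -1 = not rehashing
} toydict;

// Move one non-empty bucket from ht[0] to ht[1], as one incremental
// rehash step would; the caller swaps the tables once rehashidx hits -1.
static void rehash_step(toydict *d) {
    if (d->rehashidx < 0) return;
    while ((unsigned long)d->rehashidx < d->size[0] &&
           d->table[0][d->rehashidx] == NULL)
        d->rehashidx++;                       // skip empty buckets
    if ((unsigned long)d->rehashidx >= d->size[0]) {
        d->rehashidx = -1;                    // every bucket has been moved
        return;
    }
    entry *e = d->table[0][d->rehashidx];
    while (e) {                               // re-chain each entry into ht[1]
        entry *next = e->next;
        unsigned long idx = e->key & (d->size[1] - 1); // hash & sizemask
        e->next = d->table[1][idx];
        d->table[1][idx] = e;
        e = next;
    }
    d->table[0][d->rehashidx] = NULL;
    d->rehashidx++;
}
```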

The following uses the expansion procedure as an example:

intset

The integer set is one of the underlying implementations of the set key.
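Simplified from intset.h, the structure looks roughly like this (in Redis the encoding constants equal the element size in bytes; the intset_from_sorted16 helper is ours, not the Redis API):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Elements share one encoding (2, 4, or 8 bytes each) and are kept
// sorted and unique; adding a value that doesn't fit the current
// encoding upgrades the whole array.
typedef struct intset {
    uint32_t encoding; // sizeof(int16_t), sizeof(int32_t), or sizeof(int64_t)
    uint32_t length;   // number of elements
    int8_t contents[]; // the elements, ascending, no duplicates
} intset;

// Hypothetical helper building an int16-encoded set from an already
// sorted, duplicate-free array.
static intset *intset_from_sorted16(const int16_t *vals, uint32_t n) {
    intset *is = malloc(sizeof(intset) + n * sizeof(int16_t));
    is->encoding = sizeof(int16_t);
    is->length = n;
    memcpy(is->contents, vals, n * sizeof(int16_t));
    return is;
}
```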

Skip list

This data structure looks like this:

Redis abstracts the skip list as follows:

Look at this diagram: the left side is the “coordinating” part, the right side the “implementation” part. The coordinating part has the following fields:

  • header: points to the head of the skip list
  • tail: points to the tail of the skip list
  • level: the number of levels of the tallest node
  • length: the length of the skip list

Notes on the implementation part:

  • header: a sentinel node of the list; it records no payload data.
  • The bottom level is a doubly linked list
  • Nodes are kept in sorted order (by score)
  • o1, o2, and o3 are the node members; each is a pointer to an SDS value.
  • The maximum height is 32. Every time a new node is created, the program randomly generates a value between 1 and 32 as the size of its level array; this is the node’s “height”.
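Simplified from Redis 3.0 (redis.h and t_zset.c), the structures and the random-height rule look roughly like this:

```c
#include <stdlib.h>

#define ZSKIPLIST_MAXLEVEL 32   // maximum node height
#define ZSKIPLIST_P 0.25        // probability of gaining an extra level

typedef struct zskiplistNode {
    void *obj;                      // member; in Redis, points to an SDS value
    double score;                   // the sort key
    struct zskiplistNode *backward; // previous node on the bottom level
    struct zskiplistLevel {
        struct zskiplistNode *forward; // next node on this level
        unsigned int span;             // bottom-level nodes this hop skips
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    zskiplistNode *header, *tail;   // header is the sentinel node
    unsigned long length;           // number of real nodes
    int level;                      // height of the tallest node
} zskiplist;

// Random height, as in zslRandomLevel: each extra level is kept with
// probability 1/4, capped at 32.
static int zsl_random_level(void) {
    int level = 1;
    while ((rand() & 0xFFFF) < (int)(ZSKIPLIST_P * 0xFFFF) &&
           level < ZSKIPLIST_MAXLEVEL)
        level += 1;
    return level;
}
```

The power-law height distribution is what gives lookups their expected O(log N) cost: each level roughly quarters the number of nodes.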

Implementation of the five data structures in Redis

Redis object

Redis does not use the data structures above directly to implement the key-value database. Instead, every key and value is wrapped in an object, and the object refers indirectly to one of the concrete underlying data structures mentioned above.

The structure is as follows:
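Simplified from Redis 3.0’s redis.h, the object header looks roughly like this:

```c
// type is one of the five user-facing types (REDIS_STRING, REDIS_LIST,
// REDIS_HASH, REDIS_SET, REDIS_ZSET in 3.0); encoding names the
// underlying structure (int, embstr, raw, ziplist, hashtable, intset,
// skiplist, ...); ptr points at that structure.
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;   // LRU clock, used for eviction
    int refcount;      // reference count; shared integers reuse one object
    void *ptr;         // the concrete underlying data structure
} robj;
```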

string

Here, both embstr and raw are built on the SDS dynamic string. The only difference is allocation: raw allocates the redisObject and the SDS separately, while embstr allocates the redisObject and the SDS together in one contiguous block.

list

hash

set

zset

