preface

This article introduces one of the important data structures of the Redis database: the dictionary. What is a dictionary? How does Redis implement dictionaries? What are the basic operations and applications of dictionaries? The rest of the article works through these three questions step by step.

What is a dictionary

A dictionary, also known as a Symbol table, associative array, or map, is an abstract data structure used to hold key-value pairs.

In a dictionary, a key can be associated with (or mapped to) a value. Such an associated key and value together are called a key-value pair.

Each key in the dictionary is unique, and programs can look up the value associated with a key, update a value by key, delete an entire key-value pair by key, and so on.

Dictionaries are built into many high-level programming languages as a native data structure, but C, the language Redis is written in, has no such built-in structure, so Redis builds its own dictionary implementation.

Dictionaries are widely used in Redis. For example, a Redis database uses a dictionary as its underlying storage, and the add, delete, update, and lookup operations on the database are all built on dictionary operations.

Here’s an example: When we execute a command:

127.0.0.1:6379> set msg "hello world"
OK

This creates a key-value pair in the database with the key "msg" and the value "hello world", and that key-value pair is stored in the dictionary that represents the database.

In addition to being used to represent databases, dictionaries are also one of the underlying implementations of hash keys. Redis uses dictionaries as the underlying implementation of hash keys when a hash key contains multiple keys and values, or when the elements in a key-value pair are long strings.

For example, website is a hash key containing 10086 key-value pairs. The keys of the hash are the names of some databases, and the values are the homepage URLs of those databases:

127.0.0.1:6379> hlen website
(integer) 10086
127.0.0.1:6379> hgetall website
1) "Redis"
2) "Redis.io"
3) "MariaDB"
4) "MariaDB.org"
5) "MongoDB"
6) "MongoDB.org"
#...

The underlying implementation of the website key is a dictionary containing 10086 key-value pairs, such as:

  • The key of the key-value pair is "Redis" and the value is "Redis.io";
  • The key of the key-value pair is "MariaDB" and the value is "MariaDB.org";
  • The key of the key-value pair is "MongoDB" and the value is "MongoDB.org".

How to design a dictionary

As mentioned above, C has no dictionary data structure, so Redis has to design its own. The entire Redis database is stored in dictionaries, and according to the characteristics of the Redis database, the dictionary needs the following properties:

  1. It can store massive amounts of data. Key-value pairs are mapping relationships, and the value associated with a key can be read or inserted with O(1) time complexity.
  2. The key in a key-value pair can be a string, integer, floating point number, and so on, and keys are unique. For example, running set test "hello world" creates a key test of type string; if test already exists in the database, the new value overwrites the old one.
  3. The value in a key-value pair can be a String, Hash, List, Set, or SortedSet.

Next, we will explain how to design the data structure of dictionary step by step based on the above three characteristics.

Feature 1: Massive data storage, O(1) complexity

Since we are going to implement massive data storage, the first field in our dictionary data structure should be data, which points to the memory address where the data is stored.

"Values can be read from massive data with O(1) time complexity." The C data structure that realizes this property is the array: it can store massive data and read a value by subscript in O(1) time. So we tentatively assume that the first field, data, should be an array. Next, let's look at how C arrays work and why they are so efficient.

Principle of C array

A finite collection of objects of the same type is called an array, and the objects that make up the array are called its elements. The numbers used to distinguish the elements are called subscripts, and the element type can be an array, character, pointer, structure, and so on.

So how does a C array read data in O(1) time?

Let’s start by declaring an array variable:

int a[10];

In other words, array a is a collection of 10 objects of the same type stored in a contiguous chunk of memory, and a points to its first address.

When you operate on elements of array a, C finds the corresponding memory address from the subscript and then operates on that memory. For example, when we read the value of a[9], C actually converts it into the form *(a+9); a[9] and *(a+9) are equivalent. Taking the address of both sides, we get &a[9] == a+9: to obtain the address of a[9], we offset the first address of array a by 9 elements. In other words, arrays read values by subscript using the head pointer plus an offset. As shown below:

When there is a large amount of data in an array, head pointer + offset can locate the memory address of any element with O(1) time complexity, and then perform the corresponding operation. This feature of C arrays makes them an obvious choice for storing massive data while keeping reads fast.

According to this introduction, a C array can quickly locate elements by subscript and, as long as memory is sufficient, can store massive data, basically meeting the first feature.

Therefore, the schematic diagram of dictionary data structure satisfying feature 1 can be designed as follows:

Feature 2: The key is unique and can be a string, integer, floating point, etc

From the previous array introduction, we can see that a "subscript" is the number of an element in the array, which can only be an integer. In feature 2, however, the key in a key-value pair can be a string, integer, floating point number, etc., which cannot be used directly as a subscript. The key therefore needs special processing, called hashing.

The Hash function

The Hash function converts an input of any length into a fixed-length output (the Hash value) using a Hash algorithm. In other words, a Hash function can convert different keys into integer data. Hash functions generally have the following characteristics:

  1. The same input always hashes to the same output.
  2. Different inputs usually produce different Hash values, but may occasionally produce the same output (a collision).

Therefore, a good Hash algorithm produces output values with a strong random distribution. For example, the "times 33" Hash function published by Daniel J. Bernstein on comp.lang.c uses the core recurrence hash(i) = hash(i-1) * 33 + str[i]. It is one of the best-known hash functions for strings because it is fast to evaluate and its output values are well distributed.

In practice, ready-made open source Hash algorithms are usually used. For example, the Redis client uses the "times 33" Hash function to compute the Hash value of strings, while the Redis server uses the SipHash algorithm. Its function is similar to the client-side Hash function, with the advantage that the Hash values computed for regular keys still have a strong random distribution, but the algorithm is more complex. We therefore use the client-side Hash function as an example to understand the Hash algorithm.

    static unsigned int dictGenHashFunction(const unsigned char *buf, int len) {
        unsigned int hash = 5381;
        while (len--)
            hash = ((hash << 5) + hash) + (*buf++); /* hash * 33 + c */
        return hash;
    }

Note that hash << 5 is hash * 32, so (hash << 5) + hash is hash * 33.

The main function of dictGenHashFunction is to take a string of arbitrary length as input and return unsigned integer data after Hash calculation. We can therefore use the Hash function to convert any input key into integer data that can serve as an array subscript.

The second feature of the dictionary is “keys can be strings, integers, floating-point, etc.” The Hash function only converts strings to integers. What happens when a key is of a non-string type?

The answer is simple: the key's type is perceived by the client; what the Redis server receives from the client is actually always a string.

Although the Hash function can convert any input key into integer output, it introduces a new problem: the Hash value of a key is very large and obviously cannot be used directly as an array subscript. If subscripts were that large, the array would occupy far too much memory. We need to add two fields to the dictionary data structure: ① a size field recording the total capacity of the array; ② a used field recording the amount of data stored. After these two fields are added, the dictionary data structure looks like this:

How does a large Hash value map to a small array subscript? The simplest way is to take the Hash value modulo the array size, which always yields a value smaller than the size of the array; this value can then be used as the subscript. We call this value the index value of the key in the dictionary, that is, "index value == array subscript value". The problem with this method, however, is that it tends to cause Hash conflicts.

Hash conflict

According to the earlier introduction to hashing, the Hash values of different keys have a strong random distribution, but there is a small probability that two different keys produce the same value. In that case the computed index values of the keys are the same; in other words, two different keys map to the same array subscript. This is called a conflict between these keys.

To resolve Hash conflicts, an array element should store not only the "value" of a key-value pair but also its "key", which is used to determine whether the element holds the key currently being looked up, and a next pointer, which strings conflicting key-value pairs into a singly linked list. With this, the fields of the array elements are defined, and the dictionary data structure looks like this:

When looking up a value by key, the steps are as follows:

  1. The key is converted to an index value through Hash and modulo operations, and the corresponding element is found based on the index value.
  2. Compare the key stored in that element with the lookup key. If they are equal, return the value in the element; otherwise, if the next pointer is non-NULL, follow it to the next element and compare again; if next is NULL, the key does not exist in the dictionary, and NULL is returned.

With this design, the first half of the second feature is implemented, and so is the "key is unique" property: the lookup above is performed every time a key-value pair is inserted into the dictionary. If the key already exists, the value in the element is modified; otherwise the insert is performed.

Feature 3: The value in a key-value pair can be String, Hash, List, or Set

How can we ensure that the value in a key-value pair can be of any type? It is actually very simple: make the val field in the array element a pointer that can point to the memory holding any type of value.

At this point, we have completed the design of the dictionary data structure and understood the reasoning behind it. Next, let's look at how the dictionary is actually implemented in Redis.

Redis dictionary implementation

The Redis dictionary uses Hash tables as its underlying implementation. A Hash table can have multiple Hash table nodes, and each node holds one key-value pair in the dictionary. Next we introduce the Hash table, the Hash table node, and the dictionary in turn, referring to the source files dict.c and dict.h.

The Hash table

typedef struct dictht {
    dictEntry **table;      /* pointer array for storing key-value pairs */
    unsigned long size;     /* table array size */
    unsigned long sizemask; /* mask = size - 1 */
    unsigned long used;     /* number of elements stored in the table array, including nodes on next chains */
} dictht;

The structure above is the Hash table structure; each dictionary has two of them, used for rehash. We will come back to that later.

The Hash table structure takes 32 bytes (8 + 8 + 8 + 8). The table, size and used fields are straightforward:

  • Table: array that stores key-value pairs. The elements in the array refer to the structure of dictEntry. Each dictEntry contains key-value pairs.
  • Size: total size of the table array;
  • Used: indicates the number of stored key-value pairs in the table array.

Now let's look at the sizemask field, which is used to calculate a key's index value. Its value equals size - 1, and it was introduced to speed up the conversion of Hash values into array subscripts.

Redis uses the sizemask field to speed up the conversion of Hash values into array subscripts as follows:

  1. The initial Hash table array capacity is 4. As more key-value pairs are stored, the Hash table needs to expand, and each expansion doubles the current capacity, so size is always a power of two: 4, 8, 16, 32... Accordingly, sizemask can only be 3, 7, 15, 31..., whose binary representations are 11, 111, 1111, 11111...: every bit of the mask is 1.
  2. Index value = Hash value & mask value. In the Redis source code: idx = hash & d->ht[table].sizemask. Because size is a power of two, the result is the same as taking the Hash value modulo the Hash table size, and the bitwise AND is much faster than the modulo operation.

The empty Hash table is shown as follows:

Hash table node

Elements in the Hash table are encapsulated by the dictEntry structure, which is mainly used to store key-value pairs. The structure is as follows:

typedef struct dictEntry {
    void *key;               /* stores the key */
    union {                  /* only one member can be stored at a time */
        void *val;           /* val, used in db.dict */
        uint64_t u64;
        int64_t s64;         /* db.expires stores an expiration time here */
        double d;
    } v;                     /* the value, which is a union */
    struct dictEntry *next;  /* on Hash conflict, points to the conflicting element, forming a singly linked list */
} dictEntry;

The dictEntry member has the following properties:

  • key is a pointer to the key of the key-value pair; its type is void*, which means it can point to any type, implementing pseudo-generics for keys.
  • v is a union representing the value of the key-value pair. val, u64, s64 and d are each 8 bytes in size, and since val is void*, it can point to any type, implementing pseudo-generics for values. Each dictEntry stores only one member of v at a time.
  • next is a pointer to the next node, used to resolve Hash conflicts (chained addressing).

Key points:

  1. The dictEntry structure takes 24 bytes (8 + 8 + 8; the size of a union is the size of its largest member).

  2. Union V uses different fields in different scenarios, for example:

    • The val field stores the values of all key-value pairs in the Redis database and can point to values of different types;
    • The dictionary recording key expiration times (db.expires) uses the s64 field;
  3. When a Hash conflict occurs, the next field points to the conflicting element, using head insertion to form a singly linked list.

Suppose we add three key-value pairs in the order k2=>v2, k1=>v1, k3=>v3, and that k1 and k2 Hash-conflict; the three pairs are then stored in the dictionary as shown in the diagram below:

Note: table is an array of dictEntry* pointers.

The dictionary

In addition to the two structures described above, the Hash table and the Hash table node, the Redis dictionary implementation also encapsulates an outermost data structure called the dictionary, which mainly wraps the Hash tables; when the dictionary needs to perform special operations, it uses the auxiliary fields in this structure. The specific structure is as follows:

typedef struct dict {
    dictType *type; /* the operation functions for this specific dictionary */
    void *privdata; /* private data the dictionary depends on */
    dictht ht[2]; /* Hash tables storing the key-value pairs; there are two, used together for rehash */
    long rehashidx; /* rehash flag. -1 (the default) means no rehash is in progress; any other value means a rehash is in progress, and the stored value is the index in ht[0] that the rehash has reached */
    unsigned long iterators; /* number of iterators currently running */
} dict;

The dictionary structure takes 96 bytes (8 + 8 + 32*2 + 8 + 8). The type field points to a dictType structure, which contains function pointers for dictionary operations, as follows:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key); /* the dictionary's hash function */
    void *(*keyDup)(void *privdata, const void *key); /* key copy function */
    void *(*valDup)(void *privdata, const void *obj); /* value copy function */
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); /* key comparison function */
    void (*keyDestructor)(void *privdata, void *key); /* key destructor */
    void (*valDestructor)(void *privdata, void *obj); /* value destructor */
} dictType;

The Redis dictionary is used in many places besides the main database's key-value store. For example, Redis Sentinel uses dictionaries to store and manage all master and slave nodes, and when the value of a key-value pair is of the Hash type, a dictionary is also used to store that Hash value. Key-value pairs differ between applications, and the dictType structure is a set of abstract operation functions that lets dictionaries take these various forms.

The meanings of other fields are as follows:

  • privdata: private data used together with the functions pointed to by the type field;

  • ht field: an array of size 2 whose element type is dictht. Although there are two elements, generally only ht[0] is used;

  • rehashidx field: indicates whether the dictionary is rehashing. When no rehash is in progress the value is -1; otherwise it records the subscript of the element in ht[0] that the rehash has reached;

  • iterators field: records the number of safe iterators currently running. Rehash operations are paused while a safe iterator is bound to the dictionary. Iterators are used in many places in Redis: for example, the keys command creates a safe iterator, which increments iterators by 1 when created and decrements it by 1 when released, while the sort command creates a normal iterator, which leaves the field unchanged.

Thus, a complete Redis dictionary data structure looks like this:

Basic operation

Earlier we explained the concept of a dictionary, how to design one, and how Redis implements it. Next we will execute commands and read the source code to see how a Redis dictionary is initialized and how elements are added, modified, looked up, and deleted.

Dictionary initialization

When redis-server starts, it first initializes an empty dictionary to store the key-value pairs of the entire database, calling the dictCreate method.

/* dict-benchmark [count] */
int main(int argc, char **argv) {
    long j;
    long long start, elapsed;
    dict *dict = dictCreate(&BenchmarkDictType,NULL);
    ...

The source code of dictCreate is shown below. The method first requests a block of memory for the dictionary structure, then calls _dictInit to initialize the structure's values.

/* Create a new Hash table */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d)); // Allocate 96 bytes of memory for the dictionary

    _dictInit(d,type,privDataPtr); // Struct initialization value
    return d;
}

The _dictInit method initializes the structure's values: it assigns the type and privdata fields, sets the rehashidx field to -1 (indicating no rehash is in progress), sets the iterators field to 0, and calls the _dictReset method to initialize the two Hash tables in the dictionary.

/* Initialize the Hash table */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->iterators = 0;
    return DICT_OK;
}

The _dictReset method for initializing values of the Hash table is shown below. This method sets the individual members of the Hash table structure to default values.

static void _dictReset(dictht *ht) {
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

The dictCreate function initializes an empty dictionary by applying for space, calling the _dictInit function and assigning initial values to each field of the dictionary. After initialization, the memory usage of a dictionary is as follows:

Add elements

From the command line, we add a key-value pair as follows:

127.0.0.1:6379> set k1 v1
OK

After receiving the command, the server executes void setKey(redisDb *db, robj *key, robj *val) to add the element. As mentioned earlier, one of the dictionary's features is that each key must be unique, so adding an element requires the following steps: first check whether the key exists, modify it if it does, and otherwise add the key-value pair. setKey calls lookupKeyWrite to query whether the key exists; if it does, dbOverwrite is called to modify the key-value pair, and otherwise dbAdd is called to add the element.

void setKey(redisDb *db, robj *key, robj *val) {
    if (lookupKeyWrite(db,key) == NULL) { /* check whether the key exists */
        dbAdd(db,key,val); /* if not, add the key */
    } else {
        dbOverwrite(db,key,val); /* if it exists, overwrite it */
    }
    incrRefCount(val); /* increment the reference count of the added value */
    removeExpire(db,key); /* remove any expiration time set on the key */
    signalModifiedKey(db,key); /* send a key-change notification */
}

Let’s look at the dbAdd method.

void dbAdd(redisDb *db, robj *key, robj *val) {
    sds copy = sdsdup(key->ptr); /* copy the key name */
    int retval = dictAdd(db->dict, copy, val); /* try to add the key-value pair */

    serverAssertWithInfo(NULL,key,retval == DICT_OK); /* abort if the key already exists */
    if (val->type == OBJ_LIST ||
        val->type == OBJ_ZSET)
        signalKeyAsReady(db, key); /* add the key to the server.ready_keys list */
    if (server.cluster_enabled) slotToKeyAdd(key); /* in cluster mode, record the key's slot */
}

We can see that dbAdd actually calls the dictAdd method to insert the key-value pair, so we will focus on that method.

The dictAdd source code is as follows:

/* Add an element to the target Hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL); /* add the key; returns NULL if the key already exists, otherwise adds a new node for the key and returns it */

    if (!entry) return DICT_ERR; /* return an error if the key exists */
    dictSetVal(d, entry, val); /* set the value */
    return DICT_OK;
}

The logic of the dictAdd function is simple:

  1. Call the dictAddRaw function to add the key: it returns NULL if the key already exists in the dictionary, and otherwise adds the key to the Hash table and returns the newly added Hash node;
  2. Set the value on the returned new node, that is, update its val field.

Let's look at how the dictAddRaw function adds or finds keys; the code is as follows:

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing) /* takes the dictionary, the key, and an address to receive an existing Hash table node */
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d); /* if the dictionary is rehashing, perform one rehash step */

    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1) /* if the key is found, return -1 and save the old node into existing; otherwise return the index for the new node, expanding the Hash table first if its capacity is insufficient */
        return NULL;

    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0]; /* insert into ht[1] if rehashing, otherwise into ht[0] */
    entry = zmalloc(sizeof(*entry)); /* allocate memory for the new node */
    entry->next = ht->table[index]; /* point next at ht->table[index] */
    ht->table[index] = entry; /* point ht->table[index] at this node */
    ht->used++;

    dictSetKey(d, entry, key); /* store the key in the new node */
    return entry;
}

The main function of dictAddRaw is to add or find keys:

  • If the key already exists, it returns NULL and stores the old node in the existing field;

  • Otherwise, it adds a node as follows:

    • get the index position index;
    • choose ht[1] if the dictionary is rehashing, otherwise ht[0];
    • allocate memory for a new node;
    • point the new node's next pointer at ht->table[index];
    • point ht->table[index] at the new node;
    • store the key in the new node;
    • return the node.

Next, let's look at the _dictKeyIndex function:

static int _dictKeyIndex(dict *ht, const void *key) {
    unsigned int h;
    dictEntry *he;

    if (_dictExpandIfNeeded(ht) == DICT_ERR) /* expand the Hash table if necessary */
        return -1;

    h = dictHashKey(ht, key) & ht->sizemask; /* compute the key's index from its Hash value */

    /* search whether this position already contains the key */
    he = ht->table[h];
    while(he) {
        if (dictCompareHashKeys(ht, key, he->key))
            return -1;
        he = he->next;
    }
    return h;
}

This method obtains the index value of the key, in the same way as introduced earlier, in two main steps:

  1. dictHashKey(ht, key): call the dictionary's Hash function to get the Hash value of the key;
  2. dictHashKey(ht, key) & ht->sizemask: AND the Hash value with the dictionary's mask to obtain the index value.

Once dictAddRaw has the key's index value, it can directly locate where the key-value pair should be stored and create a new node to store it. After the key-value pair is added, the memory layout of the dictionary is as shown below:

Note: key and v.val in dictEntry are actually pointers; values are written directly into the dictEntry in the diagram only for ease of display.

Dictionary expansion

As key-value pairs are gradually added to the Redis database, the dictionary storing them eventually reaches its capacity limit, and the dictionary's Hash table needs to be expanded. The expansion code is as follows:

int dictExpand(dict *d, unsigned long size)
{
    if (dictIsRehashing(d) || d->ht[0].used > size) /* error if rehashing, or if the requested size is smaller than the number of elements in ht[0] */
        return DICT_ERR;

    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size); /* recompute the expanded size; it must be a power of 2 */

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR; /* the recomputed size is invalid if it equals the current size */

    /* Allocate the new Hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize - 1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* If this is the first initialization rather than a rehash, assign directly to ht[0] */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare the second Hash table for incremental rehashing */
    d->ht[1] = n; /* the newly expanded memory is placed in ht[1] */
    d->rehashidx = 0; /* no longer the default -1, meaning a rehash is required */
    return DICT_OK;
}

The expansion process is as follows:

  1. Apply for new memory: on initial creation the default capacity is 4 dictEntry slots; on any later expansion the new capacity is twice the current Hash table's capacity.
  2. Assign the newly acquired memory to ht[1], and change the dictionary's rehashidx flag from -1 to 0, indicating that a rehash operation must follow.

At this point, the schematic diagram of the dictionary’s memory structure is shown below:

Note: after expansion, the dictionary's capacity and mask change, so the index obtained by ANDing the same key's hash with the mask changes too, which would make values unfindable by key. The solution is to put the newly expanded memory into a new Hash table (ht[1]) and mark the dictionary as rehashing (rehashidx != -1). From then on, new key-value pairs are stored in the new Hash table, while modify, delete, and find operations must check both ht[0] and ht[1] before deciding which Hash table to operate on.

When adding elements above, we mentioned the _dictExpandIfNeeded function; let's look at what it does.

static int _dictExpandIfNeeded(dict *d)
{
    if (dictIsRehashing(d)) return DICT_OK; /* already rehashing, return */

    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE); /* if the table capacity is 0, expand it to the initial size of 4 */

    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio)) /* decide whether expansion is needed */
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}

As you can see, what this function really does is decide whether expansion is needed. When is it?

  • No expansion is performed while a rehash is in progress;
  • If the Hash table's size is 0, it is expanded to the initial size of 4;
  • Normally dict_can_resize is 1: if the number of key-value pairs stored is greater than or equal to the Hash table's size, the capacity is doubled (expanded to used*2);
  • In some cases dict_can_resize is 0 and Redis avoids expansion, but if the Hash table is already far too full (load factor greater than 5), expansion is forced anyway.

In addition, all data in the old Hash table (ht[0]) needs to be migrated into the new Hash table (ht[1]) after recalculating each index value. This migration process is called rehash; let's look at its implementation.

Progressive rehash

The rehash operation is triggered not only by expansion but also by shrinking. The rehash code looks like this:

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* maximum number of empty buckets to visit: n*10 */
    if (!dictIsRehashing(d)) return 0; /* return if the dict is not rehashing */

    while(n-- && d->ht[0].used != 0) { /* n is the maximum number of buckets to migrate */
        dictEntry *de, *nextde;

        assert(d->ht[0].size > (unsigned long)d->rehashidx); /* prevent rehashidx from going out of bounds: stop if it exceeds ht[0]'s size */
        while(d->ht[0].table[d->rehashidx] == NULL) { /* while the bucket at rehashidx is empty, keep advancing until a non-empty bucket or the empty-bucket visit limit is reached */
            d->rehashidx++;
            if (--empty_visits == 0) return 1; /* decrement the empty-bucket budget; exit when it reaches 0 */
        }
        de = d->ht[0].table[d->rehashidx]; /* the dictEntry pointer at the current bucket */

        while(de) { /* move every element in this bucket from the old table to the new one */
            uint64_t h;

            nextde = de->next;
            h = dictHashKey(d, de->key) & d->ht[1].sizemask; /* compute the key's index in ht[1] */
            de->next = d->ht[1].table[h]; /* head-insert into the chain */
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL; /* clear this ht[0] bucket */
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) { /* check whether the rehash is complete */
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    return 1;
}

The principle of rehash is simple: recompute the index of every entry in ht[0] and move it into ht[1]. So why does Redis need progressive rehashing?

As we know, Redis provides a high-performance online service with a single-threaded command model. When the number of key-value pairs in the database reaches millions or even billions, a one-shot rehash would be very slow; without optimization it could make the service unavailable for a noticeable time. Redis's optimization is clever: it applies divide-and-conquer to the rehash. The general steps are as follows:

  • Before each insert, delete, lookup, or update, check whether a rehash is in progress on the dictionary; if so, call the _dictRehashStep function to advance it (migrating only one bucket at a time).
  • Besides these operations, when the server is idle and the dictionary still needs rehashing, the incrementallyRehash function is called to rehash in batches of 100 buckets at a time, for at most 1 millisecond.

After N such steps, all the data in ht[0] has been migrated to ht[1]. The advantage is that the cost which would otherwise be paid all at once is spread across millions of operations, so each individual operation pays a negligible amount.

How does a single rehash step limit its work?

  • A counter n holds the maximum number of buckets to migrate and is decremented (n--) after each one.
  • An empty_visits counter, initialized to n*10, bounds the number of empty buckets that may be visited; it is decremented (--empty_visits) every time an empty bucket is skipped.

When either n or empty_visits reaches 0, the loop exits.

Of course, there are several points to note in progressive rehashing:

  1. When a key moves from ht[0] to ht[1], its hash index must be recomputed against the new sizemask.
  2. Insertion into ht[1] uses head insertion.
  3. rehashidx is incremented after each migrated bucket.
  4. When ht[0].used == 0, the rehashidx field is set back to -1.

All of this was covered in the code above, so we won't repeat the details.
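To make the divide-and-conquer migration concrete, here is a self-contained toy sketch of the same loop: migrate a bounded number of buckets per call, then swap tables once ht[0] empties. All names (toydict, toy_rehash_step, ...) are illustrative, not the Redis API, and the empty-bucket budget is omitted for brevity:

```c
#include <stdlib.h>
#include <string.h>

/* A toy pair of chained hash tables modelling ht[0]/ht[1]. */
typedef struct entry { long key; struct entry *next; } entry;
typedef struct { entry **table; unsigned long size, used; } toyht;
typedef struct { toyht ht[2]; long rehashidx; } toydict;

static void ht_init(toyht *t, unsigned long size) {
    t->table = calloc(size, sizeof(entry *));
    t->size = size;
    t->used = 0;
}

static void ht_insert(toyht *t, long key) { /* head insertion, as in dict.c */
    unsigned long idx = (unsigned long)key & (t->size - 1);
    entry *e = malloc(sizeof(entry));
    e->key = key;
    e->next = t->table[idx];
    t->table[idx] = e;
    t->used++;
}

/* Migrate at most n non-empty buckets from ht[0] to ht[1]. Returns 1 while
 * work remains, 0 once ht[1] has replaced ht[0] -- mirroring dictRehash. */
static int toy_rehash_step(toydict *d, int n) {
    while (n-- && d->ht[0].used != 0) {
        while (d->ht[0].table[d->rehashidx] == NULL)
            d->rehashidx++; /* skip empty buckets */
        entry *e = d->ht[0].table[d->rehashidx];
        while (e) { /* re-index every entry chained in this bucket */
            entry *next = e->next;
            unsigned long h = (unsigned long)e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].table[h]; /* head insertion into ht[1] */
            d->ht[1].table[h] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL; /* clear the migrated bucket */
    }
    if (d->ht[0].used == 0) { /* migration finished: swap the tables */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(toyht));
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Calling toy_rehash_step(&d, 1) repeatedly stands in for "one step per command": each call moves one bucket, and after the last call the new table has taken ht[0]'s place with every key still reachable.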

Look for the element

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* The dict is empty: return */
    if (dictIsRehashing(d)) _dictRehashStep(d); /* If rehashing, advance the rehash by one step */
    h = dictHashKey(d, key); /* Compute the hash value */
    for (table = 0; table <= 1; table++) { /* Search ht[0] first, then ht[1] if rehashing */
        idx = h & d->ht[table].sizemask; /* Compute the index */
        he = d->ht[table].table[idx];
        while (he) { /* Walk the bucket's chain comparing keys */
            if (key == he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL; /* Not rehashing: done; otherwise also check ht[1] */
    }
    return NULL;
}
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

The process of finding the key is also very simple, mainly divided into the following steps:

  1. Call the Hash function based on the key to get its Hash value;
  2. Get the index value based on the Hash value;
  3. Index into the dictionary's two hash tables to read the bucket at that index;
  4. Walk the bucket's singly linked list; if an entry with a matching key is found, return it;
  5. Return NULL if nothing is found;

Note that:

  • If a rehash is in progress, the lookup first advances it by one step, as mentioned above.
  • If rehashidx != -1, data may be present in both tables, so when the key is not found in ht[0], ht[1] must be checked as well.

Modify the element

void dbOverwrite(redisDb *db, robj *key, robj *val) {
    dictEntry *de = dictFind(db->dict,key->ptr); /* Look up the key; returns the existing node */

    serverAssertWithInfo(NULL,key,de != NULL); /* If it does not exist, abort */
    dictEntry auxentry = *de;
    robj *old = dictGetVal(de); /* Get the old node's value */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        val->lru = old->lru;
    }
    dictSetVal(db->dict, de, val); /* Set the node to the new value */

    if (server.lazyfree_lazy_server_del) {
        freeObjAsync(old);
        dictSetVal(db->dict, &auxentry, NULL);
    }

    dictFreeVal(db->dict, &auxentry); /* Free the old value's memory */
}

Although this function does not call a dict.h method directly to modify the element, the process is essentially the same. When Redis overwrites a key-value pair, the whole flow breaks down into the following steps:

  1. Call dictFind to check whether the key exists;
  2. If it does not exist, abort;
  3. Change the value in the node key-value pair to the new value.
  4. Free old value memory.

Remove elements

static int dictDelete(dict *ht, const void *key) {
    unsigned int h;
    dictEntry *de, *prevde;

    if (ht->size == 0) /* If the hash table capacity is 0, return an error */
        return DICT_ERR;
    h = dictHashKey(ht, key) & ht->sizemask; /* Compute the bucket index */
    de = ht->table[h];

    prevde = NULL;
    while (de) {
        if (dictCompareHashKeys(ht,key,de->key)) { /* Compare keys */
            if (prevde)
                prevde->next = de->next;
            else
                ht->table[h] = de->next;

            dictFreeEntryKey(ht,de); /* Free the key */
            dictFreeEntryVal(ht,de); /* Free the value */
            free(de);
            ht->used--;
            return DICT_OK;
        }
        prevde = de;
        de = de->next;
    }
    return DICT_ERR;
}

The main execution procedure for deleting an element is:

  1. Look up whether the key exists in the dictionary;
  2. If it exists, unlink the node from its singly linked list;
  3. Free the memory occupied by the node's key, value, and the node itself;
  4. Decrement the hash table's used counter by 1.

When, after a series of operations, the data in the dictionary occupies less than 10% of its total capacity, a shrink operation is performed to keep the memory occupied by the Redis database within a reasonable range and not waste memory.

The reduction function is as follows:

#define HASHTABLE_MIN_FILL        10      /* Minimal hash table fill: 10% */

void tryResizeHashTables(int dbid) {
    if (htNeedsResize(server.db[dbid].dict)) /* Check whether used/size is below 10% */
        dictResize(server.db[dbid].dict); /* Perform the shrink */
    if (htNeedsResize(server.db[dbid].expires))
        dictResize(server.db[dbid].expires);
}

int dictResize(dict *d) /* Shrink function */
{
    int minimal;

    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    if (minimal < DICT_HT_INITIAL_SIZE) /* The minimum capacity is 4 */
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal); /* Call the expansion function, which actually shrinks here */
}

The whole shrink process is roughly: judge whether usage has fallen below the threshold, i.e. used/size < 10%; if so, call the dictResize function to shrink the dictionary. The resulting capacity is essentially the smallest power of 2 that is greater than or equal to used (and at least 4). Shrinking is essentially the same as expansion — both end up calling dictExpand — and the subsequent rehash proceeds exactly as for expansion, so it is not repeated here.
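The rounding up to a power of 2 can be sketched as follows, in the style of dict.c's _dictNextPower (next_power is an illustrative name):

```c
#include <assert.h>

#define DICT_HT_INITIAL_SIZE 4

/* Rounds a requested size up to a power of two, never below the
 * initial size of 4. */
unsigned long next_power(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;
    while (i < size) i *= 2; /* keep doubling until the request fits */
    return i;
}
```

So a dictionary holding 1000 entries shrinks to a table of 1024 slots, not 1000, which keeps the index computation a cheap bitwise AND with sizemask.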

Dictionary traversal

The basic concepts and operations of dictionaries have been explained. Next, let's talk about dictionary traversal. Any traversal must satisfy two principles:

  1. No data is visited twice;
  2. No data is missed;

There are two main ways to traverse the entire Redis database: full traversal (e.g. the keys command) and intermittent traversal (e.g. the hscan command):

  • Full traversal: traverses the entire database in one command execution;
  • Intermittent traversal: Only part of the data is obtained during each command execution.

Let’s learn about these two traversal methods.

Iterator traversal

Iterators are an interface for visiting the elements of a container (a dictionary, a linked list, and so on). The caller does not need to care about the container's internals; calling the iterator's fixed interface is enough to walk through the data.

Dictionary iterators are used to iterate over the data in the dictionary. This inevitably raises a problem: if data is added or deleted during the iteration, the dictionary may trigger a rehash — or a rehash may already be underway when the iterator starts — and the same entry could then be visited more than once. How does Redis solve this? With that question in mind, let's look at the iterator implementation.

The basic data structure implemented by the iterator in Redis source code is as follows:

typedef struct dictIterator {
    dict *d; /* The dictionary being iterated */
    long index; /* The index in the hash table the iteration has reached */
    int table, safe; /* table indicates which hash table is being iterated (ht[0] or ht[1]); safe indicates whether this is a safe iterator */
    dictEntry *entry, *nextEntry; /* Current node and next node */
    long long fingerprint; /* The dictionary's fingerprint: unchanged while the dictionary is unchanged, different once it changes */
} dictIterator;

The entire structure occupies 48 bytes (8 + 8 + 4 + 4 + 8 + 8 + 8), where:

  • The d field points to the dictionary being iterated;
  • The index field is the current index in the hash table;
  • The table field indicates which hash table is being iterated (0 for ht[0], 1 for ht[1]);
  • The safe field indicates whether the iterator was created in safe mode;
  • The entry field is the node currently being read;
  • The nextEntry field is the node pointed to by the entry node's next field;
  • The fingerprint field is a 64-bit integer representing the dictionary's state at a given moment. It is called a fingerprint because it is a hash generated by combining all of the field values of the dict structure; whenever the dictionary's data changes, the value changes too. It is generated by the dictFingerprint function in dict.c.
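The fingerprint idea can be illustrated with a self-contained sketch: fold a few fields through an invertible 64-bit integer mix. The real dictFingerprint combines six dict fields (both tables' pointer, size, and used) with Thomas Wang's 64-bit hash; toy_fingerprint below applies the same pattern to an arbitrary array of fields and is purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Thomas Wang's 64-bit integer mix; every step is invertible mod 2^64,
 * so distinct inputs always yield distinct outputs. */
static uint64_t mix64(uint64_t h) {
    h = (~h) + (h << 21);          /* h = h * (2^21 - 1) - 1 */
    h = h ^ (h >> 24);
    h = (h + (h << 3)) + (h << 8); /* h = h * 265 */
    h = h ^ (h >> 14);
    h = (h + (h << 2)) + (h << 4); /* h = h * 21 */
    h = h ^ (h >> 28);
    h = h + (h << 31);
    return h;
}

/* Fold each field into the running hash, scrambling after every step. */
uint64_t toy_fingerprint(const uint64_t fields[], int n)
{
    uint64_t hash = 0;
    for (int i = 0; i < n; i++) {
        hash += fields[i]; /* fold in the next field */
        hash = mix64(hash); /* then scramble */
    }
    return hash;
}
```

Because the mix is a bijection, changing even one counter by 1 is guaranteed to produce a different fingerprint, which is exactly what the iterator's consistency check relies on.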

To make the iterative process easier, Redis also provides iteration-related API functions, mainly as follows:

dictIterator *dictGetIterator(dict *d); /* Initializes a plain iterator */
dictIterator *dictGetSafeIterator(dict *d); /* Initializes a safe iterator */
dictEntry *dictNext(dictIterator *iter); /* Gets the next node through the iterator */
void dictReleaseIterator(dictIterator *iter); /* Releases the iterator */

After this brief introduction to the iterator structure, its fields, and its API, let's see how Redis guarantees that data is not read repeatedly when entries are added or deleted during iteration. Redis executes commands in a single-threaded model, so no two commands run at the same time; the problem above can only arise when the traversing command itself deletes data along the way. We therefore classify iterators into two categories:

  1. Plain iterators: only iterate over data;
  2. Safe iterator: Delete data while traversing.

1. Plain iterators

When a plain iterator iterates over the dictionary, it strictly checks the iterator's fingerprint field to guarantee that the dictionary's structure did not change during the iteration, so no entry is read twice.

Redis uses plain iterators when executing some commands, such as the sort command, which sorts the elements of a given list, set, or sorted set. For a sorted set, the member names are stored in a dictionary and the scores in a skip list, so sort traverses the whole dictionary with a plain iterator.

The process of iterating data with ordinary iterators is relatively simple, which is mainly divided into the following steps:

  1. Call dictGetIterator to initialize a plain iterator; iter->safe is set to 0, indicating that the initialized iterator is a plain one.
  2. Call the dictNext function in a loop to traverse the nodes of the dictionary's hash tables in turn. The first call computes the dictionary's current fingerprint via the dictFingerprint function.
     Note: the entry and nextEntry pointers hold two adjacent nodes of a collision chain; in safe mode, if the entry node is deleted, nextEntry ensures the rest of the chain is not lost in subsequent iterations.
  3. When the iterator is released, dictFingerprint is called again and the result is compared with the fingerprint captured at the start; if they differ, an assertion fails and the program exits.

By comparing the fingerprint captured at the start of the iteration with the one computed at release, the plain iterator permits nothing but iteration while it is alive: dictionary data must not be modified, added, deleted, or even looked up during the iteration — only dictNext may be called — otherwise the assertion fires. This is how the accuracy of the data returned by the iterator is guaranteed.

Note: modification, addition, deletion, and lookup all call the _dictRehashStep function to advance incremental rehash, which changes the fingerprint — hence the restriction.

2. Safe iterators

A safe iterator traverses data the same way a plain one does, cyclically calling dictNext over the nodes of the hash tables. The difference is how accuracy is guaranteed: rather than forbidding dictionary operations, the safe iterator suppresses rehash, so the dictionary can be added to, deleted from, modified, and searched during the iteration.

We know that adding, deleting, modifying, and searching a dictionary all call _dictRehashStep to advance incremental rehash — so how is rehash suppressed? Look again at _dictRehashStep:

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

The principle is simple: as long as the dictionary has safe iterators outstanding (d->iterators != 0), the incremental rehash step is skipped — the rehash is paused. Nothing is moved under the iterator's feet, so no entry is iterated twice and the data read is guaranteed accurate.

Redis iterates dictionary data with safe iterators when executing certain commands, such as the keys command, which returns the list of all keys matching a given pattern and deletes expired keys as it goes. Since Redis key-value pairs are stored in the dictionary, keys traverses it with a safe iterator. The whole process is fairly simple and splits into the following steps:

  1. Call the dictGetSafeIterator function to initialize a safe iterator; iter->safe is set to 1, indicating that the initialized iterator is a safe one.
  2. Call the dictNext function in a loop to iterate over each node in the hash tables. The first iteration increments the dictionary's iterators field, ensuring that incremental rehash is suspended while the iteration runs.
  3. When the iterator is released, the dictionary's iterators field is decremented by 1, so incremental rehash can resume. Adjusting the iterators field in steps 2 and 3 interrupts rehash for exactly the duration of the iteration, ensuring the accuracy of the data read.

Intermittent traversal

We explained "full traversal" above, but it has a glaring problem: when the database holds a huge amount of data, a keys command that walks all of it takes a long time and leaves Redis briefly unavailable. For this reason, the scan family of operations was added in Redis 2.8.0 — what we call "intermittent traversal". dictScan is its implementation and is used to iterate dictionary data piecewise: for example, the scan command iterates the keys of the entire database, and the zscan command iterates all members and scores of a sorted set; both traversals are implemented through the dictScan function. dictScan tolerates rehash operations occurring during the traversal, and its algorithm still guarantees that all data is covered.

Let’s look at the dictScan function introduction:

unsigned long dictScan(dict *d,
                       unsigned long v,
                       dictScanFunction *fn,
                       dictScanBucketFunction* bucketfn,
                       void *privdata)
  • The variable d is the dictionary of the current iteration;
  • The variable v identifies the cursor (that is, the array index in the Hash table) at the beginning of the iteration. Each iteration will return a new cursor value. The entire traversal process is carried out around the change of this cursor value to ensure that all data can be traversed.
  • Fn is a function pointer that is called each time a node is traversed;
  • The bucketfn function is called when defragging;
  • Privdata is the parameter required by the callback function fn.

An example of the outer loop calling dictScan when the hscan command executes:

    long maxiterations = count * 10; /* count is the COUNT value passed to hscan, i.e. how many entries to fetch; when the hash table is sparse (most buckets empty), at most 10*count iterations run */
    do {
        cursor = dictScan(ht, cursor, scanCallback, NULL, privdata); /* Iterate the dictionary; the initial cursor is the value passed to hscan, the starting index into the hash array */
    } while (cursor && maxiterations-- && listLength(keys) < (unsigned long)count);

In the process of dictScan function traversing the dictionary intermittently, the following three situations will be encountered:

  1. From the beginning to the end of the iteration, the hash table does not rehash.
  2. From the beginning to the end of the iteration, the hash is expanded or shrunk, and the rehash is done exactly between iterations.
  3. From the beginning to the end of an iteration, the hash table is being rehashed during one or more iterations.

1. No Rehash operation is encountered during traversal

Cases 1 and 2 mean that no call to dictScan runs while a rehash is in progress. In case 1, the nodes of ht[0] could simply be traversed in order. In case 2, however, the dictionary may expand or shrink between calls, and a sequential traversal could then read data repeatedly: for instance, the entries of bucket 0 may be spread across buckets 0 and 4 after an expansion, so if bucket 0 was traversed before the expansion, a later sequential pass could visit those entries again. To avoid both missing and repeating data, Redis uses a method known as reverse binary iteration for intermittent traversal. Let's look at the core of its implementation; the iteration code is as follows:
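The power-of-two property behind this reasoning is easy to check in isolation: enlarging the sizemask from 3 to 7 merely exposes one extra bit of the same hash, so a bucket can only split into two predictable positions (splits_cleanly is an illustrative name):

```c
#include <assert.h>

/* When a 4-slot table grows to 8 slots, an entry from old bucket
 * (hash & 3) can only land in that bucket or in that bucket + 4. */
int splits_cleanly(unsigned long hash)
{
    unsigned long old_idx = hash & 3; /* sizemask of the 4-slot table */
    unsigned long new_idx = hash & 7; /* sizemask of the 8-slot table */
    return new_idx == old_idx || new_idx == old_idx + 4;
}
```

This holds for every hash value, which is exactly what lets the cursor algorithm predict where already-visited entries end up after a resize.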

        t0 = &(d->ht[0]);
        m0 = t0->sizemask;
        de = t0->table[v & m0]; /* Mask the cursor so it cannot exceed the table size */
        while (de) { /* Walk the current bucket's chain */
            next = de->next;
            fn(privdata, de); /* Hand each entry in the chain to the callback, which collects it via privdata */
            de = next;
        }

The entire iteration revolves around the cursor value v: it selects the hash table bucket to read, then walks every key-value pair in that bucket's chain, invoking the callback fn on each one to collect it.

To remain correct across any shrink or expansion that happens between iterations, the cursor value v is advanced by a special rule after each step, guaranteeing that no data is missed. The cursor update algorithm is as follows:

v |= ~m0;
v = rev(v); // Binary reversal
v++;
v = rev(v); // Binary reversal

The details of the algorithm are not expanded here; interested readers can study it on their own.
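For the curious, the cursor update is easy to replay standalone. The sketch below reuses the rev-style parallel bit reversal from dict.c; next_cursor is an illustrative wrapper around the four-line update:

```c
#include <assert.h>

/* Classic parallel bit reversal over the full width of unsigned long. */
static unsigned long rev(unsigned long v) {
    unsigned long s = 8 * sizeof(v); /* bit size; must be a power of 2 */
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* The reverse-binary cursor update from dictScan. */
unsigned long next_cursor(unsigned long v, unsigned long mask)
{
    v |= ~mask;  /* set the bits the sizemask masks out */
    v = rev(v);  /* reverse, ... */
    v++;         /* ...increment the reversed cursor, ... */
    v = rev(v);  /* ...and reverse back */
    return v;
}
```

With mask 7 (an 8-slot table) the successive cursors are 0, 4, 2, 6, 1, 5, 3, 7 and then 0 again: the high bits change before the low bits, which is what allows already-visited buckets to be skipped correctly after a resize.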

2. A Rehash operation occurs during traversal

From the beginning to the end of the traversal, the hash table may be mid-rehash during one or more calls. While rehashing, two hash tables coexist: ht[1] is the newly expanded or shrunken table, and ht[0] is the old one whose data is being migrated into ht[1] by progressive rehash.

Since both tables may hold data, entries must be fetched from both ht[0] and ht[1]. The traversal works as follows: first locate the smaller of the two hash tables and iterate its bucket, then iterate the corresponding buckets of the larger one. The iteration code is as follows:

        t0 = &d->ht[0];
        t1 = &d->ht[1];
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }
        m0 = t0->sizemask;
        m1 = t1->sizemask;
        de = t0->table[v & m0];
        while (de) { /* Iterate over the first small Hash table */
            next = de->next;
            fn(privdata, de);
            de = next;
        } 
        do { /* Iterate over the second large Hash table */
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);
        } while (v & (m0 ^ m1));        

In addition to the iteration of this case, there are two other cases:

  1. Rehash operation of dictionary expansion was encountered several times during iteration.
  2. Several times during the iteration, we encountered rehash operations for dictionary shrinkage.

This part is not introduced in detail here, if interested, you can combine the source code to learn.

All in all, this algorithm neither misses nor repeats data. The key is the clever use of the fact that expansion and shrinkage always change the capacity by a power of two: given that property, it is easy to deduce where the entries of a given bucket will land in the new hash table, which is what makes repeated or missed traversal avoidable.

conclusion

This article mainly introduces Redis dictionary data structure.

First, we talked about what a dictionary is, why Redis needs to design a dictionary, and then we talked about how to design a dictionary, including arrays that store data, hash functions that support various forms of keys, and linked list structures after hash collisions.

Then we talked about the dictionary structure in Redis, including three parts: dict, dictht, dictEntry hash table node. We need to understand the individual structure fields.

We went over the basic operations of dictionaries, such as adding and deleting elements. One of the more important methods is progressive Rehash, which we need to master:

  • What does the rehash execution flow look like?
    • Walk ht[0], recompute each key's hash index, insert into ht[1], update the used and rehashidx fields, and stop after migrating a bounded number of buckets.
  • What is progressive rehash?
    • The whole rehash is spread across the individual add, delete, modify, and lookup operations.
    • While a rehash is in progress, new entries are only added to ht[1]; nothing is added to ht[0].
    • Besides those operations, Redis also performs rehash steps when the server is idle.
  • When does the hash table expand?
    • Never while a rehash is already in progress
    • When the hash table size is 0, it expands to the initial size DICT_HT_INITIAL_SIZE
    • Normally dict_can_resize is 1, and once the number of key-value pairs reaches the table size, the table is doubled
    • When dict_can_resize is 0 Redis avoids expanding, but if the table is badly overloaded (load factor greater than 5) expansion is forced
  • Will Redis actively rehash?
    • Yes: when the server is idle, it performs batch rehash steps on dictionaries that still need them.

I hope you got something out of it!

Reference documentation

  • Redis Design and Implementation, by Huang Jianhong
  • Redis 5 Design and Source Code Analysis, by Chen Lei