preface

Redis is a key-value pair database whose keys are stored by hashing. The entire Redis can be thought of as an outer hash, which is called an outer hash because Redis also provides a hash type inside, which can be called an inner hash. When we use hash objects for data storage, there are two levels of hash storage for Redis as a whole.

The hash object

The hash object itself is also a key-value storage structure, and the underlying storage structure can be divided into two types: ziplist (compressed list) and hashtable (hashtable). The two storage structures are also distinguished by encoding:

Encoding attribute describe Object encoding Returned value of the command
OBJ_ENCODING_ZIPLIST Implement hash objects using compressed lists ziplist
OBJ_ENCODING_HT Implement hash objects using dictionaries hashtable

hashtable

The key-value in Redis is wrapped by the dictEntry object, and the hash table is wrapped by the dictEntry object again. This is the hash table object Dictht:

typedef struct dictht {
    dictEntry **table;// Hash table array
    unsigned long size;// Hash table size
    unsigned long sizemask;// Mask size, used to calculate the index value, always equal to size-1
    unsigned long used;// Hash table has nodes
} dictht;
Copy the code

Note: The table in the structure definition above is an array, each element of which is a dictEntry object.

The dictionary

Dictionaries, also known as symbol table, associative array or map, are nested with hash table DICtht objects. The following is a definition of dictionary HT:

typedef struct dict {
    dictType *type;// Some specific functions of dictionary type
    void *privdata;// Private data, which may be needed by specific functions in type
    dictht ht[2];// Hash table (note that there are two hash tables)
    long rehashidx; // Rehash index. When not rehash, the value is -1
    unsigned long iterators; // The number of iterators in use
} dict;
Copy the code

DictType defines some common functions internally, and its data structure is defined as follows:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);// Compute the hash function
    void *(*keyDup)(void *privdata, const void *key);// Copy the key function
    void *(*valDup)(void *privdata, const void *obj);// Copy the value function
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);// Compare key functions
    void (*keyDestructor)(void *privdata, void *key);// Destroy the key function
    void (*valDestructor)(void *privdata, void *obj);// Destroy the value function
} dictType;
Copy the code

When we create a hash object, we get the following schematic (some properties are omitted) :

Rehash operations

Dict defines an array HT [2], and HT [2] defines two hash tables: HT [0] and HT [1]. By default, Redis only uses HT [0], not HT [1], and does not allocate space for HT [1] initialization.

When setting a hash object, which subscript will fall in the hash array (dictEntry[3] in the figure above) is determined by calculating the hash value. If a hash collision occurs, then there will be multiple dictentries of the same index, forming a linked list (the right-hand side of the figure points to NULL), but it is important to note that the last element inserted is always at the top of the list (in case of hash collisions, the last element inserted is always at the top of the list). Always put nodes to the head of the list).

When reading data, a node with multiple elements needs to traverse the linked list, so the longer the linked list, the worse the performance. To ensure the performance of the hash table, you need to rehash the hash table when either of the following conditions is met:

  • The load factor is greater than or equal to1dict_can_resize1At the right time.
  • The load factor is greater than or equal to the security threshold (dict_force_resize_ratio=5).

PS: Load factor = Number of nodes in the hash table/size of the hash table (h[0]. Used /h[0].size).

Rehash steps

Both extended and contracted hashing are done by performing rehash, which involves allocating and freeing space through the following five steps:

  1. Allocate space for dict DICT HT [1] hash tables, depending on the number of nodes currently stored in the table (i.e. Ht [0]. Used) :

    • If the operation is extendedht[1]The size of2nThe first one is greater than or equal toht[0].used * 2The value of the property (e.gused=3At this time,ht[0].used * 2=6, therefore,2the3The power to8So the first one is greater thanused * 2(2 to the 2nd power < 6 and 2 to the 3rd power > 6).
    • If it is a shrink operationht[1]The first one is greater than or equal to magnitude 2 to the nht[0].usedThe value of the.
  2. Setting the value of the attribute rehashix in the dictionary to 0 indicates that a rehash operation is being performed.

  3. All key/value pairs in HT [0] are rehashed in sequence and placed in the corresponding position of ht[1] array. The value of Rehashix increases by 1 after each rehash of key/value pairs is completed.

  4. After all key-value pairs in HT [0] have been migrated to HT [1], release HT [0] and change HT [1] to HT [0], then create a new HT [1] array in preparation for the next rehash.

  5. If rehashix in the dictionary is set to -1, the rehash operation is complete and the next rehash operation is waiting.

Progressive rehash

This rehash operation in Redis is also called progressive rehash because it does not rehash all of ht[0] at once, but rehashes key/value pairs from HT [0] into HT [1] over several times. Progressive Rehash is a divide-and-conquer idea that avoids the massive amount of computation that centralized rehash entails.

In a progressive rehash process, since new key-value pairs may also be stored, ** Redis puts the newly added key-value pairs into HT [1], ensuring that the number of HT [0] key-value pairs is only reduced **.

When the rehash operation is being performed, if the server receives a command request from the client, it queries HT [0] first and then ht[1].

ziplist

Some of ziplist’s features have been analyzed separately in previous articles. For more details, click here. But the ziplist in a hash object differs from the Ziplist in a list object in that the hash object is a key-value, so the ziplist is also a key-value. Key and value are next to each other:

Ziplist and Hashtable encoding conversion

When a hash object can satisfy either of the following two conditions, the hash object will choose to use ziplist encoding for storage:

  • The total length of all key-value pairs (both key and value) in a hash object is less than or equal to64Byte (This threshold can be passed by a parameterhash-max-ziplist-valueTo control).
  • The number of key-value pairs in a hash object is less than or equal to512This threshold can be specified by the parameterhash-max-ziplist-entriesTo control).

If either of these conditions is not met, the hash object chooses to be stored using hashTable encoding.

Common command used to hash objects

  • Hset Key field value: Sets a single key fieldfieldHash objectkeyValue).
  • Hmset Key field1 value1 field2 value2: Set multiple fieldsfieldHash objectkeyValue).
  • Hsetnx Key field value: Hash tablekeyIn the domainfieldIs set tovalueIf thefieldIf yes, no operation is performed.
  • Hget Key field: Gets the hash tablekeyIn the domainfieldThe correspondingvalue.
  • Hmget key field1 field2: Get the hash tablekeyMultiple fields infieldThe correspondingvalue.
  • Hdel key field1 field2: Delete the hash tablekeyOne or more of themfield.
  • Hlen key: Returns the number of fields in the hash table key.
  • Hincrby Key Field Increment: Indicates a hash tablekeyIn the domainfieldDelta plus deltaincrementincrementIt could be negative iffieldAn error will be reported if it is not a number.
  • Hincrbyfloat Key field Increment: indicates a hash tablekeyIn the domainfieldDelta plus deltaincrement.incrementIt could be negative iffieldnotfloatType will report an error.
  • Hkeys key: Obtains the hash tablekeyAll fields in.
  • Hvals key: Retrieves the values of all fields in the hash table.

To flushes the Redis database, run the flushall command to prevent interference from other key values.

Then run the following commands in sequence:

hset address country china
type address
object encoding address
Copy the code

The following results are obtained:

You can see that when we have only one key-value pair in our hash object, the underlying encoding is ziplist.

Now let’s change the hash-max-ziplist-entries parameter to 2, restart Redis and type the following command to test:

hmset key field1 value1 field2 value2 field3 value3
object encoding key
Copy the code

The output results in the following:

As you can see, the encoding has changed to hashTable.

conclusion

This paper mainly introduces the use of hashtable, the storage structure underlying the hash type among the five common data types in Redis, and how Redis re-hashes when the hash distribution is not uniform. Finally, some common commands for hash objects are introduced. Some examples are given to verify the conclusion of this paper.