Chapter 4: Dictionary

A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure for holding key-value pairs. In a dictionary, a key is associated with a value, and the associated keys and values are called key-value pairs.

Each key in the dictionary is unique, and a program can look up a value associated with it by key, update a value by key, delete a key-value pair by key, and so on.

Dictionaries are often built into many high-level programming languages as data structures, but Redis uses C without such data structures, so Redis builds its own dictionary implementation.

4.1 Implementation of dictionary

Redis dictionary uses hash table as the underlying implementation, a hash table can have multiple hash table nodes, and each hash table node stores a key value pair in the dictionary.

4.4.1 hash table

The hash table used by the Redis dictionary is defined by the dictht structure:

typedef struct dictht{
	// Hash table array
	dictEntry **table;
	// Hash table size
	unsigned long size;
	// Hash table size mask, used to calculate index value, always equal to size-1
	unsigned long sizemask;
	// The number of existing nodes in the hash table
	unsigned long used;
} dictht;
Copy the code

The table property is an array, and each element in the array is a pointer to a dictEntry structure, each of which holds a key-value pair.

The size property records the size of the hash table, that is, the size of the table, while the Used property records the number of nodes currently used by the hash table.

The sizemask attribute always equals size-1. This attribute, along with the hash value, determines which index of the table array a key should be placed on.

Figure 4-1 shows the basic structure.

4.1.2 Hash table nodes

Hash table nodes are represented by dictEntry structures, each of which holds a key-value pair:

typedef struct dictEntry{
	/ / key
	void *key;
	/ / value
	union{
		void *val;
		uint64_tu64;
		int64_ts64;
	}v;
	// Point to the next hash table node to form a linked list
	struct dictEntry *next;
}dictEntry;
Copy the code

The key property holds the keys in the key-value pair, while the V property holds the values in the key-value pair, which can be a pointer, a Uint64_t integer, or an Int64_t integer.

The next attribute is a value pointing to another hash table node that can resolve key collisions by joining together multiple key-value pairs with the same hash value.

Figure 4-2 shows the solution.

4.1.3 dictionary

Dictionaries in Redis are represented by dict constructs:

Typedef struct dict {// Type specific function dictType *type; // Private data void * privData; // Dictht ht[2]; // Rehash index // If rehash is not in progress, the value is -1 int rehashidx; /* rehashing not in progress if rehashidx == -1 */ }dict;Copy the code

The type and privData attributes are set for creating polymorphic dictionaries for different types of key-value pairs:

type: is a pointer to dictType structure. Each dictType structure preserves a cluster of functions for manipulating key value pairs of a specific type. Redis will set different type-specific functions for dictionaries with different uses.
privdata: saves the optional parameters that need to be passed to those specific types.

Typedef struct dictTtpe {// calculate the hashFunction unsigned int (*hashFunction)(const void *key); Void *(*keyDup)(void * privData, const void *key); Void *(*valDup)(void * privData, const void *obj); Int (*keyCompare)(void * privData, const void *key1, const void *key2); Void (*keyDestructor)(void * privData, void *key); Void (*valDestructor)(void * privData, void *obj); }dictType;Copy the code

The HT attribute is an array of two entries, each of which is a Dictht hash table. In general, dictionaries can only use ht[0] hash tables, and HT [1] hash tables are only used for rehashing HT [0] hash tables.

In addition to HT [1], another rehash related is rehashidx, which records the current progress of rehash. A value of -1 indicates that no rehash is currently in progress, and its value is -1.

Figure 4-3 shows a dictionary in the common state.

4.2 Hash algorithm

To add a new key-value pair to a dictionary, the program calculates the hash and index values based on the keys of the key-value pair. Then, based on the index values, the program places the hash table node containing the new hash table on the specified index of the hash table array.

Redis calculates hash and index values as follows:

Hash = dict -> type -> hashFunction(key); Ht [x] can be HT [0] or HT [1] index = hash & dict -> HT [x].sizemask;Copy the code

For example, for the dictionary shown in Figure 4-4, if we were to add a key-value pair K0 and V0 to the dictionary, the program would use statements in preference:

hash = dict -> type -> hashFunction(K0);
Copy the code

Compute the hash value for the key K0.

Assuming the calculated hash value is 8, the program continues with the statement:

index = hash&dict -> ht[0].sizemask = 8 & 3 = 0;
Copy the code

For sizemask, it has to be 2^n – 1, because 2^n – 1 each binary is 1, so our hash & sizemask will fall between 0 and sizemask, so it doesn’t go out of bounds.

Of course, MurmurHash2 is used in our Redis.

4.3 Resolving key Conflicts

When two or more keys are assigned to the same index of the hash table array, we say the keys are in conflict.

Redis hash table uses chained address method to solve key conflicts. Each hash table node has a next pointer, and multiple hash table nodes can use the next pointer to form a one-way linked list.

Note: Here, the head insertion method is used for insertion, and the time complexity is O(1). If the tail insertion method is used, the linked list will be traversed, resulting in a waste of performance.

4.4 rehash

With the continuous execution of operations, the number of keys stored in the hash table will gradually increase or decrease. In order to maintain the load factor of the hash table in a reasonable range, when the number of keys stored in the hash table is too much or too little, it is necessary to expand or shrink the hash table accordingly.

Load factor = ht[0]. Used/ht[0].size();Copy the code

Both expansion and contraction can be done by rehash operations. Redis rehashes the dictionary’s hash table as follows:

Allocate space for the dictionary ht[1] hash table, which depends on the operation to be performed and the number of key-value pairs ht[0] currently contains (that is, the value of ht[0]. Used) :
- If an extension operation is performed, the size of ht[1] is the first one equal to 2^n of ht[0]. Used *2;
- If the shrink operation is performed, then ht[1] is the first 2*n greater than or equal to ht[0]. Used;
Rehash all key-value pairs saved in HT [0] onto HT [1]. Rehash means recalculating the hash and index values of the keys and then placing the key-value pairs in the specified position of the HT [1] hash table.
When all key-value pairs contained in HT [0] are migrated to HT [1] (HT [0] becomes empty), release HT [0], set HT [1] to HT [0], and create a new blank hash table in HT [1] for the next rehash.

4.5 Progressive Rehash

We know that we rehash as we expand and shrink, but this rehash is not done all at once, it is done many times, incrementally.

The reason for this is that if HT [0] only holds 4 key-value pairs, then one-time rehashing is fine, but if our key-value pairs reach hundreds of millions, one-time rehashing is too heavy and may cause our server to go out of service.

Here are the detailed steps for rehash:

Allocate space for HT [1] so that the dictionary holds both HT [0] and HT [1].
There is one in our dictionaryrehashidx, set its value to 0 to indicate that rehash has started;
During rehash, every time a dictionary is added, deleted, found, or updated, the program rehashes to HT [1] all key-value pairs on the REhashidx index of the HT [0] hash table, in addition to the specified operation.
As the dictionary operation continues, eventually at some point in time all key-value pairs of HT [0] will be rehashed to HT [1], at which point the program willrehashidxProperty -1 indicates that the rehash operation has completed.

Advantages: A dial-and-conquer approach spreads the computation required for rehash key and value pairs over the add, delete, and find operations to the dictionary, avoiding the heavy computation that comes with centralized Rehash.

During progressive rehash, the dictionary uses both HT [0] and HT [1] hash tables, so during progressive rehash, dictionary deletion, lookup, update, and other operations are performed on both hash tables.

All operations are added to HT [1], and ht[0] does not perform any operations. This ensures that the number of key-value pairs contained in HT [0] will only grow and become empty as the rehash operation is executed.

4.6 the dictionary API

4.7 Key Reviews

Dictionaries are widely used to implement various features of Redis, including databases and hash keys
Dictionaries in Redis use hash tables as the underlying implementation, with each dictionary having two hash tables, one for normal use and one for rehash.
When dictionaries are used as the underlying implementation of databases, Redis uses the Murmurhash2 algorithm to compute the hash value of keys.
Hash tables use the chained address method to resolve key collisions.
When you expand or shrink a hash table, you rehash all the key-value pairs into the new hash table. This rehash process is not done once, but incrementally.