I. Foreword

As operations are continuously performed, the number of key/value pairs a hash table holds will gradually grow or shrink. To keep the hash table's load factor within a reasonable range, the program needs to expand or contract the hash table accordingly when it holds too many or too few key/value pairs.


II. Implementation analysis

1. Rehash process analysis

Expanding or contracting a hash table is done by performing a rehash (re-hashing) operation.

Redis rehashes a dictionary's hash tables in the following steps:

1. Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation to be performed and on the number of key/value pairs ht[0] currently holds (that is, the value of ht[0].used):

- If an expansion is being performed, the size of ht[1] is the first 2^n (2 to the nth power) greater than or equal to ht[0].used * 2.
- If a contraction is being performed, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.

(A code sketch of this sizing rule follows the steps below.)

2. Rehash all key/value pairs held in ht[0] onto ht[1]: rehashing means recomputing each key's hash value and index value, and then placing the key/value pair at the corresponding position in the ht[1] hash table.

3. When all the key/value pairs in ht[0] have been migrated to ht[1] (ht[0] becomes an empty table), free ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1], in preparation for the next rehash.
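To make the sizing rule concrete, here is a minimal sketch in C of how ht[1] might be allocated for an expansion. The struct definitions are simplified stand-ins for the real ones in Redis's dict.h, and nextPower/startExpand are illustrative names rather than the actual dict.c functions:

```c
#include <stdlib.h>

/* Simplified stand-ins for Redis's dictEntry/dictht/dict structs
 * (the real ones in dict.h carry more fields). */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;      /* chains entries in the same bucket */
} dictEntry;

typedef struct dictht {
    dictEntry **table;           /* array of buckets */
    unsigned long size;          /* always a power of two */
    unsigned long sizemask;      /* size - 1, used to compute indexes */
    unsigned long used;          /* number of key/value pairs stored */
} dictht;

typedef struct dict {
    dictht ht[2];
    long rehashidx;              /* -1 when no rehash is in progress */
} dict;

/* First power of two >= size, mirroring the sizing rule described above. */
static unsigned long nextPower(unsigned long size) {
    unsigned long i = 4;         /* initial hash table size */
    while (i < size) i *= 2;
    return i;
}

/* Step 1 of an expansion: allocate ht[1] with the first 2^n >= used * 2. */
static void startExpand(dict *d) {
    unsigned long realsize = nextPower(d->ht[0].used * 2);
    d->ht[1].size = realsize;
    d->ht[1].sizemask = realsize - 1;
    d->ht[1].used = 0;
    d->ht[1].table = calloc(realsize, sizeof(dictEntry *));
    d->rehashidx = 0;            /* rehashing can now begin */
}
```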

For example, the program expands a dictionary's ht[0] as follows:

1. ht[0].used is currently 4, and 4 * 2 = 8; since 8 (2^3) is the first power of two greater than or equal to 8, the program sets the size of ht[1] to 8. Figure 4-9 shows what the dictionary looks like after space has been allocated for ht[1].

2. Rehash the four key/value pairs held in ht[0] onto ht[1], as shown in Figure 4-10.

3. Free ht[0], make ht[1] the new ht[0], and assign a blank hash table to ht[1], as shown in Figure 4-11. At this point the expansion is complete: the program has successfully changed the hash table's size from 4 to 8.
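Plugging the numbers from this example into the sizing helper sketched above (the hypothetical nextPower) reproduces the result:

```c
#include <stdio.h>

/* First power of two >= size (same rule as the sizing sketch above). */
static unsigned long nextPower(unsigned long size) {
    unsigned long i = 4;
    while (i < size) i *= 2;
    return i;
}

int main(void) {
    /* ht[0].used == 4, so the target is 4 * 2 = 8; the first 2^n >= 8 is 8. */
    printf("new ht[1] size: %lu\n", nextPower(4 * 2));  /* prints 8 */
    return 0;
}
```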

2. Expansion and contraction of hash tables

When either of the following conditions is met, the program automatically begins to expand the hash table:

- The server is not currently running a BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 1.
- The server is currently running a BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 5.

The load factor of the hash table can be calculated by the formula:

```
# load factor = number of nodes saved in the hash table / size of the hash table
load_factor = ht[0].used / ht[0].size
```

For example, for a hash table of size 4 with four key-value pairs, the load factor for the hash table is:

```
load_factor = 4 / 4 = 1
```

For example, for a hash table of size 512 with 256 key-value pairs, the load factor for the hash table is:

```
load_factor = 256 / 512 = 0.5
```

The load factor required to trigger an expansion differs depending on whether a BGSAVE or BGREWRITEAOF command is executing. This is because while executing BGSAVE or BGREWRITEAOF, Redis needs to fork a child of the current server process, and most operating systems implement fork with copy-on-write to optimize the child's memory use. While such a child process exists, the server therefore raises the load factor required to trigger an expansion, avoiding hash table expansions during the child's lifetime as much as possible. This avoids unnecessary memory writes and saves as much memory as possible.

On the other hand, when the hash table's load factor falls below 0.1, the program automatically begins to contract the hash table.
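A sketch of these triggers in C, using the simplified fields from the earlier sketch. The function names and the child_process_running flag are illustrative; Redis's own force-resize threshold constant is also 5, but the surrounding code in dict.c is organized differently:

```c
#include <stdbool.h>

#define FORCE_RESIZE_RATIO 5  /* threshold while BGSAVE/BGREWRITEAOF runs */

/* Expansion trigger: load factor >= 1 normally,
 * or >= 5 while a BGSAVE/BGREWRITEAOF child process exists. */
static bool shouldExpand(unsigned long used, unsigned long size,
                         bool child_process_running) {
    double load_factor = (double)used / size;
    if (!child_process_running) return load_factor >= 1.0;
    return load_factor >= FORCE_RESIZE_RATIO;
}

/* Contraction trigger: load factor below 0.1. */
static bool shouldShrink(unsigned long used, unsigned long size) {
    return (double)used / size < 0.1;
}
```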

Note: copy-on-write is a technique that delays, or even avoids, copying data. Instead of duplicating the entire process address space at fork time, the kernel lets the parent and child share the same copy; data is copied only when one of them needs to write to it, at which point the parent and child each get their own copy. In other words, resources are copied only when they need to be written, and are shared read-only until then.
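The copy itself is invisible from user space, but its effect is easy to observe: after fork, a write in the child lands in the child's private copy and never touches the parent's. A minimal demonstration (plain POSIX fork, nothing Redis-specific):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int value = 42;

int main(void) {
    pid_t pid = fork();      /* child initially shares the parent's pages */
    if (pid == 0) {
        value = 100;         /* first write gives the child a private copy */
        printf("child:  value = %d\n", value);   /* 100 */
        exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: value = %d\n", value);       /* still 42 */
    return 0;
}
```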

3. Progressive Rehash

Expanding or contracting a hash table requires rehashing all the key/value pairs in ht[0] into ht[1]. However, this rehash action is not done in a single, centralized pass, but in multiple, incremental steps.

The reason is that if ht[0] holds only four key/value pairs, the server can rehash all of them into ht[1] in an instant; but if the hash table holds not four but four million, forty million, or even four hundred million key/value pairs, rehashing all of them onto ht[1] in one go could take the server out of service for a period of time.

Therefore, to avoid the impact of a centralized rehash on server performance, the server does not rehash all the key/value pairs in ht[0] onto ht[1] at once; instead it rehashes them from ht[0] to ht[1] slowly, over many steps.

Here are the detailed steps for progressive rehash of a hash table:

1. Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] hash tables.

2. Maintain an index counter variable rehashidx in the dictionary and set its value to 0, indicating that the rehash work has officially begun.

3. During the rehash, every time the dictionary is added to, deleted from, looked up, or updated, the program, in addition to performing the specified operation, also rehashes all the key/value pairs in ht[0]'s bucket at index rehashidx onto ht[1]. When that bucket has been rehashed, the program increments rehashidx by one.

4. At some point, as dictionary operations continue to be performed, all the key/value pairs in ht[0] will have been rehashed onto ht[1]; the program then sets rehashidx to -1, indicating that the rehash operation is complete.
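Here is a sketch of what a single incremental step might look like, reusing the simplified structs from the sizing sketch earlier. hashKey is a hypothetical stand-in for the dict's hash function, and the real step function in dict.c handles more details (bounded empty-bucket scans, safe iterators):

```c
#include <stdlib.h>

/* Hypothetical stand-in for the dict's hash function. */
unsigned long hashKey(const void *key);

/* Move one bucket's worth of entries from ht[0] to ht[1]. Called as a
 * side effect of add/delete/find/update operations during a rehash. */
static void rehashStep(dict *d) {
    if (d->rehashidx == -1) return;  /* no rehash in progress */

    if (d->ht[0].used > 0) {
        /* Skip empty buckets, then migrate one bucket's chain. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        dictEntry *de = d->ht[0].table[d->rehashidx];
        while (de) {
            dictEntry *next = de->next;
            /* Recompute the index against ht[1]'s sizemask. */
            unsigned long idx = hashKey(de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[idx];   /* prepend to the new bucket */
            d->ht[1].table[idx] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Once ht[0] is empty, ht[1] becomes the new ht[0] and the rehash ends. */
    if (d->ht[0].used == 0) {
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (dictht){0};
        d->rehashidx = -1;
    }
}
```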

The benefit of progressive rehash is that it takes a divide-and-conquer approach, spreading the computation required to rehash the key/value pairs across every add, delete, find, and update operation on the dictionary, thereby avoiding the massive one-off computation of a centralized rehash.

Figures 4-12 through 4-17 show a complete progressive rehash process; note how the dictionary's rehashidx property changes throughout.

Description:

Because the dictionary uses both the ht[0] and ht[1] hash tables during a progressive rehash, dictionary operations such as delete, find, and update are performed on both hash tables during this period. For example, to look up a key in the dictionary, the program searches ht[0] first, and if it finds nothing there, continues the search in ht[1], and so on.
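A sketch of such a two-table lookup, again over the simplified structs; hashKey and keyCompare are hypothetical stand-ins for the dict's key operations:

```c
/* Hypothetical stand-ins for the dict's key operations. */
unsigned long hashKey(const void *key);
int keyCompare(const void *a, const void *b);

/* Try ht[0] first; fall through to ht[1] only while rehashing. */
static dictEntry *dictFindSketch(dict *d, const void *key) {
    unsigned long h = hashKey(key);
    for (int table = 0; table <= 1; table++) {
        unsigned long idx = h & d->ht[table].sizemask;
        for (dictEntry *de = d->ht[table].table[idx]; de; de = de->next)
            if (keyCompare(de->key, key)) return de;
        if (d->rehashidx == -1) break;  /* not rehashing: ht[1] is empty */
    }
    return NULL;
}
```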

In addition, during a progressive rehash, any new key/value pair added to the dictionary is saved into ht[1], and ht[0] takes no further additions. This ensures that the number of key/value pairs in ht[0] only ever decreases, and that ht[0] eventually becomes an empty table as the rehash proceeds.
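And a sketch of insertion during a rehash, showing why ht[0] can only shrink: while rehashidx is not -1, new entries always land in ht[1]:

```c
#include <stdlib.h>

/* Insert into ht[1] while a rehash is in progress, ht[0] otherwise.
 * Uses the simplified structs and the hypothetical hashKey from above. */
static void dictAddSketch(dict *d, void *key, void *val) {
    dictht *ht = (d->rehashidx == -1) ? &d->ht[0] : &d->ht[1];
    unsigned long idx = hashKey(key) & ht->sizemask;

    dictEntry *de = malloc(sizeof(*de));
    de->key = key;
    de->val = val;
    de->next = ht->table[idx];  /* push onto the bucket's chain */
    ht->table[idx] = de;
    ht->used++;
}
```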

Summary of key points

1. Dictionaries use hash tables as their underlying implementation. Each dictionary has two hash tables: one for normal use, the other used only during rehash.

2. When a hash table holds too many or too few key/value pairs, the program needs to expand or contract the hash table accordingly.

3. The rehash is not performed in a single, centralized pass, but in multiple, incremental steps.

4. During a progressive rehash, the dictionary uses both the ht[0] and ht[1] hash tables.