String

The String type is the most commonly used type in Redis. Its internal representation is SDS (Simple Dynamic String). Similar to ArrayList in Java, SDS reduces frequent memory allocations by pre-allocating redundant space. In the SDS structure, the byte array ends with a '\0' following the last character.
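A minimal sketch of the SDS header as presented in the book (older Redis versions; Redis 3.2 and later split it into several sdshdr variants to save header space):

```c
struct sdshdr {
    int len;     /* number of bytes actually used in buf */
    int free;    /* number of unused (pre-allocated) bytes in buf */
    char buf[];  /* byte array holding the string, terminated with '\0' */
};
```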

Compared with plain C strings, SDS has the following advantages:

  • The length of the string can be obtained in O(1)
  • Buffer overflows are prevented: SDS automatically checks and expands the available space before modifying the string
  • The number of memory reallocations caused by string modifications is reduced

In a traditional C string:

  • Operations that grow the string, such as concatenation (append), must reallocate memory to expand the underlying array
  • Operations that shorten the string, such as truncation (trim), must reallocate memory to free the space the string no longer uses

To avoid this, SDS implements two optimization strategies: space preallocation and lazy space release.

  • Space preallocation

Space preallocation optimizes SDS string growth operations. When the string needs to grow, SDS first checks whether its unused space is sufficient; if it is not, SDS allocates additional unused space beyond what the modification itself requires:

  • If the SDS length after the modification is less than 1 MB, the amount of free space allocated equals the new len
  • If the SDS length after the modification is greater than or equal to 1 MB, 1 MB of unused space is allocated
  • Lazy space release

Lazy space release optimizes SDS string shortening operations. When the string saved by SDS needs to be shortened, the program does not immediately reallocate memory to reclaim the bytes that are no longer needed; instead it records their number in the free attribute so they can be reused by a later growth operation.
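A minimal sketch of the two strategies, using the sdshdr layout shown earlier; the function names are illustrative rather than the real sds.c API, although SDS_MAX_PREALLOC does exist in Redis with this value:

```c
#include <stddef.h>

#define SDS_MAX_PREALLOC (1024 * 1024)    /* 1 MB */

/* Space preallocation: how much buffer to reserve when the string
 * must grow so that it holds newlen bytes. */
size_t grow_alloc(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;                /* free == len after the change */
    return newlen + SDS_MAX_PREALLOC;     /* otherwise add at most 1 MB of free space */
}

/* Lazy space release: shortening only adjusts the bookkeeping fields;
 * no memory is reallocated, so the freed bytes can be reused later. */
void lazy_trim(struct sdshdr *sh, int newlen) {
    sh->free += sh->len - newlen;
    sh->len = newlen;
    sh->buf[newlen] = '\0';
}
```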

Binary safety

A C string cannot contain null characters ('\0') except at its end; otherwise the first embedded '\0' read by the program would be mistaken for the end of the string, as shown in the figure below.

SDS instead uses the len attribute, rather than a null terminator, to determine where the string ends, so it can safely store arbitrary binary data.
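A small, self-contained illustration (hypothetical data) of why relying on '\0' truncates binary content while an explicit len does not:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* 13 bytes of payload with an embedded '\0' in the middle. */
    const char data[] = "Redis\0Cluster";

    /* C string functions stop at the first '\0' ... */
    printf("strlen:       %zu\n", strlen(data));      /* prints 5 */

    /* ... whereas SDS would report its len attribute, i.e. all 13
     * bytes, so images, serialized objects, etc. are kept intact. */
    printf("actual bytes: %zu\n", sizeof(data) - 1);  /* prints 13 */
    return 0;
}
```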

List

The List type is implemented with a linked list or a ziplist. Besides list keys, features such as publish/subscribe, the slow query log, and monitor also use linked lists, and the Redis server itself uses a linked list to keep track of the state of multiple clients.

Implementation of linked lists and linked list nodes

Each linked list node is represented by an adlist.h/listNode structure.
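A sketch of that structure, following the adlist.h definition discussed in the book:

```c
typedef struct listNode {
    struct listNode *prev;   /* previous node */
    struct listNode *next;   /* next node */
    void *value;             /* value held by this node */
} listNode;
```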

Wrapping the nodes in an adlist.h/list structure makes the linked list more convenient to operate on.
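A sketch of the list structure (also from adlist.h); it adds head/tail pointers, a length counter, and type-specific callbacks, so the list is doubly linked, its length is known in O(1), and it can hold values of any type:

```c
typedef struct list {
    listNode *head;                      /* head node */
    listNode *tail;                      /* tail node */
    void *(*dup)(void *ptr);             /* node value duplication function */
    void (*free)(void *ptr);             /* node value release function */
    int (*match)(void *ptr, void *key);  /* node value comparison function */
    unsigned long len;                   /* number of nodes in the list */
} list;
```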

Typical uses of the List type:

  • Reading a closed range of elements with the LRANGE command
  • Serving as a message queue

Dictionary (hash)

Dictionaries (hashes) use either a hash table or a compressed list as the underlying implementation. A hash table is represented by the dict.h/dict structure.
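A simplified sketch of the structures involved, following the dict.h definitions discussed in the book (the real dictEntry stores its value in a union, and dict also carries type and private-data pointers, omitted here):

```c
typedef struct dictEntry {
    void *key;                /* key */
    void *val;                /* value (a union in the real structure) */
    struct dictEntry *next;   /* next entry in the same bucket (chaining) */
} dictEntry;

typedef struct dictht {
    dictEntry **table;        /* bucket array */
    unsigned long size;       /* number of buckets */
    unsigned long sizemask;   /* always size - 1, used to compute bucket indexes */
    unsigned long used;       /* number of key-value pairs currently stored */
} dictht;

typedef struct dict {
    dictht ht[2];             /* two hash tables; ht[1] is only used while rehashing */
    long rehashidx;           /* rehash progress; -1 when no rehash is in progress */
} dict;
```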

Normally a dictionary uses only the ht[0] hash table; ht[1] is used only while ht[0] is being rehashed. The rehashidx field records the current progress of the rehash, and is -1 when no rehash is in progress.

rehash

  • Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation to be performed and on the number of key-value pairs currently stored in ht[0] (ht[0].used):
    • For an expansion, the size of ht[1] is the first power of two (2^n) greater than or equal to ht[0].used * 2
    • For a contraction, the size of ht[1] is the first power of two (2^n) greater than or equal to ht[0].used
  • Recompute the hash and index of every key-value pair stored in ht[0] and place it at the corresponding position in ht[1]
  • Once all key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1] in preparation for the next rehash
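A minimal sketch of the sizing rule in the first step (it mirrors the behavior of Redis's _dictNextPower; the starting size of 4 matches DICT_HT_INITIAL_SIZE):

```c
/* Return the first power of two >= target. For an expansion the caller
 * passes ht[0].used * 2, for a contraction it passes ht[0].used. */
unsigned long next_power(unsigned long target) {
    unsigned long size = 4;   /* DICT_HT_INITIAL_SIZE */
    while (size < target)
        size *= 2;
    return size;
}
```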

Expansion and contraction of hash tables

  • If the server is not currently running the BGSAVE or BGREWRITEAOF command and the hash table's load factor is greater than or equal to 1, the table is expanded
  • If the server is running the BGSAVE or BGREWRITEAOF command and the load factor is greater than or equal to 5, the table is expanded
  • When the load factor of the hash table drops below 0.1, the program automatically starts to shrink the hash table
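The load factor used in these rules is, per the book, the number of stored key-value pairs divided by the table size; a small sketch:

```c
/* load_factor = ht[0].used / ht[0].size */
double load_factor(unsigned long used, unsigned long size) {
    return (double) used / (double) size;
}
```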

Progressive rehash

If all key-value pairs in ht[0] were rehashed into ht[1] in a single pass, the amount of computation could cause the server to stop serving requests for a period of time. The rehash is therefore carried out progressively, in several steps:

  • Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] hash tables
  • Maintain an index counter variable, rehashidx, in the dictionary and set it to 0 to indicate that rehashing has officially begun
  • During the rehash, every time the dictionary is operated on, all key-value pairs in the ht[0] bucket at index rehashidx are rehashed into ht[1]; when that bucket is done, the program increments rehashidx by one
  • As dictionary operations continue, at some point all key-value pairs of ht[0] will have been rehashed into ht[1]; rehashidx is then set to -1, indicating that the rehash operation is complete
  • During a progressive rehash, newly added key-value pairs are always stored in ht[1], and ht[0] receives no new additions; this ensures that the number of key-value pairs in ht[0] only decreases and eventually drops to zero
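A simplified sketch of one progressive rehash step, building on the dict structures sketched earlier; hash_key stands in for the dictionary's hash function, and the error handling present in the real dict.c is omitted:

```c
#include <stdlib.h>
#include <string.h>

extern unsigned long hash_key(const void *key);   /* placeholder hash function */

/* Move every entry in the ht[0] bucket at rehashidx into ht[1], then
 * advance rehashidx. Called once per dictionary operation. */
void rehash_step(dict *d) {
    if (d->rehashidx == -1) return;               /* no rehash in progress */

    /* If ht[0] is already empty, swap the tables and finish the rehash. */
    if (d->ht[0].used == 0) {
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(d->ht[1]));
        d->rehashidx = -1;
        return;
    }

    /* Skip empty buckets (safe: buckets below rehashidx are already empty,
     * and ht[0] still holds at least one entry). */
    while (d->ht[0].table[d->rehashidx] == NULL)
        d->rehashidx++;

    /* Re-insert every entry of this bucket at its new index in ht[1]. */
    dictEntry *entry = d->ht[0].table[d->rehashidx];
    while (entry != NULL) {
        dictEntry *next = entry->next;
        unsigned long idx = hash_key(entry->key) & d->ht[1].sizemask;
        entry->next = d->ht[1].table[idx];
        d->ht[1].table[idx] = entry;
        d->ht[0].used--;
        d->ht[1].used++;
        entry = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
}
```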

Set

A set is an unordered collection of elements; duplicates are removed automatically.

Sorted set

Sorted sets are implemented with skip lists.

The figure above shows an example skip list; its parts are as follows:

  • header: points to the skip list's header node
  • tail: points to the skip list's tail node
  • level: records the highest level among all nodes currently in the skip list
  • length: records the number of nodes the skip list currently contains
  • Layers: the marks L1, L2, L3, and so on inside a node denote the node's layers, L1 being the first layer; each layer has two attributes, a forward pointer and a span
  • BW: the backward pointer, used when traversing the list from the tail toward the head
  • Scores: the values 1.0, 2.0, and 3.0 in the nodes are the scores they store; nodes are ordered by score from smallest to largest
  • Member objects: o1, o2, and o3 are the member objects stored in the nodes
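A sketch of the corresponding structures, following the redis.h definitions discussed in the book (newer Redis versions store the member as an sds rather than a robj pointer):

```c
typedef struct redisObject robj;           /* defined elsewhere in Redis */

typedef struct zskiplistNode {
    struct zskiplistNode *backward;        /* backward pointer (BW) */
    double score;                          /* score used for ordering */
    robj *obj;                             /* member object */
    struct zskiplistLevel {
        struct zskiplistNode *forward;     /* forward pointer */
        unsigned int span;                 /* span of this layer */
    } level[];                             /* layers; array size chosen at creation */
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;   /* header and tail nodes */
    unsigned long length;                  /* number of nodes (header excluded) */
    int level;                             /* highest level among all nodes */
} zskiplist;
```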

Skip list node

layer

  • The program can use these layers to speed up access to other nodes; in general, the more layers a node has, the faster other nodes can be reached
  • Each time a new skip list node is created, the program randomly generates a value between 1 and 32, following a power-law distribution, as the size of the level array (see the sketch below)
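A sketch of the power-law level generation described above, modeled on Redis's zslRandomLevel (each extra layer is kept with probability 1/4, capped at 32 layers):

```c
#include <stdlib.h>

#define ZSKIPLIST_MAXLEVEL 32     /* maximum number of layers */
#define ZSKIPLIST_P 0.25          /* probability of adding one more layer */

/* Returns a level in [1, 32]; higher levels are exponentially less likely,
 * which is what gives the skip list its expected O(log N) search cost. */
int random_level(void) {
    int level = 1;
    while ((rand() & 0xFFFF) < (int)(ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level < ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
```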
span
  • By accumulating the spans of the layers visited along the way, the program obtains the rank of the target node in the skip list

Uses

  • Elements are deduplicated and can be sorted by score

Compressed list

The compressed list (ziplist) is one of the underlying implementations of list keys and hash keys. When a list key contains only a small number of items, and every item is either a small integer or a short string, Redis uses a compressed list as the underlying implementation of the list key. A compressed list can contain multiple nodes, and each node can hold either a byte array or an integer value. The components of a compressed list are shown below.
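A sketch of that overall layout, as described in the book (shown as a comment rather than the actual ziplist.c encoding macros):

```c
/*
 * <zlbytes> <zltail> <zllen> <entry 1> <entry 2> ... <entry N> <zlend>
 *
 * zlbytes   4 bytes   total number of bytes the compressed list occupies
 * zltail    4 bytes   offset from the start of the list to the last entry,
 *                     so the tail can be reached without a full traversal
 * zllen     2 bytes   number of entries (if it reaches 65535, the real count
 *                     must be obtained by traversing the whole list)
 * entry     variable  one node: previous_entry_length, encoding, content
 * zlend     1 byte    the constant 0xFF, marking the end of the list
 */
```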

Compressed list node

  • previous_entry_length: records the length of the previous node in the compressed list. This field occupies either 1 byte or 5 bytes: if the previous node is shorter than 254 bytes, it takes 1 byte; if the previous node is 254 bytes or longer, it takes 5 bytes.

Therefore, starting from a pointer c to the start address of the current node, subtracting the value of the current node's previous_entry_length property yields a pointer p to the start address of the previous node; this is how a compressed list is traversed from the tail toward the head (see the sketch after this list).

  • encoding: records the type and length of the data stored in the node's content attribute
  • content: holds the node's value, which can be a byte array or an integer
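A minimal sketch of the tail-to-head step described above; decode_prevlen is a hypothetical helper standing in for the real decoding logic (a value below 254 is the 1-byte form, while 0xFE introduces the 5-byte form whose next 4 bytes hold the length):

```c
#include <stdint.h>
#include <string.h>

/* Decode the previous_entry_length field stored at the start of an entry. */
static uint32_t decode_prevlen(const unsigned char *entry) {
    if (entry[0] < 254)
        return entry[0];                    /* 1-byte form */
    uint32_t len;
    memcpy(&len, entry + 1, sizeof(len));   /* 5-byte form: 0xFE flag + 4-byte length */
    return len;
}

/* Given a pointer c to the current entry, return a pointer p to the
 * previous entry, exactly as described in the text. */
static const unsigned char *prev_entry(const unsigned char *c) {
    return c - decode_prevlen(c);
}
```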

Cascading updates

Suppose a compressed list contains multiple consecutive nodes that each occupy between 250 and 253 bytes, so their previous_entry_length fields fit in 1 byte. If a new node of 254 bytes or more is inserted in front of them, the first node's previous_entry_length must grow from 1 byte to 5 bytes; that can push the node itself to 254 bytes or more, forcing the next node's field to grow as well, and so on. The program then has to perform a series of space reallocation operations on the compressed list. Cascading updates have a worst-case complexity of O(N^2), but the probability of them causing real performance problems is very low:

  • Cascading updates are triggered only when the compressed list happens to contain multiple consecutive nodes of 250 to 253 bytes, which is rare in practice

  • Even when a cascading update does occur, as long as the number of affected nodes is small it has no noticeable impact on performance

  • [1] Redis Design and Implementation, 2nd Edition