Dynamic string SDS

SDS stands for "Simple Dynamic String". Almost all strings in Redis are implemented with SDS, including:

  • All non-numeric keys. For example, the key msg in SET msg "hello world".
  • The values of string keys. For example, the value "hello world" in SET msg "hello world".
  • The string elements inside non-string types. For example, "apple", "banana" and "cherry" in RPUSH fruits "apple" "banana" "cherry".

SDS looks like this:

len: the current length of the string. buf: the byte array where the string itself is stored.
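For reference, the classic SDS header looks roughly like this (as described in Redis Design and Implementation; a simplified sketch, since modern Redis uses several sdshdr variants):

```c
struct sdshdr {
    int len;     /* bytes actually used in buf */
    int free;    /* bytes allocated but unused (preallocated slack) */
    char buf[];  /* the string data, still '\0' terminated */
};
```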

Space preallocation

To reduce the number of memory reallocations caused by modifying strings, SDS adopts an "allocate more than enough in one go" strategy:

  • If the SDS length after modification is less than 1 MB, the same amount of unused space as the new len is allocated on top of it (so free equals len)
  • If the SDS length after modification is greater than or equal to 1 MB, enough space is allocated for the new length plus an extra 1 MB of unused space (see the sketch after this list)
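A minimal sketch of that growth policy (the real logic lives in sdsMakeRoomFor in sds.c; the function name here is illustrative):

```c
#define SDS_MAX_PREALLOC (1024 * 1024)   /* 1 MB */

/* Given the length needed after a modification, return how many bytes
 * of buf to allocate, including the preallocated slack. */
static size_t sds_alloc_size(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;                /* free == len */
    else
        return newlen + SDS_MAX_PREALLOC; /* at most 1 MB of slack */
}
```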

Lazy space release

To avoid another memory reallocation when a string is shortened, SDS does not immediately give the freed space back; the spare bytes are recorded in free and kept around for future modifications.

int

Numbers in Redis are stored directly as integers rather than as SDS; that is also why the string values in the examples above were deliberately written in quotes.

Doubly linked list

It looks like this:

The diagram is divided into two parts: the orange "coordinating" part and the blue "implementation" part.

The "coordinating" part (the list structure):

  • head points to the head node of the underlying doubly linked list
  • tail points to the tail node of the underlying doubly linked list
  • len is the length of the doubly linked list (the number of nodes)

The "implementation" part (the listNode structure) is a plain doubly linked list node, with prev pointing to the previous node and next pointing to the next node.

So the whole thing consists of two data structures: list and listNode.
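Roughly as they appear in adlist.h (slightly abridged here):

```c
typedef struct listNode {
    struct listNode *prev;   /* previous node, NULL for the head */
    struct listNode *next;   /* next node, NULL for the tail */
    void *value;             /* the payload */
} listNode;

typedef struct list {
    listNode *head;          /* first node */
    listNode *tail;          /* last node */
    unsigned long len;       /* number of nodes */
    /* dup/free/match function pointers omitted for brevity */
} list;
```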


ziplist

The compressed list. It is one of the underlying implementations of Redis's list keys and hash keys. The data structure was designed to save memory: like arrays in many languages, it is a single contiguous block of memory, which saves space by avoiding fragmentation and per-node pointer overhead.
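Laid out in memory, a ziplist looks roughly like this (field sizes as described in Redis Design and Implementation):

```c
/*
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * zlbytes: 4 bytes, total size of the ziplist in bytes
 * zltail : 4 bytes, offset from the start to the last entry
 * zllen  : 2 bytes, number of entries (if it reaches 65535, the real
 *          count has to be obtained by scanning the whole list)
 * zlend  : 1 byte, the terminator 0xFF
 */
```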

Each entry, in turn, has the following structure:
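Sketched as a comment (again following the layout described in the book):

```c
/*
 * <previous_entry_length> <encoding> <content>
 *
 * previous_entry_length: length in bytes of the previous entry,
 *                        stored in 1 byte (< 254) or 5 bytes (>= 254)
 * encoding             : type and length of content (integer or byte array)
 * content              : the element itself
 */
```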

Traversal of elements

First, locate the entry at the tail of the list (via the zltail offset):

Then the previous_entry_length attribute of each entry is used to walk backwards through the entries one by one:
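A minimal sketch of that backward walk, assuming a hypothetical prev_entry_length() decoder rather than the real ziplist functions:

```c
#include <stddef.h>

/* Decode the previous_entry_length field of the entry at p:
 * 1-byte form for lengths < 254, otherwise a 0xFE marker followed
 * by a 4-byte little-endian length. */
static size_t prev_entry_length(const unsigned char *p) {
    if (p[0] < 254) return p[0];
    return (size_t)p[1] | ((size_t)p[2] << 8) |
           ((size_t)p[3] << 16) | ((size_t)p[4] << 24);
}

/* Visit every entry from the tail entry back to the head entry. */
static void walk_backwards(unsigned char *tail_entry) {
    unsigned char *p = tail_entry;
    for (;;) {
        /* visit(p); */
        size_t prevlen = prev_entry_length(p);
        if (prevlen == 0) break;   /* the head entry stores 0 here */
        p -= prevlen;              /* jump to the previous entry */
    }
}
```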

Cascading updates

Looking at the entry structure again, the previous_entry_length field is either 1 or 5 bytes long:

  • If the previous entry is shorter than 254 bytes, previous_entry_length takes 1 byte
  • If the previous entry is 254 bytes or longer, previous_entry_length takes 5 bytes

Now suppose a compressed list whose entries are all between 250 and 253 bytes long, and a new node new of 254 bytes or more is suddenly inserted at the front. Then:

The entry after new must grow its previous_entry_length from 1 byte to 5 bytes, which may push that entry itself past 254 bytes and force the next one to grow too; the program has to keep reallocating the compressed list until the chain of updates finally stops.

Besides insertions, deletions can also trigger cascading updates. As shown in the figure below, all entries in the ziplist are between 250 and 253 bytes long, the big node is 254 bytes or longer, and the small node is shorter than 254 bytes. Once small is deleted, the entry after it has to record big's length instead, which needs 5 bytes, and the cascade starts again.

Hash table

Hash tables are a little more complicated. There are generally two ways to build one: open addressing and separate chaining. Redis's hash table uses separate chaining.

The overall structure is shown as follows:

It is again divided into two parts, orange on the left and blue on the right, in the same "coordinating" versus "implementation" relationship. The actual hash table work happens in the blue part, so let's look at that first:

The blue part is itself split into a "coordinating" side and an "implementation" side, left and right.

The right-hand part is easy to understand: it is the usual chained hash table. The array is the buckets; normally different keys land in different buckets, and when keys collide, the conflicting entries are chained together in a linked list.
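The two structures behind this part look roughly like this (abridged from dict.h of pre-7.0 Redis):

```c
#include <stdint.h>

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   /* chains colliding keys in the same bucket */
} dictEntry;

typedef struct dictht {
    dictEntry **table;        /* the bucket array */
    unsigned long size;       /* number of buckets, always a power of two */
    unsigned long sizemask;   /* size - 1, used to compute bucket indexes */
    unsigned long used;       /* number of key-value pairs stored */
} dictht;
```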

The process of creating a key:

If the new key hashes to a bucket that is already occupied (a collision):
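A minimal sketch of that insertion path, assuming the dictEntry/dictht structs above and a hypothetical hash() helper (the real code is dictAdd in dict.c):

```c
#include <stdlib.h>

unsigned long hash(const void *key);   /* assumed hash function */

void insert(dictht *ht, void *key, void *val) {
    unsigned long idx = hash(key) & ht->sizemask;  /* pick the bucket */

    dictEntry *entry = malloc(sizeof(*entry));
    entry->key = key;
    entry->v.val = val;

    /* Colliding entries are simply chained: the new entry is pushed
     * onto the head of the bucket's linked list. */
    entry->next = ht->table[idx];
    ht->table[idx] = entry;
    ht->used++;
}
```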

rehash

Looking again at the orange "coordinating" part on the left of the hash table diagram, there are two key attributes: ht and rehashidx. ht is an array with only two elements, ht[0] and ht[1]: ht[0] holds the hash table Redis normally uses, while ht[1] works together with rehashidx during a rehash.
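Sketched with just the fields that matter here (the type callbacks and other fields of dict.h are omitted):

```c
typedef struct dict {
    dictht ht[2];    /* ht[0] is the working table; ht[1] is only populated during a rehash */
    long rehashidx;  /* bucket index the rehash has reached; -1 means no rehash in progress */
} dict;
```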

Rehash is the process of recomputing the hash and index of every key and redistributing the key-value pairs accordingly.

load factor = ht[0].used / ht[0].size

Expansion and contraction criteria

Expansion:

  • When neither BGSAVE nor BGREWRITEAOF is running, the hash table is expanded once its load factor is greater than or equal to 1.
  • While BGSAVE or BGREWRITEAOF is running, the hash table is only expanded once its load factor is greater than or equal to 5.

Shrinking:

  • When the load factor drops below 0.1, the program automatically starts shrinking the hash table.

Expansion and shrink sizes

Expansion:

  • The new size of ht[1] is the first power of two (2^n) that is greater than or equal to ht[0].used * 2.

Shrinking:

  • The new size of ht[1] is the first power of two (2^n) that is greater than or equal to ht[0].used (see the sketch below).
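The size is picked by a helper along these lines (cf. _dictNextPower in dict.c; simplified, overflow checks omitted):

```c
#define DICT_HT_INITIAL_SIZE 4

/* Smallest power of two >= size, never below the minimum table size of 4. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}
```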

As for shrinking, I had a doubt: if a hash table has a size greater than 40 but only 4 elements left (ht[0].used is 4), what does it shrink to?

My first thought: by the rule above, it should shrink to 4. But if the new size is 4 and used is also 4, the load factor is back to 1; wouldn't it have to expand again right away? Let's open the source code and see:

The actual shrink function:
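From memory, dictResize in dict.c looks roughly like this (paraphrased; names may differ slightly between versions):

```c
int dictResize(dict *d) {
    unsigned long minimal;

    /* Refuse to resize while resizing is forbidden or a rehash is running. */
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;

    minimal = d->ht[0].used;
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;   /* never shrink below 4 buckets */
    return dictExpand(d, minimal);
}
```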

As we can see from the code, once the size is down to 4 it will not shrink any further; the resize call simply refuses and returns an error.

Now go back to the premise: is that scenario even possible? Hash table sizes always double from a minimum of 4: 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> ...

In other words, there is no size "a bit over 40"; the next size after 32 is 64. And if the size is 64, shrinking kicks in when the load factor drops below 0.1, that is, when used falls to 6 (64 x 0.1 = 6.4). What does it shrink to then? 8. Once it is down near 4 elements it no longer shrinks at all. So a hash table with a size greater than 40 holding only 4 elements simply cannot occur.

Expansion steps

Contraction steps

Progressive rehash

In both the "expansion steps" and "contraction steps" animations above, the fourth step, "recompute the hash of every key in ht[0] and rehash it into ht[1]", is not done in one go; it is split into many small steps. A single hash may hold tens of millions or even hundreds of millions of keys (Redis allows up to 2^32 - 1 key-value pairs per hash, more than 4 billion), and rehashing them all at once could take the server out of service for a while: computing that many hashes simply takes time.

So the rehash of the hash table is done in stages, a little at a time; this is the progressive rehash.

Progressive rehash is closely tied to the rehashidx field in the orange "coordinating" part on the left of the figure below:

  • rehashidx records the bucket index that the rehash has currently progressed to
  • rehashidx = -1 means no rehash is in progress

While a rehash is in progress, every add, delete, update, or lookup on the hash table, in addition to doing its normal work, also moves the key-value pairs of the ht[0] bucket at rehashidx over to ht[1] and advances rehashidx.
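A minimal sketch of one such step, assuming the dict/dictht/dictEntry sketches above and the same hypothetical hash() helper (the real logic is dictRehash in dict.c, which also bounds how many empty buckets it will skip):

```c
#include <stdlib.h>

void rehash_one_bucket(dict *d) {
    if (d->rehashidx == -1) return;                 /* no rehash in progress */

    /* Skip empty buckets (bounds checking omitted for brevity). */
    while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

    /* Move the whole chain of this bucket from ht[0] to ht[1]. */
    dictEntry *de = d->ht[0].table[d->rehashidx];
    while (de) {
        dictEntry *next = de->next;
        unsigned long idx = hash(de->key) & d->ht[1].sizemask;
        de->next = d->ht[1].table[idx];
        d->ht[1].table[idx] = de;
        d->ht[0].used--;
        d->ht[1].used++;
        de = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;

    /* Once ht[0] is empty, ht[1] becomes the working table. */
    if (d->ht[0].used == 0) {
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (dictht){0};
        d->rehashidx = -1;
    }
}
```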

Take expansion as an example:

intset

The integer set. It is one of the underlying implementations of the set key, used when a set contains only integers and the number of elements is small.
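Its layout is very simple (roughly as in intset.h):

```c
#include <stdint.h>

typedef struct intset {
    uint32_t encoding;   /* INTSET_ENC_INT16, INT32 or INT64 */
    uint32_t length;     /* number of elements */
    int8_t contents[];   /* elements stored in ascending order, no duplicates */
} intset;
```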

Skip list

This data structure looks like this:

In Redis, the skip list is abstracted as follows:

Look at this diagram: the "coordinating" part is on the left and the implementation on the right. The coordinating part has the following fields:

  • header: points to the header node of the skip list
  • tail: points to the tail node of the skip list
  • level: the number of levels of the node with the most levels
  • length: the length of the skip list (the number of nodes)

The implementation section has the following points:

  • header is a sentinel node of the linked list and does not hold real data.
  • At the lowest level, nodes are also linked backwards, so the list can be traversed in both directions.
  • Nodes are kept in order by score.
  • o1, o2 and o3 are the members stored in the nodes, each a pointer to an SDS value.
  • The maximum level height is 32. Each time a new node is created, the program randomly generates a value between 1 and 32 as the size of its level array; this value is the node's "height" (see the struct sketch after this list).
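The two structures look roughly like this (abridged from server.h):

```c
#define ZSKIPLIST_MAXLEVEL 32

typedef struct zskiplistNode {
    sds ele;                             /* the member, an SDS string */
    double score;                        /* nodes are ordered by score */
    struct zskiplistNode *backward;      /* level-0 backward pointer */
    struct zskiplistLevel {
        struct zskiplistNode *forward;   /* next node at this level */
        unsigned long span;              /* how many nodes this pointer skips */
    } level[];                           /* 1 to 32 levels, chosen at random */
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;                /* number of nodes, header excluded */
    int level;                           /* highest level among all nodes */
} zskiplist;
```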


How the five Redis data types are implemented

Redis object

Redis does not use the data structures above directly as the key-value database. Instead, everything is wrapped in an object, and the object in turn refers to one of the concrete data structures above.

The structure is as follows:
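Roughly as in server.h (abridged):

```c
typedef struct redisObject {
    unsigned type:4;      /* OBJ_STRING, OBJ_LIST, OBJ_HASH, OBJ_SET, OBJ_ZSET */
    unsigned encoding:4;  /* which underlying data structure is in use */
    unsigned lru:24;      /* LRU time (or LFU data) used for eviction */
    int refcount;         /* reference count for sharing and freeing */
    void *ptr;            /* points at the actual underlying structure */
} robj;
```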

string

Among the string encodings, embstr and raw are both built on SDS. The difference is how memory is allocated: with raw, the redisObject and the SDS are allocated separately, while with embstr the redisObject and the SDS are placed in the same contiguous block of memory, allocated in a single call.

list

hash

set

zset

Space is limited, so I won't go through the remaining content one by one here. You're welcome to discuss; if you liked the article, remember to follow me and give it a like. Thanks for your support!