On the first day of 2021, a friend messaged me to share a Redis question he had been asked in an interview at Alibaba (this guy definitely got his year-end bonus). After reading it, I found it quite interesting. The question is simple, but it is the kind of thing people tend to overlook. I have written it up here to share and to shore up my own fundamentals along the way; I hope it helps anyone who is interviewing or about to interview.

The conversation probably went something like this:

Interviewer: Do you know how the Redis String data structure is implemented under the hood?

Tiezi: Of course, it is based on SDS.

Interviewer: Redis is written in C, so why not just use C strings instead of designing a separate structure like SDS?

Tiezi: …


Clearly, the interviewer wants to find out whether Tiezi only knows how to use Redis or has also studied its underlying data structures.

We know that Redis is written in C, but instead of using C strings directly, it builds its own abstract type called Simple Dynamic String (SDS).

Redis does still use traditional C strings, but only where a string never needs to be modified, such as printing static text.

In day-to-day development, however, we modify string values frequently, so Redis uses SDS to represent them. Notably, in a Redis database, every key-value pair that contains a string value is backed by SDS.

For example, if you run the simplest SET command, Redis creates a new key-value pair.

127.0.0.1:6379> set xiaofu "Programmer internal matters."

Here both the key and the value of the key-value pair are string objects, each backed by an SDS structure: one holds the key "xiaofu" and the other holds the value "Programmer internal matters."

Another example: if I push data onto a list, Redis creates a new key-value pair.

127.0.0.1:6379> lpush xiaofu "Programmer internal matters." "Rich Programmer"

The key of the key-value pair is a string object implemented by SDS, and the value of the key-value pair is a list object containing two string objects, which are also implemented by SDS.

SDS structure

In Redis versions before 3.2, an SDS value is mainly made up of three fields: len, free and buf[].

struct sdshdr {
  int free;   // number of unused bytes in the buf[] array
  int len;    // length of the string held in the buf[] array
  char buf[]; // byte array that holds the string
};

Here buf[] is the char array that actually holds the string, free is the number of unused bytes in buf[], and len is the length of the string held in buf[].


For example, when buf[] holds a 6-byte string, len is 6 and free is 0, but eagle-eyed readers will notice that buf[] actually takes up 7 bytes: there is a “\0” at the end.

As mentioned above, SDS does not use C strings directly, but it keeps some C conventions, such as ending the string with a null character, so that some C string functions can be reused. The one byte taken up by the null character is not counted in the len attribute; SDS allocates extra space for it.
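
Here is a minimal sketch in C of how such a value could be built. The helper name sds_sketch_new is made up for illustration and is not the Redis API; it only assumes the struct shown above. One extra byte is allocated for the "\0" while len leaves it out, so buf can still be handed to C functions such as strcmp or printf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the struct shown above. */
struct sdshdr {
    int free;   /* unused bytes in buf[] */
    int len;    /* bytes of string data in buf[], excluding the '\0' */
    char buf[]; /* string data followed by a terminating '\0' */
};

/* Hypothetical helper (not the Redis API): build an SDS from a C string.
 * Returns NULL if allocation fails; the caller frees the result. */
struct sdshdr *sds_sketch_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    /* One extra byte for the '\0', which len does not count. */
    struct sdshdr *s = malloc(sizeof(struct sdshdr) + n + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = (int)n;
    s->free = 0;
    memcpy(s->buf, init, n + 1); /* copies the '\0' as well */
    return s;
}
```

For the key "xiaofu", this gives len = 6 and free = 0, while buf takes up 7 bytes.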

With a brief understanding of the structure of SDS, let’s take a look at the advantages of SDS over C strings.

High efficiency

For example, at work we often use the STRLEN command to get the length of a string. The SDS structure records the length in its len attribute, so getting the length just means reading len, which is O(1).


With a C string, getting the length means traversing the whole string until the terminating null character is reached (in C, the null character marks the end of a string), which is O(N).

In high-concurrency scenarios, repeatedly traversing strings just to get their lengths could easily become a performance bottleneck for Redis, so SDS performs better here.
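
The difference can be sketched in a few lines of C. Both function names are made up for illustration: c_string_length does by hand what the standard strlen has to do, and sds_sketch_len assumes the sdshdr layout shown earlier.

```c
#include <assert.h>
#include <stddef.h>

struct sdshdr {
    int free;   /* unused bytes in buf[] */
    int len;    /* bytes of string data in buf[] */
    char buf[]; /* string data plus a terminating '\0' */
};

/* What strlen() has to do for a C string: scan until the '\0'. O(N). */
size_t c_string_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* With SDS the length is already stored in the header. O(1). */
int sds_sketch_len(const struct sdshdr *s) {
    assert(s != NULL);
    return s->len;
}
```

The first function touches every byte of the string; the second reads a single integer no matter how long the string is.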

Data overflow

As mentioned above, C strings do not record their own length. Two strings may be stored next to each other in memory, each allocated exactly as much space as it needs.


Suppose I want to append “123” to a string for which only 6 bytes were allocated; the modified string needs 9 bytes.


If the memory is not reallocated first, the extra bytes spill into the adjacent string's space, and this overflow modifies the contents of the other string.

SDS avoids this. When data needs to be modified, SDS first checks whether its current space is large enough. If not, the space is automatically expanded to the size the modification requires, and only then is the modification performed.


There is one thing to note, though: after the 6-byte string is expanded to the 9-byte string ending in “123”, the free attribute is also 9, the same as the new length of the string. This comes from the memory reallocation policy described below.
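
As a rough sketch of that check-then-expand flow in C: the helper sds_sketch_cat and the SKETCH_PREALLOC_MAX constant are hypothetical names, not taken from the Redis source, and the extra space reserved on growth follows the rules given in the next section.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int free;   /* unused bytes in buf[] */
    int len;    /* bytes of string data in buf[] */
    char buf[]; /* string data plus a terminating '\0' */
};

#define SKETCH_PREALLOC_MAX ((size_t)1024 * 1024) /* assumed 1 MB cap */

/* Hypothetical helper, not the Redis API: append the C string t to s.
 * Returns the (possibly moved) SDS, or NULL if allocation fails, in
 * which case the original s is left untouched. */
struct sdshdr *sds_sketch_cat(struct sdshdr *s, const char *t) {
    assert(s != NULL && t != NULL);
    size_t add = strlen(t);
    if ((size_t)s->free < add) {
        /* Not enough room: grow before writing, so nothing overflows. */
        size_t new_len = (size_t)s->len + add;
        size_t spare = new_len < SKETCH_PREALLOC_MAX ? new_len
                                                     : SKETCH_PREALLOC_MAX;
        struct sdshdr *grown =
            realloc(s, sizeof(struct sdshdr) + new_len + spare + 1);
        if (grown == NULL) {
            return NULL;
        }
        s = grown;
        /* Capacity (without the '\0') minus the bytes already in use. */
        s->free = (int)(new_len + spare) - s->len;
    }
    memcpy(s->buf + s->len, t, add + 1); /* includes the '\0' */
    s->len += (int)add;
    s->free -= (int)add;
    return s;
}
```

Starting from a 6-byte value with free = 0, appending "123" leaves len = 9 and free = 9; a second 3-byte append then fits into the free space without another reallocation.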

Memory reallocation policy

The array behind a C string has a fixed length, so every time the string grows or shrinks, memory must be reallocated. Memory reallocation is usually a time-consuming operation, which is acceptable only if the program does not modify strings often.

Unfortunately, Redis is a database, so data is bound to change frequently, and performing a memory reallocation for every change would seriously hurt performance.

SDS addresses the cost of growing and shrinking strings with two memory reallocation strategies.

1. Pre-allocate space

The space pre-allocation strategy optimizes SDS string growth. When a modification requires the SDS to be expanded, Redis allocates not only the space the modification needs but also extra unused space, which is recorded in free. On the next modification, Redis first checks whether the unused space in free is enough; if it is, no expansion is needed.

With space pre-allocation, Redis effectively reduces the number of memory reallocations caused by consecutive string growth operations.


Rules for allocating the extra unused space (free):

  • If len is less than 1 MB after the SDS string is modified, the extra unused space free is allocated with the same size as len.
  • If len is greater than or equal to 1 MB after the SDS string is modified, the extra unused space free is allocated with a size of 1 MB.
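
The two rules boil down to a small calculation. In this sketch the function names and the SKETCH_PREALLOC_MAX constant are assumptions for illustration, with the 1 MB (1M) limit taken as 1024 * 1024 bytes.

```c
#include <stddef.h>

#define SKETCH_PREALLOC_MAX ((size_t)1024 * 1024) /* assumed 1 MB limit */

/* Extra unused space (free) to reserve, given len after the modification. */
size_t sketch_prealloc_free(size_t new_len) {
    return new_len < SKETCH_PREALLOC_MAX ? new_len : SKETCH_PREALLOC_MAX;
}

/* Total bytes buf[] occupies: data + unused space + the '\0'. */
size_t sketch_buf_size(size_t new_len) {
    return new_len + sketch_prealloc_free(new_len) + 1;
}
```

For example, a string that is 9 bytes long after modification gets 9 bytes of free, so buf takes 9 + 9 + 1 = 19 bytes; a 30 MB string gets only 1 MB of free.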

2. Lazy space release

The lazy space release strategy optimizes SDS string shortening. When an SDS string is shortened, memory is not reallocated immediately to reclaim the freed space. Instead, the free attribute records that space, and a later growth operation can use it directly.
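
A minimal sketch of the idea in C, assuming the sdshdr layout shown earlier; sds_sketch_truncate is a hypothetical helper, not a Redis function. Shortening only moves bytes from len to free and never calls the allocator.

```c
#include <assert.h>
#include <stddef.h>

struct sdshdr {
    int free;   /* unused bytes in buf[] */
    int len;    /* bytes of string data in buf[] */
    char buf[]; /* string data plus a terminating '\0' */
};

/* Hypothetical helper, not the Redis API: keep only the first `keep` bytes.
 * No memory is reallocated; the dropped bytes are simply counted as free. */
void sds_sketch_truncate(struct sdshdr *s, int keep) {
    assert(s != NULL && keep >= 0 && keep <= s->len);
    s->free += s->len - keep;
    s->len = keep;
    s->buf[keep] = '\0'; /* keep the C-style terminator in place */
}
```

Truncating a 9-byte value to 6 bytes leaves len = 6 and free = 3, so a later append of up to 3 bytes needs no reallocation.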


Data format diversity

The characters in a C string must conform to some encoding, and as mentioned above, a C string must end with the \0 null character. The string therefore cannot contain \0 anywhere else, otherwise it would be mistaken for multiple strings.

Because of this limitation, C strings can only hold text data; binary data such as audio, video and images cannot be stored.

Redis treats the buf array as binary data and uses len, not the null character, to decide where the data ends. It therefore places no restrictions on what is stored and applies no filtering to it: the data comes out exactly as it went in.
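
To see the difference, here is a small C sketch that stores raw bytes by explicit length (sds_sketch_newlen is a made-up helper, not the Redis API). Because the end of the data is given by len, a "\0" in the middle is just another byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int free;   /* unused bytes in buf[] */
    int len;    /* bytes of data in buf[], may include '\0' bytes */
    char buf[]; /* raw data plus a terminating '\0' */
};

/* Hypothetical helper, not the Redis API: copy n raw bytes into a new SDS.
 * Returns NULL if allocation fails; the caller frees the result. */
struct sdshdr *sds_sketch_newlen(const void *data, size_t n) {
    assert(data != NULL);
    struct sdshdr *s = malloc(sizeof(struct sdshdr) + n + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = (int)n;
    s->free = 0;
    memcpy(s->buf, data, n); /* copies every byte, embedded '\0' included */
    s->buf[n] = '\0';
    return s;
}
```

Storing the 5 bytes 'a', 'b', '\0', 'c', 'd' gives len = 5, whereas strlen on the same buffer stops at the embedded "\0" and reports 2.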

Conclusion

The above is only some basic knowledge about Redis data structures, and none of it is difficult. But from my interview experience, if you are asked this kind of question, do not just say vaguely that the underlying implementation is SDS; explain why it is implemented this way.

On the one hand, this shows solid fundamentals, and a clear explanation is a real plus. On the other hand, volunteering the answer heads off the interviewer's follow-up questions. Of course, you still have to watch out for interviewers who do not play by the usual script!

