1. What is SDS?

Redis’ underlying implementation of String data structure is based on SDS.

However, Redis is developed in C language. The bottom layer of Redis does not use the traditional String representation of C language, namely the character array ending with null characters, but uses the Simple Dynamic String specially designed for it as its default String representation. Its English full name is Simple Dynamic String, referred to as SDS.

2. SDS definition

struct sdshdr {
      // The number of bytes used in the record buff array is equal to the length of the string saved by SDS
      int len;
      // Records the number of unused bytes in the buff array
      int free;
      // Array of bytes to hold strings
      char buf[];
}
Copy the code

Similar to the Java String String, String stores characters through char[] (JDK 1.8).

SDS, like the C string, ends with a ‘\0’ notation, which does not count toward the used length. The advantage of this is that you can reuse some of the functions in the C string library.

Such as:

127.0. 01.:6379> set name "HelloCoder"
Copy the code

At the bottom of Redis is stored like this:

The key and value of the key-value pair are both a string object, and the underlying implementation of the object is two SDS structures that hold the strings name and HelloCoder respectively.

3, SDS and C string difference

This paper mainly describes the advantages of SDS over C string, and also the reason why SDS is more suitable for Redis than C language string.

3.1. Get the time complexity of string length

C strings need to be traversed, and the time complexity is O(n).

SDS is obtained directly, the time complexity is O(1), because there is len function to record the length of the string.

3.2. Prevent buffer overflow

C language does not record its own length, string in the execution of the concatenation string, if the length is not enough will cause a buffer overflow problem.

The /strcat function concatenates the contents of the SRC string to the end of the dest string:

char *strcat(char *dest, const char *src);
Copy the code

The SDS space allocation strategy completely eliminates this possibility. When THE API needs to modify the SDS, the API will first check whether the SDS space meets the requirements. If not, the API will automatically dynamically expand its space.

3.3. Reduce the number of memory reallocations caused by string changes

There is a correlation between the length of the C string and the length of the underlying array. Every time a C string is increased or decreased, the program must reallocate the data of the C string, otherwise there will be a memory leak.

Because Redis frequently manipulates data, memory allocation and free time may affect performance. SSD avoids this defect and implements two optimization strategies of space pre-allocation and lazy space free, instead of reallocating memory every time.

1. Pre-allocate space

The following reference is from Redis Design and Implementation

  • Len is less than 1 M

If len is less than 1 M, then the size allocated to Free is the same as len. For example, if len is changed to 10 bytes, then free is also 10 bytes (unused space), and the actual length of BUF becomes 10 bytes + 10 bytes + 1byte

1byte is the ending ‘\0’

  • Len is going to be greater than or equal to 1 M

If len is greater than or equal to 1M, then 1M is allocated to free. For example, if len is changed to 10M, 1M is allocated to free. The actual buf length is 10M + 1M + 1 byte.

When modifying, first check whether the space is enough, if so, directly use, otherwise perform memory reallocation.

2, inert space release

When SDS length is shortened, Redis does not free memory, but records to the free field and waits for next use. At the same time, the corresponding API is provided to manually free the memory.

Like before:

127.0. 01.:6379> set name "HelloCoder"
Copy the code

Len of name is equal to 10, now set to

127.0. 01.:6379> set name "HaC"
Copy the code

Len is now 3 and free is now 17 (10 + 10-3 = 17).

The first 10 is the original free size

3.4. Binary security

C strings can only hold Spaces at the end of the string. If there are Spaces in the middle (blank characters will be mistaken for the end of the string), they will be intercepted as the end of the string. In this way, binary data such as pictures, audio and video cannot be saved.

For example, save a special string and convert it to binary bytes: if a ‘\0’ appears in the middle of the preserved data, the C string function will only recognize the “Redis” in it and ignore the “Cluster” after it.

In SDS, however, there is no problem because it uses len instead of a null character to determine the end.

All SDS apis process data in SDS BUF arrays in binary form without limiting, filtering, or modifying the data.

Data is read as it is written.

3.5. Compatible with some C string functions

Redis retains the \0 ending, although it allocates an extra byte of space, in order to reuse some of the functions defined by the C

library.

For example, you can reuse C’s StrcasecMP comparison function to avoid unnecessary code duplication.

4, summarize

The simple reason why Redis uses SDS is to modify the native implementation of C to make it more flexible and efficient.

A graph summarizes the benefits of SDS: