reference

Redis data structure: SDS dynamic string

Redis source code interpretation (a): basic data structure SDS

1. Replace C’s default char* type with Simple Dynamic String

Redis does not directly use the string of C language, but defines a string data structure by itself. SDS is the default string, and all the key values we set are basically SDS

C language string features:

  • Evaluates the length of the string each timestrlen(s)The time complexity of isO(n)
  • use\ 0Terminators determine the end of a string, a rule that makes C strings binary unsafe
  • To append a string N times, you must reallocate the string N times

In terms of performance, the above three problems can be solved as follows:

  • How to implementO(1)Time complexity length query:
    • Make the string data structure contain its own length attribute => custom string data structure
  • How to implement binary security:
    • Because it’s implementedlengthProperties no longer need to be in a special format (\ 0) parses the data, so binary is safe
  • How to reduce the number of memory reallocations:
    • According to a certain mechanism to determine the size of the expanded memory, and then perform the append operation, after the expansion of the excess space is not released, convenient next time to append the array, the price is a waste of a little memory, but to achieve the effect of space for time

2. Data structure of Redis SDS

Github source code SRC/SDs.h, structure declaration code is as follows:

struct __attribute__(((packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3

// Get the header pointer from buf
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
Copy the code

Although sdSHDR5, SDSHDR8, sdSHDR16, sdSHDR32, and sdSHDR64 are declared above, they can be summarized as follows:

  • lenThe length of the current byte array is not included\ 0
  • allocRecords the total memory allocation of the current byte array, excluding\ 0
  • flagsThat records the current byte arraySDS_TYPE
  • bufHolds the real value of the string as well as the one at the end\ 0

Look at an example of SDSHdr8, the entire SDS memory is continuous, unified open, through this way can be addressed by the buF header pointer, get the pointer to the whole struct

2.1 attribute ((packed) Role: To save money within the province

Compiler optimizations for memory alignment: Structs allocate memory that is an integer multiple of the largest internal element

__attribute__ ((__packed__)) tells the compiler not to optimize the alignment of the structure so that the fields within the structure are next to each other

printf("%ld\n".sizeof(struct sdshdr8));  / / 3
printf("%ld\n".sizeof(struct sdshdr16)); / / 5
printf("%ld\n".sizeof(struct sdshdr32)); / / 9
printf("%ld\n".sizeof(struct sdshdr64)); / / 17
Copy the code

Take SDSHdr32 as an example, its internal maximum element is 4(Uint32_t takes up 4 bytes), without memory alignment, saving 4* 3-9 = 3 bytes; similarly, sdSHdr64 saves 8* 3-17 = 7 bytes.

2.2 Why is there a distinction between SDSHDR5 and SDSHDR8

In most scenarios, no developer will give keys a particularly long name. To make keys into SDS strings, store the length of the string in sdshDr. len. How to select the type of len?

  • uint8: There must be strings longer than 2^ 8-1
  • uint16: There must be strings longer than 2^ 16-1
  • uint32: There must be strings longer than 2^ 32-1
  • uint64: 99% of the time string lengths are very short, 8 bytes is extremely wasteful

We can divide SDSHDR into several types based on the length of the string. We can see another small detail: /* 3 LSB of type, 5 unused bits */; /* 3 LSB of type, 5 unused bits */

3. Create a SDS

Call:

mysds = sdsnewlan("abc", 3);
Copy the code

See notes for analysis

sds sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    sds s;
    // Determine 'SDS_TYPE' according to the content length 'initlen'
    char type = sdsReqType(initlen);
    // Empty strings use type SDS_TYPE_8, because empty strings are usually used for append operations, SDS_TYPE_5 is not suitable
    if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    // Get the structure size
    int hdrlen = sdsHdrSize(type);
    // flags pointer
    unsigned char *fp;
    // Allocate memory: structure size + string size +1 (' \0 ')
    sh = s_malloc(hdrlen+initlen+1);
    if (sh == NULL) return NULL;
    if (init==SDS_NOINIT)
        init = NULL;
    else if(! init)// Empty string initializes memory to 0
        memset(sh, 0, hdrlen+initlen+1);
    s = (char*)sh+hdrlen;
    // Get the flags pointer
    fp = ((unsigned char*)s)- 1;
    switch(type) {
        case SDS_TYPE_5: {
            // The first 5 bits of the flags field of SDS_TYPE_5 save the length and the last 3 bits save the type
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);       // Get a pointer to SDSHDR
            sh->len = initlen;      / / set the len
            sh->alloc = initlen;    / / set the alloc
            *fp = type;             / / set type
            break;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break; }}if (initlen && init)
        memcpy(s, init, initlen);   // Memory copy
    s[initlen] = '\ 0';              // Set the last bit of the character array to \0
    return s;
}
Copy the code

4. The splicing SDS

sds sdscatlen(sds s, const void *t, size_t len) {
    size_t curlen = sdslen(s);      // Gets the current string length

    s = sdsMakeRoomFor(s,len);      / / capacity
    if (s == NULL) return NULL;
    memcpy(s+curlen, t, len);       // Memory copy
    sdssetlen(s, curlen+len);       // Update the len attribute
    s[curlen+len] = '\ 0';           // Append a \0 to the end
    return s;
}
Copy the code

The focus is on sdsMakeRoomFor, which reduces the number of memory allocations for splicing operations through policies

sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    Alloc - sh->len;
    size_t avail = sdsavail(s);
    size_t len, newlen;
    char type, oldtype = s[- 1] & SDS_TYPE_MASK;
    int hdrlen;

    // If the remaining space is sufficient, no expansion is required
    if (avail >= addlen) return s;
    len = sdslen(s);
    sh = (char*)s-sdsHdrSize(oldtype);
    newlen = (len+addlen);
    If the memory size is smaller than 1MB, double the memory size. If the memory size is smaller than 1MB, allocate 1MB more
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    // There is a comment for SDS_TYPE_5: sdshdr5 is never used
    type = sdsReqType(newlen);
    if (type == SDS_TYPE_5) type = SDS_TYPE_8;
    // Check whether the type changes before and after capacity expansion
    hdrlen = sdsHdrSize(type);
    if (oldtype==type) {
        newsh = s_realloc(sh, hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else {
        newsh = s_malloc(hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[- 1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, newlen);
    return s;
}
Copy the code

SDS summary:

  • throughsds->lenReduced the time complexity of retrieving string length toO(1), so that strings are not limited to C strings\ 0Terminator to achieve binary security
  • Reduce the number of memory reallocations for splicing operations by using the memory preallocation policy (double the number of memory reallocations for splicing operations by less than 1mb; otherwise, increase the number by 1mb) : space swap time
  • There will always be insds->bufAppend one to the end of\ 0In some scenarios, it behaves the same as a C string
  • Internally: the memory of the entire SDS is continuous and can be addressed to any value