Redis is a key-value database. Here is a simple Redis command:

> SET msg "hello wolrd"
Copy the code

This command saves the key “MSG” and the value “Hello Wolrd” to the Redis database. This chapter examines how Redis stores these strings in memory.

redisObject

RedisObject server. H /redisObject is an abstract type defined by Redis for data stored internally. Before analyzing Redis data types in depth, we first understand redisObject.

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} robj;
Copy the code
  • Type: indicates the data type.
  • Encoding: Encoding format, that is, data structure used to store data. For the same type of data, Redis will use different encoding according to the amount of data, memory usage, etc., to maximize memory saving.
  • Refcount, reference count. To save memory, Redis will refer to the same redisObject in multiple places.
  • PTR: Points to the actual data structure, such as SDS, where the real data is stored.
  • Lru: 24 bits, LRU timestamp or LFU count.

RedisObject is responsible for loading all keys and values in Redis. Redisobject. PTR points to the data structure where the data is actually stored, and properties such as redisObject. refcount and redisObject.lru are used to manage data (data sharing, data expiration, etc.).

Note: Type, encoding, and lru use C bit definitions. These three attributes use different bits of the same unsigned int. This maximizes memory savings.

Redis defines the following data types and encodings, as shown in Table 1-1.The first five data types in Table 1-1 will be analyzed in Part 1 of this book, and the last two data types will be analyzed in Part 5. If the reader is currently confused by table 1-1, proceed with the book in doubt.

sds

As we know, C uses a null-terminated character array as a String, but Redis extends this by defining the String type SDS (Simple Dynamic String). Redis keys are all strings, and the simplest Redis value type is also string. String Redis values can be used in many scenarios, such as caching HTML fragments and logging user login information.

define

Note: Codes in this section are in SDS.h/SDS.c unless otherwise specified.

Redis defines different SDS structures for strings of different lengths:

typedef char *sds; struct __attribute__ ((__packed__)) sdshdr5 { unsigned char flags; char buf[]; }; struct __attribute__ ((__packed__)) sdshdr8 { uint8_t len; uint8_t alloc; unsigned char flags; char buf[]; }; .Copy the code

Redis also defines sdSHDR16, sdSHDR32, and sdSHDR64 constructs. The code of sdSHdr16, sdSHdr32 and sdSHdr64 are basically the same as sdSHdr8 except that len and alloc attributes use uint16_t, uint32 and UINT64_t. Redis defines different SDSHDR structures to maximize memory savings by using appropriate len and alloc property types for strings of different lengths.

  • Len: Byte length is used, that is, string length. Sdshdr5 can store less than 32 (25) strings, sdshdr8 can store less than 256 (28) strings, and so on. Because this property records the string length, SDS can get the string length in constant time. Redis limits the maximum length of a string to 512MB.
  • Alloc: Indicates the length of requested bytes, that is, the total length of SDS. Alloc -len is the available (free) space in SDS.
  • Flag: The lower 3 bits represent the type of SDSHDR, and the higher 5 bits are only used in sdshdr5 to represent the length of the string. Therefore, sdSHdr5 does not have len. In addition, since Redis defines sdSHdr5 as a constant string and does not support expansion, there is no alloc attribute.
  • Buf: String content, SDS follows the C language string specification, saves a null character as the end of buF, and does not count len, alloc attributes. In this way, SDS can be operated directly by STRCMP, strCPy and other functions of C language.

Tip: The BUF array in the SDSHDR structure does not specify an array length. It is a flexible array defined by the C99 specification – the last property in the structure can be defined as a variable-sized array (it must be preceded by other properties). The sizeof function is used to calculate the sizeof the structure containing the flexible array. The return result does not include the memory occupied by the flexible array. In addition, the attribute((Packed)) keyword unaligns bytes inside the structure to save memory.

Operation analysis

Let’s look at the SDS builder:

sds sdsnewlen(const void *init, size_t initlen) { void *sh; sds s; // [1] char type = sdsReqType(initlen); // [2] if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8; // [3] int hdrlen = sdsHdrSize(type); unsigned char *fp; /* flags pointer. */ sh = s_malloc(hdrlen+initlen+1); . // [4] s = (char*)sh+hdrlen; fp = ((unsigned char*)s)-1; switch(type) { case SDS_TYPE_5: { *fp = type | (initlen << SDS_TYPE_BITS); break; } case SDS_TYPE_8: { SDS_HDR_VAR(8,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; }... } if (initlen && init) memcpy(s, init, initlen); s[initlen] = '\0'; // [5] return s; }Copy the code

Parameter Description:

  • Init, initlen: string content and length.

[1] Determine the corresponding SDSHDR type according to the length of the string. [2] Strings of length 0 usually need to be expanded later, and sdshdr5 should not be used, so this is converted to sdshdr8. [3] the sdsHdrSize function is responsible for querying the length of the SDSHDR structure, and the s_malloc function is responsible for applying the memory space. The length of the memory space is hdrlen+initlen+1, where hdrlen is the length of the SDSHDR structure (excluding the BUF attribute). Initlen is the length of the string content, and the last byte is used to store the null character “\0”. S_malloc allocates a specified amount of memory, the same as the C malloc function. [4] Assign the SDSHDR attribute. SDS_HDR_VAR is a macro that converts the SH pointer to the corresponding SDSHDR structure pointer. [5] Note that SDS is actually an alias for char*, where the returned S pointer points to the sdSHdr. buf property, the contents of the string. Redis uses this pointer to read/write string data directly.

Create an SDS with the content of “Hello Wolrd”, as shown in Figure 1-1.

The expansion mechanism of SDS is a very important function.

sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    // [1]
    size_t avail = sdsavail(s);
    size_t len, newlen;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen;

    if (avail >= addlen) return s;
    // [2]
    len = sdslen(s);
    
    sh = (char*)s-sdsHdrSize(oldtype);
    newlen = (len+addlen);
    // [3]
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    // [4]    
    type = sdsReqType(newlen);
    if (type == SDS_TYPE_5) type = SDS_TYPE_8;
    // [5]
    hdrlen = sdsHdrSize(type);
    if (oldtype==type) {
        newsh = s_realloc(sh, hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else {
        newsh = s_malloc(hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    // [6]
    sdssetalloc(s, newlen);
    return s;
}
Copy the code

Parameter description: addlen: The available length (alloc-len) must be greater than this parameter. [1] Get the current available space length. If the current available space length meets the requirement, it is returned directly. [2] Sdslen is responsible for obtaining the string length. As the string length is recorded in SDS.len, the complexity of this operation is O(1). Here the len variable is the length of the original SDS string and the newlen variable is the length of the new SDS. Sh refers to the SDSHDR structure of the original SDS. [3] Pre-allocation ratio parameter requires more memory space, avoiding memory copy operation every time expansion. If the length of the new SDS is less than SDS_MAX_PREALLOC (default: 1024 x 1024, in bytes), the length of the new SDS is automatically doubled. Otherwise, the new SDS length automatically increases SDS_MAX_PREALLOC. [4] sdsReqType(newlen) is responsible for calculating the new SDSHDR type. Note The expanded type does not use sdSHdr5 and does not support capacity expansion. [5] If SDS is of the same type after expansion, the s_realloc function is used to apply for memory. Otherwise, because the SDS structure has changed, the entire SDS must be moved, the new memory space allocated directly, and the original string contents copied into the new memory space. The s_realloc function is the same as the C realloc function. It is responsible for reallocating the memory of a given pointer to a given size. It will try to reallocate the pointer on the original address space, if the original address space cannot meet the requirements, allocate new memory space and copy the content. [6] Update the sdshdr.alloc attribute.

Call sdsMakeRoomFor(SDS,64) to the SDS of “Hello Wolrd” above, and the generated SDS is shown in Figure 1-2.

As you can see in Figure 1-2, after len is used to record the length of the string, null characters can be stored in the string. Redis strings support binary security and can store user input as raw data streams without any particular format meaning, so Redis strings can store any data, such as image data streams or serialized objects. C strings treat null characters as specific marker characters at the end of strings, which are not binary safe. Table 1-2 lists the common functions of SDS.

function role
Sdsnew, sdsempty Create the SDS
Sdsfree, SDSClear, sdsRemoveFreeSpace Release the SDS, clear the string content in the SDS, and remove the remaining space of the SDS
sdslen Gets the SDS string length
sdsdup Copies the given string into SDS, overwriting the original string
sdscat Concatenate the given string to the SDS string content
sdscmp Compares two SDS strings to see if they are the same
sdsrange Gets substrings. Strings that are not in the specified range are cleared

coding

There are three encodings for string types:

  • OBJ_ENCODING_EMBSTR: a string whose length is less than or equal to OBJ_ENCODING_EMBSTR_SIZE_LIMIT (44 bytes).

In this code, the redisObject and SDS structures are stored in a continuous memory block, as shown in Figure 1-3.

The OBJ_ENCODING_EMBSTR encoding is an optimization of Redis for short strings. It has the following advantages: (1) Memory requester and free only need to call the memory operation function once. (2)redisObject and SDSHDR structures are stored in a continuous piece of memory to reduce memory fragmentation.

  • OBJ_ENCODING_RAW: a string larger than OBJ_ENCODING_EMBSTR_SIZE_LIMIT, in which redisObject and SDS structures are stored in two discontiguous memory blocks.
  • OBJ_ENCODING_INT: Converts numeric string to integer to greatly reduce the memory space occupied by data. For example, the string “123456789012” takes 12 bytes. In Redis, it is converted to long long, which takes only 8 bytes.

When we send a request to Redis, Redis parses the request and converts the command and parameters into redisObjec. Object. C/createStringObject function responsible for completing this operation:

robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}
Copy the code

As you can see, encoding is converted to OBJ_ENCODING_RAW or OBJ_ENCODING_EMBSTR’s redisObject, depending on the string length.

After converting the parameter to a redisObject, Redis stores the redisObject in the database, for example:

> SET Introduction "Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. "
Copy the code

Redis will set the key “Introduction” and the value “Redis…” Convert two RedisObjects into redisObjects and store the RedisObjects in the database. Figure 1-4 shows the result.

The keys in Redis are all strings and are encoded using OBJ_ENCODING_RAW, OBJ_ENCODING_ EMBSTR, and Redis tries to convert string values to OBJ_ENCODING_INT. Object. C/tryObjectEncoding function do this:

robj *tryObjectEncoding(robj *o) { long value; sds s = o->ptr; size_t len; . // [1] if (o->refcount > 1) return o; len = sdslen(s); // [2] if (len <= 20 && string2l(s,len,&value)) { // [3] if ((server.maxmemory == 0 || ! (server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) && value >= 0 && value < OBJ_SHARED_INTEGERS) { decrRefCount(o); incrRefCount(shared.integers[value]); return shared.integers[value]; } else { // [4] if (o->encoding == OBJ_ENCODING_RAW) { sdsfree(o->ptr); o->encoding = OBJ_ENCODING_INT; o->ptr = (void*) value; return o; } else if (o->encoding == OBJ_ENCODING_EMBSTR) { // [5] decrRefCount(o); return createStringObjectFromLongLongForValue(value); } } } // [6] if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) { robj *emb; if (o->encoding == OBJ_ENCODING_EMBSTR) return o; emb = createEmbeddedStringObject(s,sdslen(s)); decrRefCount(o); return emb; } // [7] trimStringObjectIfNeeded(o); return o; }Copy the code

[1] This data object is referenced in many places, so encoding operation cannot be carried out any more, otherwise the normal operation of other places will be affected. [2] If the string length is less than or equal to 20, the string2l function is called to try to convert it to long long, returning 1 on success. In THE C language, long long occupies 8 bytes and ranges from -9223372036854775808 to 9223372036854775807. Therefore, it can save the converted value of the string with a maximum length of 19, plus the sign bit of the negative number, a total of 20 digits. The following are the processing steps in which strings can be converted to OBJ_ENCODING_INT encoding. [3] First try using shared data in shared.integers to avoid wasting memory by repeatedly creating the same data objects. Shared is a shared data set created when Redis is started, storing common shared data in Redis. Shared. integers is an array of integers that holds small numbers from 0 to 9999, shared in various usage scenarios. Note: If server.maxMemory is configured and a deprecated algorithm (LRU, LFU) that does not support shared data is used, then shared data cannot be used because each data must have a redisobjec.lru attribute for these algorithms to work properly. [4] If shared data cannot be used and the original encoding format is OBJ_ENCODING_RAW, replace the original SDS type of redisObject. PTR with the converted value of the string. [5] If shared data cannot be used and the original encoding format is OBJ_ENCODING_EMBSTR, redisObject and SDS are stored in the same memory block, redisObject.ptr cannot be directly replaced. So call createString – ObjectFromLongLongForValue function to create a new redisObject, coding for OBJ_ENCODING_INT, Redisobject. PTR points to long long or long. [6] The string cannot be converted to OBJ_ENCODING_INT encoding. Try converting it to OBJ_ENCODING_EMBSTR encoding. [7] At this point, the string can only be encoded using OBJ_ENCODING_RAW in an attempt to free up the remaining free space in SDS. String type implementation code in T_string. c, the reader can view the source code for more implementation details.

Tip: server. C/redisCommandTable defines each Redis command and corresponding processing function, readers can find interested from this command handler.

struct redisCommand redisCommandTable[] = { ... {"get",getCommand,2, "read-only fast @string", 0,NULL,1,1,1,0,0,0}, {"set",setCommand,-3, "write use-memory @string", 0, NULL, 1,1,1,0,0,0},... }Copy the code

GET commands are processed by getCommand, SET commands are processed by setCommand, and so on.

Alternatively, we can view the TYPE of the data OBJECT using the TYPE command and the ENCODING using the OBJECT ENCODING command:

> SET msg "hello world"
OK
> TYPE msg
string
> OBJECT ENCODING  msg
"embstr"
> SET Introduction "Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. "
OK
> TYPE Introduction
string
> OBJECT ENCODING  info
"raw"
> SET page 1
OK
> TYPE page
string
> OBJECT ENCODING  page
"int"
Copy the code

Conclusion:

  • All keys and values in Redis are redisObject variables.
  • SDS is a string type defined by Redis, supporting binary security and expansion.
  • SDS can obtain string length in constant time and reduce memory copy times using a pre-allocated memory mechanism.
  • Redis’s main goal for encoding data is to maximize memory savings. The string type can be OBJ_ENCODING_ RAW, OBJ_ENCODING_EMBSTR, or OBJ_ENCODING_INT.

This article is excerpted from the author’s new book ** “Redis Core Principle and practice” **, this book in-depth analysis of Redis common features of the internal mechanism and implementation, most of the content from the Redis source code analysis, and summarizes the design ideas, implementation principle. The inner workings of Redis can be quickly and easily understood by reading this book.

With the consent of the editor of the book, I will continue to publish some chapters in the book on binecy, as a preview of the book, you are welcome to check it out, thank you.

Jingdong link douban link