Implementation of memory encoded data structures

This part mainly introduces the Redis special memory encoded data structure, and it is recommended to understand with images.

The integer set

Intset.h and intset.c

The set of integers (intset) is one of the underlying implementations of the set key: Redis uses the set of integers as the underlying implementation of the set key when a set contains only integer numeric elements and the set has a small number of elements (see below).

Structure definition

typedef struct intset {

    // Encoding mode
    uint32_t encoding;

    // The number of elements the collection contains
    uint32_t length;

    // Hold an array of elements
    int8_t contents[];

} intset;
Copy the code

Where content is the underlying implementation of the set of integers, and all elements are elements of the contents array. It is worth noting that although the array is declared int8_T, the data type stored in the array is determined by encoding:

Encoding has three options: intset_enc_int16/32/64.

#define INTSET_ENC_INT16 (sizeof(int16_t)) #define INTSET_ENC_INT32 (sizeof(int32_t)) #define INTSET_ENC_INT64 (sizeof(int64_t))Copy the code

Upgrade operations

When inserting a data larger than the current encodeing type into the current set of integers, an irreversible upgrade must be performed: all element encodings must be upgraded to an encoding large enough to accommodate the new element.

Specifically divided into three steps:

1. Expand the size of the underlying array of the integer collection based on the type of the new element.

2. Convert all existing elements of the underlying array to the same type as the new element, and place the converted elements in the correct bit. In the process of placing elements, the ordered nature of the underlying array should be maintained unchanged.

3. Add the new element to the underlying array.

Here’s an example:

To start, the contents data type of the integer collection is INT16, which stores the following elements:

Now you need to insert a number: 65535, which is out of the INT16 data range, so you need to upgrade. * 64*4=256 bits; now there are only 48 bits;

Space allocated, now we need to transfer the original element (because the original number is stored in INT16-bit format), to ensure the order of the elements, we need to transfer the last digit 3:

After transferring the old character, insert the new element 65535:

The time complexity of an upgrade is O (N). It is also worth noting that the element causing the upgrade operation is either at index 0 (too negative) or at Length-1 (too large).

The reason why Redis’s integer array is upgraded in this way has two advantages:

The first is to save memory, if you want to hold INT64 numbers, then the traditional way is to declare the INT64 array directly, but sometimes does not need to INT64 format storage, memory waste; INT64 is allocated only when it is really needed.

Second, it is flexible. The set of integers can be automatically upgraded to accommodate new elements by the underlying array, so we can arbitrarily add integers of type INT16_t, INT32_t, or INT64_t to the set without worrying about typing errors. This is very flexible.

PS: Upgrade cannot be downgraded.

Interface API

The overview

function role Time complexity
intsetNew Creates a new collection of integers. O(1)
intsetAdd Adds the given element to the set of integers. O(N)
intsetRemove Removes the given element from the set of integers. O(N)
intsetFind Checks whether the given value exists in the collection. Because the underlying array is ordered, the search can be done by binary search, so the complexity is O(logN).
intsetRandom Returns a random element from the collection of integers. O(1)
intsetGet Retrieves the element of the underlying array at the given index. O(1)
intsetLen Returns the number of elements contained in the collection of integers. O(1)
intsetBlobLen Returns the number of bytes of memory occupied by the collection of integers. O(1)

What’s curious is how to return the value I want from the underlying array:

/* * Return the element of the underlying array of the collection on the POS index based on the given enC encoding. * T = O(1) */ static int64_t _intsetGetEncoded(intset *is, int pos, uint8_t enc) { int64_t v64; int32_t v32; int16_t v16; // (ENCODING*)is->contents) convert the array back to the encoded type // Then (ENCODING*)is->contents)+pos calculates the correct position in the array // memcpy(&venc,... , sizeof(vEnc)) and then copy the correct number of bytes from the array // If necessary, If (enc == INTSET_ENC_INT64) {if (enc == INTSET_ENC_INT64) { memcpy(&v64,((int64_t*)is->contents)+pos,sizeof(v64)); memrev64ifbe(&v64); return v64; } else if (enc == INTSET_ENC_INT32) { memcpy(&v32,((int32_t*)is->contents)+pos,sizeof(v32)); memrev32ifbe(&v32); return v32; } else { memcpy(&v16,((int16_t*)is->contents)+pos,sizeof(v16)); memrev16ifbe(&v16); return v16; }} /* * Depending on how the collection is encoded, Static int64_t _intsetGet(intset *is, int pos) { return _intsetGetEncoded(is,pos,intrev32ifbe(is->encoding)); }Copy the code

No, the above operation is only inserted at the specified POS, so how to determine the POS? In fact, the integers in the integer set are all ordered, arranged from smallest to largest, so when inserting or searching for a value, binary search is used first. The following is a function to insert a value into the integer set:

intset *intsetAdd(intset *is, int64_t value, Uint8_t valenc = _intsetValueEncoding(value); uint32_t pos; If (success) *success = 1; // If the encoding of value is larger than the encoding of the integer set // then value must be added to the integer set // and the integer set needs to upgrade itself, If (valenc > intrev32ifbe(is->encoding)) {return intsetUpgradeAndAdd(is,value); } else {// The existing encoding of integer sets applies to value; // Look for value in integer sets to see if it exists: // set *success to 0 and return an unmodified set of integers. If (intsetSearch(is,value,&pos)) {if (success) *success = 0; return is; } // Run here, Is = intsetResize(is,intrev32ifbe(is->length)+1); is = intrev32ifBe (is->length)+1); / / if the new element is not be added to the underlying array / / at the end of the need to move data from existing elements, take the location on the pos, is used to set the new value / / / / if the array is example: / / | | y | z | x? | / / | < -- - > | / / n of the pos and new elements to 1, then the array will move y, and z two elements / / | x | | | | z/y/y | < -- - > | / / so you can set the new element to the pos:  // | x | n | y | z | // T = O(N) if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1); } // Set the new value to the specified position in the underlying array _intsetSet(is,pos,value); Is ->length = intrev32ifbe(is->length)+1); // add a new element to the set of integers. }Copy the code

So how do you insert data?

/* * Set the value of the underlying array at the pos position to value depending on how the collection is encoded. */ static void _intsetSet(intset *is, int pos, Uint32_t encoding = intrev32ifbe(is->encoding); (Enc_t*)is->contents)[pos]; (Enc_t*)is->contents)[pos]; // Finally, ((Enc_t*)is->contents)+pos locates on the new value just set // If necessary, If (encoding == INTSET_ENC_INT64) {((int64_t*)is->contents)[pos] = value; memrev64ifbe(((int64_t*)is->contents)+pos); } else if (encoding == INTSET_ENC_INT32) { ((int32_t*)is->contents)[pos] = value; memrev32ifbe(((int32_t*)is->contents)+pos); } else { ((int16_t*)is->contents)[pos] = value; memrev16ifbe(((int16_t*)is->contents)+pos); }}Copy the code

See one of the memrev16/32/64IFbe functions, some curious, then F12, find a:

Void memrev16(void *p) {unsigned char *x = p, t; t = x[0]; x[0] = x[1]; x[1] = t; Void memrev32(void *p) {unsigned char *x = p, t; t = x[0]; x[0] = x[3]; x[3] = t; t = x[1]; x[1] = x[2]; x[2] = t; }Copy the code

Add about the big-endian and little-endian conversion, it is worth noting that the big-endian and little-endian storage order in memory is byte, not bit!

For example, in the small end: 00000000 10000000,

In the big end storage is not: 00000001 00000000 (i.e., bitwise reverse)

Instead, the reverse is done by byte: 10000000 00000000.

Schematic diagram:

Conversion mode:

So now memrev16/32 is clear: char* is used to fetch a single byte of data, and then the two bits are swapped, that is, big end to small end. Why not write Memrev64 without looking at the source code?

So how do collections get deleted? Blind guess is a copy of memory to move, look at it and sure enough:

static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) { void *src, *dst; Uint32_t bytes = intrev32ifbe(is->length)-from; Uint32_t encoding = intrev32ifbe(is->encoding); DST = (Enc_t*)is_.contents+to record the position of the end of the move // bytes *= Sizeof (Enc_t) calculates how many bytes to move if (encoding == INTSET_ENC_INT64) {SRC = (int64_t*)is->contents+from; dst = (int64_t*)is->contents+to; bytes *= sizeof(int64_t); } else if (encoding == INTSET_ENC_INT32) { src = (int32_t*)is->contents+from; dst = (int32_t*)is->contents+to; bytes *= sizeof(int32_t); } else { src = (int16_t*)is->contents+from; dst = (int16_t*)is->contents+to; bytes *= sizeof(int16_t); } // make a move // T = O(N) memmove(DST, SRC,bytes); }Copy the code

List of compression

Source code see: ziplist.h and ziplist.c.

Compressed lists are sequential data structures composed of a series of specially encoded sequential memory blocks developed by Redis to save memory.

Redis’ ordered collections, hashes, and lists all use compressed lists directly or indirectly. Redis uses compressed lists as its underlying data store when ordered collections or hashes have a small number of elements and the elements are short strings or integers. Lists are stored using a QuickList data structure, which is a combination of a bidirectional list and a compressed list (see below).

A compressed list can contain any number of entries, each of which can hold either a byte array or an integer value. Here is the structure of a compressed list:

1. Zlbytes: the length of the compressed list is 4 bytes, so the longest compressed list is **(2^32)-1 bytes **.

Zltail: the offset of the last element of the compressed list relative to the start address of the compressed list, which is 4 bytes;

Zllen: The number of elements in the compressed list, in two bytes; So what happens when the number of elements in the compressed list exceeds (2^16)-2? In this case, the number of elements in the compressed list cannot be obtained through the zllen field. The number of elements must be obtained through the entire compressed list.

4. Entry: Compressed list stores elements, which can be byte arrays or integers.

Zlend: The end of the compressed list (not the last element), in one byte, always 0xFF.

In Redis, compressed lists are defined as follows:

/* * Create and return a new ziplist */ unsigned char *ziplistNew(void) {// ZIPLIST_HEADER_SIZE is the size of the ziplist header // +1 byte is the end of the table ZIP_END Unsigned int bytes = ZIPLIST_HEADER_SIZE+1; Unsigned char *zl = zmalloc(bytes); // Allocate space for the header and end of the table. ZIPLIST_BYTES(zl) = intrev32ifbe(bytes); ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE); ZIPLIST_LENGTH(zl) = 0; Zl [bytes-1] = ZIP_END; return zl; }Copy the code

Redis uses the following macro definition (char * zl points to the start of the compressed list) to quickly manipulate the compressed list to retrieve data for each field:

#define ZIPLIST_BYTES(zl) (*((uint32_t*)(zl)))) #define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t)))) #define ZIPLIST_LENGTH(zl) (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))) // returns the sizeof the ziplist header #define #define ZIPLIST_ENTRY_HEAD(zl) #define ZIPLIST_ENTRY_HEAD(zl) ((zl)+ZIPLIST_HEADER_SIZE) #define ZIPLIST_ENTRY_TAIL(zl) ((zl)+ intrev32IFbe (ZIPLIST_TAIL_OFFSET(zl))) ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)Copy the code

Structure definition

The node information of the compressed list is defined as follows:

Typedef struct zlentry {// prevrawlen: prevrawlenSize: Prevrawlen unsigned int prevrawlenSize, prevrawlen; Unsigned int lensize, len; Prevrawlensize + lenSize unsigned int headerSize; // Unsigned char encoding; // A pointer to the current node unsigned char *p; } zlentry;Copy the code

The node details

According to the node definition code above, the compressed node information can be divided into three parts: previous_entry_length, Encoding and content, as shown below:

Now look at these three sections in detail.

1. Previous_entry_lentry: Record the length of the previous node, 1 or 5 bytes – if the previous node is less than 254 bytes, use 1 byte to store the previous node information otherwise use 5 bytes (the first byte is set to 0xFE (254) and the last four bytes are length).

The current node pointer and previous field provide quick access to the previous node, enabling list node backtracking.

2. Encoding field: Indicates the data type and length stored in the content. The value can be 1/2/5 bytes, indicating byte array or integer respectively. The detailed coding can be seen in the following table:

The byte array is encoded as follows:

coding The length of the code content Property to hold
00bbbbbb 1 byte An array of bytes less than or equal to 63 bytes.
01bbbbbb xxxxxxxx 2 – A byte array with a length of 16383 bytes or less.
10______ aaaaaaaa bbbbbbbb cccccccc dddddddd 5 bytes A byte array whose length is less than or equal to 4294967295.

The integer encoding is as follows:

coding The length of the code content Property to hold
11000000 1 byte An integer of type INT16_t.
11010000 1 byte The value is an integer of type int32_t.
11100000 1 byte An integer of type INT64_t.
11110000 1 byte 24 – bit signed integer.
11111110 1 byte An 8-bit signed integer.
1111xxxx 1 byte The node that uses this encoding does not have the corresponding content attribute because the encoding itself contains a value between 0 and 12, so it does not need the content attribute.

3. Content: Holds the value of the node, which can be a byte array or an integer. The following two examples represent an 11-byte byte array and an integer value that holds int16. Can you guess which is the 11-byte byte array?

An example of the address to which the compressed list pointer points is as follows:

API interface

The overview

function role Algorithm complexity
ziplistNew Create a new compressed list. O(1)
ziplistPush Create a new node with the given value and add the new node to the top or bottom of the compressed list. The average O (N ^ 2).
ziplistInsert Inserts a new node with the given value after the given node. The average O (N ^ 2).
ziplistIndex Returns the node on the given index of the compressed list. O(N)
ziplistFind Finds and returns the node containing the given value in the compressed list. Because the value of a node can be a byte array, the complexity of checking whether the node value is the same as the given value is O(N^2).
ziplistNext Returns the next node of the given node. O(1)
ziplistPrev Returns the previous node of a given node. O(1)
ziplistGet Gets the value held by the given node. O(1)
ziplistDelete Removes the given node from the compressed list. The average O (N ^ 2).
ziplistDeleteRange Deletes consecutive nodes of a compressed list on a given index. The average O (N ^ 2).
ziplistBlobLen Returns the number of bytes of memory currently occupied by the compressed list. O(1)
ziplistLen Returns the number of nodes that the compressed list currently contains. O(N) when the number of nodes is smaller than 65535.

Creating a compressed list

unsigned char *ziplistNew(void); /* To create an empty compressed list, just allocate initial storage (11=4+4+2+1 bytes) and initialize the ZLbytes, zltail, zllen, and zlend fields. */ unsigned char *ziplistNew(void) { //ZIPLIST_HEADER_SIZE = zlbytes + zltail + zllen; Unsigned int bytes = ZIPLIST_HEADER_SIZE+1; // Unsigned int bytes = ZIPLIST_HEADER_SIZE+1; unsigned char *zl = zmalloc(bytes); ZIPLIST_BYTES(zl) = intrev32ifbe(bytes); ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE); ZIPLIST_LENGTH(zl) = 0; 0XFF zl[bytes-1] = ZIP_END; return zl; }Copy the code

Insert elements

Function input parameter zl represents the first address of the compressed list, p points to the insertion position of the new element, S represents the data content, slen represents the data length, and the return parameter is the first address of the compressed list.

unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p,  unsigned char *s, unsigned int slen);
Copy the code

When inserting an element, there are three simple steps: first, you need to encode the element content as an element in the compressed list, second, you need to reallocate space, and third, you need to copy the data. The implementation logic for each step is discussed separately below.

1) coding

Encoding computes the contents of the previous_entry_LENGTH, Encoding, and Content fields. How do I get the length of the previous element? At this point, it is necessary to discuss according to the position of the inserted elements, as shown in the figure:

When the compression list is empty and the insertion position is P0, the previous element does not exist, that is, the length of the previous element is 0.

When the insertion position is P1, the length of entryX element needs to be obtained. The previous_entry_length field of entryX+1 stores the length of entryX element, which is easy to obtain.

When the insertion position is P2, we need to obtain the length of entryN, which is the last element of the compressed list. To calculate the length of entryN, we need to add the length of its three fields. The function is implemented as follows:

unsigned int zipRawEntryLength(unsigned char *p) {
    unsigned int prevlensize, encoding, lensize, len;
    ZIP_DECODE_PREVLENSIZE(p, prevlensize);
    ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
    return prevlensize + lensize + len;
}
Copy the code

The encoding field identifies the data type and length of the current element. The encoding first attempts to parse the data as an integer. If the encoding succeeds, the data is encoded as a compressed list integer; if the encoding fails, the data is encoded as a compressed list byte array.

if (zipTryEncoding(s,slen,&value,&encoding)) {
    reqlen = zipIntSize(encoding);
} else {
    reqlen = slen;
}

reqlen += zipStorePrevEntryLength(NULL,prevlen);
reqlen += zipStoreEntryEncoding(NULL,encoding,slen);
Copy the code

2) Redistribute space

Is the space size the sum of the length of the compressed list before adding the element plus the length of the newly added element? Not exactly, as you can see here.

Before the entryX element is inserted, the length of the entryX element is 128 bytes, and the previous_entry_length field of the entryX+1 element is 1 byte.

Add entryNEW. The length of entryNEW is 1024 bytes. In this case, the previous_entry_length field of entryX+1 must occupy 5 bytes.

The length of the compressed list is not only increased by 1024 bytes, but also expanded by 4 bytes for the entryX+1 element.

It is easy to know that the entryX+1 element length may increase by 4 bytes, decrease by 4 bytes, or remain the same. Due to space redistribution, the position pointer P inserted by the new element will become invalid. Therefore, the offset of pointer P relative to the first address of the compressed list should be calculated in advance and then offset after space allocation.

size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)); int forcelarge = 0; nextdiff = (p[0] ! = ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0; if (nextdiff == -4 && reqlen < 4) { nextdiff = 0; forcelarge = 1; } // Store offset = p-zl; Zl = ziplistResize(zl,curlen+reqlen+nextdiff); P P = zl+offset;Copy the code

So what does nextDiff and Forcelarge do here?

Curlen represents the length of the compressed list before inserting elements, reqlen represents the length of the inserted elements, and Nextdiff represents the change of the length of entryX+1 elements. The values can be 0 (the length remains the same), 4 (the length increases by 4), and -4 (the length decreases by 4).

So let’s think about what happens when nextDiff is minus 4, and Reqlen is less than 4? Yes, insert element causes compression reduces the list of the required space, namely the function calls realloc ziplistResize bottom less than Pointers zl space redistribution. This can be problematic. We all know that when Realloc realallocates space, the address returned may not change. When realloC realloC realloC realloC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC realLOC It is therefore necessary to avoid the situation of reassigning NextDIff to 0 with the forcelarge marker.

And if you think about it a little bit more, if nextDiff is minus 4, is reqlen less than 4? The answer is yes, cascading updates can cause this to happen. Chained updates will be introduced later.

3) Data copy

After the space is reallocated, you need to move the element after position P to the specified position and insert the new element into position P. We assume that the length of entryX+1 element increases by 4 (i.e. Nextdiff equals 4), and the data copy schematic is shown in the figure below:

As can be seen from the figure, all elements after position P need to be moved, the offset of which is the length of entryNew inserted element, and the length of data block moved is the sum of the length of all elements after position P plus the value of NextDiff. The previous_ENTRY_LENGTH field of the entryX+1 element also needs to be updated after the data is moved.

memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff); If (forcelarge) //entryX+1 previous_entry_length is still 5 bytes; / / but entryNEW element length less than 4 bytes zipStorePrevEntryLengthLarge (p + reqlen reqlen); else zipStorePrevEntryLength(p+reqlen,reqlen); ZIPLIST_TAIL_OFFSET(zL) = intrev32IFbe (intrev32IFbe (ZIPLIST_TAIL_OFFSET(zL))+reqlen); zipEntry(p+reqlen, &tail); if (p[reqlen+tail.headersize+tail.len] ! = ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } // Update zllen ZIPLIST_INCR_LENGTH(zl,1);Copy the code

Think about why, after the first update of the trailing element offset, the element referred to might not be the trailing element. Obviously, when the entryX+1 element is a trailing element, the trailing element’s offset needs to be updated only once; However, when entryX+1 element does not know the tail element, and the length of entryX+1 element changes, the offset of the tail element also needs to add the value of Nextdiff.

Above references :(highly recommended)

Segmentfault.com/a/119000001…

Chain update

When a node is inserted or deleted into a Ziplist, the size of the previous node may be 1 or 5 bytes, and the size of the previous node is inconsistent, which may cause the sequential influence of subsequent nodes, resulting in multiple space reallocations, known as chain updates.

For example, the new inserted is exactly larger than 254 bytes, and the original entry is between 250 and 253 bytes:

So how does insertion come about? Want to see again:

If e1-EN is 250-253 bytes and big is larger than 254 bytes and small is smaller than 254 bytes, deleting small will cause cascading updates of nodes after E1.

In the worst case, N reallocations are required, so the worst complexity of each reallocation is O(N ^2). Although there is such a serious performance loss, the probability of occurrence in actual scenarios is very low, so the average complexity of ziplistPush and other commands is O(n).

The article links

(1) Skiplist, SDS, dictionary, Hyperloglog, # nuggets # juejin.cn/post/694944…

Redis source code, interview guide (3) object system, reference counting, ordered collection # nuggets article # juejin.cn/post/694981…