Redis series (3) a compressed list of underlying data structures

preface
define
- Definition of a compressed list
- Compressed list node definition
The new node
Problem: Cascading updates
conclusion
Refer to the article
To contact me

preface

Redis is already familiar to everyone. It is also used in daily work and frequently involved in interviews. So how deep do we know about it?

I read several books related to Redis and tried to understand its concrete implementation, and recorded some underlying data structures and implementation principles.

This article will introduce the implementation method of ziplist(compressed list) in Redis. It is one of the underlying implementations of the list and hash keys in Redis. It is used by list keys and hash keys when certain conditions (covered in a future article) are met.

As you can see, ziplist is the encoding method used inside the zset whose key is zsetkey.

define

List data structures, we already have linked lists, why do we need to make a compressed list again? To save memory.

The front and back Pointers to the linked list are very memory intensive structures, so this part of the space is especially wasteful when the data volume is small.

A compressed list is a sequential data structure consisting of a series of specially encoded contiguous memory blocks.

The idea is to simulate the structure of a list in a contiguous chunk of memory.

Definition of a compressed list

A compressed list is defined as:

struct ziplist<T>{
    // The number of bytes for the entire compressed list
    int32 zlbytes;
    // The offset from the last node to the start of the compressed list can be used to quickly locate the last element in the compressed list
    int32 zltail_offset;
    // Compress the number of elements in the list
    int16 zllength;
    // List of element contents, stored in arrays, next to each other in memory
    T[] entries;
    // The end flag bit of the compressed list, always 0xFF.
    int8 zlend;
}
Copy the code

The meaning of each field is commented out in the code. The zltail_offset attribute is needed to explain why the compressed list can only be traversed sequentially, so to improve efficiency, we need to be able to traverse both ends of the compressed list. With this attribute, we can quickly find the tail of the compressed list. For how to reverse traverse, read on.

Compressed list node definition

Each node in the compressed list is defined as:

struct entry{
    // The length of the previous entry
    int<var> prevlous_entry_length;
    // Encoding mode
    int<vat> encoding;
    / / content
    optional bute[] content;
}
Copy the code

prevlous_entry_length

The prevlous_entry_length property, by definition, is logged for reverse traversal. Think about it: first get the offset of the tail node, find the most tail node, then call the prevlous_entry_length property to get the previous node, and keep going forward.

Note here that the length of this field is not fixed; it can be 1 byte or 5 bytes.

This property is recorded in one byte if the current entry length is less than 254 bytes. Otherwise it will be recorded in 5 bytes.

This leads to a problem, see article below.

encoding

This property records the type and length of data held by the node’s Content property.

content

This property is used to actually hold the value of the node, which can be a byte array or an integer. Its type and length are determined by Encoding.

The new node

As mentioned in the introduction, in some cases the following table keys will use a compressed list, that is, if the contents of the list key are small, why can’t the compressed list be used for the large list key?

Ziplist is continuous storage data structure, there is no redundant memory (in previous articles about SDS have redundant space), that is to say, every time a new node, require memory application, then if the current memory block enough in a row, then add a new node, if application is another piece of memory space in a row, Then you need to copy everything to the new address.

That is, each time a new node is added, memory is allocated and possibly copied. When too many values are stored in ziplist, memory copying can be a big drain.

For this reason, Redis only uses Ziplist in small data scenarios.

Problem: Cascading updates

When talking about prevlous_entry_length, we mentioned that its length changes can cause a problem with cascading updates.

This property is recorded in one byte if the current entry length is less than 254 bytes. Otherwise it will be recorded in 5 bytes.

So let’s imagine an extreme scenario where, inside the Ziplist, all the nodes are 253 bytes long, which means that the prevlous_entry_length attribute of all the nodes is one byte.

The prevlous_entry_length attribute of the first node changes from 1 to 5 bytes. The total length of the node increases to 257 bytes, which is greater than 254 bytes. The prevlous_entry_LENGTH property of the next node (the original second node) also becomes 5 bytes, which in turn causes the next node to change. . The prevlous_entrY_LENGTH value on all nodes needs to be updated.

The time complexity of cascading updates is poor. Space reallocation needs to be performed at most N times, and each space reallocation requires O(N) at least. Therefore, the time complexity of cascading updates is O(N2) at worst.

Similar to adding nodes, deleting nodes may cause cascading updates.

But don’t be afraid, because the probability of cascading updates causing Redis performance stress is extremely low.

First, cascading updates require continuous node sizes of 250-253, which is rare, and large-scale sequential ones are even rarer. If you’re unlucky enough to have a cascade of three or five updates, it’s not going to weigh on Redis performance.

conclusion

Ziplist is a data structure developed by Redis to store lists in contiguity of memory space. It has the advantage of not having the memory footprint of the front and back Pointers of the linked list, but performance is under pressure when data volumes are large. Therefore, it applies only to scenarios with small data volumes.

Ziplist is one of the underlying implementation data structures for the List and Hash keys.

One problem with Ziplist is that adding or removing nodes has a very small probability of triggering cascading updates and causing performance differences. But it’s extremely unlikely. Don’t worry about it.

Refer to the article

Design and Implementation of Redis (2nd Edition)

Redis Deep Adventures: Core Principles and Practical Applications

To the end.

To contact me

Finally, welcome to follow my personal public account [Huyanshi], I will update many backend engineers’ study notes from time to time. Also welcome direct public number private letter or mailbox to contact me, must know all without reserve, endless.

All the above are personal thoughts, if there is any mistake welcome to comment.

Welcome to reprint, please sign and keep the original link.

Contact email: [email protected]

For more study notes, see personal blog or follow wechat official account < huyan10 >——>HuYan ten