Redis data structure

SDS

Instead of using C language strings to represent the String, Redis builds its own structure named Simple Dynamic String (SDS).

Definition of SDS

typedef struct sdshdr {
    // The length of bytes used in the buf array
    int len;
    
    // No byte length is used in the buf array
    int free;
    
    // Holds a byte array of strings
    char buf[];
} sdshdr;
Copy the code

Benefits of using SDS over C strings

A C string is actually an array of characters of length N +1, where n is the length of a character and the last digit is always a null character ‘\0’, used to mark the end of a string.

  1. Gets the string length at constant complexity

Because the C string is an array, obtaining the length of the string requires a complete traversal of O(N) time.

SDS maintains len attribute, which can directly obtain the length and time complexity O(1). This ensures that getting string length does not become a performance bottleneck for Redis.

  1. Prevent buffer overflow

For C strings, the API is not secure and may cause buffer overflows. For example, the STRING concatenation function strcat in C assumes that the user has already allocated enough memory to execute the function.

Assume that s1 and S2 are adjacent to each other in memory, and the word is concatenated after S1. If s1 is not allocated for memory in advance, its concatenated data will overflow into the space where S2 resides.

SDS maintains the length and automatically expands when the length is insufficient

  1. Reduces memory allocation overhead when modifying string lengths

C If the string length changes, the memory must be re-applied for or released.

SDS maintains the free attribute to avoid frequent memory requests.

Capacity expansion mechanism:

  • If len is less than 1MB and the length of free is the same as the length of len, then the buF array of SDS is len+free+1B(‘\0’) bytes. Note that the BUF array is also null-terminated to take advantage of C’s built-in string functions.
  • If len is greater than 1MB, the length of free is 1MB, and the buF array length is len+1MB+1B

Lazy space free: If the string length shrinks, the free property increases and len is known to decrease. Memory is not immediately freed for future string growth. SDS apis can be used to free this memory when needed.

  1. Binary security

File data cannot be saved because the C string ends with ‘\0’.

SDS uses len to determine the end of the string.

Two-way linked list

One of the underlying implementations of List in Redis is a bidirectional linked List.

Bidirectional linked list definition

typedef struct listNode {
    // The front node
    struct listNode *prev;
    
    // back node
    struct listNode *next;
    
    // Node value
    void *value;
} listNode;

typedef struct list {
    // Table header node
    listNode *head;
    
    // Table end node
    listNode *tail;
    
    // The number of linked list nodes
    unsigned long len;
    
    // There are also node copy, release, comparison functions, omitted.
}
Copy the code

Bidirectional lists are relatively simple and will not be described here.

Hash table

One of the underlying implementations of Hash and Set is a Hash table.

Hash table definition

typedef struct dictht {
    // Hash table array
    dictEntry **table;
    
    // Hash table size
    unsigned long size;
    
    // Hash table size mask, used to calculate index values
    // always equal to size-1
    unsigned long used;
} dictht;

typedef struct dict {
    // Type specific functions
    // Holds a set of functions for manipulating specific types of key-value pairs
    dictType *type;
    
    // Optional arguments to be saved to specific functions of the type attribute
    void *privdata;
    
    / / a hash table
    dictht ht[2];
    
    // Rehash index. When rehash is not in progress, the value is -1
    int trehashidx;
} dict;
Copy the code

rehash

Dict stores hash tables in ht, a hash table array of length 2. Ht [0] is used to save data, and HT [1] is used to expand replication during rehash. The rehash mechanism is as follows:

  1. Capacity expansion: Allocate space to the hash table HT [1] as the first 2^n greater than or equal to h[0]. Used (in nodes)*2. Reduction: h[1] size is the first 2^n that is greater than or equal to h[0]. Used
  2. Migrating key rehash mappings from HT [0] to HT [1]
  3. After the migration, release HT [0], set H [1] to H [0], and recreate an empty hash table in HT [1] to prepare for the next rehash.

The timing of the rehash

Load factor = to save the number of nodes/hash table size

load_factor = ht[0].used / ht[0].size

  1. The server does not run the BGSAVE or BGREWRITEAOF command, and the load factor is greater than or equal to 1.
  2. The server is running the BGSAVE or BGREWRITEAOF command and the load factor is greater than or equal to 5.

BGSAVE: the Redis BGSAVE command is used to asynchronously save data of the current database to disks in the background. Immediately after the BGSAVE command is executed, OK is returned, and a new child process is forked out. The original Redis process (parent process) continues to process client requests, while the child process saves data to disk and exits.

BGREWRITEAOF: the Redis BGREWRITEAOF command is used to asynchronously perform an AppendOnly File (AOF) File rewrite operation. Overrides create a volume-optimized version of the current AOF file. AOF overrides are triggered by Redis itself and BGREWRITEAOF is only used to trigger overrides manually.

Most operating systems use the copy-on-write technology to optimize the utilization efficiency of subprocesses. Therefore, increase the load factor required to trigger capacity expansion during the existence of subprocesses to avoid capacity expansion during the existence of subprocesses.

In addition, shrink the hash table automatically starts when the load factor of the hash table is less than 0.1.

Progressive rehash

If the hash table has a lot of data, migrating all key-value pairs to HT [1] at once is bound to affect performance. So during the rehash process, each CRUD operation of the hash table completes part of the migration, so that the rehash migration work is evenly spread over each CRUD operation. During rehash, if no command is executed, parts of the rehash task will be executed in scheduled mode. Each execution will not exceed 1ms to avoid performance impact.

skipList

The structure of the body

Redis uses skip tables in only two places, one for SortedSet, an ordered collection, and the other for internal data structures in cluster nodes.

Average time complexity O(logN), worst complexity O(N)

The skip table is mainly composed of zskiplist and zskiplistNode

typedef struct zskiplist{
    // Table header and end node
    struct zskiplistNode *head, *tail;
    
    // The number of nodes in the table
    unsigned long length;
    
    // The maximum number of layers in a node in the table
    int level;
} zskiplist;


typedef struct zskiplistNode {
    // Back up the pointer
    struct zskiplistNode *backword;
    
    / / score
    double score;
    
    // Member object of type redisObject
    robj *obj;
    
    / / layer
    struct zskiplistLevel {
        // Forward pointer
        struct zskiplistNode *forward;
        
        / / span
        unsigned int span;
    } level[];
} zskiplistNode;
Copy the code

The following figure shows the implementation of a hop table, describing zSkiplist, zskiplistNode, and the relationships between layersThe above has a general understanding of the structure, and the following is a detailed explanation of the core concepts in the structure

layer

In the above structure, level[] is the entry point of jumping between nodes in the hop table. Each node can have multiple layers to point to other nodes.

Each time a hoptable node is created, the program randomly generates a value between 1 and 32 as the size of the Level array, according to the power law (larger numbers are less likely to occur).

The first node does not store element values and is an empty node with a layer size of 32.

Each layer points to the same layer as the next. For example, L1 points to the next node with L1, and L2 points to the next node with L2. Because layers are randomly generated based on the power law, that is, the higher the level, the more difficult it is to generate, thus forming a structure that is more continuous at the lower level and jumps over at the higher level.

Level [I].forword and span level[I].span. A forward pointer is a pointer to the next node. The lower span is detailed below

span

The span of a layer (level[I].span) is the distance between two nodes.

The sum of the spans to a node is the ranking. At the same time because of the span of storage, according to the rank query can be the opportunity to jump to the current layer and span jump query without traversing. It can be said that based on the mechanism of forward node and span, the hop table realizes the fast query of rank.

Back Pointers, fractions, and members

  1. A backward pointer property that points to a previous node for the current node to look forward
  2. Score attribute, is a floating point number of type double, skip table node order according to the score value from the smallest to the largest
  3. The member obj property, member, points to a string object whose content is an SDS value. The obj structure type is redisObject, and the redis structure is used to store objects.

ziplist

Compressed lists are one of the underlying implementations of List and Hash. When a list has only a small amount of data and each element is a small integer value or a short string, Redis uses compressed lists as the underlying implementation of a list or hash.

Compressed list is a sequential data structure developed to save memory and efficiency. Because the storage content is contiguous, data is faster to load from memory to cache.

To sum up some other things…

Cache avalanche, breakdown, penetration

Cache avalanche

A large number of cache data expires at the same time or the cache instance is faulty. As a result, a large number of requests cannot be processed. As a result, a large number of requests are directly sent to the database, resulting in a surge in database pressure.

Solutions:

  1. Prevention in advance. When designing a system, consider the cache avalanche you may encounter to avoid a large number of caches expiring at the same time. Ensure that redis instances are highly available.
  2. Service monitoring and degradation. When a service or database CPU alarm is generated, use traffic limiting components (such as sentinal) to limit traffic.

Cache breakdown

Similar to a cache avalanche, the expiration of a hotspot data that is accessed very frequently causes all requests to access that hotspot data to be routed to the database.

Solution: Also prevent in advance at design time. This kind of data root business scenario, it is best not to add the expiration time.

The cache to penetrate

The data doesn’t exist in the cache or in the database, and requests are still sent to the database. Like malicious attacks.

Solution: Use a Bloom filter. For example, Redis layer has Redisson, Java layer has Guava, Hutool and other Bloom filter implementation.

Updated…