preface

  • Why does Redis perform so well? How does it differ from other caching middleware?
  • What data structures do redis use to support its high performance?
  • Why are internal rich data types implemented using at least two data structures underneath? What are they?
  • How can Redis be used to its greatest advantage?

All of these questions can be answered by studying the previous sections on data structures and objects in Redis Design and Implementation. You can also see how the Redis authors have painstakingly designed so many rich data structures in order to optimize memory. After learning these contents, in the process of using Redis, I will use it reasonably to adapt to its internal characteristics. Of course, the new version of Redis supports more and richer features, but the book is based on RedIS 3 and doesn’t cover those yet.

“Redis Design and Implementation” this book is very easy to understand, written by Huang Jianhong, a teacher, after 90. He is also the translator of Redis In Action

See Redis Design and Implementation 2- Database Implementation for another article

An overview of the

The characteristics of

  1. C language development, excellent performance, pure memory operation, can process more than 10W read/write (QPS) per second
  2. Multiple data structures, up to 1GB each (memcached only supports strings, up to 1M)
  3. Limited by the physical memory, it cannot read and write massive data. It is suitable for high performance operations and operations with small data volume
  4. Support transaction, persistence
  5. Single-threaded model (memcached is multi-threaded)

Supported data types

  1. Sring
  2. List
  3. Set
  4. SortedSet
  5. hash
  6. Bitmap
  7. Hyperloglogs
  8. Geo
  9. pub/sub

Why is Redis so fast

  1. Pure memory operation, no disk IO
  2. Single thread processing requests, no thread switching overhead, no race conditions, and no locking issues
  3. Multiplex model Epoll, non-blocking IO (multiplex: multiple network connections; Multiplexing allows a single thread to efficiently process multiple connection requests
  4. The data structure is simple and the manipulation of the data is simple. We also did our own data structure optimization

Why is Redis single threaded

  1. Single threading is already fast, reducing the network overhead of multi-threading, locking operations
  2. Multithreading is considered in subsequent 4.0 releases
  3. Single threading means that only one thread is used to process network requests. It is not redis-server that only one thread is working. When persistence is performed, it is forked by a child thread.
  4. Cons: Time-consuming commands can lead to a decrease in concurrency, such as keys *

Redis’ recycling strategy

  1. Volatile – lRU: Selects the least recently used data from the expired dataset server.db[I].expires
  2. Volatile – TTL: Selects stale data from the expired dataset server.db[I].expires to be discarded
  3. Volatile -random: server.db[I]. Expires selects any data to be discarded
  4. Allkeys-lru: Culls the least recently used data from the dataset (server.db[I].dict)
  5. Allkeys-random: Random selection of data from a dataset (server.db[I].dict)
  6. No-enviction: Data expulsion is prohibited

Use attention

  1. A single redis thread cannot give full play to the performance of multi-core CPU. It can be improved by opening multiple Redis instances on a single redis thread
  2. Redis implements distributed locks: use setnx (if none exists) to fight for locks, and expire sets the expiration time to prevent forgotten releases.
  3. Redis implements one-to-many message subscription: sub/pub data structures
  4. Redis implements delayed message queue: Zadd timestamp is used as score consumption to perform query operations according to the timestamp + delay time.

Major version introduction

New features in Redis5:

  • Zpopmax zpopmin and blocking variants: Returns the maximum and minimum amount of data in the set for a given score

New features in reidS4:

  • Module function, provide similar to the way of plug-in, their own development of a. So module, and install the author himself provided a neural network module. Seeing more module functionality on Redis-modules-Hub allows users to use Redis as an infrastructure and build more functionality on top of it, opening up countless new possibilities for Redis.
  • PSYNC: Addresses some of the less optimized aspects of copying in older versions of Redis
  • Optimization of the cache clearing policy Added Last Frequently Used Optimized the existing policy
  • FLUSHDB async: FLUSHDB async: unlink FLUSHDB async: unlink
  • Added SWAPDB: switch database
  • Hybrid RDB-AOF persistence format
  • Add MEMORY usage command: MEMORY

The data structure

  • Every key-value pair in Redis is made up of objects
  • The key is always a string object,
  • The value can be one of the following:
    • String object
    • A list of objects
    • The hash object
    • A collection of objects
    • Ordered binding object

Simple dynamic string SDS

The data structure

struct sdshdr {
    uint8_t len; /* used, number of bytes used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[]; /* C-style characters, including the terminator \0*/
};
Copy the code
  • Located in sdS.h file
  • SDS follows the convention of C strings ending in \0 and is stored in BUF (unlike the underlying implementation of Nginx, which does not save the last \0)
  • But len does not count the length of the last character
  • The advantage of preserving c-style BUF is that you can reuse some of the functions of the C function library

Allocate and release policies

Space preallocation

  • Optimize SDS string growth operations to reduce the number of memory reallocations required to perform continuous growth operations
  • When expanding SDS space, check whether the unused space is enough. If it is enough for direct use, if not, not only allocate enough space, but also pre-allocate some space
  • Preallocation strategy:
    • The modified SDS length (len value) is less than 1MB, and the space of the same len size is pre-allocated
    • After modification, SDS length (len value) >= 1MB, preallocate 1MB space

Inert space release

  • Optimize SDS character shortening operations
  • When SDS space is shortened, memory is not immediately reallocated to free space, but the number of bytes free is recorded
  • SDS provides apis that truly free up space when needed

Advantages over C strings

  • Time complexity of getting string length reduced from O(N) to O(1)
  • Avoid buffer overflows
  • Reduce the number of memory reallocations when modifying strings. Memory allocation involves complex algorithms and may require system calls, which can be time consuming.
  • Binary security: c’s terminator limits it to only storing text data, not binary data such as images and audio

The list

The data structure

Located in the adlist.h file

typedef struct listNode {
    struct listNode *prev; // The front node
    struct listNode *next; // back node
    void *value;/ / the node values
} listNode;

typedef struct list {
    listNode *head; // Table header node
    listNode *tail; // Table end node
    void *(*dup)(void *ptr); // Node value copy function
    void (*free) (void *ptr); // Node value release function
    int (*match)(void *ptr, void *key); // Node value comparison function
    unsigned long len; // Number of nodes
} list;
Copy the code

The characteristics of

  • Double-ended queue, which can obtain the front and rear nodes of a node, complexity O(1)
  • acyclic
  • Obtain table header and table tail complexity O(1)
  • With length, obtain the length complexity of the linked list is O(1)
  • Polymorphic: Use void* to save node values. Different types of values can be saved

The dictionary

The data structure

Located in dict.h file

Hash table

// typedef struct dictht {dictEntry **table; // An array in which each element is a pointer to dictEntry unsigned long size; // Table array size unsigned long sizemask; Size-1 unsigned long used; // the number of nodes (key-value pairs) in the hash table} dictht;Copy the code

Hash node

Typedef struct dictEntry {void *key; typedef struct dictEntry {void *key; // The key of the key-value pair // the value of the key-value pair, can be a pointer, integer, floating point union {void *val; uint64_t u64; int64_t s64; double d; } v; struct dictEntry *next; // hash table node pointer, used to resolve key conflicts} dictEntry;Copy the code

A dictionary type

Each dictionary type holds a set of functions that operate on a particular type of key-value pair

// Uint64_t (*hashFunction)(const void *key); Void *(*keyDup)(void * privData, const void *key); Void *(*valDup)(void * privData, const void *obj); Int (*keyCompare)(void * privData, const void *key1, const void *key2); Void (*keyDestructor)(void * privData, void *key); Void (*valDestructor)(void * privData, void *obj); } dictType;Copy the code

The dictionary

// typedef struct dict {dictType *type; Void * privData; // Dictht ht[2]; // ht[0] is used to store data. Ht [1] is used to store datarehashLong rehashidx; /* rehashing notin progress ifRehashidx == -1, for nowrehash*/ unsigned long iterators; /* number of iterators currently running */ } dict;Copy the code

The hash algorithm

  • Redis uses the MurmurHash2 algorithm to compute the hash value of the key
  • Hash value and sizemask or, get hash index
  • Hash conflicts (where two or more number keys are assigned to the same index in the hash table array) : chained addresses resolve conflicts

rehash

  • Expand or shrink the hash table to keep the load factor of the hash table within a reasonable range
  • Load factor = number of nodes saved (used)/hash table size (size)

The rehash steps include

  • Allocates space for the DICTIONARY’s HT [1] hash table, depending on the operation to be performed and the number of key-value pairs ht[0] currently contains
    • Extended operation: ht[1] size is the first one greater than or equal to HT [0]. Used times 2 to the n of 2
    • Shrink operation: ht[1] size is the NTH power of the first 2 greater than or equal to HT [0]. Used
  • Rehash all key-value pairs saved in HT [0] onto HT [1] : recalculate the key hash and index values
  • When all key pairs of HT [0] are migrated to HT [1], release HT [0], set HT [1] to HT [0], and create a new terror hash for HT [1].

Conditions for automatic extension

  • The server did not run the BGSave command or GBRewriteAOF command, and the hash table load factor was >= 1
  • The server is running the BGSave or GBRewriteAOF command and the hash table load factor is >= 5
  • When running the BGSave or GBRewriteAOF command, the server needs to create a child process of the current server process, which consumes memory, improves the load factor, avoids writing data, and saves memory

Conditions for automatic contraction

  • When the load factor of the hash table is less than 0.1, it shrinks automatically

Progressive rehash

  • Ht [0] data reindexing to HT [1] is not done centrally at once, but increments many times (to avoid performance problems if the hash table is too large)

Step-by-step Rehash steps

  • Allocate space for HT [1] to automatically hold two hash tables simultaneously
  • Set rehashidx to 0 in the dictionary to start rehash (default -1)
  • During rehash, all key/value pairs of the HT [0] hash table on the rehashidx index are rehash to HT [1] every time an operation is performed on the dictionary.
  • When all rehash is complete, rehashidx is set to -1

Pay attention to the point

  • All rehash operations are performed in two hash tables
  • All the newly added values are put into HT [1] to ensure that the data will only decrease but not increase

Jump table

  • A skip list is an ordered data structure that maintains multiple Pointers to other nodes at each node for fast access to nodes
  • Time complexity: worst O(N), average O(logN)
  • For the most part, the efficiency is comparable to that of the balanced tree, but simpler to implement
  • One of the underlying implementations of ordered collections

The data structure

Located in the server.h file

// Skip list node
typedef struct zskiplistNode {
    sds ele; // Member objects
    double score; // The score is sorted from smallest to largest
    struct zskiplistNode *backward; // use the back pointer from the end of the table to the beginning of the table
    struct zskiplistLevel {
        struct zskiplistNode *forward; // Forward pointer
        unsigned long span; // span, record the distance between two nodes
    } level[]; // layer, is an array
} zskiplistNode;

// Skip list information
typedef struct zskiplist {
    struct zskiplistNode *header, *tail; // Table header and table tail
    unsigned long length; // Skip list length (including the number of nodes)
    int level; // The maximum number of layers in the skip table (excluding the number of header nodes)
} zskiplist;
Copy the code

  • The size of the level array is randomly generated each time a new skip list is created. The size ranges from 1 to 32
  • The traversal operation uses only the forward pointer, the span is used to calculate the rank, and the total span of all layers accessed along the way is the rank of the node
  • Multiple nodes can contain the same branch, but each node member object is unique

The integer set

  • Intset is one of the underlying implementations of the collection key
  • A collection of integers is used as the underlying implementation when a collection contains only integer elements and there are few of them

The data structure

Located in intset.h file

typedef struct intset {
    uint32_t encoding; // Encoding mode
    uint32_t length; / / the length
    int8_t contents[]; The array content type depends on the Encoding property, not int8_t. Sorted by size, no duplication
} intset;
Copy the code

upgrade

  • When we add a new element to the set of integers, and the new element is of a longer type than all the existing elements in the set of integers, the set is upgraded before we can add new data
  • The upgrade consists of three steps:
    • Expand size and allocate space based on type
    • Convert all the underlying array data to the new type and place it correctly
    • New elements are added to the underlying array
  • Adding elements can lead to upgrades, so adding new elements to the world is O(N)
  • Downgrade is not supported, and the new data type is kept after the upgrade

Benefits of upgrading

  • Improve flexibility
  • To save memory

List of compression

  • Ziplist is one of the underlying implementations of list and hash keys
  • Redis is a sequential data structure developed to save memory
  • When a list key contains only a small number of list items, and each list item is either a small integer or a short string, ziplist is used as the underlying implementation of the list key
  • Compressed list traversal, traversing backwards from table bits to table heads
  • Ziplist has no special struct to represent it

Compressing list composition

attribute type The length of the use
zlbytes uint32_t 4 bytes The number of bytes of memory occupied by the entire compressed list
zltail uint32_t 4 bytes The number of bytes from the start of the compressed list to get the end of the table without traversing
zllen uint16_t 2 – If the number of nodes is smaller than 65535, the actual value is displayed. If the number of nodes exceeds 65535, the value can be calculated by traversal
entryN List node indefinite Contains each node
zlend uint8_t 1 byte Special value 0xFF, terminal flag

Compressing list node composition

  • Previos_entry_length: The length of the previous node, used for backtracking from the end of the table to the end of the table
    • If the preceding node length is less than 254 bytes, preivos_entry_length is 1 byte
    • If the front node length is less than 254 bytes, preivos_entry_length is represented by 5 bytes, the first byte is 0xFE (254), and the last four bytes are the actual length
  • Encoding: Records the type and length of the content. The encoding is divided into two parts: the high digit and the remaining digit. The highest digit can be:
    Highest two digit value Represents a data type Encoding bytes The remaining number of bits Maximum range
    00 A character array A byte 6bit A 63 – bit
    01 A character array Two bytes 14bit 2 ^ 14-1
    10 A character array Five bytes 4*8, leaving the remaining 6 bits of the first byte empty 2 ^ 32-1 a
    11 The integer 1 byte 000000 Int16_t Integer
    11 The integer 1 byte 010000 Int32_t An integer
    11 The integer 1 byte 100000 Int64_t Integer
    11 The integer 1 byte 110000 24 – bit signed integer
    11 The integer 1 byte 111110 An 8-bit signed integer
    11 The integer 1 byte xxxxxx Without content, XXXX itself represents an integer from 0 to 12
  • Content: Saves the value of the node

Chain update

  • A condition in which contiguous memory allocation is caused by expansion of multiple nodes of approximately 254 in size. But in the case of time, that’s less the case.

object

An overview of the

  • Rather than implement a database of key-value pairs using the previous data structures directly, Redis creates a system of objects based on the data structures, each of which uses at least one of the previous data structures
  • Each object is represented by a redisObject structure
//server.h
typedef struct redisObject {
   unsigned type: 4; // Type unsigned encoding:4; Unsigned lRU :LRU_BITS; unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or * LFU data (least significant 8 bits frequency * and most significant 16 bits access time). */ int refcount; // Reference count void * PTR; // pointer to the underlying data structure} robj;Copy the code

The benefits of using objects

  • Before executing a command, determine whether an object can execute a given command based on its object type
  • Wie objects are implemented with different data structures for different manufacturers to optimize efficiency
  • Realized based on reference counting memory reclamation mechanism, no longer used objects, memory will be automatically released
  • Reference counting implements object sharing, where multiple databases share the same object to save memory
  • Object with time accumulation information for calculating idling time

Objects in Redis

  • String object
  • A list of objects
  • The hash object
  • A collection of objects
  • Ordered binding object

Object type and encoding

Type of object

object Object Type attribute Type Output of the type command
String object REDIS_STRING string
A list of objects REDIS_LIST list
The hash object REDIS_HASH hash
A collection of objects REDIS_SET set
Ordered set object REDIS_ZSET zset

Object encoding

  • The encoding determines the data type to which the PTR points, indicating what data type to use as the underlying implementation
  • At least two different encodings are used for each type of object
  • Through coding, Redis can set different codes according to different scenarios, greatly improving flexibility and efficiency
Code constants Corresponding data structure OBJECT ENCODING Command output
REDIS_ENCODING_INT An integer of type long “Int”
REDIS_ENCODING_EMBSTR A simple dynamic string encoded by embstr “Embstr”
REDIS_ENCODING_RAW Simple dynamic string “Raw”
REDIS_ENCODING_HT The dictionary “Hashtable”
REDIS_ENCODING_LINKEDLIST Double side chain table “Linkedlist”
REDIS_ENCODING_ZIPLIST List of compression “Ziplist”
REDIS_ENCODING_INTSET The integer set “Intset”
REDIS_ENCODING_SKIPLIST Skip lists and dictionaries “Skiplist”

String object

  • The encoding of a string object can be
    • int
    • raw
    • embstr
  • Floating-point numbers are also stored as string objects in Redis, and when it comes to computation, float numbers are returned first.
String object contents The length of the The encoding type
An integer value int
A string value Less than 32 bytes embstr
A string value Greater than 32 bytes raw

Embstr encoding is an optimized encoding for short strings. This encoding, like raw encoding, uses the redisObject and SDSHDR structures to represent objects. The difference is:

  • The RAW encoding calls the memory allocation function twice to create the redisObject and SDRHDR structures, respectively
  • Embstr calls the memory allocation function once to create a contiguous space containing redisObject and SDRHDR

Code conversion

Int – and embSTR – encoded objects are automatically converted to raw – encoded string objects

  • Int encodes an object that is converted to a RAW object when the command is executed and the object is no longer an integer
  • Embstr encoding has no corresponding execution function and is read-only encoding. When modification is involved, the raw object is converted

String command

All keys in Redis are string objects, so all commands for keys are built for string keys

  • set
  • get
  • append
  • incrbyfloat
  • incrby
  • decrby
  • strlen
  • strrange
  • getrange

A list of objects

  • The encoding of a list object can be
    • ziplist
    • linkedlist

Code conversion

The two criteria for ziplist encoding are as follows, and any that do not meet are linkedList encoding (these two criteria can be modified in the configuration file) :

  • All saved string elements are less than 64 bytes long
  • The number of elements is less than 512

A list of commands

  • lpush
  • rpush
  • lpop
  • rpop
  • lindex
  • llen
  • linsert
  • lrem
  • ltrim
  • lset

The hash object

The encoding of a hash object can be

  • ziplist
  • hashtable

Code conversion

  • To use Ziplist, you need to meet two criteria, or use HashTable if you don’t (both criteria can be modified in the configuration file)
    • All key-value pairs have key and value strings of less than 64 bytes
    • The number of key-value pairs is less than 512

Hash commands

  • hset
  • hget
  • hexists
  • hdel
  • hlen
  • hgetall

A collection of objects

The encoding of a collection object can be:

  • Intset: All elements are stored in a set of integers
  • Hashtale: The dictionary value is null

Code conversion

Two conditions need to be met for collection to use intSet. If not, use hashTable (parameters can be modified through configuration file)

  • All saved elements are integer values
  • The number of elements does not exceed 512

The set command

  • sadd
  • scard
  • sismember
  • smembers
  • srandmember
  • spop
  • srem

Ordered binding object

The code for ordered sets could be

  • Ziplist: Each element is represented by two adjacent nodes, the first representing the member and the second representing the score. Small points near the head of the table, large points near the end of the table
  • Skiplist: Using zset as the underlying implementation, the Zset structure contains both a dictionary and a skip table for finding scores and scores by key, or for range queries, respectively
Typedef struct zset {zskplist * ZSL; Zrank, zrange dict *dict; Zscore}zset;Copy the code

Code conversion

Use ziplist encoding when the following two conditions are met, otherwise use Skiplist (which can be modified through the configuration file)

  • Less than 128 elements are saved
  • The member length is less than 64 bytes

Ordered set command

  • zadd
  • zcard
  • zcount
  • zrange
  • zrevrange
  • zrem
  • zscore

Type checking and command polymorphism

Redis commands fall into two broad categories:

  • Can be executed on any type of key, such as
    • del
    • expire
    • rename
    • type
    • object
  • Only specific types of keys can be executed, such as the previous commands for various objects. Is implemented through the Type property of redisObject

Memory recovery

Redis records the object reference count information through the refCount property of the object and automatically frees the object for memory reclamation when appropriate

Objects share

  • Objects that contain the same value, and the value of the key points to the same object to save memory.
  • When redis initializes, it creates 10,000 string objects, containing all integer values from 0 to 9999. When these values are needed, the server shares them instead of creating new objects
  • The quantity can be changed through the configuration file
  • Object sharing does not currently contain strings, because comparing strings to be identical can itself cause performance problems

Idle duration of the object

  • Idle duration = present time -redisObject.lru, lru records the time when an object was last accessed
  • When Redis is configured with maxMemory, the reclamation algorithm determines that when the memory exceeds this value, the longer memory will be released first to reclaim the memory

The reference command

# set string
set msg "hello world"
rpush numbers 1 2 3 4 5
llen numbers
lrange numbers 0 5
Get the underlying data structure used by the key values
object encoding numbers
Check the reference count value of the object
object refcount numbers
Lru: value=now-object.lru
object idletime numbers
Copy the code

reference

  • Redis Design and Implementation