The underlying implementation of Redis data structures

Redis in each common data structure corresponding to the underlying implementation

A concrete analysis

RedisDB structure

There is the concept of a “database” in Redis, which is defined by redisDb in redis.h.

When the Redis server is initialized, 16 databases are pre-allocated

All databases are saved in the redisServer.db array, a member of the structure redisServer

There is a pointer in redisClient called DB that points to the database currently in use

The structure of source code

typedef struct redisDb { 
  int id; // Redis has 16 databases by default.
  long avg_ttl; // Average TTL (time to live) of stored database objects, used for statistics
  dict *dict; // Store all key-values of the database
  dict *expires; // Store the expiration time of the key
  dict *blocking_keys;// BLPOP stores blocking keys and client objects
  dict *ready_keys;// Blocking push response Blocking client stores the key and client object for push after blocking
  dict *watched_keys;// Store the key and client objects monitored by watch
  } redisDb;
Copy the code

RedisObject structure

Redis object header, all Redis objects have the following structure

The structure of source code

typedef struct redisObject { 
   unsigned type:4;// Type Object type
   unsigned encoding:4;/ / code
   void *ptr;// A pointer to the underlying implementation data structure
   / /...
   int refcount;// Reference count
   / /...
   unsigned lru:LRU_BITS; //LRU_BITS is a 24bit record of the last access time by a command program
   / /...
   }robj;
Copy the code

Attribute description in redisObject

Four type

The type field indicates the type of the object.

REDIS_STRING, REDIS_LIST, REDIS_HASH, REDIS_SET, REDIS_ZSET, ordered set.

When we execute the type command, we get the type of the object by reading the type field of the RedisObject

127.0. 01.:6379> set name:1 22 
OK 
127.0. 01.:6379> type name:1 
string 
127.0. 01.:6379> lpush list:1 1 2 3 4 5 3 
(integer) 6 
127.0. 01.:6379> type list:1 
list
Copy the code

Four encoding

Encoding Specifies the 4-bit internal encoding of the object

Each object has a different implementation encoding

Redis can set different encoding for objects according to different usage scenarios, greatly improving Redis flexibility and efficiency. Using the object encoding command, you can view the object encoding mode

127.0. 01.:6379> set name:2 9999999999999999999999999999999999999999999999999999999999999999999999999999999999 
OK 
127.0. 01.:6379> object encoding name:2 
"raw" 
127.0. 01.:6379> object encoding name:1 
"int"
Copy the code

24位LRU

Lru records the last time an object was accessed by a command program (version 4.0, 24 bits; version 2.6, 22 bits).

High 16 bits store a minute-level timestamp, low 8 bits store access count (LFU: number of recent accesses)

refcount

Refcount counts the number of times the object is referenced. The type is an integer.

Refcount is used for reference counting and memory reclamation of objects.

An object is called a shared object when its refcount is greater than 1

Redis to save memory, new programs do not create new objects when some objects appear repeatedly, but still use the old objects.

ptr

A PTR pointer points to specific data, such as set Hello World, and PTR points to an SDS containing the string world.

string

A string in Redis is a modifiable string that exists in memory as a byte array.

The Redis String is called “SDS”, which stands for Simple Dynamic String. Its structure is a byte array with length information.

Source code in the STRUCTURE of SDS

struct sdshdr{ 
// Records the number of bytes used in the BUF array
int len;
// Records the number of unused bytes in the BUF array
int free; 
// An array of characters to hold strings
char buf[];
}
Copy the code

What are the benefits of using this SDS?

1. It’s faster to change

The length of a string array in C is immutable. If the string is used in the Redis C string, rather than the SDS, when we change a key of the string, if the new string length is less than or equal to the length of the string, the original so we just replace the content on the character array, and then advance the representative at the end of the string (if the old and the new string length is equal, The empty string remains where it was. But if the length of the new string is larger than the length of the old string, then unfortunately we have to apply a new array that can hold the new string, which is not good for Redis.

SDS is faster because it applies to arrays that are longer than strings.

2. Getting the string length is faster

In C language, it is necessary to call strlen(byte) to get the string length, traversing one by one until “\0” is encountered, and the string is considered to be over, so the time complexity is O(n), while SDS stores the string length, so the time complexity is O(1).

C cannot store binary data in strings, whereas SDS can

SDS expansion policy before the length of the character string is less than 1MB, the expansion space is doubled, that is, 100% redundant space is reserved. When the string length exceeds 1MB, only 1MB more redundant space is allocated for each expansion.

String objects are stored in three encoding formats

The values are int, EMstr, and RAW

127.0. 01.:6379> set name:1 1
OK
127.0. 01.:6379> type name:1
string
127.0. 01.:6379> object encoding name:1
"int"
127.0. 01.:6379> set name:2 11111111111111111111111111111111111
OK
127.0. 01.:6379> type name:2
string
127.0. 01.:6379> object encoding name:2
"embstr"
127.0. 01.:6379> set name:3 asdasdasdasdasdasdsadasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdas
OK
127.0. 01.:6379> type name:3
string
127.0. 01.:6379> object encoding name:3
"raw"
Copy the code

Int Indicates an integer of the int type

Simple dynamic string encoded by embstr Small string length less than 44 bytes

Raw Simple dynamic string The character string contains more than 44 bytes

Dict (dictionary)

Dictionary is the most frequent compound data structure in Redis server. In addition to dict for hash structure data, all keys and values in the Redis database also constitute a global dictionary, and key sets with expiration time are also a dictionary. The mapping between value and score values stored in zset is also realized by dictionary structure.

The internal structure of a dictionary

typedef struct dict { 
  dictType *type; // The dictionary corresponds to the specific operation function
  void *privdata; // Optional arguments corresponding to the above type function
  dictht ht[2]; /* Two hash tables, store key-value pair data, ht[0] is the native HT [1] is the rehash table */ 
  long rehashidx; /*rehash indicates that when the value is -1, no rehash is being performed; otherwise, rehash is being performed. The stored value indicates the indexed hash table (array subscript)*/ to which ht[0] rehash is performed 
  int iterators; // The number of iterators currently running
  } dict;
Copy the code

Taking a look at the structure of hashTable, the structure of HashTable is almost identical to the pre-JDK1.8 HashMap. It’s an array plus a linked list.

struct dictEntry { 
  void* key ; 
  void* val ; 
  dictEntry* next;// Link to the next entry
  } 
  
struct dictht { 
  dictEntry** table; // Hash table array
  long size ; // Hash table array size
  long used ; // The number of existing nodes in the hash table
  // ... 
  }
Copy the code

A stored procedure diagram for a dictionary

There are two hashtables inside the dictionary. Why two?

Normally, only one Hashtable has a value, but in dictionary expansion, new Hashtable needs to be allocated, and then the gradual relocation. The two hashtables store the old hashtable and the new hashtable. After the relocation is complete, the old hashtable is deleted and the new hashtable is replaced.

Dictionary expansion

Assume that the two Hashtables are H [0] and H [1] respectively.

Apply for new memory and assign the address of the new memory to H [1]
Setting rehashidx to 0 indicates a rehash operation
The newly added data is stored in hash[1]
Search for elements in HT [0] and then search for elements in HT [1]
Rehash all the data from the old hash table H [0] into the new hash table H [1] after recalculating the index values

Ziplist (Compressed list)

In order to save memory space, Redis uses ziplist to store zset and Hash objects when the number of elements is small. A compressed list is a contiguous memory space where elements are stored next to each other without any redundancy.

struct ziplist<T> { 
  int32 zlbytes ; // The number of bytes for the entire compressed list
  int32 zltail_offset; // The offset of the last element straight from the start of the compressed list, used to quickly locate the last node
  int16 zllength ; // Number of elements
  T[] entries ; // A list of element contents, stored in a compact sequence
  int8 zlend; // Marks the end of the compression list, constant OxFF
  }
  
  struct entry{ 
    int<var> prevlen; // Length of the previous entry in bytes
    int<var> encoding; // Element type encoding
    int<var> len; // Length of the current entry
    optional byte[] content; // Element content
    / /...
    }
Copy the code

The ztail_offset field is used to quickly locate the compressed list in order to support two-way traversal

Go to the last element and walk backwards. Look up the next one is by length.

Because ziplist structures are next to each other, there is no redundant space (contrast SDS). So you may need to reallocate memory when adding elements.

If ziplist takes up too much memory, it can be expensive to reallocate and copy memory, so ziplist should not store large strings, nor should it store too many elements.

Hash is stored using ziplist when the number of elements is small and the elements are small integers or short strings

127.0. 01.:6379> hmset user:001  username zhangfei password 111 age 23 sex M
OK
127.0. 01.:6379> object encoding user:001
"ziplist"
127.0. 01.:6379> hmset user:003 username111111111111111111111111111111111111111111111111111111111111111111111111 zhangfei password 111 num 330000000000000000000000000000000000000
OK
127.0. 01.:6379> object encoding user:003
"hashtable"
Copy the code

Zset Is stored in Ziplist when the number of elements is small and the elements are small integers or short strings

127.0. 01.:6379> zadd hit:1 100 item1 20 item2 45 item3
(integer) 3
127.0. 01.:6379> object encoding hit:1
"ziplist"
127.0. 01.:6379> zadd hit:2 100 item99999999999999999999999999999000000000000000000003333333333333333333 20 item2 45 item3
(integer) 1
127.0. 01.:6379> object encoding hit:2
"skiplist"
Copy the code

Intset Set of small integers

Redis uses intSet to store set elements when all of the set elements are integers and the number of elements is small. Intset is a compact array structure that supports 16-bit, 32-bit, and 64-bit integers at the same time.

If the above conditions are not met, the underlying implementation of the set structure is a dictionary, except that all values are null, and other features are the same as dictionaries.

127.0. 01.:6379> sadd set:001 1 3 5 6 2 
(integer) 5 
127.0. 01.:6379> object encoding set:001 
"intset" 
127.0. 01.:6379> sadd set:004 1 100000000000000000000000000 9999999999 
(integer) 3 
127.0. 01.:6379> object encoding set:004 
"hashtable"
Copy the code

quickList

Quicklist is a bidirectional linked list, and each node in the list is a Ziplist structure. Each ziplist node in QuickList can store multiple data elements.

typedef struct quicklist { 
  quicklistNode *head; // Point to the head of quickList
  quicklistNode *tail; // Point to the end of the QuickList
  unsigned long count; // The total number of all items in the list
  unsigned int len; // The number of QuickList nodes, i.e. the number of Ziplist nodes
  int fill : 16; // The ziplist size is specified by list-max-ziplist-size (Redis setting).
  unsigned int compress : 16; // Set the node compression depth given by list-compressed-depth (set by Redis).
  } quicklist;
  
  typedef struct quicklistNode {
    struct quicklistNode *prev; // points to the previous ziplist node
    struct quicklistNode *next; // point to the next ziplist node
    unsigned char *zl; // The data pointer, if not compressed, points to the Ziplist structure, whereas it points to the quicklistLZF structure
    unsigned int sz; // Indicates the total length of the ziplist structure.
    unsigned int count : 16; // Indicates the number of data items in ziplist
    unsigned int encoding : 2; // Encoding mode, 1--ziplist, 2--quicklistLZF
    unsigned int container : 2; 1--NONE, 2--ziplist
    unsigned int recompress : 1; // Unzip the flag
    // ...
}quicklistNode;    
Copy the code

Graph based on the code above Ziplist length

By default, a single Ziplist in QuickList is 8KB in length. If you exceed this number, a new ziplist will be created. The ziplist length is determined by the list-max-ziplist-size configuration parameter.

In order to further save space, ziplist is compressed and stored. LZF algorithm is used for compression, and the compression depth can be selected.

The depth of the compression

Quicklist’s default compression depth is 0, meaning no compression. The actual compression depth is determined by the list-compressed-depth configuration parameter. To support fast push/pop operations, the first and last ziplists of The QuickList are uncompressed, with a compression depth of 1. If the compression depth is 2, it means that the first ziplist and the first ziplist and the second ziplist in quickList are not compressed. As shown below, the compression depth is 1.

The low-level implementation of the application List in Redis

Skiplist (Jump list)

Zset structure is complex. On the one hand, it needs a hash structure to store the corresponding relationship between value and score; on the other hand, it needs to provide the function of sorting by score and obtaining the value list by specifying the range of score, which requires another structure “skip list”.

For example:

General list

So if we look for element 9, we’re going to have to go through the beginning node, and we’re going to have to go through 8 nodes to find element 9.

Skip lists layer linked lists

Find element 9(the red line is the search path)

The second layer traverses 4 times to find element 9

The third layer is layered four times to find element 9

It’s a little bit like binary search, it’s order lg(n).

Redis skip list implementation

// Skip list node
typedef struct zskiplistNode {
   sds ele; /* Store string data using robj in version 3.0, but SDS directly in version 4.0.1 */
   double score;// Store the sort score
   struct zskiplistNode *backward;// Back pointer to the previous node at the bottom of the current node
   struct zskiplistLevel {
       struct zskiplistNode *forward; // points to the next node in the layer
       unsigned int span;// The number of elements from the next node to this node
       } level[];
  } zskiplistNode;
  
/ / list
typedef struct zskiplist{
   // Table header and table tail nodes
   structz skiplistNode *header, *tail; 
   // The number of nodes in the table
   unsigned long length; 
   // The number of layers of the node with the largest middle level in the table
   int level;
}zskiplist;

// zset dictionary + skip list
typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;
Copy the code

Features: bidirectional linked list, ordered arrangement

The search process is to locate the last element smaller than “me”, and then go down one level from this node to locate the last element smaller than me.

Insert is also according to the search process to locate the position to insert.

So how many layers are we inserting?

int zslRandomLevel(void) {       
    int level = 1;       
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
          level += 1;       
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;   
   }
Copy the code

0xFFFF is converted to 111111111111111111 in binary to 65535 in decimal. Random () generates random numbers ranging from 0 to 65535

ZSKIPLIST_P = 0.25. So the probability of level += 1 execution is 0.25.

Therefore, the probability that the final return level is 1 is 1-0.25=0.75, that the final return level is 2 is 0.250.75, and that the final return level is 3 is 0.250.25*0.75……