• preface
  • define
  • String object
    • int
    • raw
    • embstr
    • How are floating point numbers saved?
    • Code switching condition
    • conclusion
  • A list of objects
    • conclusion
  • A collection of objects
    • intset
    • hashtable
    • conclusion
  • Ordered set object
    • Ziplist coding
    • Skiplist coding
    • conclusion
  • The hash object
    • Ziplist coding
    • Hashtable coding
    • conclusion
  • The full text summary
  • Refer to the article
  • To contact me

preface

Redis is already familiar to everyone. It is also used in daily work and frequently involved in interviews. So how deep do we know about it?

I read several books related to Redis and tried to understand its concrete implementation, and recorded some underlying data structures and implementation principles.

This article will introduce the implementation of five basic data types in Redis. These five basic types cover almost 80% of the scenarios we use in our business, and at least 90% of interviews.

define

In the previous eight articles, we covered the eight basic data structures in Redis in detail, but it is well known that there are five data types commonly used in Redis. Strings, lists, sets, ordered sets, hashes.

And these five data types, the bottom layer is to use the data structure introduced before, of course, is not directly one-to-one binding relationship, but the use of a subtle design, to build an object system.

For those familiar with OOP programming, it’s probably easy to see why. The benefits of object systems are numerous, but not covered in this article. The object system is just mentioned here to give you an idea of how the five data types can be implemented in some fancy way.

Next, the underlying implementation data structure of the five data types will be analyzed one by one, as well as the switching conditions between implementation methods (encoding).

Note: The following five data types are referred to as XX objects. Like string objects, list objects. The underlying data structures mentioned are given their full name.

String object

Reference to data structures, SDS, is highly recommended for the first article in this series.

There are three possible low-level implementations of string objects: int, RAW, and embstr.

int

If a string object holds an integer value in the range of long, Redis holds the information as an integer value and sets the string encoding to int.

Such as:

raw

If the string object holds a string that is longer than 32 bytes, it uses the SDS (Simple Dynamic String) data structure described earlier to hold the string value and sets the encoding of the string object to RAW.

embstr

If the string object holds a string but is less than 32 bytes long, it is stored using embstr. Embstr encoding is not a data structure, but a small optimization for SDS. Embstr needs only one memory allocation, because it needs very little space, so it uses continuous space preservation, that is, the value of SDS and the value of the string object in a contiguous memory space. This is a little bit more efficient with short strings.

Such as:

How are floating point numbers saved?

The string data type of Redis supports saving floating point numbers and adding and subtracting floating point numbers, but at the bottom redis converts floating point numbers to string values and then follows the above three encoding rules. Floating-point numbers are also converted from a string to a floating-point number for calculation, and then converted to a string for saving.

Code switching condition

For example, when we change the value of an int encoded string object, it will naturally use raw encoding.

One feature, however, is that Redis does not provide any modification to the EMbSTR encoding, which is why it is just an encoding and not a data structure.

So in Redis, the embstr encoded value is actually read only and will be converted to RAW as soon as it is changed.

conclusion

The underlying data structures or encodings of string objects are int, raw, and embstr. The conditions of use between them are as follows:

coding Conditions of use
int Integers that can be stored with long
embstr The string length is less than 32 bytes (or satisfied after floating point conversion)
raw The value is a string of more than 32 characters

A list of objects

I strongly recommend reading the second, third, and fourth articles in this series on data structures, compressed lists, bidirectional linked lists, and quick lists.

Before Redis 3.2, the list object base is realized by the combination of compressed list and bidirectional list. When the number of elements is small, the compressed list is used, and when the number of elements is increased, the ordinary bidirectional list is used to store data.

But this implementation is not good enough, each node in the bidirectional linked list, need to save before and after Pointers, this memory usage for Redis memory database is extremely unfriendly.

Therefore, after 3.2, the authors implemented a new data structure called QuickList. The underlying implementation of all lists is this data structure. Its underlying implementation is basically a combination of bidirectional linked list and compressed list, and a bidirectional pointer is used to connect the compressed list, which not only avoids the performance pressure of the compressed list storing a large number of elements, but also avoids the problem of the bidirectional linked list connecting pointer taking up too much space.

Please read the corresponding article for detailed explanation of the principle, which is not repeated here.

conclusion

coding Conditions of use
quicklist All the data

A collection of objects

Data structures covered: Intset, dict(Hashtable), articles 5 and 6 of this series are highly recommended.

The encoding of a collection object can be intSet or HashTable.

intset

Intset encoding is used when all elements in the collection are integers and the number of elements is not greater than 512.

Intset encoding, the underlying intSET data structure.

hashtable

When elements do not meet all integer values and the number of elements is less than 512, the collection object is encoded as ** hashtable**.

Each key of the dictionary is a string object that holds an element of the collection, and the dictionary values are all set to NULL.

conclusion

coding Conditions of use
intset All elements are integers with less than 512 elements
hashtable Other data

Ordered set object

Articles 3, 6, and 7 of this series are highly recommended for data structures, compressed lists, skip lists, and dictionaries.

The encoding of ordered collection objects can be Ziplist and Skiplist.

Ziplist coding

When ziplist is used, the implementation data structure of ordered collection objects is Ziplist (sounds like crap), and the key-values of each collection are represented by two nodes of a compressed list next to each other. The first node holds the members of the set element (member), and the second node holds the branches of the set element (score).

Inside the compressed list, the collection elements are sorted from smallest to largest.

Skiplist coding

When skiplist is used, zset is used internally to store data. Zset is defined as follows:

typedef struct zset{
  zskiplist *zsl;
  dict *dict;
}zset;
Copy the code

Why use skip lists and dictionaries at the same time?

In fact, if we think about it, we could do all the functions of ordered collections using dictionaries or skip lists alone, but the performance is too poor.

  • When we only use the dictionary, we can get the score of the members in order of O(1) time, but since the dictionary is unordered, when we need to perform scoping operations, we need to sort all the elements in the dictionary, which is at least O(nlogn) time.
  • When we only use skip lists, we can do range sorting in order (logn) time, but also order (logn) time to get the score of an element.

Therefore, the dictionary and skip list can be used together to complete the query score operation in O(1) time complexity, while for some range operations, the skip list can reach O(logn) complexity is lingering.

As you can see, after I added a very long key in the previous example, the encoding of ordered collections became skiplist.

conclusion

coding Conditions of use
ziplist The number of elements is less than 128 and the length of all element members is less than 64 bytes
skiplist Other circumstances where the above conditions are not met

The hash object

References to data structures, compressed lists, and dictionaries are highly recommended for the third and sixth articles in this series.

Hash objects can be encoded ziplist or Hashtable.

Ziplist coding

Ziplist hash object, using compressed list as the underlying implementation data structure, with two consecutive compressed list nodes to represent a key value pair in the hash object. The implementation is similar to the ordered collection scenario above.

As shown in the figure, when I put in two simple key-value pairs, the hash object is encoded ziplist.

Hashtable coding

This is the most intuitive use of HashTable

The hash structure itself is structurally similar to a hashtable, so each key-value pair in the hash object is a key-value pair in the dictionary.

  • Each key in a dictionary is a string object that holds the key of a key-value pair.
  • Each value of a dictionary is a string object that holds the value of a key-value pair.

As you can see, when I added an extra long value to the previous example, the encoding method came to hashTable.

conclusion

coding Conditions of use
ziplist The length of both key and value of a key-value pair is less than 64 bytes, and the number of key-value pairs is less than 512.
hastable Other conditions that do not meet the above conditions

The full text summary

In fact, after the first few articles are written, that is, after all the underlying data structure is introduced, the so-called Redis five basic data types of the underlying implementation principle is no longer difficult.

All the underlying data structures used are known, and all that remains is a permutation and combination problem and the switching conditions between various implementations, and then this condition is only knowledge, and there is no need to forcibly remember.

The possible encodings of the five basic data types are listed here for easy comprehension and memorization.

Underlying data types Possible encoding methods
string int, raw, embstr
The list of It used to be Ziplist and LinkedList, now it’s all QuickList.
A collection of Intset or hashtable
An ordered set Ziplist orskiplistSkiplist encoding uses skip lists + dictionaries
hash Ziplist or hashtable

As for their conversion conditions, since I can’t draw multidimensional tables with Markdown, but I don’t want to write HTML, I won’t make a summary. Readers who need to click on the table of contents can jump to the summary of each summary ~.

Refer to the article

Design and Implementation of Redis (2nd Edition)

Redis Deep Adventures: Core Principles and Practical Applications


To the end.

To contact me

Finally, welcome to follow my personal public account [Huyanshi], I will update many backend engineers’ study notes from time to time. Also welcome direct public number private letter or mailbox to contact me, must know all without reserve, endless.


All the above are personal thoughts, if there is any mistake welcome to comment.

Welcome to reprint, please sign and keep the original link.

Contact email: [email protected]

For more study notes, see personal blog or follow wechat official account < huyan10 >——>HuYan ten