Redis (1) : Redis memory model

Redis is currently one of the most popular in-memory database, through reading and writing data in memory, greatly improving the speed of reading and writing, it can be said that Redis is an indispensable part of the realization of high concurrency website.

When we use Redis, we encounter the five object types of Redis: string, hash, list, collection, and ordered collection. The richness of types is a big advantage of Redis over Memcached and the like. On the basis of understanding the usage and characteristics of Redis 5 object types, a further understanding of Redis memory model will be of great help to the use of Redis, for example:

Estimate Redis memory usage. So far, the cost of memory is still relatively high. Reasonable evaluation of Redis memory usage and selection of appropriate machine configuration can save costs while meeting requirements.

Optimize memory footprint. Understanding the Redis memory model allows you to choose more appropriate data types and encodings to make better use of Redis memory.

Analyze and solve problems. When problems such as Redis blocking or memory occupation occur, you can find the cause of the problem as soon as possible to analyze and solve the problem.

This paper mainly introduces the memory model of Redis taking 3.0 as an example, including: the situation of Redis occupying memory and how to query, the encoding way of different object types in memory, memory allocator (Jemalloc), simple dynamic string (SDS), RedisObject and so on. Then several applications of Redis memory model are introduced.

First, Redis memory statistics

If you want to do a good job, you must do a good job. Before explaining Redis memory, first explain how to statistics Redis memory usage situation is necessary.

After the client connects to the server through redis-CLI (the client uses redis-CLI only unless otherwise specified), you can run the info command to check the memory usage

info memory

The info command can display a lot of information about the Redis server, including basic server information, CPU, memory, persistence, client connection information and so on. Memory is a parameter that displays only memory-related information.

Some important notes in the returned results are as follows:

(1) USED_memory: the total amount of memory allocated by the Redis allocator (in bytes), including the virtual memory used (swap); The Redis allocator is described later. Used_memory_human is just more user-friendly.

(2) used_memory_rss: the amount of memory (in bytes) occupied by the Redis process, which is consistent with the value seen by the top and ps commands; In addition to the memory allocated by the allocator, usED_memory_RSS includes memory needed by the process itself to run, memory fragments, and so on, but not virtual memory.

Thus, usED_memory and USED_memory_rss, the former from the Redis perspective, and the latter from the operating system perspective. The difference between the two is that on the one hand, memory fragmentation and Redis process running require memory, making the former may be smaller than the latter; on the other hand, the existence of virtual memory makes the former may be larger than the latter.

Due to the large amount of Redis data in practical applications, the memory occupied by the process running at this time is much smaller than the amount of Redis data and memory fragments. Therefore, the ratio of USED_memory_RSS to USED_memory is used to measure the memory fragmentation rate of Redis. This parameter is mem_fragmentation_ratio.

(3) mem_fragmentation_ratio: the ratio of USED_memory_rss to USED_memory.

Mem_fragmentation_ratio is generally greater than 1, and a larger value indicates a larger proportion of memory fragments. If mem_fragmentation_ratio is less than 1, it indicates that Redis uses virtual memory. Virtual memory is used as a disk and its speed is much slower than that of memory. If the memory is insufficient, check it in time. Such as adding Redis nodes, increasing the memory of Redis servers, optimizing applications, etc.

In general, mem_fragmentation_ratio around 1.03 is a healthy state (for Jemalloc); The mem_fragmentation_ratio value in the screenshot above is large because no data has been stored to Redis, and the Redis process itself runs in memory that makes USED_memory_rss much larger than USED_memory.

(4) mem_allocator: memory allocator used by Redis. It can be liBC, jemalloc, or TCMALloc at compile time. The default is jemalloc. The default Jemalloc is used in the screenshot.

Two, Redis memory partition

As an in-memory database, Redis stores mainly data (key-value pairs) in memory. As you can see from the previous statement, in addition to data, other parts of Redis also occupy memory.

The memory footprint of Redis can be divided into the following sections

1, the data

As a database, data is the most important part, and the memory occupied by this part is counted in USED_memory.

Redis uses key-value pairs to store data, where values (objects) are of five types: strings, hashes, lists, collections, and ordered collections.

These 5 types are provided by Redis. In fact, there may be two or more internal encoding implementations for each type within Redis. In addition, when Redis stores objects, instead of directly throwing data into memory, it wraps objects in various ways, such as RedisObject and SDS. The details of data storage in Redis will be highlighted later in this article.

2. Memory required by the process itself to run

The main Redis process itself must take up memory, such as code, constant pool, etc. This amount of memory is in the order of a few megabytes, which in most production environments is negligible compared to the amount of memory consumed by Redis data. This portion of memory is not allocated by Jemalloc, so it is not counted in USED_memory

Note: In addition to the main process, the child process created by Redis can also consume memory, such as the child process created when Redis performs AOF, RDB rewrite. Of course, this memory does not belong to the Redis process and is not counted in usED_memory and USED_memory_RSS.

3. Buffer memory

Buffer memory includes:

  • Client buffer: store the input/output buffer for client connections;
  • Replication backlog buffer: used for partial replication function;
  • AOF buffer: Used to save the most recent write command while doing AOF rewrite.

You don’t need to know the details of these buffering until you understand the functionality. This portion of memory is allocated by Jemalloc and therefore counted in USED_memory

4. Memory fragmentation

Memory fragmentation is generated when Redis allocates and reclaims physical memory. For example, if changes are made to data frequently and the size of the data varies greatly, the space freed by Redis may not be freed in physical memory, but Redis cannot use it effectively, resulting in memory fragmentation. Memory fragmentation is not counted in USED_memory.

The generation of memory fragmentation is related to the operation on the data, the characteristics of the data, etc. In addition, it is also related to the memory allocator used — if the allocator is designed properly, it can minimize memory fragmentation. Jemalloc, which we’ll talk about later, does a good job of controlling memory fragmentation.

If the memory fragmentation in the Redis server is already large, you can reduce the memory fragmentation by safely restarting the server. After the restart, Redis reads data from the backup file again and rearranges the data in memory to select appropriate memory units for each data and reduce memory fragmentation.

3. Details of Redis data storage

1, an overview of the

Details about Redis data storage include memory allocator (such as Jemalloc), simple dynamic String (SDS), 5 object types and internal encoding, RedisObject. Before going into the details, let me explain the relationship between these concepts.

Here is the data model involved when executing Set Hello World.

(1) dictEntry: Redis is a key-value database, so there will be a dictEntry for each key-value pair, which stores Pointers to Key and Value; Next points to the next dictEntry, independent of this key-value.

(2) Key: As can be seen in the upper right corner of the figure, the Key (” hello “) is not stored directly as a string, but in an SDS structure.

(3) redisObject: Value(” world “) is stored not directly as a string, nor directly in SDS like keys, but in redisObject. In fact, no matter which of the five types a Value is, it is stored through a RedisObject; The Type field in RedisObject specifies the type of the Value object, and the PTR field points to the address of the object. However, you can see that string objects, although RedisObject wrapped, still need to be stored by SDS.

In fact, in addition to the Type and PTR fields, a RedisObject has other fields that are not shown in the diagram, such as the field used to specify the internal encoding of the object. More on that later.

(4) JEMalloc: No matter it is DictEntry object, RedisObject or SDS object, memory allocator (such as Jemalloc) is required to allocate memory for storage. The DictEntry object, for example, consists of three Pointers that make up 24 bytes on a 64-bit machine, and Jemalloc allocates 32 bytes of memory for it.

Here we introduce Jemalloc, RedisObject, SDS, object types and internal encoding respectively.

2, jemalloc

Redis specifies the memory allocator at compile time; The memory allocator can be LIBC, JEMalloc, or TCMALloc, and the default is jemalloc.

Jemalloc, the default memory allocator for Redis, does a relatively good job of reducing memory fragmentation. In the 64-bit system, jemalloc divides the memory space into three ranges: small, large and huge. Each range is divided into many small memory block units; When Redis stores data, it selects the most appropriate memory block for storage.

The memory units divided by Jemalloc are as follows:

For example, if you need to store a 130-byte object, Jemalloc puts it into a 160-byte memory cell.

3, RedisObject

As mentioned earlier, there are five types of Redis objects; Regardless of type, Redis is not stored directly, but rather through RedisObject objects.

RedisObject is very important, Redis object type, internal encoding, memory reclamation, shared object and other functions, all need RedisObject support, the following will be through the structure of RedisObject to explain how it works.

A RedisObject is defined as follows (different versions of Redis may vary slightly) :

typedef struct redisObject {
    unsigned type: 4; unsigned encoding:4; unsigned lru:REDIS_LRU_BITS; /* lru time (relative to server.lruclock) */ int refcount; void *ptr; } robj;Copy the code

Each field of a RedisObject has the following meanings and functions:

(1) type

The type field indicates the type of the object, which is four bits long. Currently, the value can be REDIS_STRING, REDIS_LIST, REDIS_HASH, REDIS_SET, or REDIS_ZSET.

When we execute the type command, we get the type of the object by reading the type field of the RedisObject. As shown below:

(2) encoding

Encoding Indicates the internal encoding of the object, which is four bits.

For each type supported by Redis, there are at least two internal encodings, such as int, EMbstr, and RAW for strings. With the Encoding attribute, Redis can set different encoders for different usage scenarios, making Redis much more flexible and efficient.

Taking the list object as an example, there are two encoding methods: compressed list and double-endian linked list. If there are fewer elements in a list, Redis tends to use compressed lists for storage, because compressed lists take up less memory and can load faster than double-ended lists. When the list object has a large number of elements, the compressed list is transformed into a double-ended list that is better suited for storing a large number of elements.

Using the object encoding command, you can view the encoding mode of the object, as shown in the following figure:

The encoding methods and application conditions corresponding to the five object types will be introduced later.

(3) lru

Lru records the last time an object was accessed by a command program, and the number of bits it occupies varies according to versions (for example, 24 bits in version 4.0, 22 bits in version 2.6).

The idling time of an object can be calculated by comparing the LRU time with the current time. The object idletime command displays the idling time (in seconds). The object idletime command is special in that it does not change the LRU value of the object.

(4) refcount

Refcount and shared objects

Refcount counts the number of times the object is referenced. The type is an integer. Refcount is used for reference counting and memory reclamation of objects:

When a new object is created, refCount is initialized to 1;

Refcount increases by 1 when a new program uses the object;

Refcount is reduced by 1 when the object is no longer in use by a new program;

When refCount goes to 0, the memory occupied by the object is freed.

Objects that are used multiple times in Redis (refCount >1) are called shared objects. Redis to save memory, new programs do not create new objects when some objects appear repeatedly, but still use the old objects. The object that is reused is a shared object. Currently, shared objects only support string objects with integer values.

Concrete implementation of shared objects

Redis’s shared objects currently only support string objects with integer values. This is actually a balance between memory and CPU (time) : sharing objects reduces memory consumption, but it takes extra time to determine whether two objects are equal.

  • For integer values, the operation complexity is O(1).
  • For ordinary strings, the judgment complexity is O(n)
  • For hashes, lists, sets, and ordered sets, the judgment is O(n^2).

Although shared objects can only be string objects with integer values, they can be used by all five types (such as elements of hashes, lists, and so on).

For the current implementation, Redis server initialization, will create 10000 string objects, values are 0 to 9999 integer values; These shared objects can be used directly when Redis needs to use string objects with values from 0 to 9999. The number 10000 can be changed by adjusting the value of the parameter REDIS_SHARED_INTEGERS (in 4.0, OBJ_SHARED_INTEGERS).

You can run the object refcount command to view the number of references to the shared object, as shown in the following figure. The result page confirms that only integers between 0 and 9999 will be used as shared objects.

(5) the PTR

The PTR pointer points to concrete data, such as in the previous example, set Hello World, where the PTR points to SDS containing the string world.

(6) Summary

To sum up, the structure of a redisObject is related to object type, encoding, memory reclamation, and shared objects. A redisObject is 16 bytes in size:

24 bit bit 4 + 4 bit + + 4 byte + 8 byte = 16 byte.

4, SDS,

Instead of using a C string directly (that is, an array of characters ending with a null character ‘\0’) as the default string representation, Redis uses SDS. SDS stands for Simple Dynamic String.

(1) SDS structure

SDS has the following structure:

struct sdshdr {

    int len;

    int free;

    char buf[];

};
Copy the code

Where buf stands for byte array, which is used to store strings. Len indicates the used length of BUF, and free indicates the unused length of BUF.

Here are two examples:

As can be seen from the structure of SDS, the length of buF array =free+len+1 (where 1 represents the null character at the end of the string); So, the space occupied by an SDS structure is: the length of free+ the length of len+ the length of buF array =4+4+ Free +len+1=free+len+9.

(2) Comparison between SDS and C string

SDS adds free and len fields to the C string, bringing many benefits:

  • Get the string length: SDS is O(1), C string is O(n).
  • Buffer overflow: When using the C string API, it is easy to overflow the buffer if the string length increases (such as strcat operations) and you forget to reallocate memory. SDS records the length, and the corresponding API automatically reallocates memory when it may cause buffer overflows, eliminating buffer overflows.
  • Reallocation of memory when changing strings: For C strings, if you want to change the string, you must reallocate memory (free and then requisition), because without reallocation, increasing the string length will cause memory buffer overflow and decreasing the string length will cause memory leak. As for SDS, len and free can be recorded, so the association between string length and spatial array length is removed, and optimization can be made on this basis. The space preallocation strategy (that is, allocating more memory than actually needed) greatly reduces the probability of allocating memory when the string length increases. The lazy space free strategy greatly reduces the probability of reallocating memory when the string length decreases.
  • Accessing binary data: SDS does, C strings do not. Because the C string is marked by a null character as the end of the string, and for some binary files (such as pictures, etc.), the content may contain an empty string, so the C string cannot be accessed correctly. SDS uses len as the end of the string, so there is no such problem.

In addition, because BUF in SDS still uses C strings (that is, ending with ‘\0’), SDS can use some of the functions in the C string library. Note, however, that this can only be used when SDS is used to store textual data, not binary data (‘ \0 ‘is not necessarily the ending).

(3) SDS and C string application

Redis uniformly uses SDS instead of C strings when storing objects. The set Hello world command, for example, stores hello and world as SDS. The sadd myset member1 member2 member3 command, both the key “myset” and the elements of the set member1, member2 and member3, are stored in the form of SDS. In addition to storing objects, SDS is also used to store various buffers.

The C string is used only when the string does not change, such as when printing logs.

4. Object type and internal coding of Redis

As mentioned earlier, Redis supports five object types, and each structure has at least two encodings. The advantages of this approach are: on the one hand, the interface is separated from the implementation, so that when the internal code needs to be added or changed, the use of users is not affected; on the other hand, the internal code can be switched according to different application scenarios, improving efficiency.

The internal encodings supported by various Redis object types are shown in the figure below (the version in the figure is Redis3.0, but the internal encodings are added in later versions of Redis. The internal encodings described in this chapter are 3.0 based) :

As for the conversion of Redis internal code, it conforms to the following rules: the conversion is completed when Redis writes data, and the conversion process is irreversible. Only small memory encoding can be converted to large memory encoding.

1. String

(1) Overview

Strings are the most basic type, because all keys are strings, and elements of several other complex types are strings. The value cannot exceed 512MB.

(2) Internal coding

There are three internal encodings of string types, which can be used in the following scenarios:

Int: a long integer of 8 bytes. When the string value is an integer, the value is represented by the long integer.

Embstr: a string of <=39 bytes. Both EMbSTR and RAW use RedisObject and SDS to store data. The difference is that the use of EMbSTR allocates memory space only once (thus RedisObject and SDS are contiguous), whereas RAW allocates memory space twice (RedisObject and SDS, respectively). Therefore, compared to raw, embSTR has the advantage of allocating less space on creation, freeing less space on deletion, and connecting all data of objects together. The disadvantage of embSTR is also obvious: if the length of the string increases and memory needs to be reallocated, the entire RedisObject and SDS need to be reallocated, so embSTR in Redis is implemented as read-only.

Raw: The string contains more than 39 bytes

An example is shown below:

The length of embstr is 39 because the length of RedisObject is 16 bytes and the length of SDS is 9+ string length. So when the string length is 39, embstr is exactly 16+9+39=64, and Jemalloc can allocate exactly 64 bytes of memory.

An example is shown below:

The length of embstr is 39 because the length of RedisObject is 16 bytes and the length of SDS is 9+ string length. So when the string length is 39, embstr is exactly 16+9+39=64, and Jemalloc can allocate exactly 64 bytes of memory.

(3) Coding conversion

When int data is no longer an integer, or the size exceeds long’s range, it is automatically converted to RAW.

In the case of embstr, since its implementation is read-only, any modification to an EMbstr object is converted to RAW before it is modified. Therefore, whenever an EMbstr object is modified, the modified object must be raw, regardless of whether it reaches 39 bytes. An example is shown below:

2, list

(1) Overview

A list is used to store multiple ordered strings, each string called an element. A list can store 2^32-1 elements. Lists in Redis support both insertion and pop-up, and can obtain elements in a specified position (or range), which can act as arrays, queues, stacks, and so on.

(2) Internal coding

The internal encoding of a list can be a ziplist or a linkedList.

Double-endian linked list: consists of a list structure and multiple listNode structures, as shown in the figure below:

As can be seen from the figure, the double-ended linked list stores both the header pointer and the tail pointer, and each node has a pointer pointing forward and backward. Linked lists hold the length of the list, and dUP, free, and match set type-specific functions for node values, so linked lists can be used to hold a variety of different types of values. Each node in the list points to a RedisObject of type string.

Compressed list: Compressed list is developed by Redis to save memory. It is a sequential data structure composed of a series of consecutively encoded memory blocks (instead of each node being a pointer like a double-endian linked list). The specific structure is relatively complex, omitted. Compared with a double-ended linked list, a compressed list can save memory space. However, it is more complex to modify or add or delete a list. Therefore, a compressed list can be used when the number of nodes is small. However, with a large number of nodes, it is still cost-effective to use a double-entailed list.

Compressed lists are used not only to implement lists, but also to implement hashes and ordered lists.

(3) Coding conversion

Compressed lists are used only if both of the following conditions are met:

  • The number of elements in the list is less than 512.
  • All string objects in the list are less than 64 bytes.

If one condition is not met, a double-endian list is used, and the encoding can only be converted from a compressed list to a double-endian list, not the other way around.

The following figure shows the characteristics of list encoding transformations:

A single string cannot exceed 64 bytes to allocate the length of each node uniformly. The 64 bytes here refer to the length of the string, not including the SDS structure, because compressed lists use contiguous, fixed-length memory blocks to store strings and do not require the SDS structure to specify the length. When I talk about the compressed list, I also emphasize that the length should not exceed 64 bytes, and the principle is similar here.

3, the hash

(1) Overview

As a data structure, hash is not only listed with string, list, set and ordered combination, but also one of the five object types provided by Redis. It is also the data structure used by Redis as a key-value database. For the sake of illustration, when “inner hash” is used later in this article, it represents one of the five object types provided externally by Redis; The use of “outer hash” refers to the data structure Redis uses as a key-value database.

(2) Internal coding

The internal encoding used for inner hashing can be ziplist and Hashtable. The outer hashes of Redis only use HashTable.

The compressed list was introduced earlier. Compared with hash table, compressed list is used in the scene with few elements and small length. Its advantage lies in centralized storage and space saving. Meanwhile, although the operation complexity of elements has changed from O(n) to O(1), the operation time is not significantly inferior due to the small number of elements in the hash.

Hashtable: A Hashtable consists of one dict structure, two Dictht structures, one dictEntry pointer array (called bucket), and multiple dictEntry structures.

Under normal circumstances, that is, when the Hashtable is not rehash, the relationship between the parts is as follows:

The following sections are introduced from the bottom up:

dictEntry

The dictEntry structure is used to hold key-value pairs. The structure is defined as follows:

typedef struct dictEntry{

    void *key;

    union{

        void *val;

        uint64_tu64;

        int64_ts64;

    }v;

    struct dictEntry *next;

}dictEntry;
Copy the code

The functions of each attribute are as follows:

  • Key: a key in a key-value pair;
  • Val: a value in a key-value pair, implemented using a union(common body) that stores either a pointer to a value, a 64-bit integer, or an unsigned 64-bit integer;
  • Next: points to the next dictEntry, which is used to resolve hash conflicts
  • On 64-bit systems, a dictEntry object accounts for 24 bytes (key/val/next each accounts for 8 bytes).

bucket

A bucket is an array, and each element of the array is a pointer to the dictEntry structure. The size of the bucket array in Redis is calculated as follows: the smallest 2^n larger than dictEntry. For example, if there are 1000 dictentries, the bucket size is 1024; If there are 1500 dictentries, the bucket size is 2048.

dictht

Dictht structure is as follows:

typedef struct dictht{

    dictEntry **table;

    unsigned long size;

    unsigned long sizemask;

    unsigned long used;

}dictht;
Copy the code

The function description of each attribute is as follows:

  • The table property is a pointer to a bucket;
  • The size attribute records the size of the hash table, that is, the size of the bucket;
  • Used records the number of dictentries used;
  • The sizemask attribute is always size-1. This attribute, along with the hash value, determines where a key is stored in the table.

dict

In general, the function of ordinary hash table can be realized by using dictht and dictEntry structures. But the Redis implementation also has a dict structure on top of the Dictht structure. The following explains the definition and role of the dict structure.

Dict has the following structure:

typedef struct dict{

    dictType *type;

    void *privdata;

    dictht ht[2];

    int trehashidx;

} dict;
Copy the code

The Type and privData attributes are designed to accommodate different types of key-value pairs and are used to create polymorphic dictionaries.

The HT and treHashidx attributes are used for rehash, that is, when the hash table needs to be expanded or shrunk.

Ht is an array of two items, each pointing to a dictht structure, which is why Redis hash has 1 dict and 2 Dictht structures. Normally, all data is stored in dict ht[0], and HT [1] is only used for rehash. When the dict rehash is performed, all the data in HT [0] is rehash to HT [1]. Then assign HT [1] to HT [0] and empty HT [1].

Therefore, hashing in Redis has a dict structure in addition to dictht and dictEntry, partly to accommodate different types of key-value pairs, and partly for rehash.

(3) Coding conversion

As mentioned earlier, the inner hash in Redis may use either a hash table or a compressed list.

Compressed lists are used only if both of the following conditions are met:

The number of elements in the hash is less than 512.

All key-value pairs in the hash have key and value strings of less than 64 bytes.

If one condition is not met, hash tables are used; And encoding can only be converted from a compressed list to a hash table, not the other way around.

The following figure shows the characteristics of the Redis inner hash encoding transformation:

4, collections,

(1) Overview

A set is similar to a list in that it is used to hold multiple strings, but sets differ from lists in two ways: the elements in a set are unordered, so they cannot be manipulated by indexes; There can be no repetition of elements in a collection.

A set can store a maximum of 2^32-1 elements. In addition to supporting the regular add, delete, change and search, Redis also supports the intersection, union, difference set of multiple sets.

(2) Internal coding

The internal encoding of a collection can be a collection of integers (intset) or a hashtable (hashtable).

I’ve already talked about hash tables, so I’m not going to do it here. Note that the values of the collection are all set to NULL when using the hash table.

The structure of the set of integers is defined as follows:

typedef struct intset{

    uint32_t encoding;

    uint32_t length;

    int8_t contents[];

} intset;
Copy the code

Where encoding stands for the type of contents stored, although contents (storing elements in the collection) is of type INT8_T, it actually stores the value int16_t, INT32_t, or int64_t, depending on encoding. Length indicates the number of elements.

The integer set is suitable for all the elements of the set are integers and the number of elements in the set is small. Compared with the hash table, the integer set has the advantage of centralized storage and space saving. Meanwhile, although the operation complexity of elements has changed from O(n) to O(1), the operation time is not significantly inferior due to the small number of sets.

Only when the following two conditions are met at the same time, the set will use the integer set:

The number of elements in the set is less than 512.

All elements of the set are integer values.

If one condition is not met, hash tables are used; And the encoding can only be converted from a set of integers to a hash table, not the other way around.

The following figure shows the characteristics of the collection encoding transformation:

5. Ordered sets

(1) Overview

Ordered sets are like sets in that no element can be repeated. But unlike a set, the elements in an ordered set are ordered. Unlike lists, which use index subscripts for sorting, ordered collections assign a score to each element for sorting.

(2) Internal coding

The internal encoding of ordered collections can be ziplist or skiplist. Ziplist is used in both lists and hashes, but we’ve already covered it, so we’re not going to mention it here.

A skip list is an ordered data structure that allows fast access to nodes by maintaining multiple Pointers to other nodes in each node.

In addition to skip lists, another typical implementation of ordered data structures is balanced trees; In most cases, the efficiency of skip tables is comparable to that of balance trees, and the implementation of skip tables is much simpler than that of balance trees. Therefore, skip tables are used in Redis instead of balance trees.

Skip lists support node search at complex points of average O(logN) and worst O(N), and support sequential operations. Redis skiplist implementation consists of zskiplist and zskiplistNode: the former is used to store skiplist information (such as head node, tail node, length, etc.), and the latter is used to represent skiplist nodes. The specific structure is relatively complex, omitted.

(3) Coding conversion

Compressed lists are used only if both of the following conditions are met:

The number of elements in the ordered set is less than 128;

All members of an ordered set are less than 64 bytes long.

If one condition is not met, skip lists are used; And the encoding can only be transformed from a compressed list to a skip list, the reverse direction is not possible.

The following figure shows the characteristics of ordered set encoding transformations:

5. Application examples

Now that you know about Redis’s memory model, here are a few examples to illustrate its application.

1. Estimate Redis memory usage

To estimate the size of memory occupied by data in Redis, it is necessary to have a comprehensive understanding of the Redis memory model, including hashtable, SDS, RedisObject and the encoding methods of various object types introduced above.

Here’s the simplest string type:

Suppose there are 90,000 key-value pairs, each key is 7 bytes long, and each value is 7 bytes long (and neither key nor value is an integer).

Let’s estimate the space taken up by those 90,000 key/value pairs.

Before estimating the space occupied, you can first determine the encoding used for the string type: embstr. The memory space occupied by 90,000 key-value pairs can be divided into two parts: one is the space occupied by 90,000 dictentries; One part is the bucket space required by the key-value pair.

The space occupied by each dictEntry includes:

A dictEntry, 24 bytes, and Jemalloc will allocate 32 bytes of memory.

A key, 7 bytes, so SDS(key) requires 7+9=16 bytes, jemalloc will allocate 16 bytes of memory blocks.

A RedisObject, 16 bytes, and Jemalloc will allocate 16 bytes of memory.

A value, 7 bytes, so SDS(value) requires 7+9=16 bytes, jemalloc will allocate 16 bytes of memory blocks.

To sum up, a dictEntry requires 32+16+16+16=80 bytes.

Bucket space: The bucket array size is the smallest 2^n greater than 90,000, which is 131072, and each bucket element is 8 bytes (because Pointers are 8 bytes on 64-bit systems).

Therefore, it can be estimated that the memory size occupied by the 90,000 key-value pairs is: 9000080 + 1310728 = 8248576.

Check in Redis with a program:

public class RedisTest {
    public static Jedis jedis = new Jedis("localhost", 6379);
    public static void main(String[] args) throws Exception{
        Long m1 = Long.valueOf(getMemory());
        insertData();

        Long m2 = Long.valueOf(getMemory());
        System.out.println(m2 - m1);
    }

    public static void insertData() {for(int i = 10000; i < 100000; i++){
            jedis.set("aa" + i, "aa"+ i); // Key and value are 7 bytes long and not integers}} public static StringgetMemory(){
        String memoryAllLine = jedis.info("memory");
        String usedMemoryLine = memoryAllLine.split("\r\n") [1]; String memory = usedMemoryLine.substring(usedMemoryLine.indexOf(':') + 1);
        returnmemory; }}Copy the code

Running result: 8247552

The error between the theoretical value and the resulting value is 1.2 parts per million, which is good enough for calculating how much memory is needed. The error is due to the fact that Redis allocated some bucket space before we inserted 90,000 pieces of data, and that bucket space was not yet used.

As a comparison, if the length of key and value is increased from 7 bytes to 8 bytes, the corresponding SDS becomes 17 bytes and Jemalloc will allocate 32 bytes, so the number of bytes occupied by each dictEntry also changes from 80 bytes to 112 bytes. In this case, the memory occupied by the 90,000 key/value pairs is estimated to be 90000112 + 1310728 = 11128576.

Verify the code in Redis as follows (only modify the code that inserts data) :

public static void insertData() {for(int i = 10000; i < 100000; i++){
        jedis.set("aaa" + i, "aaa"+ i); // Key and value are 8 bytes long and not integers}}Copy the code

Running results: 11128576; Accurate estimation.

For other types other than string types, the estimation of memory usage is similar and needs to be combined with the specific encoding method.

2. Optimize memory usage

Understanding the memory model of Redis is helpful in optimizing Redis memory footprint. Here are several optimization scenarios:

(1) Optimization using jEMalloc characteristics

The 90,000 key values described in the previous section are an example. Since jemalloc allocates memory in discrete values, a one-byte change in the key/value string can cause a large change in memory footprint, which can be taken advantage of at design time.

For example, if the key is 8 bytes long, SDS allocates 17 bytes and Jemalloc allocates 32 bytes. At this point, if the key length is reduced to 7 bytes, SDS is 16 bytes and JEMalloc allocates 16 bytes. The space occupied by each key can be reduced by half.

(2) Use integer/long integer

If it is an integer/long integer, Redis saves more space by using int (8 bytes) instead of string. Therefore, use long integers in scenarios where long integers can be used instead of strings.

(3) Shared objects

With shared objects, you can reduce object creation (and RedisObject creation), saving memory space.

Currently, shared objects in Redis consist of only 10,000 integers (0-9999), and you can increase the number of shared objects by adjusting the REDIS_SHARED_INTEGERS parameter. For example, if REDIS_SHARED_INTEGERS is set to 20000, objects between 0 and 19999 can be shared.

Consider a scenario where a forum site stores the number of views per post in Redis, most of which are in the range of 0 to 20,000, and you can use shared objects to save memory by appropriately increasing the REDIS_SHARED_INTEGERS parameter.

(4) Avoid over-design

However, it is important to note that there is a tradeoff between memory space and design complexity in any optimization scenario; Design complexity affects code complexity and maintainability.

If the amount of data is small, it is not cost-effective to make code development and maintenance more difficult to save memory. Again, using the 90,000 key-value pairs mentioned earlier, the actual memory savings are only a few MEgabytes. However, if the data volume is tens of millions or even hundreds of millions, it is necessary to consider memory optimization.

3. Pay attention to memory fragmentation rate

Memory fragmentation rate is an important parameter, which is of great significance to Redis memory optimization.

If the memory fragmentation rate is too high (jemalloc is normal around 1.03), it indicates that there are too many memory fragments and serious memory waste. In this case, you can consider restarting the Redis service to rearrange data in memory to reduce memory fragmentation.

If the memory fragmentation rate is less than 1, Redis has insufficient memory and some data uses virtual memory (swap). Since virtual memory is much slower to access than physical memory (2-3 orders of magnitude), Redis can be slow to access. Therefore, you must try to increase physical memory (by increasing the number of server nodes, or by increasing the memory on a single machine), or reduce the data in Redis.

In order to reduce the data in Redis, in addition to selecting appropriate data types and using shared objects, a reasonable maxmemory-policy should be set. When the memory reaches a certain amount, the memory will be reclaimed according to different priorities.

Vi. References

  • Redis Development and Operation
  • Redis Design and Implementation
  • Redis. IO/documentati…
  • Redisdoc.com/server/info…
  • www.cnblogs.com/lhcpig/p/47…
  • searchdatabase.techtarget.com.cn/7-20218/
  • www.cnblogs.com/mushroom/p/…
  • www.imooc.com/article/364…
  • Blog.csdn.net/zhengpeitao…