This article summarizes Redis best practices at two levels: the business level and the operations level.

I used to write many UGC back-end services and used Redis in a wide range of scenarios. I ran into plenty of pitfalls along the way, and from them I distilled a set of reasonable usage practices.

Later, I worked on infrastructure and developed middleware related to Codis and Redis. At this stage, my focus shifted from application-level usage to the development, operation, and maintenance of Redis, and more toward its internal implementation and the problems that arise while operating it. I accumulated some experience in this area as well.

Drawing on both experiences, I would like to share some reasonable ways to use, operate, and maintain Redis. They may not be the most comprehensive, and they may differ from how you use Redis, but they are practical lessons learned from running into those pitfalls, offered for your reference.

At the business level, developers are mainly concerned with how to use Redis properly when writing business code. They need a basic understanding of Redis so that they use it in the right scenarios and avoid latency problems caused by how the business code uses it.

During development, the business-level optimization suggestions are as follows:

  • Keep keys as short as possible. With a large number of keys, overly long keys consume noticeably more memory
  • Do not store large values. Allocating and freeing the memory for a large value takes time and may block the main thread
  • On Redis 4.0 and later, enable lazy-free so that large values are released asynchronously without blocking the main thread
  • Set an expiration time on keys. When Redis is used as a cache, especially with a large volume of data, keys without an expiration time make memory grow without bound
  • Avoid complex commands such as SORT, SINTER, SINTERSTORE, ZUNIONSTORE, and ZINTERSTORE. They take a long time to run and block the main thread
  • When querying, fetch as little data as possible per call. If the number of elements in a container is unknown, avoid full-range reads such as LRANGE key 0 -1 and ZRANGE key 0 -1; specify how many elements to fetch instead
  • Write as little data as possible per call. For commands such as HSET key field1 value1 field2 value2 …, control the number of elements written at a time; fewer than 100 is recommended. Write large data sets in multiple batches
  • For batch operations, use MGET/MSET instead of repeated GET/SET and HMGET/HMSET instead of repeated HGET/HSET to reduce the number of network round trips and the latency. For commands that have no batch variant, use a pipeline to send multiple commands to the server at once
  • Do not run the KEYS command. When you need to scan an instance, use SCAN, and control the scan frequency in production to avoid performance jitter on Redis
  • To keep a large number of keys from expiring at the same moment, add a random offset to each expiration time. This spreads out expirations, reduces the pressure on Redis when many keys expire together, and avoids blocking the main thread
  • Choose an eviction policy that fits the business scenario. Random eviction is usually faster than LRU eviction
  • Access Redis through a connection pool with sensible pool parameters, and avoid short-lived connections. The TCP three-way handshake and four-way teardown both cost time
  • Use only DB0, not multiple DBs. Using several DBs adds overhead because a SELECT command is needed every time a different DB is accessed. For different business lines, split the data across multiple instances instead, which also improves the performance of each instance
  • When read traffic is heavy, consider read/write separation, provided the business can tolerate slave data that is not updated in time
  • When write traffic is heavy, use a cluster and deploy multiple instances to share the write load
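
The advice above on batched writes and randomized expiration can be sketched in Python. This is a minimal illustration, not the author's code: the helper names ttl_with_jitter, chunked, and hset_in_batches are hypothetical, the default batch size of 50 is an arbitrary value below the recommended limit of 100, and client is assumed to be a redis-py style object whose hset method accepts a mapping argument.

```python
import random
from itertools import islice


def ttl_with_jitter(base_seconds, max_jitter_seconds, rng=random):
    """Return the base TTL plus a random offset in [0, max_jitter_seconds].

    Spreading TTLs this way keeps keys that were written together
    from all expiring at the same instant.
    """
    return base_seconds + rng.randint(0, max_jitter_seconds)


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def hset_in_batches(client, key, fields, batch_size=50):
    """Write a large hash in several HSET calls of at most batch_size fields.

    `client` is assumed to expose hset(name, mapping=...) as redis-py does.
    Returns the number of HSET calls that were issued.
    """
    calls = 0
    for batch in chunked(fields.items(), batch_size):
        client.hset(key, mapping=dict(batch))
        calls += 1
    return calls
```

With redis-py, a cache write could then look like client.set(key, value, ex=ttl_with_jitter(3600, 300)), so that keys written together expire over a five-minute window instead of all at once.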

The operations level

At the operations level, the goal is to plan Redis deployments sensibly and keep Redis running stably. The main optimizations are as follows:

  • Deploy separate instances for different business lines, ideally on different machines, and group them by business importance so that a problem in one business line does not affect the others
  • Make sure each machine has enough CPU, memory, bandwidth, and disk resources so that excessive load does not hurt Redis performance
  • Deploy instances in master-slave mode and spread them across different machines to avoid single points of failure. Slaves must be set to read-only
  • Keep the machines that host master nodes and slave nodes independent of each other, and avoid cross-deploying instances on them. Backups are usually performed on the slave node
  • Deploy at least three sentinel nodes on different machines to provide automatic failover
  • Plan capacity in advance. Keep the total memory limit of the instances deployed on a machine to about half of the machine's memory. During a full master/slave synchronization, memory usage can at most double, so this headroom prevents the machine from running out of memory if a large network fault triggers full synchronization on all master-slave pairs at once
  • Monitor the CPU, memory, bandwidth, and disk of each machine, and handle alerts promptly when resources run short. Once swap is used, Redis performance drops sharply; when network bandwidth is overloaded, access latency increases significantly; and when disk I/O is too high, enabling AOF slows Redis down
  • Set an upper limit on the number of client connections so that too many connections do not overload the service
  • Keep the memory usage of a single instance below 20 GB. A large instance takes longer to back up, consumes more resources, and blocks for longer during a full master/slave synchronization
  • Set a reasonable slowlog threshold (10 ms is recommended) and monitor it. Raise an alert promptly when there are too many slow logs
  • Set a reasonable size for the repl-backlog replication buffer
  • Set client-output-buffer-limit for slave nodes appropriately. For write-heavy instances, a larger limit helps avoid interrupting master/slave replication
  • Run backups on the slave node so that they do not affect master performance
  • Either disable AOF or configure AOF to flush to disk once per second, so that disk I/O does not drag down Redis performance
  • When an instance has a memory limit that needs to be raised, adjust the slave node first and then the master node. Otherwise, data on the master and slave nodes becomes inconsistent
  • Add monitoring for Redis. Collect INFO data over long-lived connections, because frequent short connections can also affect Redis performance
  • When scanning an entire instance online, sleep between scan calls to avoid the performance jitter caused by a sudden QPS spike during the scan
  • Monitor Redis runtime metrics, especially expired_keys, evicted_keys, and latest_fork_usec. A sudden jump in any of these within a short period can block the entire instance and cause performance problems
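
The advice to sleep while scanning an instance online could look roughly like this in Python. It is a sketch under assumptions: throttled_scan and its default pause of 50 ms are hypothetical, and client is assumed to follow redis-py's scan(cursor, match, count) call, which returns the next cursor together with a batch of keys.

```python
import time


def throttled_scan(client, match="*", count=100, pause_seconds=0.05,
                   sleep=time.sleep):
    """Iterate over all matching keys with SCAN, pausing between round trips.

    The pause keeps a full-instance scan from producing a sudden QPS spike.
    `sleep` is injectable so the pacing can be observed or replaced in tests.
    """
    cursor = 0
    while True:
        # One SCAN round trip returns the next cursor and a batch of keys.
        cursor, keys = client.scan(cursor=cursor, match=match, count=count)
        for key in keys:
            yield key
        # A cursor of 0 means the iteration over the keyspace is complete.
        if cursor == 0:
            return
        sleep(pause_seconds)
```

Because this is a generator, the pauses happen as the caller consumes keys, so a slow consumer naturally slows the scan further. Suitable values for count and pause_seconds depend on the instance's load and would need to be tuned against real traffic.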

These are the Redis practices I recommend, summarized from my experience using Redis and developing Redis-related middleware. I have run into nearly every point above in real-world use.

As you can see, getting stable high performance out of Redis requires care in every one of these areas. A problem in any one of them will inevitably affect Redis performance, which raises the bar for how we use, operate, and maintain it. If you have run into other problems or have better experience with Redis, please leave a comment!

Original text: kaito-kidd.com/2020/07/04/redis-best-practices/