A cache database has become a standard part of modern system architectures, especially with the popularity of microservices: making a microservice stateless means externalizing its state, and that external state needs to be stored in an external cache service. Redis is the mainstream cache database implementation today. This article introduces the basic concepts of Redis and best practices for using it.

Structure and Concept

Redis is an open-source, network-enabled, in-memory, optionally persistent key-value store written in ANSI C. Development of Redis has been sponsored by Redis Labs since June 2015; from May 2013 to June 2015 it was sponsored by Pivotal, and before May 2013 by VMware. According to the monthly DB-Engines ranking, Redis is the most popular key-value store.

Single-node, active/standby, and cluster deployment modes

Redis processes commands in a single thread: by design Redis is not meant to be CPU-bound, and a single thread combined with asynchronous I/O is already very efficient; a single Redis instance can currently reach about 100,000 QPS. In common application scenarios, single-node deployment or active/standby deployment (for high availability) is sufficient.

However, applications depend more and more heavily on Redis and place higher demands on it: low access latency (<5 ms), high QPS (millions), and high throughput (hundreds of MB/s). In many scenarios a single CPU can no longer keep up, so a Redis cluster composed of multiple Redis processes is the usual solution for a high-performance cache service. In cluster mode, application access patterns can create hot keys, which load the Redis instances in the cluster unevenly: the instances that own the hot keys are busy while the others sit idle. There are two common ways to deal with hot keys. One is read/write separation at the Redis cluster level, spreading the load across multiple Redis instances; read/write separation is essentially a replication cluster, and the key design questions are how to reduce replication lag between instances and how to limit the replication overhead on the master. The other is to use in-process memory as a level-1 cache inside the application, with Redis as the level-2 cache; a minimal sketch of that approach follows.
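
Below is a minimal sketch of the two-level cache idea, assuming the redis-py client and a local instance at localhost:6379 (both illustrative); the local TTL and key names are also placeholders.

```python
import time
import redis  # assumes the redis-py client is installed

# Minimal two-level cache sketch: a small in-process dict (level 1)
# in front of Redis (level 2).
r = redis.Redis(host="localhost", port=6379)

_local_cache = {}          # key -> (value, expire_at)
LOCAL_TTL_SECONDS = 5      # keep hot keys in memory briefly to absorb bursts

def get_cached(key: str):
    entry = _local_cache.get(key)
    if entry is not None:
        value, expire_at = entry
        if time.time() < expire_at:
            return value          # level-1 hit: no Redis round trip
        del _local_cache[key]
    value = r.get(key)            # level-2 lookup in Redis
    if value is not None:
        _local_cache[key] = (value, time.time() + LOCAL_TTL_SECONDS)
    return value
```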

Codis cluster

Official cluster support only arrived in Redis 3.0. Before that there were many third-party Redis cluster solutions, and the main idea was to put a Proxy in front of the Redis instances: the Proxy handles partitioning and request forwarding, while a sentinel monitors the status of the Redis instances. The sentinel writes the status into a distributed configuration center (ZooKeeper/etcd), and the Proxy refreshes its Redis routing information from the configuration center. The most widely adopted open-source Proxy cluster implementation is Codis. The following figure shows the Codis architecture.

Redis native cluster

Redis 3.0 introduced cluster mode, which differs from the Proxy-based clusters above: the official cluster implementation has no server-side Proxy, and the routing function is implemented in the client SDK. To distinguish them from Proxy clusters, the official Redis clusters are called native clusters.

Nodes in a Redis Cluster communicate over the Redis Cluster Bus, which is based on the Gossip protocol.

The Redis client obtains the cluster configuration by running the CLUSTER family of commands, and the client and the nodes coordinate key-slot changes through MOVED/ASK redirections.
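
As an illustration, the cluster client in redis-py (assumed here, version 4.x or later) fetches the slot map via CLUSTER commands and follows MOVED/ASK redirections automatically; the host and port below are placeholders for any node in the cluster.

```python
# Minimal sketch with redis-py's cluster client.
from redis.cluster import RedisCluster

rc = RedisCluster(host="localhost", port=7000)
rc.set("user:1000", "alice")       # routed to the node that owns the key's slot
print(rc.get("user:1000"))
```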

Compared with a Proxy cluster, a native cluster scales out better because there is no Proxy layer, and up to 1,000 nodes are officially supported. Of course, without a Proxy layer, traffic and routing control become more troublesome.

The total slot space of a native cluster is 16,384 slots (numbered 0-16383), so in theory the number of nodes in a cluster cannot exceed 16,384.
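
The slot for a key is computed as CRC16(key) mod 16384, using the XMODEM/CCITT CRC16 variant. The sketch below shows that calculation in Python, ignoring hash tags ({...}) for simplicity.

```python
# Minimal sketch of the Redis Cluster slot calculation: CRC16(key) mod 16384.
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags ({...}) are ignored here for simplicity.
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))   # a value in the range 0..16383
```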

Factors to evaluate when selecting Redis specifications

When selecting a Redis specification, evaluate the business model to avoid a mismatch between the selected specification and the actual workload.

Memory capacity

Estimate capacity growth from the number and frequency of key writes, whether TTLs are set, and whether keys are explicitly deleted, so that capacity is not exhausted. Once Redis memory is full, further writes trigger key eviction; at the same time, system resources may be tight precisely because memory is full, so eviction takes a long time and writes time out.
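
A minimal sketch of keeping capacity in check, assuming redis-py and a local instance; the key name, TTL, and 80% threshold are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
r.set("session:42", "payload", ex=3600)   # attach an explicit TTL to cache entries

# Watch used_memory against maxmemory to anticipate evictions.
info = r.info("memory")
used = info["used_memory"]
maxmem = info.get("maxmemory", 0)
if maxmem and used / maxmem > 0.8:
    print("warning: memory usage above 80%, evictions (and write latency) likely soon")
```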

Whether persistence is enabled

If AOF is enabled with appendfsync = everysec, make sure the underlying disk is an SSD. Otherwise, in high-QPS write scenarios, the latency of application access to Redis may increase and, in extreme cases, accesses may time out.
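
A minimal sketch of inspecting (and, where permitted, adjusting) the AOF settings at runtime with CONFIG GET/SET through redis-py; note that many managed services disable the CONFIG command.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
print(r.config_get("appendonly"))     # {'appendonly': 'yes' | 'no'}
print(r.config_get("appendfsync"))    # {'appendfsync': 'always' | 'everysec' | 'no'}

# Example: switch the fsync policy to everysec (trade a little durability for latency).
r.config_set("appendfsync", "everysec")
```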

Whether the data can be regenerated

If data can be regenerated, there is no need to migrate data.

If the data cannot be regenerated, it has to be migrated. Currently there is no mature online Redis migration tool or service (DRS support for Redis is not yet complete), so migration must be done in business code, and the migration plan should be worked out case by case. Typical approaches are listed below; a double-write sketch follows the list:

  • Double-write in business code
  • If overwriting duplicate keys is acceptable, write a tool that reads from the source instance and writes to the destination instance, then briefly stop the service at a chosen moment and switch over
  • Simply stop the service and migrate
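
A minimal double-write sketch with redis-py; the host names are placeholders, and the error handling shown is only one possible policy.

```python
import redis

old = redis.Redis(host="old-redis", port=6379)
new = redis.Redis(host="new-redis", port=6379)

def cache_set(key: str, value: str, ttl: int = 3600) -> None:
    old.set(key, value, ex=ttl)           # keep serving from the old instance
    try:
        new.set(key, value, ex=ttl)       # shadow write to the new instance
    except redis.RedisError:
        pass                              # best effort: the migration write must not break the app

def cache_get(key: str):
    # Reads stay on the old instance until the new one has warmed up,
    # then the service is switched over and the double write is removed.
    return old.get(key)
```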

QPS

QPS is one of the main criteria for selecting a Redis specification. In some scenarios the data volume is very small but the QPS is very high. The master/standby editions have a limited maximum QPS, so if the required QPS exceeds that limit, the cluster edition is needed even though the data set is small. Very small memory combined with high QPS is one of the typical scenarios for a small cluster.

Ratio of read and write QPS

If the write QPS is very high, pay attention to AOF rewrite: writes issued while an AOF rewrite is in progress can see high latency and, in extreme cases, time out. See the reference link.
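
A minimal sketch, assuming redis-py, of checking the AOF rewrite trigger thresholds and kicking off a rewrite manually at a controlled (off-peak) time instead of letting it fire during a traffic spike.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
print(r.config_get("auto-aof-rewrite-percentage"))   # default 100 (%)
print(r.config_get("auto-aof-rewrite-min-size"))     # default 64mb

r.bgrewriteaof()   # trigger a background AOF rewrite at a controlled time
```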

Number of concurrent connections

Select the specification based on the required number of concurrent connections, and pay special attention to short-lived connection access.

Whether the workload is CPU-intensive

In some scenarios, commands such as MSET and MGET consume a lot of CPU, so consider whether the CPU capacity is sufficient. Sometimes memory is sufficient but CPU is not, leaving the Redis CPU busy; this is typical for cluster specifications with small memory. One mitigation is to bound the size of each batch command, as sketched below.
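
A minimal sketch, assuming redis-py, of splitting one very large MGET into smaller batches so that no single command monopolizes the single-threaded Redis CPU; the batch size and key names are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def mget_in_batches(keys, batch_size=100):
    # Several small MGETs instead of one huge one.
    values = []
    for i in range(0, len(keys), batch_size):
        values.extend(r.mget(keys[i:i + batch_size]))
    return values

values = mget_in_batches([f"item:{i}" for i in range(1000)])
```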

Whether memory will fill up when the TTL is long

For some keys, the TTL is set very long (e.g., one month) and there is no active deletion mechanism. This can fill up memory and trigger the key eviction policy, in which case subsequent writes may time out.
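
In that situation it helps to make the eviction policy explicit rather than relying on the default; a minimal redis-py sketch follows. The chosen policy (volatile-lru) is only an illustration and depends on whether every key carries a TTL.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory-policy", "volatile-lru")   # evict only keys that have a TTL
print(r.config_get("maxmemory-policy"))
```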

Whether to use Pipeline

In high-QPS scenarios, pipelining greatly improves efficiency and performance compared with single-key operations. However, limit the number of commands in one pipeline: the current Codis Proxy default is session_max_pipeline = 10000, and it is recommended not to exceed that value. Also evaluate the amount of data returned by a single pipeline batch.
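
A minimal pipelining sketch with redis-py; the batch size of 1,000 is illustrative and sits well under the 10,000 proxy limit mentioned above.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
BATCH = 1000   # illustrative; far below the proxy's pipeline limit

items = {f"key:{i}": f"value:{i}" for i in range(5000)}
keys = list(items)
for i in range(0, len(keys), BATCH):
    pipe = r.pipeline(transaction=False)   # plain batching, no MULTI/EXEC needed
    for k in keys[i:i + BATCH]:
        pipe.set(k, items[k], ex=3600)
    pipe.execute()                         # one round trip per batch
```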

Whether to use multiple DB

Some cloud vendors (such as Alibaba Cloud) offer Redis clusters that support multiple DBs, where the same key can exist in different DBs. Codis clusters and Redis native clusters do not support multiple DBs.
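
For reference, on a standalone (non-cluster) Redis the logical DB is selected per connection; a minimal redis-py sketch (key names illustrative):

```python
import redis

db0 = redis.Redis(host="localhost", port=6379, db=0)
db1 = redis.Redis(host="localhost", port=6379, db=1)

db0.set("user:1", "alice")
db1.set("user:1", "bob")          # same key name, different logical DB
print(db0.get("user:1"), db1.get("user:1"))
```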

Long connection or short connection

Short-lived connections require special attention to the connection count metric. If short connections are used, check whether system parameters such as the local port range and the maximum number of open file handles have been tuned; where possible, prefer long-lived connections through a connection pool, as sketched below.
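
A minimal connection-pooling sketch with redis-py; the pool size is illustrative.

```python
import redis

# Reuse long-lived connections through a pool instead of opening
# a short connection per request.
pool = redis.ConnectionPool(host="localhost", port=6379, max_connections=50)
r = redis.Redis(connection_pool=pool)   # callers share the pooled connections

r.set("ping", "pong")
print(r.get("ping"))
```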

Comparison of cache services provided by mainstream cloud vendors

Redis is the mainstream cache service, and every cloud vendor provides a hosted Redis cache service, but the implementations are not entirely consistent. The main implementation characteristics of each vendor are listed here as a reference for selection.

AWS

AWS provides a hosted Redis cluster service. Users specify the flavor (computing, storage, networking), and AWS deploys the Redis cluster onto those servers for the customer. When creating an instance, users can specify the number of nodes, the number of replicas, the slot distribution, and the node allocation mode.

  • Computing, storage, and network: flavor can be specified.
  • LB: none.
  • Proxy: none.
  • Multiple DB: not supported.
  • Number of replicas: can be specified.
  • Read/write separation: not supported.
  • Capacity expansion: online capacity expansion.
  • Cross-cluster replication: not supported.
  • Performance specifications: see the vendor documentation.
  • Usage restrictions: see the vendor documentation.
  • Redis version compatibility: optional; 3.2.4, 3.2.6, 3.2.10, 4.0.10, 5.0.0, 5.0.3, 5.0.4

Alibaba Cloud

Alibaba Cloud provides a Proxy-mode cluster with a self-developed Proxy.

  • Computing, storage, and network: bound to the Redis specification; flavor cannot be specified.
  • LB: uses SLB; peak QPS is 2 million.
  • Proxy: the number of Proxies has a fixed ratio to the cluster specification, and it can be customized for CPU-heavy scenarios.
  • Multiple DB: the cluster supports multiple DBs.
  • Number of replicas: single replica or two replicas.
  • Read/write separation: supported; replica data synchronization has some lag.
  • Capacity expansion: online capacity expansion.
  • Cross-cluster replication: supported; provides a global multi-active feature.
  • Performance specifications: see the vendor documentation.
  • Usage restrictions: see the vendor documentation.
  • Redis version compatibility: 2.8, 4.0

Tencent Cloud

Tencent Cloud provides a Proxy-mode cluster with a self-developed Proxy. Tencent Cloud also offers two Redis engines: open-source Redis and its self-developed CKV.

  • Computing, storage, and network: bound to the Redis specification; flavor cannot be specified.
  • LB: 100,000 QPS for a single node; the overall QPS upper limit is unknown.
  • Proxy: the number cannot be specified.
  • Multiple DB: the cluster does not support multiple DBs.
  • Number of replicas: optional; 1, 2, 3, 4, or 5.
  • Read/write separation: not supported.
  • Capacity expansion: online capacity expansion.
  • Cross-cluster replication: not supported.
  • Performance specifications: see the vendor documentation.
  • Usage restrictions: see the vendor documentation.
  • Redis version compatibility: standalone and master/slave 2.8; cluster 4.0

Huawei Cloud

Huawei Cloud provides two cluster types: a Codis-based Proxy cluster and a Redis native cluster. The native cluster has no LB or Proxy.

  • Computing, storage, and network: bound to the Redis specification; flavor cannot be specified.
  • LB: 1 million QPS.
  • Proxy: the number cannot be specified.
  • Multiple DB: the cluster does not support multiple DBs.
  • Number of replicas: 2.
  • Read/write separation: not supported.
  • Capacity expansion: online capacity expansion.
  • Cross-cluster replication: not supported.
  • Performance specifications: see the vendor documentation.
  • Usage restrictions: see the vendor documentation.
  • Redis version compatibility: 2.8, 3.x, 4.0, 5.0

For more cloud best practices, see each vendor's documentation.