This article summarizes a hands-on study of Redis Cluster (based on Redis 6.0+): it walks through building a Redis cluster environment step by step and then practices cluster scaling.

Redis Cluster

Redis Cluster is the distributed database solution provided by Redis. It shards data across nodes and provides replication and failover. Compared with master-slave replication and Sentinel mode, Redis Cluster offers a more complete high-availability scheme and solves two problems: storage capacity limited by a single machine, and write operations that cannot be load balanced.


Set up Redis cluster environment

For convenience, all nodes of the cluster environment run on the same server, distinguished by port number: six nodes in total, three masters (ports 6379, 6380, 6381) and three replicas (ports 6479, 6480, 6481).

This article is based on the latest Redis 6.0+; the latest source code was downloaded from GitHub and compiled to obtain the common tools redis-server and redis-cli. It is worth noting that since Redis 5.0, the cluster management tool redis-trib.rb has been integrated into the redis-cli client.

This section sets up the cluster environment by following the standard steps one by one, rather than using redis-trib.rb for quick management; this also helps you become familiar with the basic steps of cluster administration. redis-trib.rb is used for cluster re-sharding in the Cluster scaling section.

Cluster construction can be divided into four steps:

1. Start nodes: start each node in cluster mode.
2. Node handshake: connect the independent nodes into a network.
3. Slot assignment: allocate the 16384 slots to the master nodes so that database key-value pairs are stored in shards.
4. Master/replica replication: specify a master node for each replica node.

Start nodes

Each node initially starts as an ordinary master server, except that it is started in cluster mode. Taking the node with port 6379 as an example, modify its configuration file as follows:

# redis_6379_cluster.conf
port 6379
cluster-enabled yes
cluster-config-file "node-6379.conf"
logfile "redis-server-6379.log"
dbfilename "dump-6379.rdb"
daemonize yes

The cluster-config-file parameter specifies the location of the cluster configuration file. Each node maintains its cluster configuration file while running: whenever the cluster information changes (for example, a node is added or removed), every node in the cluster writes the latest information to its file. When a node restarts, it reads the file again to obtain the cluster information and rejoin the cluster. In other words, when a Redis node starts in cluster mode, it first looks for its cluster configuration file; if the file exists, the node starts with the configuration in it, and if not, it initializes the configuration and saves it to the file. Cluster configuration files are maintained by the Redis nodes themselves and do not need to be edited manually.

After modifying the configuration files for all six nodes, you can start the six servers with redis-server and the corresponding configuration file (xxxx indicates the port number).
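Assuming the other five configuration files follow the same naming pattern as redis_6379_cluster.conf above, the startup commands look like this:

$ redis-server redis_6379_cluster.conf
$ redis-server redis_6380_cluster.conf
$ redis-server redis_6381_cluster.conf
$ redis-server redis_6479_cluster.conf
$ redis-server redis_6480_cluster.conf
$ redis-server redis_6481_cluster.conf

Since daemonize yes is set, each server detaches into the background. Checking with the ps command: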

$ ps -aux | grep redis
... 800 0.1 0.0 49584 2444 ? Ssl 20:42 0:00 redis-server 127.0.0.1:6379 [cluster]
... 805 0.1 0.0 49584 2440 ? Ssl 20:42 0:00 redis-server 127.0.0.1:6380 [cluster]
... 812 0.3 0.0 49584 2436 ? Ssl 20:42 0:00 redis-server 127.0.0.1:6381 [cluster]
... 817 0.1 0.0 49584 2432 ? Ssl 20:43 0:00 redis-server 127.0.0.1:6479 [cluster]
... 822 0.0 0.0 49584 2380 ? Ssl 20:43 0:00 redis-server 127.0.0.1:6480 [cluster]
... 827 0.5 0.0 49584 2380 ? Ssl 20:43 0:00 redis-server 127.0.0.1:6481 [cluster]

Node handshake

After each node above is started, the nodes are independent of each other; each sits in a cluster containing only itself. Taking the server with port 6379 as an example, use the CLUSTER NODES command to view the nodes in its current cluster.

127.0.0.1:6379> CLUSTER NODES
37784b3605ad216fa93e976979c43def42bf763d 127.0.0.1:6379@16379 myself,master - 0 0 0 connected 449 4576 5798 7568 8455 12706

We need to use the CLUSTER MEET command to connect the independent nodes into a cluster containing multiple nodes.

$ redis-cli -p 6379 -c    # the -c option enables cluster mode of redis-cli
127.0.0.1:6379> CLUSTER MEET 127.0.0.1 6380
OK
127.0.0.1:6379> CLUSTER MEET 127.0.0.1 6381
OK
127.0.0.1:6379> CLUSTER MEET 127.0.0.1 6479
OK
127.0.0.1:6379> CLUSTER MEET 127.0.0.1 6480
OK
127.0.0.1:6379> CLUSTER MEET 127.0.0.1 6481
OK

Look again at the nodes contained in the cluster:

127.0.0.1:6379> CLUSTER NODES
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603632309283 4 connected
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 myself,master - 0 1603632308000 1 connected
51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603632310292 2 connected
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 master - 0 1603632309000 5 connected
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 master - 0 1603632308000 3 connected
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 master - 0 1603632311302 0 connected

All six nodes have now joined the cluster as master nodes. The fields of each line returned by CLUSTER NODES have the following meanings:

<id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slot> <slot> ... <slot>

- id: the node ID, a 40-character hexadecimal string. It is created only once, when the node initializes, and is saved to the cluster configuration file (cluster-config-file); when the node restarts, the ID is read back from that file.
- ip:port@cport: the former is the ordinary port that serves clients; the latter is the cluster bus port, allocated as the ordinary port + 10000 and used only for communication between nodes.

For the remaining fields, refer to the official CLUSTER NODES documentation.
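As an aside, a node can report its own ID with the CLUSTER MYID command; for example (the output is the ID of node 6379 recorded above):

$ redis-cli -p 6379 cluster myid
"87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52"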

Slot assignment

A Redis cluster stores the database's key-value pairs in shards: the whole database is divided into 16384 slots, and every key in the database belongs to exactly one of them. Each node in the cluster can handle anywhere from 0 up to all 16384 slots.
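Concretely, the slot of a key is computed as CRC16(key) mod 16384 (when the key contains a {...} hash tag, only the content inside the braces is hashed). A key's slot can be queried without writing any data:

$ redis-cli -p 6379 cluster keyslot name
(integer) 5798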

Slots are the basic unit of data management and migration. When all 16384 slots have been assigned to nodes, the cluster is in the online state (ok); if even one slot is left unassigned, the cluster is in the offline state (fail).

Note that only master nodes can handle slots. If you performed the slot-assignment step after master/replica replication and assigned slots to replica nodes, the cluster would not work properly (it would stay offline).

Slots are assigned with the CLUSTER ADDSLOTS command:

redis-cli -p 6379 cluster addslots {0..5000}
redis-cli -p 6380 cluster addslots {5001..10000}
redis-cli -p 6381 cluster addslots {10001..16383}

Nodes in the cluster after slots are assigned are as follows:

127.0.0.1:6379> CLUSTER NODES
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603632880310 4 connected 5001-10000
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 myself,master - 0 1603632879000 1 connected 0-5000
51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603632879000 2 connected 10001-16383
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 master - 0 1603632878000 5 connected
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 master - 0 1603632880000 3 connected
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 master - 0 1603632881317 0 connected

127.0.0.1:6379> CLUSTER INFO
cluster_state:ok    # the cluster is online
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:5
cluster_my_epoch:1
cluster_stats_messages_ping_sent:4763
cluster_stats_messages_pong_sent:4939
cluster_stats_messages_meet_sent:5
cluster_stats_messages_sent:9707
cluster_stats_messages_ping_received:4939
cluster_stats_messages_pong_received:4768
cluster_stats_messages_received:9707

Primary/secondary replication

After the preceding steps, all nodes in the cluster exist as master nodes, so Redis high availability is still not achieved. Only once master/replica replication is configured is the cluster truly highly available.

Use the CLUSTER REPLICATE <node_id> command to make the node that receives the command become a replica of the node identified by node_id and start replicating from that master.

redis-cli -p 6479 cluster replicate 87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52
redis-cli -p 6480 cluster replicate c47598b25205cc88abe2e5094d5bfd9ea202335f
redis-cli -p 6481 cluster replicate 51081a64ddb3ccf5432c435a8cf20d45ab795dd8

127.0.0.1:6379> CLUSTER NODES
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603633105211 4 connected 5001-10000
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 myself,master - 0 1603633105000 1 connected 0-5000
51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603633105000 2 connected 10001-16383
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 slave 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 0 1603633107229 5 connected
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 slave 87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 0 1603633106221 3 connected
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 slave c47598b25205cc88abe2e5094d5bfd9ea202335f 0 1603633104000 4 connected

Incidentally, steps 2, 3 and 4 above can be carried out in one go with the redis-trib.rb tool; since Redis 5.0, redis-cli can do this directly:

redis-cli --cluster create 127.0.0.1:6379 127.0.0.1:6479 127.0.0.1:6380 127.0.0.1:6480 127.0.0.1:6381 127.0.0.1:6481 --cluster-replicas 1

--cluster-replicas 1 indicates that the given node list should be arranged as master + replica pairs, i.e. one replica per master.
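The resulting topology can be verified with the check subcommand against any reachable node; it prints each master's slot ranges and replica count:

$ redis-cli --cluster check 127.0.0.1:6379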

Running commands in the cluster

The cluster is now online, and clients can send commands to its nodes. The node that receives a command computes which slot the key belongs to and checks whether that slot is assigned to itself.

If the slot the key belongs to happens to be assigned to the current node, the command is executed; otherwise, the node returns a MOVED error to the client, directing it to redirect to the correct node and resend the command.
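For comparison, a client that does not handle redirection itself (redis-cli started without -c) simply sees the raw MOVED error; a sketch assuming the slot layout built above:

$ redis-cli -p 6379 set name huey
(error) MOVED 5798 127.0.0.1:6380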

Here, CLUSTER KEYSLOT shows that the key name lives in slot 5798 (assigned to node 6380); an operation on this key is therefore redirected to that node. The key fruits behaves the same way.

127.0.0.1:6379> CLUSTER KEYSLOT name
(integer) 5798
127.0.0.1:6379> set name huey
-> Redirected to slot [5798] located at 127.0.0.1:6380
OK
127.0.0.1:6380>

127.0.0.1:6379> get fruits
-> Redirected to slot [14943] located at 127.0.0.1:6381
"apple"
127.0.0.1:6381>

It is worth noting that when we send a command to a slave node through the client, the command is redirected to the corresponding master node.

127.0.0.1:6480> KEYS *
1) "name"
127.0.0.1:6480> get name -> Redirected to slot [5798] located at 127.0.0.1:6380
"huey"
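If you genuinely want to serve reads from a replica, the connection must first be switched into read-only mode with the READONLY command; a minimal sketch on replica 6480:

127.0.0.1:6480> READONLY
OK
127.0.0.1:6480> get name
"huey"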

Cluster failover

When a master node in the cluster goes offline, the replicas of that master elect one of themselves as the new master to complete failover. As with a master-slave replication setup, when the original master comes back online it rejoins the cluster as a replica of the new master.

Next, we simulate node 6379 going down by shutting it down; its replica 6479 can then be observed taking over as the new master.
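A simple way to trigger the outage (assuming the node runs as a daemon, as configured earlier) is:

$ redis-cli -p 6379 shutdown nosave

The log of replica 6479 then records the failover election: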

462:S 26 Oct 14:08:12.750 * FAIL message received from c47598b25205cc88abe2e5094d5bfd9ea202335f about 87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52
462:S 26 Oct 14:08:12.751 # Cluster state changed: fail
462:S 26 Oct 14:08:12.829 # Start of election delayed for 595 milliseconds (rank #0, ...)
462:S 26 Oct 14:08:13.434 # Starting a failover election for epoch 6.
462:S 26 Oct 14:08:13.446 # Failover election won: I'm the new master.
462:S 26 Oct 14:08:13.447 # configEpoch set to 6 after successful failover
462:M 26 Oct 14:08:13.447 # Setting secondary replication ID to d357886e00341b57bf17e46b6d9f8cf53b7fad21, valid up to offset: 9161. New replication ID is adbf41b16075ea22b17f145186c53c4499864d5b
462:M 26 Oct 14:08:13.447 * Discarding previously cached master state.
462:M 26 Oct 14:08:13.448 # Cluster state changed: ok

After node 6379 recovers from the outage, it serves as a replica of node 6479 (the new master):

127.0.0.1:6379> CLUSTER NODES
51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603692968000 2 connected 10001-16383
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603692968504 4 connected 5001-10000
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 master - 0 1603692967495 6 connected 0-5000
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 myself,slave 4c23b25bd4bcef7f4b77d8287e330ae72e738883 0 1603692964000 1 connected
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 slave 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 0 1603692967000 4 connected
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 slave c47598b25205cc88abe2e5094d5bfd9ea202335f 0 1603692967000 5 connected

The cluster-config-file records the state of the cluster nodes. Opening node 6379's cluster configuration file node-6379.conf, you can see the cluster node information saved there:

51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603694920206 2 connected 10001-16383
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603694916000 4 connected 5001-10000
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 master - 0 1603694920000 6 connected 0-5000
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 myself,slave 4c23b25bd4bcef7f4b77d8287e330ae72e738883 0 1603694918000 1 connected
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 slave 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 0 1603694919000 4 connected
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 slave c47598b25205cc88abe2e5094d5bfd9ea202335f 0 1603694919200 5 connected
vars currentEpoch 6 lastVoteEpoch 0

Cluster scaling

The key to scaling a cluster is re-sharding it, i.e. migrating slots between nodes. This section demonstrates slot migration by adding a node to and removing a node from the cluster.

Slots are managed with the redis-trib.rb functionality integrated into redis-cli. The tool's help menu is shown below:

$ redis-cli --cluster help
Cluster Manager Commands:
  create         host1:port1 ... hostN:portN
                 --cluster-replicas <arg>
  check          host:port
                 --cluster-search-multiple-owners
  info           host:port
  fix            host:port
                 --cluster-search-multiple-owners
                 --cluster-fix-with-unreachable-masters
  reshard        host:port
                 --cluster-from <arg>
                 --cluster-to <arg>
                 --cluster-slots <arg>
                 --cluster-yes
                 --cluster-timeout <arg>
                 --cluster-pipeline <arg>
                 --cluster-replace
  rebalance      host:port
                 --cluster-weight <node1=w1...nodeN=wN>
                 --cluster-use-empty-masters
                 --cluster-timeout <arg>
                 --cluster-simulate
                 --cluster-pipeline <arg>
                 --cluster-threshold <arg>
                 --cluster-replace
  add-node       new_host:new_port existing_host:existing_port
                 --cluster-slave
                 --cluster-master-id <arg>
  del-node       host:port node_id
  call           host:port command arg arg .. arg
  set-timeout    host:port milliseconds
  import         host:port
                 --cluster-from <arg>
                 --cluster-copy
                 --cluster-replace
  backup         host:port backup_directory
  help

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.

Cluster scaling - adding a node

Add two new nodes to the cluster (ports 6382 and 6482), with node 6482 replicating node 6382.

(1) Start nodes: repeat step 1 above to start nodes 6382 and 6482.

(2) Node handshake: run the redis-cli --cluster add-node command to add nodes 6382 and 6482 to the cluster.

redis-cli --cluster add-node 127.0.0.1:6382 127.0.0.1:6379
redis-cli --cluster add-node 127.0.0.1:6482 127.0.0.1:6379

$ redis-cli --cluster add-node 127.0.0.1:6382 127.0.0.1:6379
>>> Adding node 127.0.0.1:6382 to cluster 127.0.0.1:6379
>>> Performing Cluster Check (using node 127.0.0.1:6379)
S: 87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379
   slots: (0 slots) slave
   replicates 4c23b25bd4bcef7f4b77d8287e330ae72e738883
M: 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381
   slots:[10001-16383] (6383 slots) master
   1 additional replica(s)
M: c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380
   slots:[5001-10000] (5000 slots) master
   1 additional replica(s)
M: 4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479
   slots:[0-5000] (5001 slots) master
   1 additional replica(s)
S: 9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481
   slots: (0 slots) slave
   replicates 51081a64ddb3ccf5432c435a8cf20d45ab795dd8
S: 32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480
   slots: (0 slots) slave
   replicates c47598b25205cc88abe2e5094d5bfd9ea202335f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
>>> Send CLUSTER MEET to node 127.0.0.1:6382 to make it join the cluster.
[OK] New node added correctly.

(3) Re-shard: run the redis-cli --cluster reshard command to re-shard the cluster so that slots are balanced across the masters (migrating some slots from nodes 6479/6380/6381 to node 6382). You need to specify:

- the number of slots to move: each master should end up with an average of 4096 slots, so 4096 slots are moved in total;
- the target node ID that receives the slots: the ID of node 6382;
- the source node IDs the slots are taken from: the IDs of nodes 6479/6380/6381.

$ redis-cli --cluster reshard 127.0.0.1:6479
>>> Performing Cluster Check (using node 127.0.0.1:6479)
M: 4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479
   slots:[0-5000] (5001 slots) master
   1 additional replica(s)
S: 32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480
   slots: (0 slots) slave
   replicates c47598b25205cc88abe2e5094d5bfd9ea202335f
M: 706f399b248ed3a080cf1d4e43047a79331b714f 127.0.0.1:6482
   slots: (0 slots) master
M: af81109fc29f69f9184ce9512c46df476fe693a3 127.0.0.1:6382
   slots: (0 slots) master
M: 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381
   slots:[10001-16383] (6383 slots) master
   1 additional replica(s)
S: 9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481
   slots: (0 slots) slave
   replicates 51081a64ddb3ccf5432c435a8cf20d45ab795dd8
S: 87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379
   slots: (0 slots) slave
   replicates 4c23b25bd4bcef7f4b77d8287e330ae72e738883
M: c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380
   slots:[5001-10000] (5000 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID?
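The interactive questions can also be answered up front with the options listed in the help menu above; a sketch with placeholder source-node IDs:

$ redis-cli --cluster reshard 127.0.0.1:6479 \
    --cluster-from <source-id-1>,<source-id-2>,<source-id-3> \
    --cluster-to af81109fc29f69f9184ce9512c46df476fe693a3 \
    --cluster-slots 4096 \
    --cluster-yes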

(4) Set the master/slave relationship:

redis-cli -p 6482 cluster replicate af81109fc29f69f9184ce9512c46df476fe693a3

127.0.0.1:6482> CLUSTER NODES
32ed645a9c9d13ca68dba5a147937fb1d05922ee 127.0.0.1:6480@16480 slave c47598b25205cc88abe2e5094d5bfd9ea202335f 0 1603694930000 0 connected
51081a64ddb3ccf5432c435a8cf20d45ab795dd8 127.0.0.1:6381@16381 master - 0 1603694931000 2 connected 11597-16383
9d587b75bdaed26ca582036ed706df8b2282b0aa 127.0.0.1:6481@16481 slave 51081a64ddb3ccf5432c435a8cf20d45ab795dd8 0 1603694932000 2 connected
706f399b248ed3a080cf1d4e43047a79331b714f 127.0.0.1:6482@16482 myself,slave af81109fc29f69f9184ce9512c46df476fe693a3 0 1603694932000 8 connected
87b7dfacde34b3cf57d5f46ab44fd6fffb2e4f52 127.0.0.1:6379@16379 slave 4c23b25bd4bcef7f4b77d8287e330ae72e738883 0 1603694932000 6 connected
c47598b25205cc88abe2e5094d5bfd9ea202335f 127.0.0.1:6380@16380 master - 0 1603694933678 4 connected 6251-10000
4c23b25bd4bcef7f4b77d8287e330ae72e738883 127.0.0.1:6479@16479 master - 0 1603694932669 6 connected 1250-5000
af81109fc29f69f9184ce9512c46df476fe693a3 127.0.0.1:6382@16382 master - 0 1603694933000 9 connected 0-1249 5001-6250 10001-11596
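To confirm the rebalanced layout at a glance, the info subcommand from the help menu above can be pointed at any node:

$ redis-cli --cluster info 127.0.0.1:6382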