Preface

This post is a set of study notes on the Geek Time course "Redis Core Technology and Practice"

Redis is a high-performance key-value database that plays a critical role in production. RDB and AOF persistence keep data loss to a minimum, but they cannot remove the single point of failure of a standalone Redis instance, so other solutions are needed. The current options are master-slave replication and Redis Cluster

Master-slave replication

A single node is a single point of failure, so the most common approach is to deploy more nodes and keep more copies of the data. That inevitably raises a new problem: data synchronization

Redis provides a master-slave mode to keep the copies consistent, with read/write separation between the master and the slaves

Read operations: both the master and the slaves can serve them

Write operations: only the master accepts them, and it then synchronizes the writes to the slaves

How to synchronize

  1. The master and the slave establish a connection and negotiate the synchronization.
  2. The master runs bgsave to generate an RDB file and sends it to the slave. The slave clears its current database and loads the RDB file.
  3. The master then sends the slave the write commands it received during step 2 (these writes are recorded in the replication buffer while the full synchronization is in progress).
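
The three steps above can be sketched as a toy in-memory model. This is only an illustration, not how Redis is implemented: the `Master` and `Slave` classes and their method names are made up for this note, a plain dict copy stands in for the RDB file, and a list stands in for the replication buffer.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.replication_buffer = []  # writes received while a full sync is running
        self.syncing = False

    def write(self, key, value):
        self.data[key] = value
        if self.syncing:
            # Remember the write so the slave can replay it after loading the snapshot.
            self.replication_buffer.append((key, value))

    def begin_full_sync(self):
        """Step 2: take a point-in-time snapshot (stands in for the RDB file)."""
        self.syncing = True
        self.replication_buffer = []
        return dict(self.data)

    def end_full_sync(self):
        """Step 3: hand over the writes buffered while the snapshot was in flight."""
        self.syncing = False
        buffered, self.replication_buffer = self.replication_buffer, []
        return buffered


class Slave:
    def __init__(self):
        self.data = {"stale": "value"}  # old contents that must disappear

    def load_snapshot(self, snapshot):
        self.data.clear()               # the slave clears its current database first
        self.data.update(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


master, slave = Master(), Slave()
master.write("a", "1")
snapshot = master.begin_full_sync()
master.write("b", "2")                  # arrives while the snapshot is being transferred
slave.load_snapshot(snapshot)
slave.apply(master.end_full_sync())
print(slave.data == master.data)
```

Without the buffer, the write to "b" would be missing on the slave, because it happened after the snapshot was taken.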

Validation

Set up a local environment to verify master-slave replication, focusing on four questions

  1. Whether master-slave replication synchronizes the full data set
  2. Whether the nodes really are in a master-slave relationship
  3. Whether data can be written to a slave node
  4. Whether a slave node becomes the master when the master node goes down

As shown above

  1. To simulate master-slave replication with three nodes, copy the Redis configuration file twice
  2. In the two copies, change the port, the RDB file name, and the pidfile setting
  3. Start the three Redis services

With 6379 acting as the master node, view the data on the master node

  1. After replicaof is executed on node 6380, the master's data is synchronized to it
  2. After running info replication, you can see that 6380 is a slave of 6379

The same steps were repeated on 6381

Writing data to slave node 6380 failed

Writing data to master node 6379 succeeded

The master node was taken down for a short time. After it recovered, it was still the master node

Master-slave-slave

As the previous section shows, a full synchronization involves two time-consuming operations on the master: generating the RDB file and transferring it

With a large number of slaves, the master is kept busy forking child processes to generate RDB files. The fork operation blocks the main thread from handling normal requests, which slows the master's responses to the application. Transferring the RDB files also consumes the master's network bandwidth, which puts further pressure on its resources

For these reasons the master-slave-slave mode was introduced: a slave can synchronize from another slave instead of from the master, which reduces the pressure on the master

Sentinel mode

In master-slave mode, when the master node goes down, no slave is promoted to master, so data can no longer be written. That is unacceptable in production. We could run slaveof no one on a slave to promote it by hand, but manual switching is too slow in production, where every minute counts. Hence Sentinel mode

The basic flow

  1. Monitoring: While running, the sentinel process periodically sends PING commands to the master and all slaves to check whether they are still online. If a slave does not respond to the sentinel's PING within the specified time, the sentinel marks it as offline. If the master does not respond within the specified time, the sentinel judges the master to be offline
  2. Master selection: When the master goes down, the sentinel selects one of the slaves as the new master according to certain rules
  3. Notification: The sentinel sends the new master's connection information to the other slaves and has them run replicaof to connect to the new master and replicate its data. The sentinel also notifies clients of the new master's connection information so that they send their requests to the new master

How to determine that the master is offline

First, a term: subjective offline. The sentinel process uses the PING command to check its network connection to the master and the slaves and so determine the status of each instance. If an instance does not respond within the specified time, the sentinel first marks it as subjectively offline

When the cluster network is under heavy load or congested, or the master itself is under heavy load, a single sentinel may misjudge the master: it concludes the master is offline when the master is actually still running

To reduce such misjudgments, sentinels are usually deployed as a group of several instances (a sentinel cluster). Because several sentinels judge together, a single sentinel with a poor network connection of its own cannot wrongly declare the master offline
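
The joint decision can be sketched as a simple vote. This is a minimal illustration rather than Sentinel's actual code: `SentinelView`, `subjectively_down`, `master_is_down`, and the millisecond values are names and numbers made up for this note. The `quorum` parameter plays the role of the last argument of `sentinel monitor`.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    name: str
    ms_since_last_reply: int  # time since the master last answered this sentinel's PING


def subjectively_down(view, down_after_ms):
    """One sentinel's private opinion: no reply within the allowed time."""
    return view.ms_since_last_reply > down_after_ms


def master_is_down(views, down_after_ms, quorum):
    """The master counts as down only if at least `quorum` sentinels agree."""
    votes = sum(1 for v in views if subjectively_down(v, down_after_ms))
    return votes >= quorum


# Sentinel s1 has a bad network link of its own; s2 and s3 still see the master.
views = [
    SentinelView("s1", 60000),
    SentinelView("s2", 200),
    SentinelView("s3", 150),
]
print(master_is_down(views, down_after_ms=30000, quorum=2))
```

With a quorum of 2, the single bad vote from s1 is not enough to take the master offline. With a quorum of 1 the same input would trigger a failover, which is the misjudgment a sentinel cluster is meant to avoid.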

As shown in the figure below

How a new master is selected

The sentinel applies the following rules in order

  1. Filter out slaves with poor network connections
  2. Prefer the slave with the highest priority: the slave-priority configuration item sets a different priority for each slave
  3. Prefer the slave that is most closely synchronized with the old master
  4. Prefer the slave with the smallest ID
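
The four rules can be expressed as one filter followed by one sort key. This is a sketch under assumptions: the `Candidate` class and its field names are invented for this note, "most closely synchronized" is modeled as the largest replication offset, and, as in Redis, a lower slave-priority number means the slave is preferred while 0 means it is never promoted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    run_id: str        # the instance ID used as the final tie-breaker
    online: bool       # False if the slave was filtered out for a poor connection
    priority: int      # slave-priority: lower number wins, 0 means never promote
    repl_offset: int   # how much of the old master's data the slave has replicated


def pick_new_master(candidates):
    """Return the preferred slave, or None if no slave is eligible."""
    eligible = [c for c in candidates if c.online and c.priority != 0]
    if not eligible:
        return None
    # Rule 2: lowest priority number, rule 3: largest offset, rule 4: smallest ID.
    return min(eligible, key=lambda c: (c.priority, -c.repl_offset, c.run_id))


slaves = [
    Candidate("id-a", online=True, priority=100, repl_offset=900),
    Candidate("id-b", online=True, priority=100, repl_offset=1200),
    Candidate("id-c", online=False, priority=1, repl_offset=1500),
]
print(pick_new_master(slaves).run_id)
```

Here id-c is dropped by rule 1 even though it has the best priority and offset, and id-b beats id-a on rule 3 because both have the same priority.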

Validation

Set up a local environment, mainly to verify two questions

  1. When the master goes down, whether a new master is selected from the slaves
  2. After the new master is selected, what state the old master is in when it comes back

Configure the sentinel.conf configuration file

# zMaster is the name of the master to be monitored
# 1 is the number of sentinels that must agree before the master is failed over
sentinel monitor zMaster 127.0.0.1 6379 1

  1. Run redis-sentinel /usr/local/etc/sentinel.conf to start Sentinel mode
  2. The logs show that the monitored master is node 6379 and that its slaves are nodes 6380 and 6381

Manually shut down master node 6379

  1. The logs show that the master has switched to node 6381
  2. Running info replication on node 6381 confirms that it is now the master node

After the old master node 6379 recovered, it became a slave node

Questions

What is the split-brain problem?

When a network problem leaves the master unable to communicate with the other nodes, the sentinels wrongly judge the master to be offline and elect a new master. The result is two masters at the same time, and different clients may write data to different masters

Why does split brain cause data loss?

After the failover, one slave is promoted to the new master and the other slaves run replicaof to synchronize with it. When the old master's network recovers, the old master becomes a slave and has to perform a full synchronization from the new master, which clears its local data. Any data that clients wrote to the old master during the split brain is therefore wiped, which is how the data is lost

How can the split-brain problem be avoided?

We can use two Redis configuration items to avoid the split-brain problem

  • min-replicas-to-write: the minimum number of slaves with which the master must be able to synchronize data
  • min-replicas-max-lag: the maximum delay, in seconds, of the ACK messages that a slave sends to the master during replication

Suppose min-replicas-to-write is set to A and min-replicas-max-lag is set to B. Together these settings require that the master be connected to at least A slaves whose replication ACK delay does not exceed B seconds. Otherwise, the master stops accepting write requests from clients

With these two settings, when split brain occurs the old master no longer has enough slaves, or its slaves cannot acknowledge in time, so clients cannot write data to it
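
The check the master performs can be sketched in a few lines. This is an illustration only: the function name `master_accepts_writes` and the lag values are made up for this note, and each entry of `replica_lags` is assumed to be the number of seconds since that slave's last ACK.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """Accept writes only if enough slaves have acknowledged recently enough.

    replica_lags: seconds since the last ACK from each connected slave.
    """
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# Normal operation: two slaves, both acknowledging within a second or two.
print(master_accepts_writes([1, 2], min_replicas_to_write=1, min_replicas_max_lag=10))

# Split brain: the old master is cut off, so the last ACKs are long overdue.
print(master_accepts_writes([60, 75], min_replicas_to_write=1, min_replicas_max_lag=10))
```

In the second call no slave is within the allowed lag, so the isolated old master refuses writes and clients cannot add data that would later be thrown away.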

Cluster mode

The schemes above are sufficient for small data volumes, but two problems remain when the data volume is large

  1. Insufficient capacity on a single node: every node stores the full data set. For example, a node with only 10 GB of memory cannot hold 20 GB of data
  2. Concurrent writes: only the master node can handle writes, so under a write-heavy workload a single node easily becomes a bottleneck

The Redis sharded cluster (Redis Cluster) was introduced to solve these problems

How data is stored

Redis Cluster uses slots to map data to instances. First, a 16-bit value is computed from the key of each key-value pair with the CRC16 algorithm. That value is then taken modulo 16384, which gives a slot number in the range 0 to 16383. A sharded cluster has 16384 slots in total. They act like data partitions, and every key-value pair is mapped to one slot by its key. When we deploy Redis Cluster, the slots are automatically distributed evenly across the instances: with N instances in the cluster, each instance holds about 16384/N slots. Because memory sizes differ between instances, you can also assign the number of slots for each instance manually to suit the actual situation
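
The key-to-slot calculation described above can be sketched as follows. The function names `crc16_xmodem` and `key_slot` are chosen for this note. The CRC16 variant is assumed to be the XMODEM one (polynomial 0x1021, initial value 0), which is the variant Redis Cluster uses. Hash tags, which let several keys share a slot, are left out of this sketch.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 slots: CRC16(key) mod 16384."""
    return crc16_xmodem(key.encode("utf-8")) % 16384


print(key_slot("foo"))    # 12182
print(key_slot("hello"))  # 866
```

On a running cluster the result can be cross-checked with the CLUSTER KEYSLOT command.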

How the client locates data

  1. The slot that holds a key-value pair can be computed from the key, and the client can do this calculation itself when it sends a request
  2. After a client connects to a cluster instance, the instance sends the slot allocation information to the client
  3. The Redis instances share their slot mappings with each other
  4. When the instances change, the client may not notice the change right away, but it can still reach the data through the redirection mechanism. For example, when a client sends a read to an instance that does not hold the requested data, the instance replies with a response that contains the address of the instance that does, and the client then sends a new request to that instance
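
The client side of the redirection can be sketched like this. It is only an illustration: `parse_moved` and `ToyClient` are names invented for this note, the reply text is assumed to have the form `MOVED <slot> <host>:<port>` with the protocol's leading error marker already stripped, and the address in the example is arbitrary.

```python
def parse_moved(reply: str):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port), else None."""
    parts = reply.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


class ToyClient:
    """Keeps a local slot -> address cache and refreshes it on redirection."""

    def __init__(self):
        self.slot_owner = {}

    def handle_reply(self, reply: str):
        moved = parse_moved(reply)
        if moved is None:
            return None                          # ordinary reply, nothing to do
        slot, host, port = moved
        self.slot_owner[slot] = (host, port)     # remember the new owner
        return host, port                        # where to resend the request


client = ToyClient()
print(client.handle_reply("MOVED 12182 127.0.0.1:6382"))
```

Updating the cache means later requests for the same slot go straight to the right instance instead of being redirected again.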

The relationship between Redis Cluster nodes is shown below

Validation

Set up a local cluster environment, mainly to verify two questions

  1. How slots are allocated in cluster mode
  2. How the client stores data, and whether redirection occurs

First change the following three settings in the configuration file, then delete the leftover RDB and AOF files

# Enable cluster mode
cluster-enabled yes
# Name of this node's cluster configuration file
cluster-config-file nodes-6382.conf
# Node timeout in ms. If a node does not respond within this time, a master/slave failover is performed
cluster-node-timeout 15000

  1. After the Redis services are started, they do not yet form a cluster. You also need to run redis-cli --cluster create 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 to set up the cluster
  2. This experiment sets up only three master nodes. To give each of the three masters a slave, run the same command with --replicas 1 appended and with the addresses of all six instances listed, so six Redis instances are needed
  3. You can view the slot allocation in the log output

When a new key is written on node 6381, the instance replies that the key's slot does not belong to node 6381 and returns a redirection address

Writing the data at the redirection address succeeds

If the client is started with redis-cli -c -h 127.0.0.1 -p 6381, it follows redirections automatically

Questions

Why does Redis Cluster have 16384 slots?

Github.com/redis/redis… is the official answer from the Redis author

www.cnblogs.com/rjzheng/p/1… This article also answers the question very well

References

Geek Time, "Redis Core Technology and Practice"

zhuanlan.zhihu.com/p/308534431