Times move forward: stand-alone deployments are becoming less common in production, and Redis is no exception. Redis cluster technology keeps maturing, and clustered Redis deployments have become standard practice in production. This article is summarized from the "Redis in-depth adventure" posts on Redis master-slave replication and sentinel mode published on my CSDN blog, now brought over to Juejin.


Master-slave synchronization

All clustering patterns are based on one thing: synchronization. Data synchronization between nodes is the foundation of clustering, and master-slave synchronization is the foundation of all of this.

The significance of Redis master-slave synchronization: some companies do not run a full Redis cluster (usually a traffic-based decision), but they at least set up master-slave synchronization. When a Redis master node goes down, recovering its data and restarting the node takes considerable time. To keep online features unaffected, a slave node is promoted to replace the master, minimizing the impact.


Master-slave synchronization: Redis replicates data asynchronously, so Redis in distributed mode cannot satisfy strict consistency requirements. If something goes wrong between the master and slave nodes, such as a network interruption, the master can continue to serve external requests, but the slave can no longer synchronize with it. Data becomes inconsistent at that point, and when the network recovers the slave has to struggle to catch up with the master.

During data synchronization, both master-slave and slave-slave replication are supported: a slave can itself act as the replication source for another slave.
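As a concrete illustration (the hostnames and ports here are placeholders), wiring up such a chain takes one directive per node in redis.conf:

```
# redis.conf on replica B — replicate from the master
replicaof 10.0.0.1 6379    # on Redis versions before 5: slaveof 10.0.0.1 6379

# redis.conf on replica C — chain off replica B (slave-slave sync),
# which takes snapshot and backlog load off the master
replicaof 10.0.0.2 6379
```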

Synchronization mechanism

Incremental synchronization

Incremental synchronization is an instruction stream: the master node records every instruction that modifies its data state in a local buffer, then streams those instructions to the slave node. The slave applies them to reach the same state as the master, while reporting back how far it has synchronized (its offset). Because the instruction stream lives in memory, reads and writes are very fast. The buffer, however, is finite: it is a circular array that wraps around and overwrites the oldest content once full.

When instructions in the master node's buffer are overwritten by new instructions before they can be synchronized, the slave can no longer catch up through the instruction stream. At that point a different mechanism is needed: snapshot synchronization.
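The circular-buffer behaviour described above can be sketched in a few lines of Python. This is a toy model of the replication backlog, not Redis internals: the master appends commands at a monotonically growing offset, and a replica can catch up incrementally only while its offset is still inside the window the buffer retains.

```python
class ReplBacklog:
    """Toy model of the replication backlog: a fixed-size ring buffer
    indexed by a monotonically growing master offset."""

    def __init__(self, size: int):
        self.size = size            # buffer capacity in bytes
        self.buf = bytearray(size)
        self.master_offset = 0      # total bytes ever written

    def append(self, data: bytes) -> None:
        # New writes wrap around and overwrite the oldest bytes.
        for b in data:
            self.buf[self.master_offset % self.size] = b
            self.master_offset += 1

    def can_partial_sync(self, replica_offset: int) -> bool:
        # Incremental sync works only if the replica's offset still
        # falls inside the retained window; otherwise a full snapshot
        # sync is the only option.
        return 0 <= self.master_offset - replica_offset <= self.size

    def read_from(self, replica_offset: int) -> bytes:
        assert self.can_partial_sync(replica_offset)
        return bytes(self.buf[i % self.size]
                     for i in range(replica_offset, self.master_offset))
```

For example, with an 8-byte buffer, a replica at offset 0 can still do a partial sync after 7 bytes of writes, but not after 14, because its position has been overwritten.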

Snapshot synchronization

During snapshot synchronization, the master node flushes its in-memory data to a disk file and transfers that snapshot file to the slave node. After receiving the file, the slave empties its memory and performs a full load. Once the load completes, it notifies the master to resume incremental synchronization. If the buffer is overwritten again while the snapshot is being transferred and loaded, incremental synchronization cannot resume and another snapshot synchronization is required. This can degenerate into a loop: after every snapshot synchronization the buffer is already full of new instructions, so yet another snapshot synchronization is triggered.

  • When a new node joins the cluster, snapshot synchronization is performed first; once it completes, the node switches to incremental synchronization.
  • A reasonably sized buffer must be configured to avoid the snapshot-synchronization loop described above.
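The buffer in question is the replication backlog, and its size is a plain redis.conf setting on the master. The values below are illustrative, not recommendations; size it from your write rate and the longest outage you want replicas to ride out:

```
# redis.conf on the master
repl-backlog-size 64mb    # default is only 1mb
repl-backlog-ttl 3600     # keep the backlog for 1h after the last replica disconnects
```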

Hiccup: we hit a production issue where Redis became very slow, slower than querying the database directly. That is clearly abnormal, so something had to be wrong. Investigation showed the buffer was configured too small: master-slave synchronization triggered snapshot synchronization, and by the time it finished the buffer had already been filled with new instructions, triggering snapshot synchronization again, and so on, leaving Redis abnormally slow. Emergency fix: increase the buffer size. Lessons learned:

  • We never sized the buffer against the actual business write rate (honestly, I did not even know the setting existed before hitting this problem)
  • The cache data was configured carelessly: some entries had TTLs that were far too long, and some keys occupied too much memory
  • System design has to sweat every small detail; a tiny oversight can bring the whole system down

Diskless replication

Snapshot synchronization on the master performs heavy I/O, which significantly increases system load. In particular, if the master is also writing its AOF, a concurrent snapshot synchronization delays the AOF writes and seriously hurts the master's quality of service. Redis 2.8.18 therefore introduced diskless replication: the master sends the snapshot content to the slave directly through the socket. Snapshot generation becomes a traversal: as the master walks its memory, it serializes the content and sends it to the slave on the fly. The slave, as before, writes the content to disk and then loads it in one pass.
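Diskless replication is enabled with two master-side settings; the delay value below is the Redis default, shown for illustration:

```
# redis.conf on the master — stream the RDB snapshot over the socket
# instead of writing it to a disk file first (available since 2.8.18)
repl-diskless-sync yes
repl-diskless-sync-delay 5   # wait 5s so several replicas can share one transfer
```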

The WAIT command

The WAIT command turns asynchronous replication into effectively synchronous replication: it blocks the client until the specified number of replicas have acknowledged all previous writes, or until the timeout expires, and returns the number of replicas that acknowledged. The price of this stronger consistency is the master node's throughput.
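The semantics of `WAIT numreplicas timeout` can be sketched as a toy model in Python (this is an illustration of the blocking behaviour, not Redis source; `acked_offsets` is a hypothetical callable standing in for the replica acknowledgements the master tracks):

```python
import time

def wait(acked_offsets, master_offset, num_replicas, timeout_ms,
         now=time.monotonic):
    """Toy model of WAIT: block until at least `num_replicas` replicas
    have acknowledged `master_offset`, or until `timeout_ms` elapses.
    Either way, return the count of replicas that have acknowledged —
    the caller must check it, since WAIT does not guarantee success."""
    deadline = now() + timeout_ms / 1000.0
    while True:
        acked = sum(1 for off in acked_offsets() if off >= master_offset)
        if acked >= num_replicas or now() >= deadline:
            return acked
        time.sleep(0.01)   # real Redis parks the client instead of polling
```

Note the design point this exposes: even real WAIT only *reports* how many replicas acknowledged within the timeout; a return value below `num_replicas` means the write is still only weakly replicated.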


Sentinel mode

The master/slave mode ensures high data availability to a certain extent, but failover is manual: when the master node fails, someone has to promote a slave by hand. It's 2020, and that model is definitely not acceptable. We need the system to do this automatically: when the master fails, the slaves elect a new master and the system keeps running smoothly. Redis Sentinel mode meets this need.

Redis Sentinel is an independent cluster of its own that can monitor several Redis master-slave groups at the same time; it plays a role similar to a ZooKeeper ensemble. It is the core of a high-availability deployment, usually made up of 3 to 5 nodes, so that the system keeps working even if an individual sentinel goes down.

When a client connects to the cluster, it first connects to Sentinel, obtains the address of the current master node from it, and then interacts with the master directly. When the master fails or goes down, the client asks Sentinel again for the address of the newly elected master, and continues interacting with Redis through that new address.
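The discovery flow above can be sketched as a toy model (each sentinel is modelled as a plain callable that either returns a `(host, port)` pair or raises `ConnectionError`; a real client would instead issue `SENTINEL get-master-addr-by-name <service>` over the wire):

```python
def discover_master(sentinels, service_name):
    """Ask each sentinel in turn for the current master of
    `service_name`; skip sentinels that are unreachable. This is
    why 3-5 sentinels suffice: any one live sentinel can answer."""
    for ask in sentinels:
        try:
            return ask(service_name)
        except ConnectionError:
            continue  # that sentinel is down, try the next one
    raise RuntimeError("no sentinel could be reached")
```

With a real client library such as redis-py, the same flow is packaged as `Sentinel([...]).master_for('mymaster')`, which performs the address lookup and connection management for you.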

When the master node fails, the client's communication with it starts failing.

Meanwhile, the Sentinel cluster elects a new master node. When an interaction with the original master fails, the client sends a request to Sentinel again to obtain the new master address; once it has that address, it resumes exchanging data with the new master node.

Message loss

Redis uses asynchronous replication for data synchronization, which means some data loss is unavoidable in the worst case: the master may die before the slaves have received all of its writes, and the larger the replication lag between master and slaves, the more data is lost. Redis Sentinel cannot guarantee zero data loss, but it can be configured to bound the loss:

```
min-slaves-to-write 1    # require at least one slave replicating normally,
                         # otherwise stop accepting writes (trades away availability)
min-slaves-max-lag 10    # a slave counts as replicating normally only if the
                         # master heard from it within the last 10 seconds
```

(Since Redis 5 these options also exist under the names min-replicas-to-write and min-replicas-max-lag.)

When a sentinel master-slave switchover happens, the original master may not actually be down (for example, during a manually triggered switchover), or it may be down with a new master already elected and the original master demoted to a slave. In either case, how does the client find the correct master?

Looking at the source, the check happens when a connection is established: the connection pool queries the master address and compares it with the address cached in memory. If it has changed, all existing connections are closed and new ones are created against the new address.

If the original master is not down, however, connections to it may still be in active use, and the mechanism above is not enough. This is where ReadOnlyError comes in: once the old master is demoted to a slave, every modifying command sent to it raises ReadOnlyError. This prevents writes from landing on the wrong node and causing data conflicts between master and slave.
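The reconnect-on-demotion logic can be sketched as follows. This is a toy model, not any real client's source: `pool` and `discover_master` are hypothetical stand-ins for a connection pool and a sentinel address lookup, and `ReadOnlyError` mirrors the error a real client surfaces when a write hits a demoted master.

```python
class ReadOnlyError(Exception):
    """Raised when a write command hits a node demoted to slave."""

def execute_write(pool, discover_master, command, retries=1):
    """Run a write command; if the node answers ReadOnlyError, the old
    master was demoted, so drop the cached connections, re-ask sentinel
    for the new master address, and retry."""
    for attempt in range(retries + 1):
        conn = pool.get_connection()
        try:
            return conn.send(command)
        except ReadOnlyError:
            if attempt == retries:
                raise
            pool.reset(discover_master())  # reconnect to the new master
```

The key design point is that the error itself drives the address refresh: long-lived connections to a still-running old master get corrected the moment they attempt a write, rather than waiting for a new connection to be established.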