Series recap

If you have not read the earlier articles in this series, it is recommended to read them first:

“The Old Driver Will Teach You How to Interview (1): Redis Basics and Data Persistence”

“Redis Expiration Strategy and Cache Avalanche, Breakdown, and Penetration”

“Redis High Availability: Master-Slave Mode”

“Redis High Availability: Sentinel Mode”

Introduction

The high-availability solutions for Redis, master-slave mode and sentinel mode, only solve the availability problem: master-slave mode provides high availability for reads, and sentinel mode provides high availability for writes.

If the amount of data we need to cache is relatively small, say a few gigabytes, then these two high-availability schemes are fully sufficient: one master with multiple slaves, where the number of slaves scales with the read throughput required, plus a sentinel cluster to guarantee the high availability of the master-slave setup.

But what if the amount of data we need to cache is so large that a single machine's storage capacity cannot hold it?

This is where distributed caching comes in. Until a few years ago, distributed caching required middleware such as Codis or Twemproxy, which were all the rage at the time. With these, we read and write through the middleware, which is responsible for distributing our data across Redis instances on multiple machines.

Redis itself kept evolving, and in version 3.0 (by now a fairly old release; the current version is 6.0) it finally gained native support for cluster mode.

In cluster mode, you deploy multiple Redis instances across multiple machines, each instance storing part of the data. Each Redis master instance can have slave instances attached, and when a master goes down, the cluster automatically fails over to one of its slaves.

So in addition to distributed storage, it also provides high availability along the way.

Distributed addressing algorithm

Since Redis Cluster stores data in a distributed way, a problem inevitably follows: if I have three nodes, how do I know which of the three nodes a given key lives on, and how do I find the right node when reading the data back? This is where distributed addressing algorithms come in.

There are two common distributed addressing algorithms:

  • The hash algorithm
  • Consistent hash algorithm

The hash algorithm

The hash algorithm hashes the key and takes the result modulo the number of nodes.

The key is then stored on the corresponding node. But if one of the nodes goes down, all incoming requests are taken modulo the new number of surviving nodes, and most requests will no longer find their cached data.

For example, suppose I start with three nodes and, under normal operation, the data is spread across them by taking the hash modulo 3. If one node suddenly goes down, only two effective nodes remain, so every lookup is now computed modulo 2, and for most keys the result of modulo 2 is obviously not the same as the result of modulo 3.

As a result, a large number of data requests are made directly to the DB, which is unacceptable.
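To make the problem concrete, here is a toy Python sketch (purely illustrative; `node_for` is a made-up helper, not anything Redis provides) that counts how many keys land on a different node when a three-node setup drops to two:

```python
# Toy illustration of hash-mod-N addressing and why losing a node invalidates
# most of the cache. `node_for` is a hypothetical helper, not a Redis API.
import hashlib

def node_for(key: str, node_count: int) -> int:
    """Hash the key and take the result modulo the number of nodes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % node_count

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if node_for(k, 3) != node_for(k, 2))
print(f"{moved / len(keys):.0%} of keys now map to a different node")  # roughly two thirds
```

Roughly two thirds of the keys end up mapping to a different node, so nearly all of those lookups miss the cache and fall through to the database.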

Consistent hash algorithm

The consistent hash algorithm organizes the entire hash value space into a virtual ring, arranged clockwise. Each master node is then hashed (using the server's IP address or hostname, for example), which determines that node's position on the ring.

Take a look at the ring diagram of a classic consistent hash algorithm:

When a key comes in, its hash value is computed to determine its position on the ring; walking clockwise from there, the first master node encountered is the node where the key is stored.

In the consistent hash algorithm, if a node fails, the only data affected is the data between that node and the previous node on the ring (the first node found by walking counterclockwise); nothing else is affected. The same applies when adding a node.

When there are too few nodes, the consistent hash algorithm easily produces cache hot spots because the nodes are unevenly distributed on the ring. To solve this hot-spot problem, the algorithm introduces virtual nodes: multiple hashes are computed for each physical node, and a virtual node is placed at each resulting position. This spreads the data more evenly and balances the load.
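A minimal consistent-hash ring with virtual nodes could be sketched like this (illustrative only; the `HashRing` class, the `replicas` parameter, and the node addresses are assumptions, not any particular middleware's API):

```python
# Minimal consistent-hash ring with virtual nodes (illustrative sketch).
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=160):
        self.replicas = replicas   # virtual nodes per physical node
        self.ring = {}             # hash position on the ring -> physical node
        self.sorted_hashes = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Place `replicas` virtual nodes for this physical node on the ring.
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def remove_node(self, node: str):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.sorted_hashes.remove(h)

    def get_node(self, key: str) -> str:
        # Walk clockwise: the first virtual node at or after the key's hash wins.
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

ring = HashRing(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
print(ring.get_node("user:42"))
```

The `replicas` parameter controls how many virtual nodes each physical node gets; more virtual nodes gives a more even spread of keys, at the cost of a slightly larger ring.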

Hash Slot algorithm of Redis Cluster

The implementation of Redis Cluster is very clever: its partitioning scheme uses virtual slots.

Redis Cluster presets a fixed set of virtual slots, each identified by a number within a range, and each slot maps to a subset of the data.

The virtual slots in Redis Cluster are numbered from 0 to 16383 (16,384 slots in total).

  1. Redis Cluster divides the 16,384 slots roughly evenly among the nodes, and each node manages its own share.
  2. When a key comes in, a hash is computed over the key using the CRC16 algorithm.
  3. The hash result is taken modulo 16384 (equivalently, ANDed with 16383) to get the slot number (see the sketch at the end of this section).
  4. The request is then sent to a Redis node.
  5. After receiving the request, the node checks whether the slot number falls within the range of slots it manages:
    1. If it does, the data is saved to that slot and the result of the execution is returned.
    2. If it does not, the request is redirected to the correct node, and that node stores the data in the corresponding slot.

Note: nodes in a Redis Cluster share this information with each other, so every node knows which node is responsible for which slots.

In this virtual-slot scheme, each node manages a portion of the slots and stores the data that maps to them. When the cluster is scaled out or in, slots can be migrated between nodes so that no data is lost.
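The slot calculation itself is easy to reproduce. The sketch below assumes that Python's `binascii.crc_hqx` matches the CRC-16/XMODEM variant used by Redis Cluster (the two agree on the standard test vector), and it also applies the `{...}` hash-tag rule that lets related keys share a slot:

```python
# Sketch of Redis Cluster slot addressing: slot = CRC16(key) mod 16384.
import binascii

SLOT_COUNT = 16384

def key_slot(key: str) -> int:
    # If the key contains a non-empty {...} hash tag, only the tag is hashed,
    # so keys like {user:1000}.follows and {user:1000}.friends share a slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % SLOT_COUNT

print(key_slot("user:1000"))             # a slot number in 0..16383
print(key_slot("{user:1000}.follows"))   # same slot as the plain key above
```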

Internal communication mechanism of nodes

In distributed storage mode, there is another point we need to pay attention to: internal communication within the cluster. After all, the whole cluster needs to know how many valid nodes it currently has, the IP addresses assigned to those nodes, and other such data, which we can collectively call metadata.

Cluster metadata can be maintained in two modes: centralized mode and Gossip protocol. Redis Cluster nodes communicate with each other using the Gossip protocol.

The centralized mode stores all kinds of cluster metadata (node information, failures, and so on) on a single node. A good example of centralized metadata storage is Storm from the big-data world: a distributed real-time computing engine whose metadata is kept in a centralized structure, with the underlying storage and maintenance handled by ZooKeeper (a distributed coordination middleware).

Redis maintains cluster metadata the other way, with the Gossip protocol: every node holds a copy of the metadata, and when a node's metadata changes, it keeps sending the updated metadata to other nodes so that they update their copies as well.

The Gossip protocol

The Gossip protocol is a very interesting protocol. The process is initiated by a seed node, which randomly picks a few nearby nodes and spreads a message to them; the receiving nodes repeat the same process, until every node in the network has received the message.

This process may take some time, since there is no guarantee that every node has received the message at any given moment, but in theory all nodes eventually receive it, so Gossip is an eventually consistent protocol.
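A toy simulation gives a feel for how fast this spreads (illustrative only, not Redis's actual gossip implementation; the fanout and node count are arbitrary assumptions):

```python
# Toy gossip dissemination: each round, every node that already has the message
# pushes it to `fanout` randomly chosen peers, until the whole network has it.
import random

def gossip_rounds(node_count: int, fanout: int = 3, seed_node: int = 0) -> int:
    informed = {seed_node}
    rounds = 0
    while len(informed) < node_count:
        rounds += 1
        for node in list(informed):
            for peer in random.sample(range(node_count), fanout):
                informed.add(peer)
    return rounds

print(gossip_rounds(1000))  # typically only a handful of rounds for 1,000 nodes
```

This exponential spread is what the "consistency convergence" point below refers to.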

Some features of the Gossip protocol:

  • Extensibility: nodes can be added to or removed from the network arbitrarily, and the state of a newly added node eventually converges to that of the other nodes.
  • Fault tolerance: the Gossip protocol has the natural fault tolerance of a distributed system; gossip messages are not affected by the downtime or restart of any single node in the network.
  • Decentralization: the Gossip protocol requires no central node; all nodes are peers. A node does not need to know the entire network, and as long as the network is connected, any node can spread information to the whole network.
  • Consistency convergence: messages spread through the network exponentially fast, so inconsistent states converge to a consistent one in a very short time; propagation takes on the order of O(log N) rounds.
  • Message latency: because a node sends messages to only a few random peers, a message reaches the entire network only after multiple rounds of propagation, so the Gossip protocol inevitably introduces message latency. It is not suitable for scenarios with strict real-time requirements.
  • Message redundancy: nodes periodically pick neighboring nodes to send messages to, and the receiving nodes repeat this step, so the same message is inevitably delivered to the same node more than once. This redundancy increases the processing load on receiving nodes, and because messages are sent periodically, even nodes that already have a message keep receiving it again.

The Gossip protocol defines several message types, including ping, pong, meet, and fail.

  • Meet: an existing node sends a meet message to a new node to bring it into the cluster, after which the new node starts communicating with the other nodes.
  • Ping: each node frequently sends ping messages to other nodes, carrying its own state and cluster metadata, so that nodes exchange metadata with each other.
  • Pong: the reply to ping and meet messages, containing the node's own state and other information; it is also used for broadcasting and updating information.
  • Fail: when a node judges that another node has failed, it sends a fail message to the other nodes to inform them that the node is down.

Because of this, the number of nodes in a Redis Cluster should not exceed 1,000: when there are too many nodes, the gossip traffic exchanged between them consumes significant bandwidth.

High availability and master/slave switchover

When it comes to clustering, high availability and master/slave failover are unavoidable topics, and the high-availability principle of Redis Cluster is very similar to that of sentinel mode.

Judging downtime

The first step, of course, is to detect that a node is down, and this is very similar to sentinel mode. It is also divided into two kinds: subjective downtime (PFAIL) and objective downtime (FAIL).

  • Subjective outage: when one node thinks another node is down. This is a single node's "opinion" and does not represent the view of the whole cluster.
  1. Node 1 periodically sends ping messages to node 2.
  2. If the message is delivered successfully, node 2 is running normally; node 2 replies with a pong message, and node 1 updates the time of its last communication with node 2.
  3. If the ping message fails to be delivered, communication between node 1 and node 2 is abnormal; in the next scheduled task cycle, node 1 still sends a ping to node 2.
  4. If node 1 finds that its last successful communication with node 2 is older than cluster-node-timeout, it marks node 2 as PFAIL.
  • Objective outage: when more than half of the master nodes holding slots have marked a node as subjectively down.
  1. If node 1 thinks node 2 is down, it spreads node 2's state to other nodes in the Gossip ping messages it sends.
  2. The other nodes then repeat the subjective-downtime check that node 1 performed on node 2.
  3. If more than half of them agree that node 2 is PFAIL, node 2 is marked as FAIL (a sketch of this two-stage check follows the note below).

Note: in cluster mode, only master nodes have read/write permission and slot-maintenance permission; slave nodes only have replication permission.
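A minimal sketch of the two-stage check might look like the following (hypothetical helper functions; the timeout value is an assumption for illustration, not a statement of Redis defaults):

```python
# Sketch of the PFAIL -> FAIL promotion logic (illustrative, not Redis source).
import time

CLUSTER_NODE_TIMEOUT = 15.0  # seconds; assumed value for illustration

def is_pfail(last_pong: float, now: float) -> bool:
    """Subjective downtime: no successful reply within cluster-node-timeout."""
    return now - last_pong > CLUSTER_NODE_TIMEOUT

def is_fail(pfail_reports: set, masters_with_slots: set) -> bool:
    """Objective downtime: more than half of the slot-holding masters report PFAIL."""
    voters = pfail_reports & masters_with_slots
    return len(voters) > len(masters_with_slots) // 2

masters = {"m1", "m2", "m3", "m4", "m5"}
print(is_pfail(last_pong=time.time() - 30, now=time.time()))  # True: reply overdue
print(is_fail({"m1", "m3", "m5"}, masters))                   # True: 3 of 5 agree
```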

Slave node election

  1. Eligibility is checked first: each slave's disconnection time from the master is examined, and if it exceeds cluster-node-timeout * cluster-slave-validity-factor, the slave is not qualified to become the new master.
  2. Each eligible slave sets an election delay based on the offset of the data it has replicated from the master. A slave with a larger offset (more replicated data) gets a shorter delay, so it starts its election earlier and is preferred (see the sketch after this list).
  3. If a majority of master nodes (N/2 + 1) vote for a slave, the vote passes and that slave becomes the new master.
  4. The elected slave then performs the master/slave switchover and takes over from the failed master.
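As referenced above, here is a minimal sketch of the eligibility check and candidate ordering (hypothetical helpers and parameter values, not Redis source code):

```python
# Sketch of failover candidate selection (illustrative only).
CLUSTER_NODE_TIMEOUT = 15.0           # seconds; assumed values for illustration
CLUSTER_SLAVE_VALIDITY_FACTOR = 10

def is_eligible(disconnect_seconds: float) -> bool:
    """A slave disconnected from its master for too long may hold stale data."""
    return disconnect_seconds <= CLUSTER_NODE_TIMEOUT * CLUSTER_SLAVE_VALIDITY_FACTOR

def pick_candidate(slaves):
    """slaves: list of (name, disconnect_seconds, replication_offset) tuples.
    Among eligible slaves, the one with the largest offset (freshest data)
    starts its election first and is therefore preferred."""
    eligible = [s for s in slaves if is_eligible(s[1])]
    return max(eligible, key=lambda s: s[2]) if eligible else None

slaves = [
    ("slave-a", 5.0, 120_000),
    ("slave-b", 3.0, 150_000),
    ("slave-c", 400.0, 200_000),   # disconnected too long, disqualified
]
print(pick_candidate(slaves))      # ('slave-b', 3.0, 150000)
```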
