Continuing today from yesterday's review of the Redis expiration policy.

1. How does Redis handle 100,000+ QPS of read requests through read/write separation

There are a few angles to think about it:

1. First, understand the relationship between Redis's high concurrency and the high concurrency of the whole system

To support high concurrency, the underlying cache layer has to hold up. In fact, MySQL can also be made highly concurrent through a complex series of database and table sharding schemes; order systems with transactional requirements can reach tens of thousands of QPS, which is quite high. But for something like an e-commerce product-detail page with truly massive concurrency (hundreds of thousands of requests per second), a single Redis instance is not enough. Still, Redis is a key link in the larger cache architecture that supports high concurrency.

2. Why can't a single Redis instance support very high concurrency?

Because it is a single instance. As the saying goes, two fists are no match for four hands.

3. If Redis needs to support 100,000+ concurrent requests, what should be done?

Except for special cases, the usual approach is a master/slave architecture with read/write separation, on the assumption that writes are few and most operations are reads.


2. Redis replication (master/slave replication)

This section covers Redis replication.

1. What is Redis replication

The commonly borrowed diagram makes it quite clear: at its most basic, replication means the master copies its data to the slaves.

2. The core mechanisms of Redis replication
(1) Redis replicates data to slave nodes asynchronously; starting from Redis 2.8, the slave node periodically acknowledges the amount of data it has replicated.
(2) A master node can be configured with multiple slave nodes.
(3) A slave node can also connect to other slave nodes.
(4) Replication to a slave node does not block the master node's normal work.
(5) A slave node does not block its own queries while replicating; it serves requests with the old data set. However, once replication completes, it must delete the old data set and load the new one, and during that load it briefly stops serving external requests.
(6) Slave nodes are mainly used for horizontal capacity expansion, i.e. read/write separation: the added slaves carry the read throughput.
3. The significance of master persistence for the safety of the master/slave architecture
If you use a master/slave architecture, it is recommended that persistence be enabled on the master node! It is not safe to rely on slave nodes as hot backups of the master's data. Consider what happens if master persistence is turned off: RDB and AOF are both off, all data lives only in memory, so when the master goes down and restarts there is no local file to recover from and its data set is empty. Replication then kicks in, the master synchronizes its empty data set to the slaves, and the slaves' data is wiped out too: 100% data loss. So first, the master node must use persistence. Second, the master also needs a backup scheme, in case all local files are lost: select an RDB from backup to restore the master, so that you can be sure the master has data when it starts up.

But there are still problems, such as:

Even with the high-availability mechanism described below, where a slave node can automatically take over from the master, the master may restart automatically (with an empty data set) before Sentinel detects the failure, which can still cause the data on all slave nodes to be cleared.

3. Redis master/slave replication: principles, resumable transmission, diskless replication, and expired key handling

1. The core principles of master-slave architecture
When a slave node starts, it sends a PSYNC command to the master node. If the slave node is reconnecting to a master it has replicated from before, the master only copies the missing data to the slave. If it is the first connection, a full resynchronization is triggered: the master starts a background process to generate an RDB snapshot file, while also buffering in memory all write commands newly received from clients. When the RDB file is ready, the master sends it to the slave, which saves it to local disk and then loads it into memory; the master then sends the buffered write commands. If a slave node disconnects from the master due to a network failure, it reconnects automatically. If the master finds multiple slaves reconnecting at once, it generates a single RDB and serves that one copy to all of them.
2. Resumable transmission (breakpoint continuation) in master/slave replication
Since Redis 2.8, master/slave replication supports resuming from a breakpoint. The master node keeps a backlog in memory; both the master and the slave maintain a replica offset, and the master has a run id. The offset is tracked against the backlog. If the connection between master and slave breaks, the slave asks the master to continue replicating from the last replica offset it received. If that offset is no longer in the backlog, it falls back to a full resynchronization from scratch.
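The backlog-and-offset logic above can be sketched as a small model. This is a simplified illustration, not the real Redis internals: the backlog is modeled as a trailing byte buffer, and `psync` decides between CONTINUE and FULLRESYNC exactly as described.

```python
# Sketch (not real Redis internals): a replication backlog as a trailing
# byte buffer, plus the psync decision the master makes on reconnect.
class Master:
    def __init__(self, backlog_size=1 * 1024 * 1024, run_id="abc123"):
        self.run_id = run_id
        self.backlog_size = backlog_size
        self.backlog = b""
        self.master_offset = 0  # total bytes ever put on the replication stream

    def feed(self, data: bytes):
        """Append a write to the replication stream; keep only the tail."""
        self.master_offset += len(data)
        self.backlog = (self.backlog + data)[-self.backlog_size:]

    def psync(self, run_id: str, offset: int):
        """Return ("CONTINUE", missing_bytes) or ("FULLRESYNC", None)."""
        backlog_start = self.master_offset - len(self.backlog)
        if run_id == self.run_id and backlog_start <= offset <= self.master_offset:
            return ("CONTINUE", self.backlog[offset - backlog_start:])
        return ("FULLRESYNC", None)

m = Master(backlog_size=8)
m.feed(b"abcd")                                  # stream offset now 4
assert m.psync(m.run_id, 0) == ("CONTINUE", b"abcd")   # gap fits in backlog
m.feed(b"efghij")                                # offset 10, backlog = last 8 bytes
assert m.psync(m.run_id, 0)[0] == "FULLRESYNC"   # offset 0 fell out of the backlog
assert m.psync("other", 4)[0] == "FULLRESYNC"    # unknown run id: start over
```

The key point the sketch shows: resumption is only possible while the slave's last offset still falls inside the window the backlog covers.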
3. Diskless replication
With repl-diskless-sync enabled, the master generates the RDB in memory and sends it to the slaves directly, without saving it to its own local disk first. repl-diskless-sync-delay controls how long the master waits before starting, so that multiple reconnecting slaves can share the same transfer.
4. Handle expired keys
A slave does not expire keys on its own; it waits for the master. If the master expires a key, or evicts one via LRU, it simulates a DEL command and sends it to the slave.
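A minimal model of that rule, under the assumption of a toy in-memory store (the names `Node`, `master_read`, `slave_apply` are illustrative, not Redis APIs): the slave never checks TTLs itself; it only applies the DEL the master replicates.

```python
# Sketch: a slave never expires keys on its own; it waits for the master
# to notice the expiry and replicate an explicit DEL. Toy model only.
class Node:
    def __init__(self):
        self.data = {}
        self.expires = {}  # key -> absolute expiry time (seconds)

master, slave = Node(), Node()
replication_stream = []  # commands the master sends downstream

def master_set(key, value, ttl, now):
    master.data[key] = value
    master.expires[key] = now + ttl
    replication_stream.append(("SET", key, value))

def master_read(key, now):
    """A read of an expired key on the master triggers a real DEL,
    which is then replicated to the slave."""
    if key in master.expires and now >= master.expires[key]:
        del master.data[key], master.expires[key]
        replication_stream.append(("DEL", key))
        return None
    return master.data.get(key)

def slave_apply(cmd):
    op, key, *rest = cmd
    if op == "SET":
        slave.data[key] = rest[0]
    elif op == "DEL":
        slave.data.pop(key, None)

master_set("session", "tok", ttl=10, now=0)
slave_apply(replication_stream[-1])
assert slave.data["session"] == "tok"
assert master_read("session", now=11) is None   # expired on the master
slave_apply(replication_stream[-1])             # replicated DEL reaches the slave
assert "session" not in slave.data
```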

4. Complete process operation and principle of Redis Replication

Let's dig a little deeper in this section: how does the replication process actually work?

1. Complete process of replication
1. The slave node starts and saves only the master node's information, including the master's host and IP; the replication process has not started yet.
2. A scheduled task inside the slave node checks every second whether there is a new master node to connect to and replicate from; if there is, it establishes a connection to the master.
3. The slave node sends the ping command to the master node.
4. Password authentication: if the master requires a password, the slave node must send the masterauth password for authentication.
5. On the first connection, the master node performs full replication and sends all of its data to the slave node.
6. After that, the master node asynchronously replicates each subsequent write command to the slave node.

2. Core mechanisms related to data synchronization

This refers to the full replication performed the first time a slave connects to the master, and some details of the mechanism in that process.

(1) offset: both the master and the slave maintain an offset. The master continuously accumulates its offset as it writes, and the slave continuously accumulates its offset as it replicates. The slave reports its own offset to the master every second, and the master also records each slave's offset. This is not strictly needed for full replication, but it is how both sides know whether their data is consistent.
(2) backlog: the master node has a backlog, 1MB by default. When the master node replicates data to the slave nodes, the data is also written into the backlog. The backlog is mainly used for incremental replication after a full replication is interrupted.
(3) master run id: visible via info server, it uniquely identifies a master instance. Locating a master node by host+ip alone is not reliable, because if the master restarts or its data set changes, the slave should tell it apart by the changed run id and resynchronize fully.

If you need to restart Redis without changing the run id, you can use the redis-cli debug reload command.
(4) psync: a slave node replicates from the master using psync runid offset. The master node returns a response based on its own state: either FULLRESYNC runid offset, triggering full replication, or CONTINUE, triggering incremental replication.
3. Full replication
1. The master executes bgsave to generate an RDB snapshot file.
2. The master node sends the RDB snapshot file to the slave node. If the RDB transfer takes longer than 60 seconds (repl-timeout), the slave node considers the replication failed, so tune this parameter for large data sets: even on a gigabit NIC (roughly 100MB/s), a 6GB file already pushes past 60 seconds.
3. While generating the RDB, the master buffers all new write commands in memory, and sends them to the slave after the slave finishes loading the RDB.
4. client-output-buffer-limit slave 256mb 64mb 60: if the replication buffer continuously consumes more than 64MB for 60 seconds, or exceeds 256MB in one shot, the replication stops and is considered failed.
5. After receiving the RDB, the slave clears its old data and loads the RDB into memory (serving requests from the old data until then).
6. If AOF is enabled on the slave node, it immediately executes BGREWRITEAOF to rewrite the AOF.
RDB generation, RDB transfer over the network, clearing of the slave's old data, and the slave's AOF rewrite are all time-consuming; for a 4GB-6GB data set, full replication usually takes about 1.5 to 2 minutes.
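The repl-timeout warning in step 2 is easy to verify with back-of-the-envelope arithmetic, using the figures from the text (default timeout 60s, ~100MB/s effective throughput on a gigabit NIC):

```python
# Back-of-the-envelope check: how long does an RDB transfer take on a
# gigabit NIC (~100 MB/s in practice), versus the default repl-timeout?
def transfer_seconds(rdb_bytes, bandwidth_bytes_per_sec=100 * 1024**2):
    return rdb_bytes / bandwidth_bytes_per_sec

REPL_TIMEOUT = 60  # default, seconds

assert transfer_seconds(6 * 1024**3) > REPL_TIMEOUT   # 6 GB: ~61s, trips the timeout
assert transfer_seconds(1 * 1024**3) < REPL_TIMEOUT   # 1 GB: ~10s, fine
```

This is why repl-timeout must be raised for large data sets: with the defaults, an otherwise healthy 6GB full sync is declared failed.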
4. Incremental replication
1. If the master/slave network connection breaks during full replication, incremental replication is triggered when the slave reconnects.
2. The master retrieves the missing portion of data from its own backlog and sends it to the slave node; the default backlog is 1MB.
3. The master locates the data in the backlog based on the offset in the psync sent by the slave.
5. Heartbeat
Master and slave nodes both send heartbeat information to each other. By default, the master sends a heartbeat every 10 seconds, and the slave node sends one every second.
6. Asynchronous replication
Each time the master receives a write command, it first applies the write internally, and then asynchronously sends the data to the slave nodes.

5. How does the Redis master/slave architecture achieve 99.99% high availability?

Well... high availability means your service is available essentially all year round.

How does Redis achieve it? Let's start with Sentinel.

1. Basic theory of Sentinel architecture
1. Introduction to Sentinel
Sentinel is a very important component of the Redis cluster architecture. Its main functions are:
1. Cluster monitoring: monitoring whether the master and slaves are working properly.
2. Message notification: if a Redis instance fails, Sentinel sends an alert notification to the administrator.
3. Failover: if the master node hangs, a slave node is automatically promoted to replace it.
4. Configuration center: if a failover occurs, notify each client of the new master address.

Of course, the sentinel is the supervisor, but what if the supervisor itself dies? That is why Sentinel also needs to run as a cluster.

During failover, deciding that a master node is down requires the consent of a majority of the sentinels, hence a distributed election. Redis currently uses Sentinel 2, an upgrade of Sentinel 1; the purpose of the rewrite was to make failover more robust and simpler.
2. Core knowledge of sentinels
1. A sentinel cluster needs at least 3 instances to be robust.
2. The sentinel + master/slave architecture cannot guarantee zero data loss; it only guarantees the high availability of the Redis cluster.
3. For this fairly complex deployment, do thorough testing and drills in both test and production environments. It would be embarrassing if it crashed untested.
3. Why do sentinels need three or more?
Look at this scenario: suppose you deploy a sentinel cluster with only two sentinel instances.

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

Configuration: quorum = 1. If the master fails, the switchover can proceed as long as either S1 or S2 thinks the master is down, and one of S1 and S2 will be elected to perform the failover. But performing a failover also requires a majority of sentinels to be running. The majority of 2 sentinels is 2 (2 sentinels: majority = 2; 3 sentinels: majority = 2; 4 sentinels: majority = 3; 5 sentinels: majority = 3). If both sentinels are running, failover is allowed. But if the machine running M1 and S1 goes down, only one sentinel (S2) is left: there is no majority to authorize the failover. The other machine still has R1, but the failover will not take place.
4. Classic 3-node sentinel cluster
       +----+
       | M1 |
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+

Configuration: quorum = 2, majority = 2. If the machine running M1 fails, sentinels S2 and S3 remain. They can agree that the master is down (satisfying quorum = 2), and since 2 of the 3 sentinels are still running, the majority condition also holds, so one of them is elected to perform the failover. Failover is allowed.
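The majority rule behind the two diagrams can be condensed into a simple check. This is a simplified model for intuition; real Sentinel leader election also involves epochs and explicit votes.

```python
# Sketch of the quorum/majority feasibility rule from the diagrams above.
def majority(num_sentinels):
    """More than half of all sentinels."""
    return num_sentinels // 2 + 1

assert [majority(n) for n in (2, 3, 4, 5)] == [2, 2, 3, 3]

def failover_possible(total_sentinels, alive_sentinels, quorum):
    """Failover needs quorum agreement *and* a live majority to authorize it."""
    return alive_sentinels >= quorum and alive_sentinels >= majority(total_sentinels)

# Two sentinels, the master's machine (taking S1 with it) dies: no failover.
assert failover_possible(total_sentinels=2, alive_sentinels=1, quorum=1) is False
# Classic 3-node cluster, M1's machine dies, S2 + S3 survive: failover runs.
assert failover_possible(total_sentinels=3, alive_sentinels=2, quorum=2) is True
```

This is exactly why 2-sentinel deployments are pointless: losing the master's machine usually loses a sentinel with it, and the surviving one can never form a majority.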
2. Data loss during a sentinel active/standby switchover
1. Two cases of data loss (asynchronous replication and split brain)
Data may be lost during a master/slave switchover, in two ways:
1. Loss from asynchronous replication: the master sends data to the slave asynchronously, so some data may not yet have been replicated when the master goes down; that unreplicated data is lost.
2. Loss from split-brain: if the network is bad, the master's machine may get disconnected from the slaves while the master process itself is still running. A sentinel notices this and elects a new master, but some client has not yet been told about the new master; not knowing, it keeps writing data to the old master. When the old master recovers, it is mounted as a slave of the new master, its old data is wiped, and it copies data from the new master, so the writes it accepted in the meantime are lost.
2. Now that the problem is identified, how to limit the loss
Both cases are addressed by two configuration options:
min-slaves-to-write 1
min-slaves-max-lag 10
These require that there be at least 1 slave whose data replication and synchronization lag does not exceed 10 seconds; if every slave's replication lag exceeds 10 seconds, the master stops accepting write requests.
1. Reducing asynchronous-replication loss: with min-slaves-max-lag, if the slaves' replicated data and ACKs lag too far behind, the master assumes that too much data would be lost if it went down, and rejects write requests. This keeps the data lost because some writes had not yet reached a slave within a controllable range.
2. Reducing split-brain loss: if an old master is partitioned away from all its slaves, the same check means it cannot keep accepting client writes, so the data wiped when it is later demoted is bounded to at most about 10 seconds' worth.
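A minimal model of how these two settings gate writes on the master. This is a simplification: the real Redis check counts replicas whose last ACK is recent enough, which is modeled here as a list of per-slave lags.

```python
# Sketch: the master counts slaves whose replication lag is acceptable and
# refuses writes when too few qualify (min-slaves-to-write / min-slaves-max-lag).
def master_accepts_writes(slave_lags_sec, min_slaves_to_write=1, min_slaves_max_lag=10):
    good_slaves = sum(1 for lag in slave_lags_sec if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write

assert master_accepts_writes([2])        # one healthy slave: writes allowed
assert not master_accepts_writes([15])   # the only slave lags 15s: writes rejected
assert not master_accepts_writes([])     # split-brain, no slaves reachable: rejected
```

Note the trade-off: this converts potential silent data loss into explicit write errors that the client must handle, e.g. by retrying or spilling to a queue.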

Now that we have covered how to use it, let's learn the principles along the way.

1. Sdown and ODOWN transition mechanisms (two failed states)
sdown is a subjective down state: a single sentinel believes the master is down. odown is an objective down state: a quorum of sentinels believe the master is down. sdown is triggered simply: if a sentinel pings the master and gets no valid reply within the number of milliseconds specified by is-master-down-after-milliseconds, that sentinel subjectively considers the master down. The transition from sdown to odown happens when, within the specified time, a sentinel receives reports from a quorum of other sentinels that they also consider the master down; then the state is odown.
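The two states boil down to two thresholds, which can be sketched directly (a toy model; the timeout default of 30000ms is the usual down-after-milliseconds value, used here as an illustrative assumption):

```python
# Sketch of the sdown -> odown transition: one sentinel's ping timeout
# yields sdown; agreement from a quorum of sentinels upgrades it to odown.
def is_sdown(ms_since_last_valid_ping_reply, down_after_milliseconds=30000):
    return ms_since_last_valid_ping_reply > down_after_milliseconds

def is_odown(sentinels_reporting_sdown, quorum):
    return sentinels_reporting_sdown >= quorum

assert not is_sdown(10000)                                   # replied recently
assert is_sdown(31000)                                       # timed out: subjective down
assert is_odown(sentinels_reporting_sdown=2, quorum=2)       # quorum met: objective down
assert not is_odown(sentinels_reporting_sdown=1, quorum=2)   # one opinion isn't enough
```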
2. Automatic discovery mechanism of sentinel cluster
Sentinels discover each other through Redis's pub/sub system. Every two seconds, each sentinel publishes a message to the __sentinel__:hello channel of every master+slaves set it monitors, containing its own host, IP and run id as well as its monitoring configuration for that master. Each sentinel also listens to the __sentinel__:hello channel of every master+slaves set it monitors, and thereby senses the presence of the other sentinels monitoring the same master+slaves. The sentinels also exchange master monitoring configuration with each other this way, keeping their monitoring configurations in sync.
3. Automatic correction of slave configuration
A sentinel is responsible for automatically correcting some of the slaves' configuration. If a slave is to become a candidate master, the sentinel makes sure it replicates the data of the current master; if slaves are connected to the wrong master, for example after a failover, the sentinel makes sure they connect to the correct master.
4. The slave-to-master election algorithm
If a master is considered odown and the majority of sentinels allow the active/standby switchover, one sentinel performs it, and it must first elect a slave to promote. The election takes into account some information about each slave: 1. the duration of its disconnection from the master; 2. the slave priority; 3. the replica offset; 4. the run id. If a slave's disconnection from the master has exceeded 10 times down-after-milliseconds, plus however long the master has been down, it is considered unfit to be elected:
(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state
The remaining slaves are then ranked:
(1) by slave priority: the lower the priority value, the higher the precedence;
(2) if the slave priority is the same, by replica offset: the more data replicated (the larger the offset), the higher the precedence;
(3) if both of the above are the same, select the slave with the smaller run id.
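The ranking above maps cleanly onto a filter plus a sort key. This is an illustrative sketch of the described algorithm, with made-up slave records, not Sentinel source code:

```python
# Sketch of the slave ranking: drop slaves disconnected too long, then sort
# by priority (ascending), replica offset (descending), run id (ascending).
def pick_new_master(slaves, down_after_ms, master_sdown_ms):
    limit = 10 * down_after_ms + master_sdown_ms
    eligible = [s for s in slaves if s["disconnect_ms"] <= limit]
    eligible.sort(key=lambda s: (s["priority"], -s["offset"], s["run_id"]))
    return eligible[0]["run_id"] if eligible else None

slaves = [
    {"run_id": "a", "priority": 100, "offset": 500, "disconnect_ms": 1000},
    {"run_id": "b", "priority": 100, "offset": 900, "disconnect_ms": 1000},
    {"run_id": "c", "priority": 1,   "offset": 100, "disconnect_ms": 999999},
]
# "c" has the best priority but was disconnected too long to be trusted;
# "a" and "b" tie on priority, so "b" wins on the larger replica offset.
assert pick_new_master(slaves, down_after_ms=30000, master_sdown_ms=5000) == "b"
```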
5. Quorum and majority
Each time the sentinels perform a master/standby switchover, first quorum sentinels must consider the master odown, and then one sentinel is elected to perform the switchover; that sentinel must also be authorized by a majority of sentinels before it can proceed. If quorum < majority, then a majority of sentinels must authorize: for example, with 5 sentinels the majority is 3, so even with quorum = 2 it takes 3 sentinels' authorization to perform the switchover. But if quorum >= majority, then at least quorum sentinels must authorize: for example, with 5 sentinels and quorum = 5, all 5 sentinels must agree before the switchover can be performed.
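The authorization rule as just described collapses to one expression (a sketch of the rule, not Sentinel source):

```python
# Sketch: the sentinel elected to execute the switchover needs votes
# from at least max(majority, quorum) sentinels.
def votes_needed(num_sentinels, quorum):
    majority = num_sentinels // 2 + 1
    return max(majority, quorum)

assert votes_needed(num_sentinels=5, quorum=2) == 3  # quorum < majority: majority rules
assert votes_needed(num_sentinels=5, quorum=5) == 5  # quorum >= majority: quorum rules
```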
6. Configuration epoch
A group of sentinels monitors one set of Redis master+slaves and keeps the corresponding monitoring configuration. The sentinel performing the switchover gets a configuration epoch from the new master (the promoted slave); this is a version number, and it must be unique after each switchover. If the first elected sentinel fails to complete the switchover, the other sentinels wait for the failover-timeout period and then take over and continue the switchover, acquiring a new configuration epoch as the new version number.
7. Configuration propagation
After the switchover is complete, the sentinel updates the master configuration locally and then synchronizes it to the other sentinels via the pub/sub message mechanism. This is where the version number from before matters: all messages are published and listened to through a channel, so when a sentinel completes a new switchover, the new master configuration carries the new version number, and the other sentinels update their own master configuration according to a comparison of version numbers, with the larger version winning.
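The "larger version wins" propagation rule can be sketched as follows (a toy model; the addresses and epoch values are invented for illustration):

```python
# Sketch of version-based config propagation: a sentinel only adopts a
# master config announced on the hello channel if its epoch is newer.
class Sentinel:
    def __init__(self):
        self.master_addr = ("10.0.0.1", 6379)
        self.config_epoch = 1

    def on_hello(self, addr, epoch):
        if epoch > self.config_epoch:  # newer version wins; stale ones are ignored
            self.master_addr = addr
            self.config_epoch = epoch

s = Sentinel()
s.on_hello(("10.0.0.2", 6379), epoch=2)   # post-failover announcement: adopted
assert s.master_addr == ("10.0.0.2", 6379)
s.on_hello(("10.0.0.1", 6379), epoch=1)   # stale message arriving late: ignored
assert s.master_addr == ("10.0.0.2", 6379)
```

The monotonically increasing epoch is what makes the propagation safe even when pub/sub messages arrive out of order.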