

“Interview crash prep” – Redis – Redis Sentinel principles and persistence mechanisms

In this series I collect interview questions to share with you, to help those who want to change jobs during the peak spring hiring season ("golden March, silver April") consolidate and cram the questions interviewers ask most often. Come on!!

Redis data types? Which scenarios does each suit?

The Redis thread model: why is a single thread so efficient?

[Interview crash prep] – Redis – Redis master-slave replication? Sentinel mechanism?

The previous two interviews were cut short due to time constraints. The interviewer thinks you have a good grasp of Redis master-slave replication and the Sentinel mechanism, so today he wants to see how deep your Redis knowledge really goes and is stepping up the pressure. Are you ready?

After our last interview ended early due to time constraints, I have a few more Sentinel questions for you today. First, how does Redis Sentinel work? Focus on the failover process.

Okay.

  • 1) Each Sentinel sends a PING command once per second to the Master, Slave and other Sentinel instances it knows.

  • 2) If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, Sentinel marks that instance as subjectively offline.

  • 3) If a Master is marked as subjectively offline, all Sentinels monitoring that Master check once per second to confirm that it has indeed entered the subjectively offline state.

  • 4) The Master is marked as objectively offline when a sufficient number of Sentinels (greater than or equal to the quorum value specified in the configuration file) confirm within the specified time range that the Master has indeed gone subjectively offline.

  • 5) Under normal circumstances, each Sentinel sends the INFO command to all known Masters and Slaves once every 10 seconds. When a Master is marked as objectively offline, Sentinel sends INFO to all Slaves of that Master once per second instead.

  • 6) If not enough Sentinels agree that the Master is offline, the Master's objectively offline status is removed. If the Master again returns a valid reply to Sentinel's PING command, its subjectively offline status is removed.

  • 7) The Sentinels then communicate with one another and vote to elect one Sentinel to carry out the failover. That Sentinel selects one of the Slaves as the new Master, and the remaining Slaves are automatically attached to the new Master and replicate its data.
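
The subjective and objective offline checks in the steps above can be sketched in a few lines of Python. This is only an illustrative model, not Sentinel's actual code: the `SentinelView` class, the `is_odown` helper and the millisecond timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """One Sentinel's view of a monitored master (hypothetical model)."""
    name: str
    down_after_ms: int            # stands in for down-after-milliseconds
    last_valid_reply_ms: int = 0  # time of the last valid PING reply

    def is_sdown(self, now_ms: int) -> bool:
        # Subjectively offline: no valid PING reply for longer than
        # down-after-milliseconds.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


def is_odown(views, now_ms: int, quorum: int) -> bool:
    # Objectively offline: at least `quorum` Sentinels consider the
    # master subjectively offline.
    votes = sum(1 for view in views if view.is_sdown(now_ms))
    return votes >= quorum


views = [
    SentinelView("s1", down_after_ms=30_000, last_valid_reply_ms=0),
    SentinelView("s2", down_after_ms=30_000, last_valid_reply_ms=0),
    SentinelView("s3", down_after_ms=30_000, last_valid_reply_ms=25_000),
]
print(is_odown(views, now_ms=40_000, quorum=2))
```

Here two of the three Sentinels have heard nothing from the master for 40 seconds, so with a quorum of 2 the master counts as objectively offline, while the third Sentinel on its own still sees it as up.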

During failover, a new master is elected from the remaining slaves. What are the criteria for being elected master?

If a master is considered objectively down (ODOWN) and a majority of Sentinels authorize a master/slave switchover, one Sentinel performs the switchover. It first has to elect a slave, taking the following information about each slave into account.

(1) Duration of disconnection from the master. A slave is considered unsuitable for election if it has been disconnected from the master for more than 10 times down-after-milliseconds plus the time the master has been in the SDOWN state, that is, for more than: (down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

(2) Slave priority. The remaining slaves are sorted by their configured slave priority. The lower the slave priority value, the higher the slave ranks.

(3) Replication offset. If the slave priority is the same, compare replication offsets to see which slave has replicated more data. The higher the offset, the higher the slave ranks.

(4) Run ID. If the preceding two conditions are the same, select the slave with the smaller run ID.
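
The four criteria above can be expressed as a filter followed by a sort. The sketch below is a simplified model rather than Sentinel's real implementation: the `Slave` class, its field names and the `pick_new_master` function are hypothetical, and special cases are ignored.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    priority: int         # lower value ranks higher
    repl_offset: int      # higher value means more data replicated
    disconnected_ms: int  # time since the link to the master dropped


def pick_new_master(slaves, down_after_ms, master_sdown_ms):
    # Criterion 1: discard slaves disconnected from the master too long.
    limit = down_after_ms * 10 + master_sdown_ms
    candidates = [s for s in slaves if s.disconnected_ms <= limit]
    if not candidates:
        return None
    # Criteria 2-4: lowest priority value, then highest replication
    # offset, then smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    Slave("b", priority=100, repl_offset=500, disconnected_ms=1_000),
    Slave("a", priority=100, repl_offset=500, disconnected_ms=1_000),
    Slave("c", priority=100, repl_offset=900, disconnected_ms=1_000),
    Slave("d", priority=10, repl_offset=100, disconnected_ms=999_999),
]
chosen = pick_new_master(slaves, down_after_ms=30_000, master_sdown_ms=5_000)
print(chosen.run_id)
```

Slave "d" has the best priority but is filtered out first because it has been disconnected for too long; among the rest, "c" wins on replication offset.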

What does the Sentinel that performed the switch do after the failover?

It propagates the new configuration.

Once the Sentinel completes the switch, it updates the master configuration locally and synchronizes it to the other Sentinels through the pub/sub messaging mechanism.

When the other Sentinels synchronize, what do they use to decide whether to update their own configuration?

The Sentinel performing the switch obtains a configuration epoch for the new master (slave -> master) it is switching to. This is a version number, and it must be unique for each switch.

If the first elected Sentinel fails to complete the switch, the other Sentinels wait for the failover-timeout period and then take over the switch. At that point a new configuration epoch is obtained and used as the new version number.

This version number matters because all of these messages are published and listened to on a single channel. When a Sentinel completes a new switch, the new master configuration carries the new version number, and the other Sentinels update their own master configuration if the announced version number is larger than the one they hold.
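
The "larger version number wins" rule can be shown with a small sketch. The `MasterConfig` and `SentinelState` classes and the `on_announcement` method are hypothetical names chosen for the example; real Sentinels exchange richer messages than this.

```python
from dataclasses import dataclass


@dataclass
class MasterConfig:
    address: str       # "host:port" of the current master
    config_epoch: int  # version number of this configuration


class SentinelState:
    def __init__(self, config: MasterConfig):
        self.config = config

    def on_announcement(self, announced: MasterConfig) -> bool:
        """Adopt an announced configuration only if its epoch is larger."""
        if announced.config_epoch > self.config.config_epoch:
            self.config = announced
            return True
        return False


state = SentinelState(MasterConfig("10.0.0.1:6379", config_epoch=1))
state.on_announcement(MasterConfig("10.0.0.2:6379", config_epoch=2))  # adopted
state.on_announcement(MasterConfig("10.0.0.3:6379", config_epoch=1))  # ignored
print(state.config.address)
```

A stale announcement with an older epoch is simply ignored, which is why every switch needs its own unique version number.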

Ok, let's leave Sentinel for now and talk about Redis persistence. First, should Redis be persisted in production? If so, why is it needed, or what does persistence mean for a production system?

Yes, it should.

Redis persistence is mainly for disaster recovery and data recovery, which can also be counted as part of high availability.

For example, suppose the whole Redis instance goes down and becomes unavailable. The first thing to do is restart Redis so that it can serve requests again as soon as possible. But without persistence and without any data backup, Redis comes back up empty even though it is running: the data is lost!

At that point a large number of requests will likely reach the cache and none of them will hit. This can lead to a cache avalanche: every request that misses in Redis falls through to the database, which suddenly has to absorb high concurrency and then goes down as well. Once the database is down, you cannot even read data from it to rebuild Redis.

What are the Redis persistence mechanisms?

Redis has two persistence mechanisms, AOF and RDB.

AOF records the command of every write request and appends it to the end of a file; appending is cheap, so this is efficient. The operating system has its own cache, so data written to the file first lands in the OS cache. Every second Redis calls the operating system's fsync to flush the data from the OS cache to the AOF file on disk.

When Redis restarts, it re-executes the commands recorded in the AOF. If the file is large, however, this replay takes a long time, so data recovery is slower.

RDB is a snapshot file. At regular intervals Redis writes the data in memory out as a complete RDB snapshot file, and on restart it simply loads that file. For the same data, recovery from RDB is faster than from AOF.
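
To make the difference concrete, here is a toy key-value store with both styles of persistence. It is a deliberately simplified sketch: `ToyStore` is a hypothetical class, the "AOF" is a Python list of commands and the "RDB" is a JSON string, neither of which resembles the real file formats.

```python
import json


class ToyStore:
    """Dict-backed store with AOF-style and RDB-style persistence."""

    def __init__(self):
        self.data = {}
        self.aof = []  # stands in for the append-only file

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(("SET", key, value))  # append the write command

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(("DEL", key))

    def snapshot(self) -> str:
        # RDB-style: serialize the whole dataset as of this moment.
        return json.dumps(self.data, sort_keys=True)

    @classmethod
    def from_snapshot(cls, blob: str) -> "ToyStore":
        store = cls()
        store.data = json.loads(blob)  # load directly, nothing to replay
        return store

    @classmethod
    def from_aof(cls, log) -> "ToyStore":
        store = cls()
        for entry in log:  # replay every command in order
            if entry[0] == "SET":
                store.set(entry[1], entry[2])
            elif entry[0] == "DEL":
                store.delete(entry[1])
        return store


store = ToyStore()
store.set("a", "1")
store.set("a", "2")
store.set("b", "3")
store.delete("b")
print(len(store.aof), store.snapshot())
```

Both recovery paths end with the same data, but the log holds four commands for a dataset of one key, which is why replaying an AOF takes longer than loading a snapshot of the same data.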

Talk about the pros and cons of each of these persistence mechanisms.

Sure.

The first advantage of RDB is that it generates multiple data files, each representing the data in Redis at a certain point in time, which is ideal for cold backup. Second, RDB persistence has very little impact on the read and write services Redis provides and lets Redis maintain high performance, because the main Redis process only needs to fork a child process, and the child performs the disk I/O for RDB persistence. Third, restarting and restoring a Redis process directly from an RDB data file is faster than with AOF persistence.

AOF stores instruction logs. When data recovery is performed, all instruction logs are actually played back and executed to recover all data in memory.

RDB is a data file that can be loaded directly into memory during recovery.

Disadvantages of RDB

1) More data may be lost in a failure than with AOF. In general, RDB snapshot files are generated every 5 minutes or even less often. If the Redis process goes down, the data from the last 5 minutes or more is lost.

This is the biggest disadvantage of RDB: it is not suitable as the first-priority recovery scheme. If you rely on RDB as your first choice for recovery, you will lose more data.

2) Each time RDB forks a child process to generate a snapshot file, if the dataset is very large, the service provided to clients may pause for milliseconds, or even seconds.

So generally do not make the RDB interval too long. Otherwise each generated RDB file is very large, and the performance of Redis itself may be affected.

The advantages of AOF

1) AOF protects better against data loss. Typically AOF performs an fsync every second through a background thread, so at most one second of data is lost.

Performing fsync every second ensures that data in the OS cache is written to disk. If the Redis process crashes, it loses at most 1 second of data.

2) High write performance. AOF log files are written in append-only mode, so there is no disk seek overhead and write performance is very high. The files are also not prone to corruption, and even if the tail of the file is damaged, it is easy to repair.

3) Even if the AOF log file grows too large, the background rewrite operation does not affect client reads and writes. The rewrite compacts the instructions into the minimal log needed to restore the data. While the new log file is being created, the old log file continues to be written as usual. When the new, merged log file is ready, the old and new log files are swapped.

4) Commands in the AOF log file are recorded in a very readable way, which is ideal for emergency recovery after a catastrophic accidental deletion.

For example, suppose someone accidentally runs FLUSHALL and wipes all the data. As long as the background rewrite has not yet happened, you can immediately copy the AOF file, delete the final FLUSHALL command from it, put the file back, and let the recovery mechanism restore all the data.
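
Points 3) and 4) can be illustrated with a toy command log. This is a sketch under simplifying assumptions: the log is a list of Python tuples rather than a real AOF file, and `replay`, `rewrite` and `drop_trailing_flushall` are hypothetical helper names.

```python
def replay(log):
    """Rebuild the dataset by executing every logged command in order."""
    data = {}
    for entry in log:
        op = entry[0].upper()
        if op == "SET":
            data[entry[1]] = entry[2]
        elif op == "DEL":
            data.pop(entry[1], None)
        elif op == "FLUSHALL":
            data.clear()
    return data


def rewrite(data):
    """Build a minimal log from the current data: one SET per live key."""
    return [("SET", key, value) for key, value in sorted(data.items())]


def drop_trailing_flushall(log):
    """Return a copy of the log without a FLUSHALL as its final command."""
    if log and log[-1][0].upper() == "FLUSHALL":
        return list(log[:-1])
    return list(log)


log = [
    ("SET", "counter", "1"),
    ("SET", "counter", "2"),
    ("SET", "tmp", "x"),
    ("DEL", "tmp"),
    ("SET", "counter", "3"),
]
compact = rewrite(replay(log))        # one command instead of five

damaged = log + [("FLUSHALL",)]       # someone wiped everything
recovered = replay(drop_trailing_flushall(damaged))
print(compact, recovered)
```

The rewritten log replays to the same data as the original, and removing the trailing FLUSHALL before replaying brings the deleted data back. If a rewrite had already run after the FLUSHALL, the compacted log would be empty and there would be nothing left to recover.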

The disadvantages of AOF

(1) For the same data, AOF log files are usually larger than RDB snapshot files.

(2) With AOF enabled, the write QPS Redis can sustain is lower than with RDB, because AOF is usually configured to fsync the log file once per second. That said, fsync once per second still gives very high performance.

If you want to ensure that not a single write is lost, that is possible too: set AOF to fsync on every write. But that causes the QPS of Redis to drop significantly.

(3) AOF has had bugs in the past where the data recovered by replaying the AOF log was not exactly the same as the original data.

Therefore, an approach like AOF based on a command log with merging and replay is more complex, and so more fragile and bug-prone, than the RDB approach of persisting a complete data snapshot each time. To avoid bugs in the rewrite process, however, AOF does not rewrite by merging the existing log; it rebuilds the log from the data in memory at that moment, which is much more robust.

(4) The main practical disadvantage is that data recovery is slower, and AOF is less convenient for cold backup.
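
The trade-off in point (2) between fsync on every write and fsync once per second can be sketched with an ordinary file. This is not how Redis implements it (Redis performs the once-per-second fsync in a background thread, while this sketch checks the clock inline): `AppendLog` is a hypothetical class, and its policy names are chosen to mirror the values of Redis's appendfsync setting.

```python
import os
import tempfile
import time


class AppendLog:
    """Append-only log with a configurable fsync policy (toy model)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(policy)
        self.policy = policy
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.fsync_calls = 0

    def append(self, command: bytes):
        self.file.write(command + b"\n")
        self.file.flush()  # hand the bytes to the OS cache
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_sync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.file.fileno())  # force the OS cache to disk
            self.last_sync = now
            self.fsync_calls += 1

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    always = AppendLog(os.path.join(tmp, "always.aof"), "always")
    never = AppendLog(os.path.join(tmp, "no.aof"), "no")
    for i in range(5):
        always.append(b"SET k %d" % i)
        never.append(b"SET k %d" % i)
    always.close()
    never.close()
    always_calls, never_calls = always.fsync_calls, never.fsync_calls
print(always_calls, never_calls)
```

Both logs contain the same bytes, but the "always" policy pays for one fsync per write, which is where the drop in QPS comes from; the once-per-second policy bounds the loss to roughly one second of writes instead.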

You mentioned cold backup just now. Why is AOF less suitable for it than RDB?

You can do both, but RDB is better.

RDB works well for cold backup because it generates multiple files, each a complete snapshot of the data at one moment. We can send these complete data files to remote, safe storage, for example distributed storage such as ODPS on Alibaba Cloud, and set up a backup strategy that backs up the data in Redis on a regular schedule.

AOF can also be used for cold backup, but it has only one file, so we have to write our own program that copies the file out at regular intervals.

The advantage of RDB for cold backup is that Redis itself controls generating snapshot files at fixed intervals, which is more convenient. With AOF we have to write our own scripts and schedules for this, which is more trouble.

For cold backup, RDB also provides faster data recovery than AOF in the worst case.

With all this talk of AOF and RDB, how does a production system choose between these two persistence mechanisms?

As for choosing between RDB and AOF, I think we should use both.

1) Don't use only RDB, because that will cause you to lose a lot of data.

2) Don't use only AOF either, because there are two problems with that:

First, recovering from a cold backup made with AOF is slower than recovering from one made with RDB.

Second, RDB is more robust because it simply generates a snapshot each time, avoiding the bugs that a complex backup and recovery mechanism such as AOF can have.

3) Use the AOF and RDB persistence mechanisms together. Use AOF to ensure data is not lost, as the first choice for data recovery. Use RDB for cold backups at various intervals; if the AOF files are lost or damaged and unusable, RDB can still be used for fast data recovery, as the last line of defense.

Ok, that’s all for today, and we’ll talk more next time

It’s finally over.


If you write on your resume that you have mastered Redis and the interviewer knows Redis well, he will seize on it and keep pressing from shallow questions to deep ones to find out what you really know. This is where the knowledge you have accumulated day to day pays off: whether you know a little more than other candidates as the interviewer digs down layer by layer. See how many levels you can pass. The more rounds you get through, the better the impression you leave on the interviewer.

If the interviewer only asks you a few basic Redis questions, he either doesn't know much about Redis or is more interested in other highlights on your resume.

Tip: One more thing. A resume should not be just a list of technology names; it can go down to specific technical points, for example: familiar with the Redis threading model and persistence mechanisms (only write this if you really understand them, of course). Then the interviewer knows what to ask you. If you only say you are familiar with Redis, the claim is so broad that the interviewer does not know where to start, and you will be embarrassed when a random question catches you out. Naming the specific points you know makes you less passive. Of course, you will still be asked about things that are not on your resume…

This series of articles is interview crash prep, not a tutorial. You could talk at length on any of these topics if you dig deep, but in an interview you only need to explain the principles clearly. Even better, draw diagrams as you talk. The series is meant for quick cramming, quick pickup, and review.
