Redis master-slave replication is the cornerstone of high availability. What is high availability? It means minimizing the time during which the system cannot provide service; this is where targets such as "six nines" (99.9999% availability) come from. Sentinel and Cluster are both essential to high availability in Redis. This article focuses on the sentinel mechanism.
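To make the "nines" concrete, here is a small arithmetic sketch. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N; the function name max_downtime_seconds is only illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def max_downtime_seconds(nines: int) -> float:
    """Yearly downtime allowed by an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {max_downtime_seconds(n):.1f} s/year")
```

Under these assumptions, six nines leaves only about 31.5 seconds of downtime per year, which is why failover has to be automatic.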

This article covers the following aspects of the sentinel:

  • Introduction to the sentinel
  • Sentinel configuration
  • How the sentinel works

Environment used in this article

  • CentOS 7.3, Redis 4.0
  • Redis working directory: /usr/local/redis
  • All operations are simulated on a VM

What is a sentinel

First, a quick question: once master-slave replication is configured, what happens when the master node goes down? Who provides the service then?

When the master node goes down, master-slave replication on its own becomes meaningless. In an era where data is king, no data means no high availability.

This is where the sentinel steps in and says:

"Since your master node no longer takes you along to play, I am going to pick one of you four to be the new boss, and the rest of you will play with him."

"When the old boss who stopped playing with you comes back, he is no longer your boss. He has to play with the boss I picked."

This little dialogue captures the whole point of configuring sentinels: whoever the slave nodes "play with" is whoever gives them their data. Knowing the sentinel's role, we can continue.

Finally, in technical terms, what is a sentinel?

Redis Sentinel is a distributed system that monitors every server in the master-slave structure. When the master node fails, the sentinels select a new master node through a voting mechanism and connect all the remaining slave nodes to it.

Second, the role of the sentinel

The dialogue above describes one of the sentinel's functions: automatic failover.

A role is defined by what the sentinel actually does on the job. We list the functions plainly here and explain how they work in the following sections.

The sentinel has three functions: monitoring, notification, and automatic failover.

  • Monitoring
    • Monitor what? A master-slave structure consists of a master node and slave nodes, so those are what the sentinel monitors.
    • Check whether the master node is alive
    • Check whether the master and slave nodes are running properly
  • Notification
    • When a sentinel detects a problem with a server, it notifies the other sentinels. The sentinels are like a WeChat group: each one posts the problems it finds to the group.
  • Automatic failover
    • When the master node is detected to be down, the sentinels disconnect the slave nodes from it, promote one slave node to master, and connect the other slave nodes to the new master node. Clients are then informed of the new server address.

One caveat: a sentinel is also a Redis server, but it does not serve data to the outside world.

Sentinels are deployed in odd numbers. Why an odd number of sentinel servers? Keep this question in mind; the answer appears below.

How to configure sentinels

1. Preparation

This section covers the preparation before configuring the sentinels. The picture below shows Kaka's setup: 8 terminal sessions in total, made up of 3 sentinels, 1 master node, 2 slave nodes, 1 client for the master node, and 1 client for a slave node.

2. Read the sentinel.conf configuration

The sentinel uses a configuration file called sentinel.conf. Print it without comments and blank lines:

cat sentinel.conf | grep -v '#' | grep -v '^$'

  • port 26379: the port on which the sentinel listens
  • dir /tmp: the working directory where the sentinel stores its information
  • sentinel monitor mymaster 127.0.0.1 6379 2: monitor the master node named mymaster at 127.0.0.1:6379. The final 2 is the quorum: the number of sentinels that must agree the master node is down before it is treated as down.
  • sentinel down-after-milliseconds mymaster 30000: how long the master node may fail to respond before the sentinel considers it down. The value is in milliseconds, so 30000 is 30 seconds.
  • sentinel parallel-syncs mymaster 1: the maximum number of slave nodes that synchronize with the new master node at the same time during a failover. The smaller the value, the longer the failover takes to complete; the larger the value, the more slave nodes are unavailable at once while they synchronize.
  • sentinel failover-timeout mymaster 180000: the failover timeout in milliseconds. The default is 180000, which is 3 minutes.
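To see how these directives fit together, here is a minimal Python sketch that reads them into a dictionary. It is not part of Redis: parse_sentinel_conf and SAMPLE_CONF are hypothetical names, and the parser only understands the directives listed above.

```python
NUMERIC_DIRECTIVES = {"down-after-milliseconds", "parallel-syncs", "failover-timeout"}


def parse_sentinel_conf(text: str) -> dict:
    """Parse the handful of sentinel.conf directives discussed above."""
    conf = {"masters": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments, roughly like the grep filter
        parts = line.split()
        if parts[0] == "sentinel" and len(parts) >= 4:
            directive, name = parts[1], parts[2]
            master = conf["masters"].setdefault(name, {})
            if directive == "monitor":
                # sentinel monitor <name> <ip> <port> <quorum>
                master["addr"] = (parts[3], int(parts[4]))
                master["quorum"] = int(parts[5])
            elif directive in NUMERIC_DIRECTIVES:
                master[directive] = int(parts[3])
            else:
                master[directive] = " ".join(parts[3:])
        else:
            # top-level directives such as port and dir
            conf[parts[0]] = " ".join(parts[1:])
    return conf


SAMPLE_CONF = """\
port 26379
dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""

if __name__ == "__main__":
    print(parse_sentinel_conf(SAMPLE_CONF))
```

Grouping the per-master settings under the master name mirrors the way every sentinel directive repeats that name as its first argument.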

3. Start the configuration

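From the working directory /usr/local/redis, run the same filtering command with a redirect to write the comment-free configuration into the conf directory:

cat sentinel.conf | grep -v '#' | grep -v '^$' > ./conf/sentinel-26379.conf

Then derive the configuration files for the other two sentinels from sentinel-26379.conf by replacing the port number:

sed 's/26379/26380/g' sentinel-26379.conf > sentinel-26380.conf
sed 's/26379/26381/g' sentinel-26379.conf > sentinel-26381.conf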
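The sed substitution above can also be sketched in Python. This assumes the configuration text is already held in a string; retarget_port and BASE_CONF are hypothetical names used only for illustration. Unlike the plain sed pattern, the sketch matches whole numbers only, so the master port 6379 and any longer number containing 26379 are left alone.

```python
import re


def retarget_port(conf_text: str, old_port: int, new_port: int) -> str:
    """Return conf_text with every standalone old_port replaced by new_port."""
    # \b keeps 26379 from matching inside a longer number such as 126379.
    return re.sub(rf"\b{old_port}\b", str(new_port), conf_text)


BASE_CONF = (
    "port 26379\n"
    "dir /tmp\n"
    "sentinel monitor mymaster 127.0.0.1 6379 2\n"
)

if __name__ == "__main__":
    for port in (26380, 26381):
        print(f"--- sentinel-{port}.conf ---")
        print(retarget_port(BASE_CONF, 26379, port))
```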

Before starting the sentinels, test that master-slave replication works properly. Start three Redis servers on ports 6379, 6380, and 6381.

In the replication information on the master node, one slave node shows a lag of 1 and the other a lag of 0. Lag is the delay, in seconds, since the slave node last acknowledged the master node. This is a local test, so 0 shows up often; on a cloud server that is rarer. A lag of 0 or 1 is normal.

Write a hash on the master node to check that it reaches the slave nodes:

hset kaka name kaka

The test shows that our master-slave structure works properly.

Now start the sentinels, each in its own terminal:

redis-sentinel sentinel-26379.conf
redis-sentinel sentinel-26380.conf
redis-sentinel sentinel-26381.conf

At this point the sentinel configuration is complete. Next, we shut down the master node and watch the sentinel log:

  • +sdown: this sentinel thinks the master node is down (subjectively down)
  • +odown: enough of the other sentinels also checked the master node and agreed that it is down (objectively down)
  • Then a vote is taken. Kaka uses Redis 4.0 here; the log messages differ slightly between versions
  • +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380: the outcome of the vote. The Redis server on port 6380 replaces 6379 as the master node
  • +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380: the slave node on port 6381 is now attached to the new master node 6380
  • +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380: the old master node 6379 is now tracked as a slave of 6380 and, being still offline, is marked subjectively down
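The event lines above follow a regular layout, so they are easy to pick apart. The following is a rough sketch, assuming only the two layouts seen in this log (the five-field +switch-master form, and "type name ip port" optionally followed by "@" and the master). parse_event is a hypothetical helper, not a Redis API.

```python
def parse_event(line: str) -> dict:
    """Split one sentinel event line into its main fields."""
    event, _, rest = line.strip().partition(" ")
    if event == "+switch-master":
        # +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
        name, old_ip, old_port, new_ip, new_port = rest.split()
        return {
            "event": event,
            "master": name,
            "old": (old_ip, int(old_port)),
            "new": (new_ip, int(new_port)),
        }
    # <type> <name> <ip> <port> [@ <master-name> <master-ip> <master-port>]
    instance, _, master = rest.partition("@")
    itype, name, ip, port = instance.split()[:4]
    parsed = {"event": event, "type": itype, "name": name, "addr": (ip, int(port))}
    master_fields = master.split()
    if master_fields:
        parsed["master"] = master_fields[0]
    return parsed


if __name__ == "__main__":
    print(parse_event("+switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380"))
    print(parse_event("+sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380"))
```

A client-side script could use the "new" address from +switch-master to learn where writes should go after a failover.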

When we bring the Redis server on port 6379 back online, the sentinel prints two more messages. The first (-sdown) clears the down state of 6379. The second reconnects 6379 to the new master node as a slave node.

To confirm that everything works, add a list value on the new master node 6380.

Three, how the sentinel works

With the sentinels configured, it is time to analyze how they work. Only by knowing the working process can we really understand the sentinel.

This article tries to keep the explanation from being too dry, so that a technical article reads like a story.

Down to business: the sentinel's role is monitoring, notification, and failover, so the working principle is organized around these three points.

1. Monitor workflow

  1. Sentinel 1 sends the INFO command to the master node and saves what it learns: the status of all sentinels and the master and slave node information
  2. The master node also records information about the instances connected to it. What the master node records looks much like what the sentinel records, with slight differences.
  3. The sentinel sends the INFO command to each slave node, using the slave node information obtained from the master node
  4. Then sentinel 2 arrives. It also sends the INFO command to the master node and establishes a command (cmd) connection
  5. At this point sentinel 2 stores the same information as sentinel 1, except that the sentinel list now has two entries.
  6. The sentinels then set up publish/subscribe between themselves so that their information stays consistent. They also ping each other continuously to check that every sentinel is still alive.
  7. When sentinel 3 comes in, it does the same thing: it sends INFO to the master and slave nodes and establishes connections to sentinel 1 and sentinel 2.
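The discovery step in this workflow relies on the replication section of the master node's INFO reply, which lists each slave node on a line such as slave0:ip=...,port=...,state=...,offset=...,lag=.... Here is a minimal sketch of extracting the slave nodes from that text. The sample reply is made up for illustration (the offsets are invented), and parse_replicas is a hypothetical helper.

```python
def parse_replicas(info_text: str) -> list:
    """Extract the slave node entries from the text of INFO replication."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Only keys like slave0, slave1, ... describe connected slave nodes.
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        replicas.append({
            "ip": fields["ip"],
            "port": int(fields["port"]),
            "state": fields["state"],
            "offset": int(fields["offset"]),
            "lag": int(fields["lag"]),
        })
    return replicas


SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1386,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=1386,lag=0
master_repl_offset:1386
"""

if __name__ == "__main__":
    for replica in parse_replicas(SAMPLE_INFO):
        print(replica)
```

With the addresses in hand, a monitor can then contact each slave node directly, as step 3 describes.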

2. Notification workflow

Each sentinel sends commands to all of its master and slave nodes to obtain their status, then publishes the results to the channel that the other sentinels subscribe to.

3. Failover principle (the focus of this article)

  • Each sentinel keeps publishing hello messages to the master node (on the __sentinel__:hello channel) until the master node stops responding and the sentinel marks it sdown. This is exactly what the sentinel log showed when we shut down the master node. Marking sdown is not the end of it: the sentinel also tells the other sentinels that the master node is down, by sending them the command sentinel is-master-down-by-addr with the master node's address and port.
  • The other sentinels receive this and react: is the master node really down? Let me check for myself. They probe the master node with hello messages too, and each one answers the is-master-down-by-addr request with its own verdict, in effect telling the sentinel that asked first: you were right, this one really is down. When enough sentinels agree that the master node is down, its state changes to odown. One sentinel thinking the master node is down marks it sdown; enough sentinels thinking so (the quorum from sentinel monitor, which is 2 of our 3 sentinels, so more than half) marks it odown. This is why sentinels are deployed in odd numbers: an odd count always yields a clear majority.
  • One sentinel believing the master node is down is called subjectively offline. More than half of the sentinels believing it is called objectively offline.
  • Once the master node is considered objectively offline, the sentinels proceed to the next step
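The step from sdown to odown comes down to counting verdicts against the quorum. Here is a minimal sketch, assuming each sentinel's opinion has already been collected into a dictionary; the function name and the sentinel labels are illustrative only.

```python
def is_objectively_down(sdown_reports: dict, quorum: int) -> bool:
    """sdown_reports maps a sentinel id to True if it has flagged the master sdown."""
    agreeing = sum(1 for flagged in sdown_reports.values() if flagged)
    return agreeing >= quorum


if __name__ == "__main__":
    # Quorum of 2, as in "sentinel monitor mymaster 127.0.0.1 6379 2".
    print(is_objectively_down({"s1": True, "s2": False, "s3": False}, quorum=2))
    print(is_objectively_down({"s1": True, "s2": True, "s3": False}, quorum=2))
```

With one sentinel reporting, the master node is only subjectively down; once a second sentinel agrees, the quorum of 2 is met and the master node counts as objectively down.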

At this point the sentinels have detected the problem. But which sentinel is responsible for carrying out the failover and choosing the new master node? It cannot be that sentinel three goes, and four goes, and five goes too; that would be chaos. So a leader has to be chosen among all the sentinels. Look at the picture below.

Now all five sentinels in the picture hold a meeting. They are all on the same network, and every one of them sends the command sentinel is-master-down-by-addr at the same time, carrying its election round number (epoch) and its runid.
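The election can be sketched as a simple vote count. This is a deliberately simplified model, not the actual Redis implementation: it assumes each sentinel votes for the candidate whose request reached it first in the current round, and that the winner needs both a majority of all sentinels and at least the configured quorum. All names are hypothetical.

```python
from collections import Counter
from typing import Optional


def elect_leader(votes: dict, total_sentinels: int, quorum: int) -> Optional[str]:
    """votes maps each voter's runid to the runid of the candidate it voted for."""
    tally = Counter(votes.values())
    if not tally:
        return None
    candidate, count = tally.most_common(1)[0]
    # The winner needs a majority of all sentinels and at least the quorum.
    needed = max(total_sentinels // 2 + 1, quorum)
    return candidate if count >= needed else None


if __name__ == "__main__":
    clear_win = {"s1": "s1", "s2": "s1", "s3": "s1", "s4": "s4", "s5": "s4"}
    split_vote = {"s1": "s1", "s2": "s1", "s3": "s3", "s4": "s3", "s5": "s5"}
    print(elect_leader(clear_win, total_sentinels=5, quorum=2))
    print(elect_leader(split_vote, total_sentinels=5, quorum=2))
```

In the split case no candidate reaches three votes, so no leader is chosen in this round and the vote has to be repeated.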

The sentinel leader then chooses the new master node. It first rules out the slave nodes that are not online, then compares the rest:

  • First it compares priority. If the priorities are the same, it moves on to the next check
  • Next it compares replication offsets. If slave4's offset is 90 and slave5's offset is 100, the sentinel concludes that slave4 has fallen behind (for example because of a network problem), so slave5 is chosen as the new master node. If slave4 and slave5 have the same offset, there is one final check
  • The last step compares the runid. It works as a tie-break, a bit like seniority in the workplace, except that Redis simply picks the slave node whose runid is lexicographically smallest.
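The selection rules above amount to a three-level sort. Here is a minimal sketch, assuming that a lower priority value is preferred, that a priority of 0 means "never promote" (the slave-priority setting in Redis 4.0), and that runid ties are broken lexicographically. The Replica class and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    online: bool
    priority: int  # lower is preferred; 0 means never promote
    offset: int    # replication offset; higher means more up-to-date data
    runid: str


def pick_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Apply the checks in order: online, priority, offset, runid."""
    candidates = [r for r in replicas if r.online and r.priority > 0]
    if not candidates:
        return None
    # Smallest priority first, then largest offset, then smallest runid.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.runid))


if __name__ == "__main__":
    replicas = [
        Replica("slave3", online=False, priority=100, offset=100, runid="aaa"),
        Replica("slave4", online=True, priority=100, offset=90, runid="bbb"),
        Replica("slave5", online=True, priority=100, offset=100, runid="ccc"),
    ]
    print(pick_new_master(replicas).name)
```

Here slave3 is excluded for being offline, and slave5 wins over slave4 because its offset is higher, matching the example in the list.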

Four, summary

That is everything about the sentinel in this article, and the most important part is how it works. Here is a brief review.

  • Monitoring comes first, and all sentinels synchronize their information

  • The sentinels publish information to the subscription channel

  • Failover

    • The sentinels find that the master node is offline
    • The sentinels vote for a leader
    • The leader chooses the new master node
    • The other slave nodes disconnect from the original master node and connect to the new master node. When the original master node comes back online, it connects to the new master node as a slave node.

This is Kaka's understanding of the sentinel. If there are mistakes, Kaka will correct them promptly.

Five, about Kaka

Kaka, male, started working in 2017 and has been working for three years. He has gone from a brick-moving lifestyle to the "single" life he leads now. Of course, this is not that kind of single! However rigorous the technical study, it is still no match for customers' strange requirements. Having moved into a nine-to-six job, he has escaped the wind and sun, yet he still enjoys those days of staying up until only dark circles were left. Persisting in learning, blogging, and sharing is the belief Kaka has held since the start of his career. He hopes that, somewhere on the vast Internet, Kaka's articles can bring you a little help.