This article is based on the speech delivered by Mr. Qiang Changjin at DBAplus Beijing Open Source and Architecture Technology Salon on January 6, 2018.



 

Today I will introduce common Redis architectures and share some of my Redis experience at Momo and Qunar. This mainly covers the daily maintenance work DBAs do on MySQL and Redis, how to design a Redis architecture around that daily work and the needs of the business, and finally some useful auxiliary tools.

I. Daily work of DBA

Daily operations include building instances, deployment, high availability, handling business queries, metadata management, monitoring, and alarm handling. At first these can be handled manually. But as time goes on there are more and more instances and machines, and the business becomes more and more complicated. At that point we have to know which Redis instances and which MySQL instances belong to each business, and we need an automated management platform to simplify the work.

The business requirements

 

When faced with a business requirement, such as a business that wants to use Redis, DBAs first need to understand what the business does.

Qunar has a good habit of having DBAs go deep into the business lines. I used to be in charge of the flight ticket systems and stayed with the flight ticket business line almost all year round. I had a deep understanding of how each flight ticket Redis instance was used and could tell whether it was used as a cache or as storage.

If it is used as a cache, how should the maximum capacity and the alarms be configured? If it is used as storage, how should it be managed? When it is used as a cache, have you talked to the business about it? Is an expiration time set on every Key? If not, I would say you are using it for storage rather than caching.
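The expiration question above can be turned into a rough audit. The sketch below assumes the TTLs of a sample of Keys have already been collected (the Redis TTL command replies -1 for a Key without expiration and -2 for a missing Key). The function name and the 0.9 threshold are illustrative assumptions, not something from the talk.

```python
from typing import Iterable

NO_EXPIRY = -1   # Redis TTL reply for a key that exists but has no expiration
MISSING = -2     # Redis TTL reply for a key that does not exist


def classify_usage(ttls: Iterable[int], cache_ratio: float = 0.9) -> str:
    """Classify an instance as 'cache' or 'storage' from sampled key TTLs.

    cache_ratio is a hypothetical threshold: the share of sampled keys that
    must carry an expiration for the instance to count as a cache.
    """
    existing = [t for t in ttls if t != MISSING]
    if not existing:
        return "unknown"
    with_expiry = sum(1 for t in existing if t != NO_EXPIRY)
    return "cache" if with_expiry / len(existing) >= cache_ratio else "storage"


if __name__ == "__main__":
    print(classify_usage([30, 120, 3600, 45]))   # every sampled key expires
    print(classify_usage([-1, -1, 60, -1]))      # most sampled keys never expire
```

A result of "storage" for an instance that the business describes as a cache is a good reason to go back and talk to the business.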

We can set rules early and enforce them strictly later. If the business actually uses Redis for storage, problems such as write failures can occur. Therefore we must communicate with the business to find out the real purpose behind the requirement.

Some businesses come and say they need 600 GB or even 1 TB of capacity. Facing such large-capacity requirements, how should we build the cluster and handle future expansion and maintenance well?

In other cases the required capacity is very low, maybe only 1 GB, but QPS reaches one million. Then we have to deploy multiple instances, and how to expand them, how to manage them, and how the business uses multiple instances all call for a good architecture.
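Both cases come down to the same arithmetic: the instance count is driven by whichever of capacity or QPS is the tighter limit. The following is a minimal sketch; the per-instance limits of 10 GB and 50,000 QPS are placeholder assumptions and should be replaced with figures measured on your own hardware.

```python
import math


def instances_needed(capacity_gb: float, qps: int,
                     max_gb_per_instance: float = 10.0,
                     max_qps_per_instance: int = 50_000) -> int:
    """Return how many master instances cover both capacity and QPS.

    Both per-instance limits are assumptions for illustration only.
    """
    by_capacity = math.ceil(capacity_gb / max_gb_per_instance)
    by_qps = math.ceil(qps / max_qps_per_instance)
    return max(1, by_capacity, by_qps)


if __name__ == "__main__":
    print(instances_needed(600, 20_000))     # capacity-bound business
    print(instances_needed(1, 1_000_000))    # QPS-bound business
```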

Operational requirements

After understanding the business requirements, we also need to understand the operations requirements. Otherwise, once the Redis architecture is built, the business may find it very comfortable to use while the DBA finds it very hard to maintain and the whole architecture very hard to upgrade.

How do we solve this? First we need to understand the business and build up some metadata. There are many businesses, and each may correspond to multiple clusters. With metadata management in place, we can quickly contact the right business when a problem occurs and work with them to solve it.

 

At a small scale, recording this in my own notes or a wiki is enough. But if the metadata is recorded in an automated management platform, the platform can collect it and drive daily operations from it, such as starting and shutting down instances, adjusting master-slave state, and data cleaning. These are all things the DBA needs to consider.

In addition, can later Redis maintenance be done simply, without affecting the business on top? How should monitoring and alarms be set up, and how should monitoring data be stored? As for high availability, there are many machines, and some machine may need offline maintenance almost every day. How do we switch over online in that case? And when a machine breaks down or an entire machine room becomes unavailable, how do we quickly bring the service up in another machine room?

Being an operations DBA is very tiring. Relying on people alone is not enough for 7×24 coverage. A good architecture combined with an operations platform is needed to respond quickly.

Problems faced

With that, I began to think about how to design a good architecture, starting from our own requirements and the problems we would face along the way:

  • Storage & cache cluster design

  • Capacity

  • High availability

  • Operational platform

II. Redis architecture design

Objectives of the Architecture

As for how to design a good Redis architecture, we first set ourselves a "small goal" (jokingly, to achieve financial freedom). To get there, the architecture should be:

  • Fast. The business develops rapidly, and every day we may launch N systems, build N clusters, and expand existing businesses. A good architecture has to make Redis faster and friendlier for the business to use.

  • Steady. As the number of clusters and machines grows, capacity expansion, machine relocation, and maintenance come up almost every day. When the nodes of the underlying Redis cluster change, the business must not be affected.

  • Accurate. Redis will switch over because of downtime, misoperation, and other causes. Reducing the impact of mistaken switchovers requires a good automatic switchover mechanism.

Architecture selection

Once the small goal is set, architecture selection begins. Here are the points we considered:

 

1. Redis master/slave or Redis Cluster?

  • Redis master/slave: stable and easy to maintain, but expansion is costly and services may be affected during the expansion.

  • Redis Cluster: simple to expand, but relatively complex and costly to maintain. When you get stuck you have to study the source code, and adding nodes is also more difficult.

Therefore we chose Redis master/slave and a relatively simple, stable version. We are still using 2.8, since its performance is good enough.

2. Proxy or a Proxy-like access client

  • It decouples the business from the underlying Redis cluster, which makes it easier for operations staff to manage the backend cluster.

  • It can manage connection pools.

  • It can control the Hash rules.

3. High availability

We use Sentinel, which automatically promotes a slave when the master has a problem.

4. Metadata management

This includes the metadata mentioned earlier. We considered whether to use MySQL or ZooKeeper for it. The advantage of ZooKeeper is that all business clients can be notified when a node changes.

For us, MySQL is easier to maintain, and it lets us see more clearly which business corresponds to which nodes. So we started out using only ZooKeeper and later changed to MySQL+ZooKeeper.

Architecture design

This is the access flow of our Redis architecture. It is the architecture Qunar has been using in recent years, with some upgrades along the way.

Clients access the Redis cluster through the Proxy cluster, based on the Redis node information. A Key, for example, is located through consistent hashing. The Redis node information can be thought of as being wrapped by the Proxy cluster, or by the access client that is packaged into the business client.

Architecture extension — Proxy

 

1. Node changes

We have a Namespace for each business, which stores its related information (master node information, connection pool size, and so on can be set here). When a client starts, it first reads the Redis information from ZK under its Namespace. If it gets three nodes, it directly establishes the corresponding number of connections according to the configured connection pool size. So by the time business traffic arrives, the client already holds these nodes.

How is a Key accessed when a request comes in? The Proxy or access client first applies the consistent hashing algorithm to the Key, which assigns the Key to one node in the cluster. We can then reach that node and operate on its data. But there is a problem: what if the node fails? That is where Sentinel comes in.
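To make the Key-to-node step concrete, here is a minimal consistent hash ring. It is a generic textbook sketch, not the hashing code used at Qunar: the MD5-based ring position, the number of virtual nodes, and the node names are all assumptions.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._owner: Dict[int, str] = {}
        for node in nodes:
            for i in range(vnodes):
                self._owner[_hash(f"{node}#{i}")] = node
        self._points = sorted(self._owner)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._points, _hash(key))
        if idx == len(self._points):
            idx = 0  # wrap around to the start of the ring
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
    print(ring.node_for("flight:order:1001"))
```

The useful property is that adding a node only moves Keys onto the new node; Keys never move between the nodes that were already there.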

Sentinels can be deployed across multiple machine rooms, with one group of sentinels assigned to each instance (for example, five sentinels monitoring one instance). If the master fails, the sentinel election mechanism promotes a slave to master, and the old master automatically becomes a slave once it comes back up. However, the business side does not know this. It still holds the old instance and only finds that it can no longer connect.
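The agreement rule behind this can be modeled in a few lines. This is a simplified model of how Redis Sentinel decides, not Sentinel's actual code: a master is treated as objectively down once at least `quorum` sentinels report it unreachable, and a failover needs votes from a majority of all sentinels. The room names and the quorum of 3 are assumptions.

```python
from typing import Dict


def is_objectively_down(reports: Dict[str, bool], quorum: int) -> bool:
    """True when at least `quorum` sentinels report the master unreachable."""
    return sum(1 for down in reports.values() if down) >= quorum


def can_authorize_failover(votes: int, total_sentinels: int) -> bool:
    """A failover leader needs votes from a majority of all known sentinels."""
    return votes > total_sentinels // 2


if __name__ == "__main__":
    # One sentinel per machine room (hypothetical names), five in total.
    reports = {"room1": True, "room2": True, "room3": True,
               "room4": False, "room5": False}
    print(is_objectively_down(reports, quorum=3))
    print(can_authorize_failover(votes=3, total_sentinels=len(reports)))
```

With one sentinel per machine room, a network problem that isolates a single room cannot trigger a switchover on its own.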

We handle this by updating ZK from Sentinel. After the switchover, Sentinel knows which instance is the new master, so we modify the corresponding information in ZK. The modification triggers a notification to the clients, which tells them which node is now the master. Each client then reconnects and ends up connected to the new master. This is a simple Redis architecture.

But this architecture has a problem: if you start with a cluster of three nodes, how do you scale from three nodes to five? All the data has to be rehashed. I can first build five instances, then use a synchronization tool we wrote ourselves to parse the AOF files of the three old instances and apply the new Hash rule. By continuously parsing the AOF files, the data of the three instances is hashed onto the five nodes under the new rule. Finally, the ZK information is modified manually to announce that the cluster has been expanded to five nodes.
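The in-house synchronization tool is not shown in the talk. The sketch below only illustrates the idea under stated assumptions: the AOF is read as a plain sequence of commands in the Redis protocol format (`*<argc>`, then `$<len>` and a payload per argument, each terminated by CRLF), and the "new Hash rule" is a stand-in (CRC32 modulo the node count). Multi-key commands such as MSET would need extra handling that is left out here.

```python
import zlib
from typing import Iterator, List


def parse_aof(data: bytes) -> Iterator[List[bytes]]:
    """Yield each command in an AOF buffer as a list of byte-string arguments."""
    pos = 0
    while pos < len(data):
        end = data.index(b"\r\n", pos)
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        argc = int(data[pos + 1:end])
        pos = end + 2
        args: List[bytes] = []
        for _ in range(argc):
            end = data.index(b"\r\n", pos)
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            length = int(data[pos + 1:end])
            start = end + 2
            args.append(data[start:start + length])
            pos = start + length + 2  # skip the payload and its trailing CRLF
        yield args


def target_node(key: bytes, node_count: int) -> int:
    """Hypothetical new Hash rule: CRC32 of the key modulo the node count."""
    return zlib.crc32(key) % node_count


def route_commands(data: bytes, node_count: int) -> List[List[List[bytes]]]:
    """Group single-key commands by the node that owns their key (args[1])."""
    buckets: List[List[List[bytes]]] = [[] for _ in range(node_count)]
    for args in parse_aof(data):
        if len(args) >= 2 and args[0].upper() != b"SELECT":
            buckets[target_node(args[1], node_count)].append(args)
    return buckets
```

Each bucket can then be replayed against its new node. A tool of this kind has to replay faster than the source instances write, otherwise it never catches up.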

If our synchronization tool cannot keep up with the write rate of Redis, the synchronization may never finish. So, in order not to affect the business, we changed this part.

We started with a simple consistent Hash and then built a two-level Hash on top of it: the first level picks the node, and the second level is a simple modulo. For us this makes scaling out quite easy, because it relies on master-slave replication.

2. Customize Hash rules

Hash rules can be customized. If the first-level Hash cannot solve the problem, a second-level Hash can be configured to expand the capacity of a single node. Previously, expanding meant going from three nodes to five. With the second-level Hash, each node can be expanded individually. This is very simple for operations: only two slaves need to be built.
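A two-level lookup can be sketched as follows. To keep the example short, the first level here is a CRC32 modulo over a fixed list of groups standing in for the consistent Hash, and the second level is a modulo over the instances inside the group. The group names and addresses are hypothetical.

```python
import zlib
from typing import Dict, List

# First level: fixed groups. Second level: instances inside each group.
Topology = Dict[str, List[str]]


def locate(key: str, topology: Topology) -> str:
    """Return the instance for a key using a two-level Hash."""
    groups = sorted(topology)
    h = zlib.crc32(key.encode("utf-8"))
    group = groups[h % len(groups)]                         # level 1: pick the group
    instances = topology[group]
    return instances[(h // len(groups)) % len(instances)]   # level 2: modulo


if __name__ == "__main__":
    before = {"g0": ["a:6379"], "g1": ["b:6379"], "g2": ["c:6379"]}
    # Expand only g0: a2 was a slave of a, so it already holds all of a's data.
    after = {"g0": ["a:6379", "a2:6379"], "g1": ["b:6379"], "g2": ["c:6379"]}
    print(locate("flight:order:1001", before), locate("flight:order:1001", after))
```

Because the first level does not change, expanding one group leaves the Keys of every other group exactly where they were.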

3. Security

On the security side, dangerous commands are blocked at the client. The dangerous DEL operation, the even more dangerous FLUSH operations, and some newer operations are not allowed.

These operations tend to stall Redis, so dangerous commands have to be filtered out. In addition, when node information changes during DBA operations, the client automatically updates its cluster information and connection pool information.
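A client-side filter of this kind can be very small. The sketch below is an illustration only: the exact deny list is an assumption built from the commands named above (DEL and the FLUSH family), and the function and exception names are hypothetical.

```python
from typing import Sequence

# Hypothetical deny list; the talk names DEL and the FLUSH operations.
BLOCKED_COMMANDS = frozenset({"DEL", "FLUSHALL", "FLUSHDB"})


class BlockedCommandError(Exception):
    """Raised when a client tries to send a command on the deny list."""


def check_command(args: Sequence[str]) -> Sequence[str]:
    """Return args unchanged if the command is allowed, otherwise raise."""
    if not args:
        raise ValueError("empty command")
    name = args[0].upper()
    if name in BLOCKED_COMMANDS:
        raise BlockedCommandError(f"{name} is not allowed through the client")
    return args


if __name__ == "__main__":
    print(check_command(["GET", "flight:order:1001"]))
    try:
        check_command(["flushall"])
    except BlockedCommandError as exc:
        print("rejected:", exc)
```

Checking in the Proxy or access client means a rejected command never reaches Redis at all.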

4. Connection pool

Under the connection pool rules, the number of connections on a node may become too large, reaching 8,000 to 9,000 or even tens of thousands, while a client may not need that many and many of the connections sit idle. So we modify the connection pool size recorded in MySQL, which lets us respond quickly when monitoring shows an anomaly, and push the change out through ZK. This has no impact on the business.

Architecture Expansion — Sentinel

1. One to one: one group of sentinels monitors one node, which keeps the maintenance cost low.

2. Distributed across machine rooms: detection is more sensitive and more accurate.

3. Automatic switchover: the premise is that slaves generally do not serve traffic. When the master goes down, Sentinel switches automatically, promotes a slave to master, and pushes the new master information to the business clients through ZK and the configuration center.

4. Notification: business clients sense changes automatically through ZK. This also covers manual switchovers through Sentinel when a business goes online, for example to check high availability, machine availability, and architecture problems. In these cases Sentinel modifies ZK and the Proxy senses the change automatically.

5. DBA intervention: afterwards the DBA needs to look at the monitoring and alarm information and confirm whether Sentinel really switched, whether the ZK information was updated, and whether the business clients received the update. If they did not, it has to be checked again manually. This is fairly simple for us, because much of it can be done with tools.

Architecture extension — ZooKeeper

ZooKeeper is responsible for two things: recording configuration information and handling notification subscriptions. First, it defines a Namespace for each business, so that the information of multiple businesses can be kept there.

In addition, when cluster node information changes, ZooKeeper notifies the client or Proxy so that it can respond quickly. One problem, however, is that a notification can fail.
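One common way to live with lost notifications is to pair them with a periodic re-read guarded by a version number. The sketch below uses an in-memory stand-in for a ZooKeeper node, so none of it is the ZooKeeper API, and all class and method names are hypothetical. Real ZooKeeper watches also have to be re-registered after they fire, which this stand-in does not model.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[int, Dict[str, str]], None]


class ConfigStore:
    """In-memory stand-in for a ZooKeeper node holding one Namespace's config."""

    def __init__(self, config: Dict[str, str]) -> None:
        self._config = dict(config)
        self._version = 0
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def read(self) -> Tuple[int, Dict[str, str]]:
        return self._version, dict(self._config)

    def update(self, **changes: str) -> None:
        """Apply changes, bump the version, and notify on a best-effort basis."""
        self._config.update(changes)
        self._version += 1
        for listener in self._listeners:
            try:
                listener(self._version, dict(self._config))
            except Exception:
                pass  # a lost notification; clients recover through refresh()


class Client:
    """Keeps the newest config it has seen, from pushes or from polling."""

    def __init__(self, store: ConfigStore) -> None:
        self._store = store
        self.version, self.config = store.read()
        store.subscribe(self.on_change)

    def on_change(self, version: int, config: Dict[str, str]) -> None:
        if version > self.version:  # ignore stale or duplicate updates
            self.version, self.config = version, config

    def refresh(self) -> None:
        """Periodic fallback: re-read the store in case a notification was lost."""
        self.on_change(*self._store.read())
```

A client would call `refresh()` on a timer, so a missed push only delays the update until the next poll.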

 

Usage process and rules

Once we have designed a Redis architecture, how can we use it?

 

1. Describe and discuss the business requirements. We need to confirm the usage scenario, capacity, QPS, and other information about the business.

2. Define the cluster model. A cluster may need ten to twenty nodes. The model should be fixed early, including whether the cluster is a cache, because the alarms and the monitoring points differ between cache and storage.

3. Automate cluster building. Use your own tools to build clusters quickly and automatically.

4. Deploy high availability and Failover. A high-availability deployment must actually exercise Failover, because a Failover mechanism can be deployed without ever really being exercised, which leads to problems when manual intervention is needed.

We made a simple tool for this. Say we have built a cluster with 100 instances. It is impractical to test every instance, so we define a level for each cluster. For a core cluster I select 90% of the instances for an automatic Failover; for a non-core cluster, 60%.

5. Initialize the configuration center and fill in the information.

6. Notify the business that the cluster is ready.
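The selection rule in step 4 is easy to sketch. The 90% and 60% ratios come from the talk; random sampling, the function name, and the instance names are assumptions, since the talk does not say how the instances are chosen.

```python
import random
from typing import List, Optional

DRILL_PERCENT = {"core": 90, "non-core": 60}  # ratios quoted in the talk


def pick_drill_instances(instances: List[str], level: str,
                         rng: Optional[random.Random] = None) -> List[str]:
    """Randomly choose the instances that get an automatic Failover drill."""
    rng = rng or random.Random()
    # Integer arithmetic, rounded up, avoids floating-point surprises.
    count = (len(instances) * DRILL_PERCENT[level] + 99) // 100
    return rng.sample(instances, count)


if __name__ == "__main__":
    cluster = [f"redis-{i:03d}" for i in range(100)]
    print(len(pick_drill_instances(cluster, "core")))
    print(len(pick_drill_instances(cluster, "non-core")))
```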

Automatic capacity expansion

Earlier I only touched on the first and second versions of our capacity expansion plan and process. Here I go through them carefully.

First, when we take on a business, the memory the operations side observes at the beginning is 10 GB, but later monitoring shows it has reached 15 GB. At this point we may need to expand.

We defined only 10 GB at the beginning, but as the business grows it gradually exceeds 10 GB and reaches 20 GB. The larger a single Redis node becomes, the higher the operations cost. Setting up master and slave instances is simple under 10 GB, but at 20 GB or 30 GB a recovery takes a long time, so the operations side also raises a demand for expansion.

The business does not really care about expansion, but it is hard for operations. The first solution was to write a synchronization tool to expand the first-level Hash.

With the two-level Hash it is easy to expand: build two slaves, or three slaves. But there is a problem: redundant data.

In other words, we clean the environment, set up the master and slaves, let replication migrate the data, then update the configuration and tell the business clients to pick it up. But suppose a cluster goes from three nodes to five. On each expanded node, part of the data is now garbage, because under the new Hash rule it is no longer needed there. So we slowly remove the old data by hand according to the new Hash rule, which does not affect the business, and with that the expansion is finished.
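The cleanup step amounts to finding, on each instance, the Keys that the new Hash rule assigns elsewhere, and removing them in small batches so the instance is never blocked for long. The sketch below only computes the batches; the Hash rule (CRC32 modulo the instance count) and the batch size are assumptions, and the Keys are passed in as a plain iterable (in practice they could come from an incremental scan).

```python
import zlib
from typing import Iterable, Iterator, List


def owner_index(key: str, instance_count: int) -> int:
    """Hypothetical second-level rule: CRC32 of the key modulo instance count."""
    return zlib.crc32(key.encode("utf-8")) % instance_count


def redundant_keys(keys: Iterable[str], my_index: int, instance_count: int,
                   batch_size: int = 100) -> Iterator[List[str]]:
    """Yield batches of keys that no longer hash to this instance."""
    batch: List[str] = []
    for key in keys:
        if owner_index(key, instance_count) != my_index:
            batch.append(key)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    keys = [f"flight:order:{i}" for i in range(1000)]
    stale = sum(len(b) for b in redundant_keys(keys, my_index=0, instance_count=2))
    print(f"{stale} of {len(keys)} keys can be removed from instance 0")
```

Pausing between batches keeps the cleanup slow enough that business traffic is not affected.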

Advantages & Disadvantages

1. Advantages:

  • Easy to expand: just build slaves.

  • Easy to adjust: node changes and connection pool settings are controllable, and the business clients sense them automatically.

  • Simple maintenance.

  • Stability: the Sentinel mechanism safeguards availability.

2. Disadvantages:

  • Complex capacity reduction. We have not found a good way to shrink three nodes to two. What we can do is lower the maximum memory, for example from 10 GB to 5 GB, while the number of instances stays the same.

  • Client upgrades are difficult. Since every business would need to upgrade, our current principle is that old businesses may keep using version 1 of the Hash rule while new businesses use version 2.

III. Auxiliary tools

Once we have a good Redis architecture, do we need other tools or platforms to help DBAs manage it? This part is simple; I will just share some experience.

Operations gadgets

We can build some small operations tools, including deployment automation, smooth migration, memory analysis, and data cleaning.

 

Redis management platform

Beyond simple gadgets, we can also add some intelligence. We built a simple platform to manage metadata, daily operations, monitoring and alarms, and automation. Specifically:

 

  • Metadata management: includes the management of machines, clusters, nodes, and business owners.

  • Daily operations: these can be opened up to developers, including queries, parameter adjustments, slow query analysis, and data cleaning.

  • Monitoring & Alerting: For caches, we do not set the maximum memory alarm. As I emphasized above, when using a cluster it is important to define whether it is storage or cache. Once that is classified, many operations can be performed on the platform.

  • Automation: Includes automatic deployment and capacity expansion.

Backup & Restore

Backup is a daily task for DBAs, and Redis is no exception. Here we combine RDB and AOF. The master is sometimes under heavy load, and an RDB snapshot may be taken only once every half hour. What if the master dies within that half hour? In that case we can turn to the slave.

So why combine RDB with AOF? With RDB alone, some data may be lost. With AOF alone, recovery is very slow. Some businesses say: I can tolerate losing some data, but recovery must be fast.

Therefore we combine the two, which gives fast recovery and also ensures the data can still be recovered even when both master and slave are down. We also keep a copy in another machine room, but only for core clusters. Normally the combination of the two is sufficient.

IV. Summary

In conclusion, we must go deep into the business and know where and how Redis is actually used. With that understanding we can choose simpler, more suitable technology for the architecture and automate it well. Operating and maintaining by piling on people is exhausting. With a good management platform and automation tools to assist us, we get twice the result with half the effort.

For future exploration, we will investigate some new technologies. Ideally, some simple automation would make it easier for DBAs to learn new technologies and solve other problems on the job.

Question and answer session

[Q1] Under this Redis architecture, is there one sentinel per machine room, or a group of sentinels?

A: If your company currently has only one machine room, the sentinels have to be placed in that room. But if your company has several machine rooms, like Qunar or Momo, you can put one sentinel in each. For example, to monitor one instance across five machine rooms, you place five sentinels, and the group of sentinels has five nodes.

[Follow-up] What if the sentinels are deployed in multiple machine rooms and a split brain occurs?

A: That is why we try to avoid this problem through the way the sentinels are deployed.

[Q2] If metadata is placed on ZooKeeper, should ZooKeeper be placed in the same machine room as Redis?

A: We suggest placing the nodes of a ZooKeeper cluster in different machine rooms. Problems across multiple machine rooms are possible, but we have never had any with this setup, and it is more sensitive than ordinary network monitoring.

[Q3] When I was doing Redis monitoring, I ran into this puzzle: the QPS of Redis differs from business to business. How can we monitor whether the Redis of a given business is abnormal?

A: Set different thresholds for each business.

[Follow-up] Designing it this way is a lot of trouble, and the threshold does not adjust when the business changes. Do you have any good ideas?

A: First, we can see every instance on the management platform, so we can set monitoring thresholds, including QPS, per instance. We can alarm at 20,000 or at 30,000. If the business traffic is fairly large, say 60,000, we set the alarm at 70,000. The granularity can be very fine, down to each instance. When building the cluster we already know roughly what the QPS of the business is. For example, if the QPS is 60,000, we may set up four nodes and use 60,000 as the initial alarm threshold, so that an alarm fires as soon as the business exceeds it. The alarm thresholds are thus set at initialization.

[Follow-up] So it is refined down to the instance level?

A: Yes, down to each instance. The monitored values are different for each instance.

[Q4] Could you tell us more about the remote disaster recovery in the slides?

A: For a core cluster, we build another set of slaves in a remote machine room.

[Follow-up] Is a notification needed if the local machine room goes down?

A: In that case Sentinel monitoring cannot be relied on. If the whole machine room is down, it has to be handled manually. Remote disaster recovery is for when the entire machine room is unusable, and I think that needs manual intervention.