Original: Taste of Little Sister (wechat official ID: XjjDog), welcome to share, please reserve the source.

The data of the system is the life of the company. Even if it’s dog shit, we freeze it and freeze it for later use. The product design of garbage is more confusing, and people will take excrement out of the freezer from time to time, trying to taste the residual taste.

But there are some valuable needs. In this case, it is necessary to separate hot and cold data, to isolate the data. Not to let a rat excrement, bad pot of porridge.

What XJjDog is sharing today is a very common hot and cold separation feature. There are many solutions, but just the most common ones.

In the end, within the constraints of RDS, you had to choose a solution that wasn’t the most beautiful. This proves from the side that the wife is not the most beautiful good, to the most suitable to be happy and complete.

Problem scenario

With the development of business, database growth is fast. The boss does not understand the truth among them, but as the maintainer of the database, but see the fear tremble.

Finally, the database is slowly approaching the number of bottlenecks, and administrators are increasingly anxious.

Use a partition table. No. As mentioned above, some requests to dig ancestral graves will load some data from a long time ago, and partition tables will not solve the problem.

Obviously, we need to do a little bit of data slicing, hot and cold separation.

The general structure is shown in the figure above. We have a data routing, which is responsible for distinguishing data according to the time dimension and locating it to the corresponding database for query.

Hot storage and cold storage can be heterogeneous.

solution

The problem has been transformed. Our next goal becomes how to construct a separation of hot and cold data in terms of time dimensions.

By far the most used database is mysql, which is where we started.

In fact, the two pieces of data separated by hot and cold, query the “latest time” data, there is no difference. The only difference is that hot repositories periodically delete old data.

Double write

Double-writing is the simplest, but also the least reliable solution. The structure is shown below.

However, note that steps 1 and 2, which involve distributed transactions, need to ensure that both libraries are written successfully.

That makes things a little bit more difficult. As a man who has suffered numerous business problems, he will not repeat such mistakes.

So, in this case, just pass.

Walk the message

Careful students should have noticed that the optimization point above is that by introducing something called message queues, the mountain of distributed transactions can be circumvated, with only final consistency guaranteed.

What a beautiful idea. Ideal is full, reality is very skinny. Because the separation of hot and cold involves a large number of data tables, the need to modify unpredictable business code was met with unanimous opposition.

The scheme went nowhere.

Look at the picture. It’s just two different lines.

Use the binlog

For those of you who already have a problem, why not use binlog? So let’s talk about that.

Admittedly, this is a very elegant approach. The data only needs to be written to the hot storage. Incrementally, the data is written to the cold storage by subscribing to the data.

But wait. Our scheduled task, when deleting data, also generates a binlog. How can I distinguish whether data deletion is caused by scheduled tasks or normal services?

Fortunately, XJjDog knows a very stealthy way to do it.

Yeah, yeah, yeah, that’s the process.

set session sql_log_bin=0;
//opt
set session sql_log_bin=1;
Copy the code

Binlog Can be set to the session-level, that is, statements that operate in this session will not generate a binlog.

In this way, we disable the binlog for the scheduled task, execute the delete statement, and then restore the binlog. The deleted data will not be synchronized to the cold storage through CANAL.

To my surprise

mmp?

Why not? Why is that? Let me take a cautious guess. Your RDS may be sharing an instance with someone else.

In addition to the LIMITATIONS of RDS, there is a bug with this solution. Like when there’s a separation of hot and cold in the heat chamber. Think about why.

Mark clear

Come on, Xjjdog can only save the country. Do the fuckin ‘thing the best way possible.

Flag clearance. These four bold words remind people of the JVM’s garbage collection algorithm.

In fact, the principle is similar, the steps are divided in two.

First, marking stage

For each table, add a field called mark2Del. Then, through timing, mark all data to expire (that is, to be put into the cold storage).

Second, the clearance stage

When the next time comes, the data marked to be deleted will be moved to the cold storage one by one. After the relocation, the next round of marking.

The scheme is simple, but it has one Achilles’ heel. Since all library tables are old tables, we need to add a field called mark2Del, which is very troublesome.

However, the above introduction only addresses data deletion, not data synchronization.

Final plan

Combine the above description with environmental constraints. We chose to use the binlog+ flag for cleanup.

Tag clearance is responsible for deleting data.

Binlog is responsible for incremental data synchronization. However, in this synchronization logic, there is an additional judgment that if the value of mark2Del is set to true, the binlog is ignored.

In other words, we impose a judgment flag on each deleted record.

Finally, the system works.

End

Mysql to mysql is hot and cold.

But if I want to do a tiered data warehouse.

The first layer is the heat storage.

The second floor is the cold storage.

And then the third layer, you have archive libraries, maybe big data stores like Druid.

How to design?

This article does not cover too much. The difficulty in architecture is not the result, but the process.

There’s always a story behind a plan that looks bad to you, and it helps to try to understand it.

Unless it’s really lame. But isn’t that your chance, too?

Xjjdog is a public account that doesn’t allow programmers to get sidetracked. Focus on infrastructure and Linux. Ten years architecture, ten billion daily flow, and you discuss the world of high concurrency, give you a different taste. My personal wechat xjjdog0, welcome to add friends, further communication.