Reposted from maimai.cn/article/det…

Why write this article?

Caching is widely used in projects because it delivers high concurrency and high performance. On the read side there is little debate: business logic follows the flow below.

When it comes to updating, though, after updating the database, should you update the cache or delete it? Or should you delete the cache first and then update the database? This is genuinely contested, and there is currently no comprehensive blog post that dissects these options.

Main text

One reminder up front: in theory, setting an expiration time on the cache is the guaranteed way to achieve eventual consistency. Under that scheme, we set an expiration time on every cached entry, always write to the database, and update the cache on a best-effort basis. That is, if the database write succeeds but the cache update fails, then once the entry expires, subsequent read requests will naturally read the new value from the database and backfill the cache. The strategies discussed below, however, do not rely on setting a cache expiration time.

Here, we discuss three update strategies:

  1. Update the database first, then the cache
  2. Delete the cache first, then update the database
  3. Update the database first, then delete the cache

No one asked me why "update the cache first, then update the database" is not on the list.

(1) Update the database first, then update the cache

This scheme is widely opposed. Why? There are two reasons.

Reason 1 (Thread-safety perspective)

Suppose requests A and B both perform an update. The following interleaving can occur:

(1) Thread A updates database

(2) Thread B updates database

(3) Thread B updates the cache

(4) Thread A updates the cache

A should have updated the cache before B, but because of network delays B updates the cache first. The cache then ends up holding A's value while the database holds B's, i.e. dirty data, so this scheme is rejected.
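The four-step interleaving above can be replayed deterministically. Below is a minimal sketch, assuming plain `HashMap`s stand in for the real database and cache; the class and method names are illustrative, not from any real API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-ins for the database and the cache, used only
// to replay the four-step interleaving of strategy (1) deterministically.
public class UpdateRaceDemo {
    static Map<String, String> db = new HashMap<>();
    static Map<String, String> cache = new HashMap<>();

    static String replay() {
        String key = "stock";
        db.put(key, "A-new");     // (1) thread A updates the database
        db.put(key, "B-new");     // (2) thread B updates the database
        cache.put(key, "B-new");  // (3) thread B updates the cache
        cache.put(key, "A-new");  // (4) thread A updates the cache (late, due to the network)
        // The cache now holds A's value while the database holds B's: dirty data.
        return cache.get(key) + "/" + db.get(key);
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```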

Reason 2 (business scenario)

There are two points:

(1) If your business has many database writes and few reads, this scheme updates the cache frequently even though the data may never be read before it changes again, wasting performance.

(2) If the value you write to the database is not placed in the cache directly, but only after a series of complex computations, then recomputing the cached value after every database write wastes performance. Clearly, deleting the cache is a better fit here.

Next comes the most controversial question: should you delete the cache before updating the database, or update the database first and then delete the cache?

(2) Delete the cache first, then update the database

Here is why this scheme can produce inconsistency. Suppose request A performs an update while request B concurrently performs a query. The following can occur:

(1) Request A performs the write operation and deletes the cache

(2) Request B queries and finds the cache empty

(3) Request B queries the database and gets the old value

(4) Request B writes the old value into the cache

(5) Request A writes the new value into the database

This leads to inconsistency. Moreover, if you do not set an expiration time on the cache, the data stays dirty forever.
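To see the resulting state concretely, here is a sketch that replays the five steps above, with in-memory maps standing in for the cache and the database (all names here are hypothetical stand-ins):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-ins; replays the five-step interleaving of
// "delete the cache first, then update the database" deterministically.
public class DeleteFirstRaceDemo {
    static Map<String, String> db = new HashMap<>();
    static Map<String, String> cache = new HashMap<>();

    static String replay() {
        String key = "price";
        db.put(key, "old");
        cache.put(key, "old");

        cache.remove(key);            // (1) A deletes the cache
        // (2) B queries the cache: miss (cache.get(key) == null)
        String fromDb = db.get(key);  // (3) B reads the database: old value
        cache.put(key, fromDb);       // (4) B backfills the cache with the old value
        db.put(key, "new");           // (5) A writes the new value to the database

        // Cache holds "old", database holds "new": inconsistent.
        return cache.get(key) + "/" + db.get(key);
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```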

So how do we solve it? Adopt the delayed double-delete strategy.

The pseudocode is as follows

    public void write(String key, Object data) throws InterruptedException {
        redis.delKey(key);     // (1) delete the cache first
        db.updateData(data);   // (2) then write the database
        Thread.sleep(1000);    // (3) sleep for 1 second...
        redis.delKey(key);     // ...then delete the cache again
    }

In plain words:

(1) Delete the cache first

(2) Then write the database (these two steps are the same as before)

(3) Sleep for 1 second, then delete the cache again

This deletes any dirty cache data produced within that 1 second.

So how is this one second determined? How long should you sleep?

Readers should measure how long their project's read-data business logic takes, then set the write path's sleep time to that read duration plus a few hundred milliseconds. This ensures the read request has finished, so the write request can delete any dirty cache data the read request left behind.

What if you use MySQL's read/write-splitting architecture?

In that case, inconsistency arises as follows. Again there are two requests: request A performs an update and request B performs a query.

(1) Request A performs the write operation and deletes the cache

(2) Request A writes the data to the master database

(3) Request B queries the cache and finds no value

(4) Request B queries the slave database; master/slave synchronization has not finished yet, so it gets the old value

(5) Request B writes the old value into the cache

(6) Master/slave synchronization completes, and the slave now holds the new value

This is why the data becomes inconsistent. The delayed double-delete strategy still works; just change the sleep time to the master/slave synchronization delay plus a few hundred milliseconds.

Doesn't this synchronous deletion strategy reduce throughput?

Yes, so make the second deletion asynchronous: start a separate thread to perform it. The write request then returns without sleeping, which raises throughput.
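A minimal sketch of the asynchronous second delete, assuming a `ScheduledExecutorService` performs the delayed delete and `ConcurrentHashMap`s stand in for Redis and the database (the class and method names are illustrative, not from the article):

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch: the write path deletes the cache, updates the database, and
// schedules the second delete asynchronously instead of sleeping inline.
public class AsyncDoubleDelete {
    static Map<String, String> cache = new ConcurrentHashMap<>();
    static Map<String, String> db = new ConcurrentHashMap<>();
    static ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    static ScheduledFuture<?> write(String key, String value, long delayMillis) {
        cache.remove(key);   // first delete
        db.put(key, value);  // update the database
        // second delete runs later on the scheduler thread; the caller returns at once
        return scheduler.schedule(() -> { cache.remove(key); },
                delayMillis, TimeUnit.MILLISECONDS);
    }

    // Demo helper: waits for the scheduled delete only so the result can be checked.
    static boolean demo(long delayMillis) {
        try {
            cache.put("k", "old");
            write("k", "new", delayMillis).get();
            return !cache.containsKey("k") && "new".equals(db.get("k"));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(100));
        scheduler.shutdown();
    }
}
```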

What if the second deletion fails?

This is a very good question, because if the second delete fails, the following happens. Again there are two requests: request A performs an update and request B performs a query. For simplicity, assume a single database:

(1) Request A performs the write operation and deletes the cache

(2) Request B queries and finds the cache empty

(3) Request B queries the database and gets the old value

(4) Request B writes the old value into the cache

(5) Request A writes the new value into the database

(6) Request A tries to delete the cache value that request B wrote, but the deletion fails

That means if the second cache deletion fails, the cache and database become inconsistent again.

How to solve it?

For the solution, see the analysis under update strategy (3).

(3) Update the database first, then delete the cache

First, a word about the Cache-Aside pattern. Its key points are:

  • Miss: the application fetches data from the cache; if it is not there, it fetches the data from the database and puts it into the cache.
  • Hit: the application fetches data from the cache and returns it.
  • Update: save the data to the database first, then invalidate the cache once that succeeds.
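The three points above can be sketched as follows, with in-memory maps standing in for the cache and the database (names are illustrative, not a real client API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Cache-Aside sketch. Hypothetical in-memory maps replace the real
// cache (e.g. Redis) and database.
public class CacheAside {
    static Map<String, String> cache = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    // Hit/miss path: read the cache first; on a miss, read the database and backfill.
    static String read(String key) {
        String v = cache.get(key);
        if (v == null) {                  // miss
            v = db.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Update path: write the database first, then invalidate (delete) the cache entry.
    static void update(String key, String value) {
        db.put(key, value);
        cache.remove(key);
    }

    public static void main(String[] args) {
        db.put("user:1", "Alice");
        System.out.println(read("user:1"));  // miss, backfills the cache
        update("user:1", "Bob");             // db updated, cache invalidated
        System.out.println(read("user:1"));  // miss again, reads the new value
    }
}
```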

In addition, Facebook also uses the strategy of updating the database first and then deleting the cache, as described in the paper Scaling Memcache at Facebook.

Isn’t there a concurrency problem in this case?

There is. Suppose there are two requests: request A performs a query and request B performs an update. The following can occur:

(1) The cache has just expired

(2) Request A queries the database and gets the old value

(3) Request B writes the new value to the database

(4) Request B deletes the cache

(5) Request A writes the old value it read into the cache

If this happens, dirty data results.

But what are the odds of that happening?

The precondition is that the write in step (3) takes less time than the read in step (2), so that step (4) can happen before step (5). But think about it: database reads are generally much faster than writes, so step (3) completing before step (2) is hard to arrange, and this situation rarely occurs.

But suppose someone insists on being thorough and this case absolutely must be handled. What then?

How to solve the above concurrency problem?

First, giving the cache an expiration time is one solution. Second, adopt the asynchronous delayed-deletion strategy from strategy (2), ensuring the read request finishes before the delete is performed.

Are there any other reasons for the inconsistency?

Yes, and it affects both cache update strategy (2) and strategy (3): what if the cache deletion itself fails? For example, a request updates the database successfully but then fails to delete the cache, which causes inconsistency. This is also the question left open at the end of strategy (2).

How to solve it?

Just provide a guaranteed retry mechanism. Two schemes are presented here.

Scheme 1:

As shown in the figure below, the process is:

(1) Update database data;

(2) The cache deletion fails due to some problem

(3) Send the key to be deleted to the message queue

(4) Consume the message and obtain the key to be deleted

(5) Retry the deletion until the deletion succeeds

However, this scheme has the disadvantage of intruding heavily into the business code. Hence scheme 2: start a subscriber that subscribes to the database's binlog to obtain the data being operated on, and run a separate program that takes this information from the subscriber and deletes the cache.

Scheme 2:

The process is as follows:

(1) Update the database data

(2) The database writes the operation to its binlog

(3) The subscriber extracts the key and data it needs

(4) A separate, non-business program obtains this information

(5) It tries to delete the cache, but the deletion fails

(6) The key is sent to the message queue

(7) The key is pulled from the message queue and the deletion is retried

Note: for MySQL, the binlog subscription above can be handled by the off-the-shelf middleware Canal. For Oracle, I am not currently aware of ready-made middleware. As for the retry mechanism, I use a message queue here; if consistency requirements are not strict, you can simply start another thread in the program and retry periodically. Adapt this freely; it is just one idea.