Redis/MySQL double-write cache consistency

When it comes to keeping the cache consistent, should you update the cache after updating the database, or delete it? Or should you delete the cache first and then update the database? There is in fact considerable debate about this, and no single blog post dissects all the options thoroughly. So, with some trepidation and at the risk of being attacked, the blogger wrote this article.


Set an expiration time for cached data

As a reminder: in theory, setting an expiration time for cached data is a sufficient scheme for guaranteeing eventual consistency. Under this scheme, every cached entry carries an expiration time, and we perform both the database write and the cache operation on a best-effort basis. If the database write succeeds but the cache update fails, then once the entry expires, subsequent read requests naturally read the new value from the database and backfill the cache. The strategies discussed next, however, do not rely on setting an expiration time for the cache.
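The eventual-consistency guarantee described above can be sketched with a minimal in-memory cache whose entries expire. This is an illustrative toy, not the article's Redis setup: the class and field names (`TtlCache`, `nowMillis`) are my own, and the clock is advanced manually so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy cache with per-entry expiration, using a manually advanced clock.
class TtlCache {
    private static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> store = new HashMap<>();
    long nowMillis = 0; // injected clock, advanced by hand in the example

    void set(String key, String value, long ttlMillis) {
        store.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns null once the entry has expired, forcing the caller
    // to re-read the database and backfill the cache with the new value.
    String get(String key) {
        Entry e = store.get(key);
        if (e == null || e.expiresAt <= nowMillis) {
            store.remove(key);
            return null;
        }
        return e.value;
    }
}
```

Once the stale entry expires, the next read misses, goes to the database, and backfills the fresh value, which is exactly the eventual-consistency fallback described above.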

Here, we discuss three update strategies:

  1. Update the database first, then the cache
  2. Delete the cache first, then update the database
  3. Update the database first, then delete the cache

Update the database first, then the cache

This scheme is widely opposed. Why? There are two reasons:

  • Reason 1 (Thread-safety perspective)

(1) Thread A updates the database
(2) Thread B updates the database
(3) Thread B updates the cache
(4) Thread A updates the cache

Thread A should have updated the cache before thread B did, but because of network delays B's cache update arrives first. The cache ends up holding A's stale value while the database holds B's, leaving dirty data, so this scheme is rejected.
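The interleaving above can be replayed deterministically. In this sketch the database and the cache are plain maps and the four steps run in exactly the problematic order; all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the race: DB and cache as in-memory maps.
class UpdateCacheRace {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    static void run() {
        db.put("k", "A");    // (1) thread A updates the database
        db.put("k", "B");    // (2) thread B updates the database
        cache.put("k", "B"); // (3) thread B updates the cache
        cache.put("k", "A"); // (4) thread A's cache update arrives late
        // Result: the database holds B's value, the cache holds A's stale value.
    }
}
```

After `run()`, the database and cache disagree permanently, which is the dirty-data outcome the text describes.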

  • Reason 2 (business-scenario perspective)

(1) If your business writes to the database frequently but reads rarely, this scheme updates the cache over and over before the data is ever read, wasting performance.

(2) If the value written to the database is not stored in the cache directly, but only after a series of complex calculations, then recomputing that cached value after every database write wastes performance. Clearly, deleting the cache is a better fit here.

Delete the cache first, then update the database

This scheme can also lead to inconsistency. Suppose two concurrent requests: request A performs an update and request B performs a query. The following can happen:

(1) Request A performs a write operation and deletes the cache
(2) Request B queries and finds the cache missing
(3) Request B queries the database and gets the old value
(4) Request B writes the old value into the cache
(5) Request A writes the new value into the database

This leaves the cache holding stale data. Moreover, if no expiration policy is set on the cache, the data stays dirty forever.
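The delete-first race can likewise be replayed step by step. Again this is a toy with in-memory maps standing in for Redis and MySQL; the names are mine.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the delete-cache-first race.
class DeleteFirstRace {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    static void run() {
        db.put("k", "old");
        cache.put("k", "old");

        cache.remove("k");            // (1) A deletes the cache
        String hit = cache.get("k");  // (2) B queries: cache miss
        assert hit == null;
        String fromDb = db.get("k");  // (3) B reads the old value from the DB
        cache.put("k", fromDb);       // (4) B backfills the cache with the old value
        db.put("k", "new");           // (5) A writes the new value to the DB
        // Result: the cache holds "old" while the database holds "new".
    }
}
```

Without an expiration time, the stale `"old"` entry never goes away, which is why the delayed double-delete below is needed.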

So, how do we solve it? By adopting the delayed double-delete strategy.

Delayed double-delete for the cache

public class CacheServiceImpl implements ICacheService {

    @Resource
    private RedisOperator redisOperator;

    @Autowired
    private IShopService shopService;

    // Use delayed double-delete to keep the database and cache consistent
    @Override
    public void updateHotCount(String id) {
        try {
            // First delete: remove the cache entry
            redisOperator.del("redis_key_" + id);
            // Update the database
            shopService.updateHotShop();
            // Sleep for 1 second (longer than the read path takes)
            Thread.sleep(1000);
            // Second, delayed delete: remove any stale value a concurrent read backfilled
            redisOperator.del("redis_key_" + id);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public Integer getHotCount(String id) {
        return null;
    }
}

Explanation:

  1. Delete the cache
  2. Update the database
  3. Sleep for 1 second, then delete the cache again

Here the reader should measure how long the read-path business logic of their own project takes. The write request's sleep time is then that read-path duration plus a few hundred milliseconds. The purpose is to ensure the read request finishes first, so the write request can delete any dirty cache data that read request left behind.

What if the database has a read-write-splitting architecture? (The master handles writes, the slaves handle reads.)

Ok, in this case data inconsistency arises as follows. Again there are two requests: request A updates and request B queries.

(1) Request A performs a write: it deletes the cache and writes the new data to the master, but replication to the slave has not yet started

(2) (Within 1 second) Request B queries the cache, finds nothing, and queries the slave. Since master-slave replication has not completed, B reads the old value and writes it into the cache

(3) Replication completes, and the slave now holds the new value

The process above again produces inconsistent data, even with the delayed double-delete strategy, unless the sleep time is adjusted: base it on the master-slave replication lag and add a few hundred milliseconds.

What about the throughput lost to this synchronous eviction strategy?

Ok, then make the second deletion asynchronous: start a separate thread to perform it. The write request then no longer has to sleep before returning, which restores throughput.
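One way to sketch the asynchronous second delete is with a `ScheduledExecutorService`: the write path deletes, updates, schedules the second delete, and returns immediately. The map stands in for Redis, the `Runnable` for the database update, and `delayMillis` would be tuned to the read-path latency as described above; all names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Delayed double-delete with a non-blocking second delete.
class AsyncDoubleDelete {
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Daemon thread so the example JVM can exit normally.
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    static void update(String key, Runnable dbUpdate, long delayMillis) {
        cache.remove(key);  // first delete
        dbUpdate.run();     // update the database
        // Second delete runs later on the scheduler thread;
        // the caller returns without sleeping.
        scheduler.schedule(() -> cache.remove(key), delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Even if a racing read backfills a stale value between the two deletes, the scheduled delete eventually removes it, while the write request itself never blocks.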

What if the second delete fails?

This is a very good question, because if the second delete fails, the following can happen. Again there are two requests, request A updating and request B querying. For simplicity, assume a single database:

(1) Request A performs a write operation and deletes the cache
(2) Request B queries and finds the cache missing
(3) Request B queries the database and gets the old value
(4) Request B writes the old value into the cache
(5) Request A writes the new value into the database
(6) Request A attempts the second delete of the value B wrote into the cache, and the delete fails

In other words: if the second cache delete fails, the cache and the database become inconsistent again.

How to solve it?

For the concrete solution, see the analysis below of the "update the database first, then delete the cache" strategy.

Retry mechanism for cache deletion

Whether with delayed double-delete or Cache-Aside, the cache delete in the second step may fail, leaving the data inconsistent. We can optimize with this scheme: if the delete fails, delete the cache again; that is, introduce a retry mechanism for cache deletion.

  1. Update the database data
  2. The cache delete fails for whatever reason
  3. Send the key that needs to be deleted to a message queue
  4. Consume the message to obtain the key to delete
  5. Retry the delete until it succeeds
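The steps above can be sketched with the message queue replaced by an in-process `BlockingQueue` and a delete that can be forced to fail. This is a simulation under those assumptions, not a real MQ integration; all names are mine.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Cache-delete retry via an in-process queue standing in for the MQ.
class DeleteRetry {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final BlockingQueue<String> retryQueue = new LinkedBlockingQueue<>();
    static int failuresToSimulate = 0; // forces the next N deletes to fail

    static boolean tryDelete(String key) {
        if (failuresToSimulate > 0) { failuresToSimulate--; return false; }
        cache.remove(key);
        return true;
    }

    // Steps 1-3: update the DB; if the delete fails, enqueue the key.
    static void updateAndDelete(String key, Map<String, String> db, String newValue) {
        db.put(key, newValue);
        if (!tryDelete(key)) retryQueue.offer(key);
    }

    // Steps 4-5: the consumer pulls keys and retries the delete.
    // (A real consumer would loop forever and back off between attempts.)
    static void drainRetries() {
        String key;
        while ((key = retryQueue.poll()) != null) {
            if (!tryDelete(key)) retryQueue.offer(key);
        }
    }
}
```

The first delete fails and the key lands in the queue; the consumer's retry then removes the stale entry, restoring consistency.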

However, this scheme has the drawback of intruding heavily into the business code. That leads to scheme 2: start a subscriber that subscribes to the database's binlog to capture the data being modified. A separate, non-business program then receives the information from the subscriber and deletes the cache.

Read the binlog and delete the cache asynchronously

The process is as follows:

(1) Update the database data
(2) The database writes the change to its binlog
(3) A subscriber extracts the required data and key
(4) A separate, non-business program obtains this information
(5) It attempts to delete the cache, and the delete fails
(6) The information is sent to a message queue
(7) The data is pulled from the message queue again and the delete is retried
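The flow above can be sketched as follows. `BinlogEvent` is a hypothetical stand-in for whatever a binlog subscriber would actually deliver (canal's real API differs); only the key routing and the fallback to the retry queue are shown, and all names are mine.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Non-business-code component reacting to binlog change events.
class BinlogCacheInvalidator {
    static class BinlogEvent {
        final String table;
        final String primaryKey;
        BinlogEvent(String table, String primaryKey) {
            this.table = table;
            this.primaryKey = primaryKey;
        }
    }

    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final BlockingQueue<BinlogEvent> retryQueue = new LinkedBlockingQueue<>();
    static boolean deleteSucceeds = true; // flip to simulate step (5) failing

    static boolean tryDelete(String key) {
        if (!deleteSucceeds) return false;
        cache.remove(key);
        return true;
    }

    // Steps (3)-(6): derive the cache key from the changed row and delete it;
    // on failure, hand the event to the message queue for retry.
    static void onEvent(BinlogEvent e) {
        String key = "redis_key_" + e.table + "_" + e.primaryKey;
        if (!tryDelete(key)) retryQueue.offer(e);
    }
}
```

Because the invalidation is driven entirely by binlog events, the business code that writes to the database needs no cache logic at all, which is the main advantage over scheme 1.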

Note: for the binlog-subscription step above, MySQL has ready-made middleware called **canal** that handles subscribing to the binlog. For Oracle, the blogger is currently not aware of any off-the-shelf middleware. As for the retry mechanism, the blogger uses a message queue here; if the consistency requirement is not very high, you can instead start another thread in the program and retry at intervals. Adapt these freely: they are offered only as ideas.

In fact, this article is a summary of the consistency schemes already circulating on the Internet. As for the variant of the "delete the cache first, then update the database" strategy that maintains an in-memory queue, the blogger looked at it and found the implementation extremely complex and unnecessary, so it is not covered here. Finally, I hope you have learned something.

Source: www.jianshu.com/p/597150952…


My WeChat official account, Java Architect Advanced Programming, focuses on sharing practical Java content. Looking forward to your follow!