Welcome to follow github.com/hsfxuebao. I hope these notes help you; if you find them useful, please give the repo a Star.

1. Overview

1.1 What Is Cache Double-Write Consistency?

  • If Redis contains the data, its value must be the same as the value in the database
  • If Redis does not contain the data, the value in the database must be the latest value

1.2 Cache classification

  • Read-only cache
  • Read/write cache
    • Synchronous write-through strategy: data written to the database is also written to the cache at the same time; for a read/write cache, this is what keeps the cache consistent with the database
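The write-through idea for a read/write cache can be sketched in a few lines, with plain maps standing in for MySQL and Redis (all names here are illustrative, not a real client API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of synchronous write-through: every write updates the database
// and the cache in the same call, so the two never diverge.
public class WriteThrough {
    static final Map<String, String> db = new HashMap<>();     // stand-in for MySQL
    static final Map<String, String> cache = new HashMap<>();  // stand-in for Redis

    static void write(String key, String value) {
        db.put(key, value);      // write the database...
        cache.put(key, value);   // ...and synchronously write the cache too
    }

    public static void main(String[] args) {
        write("k", "v1");
        // cache and database agree after every write
        System.out.println(db.get("k").equals(cache.get("k")));  // prints true
    }
}
```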

2. Database and cache consistency update strategies

Setting an expiration time on cached entries is the fallback that guarantees eventual consistency.

We can set an expiration time on cached data, make every write go to the database, and update the cache on a best-effort basis. In other words, if the database write succeeds but the cache update fails, then once the expiration time is reached, subsequent read requests will naturally read the new value from the database and backfill the cache, restoring consistency. Remember that the value written to the MySQL database is the source of truth.

The approach above is the mainstream, mature one in practice, but given how much business systems differ between companies, it is not 100% correct in an absolute sense and cannot be guaranteed to suit every situation. In short, what we must achieve is eventual consistency.
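A minimal sketch of this fallback, with a hand-rolled TTL map standing in for Redis (the `SimpleCache` class and all names are invented for illustration): once the entry expires, the read path falls back to the database and backfills the new value.

```java
import java.util.HashMap;
import java.util.Map;

public class TtlFallback {
    // Tiny in-memory stand-in for Redis with per-key TTL (time passed in explicitly
    // so the behavior is deterministic).
    static class SimpleCache {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> expiresAt = new HashMap<>();

        void put(String key, String value, long ttlMillis, long now) {
            values.put(key, value);
            expiresAt.put(key, now + ttlMillis);
        }

        String get(String key, long now) {
            Long deadline = expiresAt.get(key);
            if (deadline == null || deadline <= now) {  // missing or expired
                values.remove(key);
                expiresAt.remove(key);
                return null;
            }
            return values.get(key);
        }
    }

    static final Map<String, String> database = new HashMap<>();  // stand-in for MySQL

    // Read path: try the cache first; on a miss, read the database and backfill.
    static String read(SimpleCache cache, String key, long now) {
        String cached = cache.get(key, now);
        if (cached != null) return cached;
        String fromDb = database.get(key);              // database is the source of truth
        if (fromDb != null) cache.put(key, fromDb, 60_000, now);
        return fromDb;
    }

    public static void main(String[] args) {
        SimpleCache cache = new SimpleCache();
        database.put("k", "v1");
        cache.put("k", "v1", 60_000, 0);

        database.put("k", "v2");             // DB write succeeds, cache update "fails"
        System.out.println(read(cache, "k", 0));       // still the stale v1 before expiry
        System.out.println(read(cache, "k", 60_001));  // expired: reads v2 and backfills
    }
}
```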

2.1 Update the database first, then update the cache

This scheme is widely rejected. Why? There are two reasons.

Reason one (thread safety): suppose requests A and B both perform update operations. The following interleaving can occur:

  • (1) Thread A updates the database
  • (2) Thread B updates the database
  • (3) Thread B updates the cache
  • (4) Thread A updates the cache

A's cache update should land before B's, but because of network delays B updates the cache before A does, so the cache ends up holding A's stale value. This produces dirty data, so the scheme is ruled out.
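The four-step interleaving above can be replayed deterministically; a sketch with plain maps standing in for MySQL and Redis (key and value names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Replaying the race in 2.1 step by step: the database ends up with B's value
// while the cache ends up with A's stale value.
public class UpdateCacheRace {
    static final Map<String, String> mysql = new HashMap<>();
    static final Map<String, String> redis = new HashMap<>();

    public static void main(String[] args) {
        // (1) Thread A updates the database
        mysql.put("stock", "A-value");
        // (2) Thread B updates the database (B's write is the later, correct one)
        mysql.put("stock", "B-value");
        // (3) Thread B updates the cache
        redis.put("stock", "B-value");
        // (4) Thread A's delayed cache update lands last, overwriting B's value
        redis.put("stock", "A-value");

        // The database holds B's value but the cache holds A's: dirty data.
        System.out.println("mysql=" + mysql.get("stock") + " redis=" + redis.get("stock"));
    }
}
```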

Reason two (business scenarios):

  • (1) If your workload writes to the database often but reads rarely, this scheme keeps updating the cache for data that may never be read, wasting performance.
  • (2) If the value written to the cache is not the raw database value but the result of a series of complex calculations, then recomputing that value after every database write is also a waste. Clearly, deleting the cache is the better fit in both cases.

Next comes the most debated question: should we delete the cache first and then update the database, or update the database first and then delete the cache?

2.2 Delete the cache first, then update the database

Why can this scheme lead to inconsistency? Suppose request A performs an update while request B concurrently performs a query. The following can happen:

  1. Request A starts a write: it deletes the cache and begins its work… A has not finished updating the database yet
  2. Request B starts: it queries Redis and finds the cache entry missing
  3. Request B queries MySQL and gets the old value
  4. Request B writes the old value into the Redis cache
  5. Request A writes the new value into the MySQL database

This can lead to inconsistencies between the database and the cache as follows:

| Time | Thread A | Thread B | Resulting problem |
| --- | --- | --- | --- |
| t1 | Starts a write: deletes the cache, work in progress… | | |
| t2 | | (1) Misses the cache, so immediately reads MySQL; because A has not finished updating MySQL, it reads the old value. (2) Writes the old value from MySQL back to Redis | Thread B writes the old value back to Redis, so other requests read the old value |
| t3 | Updates the MySQL value, done | | Redis holds the old value written back by B while MySQL holds the new value updated by A: the data is inconsistent |

Summary: after A deletes the cache but before A finishes updating the database, any request B that misses the cache will read the old value from MySQL and write it back to Redis, leaving the cache serving stale data.
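The t1–t3 sequence can be replayed deterministically; a sketch with plain maps standing in for MySQL and Redis (names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Delete-cache-then-update-database race: B's read slips in between A's
// cache delete and A's database write, and re-pollutes the cache.
public class DeleteThenUpdateRace {
    static final Map<String, String> mysql = new HashMap<>();
    static final Map<String, String> redis = new HashMap<>();

    public static void main(String[] args) {
        mysql.put("price", "old");
        redis.put("price", "old");

        // t1: request A deletes the cache, then starts its (slow) database update
        redis.remove("price");
        // t2: request B misses the cache and reads the not-yet-updated database...
        String seenByB = redis.containsKey("price") ? redis.get("price") : mysql.get("price");
        // ...then writes the old value back into Redis
        redis.put("price", seenByB);
        // t3: request A finally writes the new value to the database
        mysql.put("price", "new");

        // Redis keeps the old value B wrote back; MySQL has A's new value.
        System.out.println("mysql=" + mysql.get("price") + " redis=" + redis.get("price"));
    }
}
```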

So, how do we solve this? The pseudo-code for the delayed double-delete strategy is as follows:

public void write(String key, Object data) {
    // 1. Delete the cache first
    redis.delKey(key);
    // 2. Then write the database (same step as before)
    db.updateData(data);
    // 3. Sleep briefly so in-flight reads can finish, then delete the cache again
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    redis.delKey(key);
}

Questions about the delayed double-delete:

  1. How was this 1 second chosen? Exactly how long should we sleep?

For your own project, you need to measure how long the read-data business logic takes. The write request's sleep time is then that read duration plus a few hundred milliseconds. The goal is to ensure that any in-flight read request finishes first, so that the write request's second delete can remove the dirty cache data the read may have written back.

  2. What if you use MySQL's read/write-splitting (master/slave) architecture?

In this case the inconsistency again comes from two requests: request A updating and request B querying.

    1. Request A performs a write and deletes the cache
    2. Request A writes the data to the master database
    3. Request B queries the cache and finds no value
    4. Request B queries the slave; master/slave synchronization has not completed yet, so it gets the old value
    5. Request B writes the old value into the cache
    6. Master/slave synchronization completes and the slave gets the new value; this is the source of the inconsistency. The delayed double-delete strategy still applies.

The only change is the sleep time: base it on the master/slave synchronization delay, plus a few hundred milliseconds.

  3. What about the throughput this synchronous deletion strategy costs?

Make the second delete asynchronous: hand it off to a separate thread. That way the write request does not have to sleep before returning, which increases throughput.
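A sketch of this idea using a single-threaded scheduler for the second delete, with maps standing in for Redis and the database (the 200 ms delay and all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Asynchronous delayed double-delete: the caller never sleeps; the second
// delete runs later on a scheduler thread.
public class AsyncDoubleDelete {
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final Map<String, String> db = new ConcurrentHashMap<>();
    // daemon threads so the JVM can exit without an explicit shutdown
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    static void write(String key, String value) {
        redis.remove(key);   // first delete
        db.put(key, value);  // write the database
        // second delete happens later on another thread; this call returns immediately
        scheduler.schedule(() -> redis.remove(key), 200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        redis.put("k", "old");
        write("k", "new");
        redis.put("k", "old");  // a concurrent read re-pollutes the cache...
        Thread.sleep(500);      // ...but the delayed second delete clears it
        System.out.println(redis.get("k"));  // prints null
    }
}
```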

  4. What if the second delete fails?

This is a very good question, because if the second delete fails, the following can happen. Again there are two requests, request A updating and request B querying. For simplicity, assume a single database:

    1. Request A performs a write and deletes the cache
    2. Request B queries and finds the cache entry missing
    3. Request B queries the database and gets the old value
    4. Request B writes the old value into the cache
    5. Request A writes the new value into the database
    6. Request A tries to delete the cache entry written by request B, and the delete fails

In other words, if the second cache delete fails, the cache and the database become inconsistent again. How do we solve that? For a concrete solution, see the analysis of the strategy in 2.3 below.

2.3 Update the database first, then delete the cache

| Time | Thread A | Thread B | Resulting problem |
| --- | --- | --- | --- |
| t1 | Updates the value in the database, work in progress… | | |
| t2 | | Hits the cache immediately and reads the old cached value | A has not yet had time to delete the cached value, so B hits the stale entry |
| t3 | Deletes the cached value, done | | |

Summary: if the cache delete fails, or simply has not happened yet, a request that hits Redis in that window reads the old cached value.

2.3.1 Theoretical basis

The Cache-Aside pattern works as follows:

  1. Miss: the application first reads the cache; if the data is not there, it reads the database and puts the result into the cache.
  2. Hit: the application reads the data from the cache and returns it.
  3. Update: the application saves the data to the database first, and after that succeeds, invalidates the cache entry.
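The three cases can be sketched with maps standing in for the cache and the database (method and key names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Cache-Aside sketch: miss -> read DB and backfill; hit -> return from
// cache; update -> write DB first, then invalidate the cache entry.
public class CacheAside {
    static final Map<String, String> cache = new HashMap<>();
    static final Map<String, String> db = new HashMap<>();

    static String read(String key) {
        String v = cache.get(key);
        if (v != null) return v;            // hit: return the cached value
        v = db.get(key);                    // miss: fall through to the database
        if (v != null) cache.put(key, v);   // backfill so the next read hits
        return v;
    }

    static void update(String key, String value) {
        db.put(key, value);                 // write the database first
        cache.remove(key);                  // then invalidate the cache
    }

    public static void main(String[] args) {
        db.put("user:1", "Alice");
        System.out.println(read("user:1"));  // miss: prints Alice, backfills the cache
        update("user:1", "Bob");             // DB updated, cache invalidated
        System.out.println(read("user:1"));  // miss again: prints Bob
    }
}
```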

In addition, Facebook also adopted the update-database-then-delete-cache strategy, as described in the paper Scaling Memcache at Facebook.

Isn’t there a concurrency problem in this case?

There is. Suppose there are two requests, a query request A and an update request B. The following can happen:

(1) The cache entry has just expired

(2) Request A queries the database and gets the old value

(3) Request B writes the new value to the database

(4) Request B deletes the cache

(5) Request A writes the old value it read back into the cache

If this sequence occurs, dirty data does indeed result.

But what are the odds of that happening?

The precondition is that step (3)'s database write takes less time than step (2)'s read, so that step (4) can happen before step (5).

But think about it: a database write is normally much slower than a read, so step (3) finishing faster than step (2) is very unlikely in practice. Still, suppose someone insists on covering even this case; what then?
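For completeness, the five-step interleaving can also be replayed deterministically (maps stand in for MySQL and Redis; names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Replaying the rare race in 2.3: A's slow read completes only after B's entire
// write-and-invalidate, so A writes a stale value into the freshly cleared cache.
public class UpdateThenDeleteRace {
    static final Map<String, String> mysql = new HashMap<>();
    static final Map<String, String> redis = new HashMap<>();

    public static void main(String[] args) {
        mysql.put("k", "old");
        // (1) the cache entry has just expired, so A must read the database
        // (2) request A reads the old value from the database
        String seenByA = mysql.get("k");
        // (3) request B writes the new value to the database
        mysql.put("k", "new");
        // (4) request B deletes the (already empty) cache entry
        redis.remove("k");
        // (5) A's read finally completes and writes the stale value into the cache
        redis.put("k", seenByA);

        System.out.println("mysql=" + mysql.get("k") + " redis=" + redis.get("k"));
    }
}
```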

How to solve the above concurrency problem?

First, giving the cache an expiration time is one solution. Second, adopt the asynchronous delayed-delete strategy described in 2.2, which ensures the delete happens only after the read request has finished.

Are there any other reasons for the inconsistency?

Yes. Both strategy 2.2 and strategy 2.3 share this problem: what if the cache delete fails? For example, a request writes to the database successfully but the subsequent cache delete fails, leaving the two inconsistent. This is also the question left open at the end of 2.2.

How to solve it? Provide a guaranteed retry mechanism. Two schemes are presented here.

Scheme 1: retry the delete via a message queue. The process is as follows:

  1. Update the database;
  2. The cache delete fails for whatever reason;
  3. Send the key that needs deleting to a message queue;
  4. Consume the message and recover the key to delete;
  5. Retry the delete until it succeeds.

The drawback of this scheme is that it intrudes significantly into business code.
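A minimal sketch of this retry loop, using a `BlockingQueue` as a stand-in for the message queue and a deliberately flaky delete (all names and the two-failures setup are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Scheme 1 sketch: a failed cache delete enqueues the key; a consumer retries
// the delete, re-enqueuing on each failure, until it finally succeeds.
public class RetryDeleteViaQueue {
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final BlockingQueue<String> deleteQueue = new LinkedBlockingQueue<>();
    static final AtomicInteger attempts = new AtomicInteger();

    // Simulated flaky cache delete: fails on the first two attempts.
    static boolean tryDeleteFromCache(String key) {
        if (attempts.incrementAndGet() <= 2) return false;  // transient failure
        redis.remove(key);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        redis.put("k", "stale");

        // steps 1-3: database updated (elided); first delete fails, so enqueue the key
        if (!tryDeleteFromCache("k")) deleteQueue.put("k");

        // steps 4-5: consume the key and retry until the delete succeeds
        while (!deleteQueue.isEmpty()) {
            String key = deleteQueue.take();
            if (!tryDeleteFromCache(key)) deleteQueue.put(key);
        }
        System.out.println(redis.get("k"));  // null once the retry finally succeeds
    }
}
```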

This leads to Scheme 2: start a subscriber that subscribes to the database's binlog to capture the data that needs handling. In the application, start a separate program that receives this information from the subscriber and deletes the cache.

Scheme 2:



The process is as follows:

  1. Update the database;
  2. The database writes the operation to its binlog;
  3. A subscriber extracts the required key and data from the binlog;
  4. A separate, non-business program receives this information;
  5. If the cache delete fails,
  6. send the information to the message queue;
  7. Re-consume the data from the message queue and retry the delete.

Note: for subscribing to MySQL's binlog there is ready-made middleware called Canal, which implements the binlog-subscription function. For Oracle, I am not aware of comparable off-the-shelf middleware. As for the retry mechanism, the example uses a message queue; if the consistency requirement is not very high, you can instead start another thread in the program and retry at intervals. Adapt these ideas freely; they are only meant to provide a direction.
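Under the same stand-in assumptions, the whole Scheme 2 flow can be compressed into one runnable sketch, with queues playing the roles of the binlog stream and the retry message queue (Canal itself is not used here; the event format and names are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Scheme 2 sketch: a subscriber reads keys from the "binlog", tries the cache
// delete, and routes failures through a retry queue until they succeed.
public class BinlogDrivenInvalidation {
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final BlockingQueue<String> binlog = new LinkedBlockingQueue<>();      // step 2: DB writes ops here
    static final BlockingQueue<String> retryQueue = new LinkedBlockingQueue<>();  // step 6: failed deletes go here

    static boolean firstDeleteFails = true;

    static boolean tryDeleteFromCache(String key) {
        if (firstDeleteFails) { firstDeleteFails = false; return false; }  // simulated failure
        redis.remove(key);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        redis.put("user:1", "stale");
        binlog.put("user:1");                 // steps 1-2: the update lands in the binlog

        String key = binlog.take();           // steps 3-4: subscriber extracts the key
        if (!tryDeleteFromCache(key)) retryQueue.put(key);  // steps 5-6: failure goes to the queue

        while (!retryQueue.isEmpty()) {       // step 7: retry until the delete succeeds
            String k = retryQueue.take();
            if (!tryDeleteFromCache(k)) retryQueue.put(k);
        }
        System.out.println(redis.get("user:1"));  // null after the retry succeeds
    }
}
```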

3. Strong consistency schemes between Redis and MySQL

Compare the database's dbVersion with the cache's cacheVersion.

Solution 1: query the database twice and the cache once. The procedure is as follows:

  1. Query the cache (including cacheVersion)
  2. Query the database for the version only (an indexed lookup, high performance)
  3. Query the database for the required data (relatively low performance)
  4. Return the data

Solution 2: query the cache once and the database once. The procedure is as follows:

  1. Query the cache (including cacheVersion)
  2. Query the database for both the version and the data (relatively low performance)
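A sketch of Solution 2's version check, with maps standing in for the cache and the database (the `Versioned` holder and all names are invented for illustration): the cached value is trusted only when its cacheVersion is at least the dbVersion; otherwise the database wins and the cache is refreshed.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedRead {
    // Value plus its version, as both the cache and the database would store it.
    static class Versioned {
        final long version;
        final String data;
        Versioned(long version, String data) { this.version = version; this.data = data; }
    }

    static final Map<String, Versioned> cache = new HashMap<>();
    static final Map<String, Versioned> db = new HashMap<>();

    static String read(String key) {
        Versioned cached = cache.get(key);   // 1. query the cache (with cacheVersion)
        Versioned fromDb = db.get(key);      // 2. query the DB (version and data together)
        if (cached != null && fromDb != null && cached.version >= fromDb.version) {
            return cached.data;              // cache is current: serve it
        }
        if (fromDb != null) cache.put(key, fromDb);  // stale or missing: refresh the cache
        return fromDb == null ? null : fromDb.data;
    }

    public static void main(String[] args) {
        db.put("k", new Versioned(2, "new"));
        cache.put("k", new Versioned(1, "old"));      // cacheVersion lags dbVersion
        System.out.println(read("k"));                // prints new: DB wins, cache refreshed
        System.out.println(cache.get("k").version);   // prints 2
    }
}
```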

The above two schemes are for your reference only, and you need to choose the appropriate scheme according to the actual situation.
