How to ensure data consistency between the cache and the database under dual writes?

While optimizing a system I hit on the idea of storing data in tiers. Not all data has the same freshness requirements: configuration information, for example, usually goes a long time between changes, while order and payment-flow data must be as fresh as possible. So I divided the data into three levels according to how real-time it needs to be.

  • Level 1: order data and payment flow data. The real-time and accuracy requirements here are the highest, so no cache is used at all; reads and writes go straight to the database.

  • Level 2: user-related data. This data is read far more often than it is written, so we cache it in Redis.

  • Level 3: payment configuration. This data is user-independent, small in volume, read frequently, and almost never modified, so we cache it in local memory (a minimal sketch follows this list).
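To make Level 3 concrete, here is a minimal sketch of a process-local cache with a time-based refresh, assuming Python; `load_pay_config_from_db` is a hypothetical loader that reads the configuration from the database:

```python
import time

class LocalConfigCache:
    """Process-local cache for Level 3 data: small, read often, rarely changed."""

    def __init__(self, loader, ttl_seconds=300):
        self._loader = loader      # hypothetical callable that reads config from the DB
        self._ttl = ttl_seconds    # tolerate staleness up to this many seconds
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = time.time()
        if self._value is None or now >= self._expires_at:
            self._value = self._loader()        # refresh from the database
            self._expires_at = now + self._ttl
        return self._value

# Usage:
# config_cache = LocalConfigCache(load_pay_config_from_db, ttl_seconds=600)
# pay_config = config_cache.get()
```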

However, as soon as a cache is involved, whether local memory or Redis, there is a synchronization problem: the cache is unaware of changes in the database, so cached data can drift out of sync with the database. The rest of this article discusses how to keep the cache and the database consistent under dual writes.

The solutions

Let’s list all the candidate strategies here, then discuss their pros and cons.

  1. Update the database first, then update the cache
  2. Update the cache first, then update the database
  3. Delete the cache first, then update the database
  4. Update the database first, then delete the cache

Update the database first, then the cache

This approach is generally not used, mainly because of the cache-update step. Why? Some cached values are not read straight from the database; they are derived through a series of computations, so refreshing the cache on every write can be very expensive. If the database receives many writes but few reads, updating the cache for each write wastes a great deal of work.

For example, suppose a value in the database is updated ten times in a row with no read arriving in between. If we update the cache on every write, the cache is refreshed ten times, filling it with cold data that nobody reads. If we delete the cache instead, it is repopulated only once, when the next read actually arrives.
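The contrast is easy to show in code. Below is a hedged cache-aside sketch in which `db_write`, `db_read`, and `expensive_derive` are hypothetical stand-ins for the database access and the costly derivation:

```python
import redis

r = redis.Redis()

def write_then_update_cache(key, new_value):
    db_write(key, new_value)                  # hypothetical database write
    r.set(key, expensive_derive(new_value))   # derivation cost paid on EVERY write

def write_then_delete_cache(key, new_value):
    db_write(key, new_value)                  # hypothetical database write
    r.delete(key)                             # cheap; stale entry simply evicted

def read(key):
    cached = r.get(key)
    if cached is not None:
        return cached
    value = expensive_derive(db_read(key))    # derivation cost paid once, on demand
    r.set(key, value)
    return value
```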

Update the cache first, then the database

This case needs little discussion: it has the same drawback as the first, plus an extra one. If the database update fails after the cache has already been updated, the cache holds a value that never reached the database.

Delete the cache first, then update the database

This scheme can also go wrong, for the following reason.

(Figure: delete the cache first, then update the database)

Suppose two requests come in: request A (an update) and request B (a query).

  1. Request A deletes the data in Redis, then goes to the database to perform the update
  2. Request B queries Redis, finds the data missing, so it reads the value from the database and writes it back into Redis
  3. But request A’s update has not completed yet (or its transaction has not committed), so request B has backfilled the old value

At this point the data in the database and in Redis are inconsistent. How do we solve it? The simplest remedy is the delayed double delete strategy.

(Figure: delayed double delete)

With delayed double delete, the writer deletes the cache, updates the database, waits briefly (long enough for any concurrent read to finish backfilling the cache), and then deletes the cache a second time to evict the stale value that request B may have written back.
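A minimal sketch of the idea, assuming redis-py; `db_update_user` is a hypothetical database helper, and the half-second delay is an assumption you would tune to cover one read-plus-backfill round trip:

```python
import threading
import time
import redis

r = redis.Redis()
DOUBLE_DELETE_DELAY = 0.5   # assumed; must exceed a read + cache-backfill round trip

def update_user(user_id, new_profile):
    key = f"user:{user_id}"
    r.delete(key)                          # first delete, before the DB write
    db_update_user(user_id, new_profile)   # hypothetical update; commits here

    def delete_again():
        time.sleep(DOUBLE_DELETE_DELAY)    # let any in-flight read finish backfilling
        r.delete(key)                      # second delete evicts the stale backfill

    threading.Thread(target=delete_again, daemon=True).start()
```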

Even deleting the cache after the transaction commits is not bulletproof. If you use MySQL’s read-write splitting architecture, there is a time lag while the master synchronizes changes to the slaves.

(Figure: time lag in master-slave synchronization)

Again, two requests come in: request A (an update) and request B (a query).

  1. Request A performs its update and deletes the key in Redis
  2. Request A updates the master database, which begins syncing the change to the slave
  3. Request B queries and finds no data in Redis
  4. Request B reads the data from the slave database
  5. Synchronization has not finished yet, so request B gets old data and backfills it into Redis

The solution is to force any query that will backfill Redis to read from the master database rather than a slave.

(Figure: fetch data from the master library)
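A hedged sketch of that read path, assuming redis-py; `query_user_from_master` is a hypothetical helper pinned to the master connection, and the one-hour TTL is an assumed safety net:

```python
import json
import redis

r = redis.Redis()
CACHE_TTL = 3600   # assumed expiry so a bad backfill cannot live forever

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: this read will backfill Redis, so it must NOT hit a
    # lagging slave. Route it to the master explicitly.
    row = query_user_from_master(user_id)   # hypothetical master-pinned query
    r.setex(key, CACHE_TTL, json.dumps(row))
    return row
```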

Update the database first, then delete the cache

Problem: this approach can fail too. If the database update succeeds but the subsequent cache delete fails, every later read keeps returning the stale cached value.

(Figure: update the database first, then delete the cache)

The solution here is to compensate for the failed delete with a message queue. The flow is as follows:

  1. Request A updates the database
  2. The subsequent Redis delete fails with an error
  3. The Redis key is sent to a message queue as the message body
  4. A consumer receives the message and deletes the key from Redis again (see the sketch below)
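A minimal sketch of this compensation flow, assuming RabbitMQ via the pika client; `db_update_user`, the key scheme, and the queue name are illustrative assumptions:

```python
import pika
import redis

r = redis.Redis()

def publish_delete_retry(key):
    # "cache-delete-retry" is an assumed queue name; any durable MQ would do.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="cache-delete-retry", durable=True)
    channel.basic_publish(exchange="", routing_key="cache-delete-retry", body=key)
    conn.close()

def update_user(user_id, fields):
    db_update_user(user_id, fields)   # hypothetical: step 1, the DB update succeeds
    key = f"user:{user_id}"
    try:
        r.delete(key)                 # step 2: this delete may fail...
    except redis.RedisError:
        publish_delete_retry(key)     # step 3: ...so hand the key to the queue

# Consumer side (step 4): retry the delete, acking only after it succeeds.
def on_retry_message(ch, method, properties, body):
    r.delete(body.decode())
    ch.basic_ack(delivery_tag=method.delivery_tag)
```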

One disadvantage of this scheme is that it intrudes heavily into the business code and couples it tightly to the queue, so there is an optimized variant. Every MySQL update is recorded in the binlog, so we can subscribe to the MySQL binlog and drive the cache operations from there instead.

(Figure: delete the cache by subscribing to the binlog)
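A minimal sketch of the binlog-driven delete, using the third-party python-mysql-replication package (in Java shops this role is commonly played by Alibaba’s Canal); the host, credentials, schema, and table names here are assumptions:

```python
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)

r = redis.Redis()

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,                  # must be unique among replication clients
    only_schemas=["pay"],           # assumed schema name
    only_tables=["user_profile"],   # assumed table name
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,
    resume_stream=True,
)

for event in stream:                # one event per committed row change
    for row in event.rows:
        # Updates carry "after_values"; inserts and deletes carry "values".
        values = row.get("after_values") or row.get("values")
        r.delete(f"user:{values['id']}")   # evict the cached entry for this row
```

Because the deletes are driven by the binlog, the business code never touches the cache-invalidation path; the trade-off, as noted above, is running and monitoring one more component.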

Conclusion

Every solution has trade-offs. Take the delete-the-cache-first scheme combined with forcing cache-populating queries to the master library: it solves the consistency problem, but that logic seeps into a lot of business code; on the other hand it adds no new system and does not raise the overall architectural complexity. The last solution, subscribing to the binlog, moves the Redis handling into an independent system, and its downside is precisely that added system complexity. Each choice has to be weighed against your own business; no technique suits every business. There is no best solution, only the one that fits you.