
In daily work we often use a cache to improve the performance of our programs, but introducing a cache creates a new problem: database-cache double-write consistency. When we update data, what should we do to keep the data in the database and the cache consistent?

1 Introduction

In daily development, slow interface responses are often caused by database reads and writes. To optimize these interfaces, we usually write database data into a cache so that the interface can fetch data directly from the cache, which greatly improves access speed. The problem is that when we update data in the database, we must also either update the corresponding data in the cache or delete it, so that the next read goes back to the database and reloads the cache. If concurrent requests arrive during this window, the database and the cache can easily become inconsistent. That is the subject of this article: how to deal with database-cache double-write consistency.

By consistency here we mean eventual consistency, not strong consistency.

2 Recommended Cache Aside Pattern

Hit: the program reads data from the cache and the data is there.

Miss (invalidation): the program reads from the cache but the data is not there, for example because the entry has expired. The program then reads the data from the database and flushes it into the cache.

Update: Update the database first, then delete the cache
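Below is a minimal sketch of these read and update paths, assuming Jedis as the Redis client and a hypothetical UserDao interface standing in for the database layer; the key prefix and the 600-second TTL are illustrative, not prescribed by the pattern.

```java
import redis.clients.jedis.Jedis;

public class UserCacheService {

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed Redis instance
    private final UserDao userDao;                            // hypothetical DAO for the database

    public UserCacheService(UserDao userDao) {
        this.userDao = userDao;
    }

    // Read path: hit -> return the cached value; miss -> read the database, then flush into the cache
    public String getUser(String userId) {
        String cacheKey = "user:" + userId;
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            return cached;                        // cache hit
        }
        String fromDb = userDao.findById(userId); // cache miss: read the database
        if (fromDb != null) {
            jedis.setex(cacheKey, 600, fromDb);   // write back with a TTL (600s is arbitrary)
        }
        return fromDb;
    }

    // Update path: update the database first, then delete the cache entry
    public void updateUser(String userId, String newValue) {
        userDao.update(userId, newValue);         // 1. update the database
        jedis.del("user:" + userId);              // 2. delete the cache so the next read reloads it
    }

    // Hypothetical DAO interface, included only to make the sketch self-contained
    public interface UserDao {
        String findById(String userId);
        void update(String userId, String newValue);
    }
}
```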

3 Update Scheme

  • Update the cache

    • Update the cache first, then the database
    • Update the database first, then the cache
  • Delete the cache

    • Delete the cache first, then update the database
    • Update the database first, then delete the cache (common)

4 Choosing an Update Scheme

4.1 Update the cache first, then the database

As shown in the figure below, suppose thread A updates the cached value of A and then stalls before it reaches the database. Thread B then runs a complete update: it overwrites thread A's cache update and also writes its own value for A to the database. When thread A resumes and updates the database, it overwrites thread B's database write. The result is a serious double-write inconsistency: the cache holds B's value while the database holds A's, so every subsequent read of this data from the cache is wrong. The database data matters more to us, because persistence normally relies on the database rather than the cache; if we update the cache first and the process crashes before the database write, the database is never updated at all. So this scheme must not be chosen.

4.2 Update the database first, then the cache

As shown in the figure below, suppose thread A updates the database value of A and then stalls before it updates the cache. Thread B then runs a complete update: it overwrites thread A's database write and also updates the cache with its own value. When thread A resumes and updates the cache, it overwrites thread B's cache update. Again we have a serious double-write inconsistency: the database holds B's value while the cache holds A's, and every subsequent read of this data from the cache is wrong.

As with the previous scheme, any approach that updates the cache on writes can suffer from this kind of serious double-write inconsistency, so we generally do not update the cache at all and choose to delete it instead.

4.3 Delete the cache first, then update the database

As shown in the figure below, thread A deletes the cached value of A and then stalls before updating the database. Thread B reads A, finds nothing in the cache, loads the old value of A from the database, and writes it back into the cache. When thread A resumes, it writes the new value of A to the database. The cache now holds the old value while the database holds the new one, which is again a serious double-write inconsistency: every subsequent read of this data from the cache is wrong.

Therefore, deleting the cache first is not recommended.

4.4 Update the database first, then delete the cache (common)

We therefore choose to update the database first and then delete the cache. Of course, there is still a small chance of inconsistency: if the cache entry has just expired, a read thread can load the old value from the database, and while it is doing so a write thread updates the database and deletes the (already empty) cache; when the read thread finally writes the old value back into the cache, we get the same kind of problem as deleting the cache first, as illustrated in the diagram below:

Under normal circumstances, however, this does not happen; the probability of the situation above is particularly low, because it requires the read thread's database read plus cache write to take longer than the write thread's database update plus cache delete, and database writes are normally slower than reads. To cover it anyway, you can use delayed double deletion: delete the cache once, let the writing thread wait a short while, and then delete the cache again.

So this scheme leaves only the situation described above. If you want to rule out even that and never read stale data, the only option is locking, which prevents dirty reads at the cost of extra complexity.
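Here is a minimal sketch of the write path with delayed double deletion, reusing the hypothetical UserDao from the earlier sketch; the 500 ms delay is an illustrative value that should be tuned to cover your read latency (and replica lag, if reads go to a replica).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class DelayedDoubleDeleteWriter {

    // Pooled client, because the second deletion runs on a different thread
    private final JedisPool jedisPool = new JedisPool("localhost", 6379);
    private final UserCacheService.UserDao userDao;   // hypothetical DAO from the previous sketch
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public DelayedDoubleDeleteWriter(UserCacheService.UserDao userDao) {
        this.userDao = userDao;
    }

    public void updateUser(String userId, String newValue) {
        String cacheKey = "user:" + userId;

        userDao.update(userId, newValue);              // 1. update the database
        try (Jedis jedis = jedisPool.getResource()) {
            jedis.del(cacheKey);                       // 2. delete the cache immediately
        }

        // 3. delete again after a short delay, asynchronously, so that any stale value
        //    written back by a concurrent reader in the meantime is also evicted
        scheduler.schedule(() -> {
            try (Jedis jedis = jedisPool.getResource()) {
                jedis.del(cacheKey);
            }
        }, 500, TimeUnit.MILLISECONDS);
    }
}
```

Scheduling the second deletion on a separate executor keeps the write request from blocking while it waits for the delay to pass.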

5 Optimization Scheme

What other optimizations can we make on top of the scheme above?

  • Lock the read path (with a distributed lock) so that a burst of cache misses does not crush the database (see the sketch after this list)
  • Use delayed double deletion to guard against stale data being written back after the cache is deleted (for example because of replica read lag in a read-write splitting architecture); the second deletion can be performed asynchronously after a delay
  • If a retry mechanism is needed for the deletion, rely on the reliable delivery of a message queue
  • The deletion logic can also be decoupled from business code by subscribing to the database binlog
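As a rough illustration of the first point, below is a sketch of locking the cache-rebuild path. It reuses the hypothetical UserDao from the earlier sketches and uses Redis's SET NX EX through Jedis as a deliberately simplified lock (no ownership token, no renewal, unbounded retries), so it is not a production-grade distributed lock.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.params.SetParams;

public class LockingCacheReader {

    private final JedisPool jedisPool = new JedisPool("localhost", 6379);
    private final UserCacheService.UserDao userDao;   // hypothetical DAO from the earlier sketch

    public LockingCacheReader(UserCacheService.UserDao userDao) {
        this.userDao = userDao;
    }

    public String getUser(String userId) throws InterruptedException {
        String cacheKey = "user:" + userId;
        String lockKey = "lock:user:" + userId;

        try (Jedis jedis = jedisPool.getResource()) {
            String cached = jedis.get(cacheKey);
            if (cached != null) {
                return cached;                        // cache hit: no database access at all
            }

            // Cache miss: only the thread that grabs the lock rebuilds the cache,
            // so a burst of misses does not turn into a burst of database queries.
            String locked = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
            if ("OK".equals(locked)) {
                try {
                    String fromDb = userDao.findById(userId);  // read the database once
                    if (fromDb != null) {
                        jedis.setex(cacheKey, 600, fromDb);    // flush into the cache
                    }
                    return fromDb;
                } finally {
                    jedis.del(lockKey);                        // release the (simplified) lock
                }
            }
        }

        // Another thread holds the lock: back off briefly and retry, expecting the cache
        // to be filled soon (a real implementation would bound the number of retries)
        Thread.sleep(50);
        return getUser(userId);
    }
}
```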

Don't over-design. In general, simple delayed double deletion can meet the requirements without increasing the system's complexity.