Preface

In today’s system architecture, caching plays a very important role. In the Internet era the number of concurrent requests can be very high, yet relational databases are not particularly good at handling high concurrency. A cache, by contrast, works in memory and needs no disk IO, so it is well suited to high-concurrency workloads and has become an essential part of the system, for example Redis or Alibaba’s internal distributed cache.

However, this introduces many problems, one of which is how to keep the data in the database and the cache consistent.

Because database operations and cache operations cannot run in the same transaction, partial failures are inevitable: if the database write fails, the cache must not be updated, and if the cache write fails, some compensation mechanism is needed. So how do we handle this?

Let’s start with the most common way of reading through a cache.

Of all the ways to read from a cache, the one shown above (check the cache first; on a miss, read the database and write the result into the cache) is arguably the most widely used. There is nothing wrong with the reads themselves, but the way data is written to the cache is crucial to keeping the data consistent.
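A minimal sketch of that read path, assuming plain dicts stand in for the cache and the database (a real system would use a cache client such as Redis and a SQL query; the names `cache`, `db` and `read` are hypothetical):

```python
from typing import Optional

# Hypothetical in-memory stand-ins for a real cache and a real database.
cache: dict = {}
db: dict = {"user:1": "Alice"}


def read(key: str) -> Optional[str]:
    """Return the value for key, loading it from the database on a cache miss."""
    value = cache.get(key)
    if value is not None:
        return value          # cache hit
    value = db.get(key)       # cache miss: fall through to the database
    if value is not None:
        cache[key] = value    # backfill the cache for later reads
    return value


print(read("user:1"))  # first call misses and loads "Alice" from db
```

The second call to `read("user:1")` is served from `cache` without touching `db`.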

We will not dwell on periodically refreshing the cache, which works as follows:

Writes to the database and writes to the cache are independent: after the database is written, the cached data is refreshed only when the scheduled job next runs.

In this mode, the data can stay inconsistent for a long time. Each refresh also reloads the data whether or not it has changed, which is inefficient. That is not to say this approach is useless: it suits some scenarios, such as caching system configuration, and it is easy to maintain with very little code.
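A rough sketch of such a scheduled refresh, assuming dicts stand in for the database and the cache and that a background thread is an acceptable scheduler (the `PeriodicRefresher` class and its parameters are hypothetical). Note that `refresh_once` reloads every key regardless of whether it changed, which is exactly the inefficiency described above:

```python
import threading


class PeriodicRefresher:
    """Reload every key from the database into the cache on a fixed interval."""

    def __init__(self, db: dict, cache: dict, interval_s: float) -> None:
        self.db = db
        self.cache = cache
        self.interval_s = interval_s
        self._stop = threading.Event()

    def refresh_once(self) -> None:
        # Reload every key whether or not it changed: simple, but wasteful.
        self.cache.update(self.db)
        for key in list(self.cache):
            if key not in self.db:
                del self.cache[key]  # drop entries removed from the database

    def run(self) -> None:
        # Refresh, then wait; Event.wait returns True as soon as stop() is called.
        while True:
            self.refresh_once()
            if self._stop.wait(self.interval_s):
                break

    def stop(self) -> None:
        self._stop.set()
```

Between two runs, a database write is invisible to readers of the cache, so the inconsistency window is as long as `interval_s`.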

Today we only consider data consistency under double writes. Different write orders lead to different outcomes. So what are the common ways to write data and refresh the cache?

Method 1: Update the database first, and then update the cache

This is the simplest scheme. Let’s take a look at the flow chart:

With this double-write scheme, the cache is refreshed as soon as the data has been successfully written to the database, and the code is simple and easy to maintain. But simplicity comes at a price, and the problems are just as obvious.
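A minimal sketch of the Method 1 write path, assuming dicts stand in for the database and the cache (the function name is hypothetical):

```python
def write_db_then_update_cache(db: dict, cache: dict, key: str, value: str) -> None:
    """Method 1: update the database first, then overwrite the cached value."""
    db[key] = value     # 1. if this raises, step 2 never runs and the cache is untouched
    cache[key] = value  # 2. refresh the cache with the value just written


db = {"item:1": "v1"}
cache = {"item:1": "v1"}
write_db_then_update_cache(db, cache, "item:1", "v2")
```

In a single thread this is correct; the trouble only appears when two writers interleave.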

First, thread safety cannot be guaranteed.

For example, suppose two requests, A and B, update the same record at the same time. Request A executes first and request B later, so the database ends up holding the value written by request B.

However, because of network delays or other factors, the actual order of execution may be:

Request A Update database -> Request B Update database -> Request B Update cache -> Request A Update cache.

This results in:

  1. The data in the database and in the cache is inconsistent, so the cache holds dirty data.

  2. In write-heavy workloads, the cache is refreshed on every write even though the data may never be read in between, which wastes server resources.

Therefore, this double-write mode cannot ensure data consistency and is not recommended.
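The interleaving above can be replayed step by step to see the dirty cache, assuming dicts for the database and the cache and hypothetical values "from A" and "from B":

```python
def replay_method1_race() -> tuple:
    """Replay: A updates db -> B updates db -> B updates cache -> A updates cache."""
    db: dict = {}
    cache: dict = {}
    key = "item:1"

    db[key] = "from A"     # Request A updates the database
    db[key] = "from B"     # Request B updates the database
    cache[key] = "from B"  # Request B updates the cache
    cache[key] = "from A"  # Request A updates the cache late (e.g. network delay)
    return db[key], cache[key]


db_value, cache_value = replay_method1_race()
print(db_value, "|", cache_value)  # from B | from A
```

The database holds B’s value while the cache holds A’s, and nothing will correct the cache until the next write.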

Method 2: Delete the cache first and then update the database

Because of the problems with the approach above, we might consider deleting the cache first and then updating the database. Around the database update the cache holds no entry, so a read request goes through to the database and loads the data into the cache itself. The cache is also no longer refreshed on every write.

So, we set up a new execution order:
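A minimal sketch of the Method 2 write path together with the cache-aside read that reloads the entry, assuming dicts stand in for the database and the cache (all names are hypothetical):

```python
from typing import Optional


def delete_cache_then_write_db(db: dict, cache: dict, key: str, value: str) -> None:
    """Method 2 write path: invalidate the cache entry, then update the database."""
    cache.pop(key, None)  # 1. delete the cache entry (no error if it is absent)
    db[key] = value       # 2. write the new value to the database


def read(db: dict, cache: dict, key: str) -> Optional[str]:
    """Cache-aside read: on a miss, load from the database and backfill the cache."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value


db = {"item:1": "old"}
cache = {"item:1": "old"}
delete_cache_then_write_db(db, cache, "item:1", "new")
# The cache is empty now; the next read loads "new" from the database.
print(read(db, cache, "item:1"))
```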

However, this raises a new problem. Consider two requests: request A writes data and request B reads it. Under high concurrency, the following can happen:

Request A deletes the cache -> Request B finds that the cache is empty -> Request B reads the old value from the database -> Request B writes the old value into the cache -> Request A writes the new value into the database.
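The interleaving above can be replayed deterministically, assuming dicts for the database and the cache and hypothetical values "old" and "new":

```python
def replay_method2_race() -> tuple:
    """Replay the delete-cache-then-write-db race between writer A and reader B."""
    key = "item:1"
    db = {key: "old"}
    cache = {key: "old"}

    cache.pop(key, None)    # Request A deletes the cache
    if key not in cache:    # Request B finds the cache empty
        stale = db[key]     # Request B reads the old value from the database
        cache[key] = stale  # Request B writes the old value into the cache
    db[key] = "new"         # Request A finally writes the new value
    return db[key], cache[key]


print(replay_method2_race())  # ('new', 'old')
```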

This is dirty data again. If no expiration time is set on the cache, the dirty data stays there until the next update clears it. To deal with this kind of dirty data, we add a short delay after writing the data and then delete the cache again, which gives us Method 3.

Method 3: Delayed double deletion

The delayed double deletion strategy largely solves the inconsistency caused by the concurrency described above. But is delayed double deletion free of problems? Not at all.
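A minimal sketch of delayed double deletion, assuming dicts stand in for the database and the cache and using `threading.Timer` for the asynchronous second deletion (the function name and `delay_s` value are hypothetical; the delay should exceed the time a concurrent reader needs to read and backfill):

```python
import threading
import time
from typing import Optional


def delayed_double_delete(db: dict, cache: dict, key: str, value: str,
                          delay_s: float, run_async: bool = True) -> Optional[threading.Timer]:
    """Delete the cache, write the database, then delete the cache again after delay_s."""
    cache.pop(key, None)  # first deletion
    db[key] = value       # database write
    if run_async:
        # Second deletion on a timer thread so the writer is not blocked.
        timer = threading.Timer(delay_s, cache.pop, args=(key, None))
        timer.start()
        return timer      # returned so callers (or tests) can join it
    time.sleep(delay_s)   # synchronous variant: blocks the writer for delay_s
    cache.pop(key, None)  # second deletion
    return None
```

If a concurrent reader backfills the old value after the first deletion but before the timer fires, the second deletion removes it. In this sketch nothing retries the deletion if it fails, which is the weakness discussed below.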

Now imagine a setup with read/write separation, where writes go to the primary database and reads go to a replica. What happens when we use delayed double deletion?

Request A (a write) deletes the cache -> Request A writes the new data to the primary database -> Request B queries the cache and finds no value -> Request B queries the replica, but primary-replica replication has not finished yet, so it reads the old value -> Request B writes the old value into the cache -> Replication completes and the replica now holds the new value.

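Oops, there’s another data inconsistency. If the delayed second deletion runs before request B writes the old value into the cache (that is, the delay is shorter than the replication lag plus B’s read), the stale value survives in the cache.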
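A deterministic replay of this replica-lag case, assuming separate dicts for the primary, the replica and the cache, and assuming the delayed second deletion fires before request B backfills the cache (all names and values are hypothetical):

```python
def replay_replica_lag() -> tuple:
    """Replay delayed double deletion under read/write separation with replication lag."""
    key = "item:1"
    primary = {key: "old"}
    replica = {key: "old"}
    cache = {key: "old"}

    cache.pop(key, None)          # Request A deletes the cache
    primary[key] = "new"          # Request A writes the primary
    stale = replica[key]          # Request B misses the cache and reads the lagging replica
    cache.pop(key, None)          # A's second deletion fires too early (delay < lag)
    cache[key] = stale            # Request B writes the old value into the cache
    replica[key] = primary[key]   # replication finally completes
    return replica[key], cache[key]


print(replay_replica_lag())  # ('new', 'old')
```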

Next, consider performance. Because of the delay, running the second deletion synchronously would hurt performance badly, so it has to run asynchronously. But if the asynchronous deletion fails, the old data is not deleted and the inconsistency reappears.

No, we need a once-and-for-all solution; plain double deletion is still not reliable.

Method 4: Delete the cache through a message queue

After updating the database, we put a cache-deletion message into a queue. A consumer takes the message and deletes the cache entry; if the deletion fails, the message is put back into the queue and retried until it succeeds.
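A minimal sketch of this flow, assuming Python’s in-process `queue.Queue` stands in for a real message queue (it is not durable, unlike a production broker) and a hypothetical `FlakyCache` whose deletions fail a few times before succeeding:

```python
import queue


class FlakyCache:
    """Dict-backed cache whose delete() fails the first `failures` calls (demo only)."""

    def __init__(self, data: dict, failures: int) -> None:
        self.data = dict(data)
        self.failures = failures

    def delete(self, key: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache temporarily unavailable")
        self.data.pop(key, None)


def update_and_enqueue(db: dict, q: "queue.Queue[str]", key: str, value: str) -> None:
    """Business write: update the database, then enqueue a cache-deletion message."""
    db[key] = value
    q.put(key)


def drain(q: "queue.Queue[str]", cache: FlakyCache, max_attempts: int = 10) -> None:
    """Consumer: delete each key; on failure, put the message back and retry."""
    attempts = 0
    while not q.empty():
        key = q.get()
        attempts += 1
        try:
            cache.delete(key)
        except ConnectionError:
            if attempts >= max_attempts:
                raise      # give up in this demo instead of looping forever
            q.put(key)     # re-queue the message until deletion succeeds
```

The `max_attempts` cap exists only so the sketch cannot spin forever; a real consumer would typically back off between retries instead.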

This effectively keeps the database and the cache consistent, with or without read/write separation: as long as queue messages are not lost, the cache entry will eventually be deleted.

Of course, this scheme can be optimized further. Here the cache deletion is triggered by business code, so the business code and the cache deletion are tightly coupled. Is there a way to separate cache deletion from the business code?

Deleting the cache through binlog subscription

To keep the business code independent, we can refresh the cache by subscribing to the binlog. First enable the MySQL binlog, then design the flow as shown in the following figure:

By subscribing to the binlog, we separate the business code from the non-business code that deletes the cache. That extra code is small and easy to maintain, and programmers no longer have to worry about when, or whether, the cache should be refreshed.
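A sketch of the subscriber side, assuming some binlog client has already decoded row changes into the hypothetical `RowChangeEvent` shape below and that cache keys follow a hypothetical `"<table>:<id>"` scheme. The parsing of the binlog itself is left to such a client and is not shown:

```python
from dataclasses import dataclass, field


@dataclass
class RowChangeEvent:
    """Hypothetical simplified row event, standing in for what a binlog client delivers."""
    table: str
    kind: str                                  # "insert", "update" or "delete"
    rows: list = field(default_factory=list)   # each row: dict of column -> value


def cache_keys_for(event: RowChangeEvent) -> list:
    # Hypothetical key scheme: "<table>:<primary key id>"
    return [f"{event.table}:{row['id']}" for row in event.rows]


def handle_event(event: RowChangeEvent, cache: dict) -> None:
    """Delete the cache entries for every changed row; business code is not involved."""
    for key in cache_keys_for(event):
        cache.pop(key, None)  # deleting on inserts too is the conservative choice
```

Because this handler only sees committed row changes, it applies the same invalidation regardless of which service or code path wrote the data.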

Of course, in practice, different business scenarios may call for different data consistency solutions. This is just one of them.