Introduction

In modern system development, caching has become a routine technique for speeding up business logic and interface processing. Introducing a cache reduces the number of database queries and improves data retrieval speed. But everything has two sides: along with its many advantages, caching increases the complexity of the system, raising questions such as how to keep the cache and the database consistent.

Caching strategies

In real-world business systems, we often adopt the following caching strategy:

  • Cache – database read process
  1. The user initiates a query request;
  2. The business service first queries the cache, using a key built from the request parameters;
  3. If the data is in the cache (cache hit), the result is returned directly from the cache;
  4. If the data is not in the cache (cache miss), the database is queried, the result is written to the cache, and then returned.
  • Cache – database write process
  1. The user initiates a request that writes or updates data;
  2. After completing its logical processing, the business service updates the database;
  3. After the database update, the cached data for that key is deleted.

This strategy is known as cache-aside; its core idea is to populate the cache only when the application requests the data. It suits workloads that read frequently but write or update rarely: once written, the data is mainly queried for display and seldom changed.
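
The following is a minimal sketch of both paths. Everything in it is illustrative: Database is a hypothetical DAO interface, and a ConcurrentHashMap stands in for a real cache service such as Redis or Memcached.

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    public class CacheAsideExample {
        // Hypothetical DAO interface standing in for real data access code.
        interface Database {
            Optional<String> query(String key);
            void update(String key, String value);
        }

        // A ConcurrentHashMap stands in for a real cache such as Redis.
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Database db;

        public CacheAsideExample(Database db) { this.db = db; }

        // Read path: return on a cache hit; on a miss, query the database,
        // populate the cache, and return the result.
        public Optional<String> read(String key) {
            String cached = cache.get(key);
            if (cached != null) {
                return Optional.of(cached);            // cache hit
            }
            Optional<String> fromDb = db.query(key);   // cache miss
            fromDb.ifPresent(v -> cache.put(key, v));  // populate cache
            return fromDb;
        }

        // Write path: update the database first, then delete (not update)
        // the cached entry so the next read reloads fresh data.
        public void write(String key, String value) {
            db.update(key, value);
            cache.remove(key);
        }
    }

The write path mirrors the process above: database update first, then cache deletion. The rest of the article examines why exactly these two choices matter.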

There are two other commonly used strategies:

  • Read-through/write-through caching strategy: both read and write requests are handled uniformly by the caching layer; the business service operates only on the cache.
  • Write-behind caching strategy: reads work as in read-through, but writes are applied to the cache and flushed to the database asynchronously, in batches, by separate threads.

These are the three commonly used caching strategies; the other two deserve a detailed treatment of their own later. This article analyzes only cache-aside.

The problems introduced

Under the cache-aside strategy, two decisions must be made when handling write/update requests:

  • Question one: should the old data in the cache be updated, or deleted?
  • Question two: should the database be updated first, or the cache?

Update OR Delete

Suppose we choose to update the cache. Consider what happens when multiple requests actually run concurrently:

  1. Requests A and B update the same data at the same time;
  2. The requests are processed in business threads A and B respectively;
  3. Thread A updates the database to 90, then thread B updates it to 80;
  4. Because the threads interleave, thread B updates the cache first (to 80), and thread A updates it afterwards (to 90). The result: the database holds 80 while the cache holds 90, and the two are inconsistent.

In this scenario, deleting the cache instead avoids the problem; at worst there is a cache miss, which triggers a reload from the database. But is deleting the cache a perfect solution?

Database first OR cache first

In standard practice, we update the database first and then handle the cache. First, the result of a business operation is not considered complete until it has been persisted to the database. Second, let's look at the problems that arise if the cache is handled first:

  1. Request A (a data update) and request B (a query) arrive at the same time;
  2. Thread A deletes the cache first;
  3. Because of the interleaving, thread B's query runs after A's deletion; the cache miss triggers a database load;
  4. Thread B queries the database, retrieves the old value 100, and writes it to the cache;
  5. Thread A then completes the database update, writing 90. The cache (100) and the database (90) are now inconsistent.
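
To make this interleaving concrete, here is the same sequence replayed deterministically in a single thread; a plain HashMap stands in for the cache and a one-element array for the database row:

    import java.util.HashMap;
    import java.util.Map;

    public class CacheFirstRace {
        public static void main(String[] args) {
            Map<String, Integer> cache = new HashMap<>();
            int[] db = {100};                  // the database row's current value
            cache.put("balance", 100);

            // Thread A (the writer, wanting to set 90) deletes the cache first.
            cache.remove("balance");

            // Thread B (the reader) runs now: cache miss, so it loads from
            // the database and caches the still-old value.
            int fromDb = db[0];                // reads 100
            cache.put("balance", fromDb);      // caches stale 100

            // Thread A finally completes its database update.
            db[0] = 90;

            System.out.println("db=" + db[0] + ", cache=" + cache.get("balance"));
            // prints: db=90, cache=100 -> inconsistent
        }
    }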

If we update the database first instead, is consistency guaranteed? Not necessarily.

  • Scenario 1: the cache deletion fails

After the database update completes, the cache deletion fails (for example, because the cache service is down), leaving the database and the cache inconsistent.

  • Scenario 2: the cached entry expires

We usually set a validity period (TTL) on cached data, which brings us back to the concurrent-request case:

  1. Request A (a data update) and request B (a query) arrive at the same time;
  2. Thread A performs the database update;
  3. The cached entry has already expired when thread B issues its query, so the cache miss triggers a database load (equivalent to a deletion having happened first). Thread B reads the old value 100 from the database;
  4. Thread A completes the database update, writing 90, and then deletes the (already expired) cache entry;
  5. Thread B writes its query result to the cache, setting it to 100. The cache and the database are now inconsistent.

But several preconditions must hold for these scenarios to occur:

First, the cache platform must misbehave, which has a low probability;

Second, the cached entry must expire at just the wrong moment, and the database read must take longer than the concurrent update so that the stale value is cached after the deletion; the probability of this is very small.

Can it be consistent?

The analysis above shows that even the standard cache-aside strategy cannot guarantee 100% data consistency. Here we need the core CAP theorem of distributed systems: a system that introduces a cache falls on the AP side of CAP, so strong consistency is unattainable. We can only achieve the eventual consistency described by BASE theory, i.e., guarantee that the cache and the database eventually converge to the same data.

Eventual consistency means that all copies of a piece of data in the system, after a period of synchronization, eventually reach a consistent state. Its essence is that the system must guarantee the data is eventually consistent, without having to guarantee strong real-time consistency.

Eventual consistency schemes

Delayed double deletion scheme

As the name suggests, the essence of this scheme is to delete the cache a second time after a certain delay, clearing any stale data written to the cache by concurrent reads. Even if the cache is operated on before the database, eventual consistency of the data is preserved.

  • Scheme process

  1. The user initiates a write/update request;
  2. The business service deletes the cache first;
  3. The business service then updates the database;
  4. After a delay of time T, the cache deletion is performed again.
  • Scheme analysis

The key to this scheme is the delay T, which is usually set to the duration of one query operation in the same service plus a few hundred milliseconds. This ensures the second deletion clears any dirty cache data produced by concurrent reads.
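
A minimal sketch of the write path under this scheme, assuming the same hypothetical Cache and Database interfaces as before; the second deletion is scheduled on a ScheduledExecutorService:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class DelayedDoubleDelete {
        // Hypothetical cache and DAO interfaces, as in the earlier sketch.
        interface Cache { void delete(String key); }
        interface Database { void update(String key, String value); }

        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final Cache cache;
        private final Database db;
        private final long delayMillis; // T: one read's latency + a few hundred ms

        public DelayedDoubleDelete(Cache cache, Database db, long delayMillis) {
            this.cache = cache;
            this.db = db;
            this.delayMillis = delayMillis;
        }

        public void write(String key, String value) {
            cache.delete(key);                 // first deletion
            db.update(key, value);             // persist the new value
            scheduler.schedule(                // second deletion after delay T
                    () -> cache.delete(key), delayMillis, TimeUnit.MILLISECONDS);
        }
    }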

The disadvantages of this scheme are:

  1. The delay T has to be estimated and may be wrong, and the second-delete logic must be woven into the business code, which couples them tightly and increases complexity.
  2. The second deletion can itself fail, leaving the cache stale again.

Cache deletion retry

To ensure that the cache deletion eventually succeeds, a retry mechanism is needed for when a deletion fails. Failed deletions can be retried asynchronously with the help of a message queue:

  1. The user initiates a write/update request;
  2. The business service performs the database update first;
  3. The business service then performs the cache deletion, which fails for some reason;
  4. The key whose deletion failed is published to a message queue;
  5. A consumer reads the message from the queue and obtains the cache key to retry;
  6. The cache deletion is retried.
  • Scheme analysis

Although this scheme separates the retry logic out for independent execution, the deletion-failure handling still has to be added to the normal business logic, which is quite intrusive. A sketch of the retry path follows.
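
Here is a minimal sketch of the retry path, with a LinkedBlockingQueue standing in for a real message queue (Kafka, RocketMQ, etc.) and a hypothetical Cache interface whose delete can fail:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class DeleteWithRetry {
        // Hypothetical cache client whose delete can fail.
        interface Cache { void delete(String key) throws Exception; }

        // A LinkedBlockingQueue stands in for a real message queue.
        private final BlockingQueue<String> retryQueue = new LinkedBlockingQueue<>();
        private final Cache cache;

        public DeleteWithRetry(Cache cache) {
            this.cache = cache;
            Thread consumer = new Thread(this::consumeRetries);
            consumer.setDaemon(true);
            consumer.start();
        }

        // Called from the business code after the database update.
        public void deleteCache(String key) {
            try {
                cache.delete(key);
            } catch (Exception e) {
                retryQueue.offer(key);         // publish the failed key for retry
            }
        }

        // Consumer loop: take failed keys off the queue and retry the deletion.
        private void consumeRetries() {
            while (!Thread.currentThread().isInterrupted()) {
                String key;
                try {
                    key = retryQueue.take();
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    cache.delete(key);         // retry the deletion
                } catch (Exception e) {
                    retryQueue.offer(key);     // still failing: re-enqueue
                    // a real system would cap retries or use a dead-letter queue
                }
            }
        }
    }

Now let's look at a scheme that removes even this residual coupling by using the MySQL binlog to drive cache deletion.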

Binlog cache deletion scheme

By subscribing to the database's binlog, which records every change made to the database, cache invalidation can be driven from the change stream, and the business code no longer needs to care about cache operations at all:

  1. The user initiates a write/update request;
  2. The business service performs the database update and completes the business request;
  3. The database change is written to the binlog;
  4. Middleware (e.g., Canal) subscribes to the binlog and parses out the keys and data whose cache entries need to be invalidated;
  5. The cache entry is deleted according to the parsed result; if the deletion fails, the key is published to a message queue;
  6. A consumer reads the message from the queue and obtains the cache key to retry;
  7. The cache deletion is retried.
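
A minimal sketch of the handler side, assuming the subscriber (e.g., a Canal client) has already parsed the binlog into a simplified, hypothetical BinlogRowEvent; deletion failures are handed to the message-queue retry path from the previous scheme:

    public class BinlogCacheInvalidator {
        // Hypothetical, simplified view of one parsed binlog row change.
        record BinlogRowEvent(String table, String primaryKey) {}

        interface Cache { void delete(String key) throws Exception; }
        interface RetryQueue { void publish(String key); }

        private final Cache cache;
        private final RetryQueue retryQueue;

        public BinlogCacheInvalidator(Cache cache, RetryQueue retryQueue) {
            this.cache = cache;
            this.retryQueue = retryQueue;
        }

        // Called by the binlog subscriber (e.g., a Canal client) for each
        // row change it parses out of the binlog stream.
        public void onRowChange(BinlogRowEvent event) {
            // Derive the cache key by the same convention the read path uses.
            String cacheKey = event.table() + ":" + event.primaryKey();
            try {
                cache.delete(cacheKey);
            } catch (Exception e) {
                retryQueue.publish(cacheKey);  // hand off to the MQ retry path
            }
        }
    }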

Conclusion

Cache-database inconsistency arises because cache and database operations are not atomic under highly concurrent requests. Many schemes can be introduced to guarantee eventual consistency of the data, but every one of them significantly increases the complexity of the system and introduces problems of its own. Therefore, evaluate the business's sensitivity to data inconsistency and choose the appropriate scheme; do not pursue consistency purely for its own sake.