Many people have run into a similar question in interviews: how do you ensure consistency between the cache and the database?

If you’ve done any research on this question, you’ll find it’s actually quite easy to answer. If you’re hearing it, or running into the problem itself, for the first time, you might be a little confused. Let’s talk through it today.

1. Problem analysis

First let’s see why this is a problem!

In day-to-day development, to improve response speed, we often keep hot data in a cache so that we don’t have to hit the database on every query. This can noticeably improve server response time, and the cache we reach for most often today is Redis.

Using Redis for caching does not mean every cache has to be Redis; the choice still depends on the business. We can divide data into three levels according to the real-time requirements of different parts of the business. Taking an e-commerce project as an example:

  • Level 1: order data and payment flow data. These have high real-time and accuracy requirements, so they are generally not cached; we operate on the database directly.
  • Level 2: user-related data. This data is read far more often than it is written, so we cache it in Redis.
  • Level 3: payment configuration information. This data is user-independent, small in volume, read frequently, and almost never modified, so we cache it in local memory.

Select the appropriate data and store it in Redis. Then, whenever we want to read data, we first check Redis: if the data is there, we return it directly; if not, we read it from the database and write what we read back into Redis. That is basically the whole read path. It is clear and simple, and there is not much to argue about.

However, when data is cached, another problem often arises if it needs to be updated:

  1. When data needs to be updated, do we update the cache or the database first? And how do we ensure that updating the cache and updating the database happen atomically?
  2. How do we update the cache: modify it in place, or delete it?

What should we do? Normally, there are four options:

  1. Update the cache first, then the database.
  2. Update the database first, then the cache.
  3. First eliminate the cache, then update the database.
  4. Update the database first, then eliminate the cache.

Which one should we choose?

Before we answer this question, let’s look at three classic caching patterns:

  1. Cache-Aside
  2. Read-Through/Write-Through
  3. Write Behind

2. Cache-Aside

If we use cache-aside in our project, we can avoid inconsistency between the cache and the database as much as possible.

Cache-aside has two flows: reading the cache and writing the cache.

2.1 Read cache

Let’s start with a flow chart:

It goes like this:

  1. The application wants to read data.
  2. Check whether the needed data is in the cache. On a cache hit, return the data.
  3. On a cache miss, go to the database instead.
  4. Set the data read from the database into the cache.
  5. Return the data.

This is the cache-aside read flow.
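The read flow above can be sketched in a few lines. This is only an illustration: plain Python dicts stand in for the real database and Redis, and the `read` function name is made up.

```python
# A minimal cache-aside read sketch. `db` and `cache` are stand-in dicts
# for the real database and Redis.

db = {"user:1": {"name": "alice"}}   # pretend database
cache = {}                           # pretend Redis

def read(key):
    # 1. Check the cache first; a hit returns immediately.
    if key in cache:
        return cache[key]
    # 2. On a miss, fall back to the database...
    value = db.get(key)
    # 3. ...and populate the cache for subsequent reads.
    if value is not None:
        cache[key] = value
    return value

print(read("user:1"))  # first call misses, loads from db, fills the cache
print(read("user:1"))  # second call is served from the cache
```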

In fact, few people object to the read flow; the disagreements are mostly about the write flow, so let’s keep going.

2.2 Write cache

Let’s start with a flow chart:

The write flow is relatively simple: update the data in the database, then delete the old cache entry.
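A minimal sketch of this write flow, again with plain dicts standing in for the database and Redis:

```python
# Cache-aside write: update the database first, then delete the cache
# entry so the next read repopulates it. `db`/`cache` are stand-in dicts.

db = {"user:1": {"name": "alice"}}
cache = {"user:1": {"name": "alice"}}  # previously cached copy

def write(key, value):
    db[key] = value          # 1. update the database
    cache.pop(key, None)     # 2. delete (not update) the cached entry

write("user:1", {"name": "bob"})
print(db["user:1"])       # {'name': 'bob'}
print("user:1" in cache)  # False: the next read will reload from the db
```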

The process is simple, but it raises two questions:

  1. Why delete the old cache instead of updating it?
  2. Why not delete the old cache first and update the database later?

Let’s answer these two questions separately.

Why delete the old cache instead of updating it?

  1. Updating the cache is easier said than done. Most of the time we are not simply caching a bean; we cache the results of complex operations or calculations (such as heavy multi-table joins or group aggregations). Without the cache we could not handle the concurrency, and MySQL would carry a huge burden. Recomputing such a value on every update is expensive, so it is easier to just delete the cache.
  2. For write-heavy applications, the "update the cache, then update the database" pattern wastes performance. Every write must also rebuild the cache, yet the data may be written ten times and read only once; the first nine cache rebuilds are never used. In this situation, updating the database first and then deleting the cache is the better strategy.
  3. In a multi-threaded environment, this update strategy can also produce logically wrong data, as the following flow shows:

As you can see, there are two concurrent threads A and B:

  • First, thread A updates the database.
  • Next, thread B updates the database.
  • Due to network delays or other reasons, thread B’s cache update lands first.
  • Then thread A’s cache update lands, overwriting B’s newer value.

In this case, the cache ends up holding stale data, which would not have happened if the cache had simply been deleted.
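The interleaving above can be replayed deterministically. This sketch executes the four steps in exactly the order described (all names and values are illustrative) to show the stale result:

```python
# Replaying the "update cache" interleaving: both threads update the DB,
# but their cache updates arrive in the wrong order.
db_writes = []   # ordered log of database updates; the last entry wins
cache = {}

db_writes.append("A")    # 1. thread A updates the database
db_writes.append("B")    # 2. thread B updates the database (B is newest)
cache["k"] = "B"         # 3. B's cache update arrives first
cache["k"] = "A"         # 4. A's cache update arrives last

print(db_writes[-1], cache["k"])  # B A -> the cache holds the older value
```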

Why not delete the old cache first and update the database later?

If we delete the old cache and then update the database, the following situation may occur:

There are two threads, A and B, where A writes data and B reads data. The process is as follows:

  1. Thread A first deletes the cache.
  2. Thread B reads the cache and finds no data in the cache.
  3. Thread B reads the database.
  4. Thread B writes the data it reads from the database to the cache.
  5. Thread A updates the database.

After this sequence of operations, the database and the cache are inconsistent! So, in cache-aside, we update the database first, and then delete the cache.
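Replaying those five steps in order makes the inconsistency concrete; the dicts and values here are only illustrative:

```python
# Step-by-step replay of "delete cache first, update database later".
db = {"k": "old"}
cache = {"k": "old"}

# Thread A wants to write "new"; thread B reads concurrently.
cache.pop("k", None)      # 1. A deletes the cache
_ = cache.get("k")        # 2. B reads the cache: miss
stale = db["k"]           # 3. B reads the database (still "old")
cache["k"] = stale        # 4. B writes the stale value into the cache
db["k"] = "new"           # 5. A finally updates the database

print(db["k"], cache["k"])  # new old -> database and cache disagree
```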

2.3 Delayed Dual-Delete

Whether you update the database first and then delete the cache, or delete the cache first and then update the database, problems can still arise under concurrency:

Suppose there are two concurrent requests A and B:

  • Update the database first, then delete the cache: after request A has updated the database but before it clears the cache, request B reads and uses the old cached data.
  • Delete the cache first, then update the database: after request A has cleared the cache but before it updates the database, request B queries the old data and writes it back into the cache.

Of course, as we analyzed above, we should prefer to operate on the database before the cache, but even then problems can occur. The fix is delayed double delete.

With delayed double delete, we clear the cache, update the database, and then clear the cache again after N seconds. The second delete removes any stale entry written into the cache during that window, so the cache and the database do not stay inconsistent.

So what is an appropriate delay N? Generally, N must be longer than the duration of one write operation. If the delay is shorter than the time it takes a concurrent reader to write the cache, request A’s delayed delete fires before request B’s stale write lands, and the stale entry survives. The concrete value has to be measured against your own business.
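A rough sketch of delayed double delete, using `threading.Timer` for the second delete. In a real system the delayed delete would typically run on a background worker or a delayed message, and the 0.05-second delay here is only for the demo:

```python
# Delayed double delete: clear cache, update db, clear cache again after N s.
import threading

db = {"k": "old"}
cache = {"k": "old"}

def delayed_double_delete(key, value, delay_seconds):
    cache.pop(key, None)              # first delete
    db[key] = value                   # update the database
    # schedule the second delete; N must exceed one write cycle
    timer = threading.Timer(delay_seconds, lambda: cache.pop(key, None))
    timer.start()
    return timer

timer = delayed_double_delete("k", "new", 0.05)
cache["k"] = "old"     # a concurrent reader re-caches stale data meanwhile
timer.join()           # for the demo, wait until the second delete fires
print(cache.get("k"))  # None: the stale entry was cleared again
```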

2.4 How to ensure atomicity

After all, updating the database and deleting the cache are not one atomic operation. What if the cache delete fails after the database update succeeds?

A common solution is to use message middleware to retry the delete. MQ systems usually have a retry mechanism for failed consumption: when we want to delete the cache, we publish a message to MQ; the cache service consumes the message and attempts the delete, and a failed attempt is redelivered and retried.
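Since a real RabbitMQ setup needs a broker, this sketch simulates the retry loop with an in-memory queue: a failed delete is "redelivered" until it succeeds. The message format and the failure counter are made up purely for illustration:

```python
# Simulated MQ-driven retry of a cache delete. A real deployment would
# rely on RabbitMQ/Kafka redelivery instead of this manual loop.
import collections

cache = {"k": "old"}
queue = collections.deque(["delete:k"])   # pretend message queue
failures = {"remaining": 2}               # first two attempts "fail"

def try_delete(key):
    if failures["remaining"] > 0:         # simulate transient Redis errors
        failures["remaining"] -= 1
        return False
    cache.pop(key, None)
    return True

attempts = 0
while queue:
    msg = queue.popleft()
    key = msg.split(":", 1)[1]
    attempts += 1
    if not try_delete(key):
        queue.append(msg)                 # nack: MQ redelivers the message

print(attempts, cache)  # 3 {}  -> succeeded on the third attempt
```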

3. Read-Through/Write-Through

When it comes to read-through/write-through, the product that impressed me most is Oracle Coherence, an in-memory data grid that gives developers and administrators fast access to key-value data. Coherence provides clustered, low-latency data storage, polyglot grid computing, and asynchronous event streaming, bringing a high level of scalability and performance to enterprise applications.

We won’t go deeper into Oracle Coherence here; let’s get back to read-through.

3.1 Read-Through

To save some effort I did not draw this diagram myself; here is one found online:

At first glance it might look the same as cache-aside; indeed, the difference is hard to see from the flow alone.

Read-through is similar to cache-aside; the difference is who does the work. In cache-aside, the application itself decides whether to read the cache or the database, which scatters non-business code throughout the application. In read-through, a separate Cache Middleware layer handles reading the cache or the database, which keeps the application layer simple. Remember the @Cacheable annotation in Spring Cache? It feels a lot like read-through.

Let me draw a simple flow chart for you to see:

As you can see, with a Cache Middleware layer in place, the application simply reads and writes data without caring about the underlying logic. The cache-specific code is pulled out of the application, which only needs to focus on the business.

3.2 Write-Through

Write-through works the same way: all the work is done by the Cache Middleware, and from the application’s point of view it is a simple update. Here’s how it works:

In a write-through strategy, all writes go through the Cache Middleware, which stores the data in both the DB and the cache on every write. The two operations happen in a single transaction, so either both writes succeed or neither does.

The advantage is that the application only talks to the Cache Middleware, so its code is cleaner and simpler.
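A small write-through sketch: the application talks only to a hypothetical `CacheMiddleware` class, which keeps both stores in step. The single-transaction guarantee is only noted in a comment here; a real implementation needs actual transactional machinery:

```python
# Write-through sketch: the application never touches db/cache directly.
# `db`/`cache` are stand-in dicts; `CacheMiddleware` is an illustrative name.

db = {}
cache = {}

class CacheMiddleware:
    def read(self, key):
        # Read-through half: cache first, database on a miss.
        if key in cache:
            return cache[key]
        value = db.get(key)
        if value is not None:
            cache[key] = value
        return value

    def write(self, key, value):
        # Write both stores together; in a real system these two writes
        # would sit inside one transaction so they succeed or fail as one.
        db[key] = value
        cache[key] = value

mw = CacheMiddleware()
mw.write("user:1", "alice")
print(mw.read("user:1"))                # alice, served from the cache
print(db["user:1"] == cache["user:1"])  # True: the stores stay in step
```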

4. Write Behind

The write-behind policy is similar to write-through in that the application communicates only with the Cache Middleware, which exposes an interface for the application.

The main difference between write-behind and write-through is that data is first written only to the cache and is written to the database later, after some time or on another trigger; the database write is asynchronous. Under this model, consistency between the cache and the DB is weak, so use it with caution on systems that require strong consistency. If someone reads directly from the data source before the asynchronous write lands, they may get stale data.

Writing data to DB can be done in a number of ways:

  • One is to collect all writes and flush them to the data source in one batch at a certain point in time (for example, when DB load is low).
  • Another is to consolidate writes into smaller batches, for example collecting five writes at a time and then bulk-writing them to the data source.
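The batching idea can be sketched like this; the batch size of five matches the second bullet, and all names are illustrative:

```python
# Write-behind sketch: writes land in the cache immediately and are
# flushed to the DB in batches of BATCH_SIZE.

db = {}
cache = {}
pending = []          # writes accepted but not yet persisted
BATCH_SIZE = 5

def write_behind(key, value):
    cache[key] = value            # fast path: cache only
    pending.append((key, value))
    if len(pending) >= BATCH_SIZE:
        flush()

def flush():
    # persist the collected writes to the database in one batch
    for key, value in pending:
        db[key] = value
    pending.clear()

for i in range(7):
    write_behind(f"k{i}", i)

print(len(db), len(cache))  # 5 7: two writes exist only in the cache so far
```

Until `flush()` runs again, the last two writes live only in the cache, which is exactly the weak-consistency window described above.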

I didn’t feel like drawing this flow chart either, so here is one found online for reference:

Ok, that’s our talk on cache/database double-write consistency. If you have any questions, feel free to leave a comment.
