One, Foreword
We recently ran into a production incident caused by an inconsistency between the cache and the database. After investigation, we found the root cause: our write path writes the database first and then deletes the cache. If the data is read immediately after such a write, the read misses the cache (it was just deleted) and queries the database. Because of master-slave replication lag, the read hits a slave that still holds the old data, and that old value is loaded back into the cache. The result: the database holds the new data, but the cache still serves the old data.
This incident brings up the age-old question of how to keep the database and the cache consistent, which is what we will discuss today.
Two, Analysis of inconsistent scenarios
1. When there is no concurrency
Let's start with the simplest case: a write request wants to change a value from D1 to D2. A write request performs two operations: 1) write the database; 2) update or delete the cache. These two operations do not form a single transaction, so one can succeed while the other fails (usually the second one, because if the first fails we normally never attempt the second). This leads to the following two situations; a minimal code sketch of this non-atomic write path follows the two lists.
- Change DB data from D1 to D2
- Then try to delete or update the cache, but this step fails with an exception
Finally, the database data is D2, and the cache data is still D1.
- Update the cache data from D1 to D2
- Then the database update fails with an exception
Finally, the database data is D1, and the cache data has been modified to D2.
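For concreteness, here is a minimal Go sketch of this non-atomic write path. The `Database` and `Cache` interfaces, the `product:%d` key format, and the function names are illustrative assumptions, not the original service's code.

```go
package cacheconsistency

import (
	"context"
	"fmt"
)

// Database and Cache stand in for whatever storage clients the service uses.
type Database interface {
	UpdateProduct(ctx context.Context, id int64, price int64) error
}

type Cache interface {
	Del(ctx context.Context, key string) error
}

// UpdateProduct writes the database first and then deletes the cache entry.
// The two steps are not one transaction, so each can fail independently.
func UpdateProduct(ctx context.Context, db Database, cache Cache, id, price int64) error {
	// Step 1: write the database (D1 -> D2). If this fails, nothing has
	// changed yet and both stores still agree on D1.
	if err := db.UpdateProduct(ctx, id, price); err != nil {
		return fmt.Errorf("update db: %w", err)
	}
	// Step 2: delete the cache entry. If this fails, the database already
	// holds D2 while the cache still serves D1 -- the first situation above.
	if err := cache.Del(ctx, fmt.Sprintf("product:%d", id)); err != nil {
		return fmt.Errorf("delete cache: %w", err)
	}
	return nil
}
```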
2. Concurrency
Now for the more complex case: inconsistencies that arise when reads and writes run concurrently. To be clear, when a read misses the cache, we typically query the database and write the result back into the cache (see the sketch below). For writes, however, the strategy varies, and two questions usually come up: 1) write the cache or the database first; 2) delete the cache or update it.
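As a reference point for the scenarios below, here is a rough Go sketch of that cache-miss read path. The interfaces, the TTL, and `ErrNotFound` are assumptions made for illustration.

```go
package cacheconsistency

import (
	"context"
	"errors"
	"time"
)

var ErrNotFound = errors.New("not found")

type ReadCache interface {
	Get(ctx context.Context, key string) (string, error) // returns ErrNotFound on a miss
	Set(ctx context.Context, key, val string, ttl time.Duration) error
}

type ReadDB interface {
	Get(ctx context.Context, key string) (string, error)
}

// GetValue is the usual cache-aside read: try the cache, and on a miss load
// the value from the database and backfill the cache with a TTL.
func GetValue(ctx context.Context, cache ReadCache, db ReadDB, key string) (string, error) {
	if val, err := cache.Get(ctx, key); err == nil {
		return val, nil // cache hit
	} else if !errors.Is(err, ErrNotFound) {
		return "", err // real cache error
	}
	val, err := db.Get(ctx, key) // cache miss: read the database
	if err != nil {
		return "", err
	}
	// Backfill the cache. Between the DB read and this Set, a concurrent
	// write may already have changed the data -- this window is exactly what
	// the scenarios below exploit.
	_ = cache.Set(ctx, key, val, 10*time.Minute)
	return val, nil
}
```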
1) DB first, then cache
1. Update the cache after writing to the database
- The first scenario: a read that misses the cache runs concurrently with a write.
- Thread A reads the cache and misses
- Thread A reads DB and gets Data1
- Thread B writes to DB, updating data from Data1 to Data2
- Thread B writes to the cache, updating it to Data2
- Thread A writes to the cache, setting it back to the Data1 it read earlier
Finally, the DB value is Data2, but the value in the cache is Data1.
- The second scenario is concurrent write operations.
- Thread A writes Data1 to the DB
- Thread B writes Data2 to the DB
- Thread B updates the cache, writing Data2
- Thread A updates the cache, writing Data1
Finally, the DB value is Data2 and the cache value is Data1.
2. Write to the database and delete the cache
Deleting the cache after writing to the database seems to solve the problems above.
But it is not a panacea either; the production incident described in the foreword used exactly this write-DB-then-delete-cache strategy. Our service has a very high read QPS but a modest write QPS, so we adopted a master-slave architecture with read/write splitting: write requests go to the master, read requests go to the slaves, and master/slave replication keeps them in sync. Because replication takes time, the DB and the cache can end up inconsistent, as in the sketch that follows.
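Here is a rough Go sketch of how the read/write-split paths interact, with the failure mode noted in comments. The `Store` and `KVCache` interfaces and the TTL are illustrative assumptions.

```go
package cacheconsistency

import (
	"context"
	"time"
)

type Store interface {
	Get(ctx context.Context, key string) (string, error)
	Set(ctx context.Context, key, val string) error
}

type KVCache interface {
	Get(ctx context.Context, key string) (string, bool, error)
	Set(ctx context.Context, key, val string, ttl time.Duration) error
	Del(ctx context.Context, key string) error
}

// Write goes to the master and then deletes the cache entry.
func Write(ctx context.Context, master Store, cache KVCache, key, val string) error {
	if err := master.Set(ctx, key, val); err != nil {
		return err
	}
	return cache.Del(ctx, key)
}

// Read serves from the cache, and on a miss reads a replica and backfills.
// If the replica has not yet replayed the write above, this read returns the
// old value and writes it back into the cache: the incident from the foreword.
func Read(ctx context.Context, replica Store, cache KVCache, key string) (string, error) {
	if val, ok, err := cache.Get(ctx, key); err == nil && ok {
		return val, nil
	}
	val, err := replica.Get(ctx, key) // may be stale due to replication lag
	if err != nil {
		return "", err
	}
	_ = cache.Set(ctx, key, val, 10*time.Minute)
	return val, nil
}
```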
2) Cache first and then DB
1. Delete the cache, then write to the database
- Thread A (a write request) deletes the cache first
- Thread B reads the cache and misses
- Thread B reads the DB and gets D1
- Thread A writes to the database, updating D1 to D2
- Thread B writes D1 to the cache
Finally, the DB data is D2 and the cache is D1.
2. Update the cache, then write to the database
- Thread B reads the cache and misses
- Thread A (a write request) updates the cache from D1 to D2
- Thread B reads the DB and gets D1
- Thread A writes to the DB, updating D1 to D2
- Thread B writes D1 to the cache
Finally, the DB data is D2 and the cache data is D1.
Three, Handling inconsistencies
Once the cache and the database become inconsistent, if the cache expiration time is long and no further write arrives in the meantime, readers will see incorrect data for a long time. So how do we repair the inconsistency, or at least converge toward consistency?
1. Delayed double delete
As the name suggests, after the database write (and the first cache delete) completes, we wait a short interval ΔT and then delete the cache once more. The second delete is performed asynchronously.
In both of the problem scenarios described above, the delayed double delete guarantees that dirty data in the cache is removed after a while, so eventual consistency is reached; some requests may still read dirty data during the window, though.
How large should ΔT be? The second delete exists to clear the dirty data produced by concurrent cache-miss reads, so ΔT is usually set slightly longer than the duration of a read request and slightly longer than the master-slave replication delay. A sketch follows.
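A minimal Go sketch of the delayed double delete, in the same illustrative interface style as above; the interfaces and the choice of ΔT passed by the caller are assumptions.

```go
package cacheconsistency

import (
	"context"
	"time"
)

type DelayDB interface {
	Set(ctx context.Context, key, val string) error
}

type DelayCache interface {
	Del(ctx context.Context, key string) error
}

// WriteWithDoubleDelete writes the database, deletes the cache, and schedules
// a second asynchronous delete after delta to clear dirty data that concurrent
// cache-miss reads (or stale replica reads) may have written back in between.
func WriteWithDoubleDelete(ctx context.Context, db DelayDB, cache DelayCache, key, val string, delta time.Duration) error {
	if err := db.Set(ctx, key, val); err != nil {
		return err
	}
	if err := cache.Del(ctx, key); err != nil {
		return err // or hand the key to the retry queue from the next section
	}
	// Second delete, fired asynchronously after ΔT (slightly longer than a
	// read request plus the master-slave replication delay).
	time.AfterFunc(delta, func() {
		_ = cache.Del(context.Background(), key)
	})
	return nil
}
```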
2. Delete cache retry mechanism
Deleting the cache can fail. To make sure the delete eventually succeeds, introduce a retry mechanism: when a cache delete fails, push the key onto a retry queue. The queue can be a Kafka topic or a Redis list; if the consistency requirements are loose, even an in-process, in-memory queue will do. A sketch follows.
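Below is a sketch of the simplest variant mentioned above, an in-process in-memory retry queue; swapping the channel for a Kafka topic or a Redis list is a deployment choice. The names, queue size, and retry interval are illustrative.

```go
package cacheconsistency

import (
	"context"
	"time"
)

type Deleter interface {
	Del(ctx context.Context, key string) error
}

// DeleteRetrier retries failed cache deletes from an in-memory queue.
type DeleteRetrier struct {
	cache Deleter
	queue chan string
}

func NewDeleteRetrier(cache Deleter) *DeleteRetrier {
	r := &DeleteRetrier{cache: cache, queue: make(chan string, 1024)}
	go r.loop()
	return r
}

// Enqueue is called after the first cache delete fails.
func (r *DeleteRetrier) Enqueue(key string) {
	select {
	case r.queue <- key:
	default:
		// Queue full: drop the key and fall back on the cache TTL.
	}
}

func (r *DeleteRetrier) loop() {
	for key := range r.queue {
		if err := r.cache.Del(context.Background(), key); err != nil {
			// Still failing: wait briefly, then put the key back.
			time.Sleep(100 * time.Millisecond)
			r.Enqueue(key)
		}
	}
}
```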
3. Read binlog to check the cache
Use a component or middleware to subscribe to the database binlog. With the binlog in ROW format, parsing an event yields the latest values of the changed row. Use that data to check the corresponding cache entry: if the cache disagrees with the row, delete the cache entry; if they match, do nothing. A sketch follows.
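A Go sketch of the reconciliation step, assuming some binlog subscriber (Canal, go-mysql, or similar, not shown here) parses ROW-format events into the `RowChange` struct below and calls `HandleRowChange` for each changed row; the struct, key format, and cache interface are illustrative assumptions.

```go
package cacheconsistency

import (
	"context"
	"reflect"
)

// RowChange is a parsed ROW-format binlog event: the table, the primary key,
// and the latest column values of the changed row.
type RowChange struct {
	Table string
	PK    string
	After map[string]string
}

type RowCache interface {
	Get(ctx context.Context, key string) (map[string]string, bool, error)
	Del(ctx context.Context, key string) error
}

// HandleRowChange compares the cached copy with the latest row image from the
// binlog and deletes the cache entry on any mismatch; matching entries are
// left untouched.
func HandleRowChange(ctx context.Context, cache RowCache, ev RowChange) error {
	key := ev.Table + ":" + ev.PK
	cached, ok, err := cache.Get(ctx, key)
	if err != nil || !ok {
		return err // nothing cached, nothing to reconcile
	}
	if !reflect.DeepEqual(cached, ev.After) {
		return cache.Del(ctx, key)
	}
	return nil
}
```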
Four, Summary
In the end, which write strategy to use depends on the characteristics of your own service; there is no universal answer (unless you serialize operations or impose heavy restrictions to guarantee strong consistency, which reduces availability).
The commonly used Cache Aside strategy, where a write updates the database first and then deletes the cache, ran into many problems in our service, so we ended up updating the database first and then updating the cache.
In production you can experiment with different policies, collect database/cache consistency statistics in the background, and choose the solution that best fits your service.