Previously

Designing a cache system involves many problems and concepts. I will discuss them under the following headings:

  • Stability
  • Correctness
  • Observability
  • Standardized adoption and tooling

In the previous article, we analyzed the stability of the cache system and introduced how go-zero handles cache penetration, cache breakdown, and cache avalanche. It is fairly easy to follow and has strong practical value, so it is worth a read.

This article, the second in the series, focuses on cache data consistency.

Cache correctness

As mentioned in the previous article, we introduce a cache to reduce DB load and improve system stability, so we focused on the stability of the cache system first. Once stability is taken care of, the next problem is usually data correctness. A typical complaint is “why is the data still old after I updated it?” This is commonly called the “cache data consistency” problem. Let’s take a closer look at its causes and how to deal with them.

Common practices for data updates

First, a premise: throughout this discussion, updating the DB and deleting the cache are not treated as one atomic operation. In high-concurrency scenarios we cannot afford a distributed lock that binds the two into an atomic operation, because doing so would significantly hurt concurrent performance and increase system complexity. We therefore pursue only eventual consistency, and this article applies only to high-concurrency scenarios that do not require strong consistency. For cases such as financial payments, readers should make their own judgment.

There are two common types of data updates, and the rest are basically variations of these two types:

  • Delete the cache first, then update the database

When data is updated, we delete the cache first and then update the DB, as shown on the left. Let’s take a look at the process:

  • Request A wants to update the data. It deletes the corresponding cache entry first; the DB has not been updated yet
  • Request B reads the same data
  • Request B misses the cache, reads the old data from the DB, and writes it to the cache (dirty data)
  • Request A updates the DB

As you can see, request B writes dirty data to the cache. If the data is read often and written rarely, the dirty data may stay in the cache for a long time (until a subsequent update removes it or the cache entry expires), which is not acceptable for the business.
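The four steps above can be replayed deterministically. The following is a minimal sketch, assuming plain in-memory maps in place of a real DB and cache; the `store` type, the `read` method, and the key `user:1` are hypothetical names for illustration and are not part of go-zero.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for a DB with a cache in front.
type store struct {
	db    map[string]string
	cache map[string]string
}

// read is a cache-aside read: return the cached value if present,
// otherwise load from the DB and write the value back to the cache.
func (s *store) read(key string) string {
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := s.db[key]
	s.cache[key] = v
	return v
}

// deleteFirstInterleaving replays the steps in the order listed above
// and returns what the DB and the cache hold afterwards.
func deleteFirstInterleaving() (dbVal, cacheVal string) {
	const key = "user:1"
	s := &store{
		db:    map[string]string{key: "old"},
		cache: map[string]string{key: "old"},
	}

	delete(s.cache, key) // A deletes the cache; the DB is not updated yet
	s.read(key)          // B misses, reads "old" from the DB, refills the cache
	s.db[key] = "new"    // A updates the DB

	return s.db[key], s.cache[key]
}

func main() {
	dbVal, cacheVal := deleteFirstInterleaving()
	fmt.Printf("db=%s cache=%s\n", dbVal, cacheVal) // prints "db=new cache=old"
}
```

After the replay, the DB holds the new value while the cache still serves the old one, and nothing corrects it until the next update or the entry's expiry.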

  • Update the database first, then delete the cache

As the right part of the figure above shows, B can read the old data only in the window between A updating the DB and A deleting the cache, because A's operation has not finished yet. That window is very short, so this approach meets the eventual-consistency requirement.

As the figure above shows, we delete the cache rather than update it, for the following reason:

In the figure above, I wrote "operate" rather than "delete" or "update". When the operation is a delete, it does not matter whether A or B deletes first, because subsequent read requests will load the latest data from the DB. When the operation is a cache update, however, the result depends on whether A or B updates the cache first. If A's cache update lands last while B's write is the latest one in the DB, the cache holds dirty data again. That is why go-zero only deletes the cache.
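The order sensitivity can be made concrete with a small replay. This is a sketch under stated assumptions, not go-zero code: the DB and cache are plain in-memory maps, the function `finalState` and the values `vA` and `vB` are hypothetical, and the interleaving is fixed so that the two writers reach the DB in the order A, B but reach the cache in the order B, A.

```go
package main

import "fmt"

// finalState replays two writers against in-memory maps. A and B write the
// DB in that order, but their cache operations land in the opposite order.
// It returns the DB value and the value a later read is served.
func finalState(updateCache bool) (dbVal, servedVal string) {
	const key = "user:1"
	db := map[string]string{key: "v0"}
	cache := map[string]string{key: "v0"}

	db[key] = "vA" // A updates the DB
	db[key] = "vB" // B updates the DB; vB is now the latest value

	// Cache operations arrive out of order: B first, then A.
	for _, v := range []string{"vB", "vA"} {
		if updateCache {
			cache[key] = v // update the cache with the writer's own value
		} else {
			delete(cache, key) // delete the cache entry instead
		}
	}

	// A later read: cache hit if present, otherwise load from the DB and refill.
	v, ok := cache[key]
	if !ok {
		v = db[key]
		cache[key] = v
	}
	return db[key], v
}

func main() {
	dbVal, served := finalState(true)
	fmt.Printf("update cache: db=%s served=%s\n", dbVal, served) // db=vB served=vA
	dbVal, served = finalState(false)
	fmt.Printf("delete cache: db=%s served=%s\n", dbVal, served) // db=vB served=vB
}
```

With updates, the late-arriving write from A overwrites B's newer value in the cache. With deletes, both orders leave the cache empty, so the next read reloads the latest value from the DB.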

Let’s take a look at the complete request processing flow:

Note: Different colors represent different requests.

  • Request 1 updates the DB
  • Request 2 queries the same data and gets the old data from the cache. Returning old data for this short period is acceptable and satisfies eventual consistency
  • Request 1 deletes the cache
  • Request 3 finds no entry in the cache, so it queries the database, writes the result back to the cache, and returns it
  • Subsequent requests read directly from the cache
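The flow above can be traced with a small program. The following is an illustrative sketch, assuming an in-memory map for both the DB and the cache; `cachedStore`, `Get`, `UpdateDB`, `Invalidate`, and `replayFlow` are hypothetical names, not the go-zero API. The mutex makes each individual step safe, but the DB update and the cache deletion remain two separate steps, in line with the premise that they are not bound into one atomic operation.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedStore is a hypothetical in-memory DB with a cache in front of it.
type cachedStore struct {
	mu        sync.Mutex
	db        map[string]string
	cache     map[string]string
	dbQueries int // number of reads that had to go to the DB
}

// Get serves from the cache; on a miss it queries the DB and writes back.
func (s *cachedStore) Get(key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.cache[key]; ok {
		return v
	}
	s.dbQueries++
	v := s.db[key]
	s.cache[key] = v
	return v
}

// UpdateDB is the first half of a write: it changes only the DB.
func (s *cachedStore) UpdateDB(key, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.db[key] = val
}

// Invalidate is the second half of a write: it deletes the cache entry.
func (s *cachedStore) Invalidate(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.cache, key)
}

// replayFlow runs the five steps listed above in order and returns the
// values seen by the readers plus the number of DB queries they caused.
func replayFlow() (reads []string, dbQueries int) {
	const key = "user:1"
	s := &cachedStore{
		db:    map[string]string{key: "old"},
		cache: map[string]string{key: "old"},
	}
	s.UpdateDB(key, "new")             // request 1 updates the DB
	reads = append(reads, s.Get(key))  // request 2: cache hit, old data
	s.Invalidate(key)                  // request 1 deletes the cache
	reads = append(reads, s.Get(key))  // request 3: miss, loads new data
	reads = append(reads, s.Get(key))  // subsequent request: cache hit
	return reads, s.dbQueries
}

func main() {
	reads, n := replayFlow()
	fmt.Println(reads, n) // prints "[old new new] 1"
}
```

Only request 3 reaches the DB. Request 2 sees the old value during the short window before the cache is deleted, and every later read is served the new value from the cache.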

Here is another question to think about: what should we do in this scenario?

If you have a good solution or want to know how to solve it, you are welcome to join the WeChat group of the go-zero community. It is better to teach people to fish than to give them a fish.

To be continued

In this article, we discussed the consistency of cached data. In the next article, we will discuss monitoring of cache systems and how to make caching code more standardized and less buggy.

The solutions to all of these problems are included in the go-zero microservices framework. If you want to understand the go-zero project better, please visit the official website for detailed examples.

Video Playback Address

ArchSummit Architect Summit – Cache architecture design for massive concurrency

Project address

Github.com/tal-tech/go…

You are welcome to use go-zero and to star the project to support us!

WeChat discussion group

Follow the “micro service practice” official account and tap the group entry to get the QR code of the community group.