Three problems with cache exceptions
When designing a caching system, you may face three common cache anomalies:
- Cache avalanche
- Cache breakdown
- Cache penetration
Problem | Cause |
---|---|
Cache avalanche | 1. A large amount of cached data expires at the same time. 2. Redis failure |
Cache breakdown | A hot data cache entry expires |
Cache penetration | The data exists in neither the cache nor the database |
Cache avalanche
Causes
- Large amounts of cached data expire or become invalid at the same time
- Redis is down
Solutions
1. Spread out expiration times so data does not all expire at once
Avoid setting a large amount of data to the same expiration time. When setting an expiration time for cached data, add a random offset to it so the data does not all expire at the same moment.
setRedis(key, value, time + Math.random() * 10000);
2. Mutex
Use mutex to ensure that only one request is made at a time to build the cache (reading data from the database and updating it to Redis). When the cache build is complete, the lock is released.
Requests that fail to acquire the mutex will either wait for the lock to be released and re-read the cache, or return null or default values.
When implementing a mutex, it is best to set a timeout on the lock. Otherwise, if the first request acquires the lock and then blocks without releasing it, no other request can ever acquire the lock, and the whole system becomes unresponsive.
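A minimal single-process sketch of this mutex pattern, using in-memory dicts as stand-ins for Redis and the database (the names `cache`, `db`, and `get_with_mutex` are illustrative; in a distributed setup the lock would instead be a Redis key set with `SET key value NX PX timeout` so it expires on its own):

```python
import threading

# In-memory stand-ins for Redis and the database (hypothetical names,
# used only to keep the sketch self-contained).
cache = {}
db = {"user:1": "alice"}
rebuild_lock = threading.Lock()

def get_with_mutex(key, lock_timeout=3.0):
    """Read-through cache: only one thread rebuilds a missing entry."""
    value = cache.get(key)
    if value is not None:
        return value
    # Try to become the single rebuilder; others wait up to lock_timeout.
    if rebuild_lock.acquire(timeout=lock_timeout):
        try:
            # Double-check: another thread may have rebuilt while we waited.
            value = cache.get(key)
            if value is None:
                value = db.get(key)          # read from the database
                if value is not None:
                    cache[key] = value       # update the cache
            return value
        finally:
            rebuild_lock.release()           # always release, even on error
    # Failed to get the lock in time: fall back to a null/default value.
    return None
```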
3. Dual-key policy
Two keys are used for each piece of cached data: a primary key with an expiration time, and a backup key with no expiration time. The keys differ but the value is the same, effectively making a copy of the cached data.
If the business thread cannot read the cached data via the primary key, it directly returns the cached data under the backup key; when updating the cache, both keys are written.
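The dual-key idea can be sketched as follows, with a dict standing in for Redis (the `backup:` prefix and function names are hypothetical):

```python
import time

cache = {}  # maps key -> (value, expire_at or None); stand-in for Redis

def set_dual(key, value, ttl):
    """Write the same value under a primary key (with TTL) and a backup key (no TTL)."""
    cache[key] = (value, time.time() + ttl)       # primary: expires
    cache["backup:" + key] = (value, None)        # backup: never expires

def get_dual(key):
    """Serve from the primary key; fall back to the backup key if it expired."""
    entry = cache.get(key)
    if entry is not None:
        value, expire_at = entry
        if expire_at is None or time.time() < expire_at:
            return value
    backup = cache.get("backup:" + key)
    return backup[0] if backup is not None else None
```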
4. Background update cache, periodic update or message queue notification update
The business thread is no longer responsible for updating the cache, and the cache does not set an expiration date. Instead, the cache is made “permanent” and updated periodically by background threads.
The fact that cached data has no expiration time does not mean it stays in memory forever. When system memory runs low, some cached data will be evicted; in the window between eviction and the next scheduled background update, a business thread that misses the cache returns a null value, which from the business's perspective looks like lost data.
There are two ways to solve the above problem.
- In the first way, the background thread not only updates the cache periodically but also frequently checks whether cache entries are still present. If an entry is found missing (possibly evicted under memory pressure), the thread immediately reads the data from the database and writes it back to the cache.
The detection interval should not be too long; if it is, users will get null values instead of real data, so the interval should be on the order of milliseconds. Even so, there is always a gap, and the user experience is mediocre.
- In the second way, when a business thread finds that a cache entry is missing (evicted), it sends a message through a message queue to notify the background thread to update the cache. On receiving the message, the background thread first checks whether the entry already exists; if so, it does nothing, and if not, it reads the database and loads the data into the cache. This updates the cache more promptly than the first method and gives a better user experience.
It is better to load data into the cache ahead of time when the business starts up, rather than wait for user access to trigger a cache build; this is known as "cache warm-up." The background update mechanism also works well for this purpose.
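A rough sketch of the message-queue variant, using Python's `queue.Queue` as a stand-in for a real message queue and dicts for Redis and the database (all names illustrative):

```python
import queue
import threading

cache = {}
db = {"hot:item": "v1"}
refresh_queue = queue.Queue()   # business threads enqueue keys that missed

def refresh_worker():
    """Background thread: rebuild cache entries reported as missing."""
    while True:
        key = refresh_queue.get()
        if key is None:              # sentinel to stop the worker
            break
        if key not in cache:         # skip if another refresh already ran
            value = db.get(key)
            if value is not None:
                cache[key] = value
        refresh_queue.task_done()

def get(key):
    """Business path: on a miss, notify the background thread instead of hitting the DB."""
    value = cache.get(key)
    if value is None:
        refresh_queue.put(key)       # ask the background thread to rebuild
    return value
```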
5. Service fuse
For cache avalanches caused by a Redis failure or outage, we can trip a service circuit breaker: suspend the business application's access to the cache service and return an error directly, instead of letting requests fall through to the database. This reduces the load on the database and keeps it running normally. Once Redis recovers, business applications are allowed to access the cache service again.
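A minimal circuit-breaker sketch (a hypothetical class, not tied to any library): after a run of consecutive cache failures it "opens" and rejects requests outright for a cooldown period, then allows access again.

```python
import time

class CacheCircuitBreaker:
    """Minimal circuit breaker: after `threshold` consecutive cache errors,
    reject requests for `cooldown` seconds instead of falling through
    to the database. Illustrative sketch only."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None      # time the breaker tripped, if any

    def allow(self):
        """Is access to the cache service currently allowed?"""
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown:
            self.opened_at = None  # cooldown elapsed: try the cache again
            self.failures = 0
            return True
        return False               # breaker open: return an error upstream

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()

    def record_success(self):
        self.failures = 0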
6. Request traffic limiting
Enable a request rate-limiting mechanism: only a small portion of requests are sent to the database for processing, and the rest are denied service directly at the entrance. After Redis recovers and the cache has been warmed up, the rate limit is lifted.
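Request limiting is often implemented as a token bucket; a minimal sketch (illustrative, not production code):

```python
import time

class TokenBucket:
    """Let only a trickle of requests through to the database while Redis recovers."""

    def __init__(self, rate, capacity):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # reject at the entrance
```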
7. Build a highly reliable Redis cache cluster with master and replica nodes
If the master Redis cache node fails and goes down, a replica node can be promoted to master and continue to provide the cache service, avoiding the cache avalanche that a Redis failure would otherwise cause.
Cache breakdown
Causes
- Hot data cache expired
If a piece of hot data in the cache expires while a large number of requests are accessing it, those requests cannot be served from the cache and hit the database directly. The database is then easily overwhelmed by the concurrent requests; this is cache breakdown.
Cache breakdowns are similar to cache avalanches and can be considered a subset of cache avalanches.
Solutions
1. Mutex
Same as the "mutex" solution for cache avalanche.
2. Do not set an expiration time for hot data; let the background update the cache asynchronously
Set no expiration time on hot data and have a background thread update the cache asynchronously, or have the background thread notified to refresh the data and reset the expiration time shortly before the hot data is about to expire.
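One common way to realize this is a "logical expiration": store an expiry timestamp next to the value, never set a physical TTL, and have a background job renew entries shortly before their logical expiry. A sketch with a dict standing in for Redis (names and TTLs are illustrative):

```python
import time

cache = {}   # key -> (value, logical_expire_at); no physical TTL is ever set
db = {"hot:product": "details-v1"}

def set_hot(key, value, ttl):
    """Store a *logical* expiry alongside the value; Redis never evicts the key."""
    cache[key] = (value, time.time() + ttl)

def refresh_hot_keys(ttl=60, ahead=10):
    """Background job: renew hot entries shortly before their logical expiry."""
    now = time.time()
    for key, (_, expire_at) in list(cache.items()):
        if expire_at - now <= ahead:          # about to expire
            fresh = db.get(key)               # rebuild from the database
            if fresh is not None:
                set_hot(key, fresh, ttl)
```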
Cache penetration
Causes
In a cache avalanche or breakdown, the data the application needs still exists in the database; once the cache is repopulated, the database load eases. Cache penetration is a different story.
When a user requests data that exists in neither the cache nor the database, the cache misses, and the subsequent database lookup also finds nothing, so no cache entry can ever be built to serve later requests. When a large number of such requests arrive, database load spikes sharply; this is cache penetration.
Cache penetration generally occurs in two ways
- The data in the cache and the database are deleted by mistake. As a result, there is no data in the cache and database
- Malicious attacks by hackers that intentionally flood businesses that read nonexistent data
Solutions
1. Limit illegal requests
When the interface is called, check whether the request parameters are reasonable: whether they contain illegal values, whether required fields exist, and so on. If the request is judged malicious, return an error immediately, avoiding further access to the cache and database.
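A trivial example of such parameter checks (the field name `user_id` and its bounds are hypothetical):

```python
def is_valid_user_request(params):
    """Reject obviously malformed lookups before they reach cache or DB.
    Field name and bounds are hypothetical, for illustration only."""
    user_id = params.get("user_id")
    if not isinstance(user_id, int):
        return False          # missing field or wrong type
    if user_id <= 0:
        return False          # IDs are positive in this example schema
    if user_id > 10**12:
        return False          # outside any plausible range
    return True
```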
2. Cache null values or default values
When online services detect cache penetration, they can set a null value or default value in the cache for the queried data. In this way, subsequent requests can read the null value or default value from the cache and return the value to the application without continuing to query the database.
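A sketch of null-value caching with a short TTL for the sentinel (dict as a Redis stand-in; `NULL_SENTINEL` and the TTL values are illustrative):

```python
import time

NULL_SENTINEL = "__NULL__"   # marker meaning "known to be absent"
cache = {}                   # key -> (value, expire_at); stand-in for Redis
db = {}

def get_with_null_caching(key, null_ttl=60, ttl=3600):
    """Read-through cache that also remembers misses, with a short TTL."""
    entry = cache.get(key)
    if entry is not None:
        value, expire_at = entry
        if time.time() < expire_at:
            return None if value == NULL_SENTINEL else value
    value = db.get(key)
    if value is None:
        # Cache the absence with a short TTL so later misses skip the DB.
        cache[key] = (NULL_SENTINEL, time.time() + null_ttl)
        return None
    cache[key] = (value, time.time() + ttl)
    return value
```

The sentinel's TTL is kept short so that data created later becomes visible quickly.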
3. Use bloom filter to quickly check whether data exists
Use a Bloom filter to quickly determine whether the data exists, rather than querying the database to determine whether the data exists.
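A minimal Bloom filter sketch (in production, RedisBloom or an existing library would be preferable; the bit-array size and hash count here are illustrative). All keys present in the database are added up front; a lookup that the filter reports as "definitely absent" can be rejected without touching the cache or the database:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=8192, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A Bloom filter can return false positives but never false negatives, so a `False` result is always safe to reject immediately.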