
In real-world development, cache anomalies can cause three problems: cache avalanche, cache breakdown, and cache penetration. All three cause a flood of requests to bypass the cache and hit the database directly, which under high concurrency can crash the database. So what should we do about it?

The following is a solution for each case.

Cache penetration

Cache penetration is a query for data that does not exist in the database at all (and therefore never enters the cache).

The normal caching flow is: query the cache first; if the key is missing or expired, query the database and put the result into the cache. If the database query returns nothing, nothing is cached.

A malicious attacker can exploit this gap: every request for a nonexistent key falls through to the database, putting pressure on it and potentially crushing it.

The solution

Caching empty objects

If the database query returns empty, cache a placeholder empty object anyway. Subsequent requests for the same key are then answered from the cache, protecting the back-end data source.

Note that the cached value must be an explicit empty placeholder; simply caching nothing (NULL) does not work and still allows cache penetration.

There are two problems with caching empty objects:

1. If a malicious attack uses a different nonexistent key for each request, the cache fills with empty objects and consumes extra memory.

Solution:

When caching empty objects, set a short expiration time (typically around 60 seconds) so these keys are evicted automatically.

2. If the real data is added to the database while the empty object is still cached, the cache and database are inconsistent until the empty object expires.

Solution:

Clear the cached empty object through a messaging system or other means when the real data is written.
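A minimal sketch of the empty-object pattern described above, using a plain dict to stand in for the cache; the names `cache`, `db`, `EMPTY`, and the TTL values are illustrative, not a real Redis API:

```python
import time

EMPTY = object()          # sentinel marking "key known to be absent"
EMPTY_TTL = 60            # short TTL so absent keys are re-checked soon
NORMAL_TTL = 3600

cache = {}                # key -> (value, expires_at)
db = {"user:1": {"name": "alice"}}

def get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        value = entry[0]
        return None if value is EMPTY else value
    value = db.get(key)   # cache miss: query the database
    if value is None:
        # Cache the empty object with a short TTL so repeated lookups
        # of nonexistent keys are absorbed without holding memory long.
        cache[key] = (EMPTY, time.time() + EMPTY_TTL)
        return None
    cache[key] = (value, time.time() + NORMAL_TTL)
    return value
```

After the first miss for a nonexistent key, every lookup within the TTL is served from the cached sentinel instead of hitting the database.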

Use a Bloom filter

A Bloom filter consists of a bit array whose bits are initialized to 0 and N hash functions, and it can quickly determine whether a piece of data exists. When data is written to the database, the Bloom filter marks it with three actions:

  • Use the N hash functions to compute N hash values of the data.
  • Take each hash value modulo the length of the bit array to find its position in the array.
  • Set the bits at those positions to 1.

To check whether some data exists, compute the same N positions. If any one of those bits is still 0, the data was never marked by the Bloom filter, so it definitely does not exist.

Applied to cache penetration:

  • When writing data to the database, mark it with the Bloom filter.
  • On a cache miss, query the Bloom filter before the database; if the filter says the data does not exist, skip the database query entirely.
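The marking and checking steps above can be sketched as a toy Bloom filter; the sizes and the salted-SHA-256 hashing scheme are illustrative, not production-tuned:

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, data):
        # Derive k hash values by salting one strong hash, then take
        # each modulo m to get a bit position.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{data}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, data):
        # mark: set the bit at each of the k positions to 1
        for pos in self._positions(data):
            self.bits[pos] = 1

    def might_contain(self, data):
        # If any bit is 0, the data was definitely never added;
        # if all are 1, it is *probably* present (false positives possible).
        return all(self.bits[pos] for pos in self._positions(data))
```

Note the asymmetry: a "no" answer is definitive, while a "yes" answer may be a false positive, which is exactly why it is safe to skip the database on a "no".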

Add validation at the request entry

Filter out malicious requests at the entry point: malformed parameters, parameters that cannot exist, or ids smaller than or equal to 0.
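A sketch of such entry-point validation; the function name and the exact rules are illustrative:

```python
def is_valid_product_id(raw):
    """Reject obviously malformed or malicious ids before touching the cache."""
    if not isinstance(raw, str) or not raw.isdigit():
        return False              # non-numeric or non-string input
    return int(raw) > 0           # ids <= 0 cannot exist in the database
```

Requests failing this check are rejected immediately and never reach the cache or the database.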

Cache breakdown

When a key is extremely hot, it constantly carries heavy concurrent traffic, all concentrated on that one point. At the moment the key expires, the sustained concurrency pierces the cache and hits the database directly, like punching a hole through a barrier.

The solution

Set hotspot data to never expire

Physically, the key is stored in Redis with no expiration time, which guarantees there is no hot-key expiry problem: the key never expires at the Redis level.

Functionally, if it never expires, wouldn't the data become stale? So instead we store a logical expiration time inside the value associated with the key. When a read finds that the logical expiry is near or past, a background thread rebuilds the cache asynchronously.

From a practical point of view, this approach is very performance-friendly. The only downside is that while building the cache, other threads (non-building cache threads) may access old data, but this is tolerable for general Internet functionality.
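A minimal sketch of this logical-expiration pattern; `load_from_db`, the dict-based `cache`, and the `rebuilding` set are illustrative stand-ins, not a real Redis API:

```python
import threading
import time

cache = {}        # key -> {"data": ..., "logical_expire": timestamp}
rebuilding = set()  # keys currently being rebuilt in the background

def load_from_db(key):
    return f"fresh value for {key}"   # stand-in for a real DB query

def get(key, ttl=30):
    entry = cache.get(key)
    if entry is None:
        # first load: populate synchronously
        entry = {"data": load_from_db(key),
                 "logical_expire": time.time() + ttl}
        cache[key] = entry
        return entry["data"]
    if entry["logical_expire"] < time.time() and key not in rebuilding:
        # logically expired: serve stale data now, rebuild asynchronously
        rebuilding.add(key)
        def rebuild():
            cache[key] = {"data": load_from_db(key),
                          "logical_expire": time.time() + ttl}
            rebuilding.discard(key)
        threading.Thread(target=rebuild).start()
    return entry["data"]   # other threads may briefly see old data
```

This mirrors the trade-off stated above: reads never block on a rebuild, at the cost of briefly serving stale data.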

Add a mutex key

The general idea is that only one thread builds the cache; the other threads wait for that thread to finish and then fetch the data from the cache again.
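A sketch of the mutex-key idea. A real implementation would acquire the lock with Redis `SET key value NX EX <ttl>`; here plain dicts simulate that behaviour for illustration:

```python
import time

locks = {}   # lock key -> expiry timestamp (simulates SETNX with a TTL)
cache = {}

def try_lock(key, ttl=10):
    # emulates: SET mutex_key token NX EX ttl
    if key in locks and locks[key] > time.time():
        return False
    locks[key] = time.time() + ttl
    return True

def unlock(key):
    locks.pop(key, None)

def get(key, load_from_db):
    value = cache.get(key)
    if value is not None:
        return value
    if try_lock("mutex:" + key):
        try:
            value = load_from_db(key)   # only the lock holder hits the DB
            cache[key] = value
        finally:
            unlock("mutex:" + key)
        return value
    # another thread is rebuilding the cache: back off briefly, then retry
    time.sleep(0.05)
    return get(key, load_from_db)
```

The lock's TTL matters: without it, a crashed lock holder would block all other threads forever.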

Keep it simple. In general, it’s good to set hotspot data to never expire. Mutex keys are really not necessary.

Cache avalanche

A large set of cached keys expires within a short window of time.

When a large-scale cache failure occurs at a given moment, a huge number of requests hit the database directly, putting it under enormous pressure. Under high concurrency the database can go down in an instant; and if operations staff restart it immediately, the waiting traffic knocks it over again. This is a cache avalanche.

What is causing the cache avalanche?

The key to causing cache avalanches is massive key failures at the same time.

There are several possible reasons for this:

  • The first possibility is that Redis is down.
  • The second possibility is that many keys were given the same expiration time.

The solution

Set the expiration time to a random value

Add a random value to the original expiration time, say 1 to 5 minutes. This avoids a cache avalanche when a large amount of data would otherwise expire at the same moment.
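The jitter idea in one line; the function name and the 1-to-5-minute range are illustrative:

```python
import random

def ttl_with_jitter(base_seconds):
    # Spread expirations out by adding 1-5 minutes of random jitter,
    # so keys written together do not all expire together.
    return base_seconds + random.randint(60, 300)
```

Each key written at the same time thus gets a slightly different TTL, smearing the expiry spike across a five-minute window.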

Distributed deployment and evenly distributed hotspot data

If the cache database is deployed in a distributed manner, distribute the hotspot data evenly among different cache databases. Also, distributed clustering prevents cache avalanches caused by Redis outages.

Hotspot data never expires

Set hotspot data to never expire.

Backstop measures against cache avalanches

If a cache avalanche does occur, is there a backstop?

1. Use the circuit breaker mechanism. When the traffic reaches a certain threshold, the system directly returns a message such as “system congestion” to prevent too many requests from hitting the database. At least some users can use it normally, and other users can get results after refreshing several times.

2. Improve the disaster-recovery capability of the database: use database and table sharding together with read/write splitting.
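The circuit-breaker idea in point 1 can be sketched as a simple per-second request counter; the class name, window size, and threshold are illustrative:

```python
import time

class CircuitBreaker:
    def __init__(self, max_requests_per_second=100):
        self.max_rps = max_requests_per_second
        self.window_start = time.time()
        self.count = 0

    def allow(self):
        now = time.time()
        if now - self.window_start >= 1.0:   # start a new one-second window
            self.window_start, self.count = now, 0
        self.count += 1
        return self.count <= self.max_rps

def handle_request(breaker):
    if not breaker.allow():
        # shed load: answer immediately instead of hitting the database
        return "system congestion, please retry"
    return "ok"
```

Requests over the threshold get an immediate "congestion" response, so at least some users are served normally while the database is protected.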

For example

For example, in e-commerce projects we generally cache different categories of goods with different expiry cycles, and add a random factor for goods within the same category. This spreads cache expiry times out as much as possible. Moreover, popular items get longer cache times and unpopular items shorter ones, which also saves cache resources.

In fact, synchronized expiry is not that deadly; the deadlier form of cache avalanche is a cache server node going down or being disconnected.

A "natural" cache avalanche caused by synchronized expiry occurs at predictable points in time, so it puts only periodic strain on the database, which the database can usually withstand.

If the cache service node goes down, the pressure on the database server is unpredictable and can overwhelm the database instantaneously.

Conclusion

Cache avalanche and cache breakdown both involve data missing from the cache, while cache penetration involves data missing from both the cache and the database.

Cache avalanche workaround:

  • Add a small random number to the expiration time
  • Service degradation
  • Service fusing
  • Request rate limiting
  • Deploy Redis as a master-replica cluster

Cache breakdown solution:

  • Set no expiration time (use logical expiration)
  • Add a mutex

Cache penetration workaround:

  • Validity checks at the request entry
  • Cache null values or default values
  • Use bloom filter for quick judgment

Prevention:

Service degradation, circuit breaking, and rate limiting all hurt the user experience, so preventive measures are best.

  • Against cache avalanches, set data expiration times sensibly and deploy a Redis master-replica cluster.
  • Against cache breakdown, do not set an expiration time for hotspot data (use logical expiration instead).
  • Against cache penetration, perform validation at the request entry and use a Bloom filter to determine whether data exists.
