The original data is stored in a database (DB) such as MySQL or HBase. However, databases have relatively low read/write throughput and high latency.

For example, MySQL on a 4-core, 8 GB machine handles roughly 5,000 TPS and 10,000 QPS, with an average read/write latency of 10-100 ms.

Using Redis as a cache can make up for these shortcomings of the DB. I ran the following performance test against Redis on a 2019 MacBook Pro:

$ redis-benchmark -t set,get -n 100000 -q
SET: 107758.62 requests per second, p50=0.239 msec
GET: 108813.92 requests per second, p50=0.239 msec

With Redis able to serve roughly 100,000 requests per second, we introduce a caching architecture: the raw data stays in the database, and a copy is stored in the cache.

When a request comes in, the data is read from the cache first; if it is there, it is returned directly.

If the data is not in the cache, the service reads it from the database, writes it to the cache, and returns the result.
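
A minimal sketch of this read path, assuming the Jedis client (getFromDB and the product: key prefix are illustrative):

import redis.clients.jedis.Jedis;

public class ProductCache {
    private final Jedis redis = new Jedis("localhost", 6379);

    public String getProduct(String id) {
        String key = "product:" + id;
        // 1. Try the cache first
        String value = redis.get(key);
        if (value != null) {
            return value;                 // cache hit: return directly
        }
        // 2. Cache miss: read from the database
        value = getFromDB(id);
        if (value != null) {
            // 3. Write the result back with a 1-hour expiration
            redis.setex(key, 3600, value);
        }
        return value;
    }

    // Placeholder for the real database query
    private String getFromDB(String id) { /* SELECT ... WHERE id = ? */ return null; }
}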

Sounds seamless, right? In fact, improper cache design can have serious consequences. This article introduces three common problems in cache usage and their solutions:

  • Cache breakdown (failure);
  • Cache penetration;
  • Cache avalanche.

Cache breakdown (failure)

Under high concurrent traffic, the data being accessed is hot data that exists in the DB, but the copy stored in Redis has expired, so the back end must load the data from the DB and write it back to Redis.

Keywords: a single piece of hot data, high concurrency, cache expiry

However, under high concurrency the DB may be overwhelmed, making the service unavailable.

The solution

Expiration time + random value

For hot data, we can simply set no expiration time, so that requests are always served from the cache and Redis's high throughput is fully utilized.

Alternatively, add a random value to the expiration time.

When designing cache expiration times, use the formula: expiration time = base time + random time.

That is, when data for the same service is written to the cache, a random offset is added to the base expiration time, so that entries expire gradually rather than all at once, which would put excessive pressure on the DB.
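
A sketch of the formula, using a 24-hour base and up to 5 minutes of jitter (both arbitrary choices; redis is the same client object as in the other examples):

import java.util.concurrent.ThreadLocalRandom;

// expiration time = base time + random time
long baseSeconds = 60 * 60 * 24;                                   // 24-hour base
long jitterSeconds = ThreadLocalRandom.current().nextLong(5 * 60); // 0-5 minutes of jitter
redis.setex(key, baseSeconds + jitterSeconds, value);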

Preheating

Store the hot data in Redis in advance, and set its expiration time to be very long (or do not set one at all).
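
For example, a preheating sketch run at application startup (hotDataIds and getFromDB are hypothetical helpers):

// Load the hot keys into Redis before traffic arrives; no expiration
// time is passed, so the entries never expire on their own
for (String id : hotDataIds()) {
    redis.set("product:" + id, getFromDB(id));
}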

Use a lock

When a cache miss is discovered, do not load the data from the database immediately.

Acquire a distributed lock first, and perform the database query and cache write only after the lock has been acquired. If the lock acquisition fails, another thread is already querying the database, so the current thread sleeps for a short period and then retries.

This allows only one request to read data from the database.

The pseudocode is as follows:

public Object getData(String id) {
    String desc = redis.get(id);
    // Cache miss: the entry is absent or has expired
    if (desc == null) {
        // Mutex: only one request can acquire the lock (e.g., via SETNX).
        // In production the lock should carry its own expiration so that
        // a crashed holder cannot block everyone else.
        if (redis.setnx(lockName, "1")) {
            try {
                // Load the data from the database
                desc = getFromDB(id);
                // Write it back to Redis with a 24-hour expiration
                redis.set(id, desc, 60 * 60 * 24);
            } catch (Exception ex) {
                LogHelper.error(ex);
            } finally {
                // Always release the lock last
                redis.del(lockName);
            }
        } else {
            // Another request holds the lock: sleep 200 ms, then retry
            Thread.sleep(200);
            return getData(id);
        }
    }
    return desc;
}

Cache penetration

Cache penetration means that a request queries data that does not exist at all, i.e. data that is in neither Redis nor the database.

As a result, every such request penetrates through to the database; the cache becomes useless, and the database comes under heavy pressure, affecting normal service.


The solution

  • Cache null values: if the requested data exists in neither Redis nor the database, cache a null or default value (e.g., None) with a short expiration time; subsequent queries for the same key then return that value directly from the cache (see the sketch after this list).
  • Bloom filter: synchronize the ID into a Bloom filter whenever data is written to the database. If a requested ID is not in the Bloom filter, the data is guaranteed not to be in the database, so the database query can be skipped entirely.
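
A sketch of the null-value approach, reusing the pseudocode style from earlier (NULL_MARKER and getFromDB are illustrative names):

private static final String NULL_MARKER = "<null>";

public String getWithNullCache(String id) {
    String value = redis.get(id);
    if (NULL_MARKER.equals(value)) {
        return null;              // known-missing key: answered from the cache
    }
    if (value != null) {
        return value;             // normal cache hit
    }
    value = getFromDB(id);
    if (value == null) {
        // Cache the miss with a short TTL so the key is re-checked later
        redis.setex(id, 60, NULL_MARKER);
        return null;
    }
    redis.setex(id, 3600, value);
    return value;
}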

A Bloom filter needs to hold the full set of keys, so the key set should not be too large: preferably under 1 billion keys, since 1 billion keys occupy about 1.2 GB of memory.

Let's talk about how Bloom filters work.

A Bloom filter allocates a block of memory as a bit array, with all bits initially set to 0.

When an element is added, it is fed through k independent hash functions, and the k positions it maps to in the bit array are all set to 1.

To check whether a key exists, compute the same k hash positions. If all k positions are 1, the key may exist; otherwise, the key definitely does not exist.


Hash functions can collide, so a Bloom filter can misjudge.

The misjudgment rate here is the probability that the Bloom filter reports a key as existing when it actually does not; this happens because the filter stores hashes of keys rather than the keys themselves.

So there is some probability that keys with different contents end up mapping to the same positions after the multiple hashes.

A key that the Bloom filter reports as non-existent is 100% non-existent: if the key had been added, every one of its hash positions would be 1 rather than 0. But a key that the filter reports as existing does not necessarily exist.
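
A small example using Guava's BloomFilter (any implementation with the same put/might-contain semantics would work; the sizing numbers are illustrative):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

// Sized for 1,000,000 expected keys with a 1% false-positive rate
BloomFilter<String> filter = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

// Done whenever a row is written to the database
filter.put("user:1001");

filter.mightContain("user:1001"); // true: the key may exist, go on to query Redis/DB
filter.mightContain("user:9999"); // false: definitely absent, skip the DB query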

Cache avalanche

Cache avalanche refers to the situation where a large number of requests cannot be processed in the Redis cache system and all requests are sent to the database, resulting in a surge of database pressure and even downtime.

There are two main reasons for this:

  • A large amount of hot data expires at the same time, resulting in a large number of requests to query the database and write to the cache.
  • Redis is down and the cache system is abnormal.

Large amounts of cached data expire at the same time

Data is stored in the cache system with an expiration time set, but a large amount of it expires at the same time.

All of those requests then fall through to the database, and the resulting concurrency causes a surge in database pressure.

A cache avalanche occurs when a large amount of data expires at the same time, while a cache breakdown occurs when a single piece of hot data expires; that is the biggest difference between them.


The solution

Add a random value for expiration time

Avoid setting the same expiration time for large amounts of data. Use: expiration time = base time + random time (a small random offset, such as 1 to 5 minutes).

In this way, hot data will not all expire at the same moment, and the expiration times will not differ too much; this avoids mass expiry while still meeting business requirements.

Interface rate limiting

When non-core data is accessed, add rate-limiting protection to the query method, for example a cap of 10,000 req/s.

When a core data interface is accessed and the cache misses, the query is still allowed to go to the database and the result is written back into the cache.

This way, only part of the requests reach the database, reducing the pressure.

Rate limiting means that, at the entry point of the business system, we control the number of requests per second allowed in, so that too many requests are not sent to the database.
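
A sketch using Guava's RateLimiter, with the 10,000 req/s cap from the example above (the class and method names are illustrative):

import com.google.common.util.concurrent.RateLimiter;

public class NonCoreDataService {
    // Allow at most 10,000 requests per second into the query path
    private final RateLimiter limiter = RateLimiter.create(10_000);

    public String query(String id) {
        if (!limiter.tryAcquire()) {
            return null; // over the limit: reject or return a "busy" response
        }
        return getData(id); // the usual cache-then-database lookup
    }

    private String getData(String id) { /* cache lookup, then DB */ return null; }
}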


Redis is down

A single Redis instance can support about 100,000 QPS, while a single database instance may only support about 1,000 QPS.

When Redis goes down, a large number of requests hit the database, causing a cache avalanche.

The solution

There are two solutions to cache avalanche caused by cache system failures:

  • Service circuit breaking and interface rate limiting;
  • Build a high availability cache cluster system.

Circuit breaking and rate limiting

In business systems, circuit breaking deliberately sacrifices some service functionality under high concurrency in order to guarantee overall system availability.

If an exception is detected when fetching data from the cache, the system returns an error or default value to the front end directly, preventing all the traffic from hitting the database and bringing it down.

Circuit breaking and rate limiting are both ways to reduce the impact of a cache avalanche on the database.
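
A minimal illustration of the fail-fast idea (a sketch only; production systems usually rely on a circuit breaker library such as Resilience4j or Sentinel, and DEFAULT_VALUE here is a hypothetical degraded fallback):

import redis.clients.jedis.exceptions.JedisConnectionException;

public String getWithFuse(String id) {
    try {
        // redis is the same client object used in the earlier examples
        return redis.get(id);
    } catch (JedisConnectionException ex) {
        // The cache system is down: fail fast with a degraded default
        // instead of letting the traffic fall through to the database
        return DEFAULT_VALUE;
    }
}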

Build a highly available cache cluster

Therefore, the cache system must be built as a highly available Redis cluster, such as a Redis Sentinel cluster or Redis Cluster. If the Redis primary node fails, a replica can be promoted to primary and continue to provide cache service, avoiding the cache avalanche caused by a cache instance outage.
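
For reference, a minimal Redis Sentinel configuration monitoring one primary (the addresses and thresholds below are examples only):

# sentinel.conf: watch the primary at 127.0.0.1:6379;
# at least 2 sentinels must agree that it is down before failing over
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000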

Conclusion

  • Cache penetration: the database does not contain the requested data at all, so every request goes straight to the database and the cache system is useless.
  • Cache breakdown (failure): the database has the data and the cache should too, but the cached copy has expired; the protective Redis layer is pierced and requests hit the database.
  • Cache avalanche: a large amount of hot data cannot be served from the Redis cache (a wide swath of cached data expires, or Redis goes down), so all traffic is sent to the database, putting it under enormous pressure.
