Cache penetration, cache breakdown, and cache avalanche in Redis, and how to solve them: these are basic knowledge points, but also hot interview topics, so let's go through them.

This article will focus on the following three knowledge points:

  • Cache penetration
  • Cache avalanche
  • Cache breakdown

Cache penetration

Definition

Cache penetration: when a query asks for data that does not exist, the cache cannot hit, so the request falls through to the DB. For fault-tolerance reasons, data that cannot be found in the DB is not written back to the cache, so every request for this nonexistent data bypasses the cache and queries the DB directly, defeating the purpose of the cache. Under heavy traffic, the DB may go down.

To put it simply: when a nonexistent key is accessed, the cache never helps, every request penetrates to the DB, and under high traffic the DB goes down.

The solution

There are two common approaches to cache penetration:

  1. If the accessed key cannot be found in the DB, write a null value to the cache anyway, but with a shorter expiration time. The purpose is to reduce the pressure that invalid queries for nonexistent data put on the DB; see the sketch after this list.
  2. Use a Bloom filter: a sufficiently large bitmap stores the keys that may be accessed, so nonexistent keys are filtered out directly and never query the DB.
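
To make approach 1 concrete, here is a minimal Java sketch of null-value caching. It assumes a Jedis-style client; the Database interface, the sentinel string, and the TTL values are illustrative choices, since Redis cannot store a literal null.

import redis.clients.jedis.Jedis;

public class NullCachingReader {
    // Sentinel stored in Redis to remember "this key does not exist in the DB"
    private static final String NULL_SENTINEL = "<null>";

    private final Jedis redis;
    private final Database db;

    public NullCachingReader(Jedis redis, Database db) {
        this.redis = redis;
        this.db = db;
    }

    public String get(String key) {
        String value = redis.get(key);
        if (value != null) {
            // A cached sentinel means we already know the key is absent in the DB
            return NULL_SENTINEL.equals(value) ? null : value;
        }
        value = db.get(key);
        if (value == null) {
            // Cache the miss with a short TTL so repeated invalid queries skip the DB
            redis.setex(key, 60, NULL_SENTINEL);
            return null;
        }
        redis.setex(key, 30 * 60, value);
        return value;
    }

    interface Database { String get(String key); } // hypothetical DB accessor
}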

What is a Bloom filter

Bloom filters rely primarily on a data structure called a bitmap, which is essentially a bit array; in Redis, a bitmap can hold up to 512 MB, i.e. 2^32 bits.

A bit array is an array in which each element occupies only one bit and can only be 0 or 1. A bit array of 10,000 elements therefore takes up only 10000/8 = 1250 bytes, which avoids performance and memory problems.

In addition to the bit array, a Bloom filter has K hash functions.

Bloom filters are commonly used for deduplication and for determining whether an element is in a set. When an element is added to a Bloom filter, the following operations are performed:

  • The element is hashed K times, once with each of the K hash functions, producing K hash values.
  • For each resulting hash value, the bit at the corresponding index in the bit array is set to 1.

As an example, suppose the Bloom filter has three hash functions, F1, F2, F3, and a bit array arr. Now insert "dashu" into the Bloom filter:

  • Three hash computations are performed, producing three values n1, n2, n3;
  • arr[n1], arr[n2], arr[n3] are set to 1.

To determine whether a value is in the Bloom filter, the element is hashed again in the same way, and the corresponding bits in the bit array are checked. If every one of them is 1, the value is considered to be in the Bloom filter. If any of them is not 1, the element is definitely not in the Bloom filter.
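
As a quick illustration of these add and lookup steps, here is a toy Bloom filter in Java. The class name, the seeded hash functions, and K = 3 are all illustrative; real implementations use stronger hashes such as MurmurHash.

import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int[] seeds = {7, 31, 131}; // K = 3 hash functions, like F1, F2, F3 above

    public ToyBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // A simple seeded string hash, only good enough for this demo
    private int index(String value, int seed) {
        int h = 0;
        for (int i = 0; i < value.length(); i++) {
            h = seed * h + value.charAt(i);
        }
        return Math.abs(h % size);
    }

    public void add(String value) {
        for (int seed : seeds) {
            bits.set(index(value, seed)); // set arr[n1], arr[n2], arr[n3] to 1
        }
    }

    public boolean mightContain(String value) {
        for (int seed : seeds) {
            if (!bits.get(index(value, seed))) {
                return false; // any 0 bit means definitely not present
            }
        }
        return true; // all bits 1 means possibly present (false positives can happen)
    }
}

For instance, new ToyBloomFilter(10000).add("dashu") sets three bits, after which mightContain("dashu") returns true.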

Sounds like the Bloom filter should work well! It barely uses memory, it can deduplicate, and it can tell whether an element is in a set; seems quite reliable and effective. Is that really the case?

When Uncle starts asking questions like that, you can be sure there is a catch...

Disadvantages of Bloom filters

From the way bits are marked above, a problem is bound to arise:

The more elements are inserted, the more positions in the bit array are set to 1. When an element that is not in the Bloom filter is hashed, its bits may all happen to have been set to 1 by other elements. The problem is that an element that does not exist in the Bloom filter can be misidentified as being in it.

So we can sum it up like this:

  • If the Bloom filter says an element is present, it may be a misjudgment.
  • If the Bloom filter says an element is absent, then it is definitely absent.

Given this kind of misjudgment, do we just let it be? Of course not. In Redis, two values determine the accuracy of a Bloom filter:

  • error_rate: the false positive rate the Bloom filter allows. The lower the value, the larger the filter's bit array and the more space it takes up.
  • initial_size: the number of elements the Bloom filter expects to store. When the actual number of stored elements exceeds this value, the filter's accuracy decreases.

There is a command in Redis to set both values:

bf.reserve urls 0.01 100

  • The first value is the name of the filter,
  • The second value is the value of error_rate,
  • The third value is the value of initial_size.

A quick hands-on with the Bloom filter

Module support was added to Redis in version 4.0, so the Bloom filter can be added to Redis as a module, which needs to be compiled and installed:

# Download the Rebloom plugin
wget https://github.com/RedisLabsModules/rebloom/archive/v1.1.1.tar.gz
# Decompress, compile, and install
tar zxvf v1.1.1.tar.gz
cd rebloom-1.1.1
make
# In redis.conf, add the following configuration, then save and exit
loadmodule /xxx/redis/redis-5.0.8/RedisBloom-2.2.2/redisbloom.so
# Restart the Redis service
redis-server redis.conf
# Connect with a client
redis-cli -h 127.0.0.1 -p 6379
# Test command; if it succeeds, the module has started correctly
bf.add test xxx

Here Uncle introduces another, simpler way to try it out: use Docker to experience the Bloom filter in Redis directly.

1. Find and download the image

$ docker search rebloom

$ docker pull redislabs/rebloom

2. Check the downloaded image

$ docker images
REPOSITORY                                     TAG                   IMAGE ID            CREATED             SIZE
redislabs/rebloom                              latest                03841e395ca0        3 months ago        104MB

3. Start the container

$ docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom

4. Experience the Bloom filter

$ docker exec -it bloomfilter redis-cli
127.0.0.1:6379> bf.add urls dashu
(integer) 1
127.0.0.1:6379> bf.exists urls dashu
(integer) 1
127.0.0.1:6379> bf.reserve test 0.01 100
OK
127.0.0.1:6379> bf.reserve urls 0.01 100
(error) ERR item exists
  • bf.add: adds an element to the Bloom filter
  • bf.exists: determines whether an element exists in the filter
  • bf.reserve: sets the error rate the Bloom filter allows and the number of elements it can store. [Note: the filter name must not already exist when this command is executed; otherwise it reports (error) ERR item exists, as shown above.]
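
Tying this back to cache penetration, the read path checks the filter before touching the cache or the DB. Below is a sketch reusing the ToyBloomFilter from earlier; the Jedis client, the TTL, and the Database interface are assumptions for illustration.

import redis.clients.jedis.Jedis;

// Read path guarded by a Bloom filter: nonexistent keys never reach the DB.
// All legal keys are assumed to have been added to the filter when written to the DB.
public class GuardedCache {
    private final ToyBloomFilter filter;
    private final Jedis redis;
    private final Database db;

    public GuardedCache(ToyBloomFilter filter, Jedis redis, Database db) {
        this.filter = filter;
        this.redis = redis;
        this.db = db;
    }

    public String get(String key) {
        if (!filter.mightContain(key)) {
            return null; // "not present" answers from the filter are always correct
        }
        String value = redis.get(key);
        if (value == null) {
            value = db.get(key);
            if (value != null) {
                redis.setex(key, 30 * 60, value);
            }
        }
        return value;
    }

    interface Database { String get(String key); } // hypothetical DB accessor
}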

Number of hash functions and length of the Bloom filter

Above, we tuned the accuracy of the Bloom filter through its own attributes. In fact, the other factors that affect the false positive rate of a Bloom filter are the number of hash functions and the length of the filter. The relationship between them is as follows:


The false positive rate is approximately p ≈ (1 − e^(−kn/m))^k, and the optimal number of hash functions is k = (m/n) · ln 2, where k is the number of hash functions, m is the length of the Bloom filter in bits, n is the number of inserted elements, and p is the false positive rate.

Obviously, if the Bloom filter is too small, all of its bits will soon be set to 1, and every queried value will return "may exist", defeating the purpose of filtering. The length of the Bloom filter directly affects the false positive rate: the longer the filter, the lower the false positive rate.

In addition, the number of hash functions needs to be weighed. The more hash functions there are, the faster the filter's bits fill up with 1s and the lower its efficiency becomes; but with too few, the false positive rate goes up.
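
As a worked example using these standard formulas, with the same numbers as the earlier bf.reserve call: for n = 100 elements and a target false positive rate p = 0.01, the required length is m = −n · ln p / (ln 2)^2 ≈ 959 bits (about 120 bytes), and the optimal number of hash functions is k = (m/n) · ln 2 ≈ 6.6, i.e. about 7.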

A Bloom filter can only insert data and test whether it exists; it cannot delete elements, and only its "does not exist" answers are guaranteed to be absolutely accurate.

That's all about cache penetration. The Bloom filter material is worth bookmarking!

Cache avalanche

Definition

Cache avalanche: when we set caches with the same expiration time, they all become invalid at the same moment and every request is forwarded to the DB. The DB is overwhelmed in an instant, causing an avalanche.

The solution

Add a random offset to each cache expiration time, so that key expirations are spread out instead of concentrated at the same moment; see the sketch below.
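
A minimal Java sketch of the randomized TTL idea; the base TTL and the jitter range are illustrative, and a Jedis-style setex call is assumed.

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class AvalancheSafeWriter {
    // Base TTL plus a random jitter, so keys written at the same moment
    // do not all expire at the same moment
    public void setWithJitter(Jedis redis, String key, String value) {
        int baseTtlSecs = 30 * 60;                                    // e.g. 30 minutes
        int jitterSecs = ThreadLocalRandom.current().nextInt(5 * 60); // up to 5 extra minutes
        redis.setex(key, baseTtlSecs + jitterSecs, value);
    }
}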

Cache breakdown

Definition

Some keys have an expiration time set but hold very "hot" data that may be accessed with extremely high concurrency at certain points in time. When such a key's cache expires at exactly such a moment, a large number of concurrent requests for the key arrive; each finds the cache expired, loads the data from the backend DB, and sets it back into the cache. This burst of concurrent requests can overwhelm the backend DB.

In other words: when the cache of an existing hot key expires, a large number of requests penetrate to the DB all at once, causing an instantaneous surge in DB load.

The solution

1. Use distributed locks

To put it simply: when the cache misses (the value read back is null), use SETNX (SET if Not eXists) to set a short-lived mutex key that locks access to the current key. If the lock is acquired, load from the DB and set the cache back; otherwise, retry the whole get-cache method. A simple code implementation:

public String get(String key) {
    String value = redis.get(key);
    if (value == null) { // The cached value has expired
        // Give the mutex a 3-minute timeout so that if the DEL below fails,
        // the lock still expires and the DB can be loaded on the next cache miss
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) { // Lock acquired
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
            return value;
        } else {
            // Another thread is already loading from the DB and setting the cache back; retry
            sleep(50);
            return get(key); // Retry
        }
    } else {
        return value;
    }
}

2. "Never expire". Here "never expire" has two meanings:

  • Physically, Redis sets no expiration time on the key, which guarantees that the hot key never expires.
  • Functionally, if the key never expires, won't the data go stale? So we store an expiration time inside the value for the key. When we find that this time is about to pass, we rebuild the cache through a background asynchronous thread. This is "logical" expiration.

This approach is quite friendly; the only downside is that while the cache is being rebuilt, the remaining threads (the ones not rebuilding the cache) may read old data.

Pseudo-code implementation:

String get(final String key) {
    V v = redis.get(key);
    String value = v.getValue();
    long timeout = v.getTimeout();
    if (timeout <= System.currentTimeMillis()) {
        // Logical expiration reached: rebuild the cache in a background thread
        threadPool.execute(new Runnable() {
            public void run() {
                String keyMutex = "mutex:" + key;
                if (redis.setnx(keyMutex, "1")) {
                    // 3-minute timeout to avoid the mutex holder crashing and leaving the lock behind
                    redis.expire(keyMutex, 3 * 60);
                    String dbValue = db.get(key);
                    redis.set(key, dbValue);
                    redis.delete(keyMutex);
                }
            }
        });
    }
    return value;
}

That's all I have to share with you today. Feel free to leave a comment and discuss!

Follow the public account "Uncle said code" for more good content. See you next time!

Reference Documents:

  1. https://segmentfault.com/a/1190000016721700