Preface

Redis is a very popular cache database. One of its main uses is to keep large numbers of requests from going straight to the database, which relieves pressure on the database server. Is using a cache risk-free, then? No: no technology is perfect. Cache penetration, cache avalanche, and cache breakdown all need to be discussed.

Main text

1. Cache penetration

1.1 Brief Description

Cache penetration means that the requested data exists in neither the cache nor the database. Every such request misses the cache and is forwarded to the database server, which has no matching data either, so every request ends up hitting the database. Under high concurrency or a malicious attack, the load on the backend database server rises and the system may eventually collapse. The figure illustrates this:

Brief description:

Color key for the Redis server: green blocks are cached data and pink blocks are data missing from the cache. Green arrows show data fetched directly from the cache; yellow arrows show requests that pass through the cache to look up data in the database, where the data may not exist either.

The process is as follows:

1. A large number of clients send massive requests to the server.

2. The server code checks the cache first. If the data is cached (green part), it is read from the cache and returned. If the data is not in the cache (pink), the request goes on to the database server (yellow arrow).

3. If there are a large number of requests for uncached data, the database eventually collapses under the load and the system becomes unavailable.

1.2 Common Solutions

Cache null values

If the data is not found in the database, cache a null value for the corresponding key and give it a short expiration time.

Advantages: until the entry expires, the null value is returned directly from the cache, which avoids pressure on the database.

Disadvantages:

Redis memory consumption: if an attacker keeps requesting different non-existent keys and each one is cached in Redis, a large number of empty entries will occupy memory.

Inconsistent data: suppose a legitimate key has no data at first, so a null value is cached with a short expiration time. If the real data is added to the database before the entry expires, requests still get the null value instead of the new data, so the cache and the database disagree until the entry expires.
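As a rough illustration of null-value caching, here is a minimal Python sketch. It uses a small in-memory class as a stand-in for Redis, and the names TTLCache, get_with_null_caching, load_from_db, ttl and null_ttl are all made up for this example. With a real Redis server the placeholder would be stored with SET and an expiry.

```python
import time

# Sentinel stored in the cache to mean "the database has no such record".
_MISSING = object()


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def get_with_null_caching(key, cache, load_from_db, ttl=300, null_ttl=30):
    """Return the value for key; cache database misses for a short time."""
    cached = cache.get(key)
    if cached is _MISSING:
        return None  # known miss: answered from the cache, database untouched
    if cached is not None:
        return cached
    value = load_from_db(key)  # hypothetical loader; returns None if the record is absent
    if value is None:
        # A short TTL limits both wasted memory and the window of stale nulls.
        cache.set(key, _MISSING, null_ttl)
    else:
        cache.set(key, value, ttl)
    return value
```

The short null_ttl is the trade-off described above: a longer value protects the database better but keeps stale nulls around for longer.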

Bloom filter

Add a filter layer in front to intercept requests: check whether the requested key is in the filter, and if it is not, return immediately without querying the database or caching a null value. A Bloom filter uses a bit array to record whether keys exist (each key is hashed to specific bit positions), so it can represent a large amount of data in very little space.

Disadvantages: a Bloom filter can say with certainty that a key is not in the filter, but its answer that a key exists has a false positive rate, because different keys can collide under the hash functions.

1.3 Bloom filter

Bloom filters are not specific to cache penetration; they have many other applications, such as avoiding duplicate emails, repeated crawling of the same pages, and repeated video recommendations. If it is not obvious why they fit these uses, a brief look at how a Bloom filter works will help.

Take a look:

Brief description:

1. Start with a Key to be stored in the filter and later checked for existence (the Key can be any data you want to record, such as a user ID or a video ID).

2. Hash the Key multiple times, using a different hash function each time so that the results differ. The figure above shows only three hash computations; in practice the number depends on the target false positive rate.

3. Set the bit at the index given by each hash result to 1, marking that result as present. In the figure above the three hashes give 2, 5, and 9, so the bits at those positions are set to 1.

4. To check whether a Key is in the filter, hash it the same number of times (three in the figure above) to get the corresponding positions. If the Key was added, all of those positions must be 1. False positives are possible, though, because different keys can hash to the same positions and collide when setting bits, as shown in the figure below.

If Key1 has set positions 2, 5, and 9 to 1, and Key2 hashes to the same positions, the filter reports that Key2 exists even though it was never added.
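The set-and-check procedure described above can be sketched in a few lines of Python. This is only an illustration, not how any particular library implements it: the class name BloomFilter, the method names, and the choice of deriving several positions from one SHA-256 digest (double hashing) are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus several hash positions per key."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key):
        # Derive num_hashes positions from two 64-bit halves of one digest
        # (an illustrative shortcut instead of num_hashes separate hash functions).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Step 3: set the bit at every hashed position to 1.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # Step 4: the key can only be present if every hashed position is 1.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A False result from might_contain is definitive, while True only means "possibly present". Shrinking the bit array makes collisions easy to see: with a single bit, any added key makes every other key look present.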

The false positive rate of a Bloom filter can be controlled. To lower it, use a larger bit array together with a suitable number of hash functions, so that two keys are less likely to map to the same positions. This costs efficiency and memory: more hash computations per key and more bits to maintain. The false positive rate that remains is small and acceptable in most scenarios.
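To make the trade-off between false positive rate, memory, and hash count concrete, the sketch below uses the standard textbook approximations for a Bloom filter holding n items in m bits with k hashes: p ≈ (1 - e^(-kn/m))^k, m ≈ -n ln p / (ln 2)^2, and k ≈ (m/n) ln 2. The function names are made up for this example, and the results are estimates rather than guarantees.

```python
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Estimate the bit-array size and hash count for a target false positive rate."""
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


def estimated_false_positive_rate(num_bits, num_hashes, items):
    """Approximate false positive rate after inserting the given number of items."""
    return (1 - math.exp(-num_hashes * items / num_bits)) ** num_hashes


# Example: one million keys with a target false positive rate of about 1%.
bits, hashes = bloom_parameters(1_000_000, 0.01)
rate = estimated_false_positive_rate(bits, hashes, 1_000_000)
```

Under these approximations, adding hash functions without adding bits eventually makes the false positive rate worse, because the bit array fills up faster. That is why the two parameters are chosen together.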

1.4 Use of Bloom filter

Since the topic is Redis, let's look at Bloom filters in Redis. You can build one yourself on top of Redis bitmaps if that suits your needs. Starting with Redis 4.0 there is also a ready-made Bloom filter module, and other packages exist as well, both in-memory and distributed. Here I briefly cover the Redis Bloom filter module, which I personally find very good and recommend.

Official documentation: oss.redislabs.com/redisbloom/

I use CentOS for the demonstration. The main steps are as follows:

1. Install Git if you don't have it. If you prefer not to use Git, download the source package directly.

yum install -y git

2. Fetch the Redis Bloom filter source code. Git is used here, but you can also download it directly:

git clone https://github.com/RedisLabsModules/redisbloom.git

3. Go into the source directory and run make, which generates the redisbloom.so file. If the make command cannot be found, install the C/C++ build tools first (for example gcc and make):

cd redisbloom
make

4. Load the RedisBloom module into Redis, either through the configuration file or by specifying the module at startup:

Configuration file mode: add a loadmodule directive to the configuration file, specifying the location of the redisbloom.so file.

Then start the server with that configuration file:

./redis-server redis.conf

Command-line mode: specify the module when starting the server:

./redis-server --loadmodule ./redisbloom.so

5. Basic usage

Bloom filter commands are used in the same way as regular Redis commands, so I won't write a full program here. In short, the flow is:

A. Save the data to be checked in the filter, such as all user IDs;

B. When a request arrives, first check whether the data is in the filter. If it is not, return immediately, without going to the cache or the database;

C. When a user is added, add the new user ID to the filter.
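Steps A to C might look roughly like the following Python sketch. To keep it runnable without a server, a plain Python set stands in for the filter and dictionaries stand in for the cache and the database. All function names are hypothetical. With the RedisBloom module, the filter operations would correspond to the BF.ADD and BF.EXISTS commands.

```python
def build_filter(all_user_ids):
    """Step A: record every known user ID in the filter (a set is the stand-in here)."""
    return set(all_user_ids)


def handle_user_request(user_id, id_filter, cache, load_user):
    """Step B: consult the filter before touching the cache or the database."""
    if user_id not in id_filter:
        return None  # definitely unknown: no cache lookup, no database query
    user = cache.get(user_id)
    if user is None:
        user = load_user(user_id)  # hypothetical database loader
        if user is not None:
            cache[user_id] = user
    return user


def register_user(user_id, profile, id_filter, database):
    """Step C: when a user is created, add the new ID to the filter as well."""
    database[user_id] = profile
    id_filter.add(user_id)
```

Unlike the set used here, a real Bloom filter can occasionally let an unknown ID through, so the cache and database lookups behind it still need to handle a missing record.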

The Redis Bloom filter has more commands, which you can find on the official website. You can also implement a filter yourself, although ready-made packages already exist, both in-memory and Redis-based, as shown in the following figure:

I won't write out the code here; the rest is left to the reader.

2. Cache avalanche

2.1 Brief Description

Cache avalanche means that the cache layer suddenly becomes unavailable, so a large number of requests go directly to the database, which may eventually crash the system under excessive database load. The cache layer can become unavailable in two ways:

The cache server goes down, so the system sends all requests to the database.

A large amount of cached data expires at the same time, so a large number of requests go to the database to reload it.

As shown in the figure:

Brief description:

Color key for the Redis server: green blocks are cached data and pink blocks are data missing from the cache. White blocks are cached data that has expired on a large scale. Green arrows show data fetched directly from the cache; yellow arrows show requests that pass through the cache to look up data in the database.

The process is as follows:

1. A large number of clients send massive requests to the server.

2. The server code checks the cache first. If the data is cached (green part), it is read from the cache and returned. If the cached data has expired (white blocks), the request goes on to the database server (yellow arrow).

3. If there are many requests for hot data and that hot data has expired on a large scale, the database collapses under the load and the system becomes unavailable.

2.2 Common Solutions

Cache preheating:

Before the peak period, load hot data into the cache in advance to avoid heavy database pressure during the peak period.

Spread out expiration times:

Add a random value to the expiration time of each piece of hot data so that expirations are not concentrated at one moment, which removes a large part of the pressure on the database.
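Adding a random offset to each expiration time takes only a line or two. The sketch below is one possible way to do it in Python; the function name and the example numbers (a one-hour base plus up to five minutes of jitter) are arbitrary choices for illustration.

```python
import random


def ttl_with_jitter(base_ttl, max_jitter, rng=random):
    """Return base_ttl plus a random number of seconds in [0, max_jitter]."""
    return base_ttl + rng.randint(0, max_jitter)


# Example: entries written at the same moment get different lifetimes,
# so they do not all expire in the same second.
ttls = [ttl_with_jitter(3600, 300) for _ in range(5)]
```

The value returned would then be passed as the expiry when writing each key to the cache.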

Multi-level cache:

In addition to Redis, you can add other caches for hot data as the business requires, such as an in-process memory cache, and give each cache level a different validity period. This also eases the pressure on the database.
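One way to picture a multi-level cache is a small in-process cache with its own short validity period sitting in front of the shared cache. The Python sketch below is illustrative only: the class name TwoLevelCache is made up, a plain dictionary stands in for Redis, and load_from_db is a hypothetical loader.

```python
import time


class TwoLevelCache:
    """In-process cache (level 1) in front of a shared cache (level 2) and a database."""

    def __init__(self, remote, load_from_db, local_ttl=5.0, clock=time.monotonic):
        self._remote = remote              # stand-in for Redis: any dict-like object
        self._load_from_db = load_from_db  # hypothetical database loader
        self._local_ttl = local_ttl        # level 1 has its own, shorter validity period
        self._clock = clock
        self._local = {}                   # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                # level 1 hit
        value = self._remote.get(key)      # level 2 lookup
        if value is None:
            value = self._load_from_db(key)
            if value is not None:
                self._remote[key] = value
        if value is not None:
            self._local[key] = (value, now + self._local_ttl)
        return value
```

Because the two levels expire at different times, hot keys that vanish from the shared cache can still be served from process memory for a short while.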

Rate limiting and degradation:

If the load is heavy enough to threaten a crash, add rate limiting, whether through middleware or message queues, to keep the system usable.
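As one possible illustration of rate limiting with a degraded response, the sketch below uses a token bucket, which is a common technique but not one the text above prescribes. The names TokenBucket and query_with_degradation, and the fallback value, are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def query_with_degradation(key, limiter, load_from_db, fallback=None):
    """Query the database only while the limiter allows it; otherwise degrade."""
    if not limiter.allow():
        return fallback  # degraded response, e.g. a default value or a "busy" message
    return load_from_db(key)
```

The database then sees at most the configured request rate, and callers above that rate receive the fallback instead of piling onto the database.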

Add a mutex:

The lock makes reloading exclusive: one operation reloads the data into the cache while the other requests wait. This hurts the user experience, so use it with caution. If you do use locks, pay close attention to their performance and stability.
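A minimal sketch of the mutex idea follows, assuming a single application process: one thread reloads a key while the others wait and then read the refreshed cache. The class name LockedLoader is made up, and a dictionary stands in for the cache. Across several application servers a distributed lock would be needed instead, for example one built on the Redis SET command with the NX option and an expiry.

```python
import threading


class LockedLoader:
    """Reload a missing cache entry under a per-key lock so only one caller hits the database."""

    def __init__(self, cache, load_from_db):
        self._cache = cache                # stand-in for the cache: any dict-like object
        self._load_from_db = load_from_db  # hypothetical database loader
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # One lock per key, so reloading one key does not block the others.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self._cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have reloaded the key while we waited.
            value = self._cache.get(key)
            if value is None:
                value = self._load_from_db(key)
                if value is not None:
                    self._cache[key] = value
            return value
```

The waiting that the lock imposes is the cost mentioned above: requests queue behind the one that is reloading.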

If the entire cache layer goes down:

Use a highly available architecture, such as master-slave replication, Sentinel, or clustering, as required, so that the cache layer stays available.

3. Cache breakdown

3.1 Brief Description

Cache breakdown means that super-hot data suddenly expires. Until the entry is rebuilt, all requests for that data go directly to the database, and the database server can crash under the load caused by just a few super-hot items.

Super-hot data: for example, heavily requested data on a large e-commerce platform. If it expires during a busy period, the number of requests is enough to bring the database down.

As shown in figure:

Brief description:

Color key for the Redis server: green blocks are cached data and pink blocks are data missing from the cache. The white circle marks expired super-hot cache data. Green arrows show data fetched directly from the cache; yellow arrows show requests that pass through the cache to look up data in the database.

The process is as follows:

1. A large number of clients send massive requests to the server.

2. The server code checks the cache first. If the data is cached (green part), it is read from the cache and returned. If the super-hot cached data has expired (circled in white), the request goes on to the database server (yellow arrow).

3. If the super-hot data expires while it is in heavy demand, the database collapses under the load and the system becomes unavailable.

Note: this applies to super-hot data only, not to data expiring on a large scale.

3.2 Common Solutions

Hot data never expires: set super-hot data to never expire, so that its expiration cannot overload the database.

Mutex: take a lock while reloading data into the cache, so that other requests wait. If you do use locks, pay close attention to their performance and stability.

Conclusion

Whether the problem is cache penetration, cache avalanche, or cache breakdown, the root cause is the same: requests miss the cache layer and go straight to the database, which eventually becomes overloaded and makes the system unavailable. Deal with these problems according to the needs of your system. There is no perfect solution, but there is always one that fits the requirements. Solving the business problem is the real goal.
