I haven’t looked at Redis in a while, so I’m coming back to review it. I’ll start with the three classic Redis cache problems and see how deep the water goes.

Let’s get started! Time to show some real technique. (beep beep beep….)


Next, let’s begin our tour of the three Redis cache problems. Climb aboard our little spaceship and let’s take off!

Redis caching comes with three must-know problems: cache penetration, cache breakdown, and cache avalanche.

Cache penetration

What is cache penetration? It happens when a user queries data that exists in neither the cache nor the database. The cache misses, so the request goes to the database; the database has no record either, so nothing gets cached, and every repeat of the query goes to the database again. A flood of such queries puts heavy pressure on the database.

For example, suppose product IDs in the database start from 1 and increase, and a user queries the product whose ID is -1. Obviously this record is not in the database. Since nothing is returned and nothing is cached, each time the user repeats the query it goes straight to the database, putting great access pressure on it.
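To make the problem concrete, here is a minimal Java sketch of a naive cache-aside read. It is illustrative only: ProductService, getProduct, and the two HashMaps standing in for Redis and the database are hypothetical names, not a real client API. Because only values found in the database get cached, every query for ID -1 falls through to the database again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: one HashMap plays Redis, the other plays the database.
class ProductService {
    private final Map<Integer, String> database = new HashMap<>();
    private final Map<Integer, String> cache = new HashMap<>();
    int dbQueries = 0; // how many requests reached the database

    ProductService() {
        database.put(1, "Phone");   // product IDs start from 1
        database.put(2, "Laptop");
    }

    // Naive cache-aside read: only values found in the database are cached.
    String getProduct(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;          // cache hit
        }
        dbQueries++;                // cache miss: fall through to the database
        String fromDb = database.get(id);
        if (fromDb != null) {
            cache.put(id, fromDb);
        }
        return fromDb;              // null for ID -1, and nothing was cached
    }
}

class PenetrationDemo {
    public static void main(String[] args) {
        ProductService service = new ProductService();
        for (int i = 0; i < 3; i++) {
            service.getProduct(-1); // the same nonexistent ID, three times
        }
        System.out.println("database queries: " + service.dbQueries); // prints "database queries: 3"
    }
}
```

Every one of the three lookups for ID -1 is a database query, which is exactly the pressure described above.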

So we have to ask: how do we solve this problem? o(╥﹏╥)o

The general idea is to start with the cache: when a query asks for something that doesn’t exist in the database, we cache an empty object for it and return that to the user.

^_^ Yes, this is one solution, known as caching empty objects (the code is easy to maintain, but the effect is limited).

The other solution is a Bloom filter, which Redis supports through the RedisBloom module (the code is more complex to maintain, but the effect is very good).

Next, let’s explain the two options:

Caching empty objects

What does caching an empty object mean? Don’t worry. When a request comes in for data that exists in neither the cache nor the database, the database returns an empty object, and we save that empty object in the cache under the request’s key. The next time the same request arrives, it hits the cache and the empty object is returned directly, which relieves pressure on the database and improves overall performance. We can look at the following process.
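As a code sketch of the same flow, here is the hypothetical in-memory setup again (HashMaps in place of Redis and the database, not a real client), this time caching the miss. The marker string "<empty>" is an assumption: since a Redis value cannot be null, this sketch stores a sentinel string as the empty object. After the first miss, repeated queries for ID -1 are answered from the cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical setup: a miss is cached as a sentinel string that stands for the empty object.
class NullCachingProductService {
    static final String EMPTY = "<empty>"; // hypothetical marker for "not in the database"

    private final Map<Integer, String> database = new HashMap<>();
    private final Map<Integer, String> cache = new HashMap<>();
    int dbQueries = 0;

    NullCachingProductService() {
        database.put(1, "Phone");
    }

    String getProduct(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return EMPTY.equals(cached) ? null : cached; // hit, possibly on the empty marker
        }
        dbQueries++;
        String fromDb = database.get(id);
        cache.put(id, fromDb != null ? fromDb : EMPTY);   // cache the miss as well
        return fromDb;
    }
}

class NullCachingDemo {
    public static void main(String[] args) {
        NullCachingProductService service = new NullCachingProductService();
        for (int i = 0; i < 3; i++) {
            service.getProduct(-1);
        }
        System.out.println("database queries: " + service.dbQueries); // prints "database queries: 1"
    }
}
```

Note that the empty marker never expires in this version, which leads to the memory problem raised below.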

But if a large number of requests for nonexistent data come in, won’t the cache fill up with empty objects?

That’s right! This is the drawback of caching empty objects: over time the cache accumulates a large number of them, which not only takes up a lot of memory but also wastes resources. Is there a way around this? Think about it: we can clean these objects up after a while.

Yeah, yeah! Redis gives us expiration commands (^▽^): when we cache an empty object, we simply give it an expiration time as well, and the problem is solved!

SETEX key seconds value: sets a key/value pair with an expiration time (in seconds)

In Java, through the cache client, the call looks like this:

redisCache.put(Integer.toString(id), null, 60); // the expiration time is 60 s
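To see why an expiration time fixes the memory problem, here is a small in-memory imitation of SETEX and GET in Java. ExpiringCache and its explicit now parameter are hypothetical, made up for this sketch so that expiry can be shown without waiting; this is not a Redis client API, and real Redis tracks time and removes expired keys on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory imitation of SETEX/GET, for illustration only. The caller passes the
// current time in seconds so expiry can be shown without sleeping; real Redis keeps its own clock.
class ExpiringCache {
    private static final class Entry {
        final String value;
        final long expiresAt; // absolute deadline, in seconds

        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Like SETEX key seconds value: store the value together with its deadline.
    void setex(String key, long seconds, String value, long now) {
        entries.put(key, new Entry(value, now + seconds));
    }

    // Like GET: once the deadline is reached, the key is gone.
    String get(String key, long now) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (now >= entry.expiresAt) {
            entries.remove(key); // drop the expired entry on access, freeing its memory
            return null;
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}

class ExpiringCacheDemo {
    public static void main(String[] args) {
        ExpiringCache cache = new ExpiringCache();
        cache.setex("product:-1", 60, "<empty>", 0);     // cache the empty object for 60 s
        System.out.println(cache.get("product:-1", 30)); // prints "<empty>"
        System.out.println(cache.get("product:-1", 60)); // prints "null"
        System.out.println(cache.size());                // prints "0"
    }
}
```

The empty object still shields the database for 60 seconds, but it no longer occupies memory forever.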

Bloom filter

A Bloom filter is not a filter in the sense of filtering things out! It is a probabilistic data structure that can quickly determine whether an element is in a set. We can also think of it simply as an imprecise set (like a set, it deduplicates). There is one small catch: when you use its contains method to check whether an object exists, it may get the answer wrong. So a Bloom filter is not perfectly precise, but with properly chosen parameters its accuracy can be kept high, with only a small, acceptable probability of error. When a Bloom filter says a value exists, it may not exist; when it says a value does not exist, it definitely does not exist.

Here is a typical example from Qian Da:

For example, when it says it doesn’t know you, it definitely doesn’t know you. When it says it has seen you, it may never have met you at all; your face just looks like one of the faces it knows (a combination of several familiar faces), so it mistakenly believes it has seen you before. In a recommendation scenario, a Bloom filter can accurately filter out content the user has already viewed. It will also wrongly filter out a small fraction of new content the user has not viewed, but it correctly lets through the vast majority of new content. This way, you can be sure that nothing you recommend to your users is a repeat.

So what are the features of a Bloom filter?

It has quite a few; let me brag about them until you start doubting life (≧∇≦):

  1. A very large array of bits (only zeros and ones)

  2. Several hash functions

  3. Very high space efficiency and query efficiency

  4. No delete operation, which makes it harder to maintain in code

In Redis, each Bloom filter corresponds to a large bit array and several different unbiased hash functions. “Unbiased” means the hash functions spread the hash values of elements evenly across the array.

When a key is added to a Bloom filter, each hash function hashes the key to an integer, which is taken modulo the length of the bit array to get a position; each hash function yields a different position. The add operation sets every one of these bits to 1. (In short, the hash functions map each key onto several positions in a large bit array, and those positions are set to 1.) To check whether a key exists, the same positions are computed: if any of them is 0, the key is definitely absent; if all of them are 1, the key is probably present.
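The add-and-check procedure can be sketched in Java as follows. This is an illustrative in-memory filter, not the RedisBloom implementation: SimpleBloomFilter is a hypothetical class, and instead of k truly independent hash functions it derives k positions by double hashing from two base hashes (FNV-1a and the MurmurHash3 fmix32 bit mixer), a common substitute.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal in-memory Bloom filter, for illustration only. Position i is (h1 + i * h2) mod m,
// a double-hashing stand-in for k independent hash functions.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // m: length of the bit array
    private final int hashCount; // k: positions computed per key

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Add: set the bit at every position the key maps to.
    void add(String key) {
        for (int position : positions(key)) {
            bits.set(position);
        }
    }

    // Query: any 0 bit means "definitely absent"; all 1 bits means "probably present".
    boolean mightContain(String key) {
        for (int position : positions(key)) {
            if (!bits.get(position)) {
                return false;
            }
        }
        return true;
    }

    private int[] positions(String key) {
        int h1 = fnv1a(key.getBytes(StandardCharsets.UTF_8));
        int h2 = mix(key.hashCode()) | 1; // second base hash, forced odd
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            result[i] = Math.floorMod(h1 + i * h2, size); // i-th hash, modulo the array length
        }
        return result;
    }

    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= b & 0xFF;
            hash *= 0x01000193;
        }
        return hash;
    }

    private static int mix(int h) { // MurmurHash3 fmix32 finalizer
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }
}

class BloomDemo {
    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(10_000, 7);
        for (int id = 1; id <= 1_000; id++) {
            filter.add("product:" + id);                      // load every valid ID up front
        }
        System.out.println(filter.mightContain("product:42")); // prints "true" (no false negatives)
        System.out.println(filter.mightContain("product:-1")); // false unless it hits a false positive
    }
}
```

To block cache penetration, the valid IDs are loaded into the filter and a request is rejected whenever mightContain returns false, before it reaches Redis or the database. In production you would more likely use the RedisBloom commands BF.ADD and BF.EXISTS, or Guava’s BloomFilter, than hand-roll one.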

So why does the Bloom filter have a misjudgment rate?

Misjudgment? As the saying goes, swing the hoe well enough and there is no wall you can’t dig through. (Ahem, ahem, wrong topic…)

Here is how the misjudgment happens:

Suppose key1 and key2 have been added, and between them they have set, among others, position 1 in the array. Now we query key3, which was never added. If every position key3 maps to (position 1 included) has already been set to 1 by key1 or key2, the Bloom filter concludes that key3 exists. That is a misjudgment, because key3 is not actually present.

Ha ha~ Then you’ll ask: how can we improve a Bloom filter’s accuracy?

Three important factors affect a Bloom filter’s accuracy:

  1. The quality of the hash functions

  2. Storage space size

  3. Number of hash functions

The design of hash functions is also very important. Good hash functions can greatly reduce the error rate of Bloom filters.

(It’s as if a great accessory works so well because it’s properly designed inside.)

Likewise, the larger the bit array, the more sparsely each key’s positions are spread across it, which improves the Bloom filter’s accuracy. Using more hash functions means each lookup checks more bits, which also lowers the misjudgment rate, but only up to a point: each key also sets more bits, so with too many hash functions the array fills up with 1s and the error rate rises again.

If you’re interested in how it works, look into the mathematics behind Bloom filters, which covers both the algorithm design and the probability analysis. (It’s actually pretty easy.)
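For reference, the standard approximation (assuming the k hash functions behave like independent, uniformly random functions) relates the false-positive rate p to the bit-array length m, the number of inserted keys n, and the number of hash functions k. It also gives the best k for given m and n, and the m needed to reach a target p:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2,
\qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

For example, n = 1,000,000 keys at p = 1% gives m ≈ 9.59 million bits (about 1.2 MB) and k_opt ≈ 6.6, so 7 hash functions. Going beyond k_opt makes the error rate worse, not better.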

Cache breakdown

Cache breakdown involves a single key. Typically it is a key that is queried very frequently and gets special attention from users, who love it (^▽^), much like a “regular customer”; it can also be a key that is rarely accessed. If the hot key expires from the cache, or the unpopular key suddenly becomes the target, a large number of concurrent requests for that one key pass straight through the cache to the database, and the database load spikes instantly.

To sum up: there are two reasons for cache breakdown.

(1) An “unpopular” key is suddenly accessed by a large number of users.

(2) A “hot” key expires in the cache while a large number of users are accessing it.

The common solution to cache breakdown is locking. When a key has expired and requests need to query the database, a lock is taken on that key so that only the first request queries the database and writes the result back to the cache. The remaining requests for the same key then read it directly from the cache.
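For the single-machine case, the locking idea can be sketched in Java with one ReentrantLock per key plus a double check. HotKeyCache and its loader function (standing in for the database query) are hypothetical; the point is that when a hot key is missing, only the first thread reloads it, while the others wait and then read the refreshed cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Single-JVM sketch: one lock per key plus a double check, so that when a hot key is missing
// only the first thread reloads it. The loader function stands in for the database query.
class HotKeyCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    HotKeyCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                          // fast path: cache hit, no locking
        }
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            value = cache.get(key);                // re-check: a previous lock holder may have loaded it
            if (value == null) {
                value = loader.apply(key);         // only one thread per key reaches the database
                if (value != null) {
                    cache.put(key, value);
                }
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    void expire(String key) {
        cache.remove(key);                         // simulate the key's TTL running out
    }
}

class BreakdownDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger dbCalls = new AtomicInteger();
        HotKeyCache cache = new HotKeyCache(key -> {
            dbCalls.incrementAndGet();
            return "value-of-" + key;
        });
        Thread[] users = new Thread[50];
        for (int i = 0; i < users.length; i++) {
            users[i] = new Thread(() -> cache.get("hot-product"));
            users[i].start();
        }
        for (Thread user : users) {
            user.join();
        }
        System.out.println("database calls: " + dbCalls.get()); // prints "database calls: 1"
    }
}
```

The locks map is never cleaned up here, which is fine for a sketch but not for an unbounded key space. Across several servers a JVM lock is not enough; a common Redis-based alternative is to acquire the lock with SET lock:key token NX PX milliseconds and release it only if the token still matches (typically via a small Lua script).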

In a single-machine environment, we can use ordinary JVM locks directly (such as Lock or synchronized). In a distributed environment, we need a distributed lock, such as one based on a database, Redis, or ZooKeeper.

Cache avalanche

Cache avalanche happens when a large portion of the cache expires within the same short period. If a large number of requests for that data arrive during this period, they all fall through to the storage layer, which can be overloaded or even crash.


Causes:

  1. Redis suddenly goes down

  2. A large amount of cached data expires at the same time

Here’s an example to make it concrete:

For example, most of us have lived through an online shopping festival. Suppose merchants run a deep-discount promotion between 23:00 and 24:00. When the program was designed, the discounted products were put into the cache at 23:00 with an expiration time of 1 hour, set through Redis EXPIRE. During that hour many users browse and buy these products. But at 24:00, when all of those keys expire together, many users are still accessing the products, so every request falls on the database. The database has to absorb a huge load, and one slip can bring it down entirely (game over).

This is what it looks like while the products are still cached:

And this is what happens once the cache goes “GG” (expires):

Cache avalanche can be addressed in several ways:

(1) Make Redis highly available

Since a Redis instance can fail, run several instances (one master with multiple slaves, or multiple masters each with slaves) so that when one of them fails, the others keep working. In practice, this means building a cluster.

(2) Rate limiting and degradation

After the cache expires, use locks or queues to control how many threads read the database and write the cache. For example, allow only one thread per key to query the data and rebuild the cache while the other threads wait.
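As a complementary sketch of the “limit, then degrade” idea, here is a Java Semaphore that caps how many threads may rebuild from the database at once and hands everyone else a fallback. DatabaseGuard, the permit count, and the fallback values are assumptions for illustration; the per-key lock described in the cache breakdown section is the stricter “one thread per key” variant.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard: at most maxConcurrentLoads threads may query the database at once.
// A thread that cannot get a permit immediately degrades to the fallback instead of queuing.
class DatabaseGuard {
    private final Semaphore permits;

    DatabaseGuard(int maxConcurrentLoads) {
        this.permits = new Semaphore(maxConcurrentLoads);
    }

    <T> T load(Supplier<T> dbQuery, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // degrade: default value, stale data, or a "busy" response
        }
        try {
            return dbQuery.get();    // rebuild from the database (real code would also write the cache)
        } finally {
            permits.release();
        }
    }
}

class GuardDemo {
    public static void main(String[] args) {
        DatabaseGuard guard = new DatabaseGuard(1);
        // While the only permit is held, a second load cannot get one and degrades.
        String result = guard.load(
                () -> "db + " + guard.load(() -> "db", () -> "fallback"),
                () -> "fallback");
        System.out.println(result); // prints "db + fallback"
    }
}
```

Using tryAcquire rather than acquire is the design choice that turns limiting into degradation: excess requests get an immediate answer instead of piling up behind the database.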

(3) Data preheating

The idea of data preheating is to access the likely-hot data in advance, before going live, so that data likely to be heavily accessed is already loaded into the cache. In other words, manually trigger the loading of the relevant cache keys before heavy concurrent traffic arrives.
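Preheating can be as simple as a loop that runs before traffic arrives. CacheWarmer, the hot-key list, and the loader are hypothetical; in a real deployment the loader would query the database and the writes would go to Redis rather than to an in-memory map.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Warm-up sketch: before traffic arrives, load the keys expected to be hot. The key list and
// the loader (standing in for a database query) are hypothetical inputs.
class CacheWarmer {
    static int warmUp(Map<String, String> cache, List<String> hotKeys, Function<String, String> loader) {
        int loaded = 0;
        for (String key : hotKeys) {
            String value = loader.apply(key);
            if (value != null) {              // skip keys the database does not know
                cache.put(key, value);
                loaded++;
            }
        }
        return loaded;
    }
}

class WarmUpDemo {
    public static void main(String[] args) {
        Map<String, String> database = Map.of("product:1", "Phone", "product:2", "Laptop");
        Map<String, String> cache = new ConcurrentHashMap<>();
        List<String> hotKeys = List.of("product:1", "product:2", "product:404");
        int loaded = CacheWarmer.warmUp(cache, hotKeys, database::get);
        System.out.println("preloaded " + loaded + " keys"); // prints "preloaded 2 keys"
    }
}
```

Warmed keys should ideally also get staggered expiration times, so that they don’t all expire together later.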

(4) Different expiration times

Give keys different expiration times (for example, a base time plus a random offset) so that cache expirations are spread out as evenly as possible.
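A common way to get different expiration times is to add a random offset to a base TTL. The sketch below is illustrative: TtlJitter and the values 3600 s plus 0 to 300 s are made up for the promotion example, and the resulting number would be passed as the seconds argument of SETEX or EXPIRE.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: add a random offset to a base TTL so keys cached at the same moment expire at
// different moments. The base and jitter values below are illustrative, not recommendations.
class TtlJitter {
    // Returns a TTL in [baseSeconds, baseSeconds + maxJitterSeconds].
    static long ttlSeconds(long baseSeconds, long maxJitterSeconds) {
        if (maxJitterSeconds < 0) {
            throw new IllegalArgumentException("maxJitterSeconds must be >= 0");
        }
        // nextLong(bound) returns a value from 0 (inclusive) to bound (exclusive)
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}

class JitterDemo {
    public static void main(String[] args) {
        // e.g. promotion products cached at 23:00: 1 hour plus up to 5 extra minutes each
        for (int i = 1; i <= 3; i++) {
            long ttl = TtlJitter.ttlSeconds(3600, 300);
            System.out.println("product:" + i + " -> SETEX ttl " + ttl + " s");
        }
    }
}
```

With this, products cached together at 23:00 expire spread over the five minutes after 24:00 instead of all at once.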