
Preface

When our system does not have much traffic, the architecture is usually very simple: the service receives a request and queries the database directly.

However, as more and more people use the system and concurrent traffic grows, we often introduce Redis as a data cache to improve performance.
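As a rough sketch of that cache-aside pattern (the key naming, the Jedis client setup, and the queryUserFromDb stand-in are illustrative assumptions, not code from this article):

```java
import redis.clients.jedis.Jedis;

public class UserService {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Cache-aside read: try Redis first, fall back to MySQL on a miss.
    public String getUser(String userId) {
        String cacheKey = "user:" + userId;       // hypothetical key scheme
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            return cached;                        // cache hit
        }
        String fromDb = queryUserFromDb(userId);  // miss: hit the database
        if (fromDb != null) {
            jedis.setex(cacheKey, 3600, fromDb);  // cache real data for an hour
        }
        return fromDb;                            // null if the row does not exist
    }

    private String queryUserFromDb(String userId) {
        return null; // stand-in for a real MySQL query
    }
}
```

Notice that when a key exists in neither Redis nor MySQL, nothing gets cached, so every request for that key reaches the database. That gap is exactly what cache penetration exploits.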

What is cache penetration

If someone maliciously attacks your system by sending a flood of requests for a key that exists in neither Redis nor MySQL, those requests effectively penetrate the Redis layer and all land on the database. At a high enough request rate, MySQL reaches its bottleneck, the database goes down, and the service becomes unavailable.

How to solve the cache penetration problem

1. Cache null values.

We can write the null result into the cache so that the next identical request is answered from Redis. Redis then absorbs these invalid requests for us and keeps them from reaching the database.
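A minimal sketch of that idea, reusing the hypothetical UserService above (the "NULL" sentinel and the 60-second TTL are assumptions for illustration):

```java
// Cache-aside read that also caches misses, with a short TTL.
public String getUserCachingNulls(String userId) {
    String cacheKey = "user:" + userId;
    String cached = jedis.get(cacheKey);
    if (cached != null) {
        // A cached sentinel means "we already know this key does not exist".
        return "NULL".equals(cached) ? null : cached;
    }
    String fromDb = queryUserFromDb(userId);
    if (fromDb != null) {
        jedis.setex(cacheKey, 3600, fromDb);
    } else {
        // Cache the miss briefly so repeated invalid requests stop at Redis.
        jedis.setex(cacheKey, 60, "NULL");
    }
    return fromDb;
}
```

Giving the null entry a much shorter TTL than real data limits both the memory it wastes and how long a stale "does not exist" answer can survive.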

This may seem like a perfect solution, but there’s a downside. If the malicious requests carry a different key each time, such as a random UUID, our cache fills up with a large amount of useless data. Redis stores its data in memory, and it cannot let that data grow without bound: it has a memory eviction policy, and when memory usage reaches the configured limit, it deletes keys according to the preset strategy (for example, keys that are about to expire or are least recently used).

With this approach, Redis may end up caching mostly invalid data and evicting our normal data to make room for it, which can have the opposite of the intended effect.

2. Use a Bloom filter.

We can add another layer of filtering between Redis and the database. If the filter reports that the data does not exist, the request is rejected immediately as invalid and never reaches MySQL.

A Bloom filter can be understood simply as: what it says does not exist certainly does not exist, and what it says does exist might not actually exist (a false positive).

A Bloom filter consists of a bit array (a bitSet) and several hash functions (three, in the examples here). Individual bits take up very little space, so a Bloom filter can track membership for a very large number of keys in a small amount of memory.

The hash functions only need to distribute values as evenly as possible (hash collisions are inevitable, which is why the filter has a certain error rate), and each hash value must fall within the bounds of the bit array.

Each bit is zero at initialization. When a value is added, each hash function maps it to an index, and the bit at that index is set to 1. To check a value, the same indices are inspected: if any of those bits is still 0, the value was definitely never added.
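A toy version of that mechanism, using java.util.BitSet and three simple hash functions (the array size and seed values are arbitrary illustration choices):

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private static final int SIZE = 1 << 24;          // 16M bits, all zero initially
    private static final int[] SEEDS = {7, 31, 131};  // one seed per hash function
    private final BitSet bits = new BitSet(SIZE);

    // Adding a value sets the bit at each of its three hash positions to 1.
    public void add(String value) {
        for (int seed : SEEDS) {
            bits.set(hash(value, seed));
        }
    }

    // If any of the three bits is still 0, the value was definitely never added.
    public boolean mightContain(String value) {
        for (int seed : SEEDS) {
            if (!bits.get(hash(value, seed))) {
                return false;
            }
        }
        return true; // all bits set: "probably present"
    }

    // Simple multiplicative hash, kept within the bounds of the bit array.
    private int hash(String value, int seed) {
        int h = 0;
        for (int i = 0; i < value.length(); i++) {
            h = seed * h + value.charAt(i);
        }
        return (h & 0x7FFFFFFF) % SIZE;
    }
}
```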

We mentioned above that Bloom filters have a certain error rate, so how do we reduce it?

  • We can increase the size of the bit array, which reduces the probability that different values collide on the same bits.
  • We can also use more hash functions, so that each value corresponds to more bits and a false positive requires more simultaneous collisions. (There is a limit, though: too many hash functions fill the array with 1s and push the error rate back up, which is why libraries compute an optimal count from a target error rate, as the sketch below shows.)
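
In practice you rarely tune these two knobs by hand. Libraries such as Guava derive the bit-array size and hash-function count for you; a sketch (the 1,000,000 expected keys and 1% rate are assumptions for illustration):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class FilterDemo {
    public static void main(String[] args) {
        // Guava computes the bit-array size and hash count from the
        // expected number of insertions and the desired false-positive rate.
        BloomFilter<String> filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                1_000_000, // expected number of keys (assumption)
                0.01);     // target false-positive rate: 1%

        filter.put("user:42");
        System.out.println(filter.mightContain("user:42")); // true
        System.out.println(filter.mightContain("user:43")); // false (almost always)
    }
}
```

Wired into the read path from earlier, the check would run before Redis and MySQL: if filter.mightContain(cacheKey) returns false, the request can be rejected immediately.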

When new data is written to the database, it is not recommended to add it to the Bloom filter synchronously on the request path. Instead, you can write a timed task that adds the new data to the Bloom filter asynchronously, which gives better system performance.
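A minimal sketch of such a timed task, reusing the SimpleBloomFilter from earlier (the pollNewKeysFromDb helper and the one-minute interval are hypothetical):

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BloomFilterSyncJob {
    private final SimpleBloomFilter filter = new SimpleBloomFilter();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Every minute, batch keys written since the last run into the filter,
        // keeping filter updates off the request path entirely.
        scheduler.scheduleAtFixedRate(() -> {
            for (String key : pollNewKeysFromDb()) {
                filter.add(key);
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    private List<String> pollNewKeysFromDb() {
        return List.of(); // stand-in: fetch keys inserted since the previous sync
    }
}
```

The trade-off is that a brand-new key is reported as absent until the next sync runs, so the interval should be short enough for your workload.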

That’s all. Thanks for reading.