Welcome to follow github.com/hsfxuebao. I hope this helps you; if you find it useful, please give the repo a Star.

Redis is the most widely used cache, and most of us are familiar with it. But using a cache is not that simple: there are the cache avalanche, cache breakdown, and cache penetration problems. What are cache avalanche, cache breakdown, and cache penetration, and how do we solve them?

1. Cache avalanche

When a large portion of the cache becomes unavailable at the same moment, a flood of requests hits the database directly, putting enormous pressure on it. Under high concurrency the database may go down in an instant. If operations then restarts the database immediately, new traffic will knock it down again. This is a cache avalanche.

1.1 Analysis of Causes

The key cause of a cache avalanche is a massive number of keys failing at the same time. Why does this happen? There are several possibilities:

  • The Redis host goes down, or the Redis cluster crashes
  • A large amount of cached data expires at the same time

1.2 Solutions

  • Redis high availability (master/slave + Sentinel, or Redis Cluster)

  • Ehcache local cache + Hystrix, or Alibaba Sentinel for rate limiting and degradation: when traffic reaches a certain threshold, return a message such as "the system is busy" instead of letting every request hit the database. At least some users can use the system normally, and the others get a result after refreshing a few times.

  • Enable the Redis persistence mechanisms (AOF/RDB) so the cache cluster can be restored as quickly as possible after a crash

  • Improve the database's disaster tolerance: split databases, split tables, and separate reads from writes

  • Data warm-up: before a surge of requests arrives, run through the hot paths ahead of time, load the data into Redis in advance, and set different expiration times (for example, randomly between 1 and 5 minutes) so keys do not all expire together.
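
The "different expiration times" idea in the last bullet can be sketched as a small helper that adds random jitter to a base TTL. This is an illustrative helper, not code from the original post; the method name and parameters are made up for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

public class RandomTtlDemo {
    // Base TTL plus a random 0..jitter spread, so keys that are warmed up
    // together do not all expire in the same instant.
    public static int randomTtlSeconds(int baseSeconds, int jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextInt(jitterSeconds + 1);
    }

    public static void main(String[] args) {
        // Each key gets a TTL somewhere between 300s and 420s
        for (int i = 0; i < 5; i++) {
            System.out.println("ttl = " + randomTtlSeconds(300, 120) + "s");
        }
    }
}
```

In a real system this value would be passed to the Redis SET command's EX option (or an equivalent client call) when warming up each key.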

2. Cache breakdown

Cache breakdown looks similar to a cache avalanche, but they differ: an avalanche is a large-scale key failure, while breakdown is a single hot key, accessed by a large number of concurrent requests, suddenly expiring. The moment it expires, all those concurrent requests hit the database at once and database pressure spikes. This phenomenon is called cache breakdown.

2.1 Cause Analysis

A hot key expires, triggering a flood of concurrent database queries. The problem can therefore be attacked from two directions: first, consider not setting an expiration time on the hot key at all; second, reduce the number of requests that actually reach the database.

Harm: the database receives too many requests at one moment, causing a sudden spike in pressure.

2.2 Solutions

2.2.1 Do not set an expiration time

If the business allows it, you can make hot keys never expire.

2.2.2 Use a mutex to prevent breakdown

When the cache entry is missing, only the thread that acquires the lock is allowed to query the database. This reduces the number of requests hitting the database at the same time and prevents it from being overwhelmed, at the cost of some system throughput.

If multiple threads query the same missing key at once, the first request takes a mutex before loading the data from the database. The other threads fail to acquire the lock and wait. Once the first thread has fetched the data, it writes it back to the cache and releases the lock; the waiting threads then find the value already cached and read it directly.
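
A minimal single-JVM sketch of this double-checked locking pattern is shown below. The cache and database here are stand-ins (a map and a stub method); a real deployment would use Redis for the cache and a distributed lock (for example, one built on Redis SET NX) instead of a `ReentrantLock`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class MutexCacheDemo {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final ReentrantLock lock = new ReentrantLock();

    public static String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                // cache hit, no lock needed
        }
        lock.lock();                     // only one thread rebuilds the key
        try {
            value = cache.get(key);      // double-check after acquiring the lock
            if (value == null) {
                value = loadFromDb(key); // only the lock holder hits the database
                cache.put(key, value);   // write back so waiters see the value
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for the real database query
    private static String loadFromDb(String key) {
        return "db-value-for-" + key;
    }

    public static void main(String[] args) {
        System.out.println(get("hotKey"));  // first call loads from the "database"
        System.out.println(get("hotKey"));  // second call is served from the cache
    }
}
```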

2.2.3 Asynchronous Update

Another possible solution is to make the cache entries permanent and update them asynchronously: a daemon thread in the background periodically refreshes the cache. It also needs to check the cache frequently, because once an entry is evicted (by FIFO, LFU, LRU, etc.) it must be reloaded immediately. The right refresh interval is hard to choose; the implementation is simple, but the user experience is only average.

The asynchronous update mechanism also pairs well with cache preheating: loading the relevant data into the cache right after the system goes online, so the first user requests do not have to populate the cache themselves, which improves performance.
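
The combination of preheating and background refresh can be sketched as follows. This is an illustrative in-memory version (the map stands in for Redis, and the method names are made up); a production system would write to Redis and typically refresh many keys, not one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncRefreshDemo {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    // Daemon thread so the refresher does not keep the JVM alive on shutdown
    private static final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    public static void preheat(String key) {
        cache.put(key, loadFromDb(key));  // warm the cache before traffic arrives
        // Reload the key on a fixed schedule, so it never needs to expire
        refresher.scheduleAtFixedRate(() -> cache.put(key, loadFromDb(key)),
                1, 1, TimeUnit.SECONDS);
    }

    public static String get(String key) {
        return cache.get(key);  // readers never fall through to the database
    }

    // Stand-in for the real database query
    private static String loadFromDb(String key) {
        return "fresh-" + key;
    }

    public static void main(String[] args) {
        preheat("hotKey");
        System.out.println(get("hotKey"));  // prints "fresh-hotKey"
    }
}
```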

3. Cache penetration

Normally we use a Key to look up the corresponding value in Redis. If the Key in the request does not exist in Redis at all, the cache lookup fails and we fall through to the database. When a large number of such requests hit the database as if they had "penetrated" the cache, the phenomenon is called cache penetration.

3.1 Analysis of Causes

The requested key cannot be found in Redis. This is fundamentally different from cache breakdown: in cache penetration, the key simply does not exist in Redis. If an attacker sends a large number of non-existent keys, the resulting flood of database queries can be fatal. So in daily development, always validate parameters: for illegal parameters or keys that cannot possibly exist, return an error immediately, and maintain a healthy "distrust" of the caller.

3.2 Solutions

3.2.1 Empty Object cache or default value

If the data is found in neither Redis nor the database, we still store the Key in Redis with a sentinel value such as value="null". The next query for this Key is then answered from the cache and never reaches the database.

This approach has an obvious weakness: if a different non-existent Key is passed in on every request, caching the sentinels is pointless. That is exactly what happens in hacking or malicious attacks.

A hacker can attack your system by querying data with non-existent IDs, generating a large number of database queries that may bring the database down from the pressure.

  • Attacks that repeat the same id (e.g. mysql> select * from user where id = 1 issued over and over): the empty-object cache absorbs the repeated queries
  • Attacks that use a different id every time: because of empty-object caching and write-back, more and more useless keys accumulate in Redis (mitigate this by setting an expiration time on the sentinel keys)
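
The empty-object pattern can be sketched as below. This is an in-memory illustration (the map and the counter are stand-ins I introduced); a real system would store the sentinel in Redis with a short expiration, e.g. SETEX key 60 "null", to limit the useless-key buildup described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullCacheDemo {
    private static final String NULL_SENTINEL = "null";
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    static int dbQueries = 0;  // counts how often the "database" is actually hit

    public static String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            // The sentinel means "we already know this key does not exist"
            return NULL_SENTINEL.equals(cached) ? null : cached;
        }
        String fromDb = loadFromDb(key);  // may legitimately return null
        // Cache the real value, or the sentinel if the row does not exist
        cache.put(key, fromDb == null ? NULL_SENTINEL : fromDb);
        return fromDb;
    }

    // Stand-in for the real database query; only "user:*" ids exist here
    private static String loadFromDb(String key) {
        dbQueries++;
        return key.startsWith("user:") ? "data-" + key : null;
    }

    public static void main(String[] args) {
        System.out.println(get("missing-id"));           // first call hits the DB
        System.out.println(get("missing-id"));           // absorbed by the sentinel
        System.out.println("db queries: " + dbQueries);  // prints "db queries: 1"
    }
}
```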

3.2.2 Guava Bloom filter solves cache penetration

The code is as follows:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.util.ArrayList;
import java.util.List;

public class GuavaBloomfilterDemo {
    public static final int _1W = 10000;
    // How many elements we expect to insert into the Bloom filter
    public static int size = 100 * _1W;
    // Expected false-positive rate; the smaller it is, the fewer false
    // positives, but the more memory and hash functions the filter needs,
    // so it cannot simply be set arbitrarily close to zero
    public static double fpp = 0.01;

    /** Hello-world example */
    public void bloomFilter() {
        // Create a Bloom filter object sized for 100 elements
        BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), 100);
        // Check whether the specified elements exist
        System.out.println(filter.mightContain(1));
        System.out.println(filter.mightContain(2));
        // Add the elements to the Bloom filter
        filter.put(1);
        filter.put(2);
        System.out.println(filter.mightContain(1));
        System.out.println(filter.mightContain(2));
    }

    /** False-positive-rate demo */
    public void bloomFilter2() {
        // Build a Bloom filter sized for `size` elements at error rate `fpp`
        BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, fpp);

        // 1 Insert 1 million sample values into the Bloom filter
        for (int i = 0; i < size; i++) {
            bloomFilter.put(i);
        }

        // 2 Every one of the 1 million inserted values must be reported as present
        List<Integer> listSample = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            if (bloomFilter.mightContain(i)) {
                listSample.add(i);
            }
        }
        System.out.println("Present count: " + listSample.size());

        // 3 Deliberately probe 100,000 values that were never inserted and
        // count how many the filter wrongly reports as present
        List<Integer> list = new ArrayList<>(10 * _1W);
        for (int i = size + 1; i < size + 100000; i++) {
            if (bloomFilter.mightContain(i)) {
                System.out.println(i + "\t" + "false positive");
                list.add(i);
            }
        }
        System.out.println("Number of false positives: " + list.size());
    }

    public static void main(String[] args) {
        new GuavaBloomfilterDemo().bloomFilter2();
    }
}

Guava provides a solid Bloom filter implementation (see its source for details), but it has one major drawback: it only works within a single JVM, while today's systems are usually distributed. To cover that case we need a Bloom filter backed by Redis.

3.2.3 Redis Bloom filter

Case 1: Whitelist filter

Architecture notes:

  1. False positives are possible, but the probability is small and acceptable; elements cannot be removed from a Bloom filter
  2. All valid keys must be loaded into both the filter and Redis in advance; otherwise a lookup returns null

The code is as follows:

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RBucket;
import org.redisson.api.RedissonClient;
import org.redisson.client.codec.StringCodec;
import org.redisson.config.Config;

import java.util.concurrent.TimeUnit;

public class RedissonBloomFilterDemo {
    public static final int _1W = 10000;

    // How many elements we expect to insert into the Bloom filter
    public static int size = 100 * _1W;
    // False-positive rate; the smaller it is, the fewer false positives
    public static double fpp = 0.03;

    static RedissonClient redissonClient = null;     // Redis client (plays the role Jedis would)
    static RBloomFilter<String> rBloomFilter = null; // Redis-backed Bloom filter

    static {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.111.147:6379").setDatabase(0);
        // Build the Redisson client
        redissonClient = Redisson.create(config);
        // Build the rBloomFilter through redisson
        rBloomFilter = redissonClient.getBloomFilter("phoneListBloomFilter", new StringCodec());

        rBloomFilter.tryInit(size, fpp);

        // 1 Test: present in the Bloom filter + present in Redis
        //rBloomFilter.add("10086");
        //redissonClient.getBucket("10086", new StringCodec()).set("chinamobile10086");

        // 2 Test: present in the Bloom filter + absent from Redis
        //rBloomFilter.add("10087");

        // 3 Test: absent from the Bloom filter + absent from Redis
    }

    private static String getPhoneListById(String IDNumber) {
        String result = null;

        if (IDNumber == null) {
            return null;
        }
        // 1 Check the Bloom filter first
        if (rBloomFilter.contains(IDNumber)) {
            // The Bloom filter says the key may exist, so check Redis
            RBucket<String> rBucket = redissonClient.getBucket(IDNumber, new StringCodec());
            result = rBucket.get();
            if (result != null) {
                return "i come from redis: " + result;
            } else {
                result = getPhoneListByMySQL(IDNumber);
                if (result == null) {
                    return null;
                }
                // Write the data back to Redis
                redissonClient.getBucket(IDNumber, new StringCodec()).set(result);
            }
            return "i come from mysql: " + result;
        }
        return result;
    }

    private static String getPhoneListByMySQL(String IDNumber) {
        return "chinamobile" + IDNumber;
    }

    public static void main(String[] args) {
        //String phoneListById = getPhoneListById("10086");
        //String phoneListById = getPhoneListById("10087"); // Run the test twice
        String phoneListById = getPhoneListById("10088");
        System.out.println("------ query result: " + phoneListById);

        // Pause the thread briefly before shutting down
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        redissonClient.shutdown();
    }
}

Case 2: Blacklist filter

This one is left as an exercise for the reader.

4. Summary

We covered what cache avalanche, cache breakdown, and cache penetration are, how they differ, and the common solutions for each.