
This article is reprinted from the public account Qiao Er Ye and shares the author's recent Alibaba interview experience.

Shortly before the interview, the author had studied a set of Java interview tutorials that covered several questions about caching.

This article is the author's summary of his first-round Alibaba interview. Feel free to use it as a reference, and thanks again to Qiao Er Ye for sharing.

1 Introduction

I got a phone interview from Alibaba last night and was asked some questions about caching.

Although I had encountered these topics before and knew them more or less, I had never written them down properly, so in the actual interview I could not answer well. Today I am recording them so I can truly learn them.

We all use caching to some extent in everyday projects, so that we do not have to go to the database every time we query data.

In a high-QPS system especially, sending every query to the database would be a disaster for the database.

Today we will not get into multi-level caching; "the cache" below simply means whatever cache the system uses, whether single-level or multi-level. The focus is on the problems you may run into when using a cache, and some solutions.

When we use a cache, the typical call flow of a business system looks like this:

![](https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/gold-user-assets/2019/3/27/169bb2607456de97~tplv-t2oaga2asx-image.image)

When we query a piece of data, we check the cache first; if the cache has it, we return it, otherwise we query the database and return the result. Several problems can arise in this process.
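This flow is the classic cache-aside read path. Below is a minimal, self-contained sketch of it in Java; the class name and the in-memory maps standing in for Redis and MySQL are illustrative only, not part of the original article:

```java
import java.util.HashMap;
import java.util.Map;

// Cache-aside read path: check cache, fall back to DB, then populate cache.
// The two maps are stand-ins for a real Redis instance and a real database.
public class CacheAsideReader {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database = new HashMap<>();

    public CacheAsideReader(Map<String, String> dbRows) {
        database.putAll(dbRows);
    }

    public String query(String key) {
        String value = cache.get(key);            // 1. check the cache first
        if (value != null) return value;          // 2. cache hit: return immediately
        value = database.get(key);                // 3. cache miss: query the database
        if (value != null) cache.put(key, value); // 4. populate the cache for next time
        return value;                             // 5. return the result (may be null)
    }
}
```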

2 Cache Penetration

2.1 What is Cache penetration

Normally, the data we query actually exists in the database.

But consider a request for data that does not exist in the database at all: neither the cache nor the database can find it, so the request falls through to the database every single time.

This phenomenon, where queries for nonexistent data always reach the database, is called cache penetration.

2.2 Problems arising from penetration

Imagine a hacker attacking your system by querying with IDs that do not exist: every such request would turn into a database query, and the database could go down under the pressure.

2.3 Solutions

2.3.1 Cache null values

Penetration occurs because the cache has no entry for these keys, so every query for them goes to the database.

We can write these keys into the cache with a null value, so that subsequent requests for the same key get null straight from the cache.

That way the requests never reach the database. Just don't forget to set an expiration time on these null entries.
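The null-caching idea can be sketched as follows. This is a toy in-memory version: the map stands in for Redis, a sentinel string stands in for a cached null (with Redis you would also set a short TTL on it), and all names are my own, not from the article:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache penetration fix #1: cache the "not found" result too,
// so repeated queries for a nonexistent key never reach the database.
public class NullCachingLoader {
    private static final String NULL_MARKER = "__NULL__"; // sentinel for "not in DB"
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> db; // stand-in for the real DB query

    public NullCachingLoader(Function<String, String> db) {
        this.db = db;
    }

    public Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            // A cached sentinel means "we already know the DB has nothing".
            return NULL_MARKER.equals(cached) ? Optional.empty() : Optional.of(cached);
        }
        String fromDb = db.apply(key);
        // Cache the miss as well; in Redis you would give this entry a short TTL.
        cache.put(key, fromDb == null ? NULL_MARKER : fromDb);
        return Optional.ofNullable(fromDb);
    }
}
```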

2.3.2 BloomFilter

A Bloom filter works like a set: it tells you whether an element (key) might exist in a collection.

This technique is widely used in big-data scenarios. For example, HBase uses it to check whether data is on disk, and crawlers use it to decide whether a URL has already been crawled.

If the Bloom filter says the key does not exist, we return directly; if it says the key may exist, we go on to check the cache and then the database.

The flow chart is as follows:
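The filter itself can be sketched in a few lines of Java. This is a toy implementation for illustration only (the bit-array size and hashing scheme are arbitrary); a real system would use a tuned library such as Guava's `BloomFilter`:

```java
import java.util.BitSet;

// Toy Bloom filter: k hash functions over an m-bit array.
// Answers "definitely absent" or "possibly present" (false positives possible).
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th hash from the key's hashCode plus a seed constant.
    private int index(String key, int seed) {
        int h = key.hashCode() * 31 + seed * 0x9E3779B9;
        return Math.floorMod(h, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(index(key, i));
    }

    // false => the key was never added; true => it may have been added.
    public boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }
}
```

In the penetration scenario, every existing key is loaded into the filter up front; a request whose key fails `mightContain` is rejected before it can touch the cache or the database.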

2.4 How to Choose

In a malicious attack, the attacker sends a huge number of keys that do not exist. With the first scheme we would end up caching a huge amount of data for nonexistent keys.

In that case the first scheme is not appropriate; the second scheme can simply filter those keys out.

So, for data with an unusually large number of distinct keys and a low request repetition rate, there is no point caching the misses; use the second scheme to filter them out directly.

For null results with a limited set of keys and a high repetition rate, the first scheme (caching null values) is the better fit.

3 Cache Breakdown

3.1 What is breakdown

Cache breakdown is the second problem we might encounter with a caching scheme.

In a high-concurrency system, when a large number of requests query the same key at the exact moment that key expires, they all fall through to the database. This phenomenon is called cache breakdown.

3.2 What are the problems

At that moment the database load spikes, and the pressure on it rises sharply.

3.3 Solutions

When multiple threads would query the database at the same time, we can place a mutex on the query so that only the first request acquires the lock and goes to the database.

The other threads reach this point, fail to get the lock, and wait. Once the first thread has fetched the data and written it to the cache, the waiting threads (and any later ones) find the cache populated and read from it directly.
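A minimal sketch of this pattern in Java: a single in-process lock with a double check around the reload. All names are illustrative; in a distributed deployment you would use a Redis lock (e.g. `SET key value NX PX ...`) instead of a JVM monitor:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache breakdown fix: a mutex ensures only one thread reloads a hot key
// from the database; everyone else waits, then reads the repopulated cache.
public class MutexCacheLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Object lock = new Object();
    private final Function<String, String> db; // stand-in for the real DB query

    public MutexCacheLoader(Function<String, String> db) {
        this.db = db;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;      // cache hit: no locking needed
        synchronized (lock) {
            value = cache.get(key);           // double check: another thread may have loaded it
            if (value == null) {
                value = db.apply(key);        // only one thread reaches the database
                if (value != null) cache.put(key, value);
            }
            return value;
        }
    }
}
```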

4 Cache Avalanche

4.1 What is cache avalanche

A cache avalanche happens when a massive portion of the cache fails at once, for example when the cache service goes down, so a flood of requests hits the database directly. The database then collapses under the load.

4.2 Solutions

4.2.1 Beforehand:

  • Cluster caching ensures high availability of cache services

The idea is to make the cache cluster highly available before an avalanche can happen. With Redis, you can use master-slave replication plus Sentinel, or Redis Cluster, to avoid a total Redis outage.

4.2.2 During the incident:

  • EhCache local cache + Hystrix rate limiting & degradation, to keep MySQL from being killed

The purpose of the EhCache local cache is to keep serving at least some requests during a window when the Redis cluster is completely unavailable.

With Hystrix, suppose 5,000 requests arrive in one second; we can configure it so that only 2,000 of them pass through the component per second, while the remaining 3,000 go down the rate-limiting path.

Those limited requests then trigger our own degradation logic, for example returning some default values. This keeps the MySQL at the end of the chain from being overwhelmed by requests.
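The limit-then-degrade idea can be sketched without Hystrix itself, for example with a plain semaphore. This is a simplification of what Hystrix does, not its API; the class name, permit count, and fallback value are all illustrative:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hystrix-style guard: at most `maxConcurrent` requests run the real query;
// any request over the limit immediately gets the degraded fallback value.
public class LimitingGuard<T> {
    private final Semaphore permits;
    private final T fallback; // degraded default value

    public LimitingGuard(int maxConcurrent, T fallback) {
        this.permits = new Semaphore(maxConcurrent);
        this.fallback = fallback;
    }

    public T call(Supplier<T> realQuery) {
        if (!permits.tryAcquire()) return fallback; // over the limit: degrade
        try {
            return realQuery.get();                 // within the limit: hit the real backend
        } finally {
            permits.release();
        }
    }
}
```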

4.2.3 Afterwards:

  • Enable Redis persistence to restore the cache cluster as soon as possible

With persistence enabled, once Redis restarts it automatically loads data from disk and rebuilds the in-memory data set.
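As one concrete illustration, persistence is switched on in `redis.conf`. The directives below are real Redis options, but the specific values are example settings, not recommendations:

```conf
# AOF: append every write to a log for near-complete recovery after a restart
appendonly yes
appendfsync everysec

# RDB: also snapshot the data set to disk, e.g. if at least 1 key
# changed within 900 seconds
save 900 1
```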

The avalanche prevention scheme is shown below:

5 Dealing with Mass Expiration of Hot Data

When we cache data, we usually set an expiration time, after which the entry becomes invalid.

For hot data, the moment its cache entries expire there will be a flood of requests to the database, and the database may then crash.

5.1 Solution

5.1.1 Set Different Expiration Times

To prevent these hot entries from all expiring at once, we stagger their expiration times when setting them.

For example, add or subtract a random value within some range to a base expiration time.
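The random offset is a one-liner. A small sketch, where the base TTL and jitter range are arbitrary example values:

```java
import java.util.concurrent.ThreadLocalRandom;

// Jitter the TTL so hot keys written at the same time do not expire together.
public final class TtlJitter {
    public static long jitteredTtlSeconds(long baseSeconds, long jitterSeconds) {
        // uniform random offset in [-jitterSeconds, +jitterSeconds]
        long offset = ThreadLocalRandom.current().nextLong(-jitterSeconds, jitterSeconds + 1);
        return baseSeconds + offset;
    }
}
```

For a one-hour base TTL with five minutes of jitter, keys expire anywhere between 55 and 65 minutes, spreading the reload pressure on the database over that window.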

5.1.2 Mutex

As with cache breakdown above, add a mutex so that only the first request queries the database; all other queries block until the lock is released, which protects the database.

But because it blocks the other threads, this also reduces system throughput. Whether it is worth doing must be weighed against the actual business requirements.

