1. Foreword

In day-to-day development we use databases for data storage. Because most systems do not face high concurrency, this usually looks fine. But when a large burst of traffic arrives, such as a flash sale or a sudden spike in home-page views, storing all of the system's data in a single database runs straight into the disk: disk reads and writes are slow, and performance suffers badly. When tens of thousands of requests arrive at once and the system must complete tens of thousands of reads and writes in a very short time, the database often cannot cope. It is extremely easy to bring the database down, causing severe production problems and, ultimately, service outages.

To overcome problems like these, projects often introduce NoSQL technology: in-memory databases that also provide some persistence capabilities.

Redis is one such NoSQL technology, but introducing Redis can give rise to cache penetration, cache breakdown, cache avalanche, and other problems. This article analyzes these three problems in more depth.

2. Initial understanding

Cache penetration: the data for a key does not exist in the data source, so every request for that key misses the cache and falls through to the data source, which may overwhelm it. For example, querying user information with a user ID that does not exist finds nothing in either the cache or the database. If an attacker exploits this, the database may be crushed.

Cache breakdown: the data for a key exists, but the entry in Redis has expired. If a large number of concurrent requests arrive at that moment, they will all load the data from the backend DB and set it back into the cache, and this instant of heavy concurrency may overwhelm the backend DB.

Cache avalanche: when the cache server restarts, or a large number of cache entries expire within the same time window, the resulting flood of misses also puts enormous pressure on the backend system (such as the DB).

3. Cache penetration solution

When we query data that is certain not to exist, the cache never hits; and since, for fault tolerance, data that the storage layer cannot find is not written back to the cache, every request for this non-existent data goes to the storage layer, defeating the purpose of the cache.

There are a number of ways to solve the cache penetration problem effectively. The most common is to use a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap, so that a query for data that definitely does not exist is intercepted by the bitmap, sparing the underlying storage system the query pressure. A simpler, cruder approach (and the one we use) is that if a query returns empty data (whether because the data does not exist or because of a system failure), we still cache the empty result, but with a very short expiration time, no more than five minutes.
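For illustration only, here is a minimal sketch of the Bloom filter idea using Guava's BloomFilter. The article does not name a library, so the choice of Guava is an assumption, and register, getUser, and lookupFromCacheOrDb are hypothetical names:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class UserBloomFilter {
    // Sized for up to 1,000,000 user IDs with a ~1% false-positive rate
    private static final BloomFilter<String> userIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    // Called at startup for every existing ID, and again whenever a user is created
    public static void register(String userId) {
        userIds.put(userId);
    }

    public static Object getUser(String userId) {
        if (!userIds.mightContain(userId)) {
            // Definitely not in the data source: reject before touching cache or DB
            return null;
        }
        // mightContain() can return false positives, so the usual
        // cache-then-DB lookup still runs for IDs that pass the filter
        return lookupFromCacheOrDb(userId);
    }

    private static Object lookupFromCacheOrDb(String userId) {
        return null; // placeholder for the normal cache/DB logic
    }
}

A request for a non-existent ID is intercepted by the bitmap almost every time, so it rarely reaches the storage layer at all.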

Pseudocode for the cruder null-caching approach:

public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";

    String cacheValue = CacheHelper.get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: usually an SQL query
    cacheValue = getProductListFromDB();
    if (cacheValue == null) {
        // The data does not exist in the DB either: cache a default (empty)
        // value, with a short expiration time, so repeated requests for this
        // key stop hammering the database
        cacheValue = "";
    }
    CacheHelper.add(cacheKey, cacheValue, cacheTime);
    return cacheValue;

}

4. Cache breakdown solution

A key may be accessed with extremely high concurrency at certain points in time; this is very "hot" data. At that moment, there is an issue to consider: the cache being "broken down" when the entry expires.

Using mutex keys

A common practice in the industry is to use a mutex. Simply put, when the cache expires, instead of loading from the DB immediately, the thread first tries to set a mutex key using a cache operation that succeeds for only one caller (such as Redis's SETNX or Memcache's ADD). The thread whose operation succeeds then loads the DB and resets the cache; the other threads retry the entire get-from-cache method.

SETNX, short for "SET if Not eXists", sets a key only when it does not already exist, and can be used to implement a locking effect.

public String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // Cache expired: only the caller that manages to set the mutex key
        // reloads from the DB. The 3-minute expiry keeps the lock from
        // living forever if its holder crashes.
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) {
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
            return value;
        } else {
            // Another caller is rebuilding the cache: back off briefly, then retry
            sleep(50);
            return get(key); // retry
        }
    } else {
        return value;
    }

}

Memcache code:

if (memcache.get(key) == null) {

// 3 min timeout to avoid mutex holder crash  
if (memcache.add(key_mutex, 3 * 60 * 1000) == true) {  
    value = db.get(key);  
    memcache.set(key, value);  
    memcache.delete(key_mutex);  
} else {  
    sleep(50);  
    retry();  
}  

}

Other solutions: to be added.
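As one such addition, here is the Redis mutex pattern above sketched against a real client. This is only a sketch, assuming the Jedis 3.x API; it uses SET with the NX and EX options so that the lock and its expiry are set in a single atomic command, rather than a bare SETNX followed by a separate EXPIRE. loadFromDb and the TTL values are hypothetical:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexGet {
    private static final int MUTEX_TTL_SECONDS = 3 * 60;  // released even if the holder crashes
    private static final int VALUE_TTL_SECONDS = 30 * 60;

    public static String get(Jedis redis, String key) throws InterruptedException {
        String value = redis.get(key);
        while (value == null) {
            // Only one caller wins the mutex; NX + EX is one atomic command
            String ok = redis.set(key + "_mutex", "1",
                    SetParams.setParams().nx().ex(MUTEX_TTL_SECONDS));
            if ("OK".equals(ok)) {
                value = loadFromDb(key);                  // hypothetical DB loader
                redis.setex(key, VALUE_TTL_SECONDS, value);
                redis.del(key + "_mutex");
                return value;
            }
            Thread.sleep(50);                             // someone else is rebuilding: retry
            value = redis.get(key);
        }
        return value;
    }

    private static String loadFromDb(String key) {
        return "value-for-" + key; // placeholder
    }
}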

5. Cache avalanche solution

The difference from cache breakdown is one of scale: a cache avalanche involves many keys, while cache breakdown concerns a single key.

The cache is normally obtained from Redis, as shown in the schematic below:

The schematic at the moment the cache fails is as follows:

The avalanche effect that cache invalidation has on the underlying system is terrible! Most system designers consider locking or queuing to guarantee that a large number of threads do not read and write the database all at once, thereby avoiding a flood of concurrent requests hitting the underlying storage system when the cache fails. Another simple solution is to spread out the cache expiration times: for example, add a random value to the base expiration time, such as 1-5 minutes at random. This lowers the chance that many entries share the same expiry, making a collective expiration event hard to trigger.
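The random-jitter part needs only a couple of lines. A sketch, assuming the article's CacheHelper and a TTL expressed in seconds:

import java.util.concurrent.ThreadLocalRandom;

// Base TTL plus a random 1-5 minute offset, so keys written at the
// same time do not all expire at the same moment
int ttlSeconds = 30 * 60 + ThreadLocalRandom.current().nextInt(60, 5 * 60 + 1);
CacheHelper.add(cacheKey, cacheValue, ttlSeconds);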

The lock-and-queue pseudocode is as follows:

public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    String lockKey = cacheKey;

    String cacheValue = CacheHelper.get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }
    synchronized (lockKey) {
        // Double-check after acquiring the lock: another thread may have
        // rebuilt the cache while this one was waiting
        cacheValue = CacheHelper.get(cacheKey);
        if (cacheValue == null) {
            // Usually an SQL query
            cacheValue = getProductListFromDB();
            CacheHelper.add(cacheKey, cacheValue, cacheTime);
        }
    }
    return cacheValue;

}

Locking and queuing only relieve pressure on the database; they do not improve the system's throughput. Suppose that under high concurrency the key is locked during a cache rebuild: 999 of the last 1,000 requests are then blocked, leaving users waiting until they time out. This is a palliative, not a cure!

Note: in a distributed environment, the lock-and-queue approach must also handle concurrency across nodes, which may mean solving the distributed locking problem as well. Threads still get blocked, and the user experience is poor! It is therefore rarely used in truly high-concurrency scenarios.

Cache flag pseudocode:

public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    // Cache flag key
    String cacheSign = cacheKey + "_sign";

    String sign = CacheHelper.get(cacheSign);
    // The actual value's cache time is twice the flag's
    String cacheValue = CacheHelper.get(cacheKey);
    if (sign != null) {
        // Flag still valid: the cached data is considered fresh
        return cacheValue;
    } else {
        // Flag expired: reset it, refresh the cache in the background,
        // and return the stale value in the meantime
        CacheHelper.add(cacheSign, "1", cacheTime);
        ThreadPool.QueueUserWorkItem((arg) -> {
            // Usually an SQL query
            cacheValue = getProductListFromDB();
            CacheHelper.add(cacheKey, cacheValue, cacheTime * 2);
        });
        return cacheValue;
    }

}

Explanation:

Cache flag: records whether the cached data has expired. When the flag expires, it triggers a notification to another thread in the background to refresh the cache of the actual key.

Cache data: its expiration time is set to twice that of the flag; for example, if the flag is cached for 30 minutes, the data is cached for 60 minutes. This way, when the flag key expires, the actual cache can still return stale data to the caller until the background thread finishes the update and fresh data is available.

To sum up, the solutions proposed for cache avalanche are: using locks or queues, setting an expiration flag that triggers a cache refresh, giving keys different (jittered) expiration times, and a so-called "second-level cache".

6. Summary

For business systems, it is always a case-by-case analysis: there is no best solution, only the most appropriate one.

Caching raises other problems too, such as what happens when the cache is full and how to avoid data loss; these can be studied separately. Finally, three terms worth mentioning: LRU, RDB, and AOF. We usually use an LRU eviction strategy to handle memory overflow, and Redis's RDB and AOF persistence strategies to keep data safe under the appropriate circumstances.
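For completeness, a sketch of how those settings might be applied from code. This assumes the Jedis client and standard Redis configuration parameters; in production these would normally be set in redis.conf rather than at runtime:

import redis.clients.jedis.Jedis;

public class RedisSafetyConfig {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Evict the least-recently-used keys once maxmemory is reached
            jedis.configSet("maxmemory-policy", "allkeys-lru");
            // Turn on AOF persistence in addition to the default RDB snapshots
            jedis.configSet("appendonly", "yes");
        }
    }
}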