Welcome to follow the WeChat public account of the same name, [Coder's Technical Path]. This article is part of the high-concurrency optimization series; bookmarks and discussion are welcome.
Overview of this article:

1. What is a cache: 1.1 The deadly cost of storage outages; 1.2 Why database performance degrades; 1.3 Types of cache
2. The most troublesome cache problems in front-line R&D: 2.1 Cache penetration; 2.2 Cache breakdown; 2.3 Cache avalanche; 2.4 Data drift; 2.5 Cache stampede; 2.6 Cache pollution; 2.7 Hot keys
3. Overview of top-level cache architectures: 3.1 Evolution of the Weibo cache architecture; 3.2 Design of the Zhihu home-page read-filter cache
4. Conclusion
Part1 What is a cache
1.1 The Deadly Cost of Storage Outages [1]
On May 28, 2015, Ctrip's website and app were completely down for around 12 hours, while rumors that its database had been physically deleted circulated on WeChat Moments.
Based on the previous quarter's earnings, the outage was estimated to have cost Ctrip roughly 12 million dollars in revenue, and its share price fell about 11%. And that was 2015, when the Internet was only beginning to spread; imagine the impact if it happened today. According to the company's announcement, the outage was caused by employee error.
Although that cause lies outside the performance topics we want to discuss, the conclusion still stands: a database outage can be catastrophic for a system.
1.2 Why Database Performance Degrades
Take MySQL as an example. It uses a buffer pool, a contiguous region of memory into which disk pages are loaded for reading and writing, to reconcile the speed mismatch between CPU and disk. In general, if the buffer pool is large enough to hold all the data pages, MySQL generates almost no read I/O; the remaining write I/O is flushed asynchronously and does not block reads and writes.
What if the buffer pool is not large enough and a requested data page is not in it?
The page must then be loaded from the disk file into the buffer pool, which means waiting for a synchronous physical read to complete. If the I/O does not respond in time, the thread blocks: since reads and writes require the page to be in the buffer pool, the thread cannot proceed until the operating system finishes the I/O.
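A quick way to gauge whether the buffer pool is absorbing reads is to compare logical read requests with the reads that actually had to go to disk. Below is a minimal JDBC sketch; the connection URL and credentials are placeholders, while the two status counters are standard MySQL variables.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BufferPoolCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection settings; requires the MySQL JDBC driver on the classpath.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/test", "user", "pass");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")) {
            // Innodb_buffer_pool_read_requests = logical reads,
            // Innodb_buffer_pool_reads = reads that missed the pool and hit disk.
            // A high reads/read_requests ratio means the pool is too small for the hot set.
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}
```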
What if the page is hot, and a new session thread needs to access the same data page while it is being read?
The new thread must wait for the first one to bring the page into the buffer pool. If the first thread is stalled by a disk I/O bottleneck, then the longer the delay, the more user threads pile up waiting for the same data page. For a highly concurrent system, this creates a great deal of waiting.
And what happens when a large number of concurrent requests are blocked this way?
For the service, a flood of timeouts renders the server unavailable and trips its circuit breaker. Once the breaker trips, that machine's traffic shifts to other machines, raising the odds that the same thing happens there. In the extreme case, every instance goes down and the traffic curve drops to zero.
The above traces how disk I/O turns into service exceptions, which is the database performance problem we meet most often in production. Other scenarios, such as lock contention and poor buffer hit ratios, can likewise bring a service down.
And the limits of a single database and single table are real: strategies such as sharding databases and tables can only alleviate the problem, never eradicate it.
To avoid all this, caching becomes necessary.
1.3 Types of cache
Caches exist to mediate differences.
There are many such gaps: the speed gap between processor and storage, the gap between user expectations and product responsiveness, and gaps in service processing efficiency.
1.3.1 Client Cache
This is the cache closest to the user: the web page cache and the app cache. Web caching is mature technology and rarely causes trouble, but apps should use caching carefully because of device limitations.
In one business I worked on, a client-cache bug caused the same order-number sequence to be requested twice, producing service anomalies. We suspected a stale or corrupted cached value, but to this day the case remains odd to me; it is a reminder to understand the client side more deeply.
1.3.2 Single-machine Cache
CPU cache [2]. To reconcile the huge speed difference between CPU and memory, CPUs provide L1/L2/L3 caches: the closer to the CPU core, the faster the cache. The Zhihu home-page read-filter cache architecture introduced in a later chapter is inspired by this design.
Example of an L1 cache line
Ehcache [3] is one of the most popular Java caching frameworks. Thanks to its open source nature it is widely used with frameworks such as Spring and Hibernate, and it supports disk persistence and off-heap memory. Its caching feature set is complete.
Ehcache architecture diagram
It is worth mentioning that Ehcache can cache off-heap, outside the JVM heap, so the cached data adds nothing to GC pauses; this matters a great deal for GC tuning in certain scenarios. Note, though, that off-heap memory works on raw bytes, so every access pays serialization and deserialization costs and is much slower than heap memory. If GC pauses are not a serious problem for your business, there is no need to use it.
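As a concrete illustration, here is a minimal sketch of an Ehcache 3 cache with an on-heap tier backed by an off-heap tier, along the lines of the trade-off just described; the cache name, sizes, and key/value types are arbitrary choices for the example.

```java
import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class OffHeapExample {
    public static void main(String[] args) {
        // Hot entries stay on-heap; overflow lives off-heap, outside GC's reach.
        // Off-heap space is direct memory, so -XX:MaxDirectMemorySize may need raising.
        CacheManager manager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("users", CacheConfigurationBuilder
                        .newCacheConfigurationBuilder(Long.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(10_000, EntryUnit.ENTRIES)
                                        .offheap(512, MemoryUnit.MB)))
                .build(true);

        Cache<Long, String> users = manager.getCache("users", Long.class, String.class);
        users.put(1L, "alice");               // values in the off-heap tier pay
        System.out.println(users.get(1L));    // serialization costs on each access
        manager.close();
    }
}
```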
Guava Cache. Inspired by ConcurrentHashMap, but with richer element-invalidation policies. Its feature set is less complete than Ehcache's, for instance it only supports JVM heap memory, but it is lighter and simpler. I previously used Guava Cache to hold some gateway configuration, and its automatic loading on expiry was quite convenient.
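A minimal sketch of that gateway-config use case: a LoadingCache that transparently reloads an entry on first access after expiry. The loadFromConfigCenter method is a hypothetical stand-in for the real config source.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class GatewayConfigCache {
    private final LoadingCache<String, String> configs = CacheBuilder.newBuilder()
            .maximumSize(1_000)
            .expireAfterWrite(5, TimeUnit.MINUTES)   // stale entries reload on next access
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return loadFromConfigCenter(key);
                }
            });

    String get(String key) {
        // The loader runs automatically on a miss; callers never see "no value".
        return configs.getUnchecked(key);
    }

    private static String loadFromConfigCenter(String key) {
        return "config-for-" + key;              // placeholder for the real source
    }
}
```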
1.3.3 Database Caching
The Query Cache stores query results; once enabled, it can noticeably cut execution time for queries that consume significant resources. Note, however, that MySQL deprecated the Query Cache in 5.7.20 and removed it in 8.0, since its coarse-grained invalidation hurt concurrent workloads.
Query Cache suitability test [4]
1.3.4 Distributed Caching
Memcached [5]. Memcached is an efficient distributed memory cache, simple to build and operate. The entire cache lives in memory, so response time is fast, but it has no persistence capability.
Memcached storage core, schematic
Redis. Redis is used ever more widely in industry for its excellent performance and rich data structures, along with its stability and data-consistency support.
Redis core object schematic
Who is using Redis?
The Redis official website's list of Redis users
There is a familiar company on it: Weibo. Weibo is a heavy user of Redis, and it is rumored that many new Redis features were customized for Weibo. Weibo's storage architecture is detailed in a later section.
Much of the rest of this article will also be based on Redis.
1.3.5 Network Cache
Cache locations in a simple request
A CDN is a content delivery network built on top of the existing Internet. Relying on edge servers deployed across regions, plus the central platform's load balancing, content distribution, and scheduling modules, it lets users fetch content from a nearby node, reducing network congestion and improving response speed and hit ratio.
Nginx caching is built on its proxy store: with the ngx_http_proxy_module you can achieve squid-like caching. With caching enabled, Nginx saves responses to an on-disk cache and serves client requests from it for as long as the cached data has not expired.
Part2 The most troublesome cache problems in front-line R&D
You have probably seen the following problems in many places already, but for the sake of completeness they are enumerated here.
2.1 Cache Penetration
Cache penetration happens when queries target data that does not exist in the database: the cache misses, the database query comes back empty, and the cache is never updated, so every such query goes all the way to the database. Left unhandled, a malicious attacker can exploit this to pile enormous pressure on the database until it crashes.
There are two solutions. One is to cache a null value when the query returns empty, so subsequent lookups stop reaching the database every time. The other is a Bloom filter, which determines in advance whether the data can exist in the database and intercepts the request if it cannot.
A Bloom filter uses multiple hash functions to test membership, storing more data in a smaller space with a controllable false-positive rate. Its working principle: if the filter says the data does not exist, it definitely does not exist.
How a Bloom filter works (animated figure)
As the figure shows, adding elements on the left sets bits in the hash slots, while the membership test on the right checks those slots: after 1 and 2 are added, certain slots become occupied, and testing whether 2 and 3 exist means checking their corresponding slots.
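Here is a minimal sketch combining both defenses: a Guava BloomFilter over the set of valid ids plus null-value caching. The ConcurrentHashMap stands in for Redis and queryDatabase is a hypothetical placeholder; in production the null marker should carry a short TTL.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PenetrationGuard {
    private static final String NULL_MARKER = "__NULL__";

    // Populated at startup with every valid id; ~1% false positives at 10M entries.
    private final BloomFilter<String> knownIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000_000, 0.01);
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis

    String get(String id) {
        // Defense 1: "definitely absent" requests never reach cache or storage.
        if (!knownIds.mightContain(id)) {
            return null;
        }
        String cached = cache.get(id);
        if (cached != null) {
            return NULL_MARKER.equals(cached) ? null : cached;
        }
        String fromDb = queryDatabase(id);
        // Defense 2: cache the miss too, so repeated lookups of the same
        // missing id stop hitting the database (give it a short TTL in Redis).
        cache.put(id, fromDb == null ? NULL_MARKER : fromDb);
        return fromDb;
    }

    private String queryDatabase(String id) { return null; /* placeholder */ }
}
```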
2.2 Cache Breakdown
Literally, the cache works at first; then the cached entries for certain hotspot keys expire, and the mass of hot requests breaks through to the database, spiking its load and potentially taking it down.
There are two solutions:
One is to let hot keys never expire. Some engineers propose a logical-expiration variant: set no physical expiry at all, store the intended expiry time inside the value, and when a read finds the value logically expired, rebuild the cache on an asynchronous thread (see the sketch after these two options).
The second is to constrain the execution path, for example funneling hot-key access to the underlying storage through a single-threaded pool or queue, trading some system throughput for system stability.
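A minimal sketch of the logical-expiration idea, assuming an in-process map as a stand-in for Redis and a hypothetical loadFromDb: stale reads keep returning the old value while exactly one background task rebuilds the entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class LogicalExpireCache {
    static final class Entry {
        final String data;
        final long logicalExpireAt;   // epoch millis; the key itself never expires
        Entry(String data, long logicalExpireAt) {
            this.data = data;
            this.logicalExpireAt = logicalExpireAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final ExecutorService rebuilder = Executors.newSingleThreadExecutor();
    private final AtomicBoolean rebuilding = new AtomicBoolean(false);

    String get(String key) {
        Entry e = cache.get(key);
        if (e == null) return null;   // hot keys are warmed before traffic arrives
        if (System.currentTimeMillis() > e.logicalExpireAt
                && rebuilding.compareAndSet(false, true)) {
            // Only one thread wins the CAS and schedules the rebuild.
            rebuilder.submit(() -> {
                try {
                    cache.put(key, new Entry(loadFromDb(key),
                            System.currentTimeMillis() + 60_000));
                } finally {
                    rebuilding.set(false);
                }
            });
        }
        return e.data;                // callers may briefly see stale data
    }

    private String loadFromDb(String key) { return "fresh:" + key; } // placeholder
}
```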
2.3 Cache Avalanche
Because of how caches are used, data is normally stored with an expiration time. If the inserts happen alongside user operations, this problem rarely appears, since user actions are naturally spread out over time.
But caches warmed by offline tasks, with periodic batch updates or bulk loads, require particular attention to expiry times.
An offline task writes a large volume of data in a short window. If every entry gets the same expiration time, the data all expires at the same instant, and upstream requests simultaneously send a mass of cache-missing requests to the downstream database, crushing the storage layer. The same thing happens when the cache itself goes down.
Solutions:
One is to keep hot data from expiring, reusing the logical expiration mentioned in the previous section.
The second is to discretize expiration times, for example adding a random offset to the fixed TTL so cache expirations scatter across different instants and the underlying storage never sees an instantaneous surge (see the sketch after this list).
The third is to ensure the cache service is highly available, for example via cluster master/slave modes, to prevent total collapse. And there should of course be circuit breakers and rate limiting in place to cope with a cache outage if one still occurs.
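A minimal sketch of the jitter idea; the 5-minute window and the redis.setex call in the comment are illustrative assumptions, not fixed choices.

```java
import java.util.concurrent.ThreadLocalRandom;

class TtlJitter {
    // Base TTL plus a random offset: a batch of keys written at the same
    // moment now expires spread across a 5-minute window instead of at once.
    static long jitteredTtlSeconds(long baseSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(300);
    }
}
// usage in the batch job (illustrative Redis call):
//   redis.setex(key, TtlJitter.jitteredTtlSeconds(3600), value);
```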
2.4 Data Drift
Data drift occurs when a distributed cache runs in consistent-hashing cluster mode. When a node goes down, the data originally routed to that node is remapped to the next node on the ring.
Photo source: Zhihu user Java architect
But after the node recovers, hash routing snaps back to it, so all the keys that drifted to the next node during the outage can no longer be reached there and become redundant data. Meanwhile, requests arriving at the recovered node find nothing cached, adding calls to the underlying storage.
This is a side effect of the very scheme, consistent hashing, that we use to keep one machine's failure from invalidating the cache at large. So make the consistent hash as uniform as possible (this is what virtual nodes are for), preventing a node's failure and recovery from skewing data onto, and impacting, its neighbors.
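For reference, a minimal consistent-hash ring with virtual nodes; the FNV-1a hash and the "node#i" virtual-node naming are implementation choices for this sketch, not a prescribed scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Each physical node lands on the ring many times, evening out the split.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Route a key to the first node clockwise from its hash position.
    String route(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // FNV-1a 64-bit: simple, deterministic, good enough for a sketch.
    private long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= b & 0xff;
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

With, say, 150 virtual nodes per machine, a node's removal redistributes its share across all survivors instead of dumping it all on one neighbor, which is exactly the skew this section warns about.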
2.5 Cache Stampede [6]
A cache stampede is simply a name for a cache-miss scenario whose underlying cause is a cache that is empty or not yet populated. The crux is that upstream calls that time out trigger retries, setting off a vicious cycle.
For example, when a celebrity posts new photos, fans receive a notification and rush to view them. But because the images were just posted, the server has not cached them yet, so a mass of requests lands directly on the underlying storage. Once the load exceeds what the service can handle and requests time out, fans keep refreshing, and the vicious cycle feeds itself.
Solution: Locks and promises.
The root cause of the stampede is contention for a shared resource, the cache entry, so locking that shared resource eliminates the concurrent scramble.
But locking solves the contention while creating another problem: the threads that fail to grab the lock block and wait. When the lock is released, they all wake at once, and mass blocking and waking is an enormous waste of server resources. This is the thundering herd effect.
How promise works
The actual cached value is replaced by a promise. Every thread obtains the same promise and waits on it; the promise fetches the data from the underlying store once, asynchronously notifies the waiters, and finally hands the result back to each worker thread.
This prevents a mass of concurrent requests from hitting the underlying storage at the same time.
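A minimal single-flight sketch of the promise idea using CompletableFuture (loadFromStorage is a hypothetical placeholder): the first thread to miss installs the promise, everyone else finds it and waits, and the backing store sees exactly one load per key.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class PromiseCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> inflight =
            new ConcurrentHashMap<>();

    String get(String key) {
        // computeIfAbsent guarantees only one promise per key is ever created;
        // once the load completes, the promise removes itself from the map.
        CompletableFuture<String> promise = inflight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> loadFromStorage(k))
                        .whenComplete((v, err) -> inflight.remove(k)));
        return promise.join();   // waiters park on the future, not on a lock
    }

    private String loadFromStorage(String key) { return "value:" + key; } // placeholder
}
```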
2.6 Cache Pollution
The main symptom of cache pollution is that normal cached data keeps getting evicted by other, non-mainline operations, that is, invalid replacement. Kafka's cache pollution problem and its solution are described in detail in this article.
The basic approach to solving cache pollution is to isolate tasks with different consumption speeds (real-time vs. scheduled) or different data sources (main flow vs. follower), so that they cannot disturb each other's cache.
2.7 Hot Keys
Hot key processing logic diagram
The impact of hot keys needs no further elaboration; the solutions lie mainly in detecting hot keys and responding to them:
You can monitor Nginx logs and count user requests over time windows, build multi-level caches, keep a server-local LRU cache of hot keys, pre-warm predicted hot keys based on business forecasts, and so on.
You can also split a hot key's storage across nodes to reduce the pressure on any single cache node (see the sketch below).
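Two of those mitigations sketched in a few lines: key splitting and a server-local LRU. The names and replica count are illustrative, and the LRU map needs external synchronization for concurrent use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class HotKeyTools {
    // (a) Key splitting: replicate one hot key across N sub-keys so reads
    // spread over N cache nodes instead of hammering one. On write, update
    // all replicas hotKey#0 .. hotKey#(replicas-1).
    static String readShard(String hotKey, int replicas) {
        return hotKey + "#" + ThreadLocalRandom.current().nextInt(replicas);
    }

    // (b) Server-local LRU: keep the hottest entries in process memory.
    // Not thread-safe; wrap with Collections.synchronizedMap for concurrent use.
    static <K, V> Map<K, V> newLruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;   // evict least-recently-accessed
            }
        };
    }
}
```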
Part3 Overview of top-level cache architecture
3.1 Evolution of Weibo Cache architecture
Weibo runs 100TB+ of cache storage across 1,000+ physical machines and 10,000+ Redis instances. How did its caching scheme evolve to the point of withstanding several celebrity divorces trending at once?
Architectural evolution of caching [7]
From the architecture diagrams above you can see that most of Weibo's cache evolution is about handling hotspot data. For example, the HA layer deliberately avoids consistent hashing because of the stampede effect: with Weibo's typical follower traffic, a node outage under consistent hashing would dump its load onto the next node and trigger a cascade of abnormal downstream nodes. Likewise, the L1 cache layer was introduced because Weibo traffic decays over time, with the newest content the hottest, so small hot L1 shards absorb those small-but-intense bursts of traffic.
But this was not enough, and some systemic problems could not be ignored:
- A single group of resource requests touches too many nodes
- Cache scaling and node replacement are too laborious
- The sheer variety of resources creates O&M problems
- The Cache remains hard to use well
CacheService [8]
To solve these problems, Weibo built a distributed CacheService architecture, simplifying things for service developers and providing dynamic scaling, disaster recovery, multi-tier caching, and other functions.
As the diagram shows, a proxy layer is wrapped around the cache pool: an asynchronous event handler manages connections and receives requests, a processor parses the data, an adapter adapts to the underlying storage protocols, and a router routes each request to the corresponding resource shard, while an LRU cache trims the proxy's own performance loss and a timer probes health status.
I once happened to chat with a lead in Weibo's architecture group. The CacheService is now very easy to use: server nodes scale elastically on a fully automatic detection system, and O&M costs have dropped sharply. The days of Weibo engineers pulling overtime whenever celebrity gossip broke may well be over.
The extreme application of Redis at Weibo [9][10]
Weibo introduced Redis back in 2010, more than ten years ago, and has accumulated deep experience and customization needs since; otherwise it would not rank among the top users listed on the Redis official website.
$BGSAVE stalls under the single-threaded model
Because BGSAVE is a very heavy operation, it causes noticeable stalls and business jitter, and during post-failure recovery, master/slave synchronization is slow and bandwidth spikes are common. Weibo's responses:
- Moved BGSAVE from the main thread onto separate BIO threads to avoid interference;
- Built a cronsave capability into Redis to control backup timing;
- Abandoned BGREWRITEAOF.
$Redis fully replaces MySQL for storage persistence
In the course of replacing MySQL as the persistence layer, Weibo again customized Redis heavily:
- Modified the AOF mechanism, adding a POS (position) marker that did not originally exist;
- Modified the replication mechanism to synchronize data based on the AOF + POS position;
- Changed persistence to a rolling RDB + AOF scheme to guarantee durable data storage.
$longset custom data structure
To cut costs, Weibo abandoned the native Hash structure for a custom longset, reducing memory usage to roughly 1/10 of the original.
$Optimized counting
To make counting cheap, Weibo changed Redis's KV into fixed-length KV: with the total number of counters known, memory can be pre-allocated up front, which greatly reduces the overhead of counting operations.
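A toy Java illustration of the pre-allocation idea (an analogy only, not Weibo's actual implementation): with a known, fixed key space, counters become a pre-sized array, so an increment is an index plus an atomic add, with no per-key allocation.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class FixedCounters {
    private final AtomicLongArray counts;

    // All memory is allocated once, up front, because the key space is known.
    FixedCounters(int totalKeys) { counts = new AtomicLongArray(totalKeys); }

    long incr(int keyIndex) { return counts.incrementAndGet(keyIndex); }
    long get(int keyIndex)  { return counts.get(keyIndex); }
}
```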
After ten years of deep reliance, Weibo has accumulated a wealth of Redis experience and technique that is well worth learning from.
3.2 Design of the Read-Filter Cache for the Zhihu Home Page [11]
The Zhihu community has 220 million users, 380,000 topics, 28 million questions, and 130 million answers. To keep the personalized home page rich in content, already-read entries must be filtered out, and that read history stored long-term, which places high demands on system performance and stability.
$The early scheme
$Optimization scheme
See references for sources
Does this architecture feel familiar? Yes, it is the CPU's multi-level cache architecture all over again. Cache interception, replica expansion, and compression/decompression are, taken together, the systematic response to the cache problems described in the previous chapter, delivering a low-latency, stable cache service.
Part4 Summary
This article argued, from the limits of the underlying storage, for why caches must exist; analyzed and elaborated the typical problems in caching scenarios; and finally used two top-tier cache architectures, Weibo's and Zhihu's, to echo the earlier content. Original writing is not easy; if you found it helpful, please help share it, after all, as the Huiren Shenbao ad goes, when he's good, everyone's good ~
Historical articles in the high-concurrency series (WeChat links):
- Vertical performance improvement: 1.1 Architecture optimization: cluster deployment and load balancing; 1.2 Implementing load balancing under trillion-level traffic; 1.3 Architecture optimization: clever use of messaging middleware; 1.4 Storage optimization: MySQL indexing principles and optimization; 1.6 Storage optimization: explain-driven index optimization in practice; 1.7 Storage optimization: database and table sharding; 1.8 This article: Storage optimization: cache is king
Copyright notice: This is an original article by Coder's Technical Path. Please contact the author before reprinting.
References
[1] TravelDaily: www.traveldaily.cn/article/925…
[2] CPU cache: manybutfinite.com/post/intel-…
[3] Ehcache official website: www.ehcache.org/
[4] Deep Into Distributed Caching, China Machine Press
[5] Analysis of the worst outage in Facebook's history: www.infoq.cn/article/Bb2…
[6] Memcached official website: memcached.org/
[7] How to design a cache architecture for an application with tens-of-billions-level daily visits: my.oschina.net/JKOPERA/blo…
[8] A brief analysis of Weibo's CacheService architecture: www.infoq.cn/article/wei…
[9] Nine years of Redis optimization at Weibo under trillion-level daily visits: cloud.tencent.com/developer/n…
[10] Weibo's road of Redis customization: developer.aliyun.com/article/625…
[11] Architecture design of Zhihu's home-page read-data query system at trillion scale: shared at QCon