This article introduces caching theory in large distributed systems, along with common cache components and their application scenarios.

Summary of the cache

Cache classification

Caches are mainly divided into four types, as shown in the following figure:

Cache classification

CDN cache

The basic principle of a Content Delivery Network (CDN) is to deploy a large number of cache servers in the regions and networks where user access is concentrated.

When a user visits the website, global load balancing directs the request to the nearest cache server, which can respond to it directly.
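
The nearest-server selection can be sketched as follows. The regions, hostnames, and latency figures are illustrative assumptions; real CDNs make this decision in DNS-based global load balancing, not in application code.

```python
# Hypothetical sketch of global load balancing: route the user to the
# edge node with the lowest measured latency. The regions, hostnames,
# and latency table are illustrative assumptions, not real CDN data.
EDGE_NODES = {
    "us-east": "cdn-us-east.example.com",
    "eu-west": "cdn-eu-west.example.com",
    "ap-south": "cdn-ap-south.example.com",
}

def pick_nearest(latencies_ms):
    """Return the hostname of the edge node with the lowest latency."""
    region = min(latencies_ms, key=latencies_ms.get)
    return EDGE_NODES[region]
```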

Application scenarios: mainly caching static resources such as images and videos.

CDN cache application is shown as follows:

Without CDN cache

With CDN cache

The advantages of CDN cache are shown as follows:

advantages

Reverse proxy cache

The reverse proxy resides in the application server room and processes all requests to the Web server.

If the page requested by the user is cached on the proxy server, the proxy sends the cached content directly to the user.

If it is not cached, the proxy requests the data from the Web server, caches it locally, and then sends it to the user. Reducing the number of requests that reach the Web server reduces its load.
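
The request flow above can be sketched as follows; `fetch_from_origin` is a hypothetical stand-in for the call to the real Web server.

```python
# Minimal sketch of the reverse-proxy caching flow described above.
cache = {}

def fetch_from_origin(path):
    """Stand-in for the request to the back-end Web server."""
    return f"<html>content of {path}</html>"  # placeholder origin response

def handle_request(path):
    if path in cache:                # cache hit: respond without the origin
        return cache[path]
    body = fetch_from_origin(path)   # cache miss: fetch from the Web server,
    cache[path] = body               # store locally,
    return body                      # then send to the user
```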

Application scenarios: generally, only small static resources such as CSS, JS, and images are cached.

The reverse proxy caching application is shown below:

Reverse proxy cache application diagram

The open source implementation is shown below:

Open source implementation

Local application cache

A local cache is a cache component inside the application itself. Its biggest advantage is that the application and the cache run in the same process, so cache requests are very fast and carry no network overhead.

Local caching is suitable for a single application that does not need cluster support, or for scenarios where cluster nodes do not need to notify one another.

Its disadvantage is that the cache is coupled to the application: multiple applications cannot share the cache directly, and each application or cluster node must maintain its own separate copy, which wastes memory.
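
The in-process local cache idea can be sketched with a tiny LRU map. This is a generic illustration, not the implementation of any particular library.

```python
from collections import OrderedDict

class LocalLRUCache:
    """Tiny in-process LRU cache: no network hop, but each process
    keeps its own copy (the coupling drawback noted above)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```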

Application scenarios: caching common data such as dictionaries.

The cache media is as follows:

The cache media

A direct programmatic implementation is shown in the figure below:

Programming direct implementation

Ehcache

Basic introduction: Ehcache is a standards-based open source cache that improves performance, offloads the database, and simplifies scalability.

It is the most widely used Java-based cache because it is powerful, proven, full-featured, and integrated with other popular libraries and frameworks.

Ehcache scales from an in-process cache to mixed in-process/out-of-process deployments with terabyte-sized caches.

The application scenarios of Ehcache are as follows:

Ehcache application scenarios

The architecture of Ehcache is shown below:

Ehcache architecture diagram

The main features of Ehcache are as follows:

Ehcache main features

The Ehcache data expiration policy is shown below:

Cache data expiration policy

Ehcache's expired-data elimination uses a lazy mechanism: each entry stores the time it was written, and on every read the elapsed time is compared against the TTL to decide whether the entry has expired.
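
The lazy-elimination idea can be sketched as follows. This is a generic in-memory TTL cache illustrating the mechanism, not Ehcache's actual implementation.

```python
import time

class LazyTTLCache:
    """Lazy elimination: save a timestamp with each entry on write and
    check the TTL only when the entry is read."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}

    def put(self, key, value):
        self.data[key] = (value, time.monotonic())    # save the write time

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if time.monotonic() - written_at > self.ttl:  # expired: evict on read
            del self.data[key]
            return None
        return value
```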

Guava Cache

Basic introduction: Guava Cache is the caching utility in Guava, Google's open source Java core-libraries toolkit.

The features and functions of Guava Cache are as follows:

Features and functions of Guava Cache

The application scenario of Guava Cache is as follows:

Application scenarios of Guava Cache

Guava Cache data structure:

Guava Cache data structure diagram

Guava Cache structure features

The Guava Cache update strategy is as follows:

Guava Cache update policy

The following figure shows the Cache reclamation strategy of Guava Cache:

Guava Cache Cache reclaiming policy

Distributed cache

A distributed cache is a cache component or service separated from the application. Its biggest advantage is that it is an independent application, isolated from the local application, so multiple applications can share it directly.

The main application scenarios of distributed cache are as follows:

Distributed cache application scenario

The main access modes of distributed cache are as follows:

Distributed cache access mode

Here are two common open source implementations of distributed caching, Memcached and Redis.

Memcached

Memcached is a high-performance, distributed memory-object caching system. By maintaining one large hash table in memory, it can store data in many formats, including images, videos, files, and database query results.

Simply put, data is loaded into memory and then read from memory, which greatly improves read speed.

The features of Memcached are shown below:

Memcached characteristics

The basic architecture of Memcached is shown below:

Memcached basic architecture

Cache data expiration policy: LRU (least recently used). When storing an item in Memcached, you can specify when it should expire in the cache; by default it never expires.

When Memcached's allocated memory is exhausted, expired items are replaced first, then the least recently used items.

Internal implementation of data elimination: a lazy mechanism, as in Ehcache: each entry stores the time it was written, and on read the elapsed time is compared against the TTL to decide whether the entry has expired.

Distributed cluster implementation: the server side has no "distributed" functionality; each server is a completely independent, isolated service. Memcached's distribution is implemented by the client program.
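
The client-side distribution can be sketched as follows. The node addresses are made up; the sketch uses simple modulo sharding, while real clients typically use consistent hashing so that adding a node remaps only a fraction of the keys.

```python
import hashlib

# Sketch of client-side distribution: the client, not the server, decides
# which Memcached node owns a key. The node addresses are illustrative.
NODES = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def node_for(key):
    """Map a key to a node with a stable hash (simple modulo sharding)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```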

Data Read and write Flow chart

Memcached distributed cluster implementation

Redis

Redis is a remote in-memory database (a non-relational database) with strong performance, replication features, and a unique data model.

It stores mappings between keys and five different types of values, can persist in-memory key-value data to disk, and can use replication to scale read performance.

Redis can also scale write performance with client-side sharding, and offers built-in replication, Lua scripting, LRU eviction, transactions, and several levels of on-disk persistence.

It provides high availability through Redis Sentinel and automatic partitioning through Redis Cluster.

The data model of Redis is shown below:

Redis data model

Redis’ data elimination strategy is shown below:

Redis data elimination strategy

Redis’ internal implementation of data elimination is shown below:

Internal implementation of Redis data elimination

Redis persists as shown below:

Redis persistence

Part of the underlying implementation of Redis is analyzed as follows:

Diagram of part of the start-up process

Partial diagram of server-side persistence operations

The underlying hash table implementation (progressive Rehash) is shown below:

Initialize the dictionary

Diagram of adding a dictionary element

Rehash execution process
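
Progressive rehash can be sketched as follows: the dictionary keeps two hash tables and migrates one bucket per operation instead of rehashing everything at once, so no single request pays the full cost. This is a simplified illustration of the idea, not Redis's actual C implementation.

```python
class ProgressiveDict:
    """Sketch of progressive rehash with two hash tables."""

    def __init__(self, n_buckets=4):
        self.ht0 = [[] for _ in range(n_buckets)]  # current table
        self.ht1 = None                            # new table while rehashing
        self.rehash_idx = -1                       # -1 means: not rehashing

    def _start_rehash(self):
        self.ht1 = [[] for _ in range(len(self.ht0) * 2)]
        self.rehash_idx = 0

    def _step(self):
        """Migrate one bucket from ht0 to ht1; swap tables when done."""
        if self.rehash_idx < 0:
            return
        for k, v in self.ht0[self.rehash_idx]:
            self.ht1[hash(k) % len(self.ht1)].append((k, v))
        self.ht0[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.ht0):
            self.ht0, self.ht1, self.rehash_idx = self.ht1, None, -1

    def put(self, key, value):
        self._step()
        if self.rehash_idx >= 0:
            # while rehashing, drop any stale copy from the old table
            old = self.ht0[hash(key) % len(self.ht0)]
            old[:] = [(k, v) for k, v in old if k != key]
            table = self.ht1
        else:
            table = self.ht0
        bucket = table[hash(key) % len(table)]
        bucket[:] = [(k, v) for k, v in bucket if k != key] + [(key, value)]
        # grow once the load factor of the current table exceeds 1
        if self.rehash_idx < 0 and sum(map(len, self.ht0)) > len(self.ht0):
            self._start_rehash()

    def get(self, key):
        self._step()
        for table in (t for t in (self.ht0, self.ht1) if t is not None):
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return None
```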

Redis cache design principles are shown below:

Redis cache design principles

Redis compares to Memcached as follows:

Redis compared to Memcached

The following describes common cache architecture problems, solutions and industry cases.

Hierarchical cache architecture design

The complexity of caching

Common problems are as follows:

  • Data consistency

  • Cache penetration

  • Cache avalanche

  • Cache high availability

  • Cache hotspots

The following describes the problems and corresponding solutions one by one.

Data consistency

Because the cache holds a copy of the persisted data, inconsistencies inevitably occur, leading to dirty reads or to reads that miss fresh data.

Data inconsistency is usually caused by network instability or node failures. There are three common scenarios, each with its own solution:
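
One pattern widely used to limit such inconsistency is cache-aside: reads repopulate the cache on a miss, while writes go to the database and invalidate (rather than update) the cached copy. A minimal sketch, with plain dicts standing in for the cache and the database:

```python
# Sketch of the cache-aside pattern. Both stores are plain dicts
# standing in for a real cache (e.g. Redis) and a real database.
cache, db = {}, {}

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)        # miss: load from the source of truth
    if value is not None:
        cache[key] = value     # repopulate the cache
    return value

def write(key, value):
    db[key] = value
    cache.pop(key, None)       # delete, don't update, to avoid stale races
```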

Cache penetration

Caches usually work in key-value form: when a Key is absent from the cache, the database is queried. If the Key never exists at all, every request falls through to the database, putting it under access pressure.

Main solutions:

  • Cache the empty result as well, and clear that cache entry once real data for the Key appears.

  • Use a Bloom filter: record all existing Keys in a large Bitmap and filter lookups through it, so requests for non-existent Keys never reach the database.
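
The Bloom-filter solution can be sketched as follows. The bitmap size and hash count are illustrative; a real deployment would use a production-grade library rather than this toy.

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: a large bitmap plus k hash functions. Keys that
    were never added are rejected without touching the database."""

    def __init__(self, size_bits=1 << 20, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0                      # a Python int used as a bitmap

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means definitely absent; True means probably present.
        return all(self.bits >> p & 1 for p in self._positions(key))
```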

Cache avalanche
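
A cache avalanche occurs when a large number of keys expire at the same moment (or a cache node fails), so the traffic they were absorbing hits the database all at once. A common mitigation, sketched below under the assumption of an illustrative base TTL, is to add random jitter to expiration times so keys cached together do not all expire together:

```python
import random

BASE_TTL = 600  # base expiration in seconds (illustrative value)

def jittered_ttl(base=BASE_TTL, spread=0.2):
    """Spread expirations over +/-20% of the base TTL so a batch of keys
    cached at the same moment does not expire, and miss, simultaneously."""
    return base * (1 + random.uniform(-spread, spread))
```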

Cache high availability

Whether a cache needs to be highly available depends on the scenario; not every service requires it. Design the solution for the specific business, for example by asking whether a cache failure at a critical point would overwhelm the back-end database.

Main solutions:

  • Distribution: shard massive amounts of cached data across nodes.

  • Replication: replicate cached data nodes for high availability.

Cache hotspots

Highly popular data may draw huge concurrent access to the same cache entry, putting excessive pressure on that cache server.

Solution: create multiple copies of the hot cache entry and spread requests across several cache servers, relieving the pressure that a hotspot puts on a single server.
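
The multiple-copies solution can be sketched as follows; the replica count and key-suffix scheme are illustrative assumptions.

```python
import random

N_COPIES = 3  # replicas per hot key (illustrative)

def copy_keys(key):
    """Derived keys for the replicas of one hot key, e.g. 'feed:1#0'."""
    return [f"{key}#{i}" for i in range(N_COPIES)]

def write_hot(cache, key, value):
    for k in copy_keys(key):                          # write every replica
        cache[k] = value

def read_hot(cache, key):
    return cache.get(random.choice(copy_keys(key)))   # spread reads
```

In a real deployment each derived key would hash to a different cache server, so reads for the hot key land on several machines instead of one.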

The case

This case comes from a technology talk by Chen Bo of Sina Weibo; see the original article, "How to design the Cache Architecture for the application of daily page views of ten billion?"

Technical challenges

Feed cache architecture diagram

Architectural features

Sina Weibo applies SSD in the distributed Cache scenario, extending the traditional Redis/MC + MySQL mode to Redis/MC + SSD Cache + MySQL mode.

Using SSD Cache as an L2 cache mitigates the high cost and limited capacity of MC/Redis, and also relieves the database access pressure caused by requests penetrating to the DB.

The optimizations and enhancements focus mainly on data architecture, performance, storage cost, and servitization.

References:

  • Learning architecture from scratch — Alibaba’s Li Yunhua

  • Java Core Technology lecture 36 — Oracle Xiaofeng Yang

  • Analyzing Redis architecture design — God forbid

  • Memcached official document

  • Redis persistence mode RDB and AOF difference — 58 Shen Jian

  • Cache, are you really using it right? — 58 Shen Jian

  • Redis or Memcached? — 58 Shen Jian

  • Caching those things — Meituan tech team

  • Redis cache design principles – Snow Feihong

  • Redis cache policy and primary key invalidation mechanism — Bing Yue

  • Design practice of Microblog Cache Architecture — Bo Chen

  • The best application of caching in large distributed system — Hou Zhonghao

  • Cache, concurrent update pitfalls? — 58 Shen Jian

  • Distributed cache design — crossoverJie

Author: Chen Caihua

Editors: Tao Jialong, Sun Shujuan

Reference: https://juejin.cn/post/6844903636770750472, authorized by the author.
