This article introduces the theory behind caching in large distributed systems, along with common cache components and their application scenarios.

1 Overview of Caching

(Figure: overview of caching)

2 Cache classification

There are four main types of caches.

(Figure: cache classification)

2.1 CDN Caching

Basic introduction

The basic principle of a Content Delivery Network (CDN) is to deploy cache servers widely, placing them in the regions or networks where a site's visitors are concentrated. When a user visits the site, global load balancing directs the request to the nearest cache server, which responds to the user directly.

Application scenarios

Mainly caches static resources such as images and videos.

Application diagrams

(Figure: without CDN cache)
(Figure: with CDN cache)

Advantages

(Figure: advantages of CDN caching)

2.2 Reverse Proxy Caching

Basic introduction

The reverse proxy sits in the application's server room and handles all requests to the web server. If the page a user requests is cached on the proxy server, the proxy sends the cached content directly to the user. If it is not, the proxy requests the page from the web server, caches it locally, and then sends it to the user. By reducing the number of requests that reach the web server, this lowers the web server's load.
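The decision flow above can be sketched as a simple read-through cache. The names here are illustrative, and a real proxy such as Nginx or Varnish also handles TTLs, cache keys, and invalidation:

```python
class ReverseProxyCache:
    """Minimal sketch of the reverse-proxy caching flow: serve from
    the local cache when possible, otherwise forward to the origin
    web server, cache the response, then serve it."""

    def __init__(self, fetch_from_origin):
        self._cache = {}                 # path -> cached response body
        self._fetch = fetch_from_origin  # callback to the origin server

    def handle(self, path):
        # Cache hit: respond without touching the origin server.
        if path in self._cache:
            return self._cache[path]
        # Cache miss: fetch from the origin, store locally, respond.
        body = self._fetch(path)
        self._cache[path] = body
        return body
```

With this sketch, repeated requests for the same path reach the origin server only once, which is exactly how the proxy reduces the web server's load.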

Application scenarios

Generally, only small static resources such as CSS, JS, and image files are cached.

Application diagram

(Figure: reverse proxy cache application diagram)

Open-source implementations

(Figure: open-source implementations)

2.3 Local Application Caching

Basic introduction

Local caching refers to a cache component inside the application itself. Its biggest advantage is that the application and the cache run in the same process, so cache requests are very fast and incur no network overhead. It is a good fit when a single application does not need cluster support, or when cluster nodes do not need to notify one another about cache state. Its disadvantage is that the cache is coupled to the application: multiple applications cannot share the cache directly, so each application or cluster node must maintain its own separate copy, which wastes memory.
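A minimal sketch of such an in-process cache is below. The names are illustrative; real local cache components such as Ehcache add eviction policies, statistics, and overflow to disk:

```python
import threading

class LocalCache:
    """Sketch of an in-process cache: lookups are plain dictionary
    reads with no network hop, but the data lives only inside this
    process, so other applications or cluster nodes cannot share it."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # the app's threads share this cache

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
```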

Application scenarios

Caches commonly used data such as dictionary tables.

Cache media

(Figure: cache media)

Implementations

Direct programmatic implementation

(Figure: direct programmatic implementation)

Ehcache

Basic introduction

Ehcache is an open-source, standards-based cache that improves performance, offloads the database, and simplifies scalability. It is the most widely used Java-based cache because it is powerful, proven, full-featured, and integrates with other popular libraries and frameworks. Ehcache can scale from an in-process cache to mixed in-process/out-of-process deployments with terabyte-sized caches.

Application scenarios

(Figure: Ehcache application scenarios)

Architecture diagram

(Figure: Ehcache architecture diagram)

Main features

(Figure: Ehcache main features)

Cache data expiration policies

(Figure: Ehcache cache data expiration policies)

Stale data eviction mechanism

Lazy eviction: each time data is put into the cache, a timestamp is stored with it. On read, the entry's age is compared against the configured TTL to determine whether the data has expired.
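The lazy scheme above can be sketched as follows. This is illustrative Python, not Ehcache's actual implementation: the key point is that expiry is only checked on read, with no background sweeper thread.

```python
import time

class LazyTtlCache:
    """Sketch of lazy TTL eviction: every put records a timestamp,
    and an entry's age is compared against the TTL only when the
    entry is read back."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._data = {}  # key -> (value, stored_at)

    def put(self, key, value):
        self._data[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        # Lazy check: compare the entry's age with the TTL on read.
        if time.monotonic() - stored_at > self._ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value
```

The trade-off of pure lazy eviction is that entries that are never read again still occupy memory until something else forces them out.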

Guava Cache

Basic introduction

Guava Cache is a caching tool in Guava, Google's open-source Java core libraries.

Features and functions

(Figure: Guava Cache features and functions)

Application scenarios

(Figure: Guava Cache application scenarios)

Data structure diagram

(Figure: Guava Cache data structure diagram)
(Figure: Guava Cache structure)

Cache update policy

(Figure: Guava Cache update policy)

Cache reclamation policy

(Figure: Guava Cache reclamation policy)

2.4 Distributed Cache

A distributed cache is a cache component or service that is separated from the application. Its biggest advantage is that, as an independent application, it is isolated from any local application, so multiple applications can share the cache directly.

Main application scenarios

(Figure: distributed cache application scenarios)

Main access modes

(Figure: distributed cache access modes)

Below are two common open-source implementations of distributed caching: Memcached and Redis.

Memcached

Basic introduction

Memcached is a high-performance distributed memory object caching system. By maintaining one giant hash table in memory, it can store data in many formats, including images, videos, files, and database query results. Simply put, data is loaded into memory and then read from memory, which greatly improves read speed.

Characteristics

(Figure: Memcached characteristics)

Basic architecture

(Figure: Memcached basic architecture)

Cache data expiration policy

LRU (least recently used) expiration policy. When storing an item in Memcached, you can specify when it should expire from the cache; the default is never. When Memcached runs out of allocated memory, it replaces expired entries first, then the least recently used entries.

Internal implementation of data eviction

Lazy eviction: each time data is put into the cache, a timestamp is stored with it. On read, the entry's age is compared against the configured TTL to determine whether the data has expired.

Distributed cluster implementation

Memcached servers provide no "distributed" functionality of their own; each server is a completely independent, isolated service. Memcached's distribution is implemented entirely by the client program.
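Client-side sharding can be sketched as a function that maps each key to one server. Real Memcached clients usually use consistent hashing so that adding or removing a server remaps only a fraction of the keys; the simple hash-modulo scheme below is illustrative only, and the server addresses are made up:

```python
import hashlib

def pick_server(key, servers):
    """Map a cache key to one of several independent Memcached
    servers. The servers never talk to each other; this routing
    decision lives entirely in the client."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Because every client instance computes the same mapping for the same key, all application nodes agree on which server holds which key without any server-side coordination.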

Data read/write flow chart

(Figure: Memcached distributed cluster implementation)

Redis

Basic introduction

Redis is a remote in-memory database (a non-relational database) with strong performance, replication features, and a unique data model. It stores mappings between keys and five different types of values, can persist in-memory key-value data to disk, can use replication to scale read performance, and can use client-side sharding to scale write performance. It has built-in replication, Lua scripting, LRU eviction, transactions, and multiple levels of on-disk persistence, and it provides high availability through Redis Sentinel and automatic partitioning (Redis Cluster).

Data model

(Figure: Redis data model)

Data eviction policies

(Figure: Redis data eviction policies)

Internal implementation of data eviction

(Figure: Redis data eviction internals)

Persistence modes

(Figure: Redis persistence)

Partial analysis of the underlying implementation

  • Start-up process (partial)

    (Figure: part of the start-up process)
  • Server-side persistence operations (partial)

    (Figure: part of the server-side persistence operations)
  • Underlying hash table implementation (progressive rehash)

    (Figure: dictionary initialization)
    (Figure: adding dictionary elements)
    (Figure: rehash execution process)
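The progressive rehash shown in the diagrams above can be sketched as follows. This Python sketch is illustrative and greatly simplified compared with Redis's C implementation (different hash function, simplified resize trigger, no handling of shrinking), but it captures the core idea: two hash tables coexist, and each operation migrates one bucket.

```python
class ProgressiveDict:
    """Sketch of Redis-style progressive rehash: the dict keeps two
    hash tables (ht0 and ht1); during a resize, each get/put migrates
    one bucket from ht0 to ht1 instead of rehashing the whole table
    at once, so no single operation pays the full rehash cost."""

    def __init__(self, size=4):
        self._ht0 = [[] for _ in range(size)]  # buckets of (key, value)
        self._ht1 = None                       # target table during rehash
        self._rehash_idx = -1                  # -1 means "not rehashing"

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step(self):
        # Migrate a single bucket per operation (the "progressive" part).
        if self._rehash_idx < 0:
            return
        for kv in self._ht0[self._rehash_idx]:
            self._bucket(self._ht1, kv[0]).append(kv)
        self._ht0[self._rehash_idx] = []
        self._rehash_idx += 1
        if self._rehash_idx == len(self._ht0):  # rehash finished
            self._ht0, self._ht1 = self._ht1, None
            self._rehash_idx = -1

    def _count(self):
        total = sum(len(b) for b in self._ht0)
        if self._ht1 is not None:
            total += sum(len(b) for b in self._ht1)
        return total

    def put(self, key, value):
        self._step()
        # Update in place if the key already exists in either table.
        for table in (self._ht0, self._ht1):
            if table is None:
                continue
            chain = self._bucket(table, key)
            for i, (k, _) in enumerate(chain):
                if k == key:
                    chain[i] = (key, value)
                    return
        # New keys go to ht1 while a rehash is in progress.
        target = self._ht1 if self._rehash_idx >= 0 else self._ht0
        self._bucket(target, key).append((key, value))
        if self._rehash_idx < 0 and self._count() > len(self._ht0):
            self._ht1 = [[] for _ in range(len(self._ht0) * 2)]
            self._rehash_idx = 0

    def get(self, key):
        self._step()
        # During a rehash a key may live in either table.
        for table in (self._ht0, self._ht1):
            if table is None:
                continue
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None
```

Spreading the migration across operations is what lets Redis resize a dictionary holding millions of keys without blocking its single-threaded event loop on one huge rehash.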

Cache design principles

(Figure: Redis cache design principles)

Redis compared to Memcached

Supported data structures. Redis: hashes, lists, sets, sorted sets. Memcached: plain key-value.
Persistence. Redis: supported. Memcached: not supported.
High availability. Redis: natively supports clustering with master-slave replication and read/write splitting; the Sentinel management tool adds master/slave monitoring and automatic failover, all transparent to the client with no application changes or manual intervention. Memcached: requires secondary development.
Maximum value size. Redis: 512 MB. Memcached: 1 MB.
Memory allocation. Redis: requests temporary space on demand, which can cause fragmentation. Memcached: pre-allocates memory pools, saving allocation time.
Virtual memory. Redis: has its own VM mechanism, so in theory it can hold more data than physical memory allows; when the data volume exceeds a threshold, swap flushes cold data to disk. Memcached: keeps all data in physical memory.
Network model. Redis: non-blocking I/O multiplexing; it also offers sorting and aggregation beyond plain KV storage, and these CPU-heavy operations can block the entire I/O scheduler. Memcached: non-blocking I/O multiplexing.
Horizontal scaling. Redis: no. Memcached: no.
Threading. Redis: single-threaded. Memcached: multi-threaded, with better CPU utilization than Redis.
Expiry policy. Redis: dedicated threads clear expired data. Memcached: lazy eviction only; each entry stores a timestamp on write, and its age is compared against the TTL on read to decide whether it has expired.
Single-node QPS. Redis: about 100,000. Memcached: about 600,000.
Source code readability. Redis: clean, concise code. Memcached: arguably over-engineered for scalability and multi-system compatibility, so the code is less clean.
Typical scenarios. Redis: complex data structures, persistence, high availability requirements, and large values. Memcached: pure KV workloads with very large data volumes and very high concurrency.

References

Learning Architecture from Scratch, by Li Yunhua (Alibaba)

Java Core Technology: 36 Lectures, by Yang Xiaofeng (Oracle)

Analyzing Redis Architecture Design, by God Forbid

Memcached official documentation

The Difference Between Redis Persistence Modes RDB and AOF, by Shen Jian (58.com)

Cache: Are You Really Using It Right?, by Shen Jian (58.com)

Redis or Memcached?, by Shen Jian (58.com)

About Caching, by the Meituan tech team

Redis Cache Design Principles, by Snow Feihong

Redis Cache Policy and Primary Key Invalidation Mechanism, by Bing Yue

Reproduced from:

https://juejin.cn/post/6844903636770750472
