1. Basic idea of caching

Many friends only know that caching can improve system performance and reduce the response time of requests, but do not know what the essence of caching is.

The basic idea of caching is actually very simple: it is the trade of space for time that we are all very familiar with. Don't think of caching as something lofty, even though it really does give your system a cost-effective performance boost.

In fact, as you learn to use caching, you'll see that the idea of caching is used in many places, operating systems included. For example, the CPU Cache caches memory data to bridge the mismatch between CPU processing speed and memory access speed, and the in-memory cache caches disk data to work around slow disk access. Another example: the operating system introduces the translation lookaside buffer (TLB, the "fast table") on top of the page table scheme to speed up the translation of virtual addresses to physical addresses. We can think of the TLB as a special kind of Cache.

Going back to business systems: to keep users' requests for data from being too slow, we add a caching layer on top of the database to make up for it.

When someone asks you again about the basic idea of caching, tell them the above paragraph 👆, I think it will make people look at you in a different way.

2. What problems does using caching bring to the system?

There is no silver bullet in software system design, and any technology you introduce is a double-edged sword. Used the right way, it can bring great benefits to the system. Otherwise, it's a lot of effort for nothing.

To put it simply, introducing caching into a system often leads to the following problems:

Ps: In fact, I think the actual cost of introducing local cache to do some simple business scenarios is almost negligible. The following 👇 is mainly for distributed cache.

  1. Increased system complexity: With the introduction of caches, you need to maintain data consistency between caches and databases, maintain hotspot caches, and so on.
  2. System development costs tend to increase: introducing caching means the system needs a separate cache service, which comes at a cost, and that cost is not cheap, since it consumes precious memory. However, if you are just using a local cache to store a small amount of simple data, there is no need for a separate cache service.

3. Local cache solution

Let's start by talking about local caching, which is actually used a lot in many projects, especially monolithic architectures. If the data volume is small and there is no distributed requirement, local caching works fine.

A common monolithic architecture diagram is as follows: we use Nginx for load balancing and deploy two identical services, which share the same database while each uses its own local cache.

So what are the solutions for local caching? Let’s listen to Guide.

1: JDK built-in HashMap and ConcurrentHashMap.

ConcurrentHashMap can be thought of as a thread-safe version of HashMap, both of which hold key-value pairs in the form of key/value. However, most scenarios do not use either of these as caches because they only provide caching functionality and do not provide other functions such as expiration time. A slightly better caching framework would provide at least three things: expiration time, elimination mechanisms, and hit ratio statistics.
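To make that concrete, here is a minimal sketch (all names are illustrative, not from any framework) of what bolting an expiration time onto ConcurrentHashMap looks like. Note that it still provides no eviction mechanism and no hit ratio statistics:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal expiring cache built on ConcurrentHashMap.
// It illustrates the kind of work a real cache framework does for you.
public class SimpleExpiringCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expireAtMillis;
        Entry(V value, long expireAtMillis) {
            this.value = value;
            this.expireAtMillis = expireAtMillis;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value, long ttlMillis) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() > e.expireAtMillis) {
            map.remove(key, e); // lazily drop expired entries
            return null;
        }
        return e.value;
    }
}
```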

2: Ehcache, Guava Cache, Spring Cache. These three are the most commonly used local caching frameworks.

  • Ehcache is heavier than the other two. However, compared with Guava Cache and Spring Cache, Ehcache can be embedded in Hibernate and MyBatis as a second-level cache, can persist cached data to local disk, and even provides a clustering solution (which is rather weak and can be ignored).
  • Guava Cache and Spring Cache are fairly similar to each other. Guava Cache is used a bit more than Spring Cache. It provides APIs that are very convenient to use, and also lets you set things like cache expiration time. Its internal implementation is also fairly clean, and borrows many ideas from ConcurrentHashMap.
  • Code that uses Spring Cache looks clean and elegant, but it is prone to problems such as cache penetration and memory overflow.

3: Caffeine, a rising star.

Caffeine beats Guava Cache in all sorts of respects, performance in particular, and is generally recommended as a replacement for it. Also, Guava Cache and Caffeine are used in very similar ways!
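As a quick illustration, here is a minimal sketch of typical Caffeine usage (the key and the loader are made-up examples). It shows the three features mentioned earlier that a decent cache framework should provide, and Guava Cache's builder looks almost identical:

```java
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class CaffeineDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = Caffeine.newBuilder()
                .maximumSize(10_000)                   // size-based eviction
                .expireAfterWrite(5, TimeUnit.MINUTES) // expiration time
                .recordStats()                         // hit ratio statistics
                .build();

        // On a miss, compute the value and cache it.
        String value = cache.get("user:42", key -> loadFromDb(key));
        System.out.println(value);
        System.out.println("hit rate: " + cache.stats().hitRate());
    }

    // Stand-in for a real database lookup.
    private static String loadFromDb(String key) {
        return "value-for-" + key;
    }
}
```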

Local caching is great, but there are obvious drawbacks, such as the inability to share data cached locally between multiple identical services.

4. Why a distributed cache? / Why not just use local cache?

The advantages of local caching are obvious: low dependency, lightweight, simple, and low cost.

However, local caching also has obvious drawbacks:

  1. Local caching does not support distributed architectures well. For example, when the same service is deployed on multiple machines, the caches cannot be shared between instances, because a local cache only exists on its own machine.
  2. Local cache capacity is significantly limited by the machine the service is deployed on. If the system's services already consume a lot of memory, the local cache has little capacity left to work with.

We can think of Distributed Cache as an in-memory database service whose ultimate purpose is to provide cached data.

A simple architecture diagram using distributed caching is shown below. We use Nginx for load balancing and deploy two identical services that share the same database and the same cache.

With distributed caching, the cache is deployed on a separate server, and even when the same service runs on multiple machines, they all use the same cache. Moreover, the performance, capacity, and functionality provided by a dedicated distributed cache service are much more powerful.

The downside of using distributed caching is also obvious: you need to introduce an additional service for it, such as Redis or Memcached, and you need to ensure the availability of that Redis or Memcached service on its own.
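For example, with Redis as the distributed cache, every service instance talks to the same cache server over the network. A minimal sketch using the Jedis client (the host and key are placeholder values):

```java
import redis.clients.jedis.Jedis;

public class DistributedCacheDemo {
    public static void main(String[] args) {
        // Every service instance connects to the same Redis server,
        // so the cached data is shared across all of them.
        try (Jedis jedis = new Jedis("cache-host", 6379)) {
            jedis.setex("user:42", 300, "{\"name\":\"Guide\"}"); // cache for 300 seconds
            String cached = jedis.get("user:42"); // any instance sees this value
            System.out.println(cached);
        }
    }
}
```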

5. Cache read/write mode/update policy

Each of the three cache read/write patterns below has its advantages and disadvantages; there is no single best one. Select the appropriate pattern based on your specific business scenario.

5.1. Cache Aside Pattern

  1. Write: update DB, then delete cache directly.
  2. Read: read data from the cache and return it. On a miss, read the data from the DB, write it back to the cache, and then return it.

In Cache Aside Pattern, the server maintains both the DB and the Cache, with the DB as the source of truth. In addition, under Cache Aside Pattern, data requested for the first time is certainly not in the Cache yet; hot data can be preloaded into the Cache.

Cache Aside Pattern is a Cache read/write Pattern that we usually use. It is suitable for scenarios with a lot of read requests.
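A minimal sketch of Cache Aside in code; `Db` and `CacheClient` below are hypothetical interfaces standing in for your real DAO and cache client:

```java
// Minimal Cache Aside sketch; all names are illustrative.
public class UserService {

    interface Db {
        String readUser(String id);
        void updateUser(String id, String data);
    }

    interface CacheClient {
        String get(String key);
        void set(String key, String value);
        void delete(String key);
    }

    private final Db db;
    private final CacheClient cache;

    UserService(Db db, CacheClient cache) {
        this.db = db;
        this.cache = cache;
    }

    // Write: update the DB first, then delete the cache entry.
    public void updateUser(String id, String data) {
        db.updateUser(id, data);
        cache.delete("user:" + id);
    }

    // Read: try the cache; on a miss, load from the DB and backfill the cache.
    public String getUser(String id) {
        String key = "user:" + id;
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = db.readUser(id);
        if (fromDb != null) {
            cache.set(key, fromDb);
        }
        return fromDb;
    }
}
```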

5.2. Read/Write Through Pattern

In Read/Write Through, the application treats the cache as the primary data store and reads from and writes to it directly; the cache service is then responsible for reading this data from and writing it to the DB, relieving the application of that responsibility.

  1. Write Through: first check the cache. If the entry is not in the cache, update the DB directly. If it is, update the cache first, and the cache service then updates the DB itself (cache and DB are updated synchronously).
  2. Read Through: read data from the cache and return it. On a miss, the cache service loads the data from the DB and writes it to the cache before returning the response.

Read-Through Pattern is actually a further encapsulation of Cache Aside Pattern. In Cache Aside Pattern, when the data is not in the Cache, the client is responsible for writing it into the Cache; in Read-Through, the Cache service does this itself, which is transparent to the client.

Like Cache Aside Pattern, under Read-Through Pattern data requested for the first time is certainly not in the Cache yet; hot data can be preloaded into the Cache.
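The difference from Cache Aside shows up clearly in code: the cache service itself owns the DB access, and the caller only ever talks to the cache. A minimal sketch, with the DB reads and writes passed in as placeholder functions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Minimal Read/Write Through sketch. The cache service owns DB access;
// callers only interact with get()/put(). All names are illustrative.
public class ReadWriteThroughCache<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> dbLoader;   // stands in for a DB read
    private final BiConsumer<K, V> dbWriter; // stands in for a DB write

    public ReadWriteThroughCache(Function<K, V> dbLoader, BiConsumer<K, V> dbWriter) {
        this.dbLoader = dbLoader;
        this.dbWriter = dbWriter;
    }

    // Read Through: on a miss, the cache service loads from the DB itself.
    public V get(K key) {
        return cache.computeIfAbsent(key, dbLoader);
    }

    // Write Through: if the entry is cached, update cache and DB together;
    // otherwise write only the DB.
    public void put(K key, V value) {
        if (cache.containsKey(key)) {
            cache.put(key, value);
        }
        dbWriter.accept(key, value);
    }
}
```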

5.3. Write Behind Pattern

Write Behind Pattern is similar to Read/Write Through Pattern. In both cases, the cache service is responsible for cache and DB reads and writes.

However, there is a big difference between the two. Read/Write Through updates the cache and DB synchronously, while Write Behind Caching updates only the cache and does not update the DB directly. Instead, it updates the DB asynchronously in batches.

Write Behind Pattern delivers very high write performance because DB writes are batched, making it especially suitable for business scenarios where data changes frequently, such as the like and view counts of an article. Normally, a post receiving 500 likes would mean modifying the DB 500 times; with Write Behind Pattern, you might only need to modify the DB once.
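A minimal sketch of that idea applied to like counts (the flush interval and all names are assumptions): increments only touch memory, and a background task flushes the accumulated deltas to the DB in one batch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Minimal Write Behind sketch for like counters: writes hit only the
// in-memory map; a scheduled task flushes accumulated deltas to the DB.
public class LikeCounter {

    private final Map<String, LongAdder> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public LikeCounter() {
        // Flush every 5 seconds; if the process dies before a flush,
        // the buffered likes are lost: the consistency risk noted below.
        flusher.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    // Hot path: only touches memory, never the DB.
    public void like(String postId) {
        pending.computeIfAbsent(postId, id -> new LongAdder()).increment();
    }

    private void flush() {
        pending.forEach((postId, counter) -> {
            long delta = counter.sumThenReset();
            if (delta > 0) {
                // e.g. UPDATE post SET likes = likes + ? WHERE id = ?
                updateDb(postId, delta);
            }
        });
    }

    private void updateDb(String postId, long delta) {
        System.out.println("flush " + delta + " likes for post " + postId);
    }
}
```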

However, this pattern puts the consistency between the DB and the Cache to an even harsher test: in many cases, if the Cache service goes down before the data has been asynchronously flushed to the DB, that data is lost.