Original article (in Chinese): https://blog.csdn.net/weixin_42742643/article/details/111581888

Why cache?

  1. High performance. Suppose a request queries a database table and takes 500 ms, but the result may not change for a whole day. Put the result in the cache, and the next request skips the database entirely and returns in about 2 ms, improving performance a hundredfold.
  2. High concurrency. MySQL was never designed for heavy concurrency; it can cope, but not naturally well, and a single MySQL instance can start raising alarms at around 2,000 QPS. If a system receives 10,000 requests per second, a single MySQL machine will certainly fall over, so the hot data has to move into a cache: a single cache instance easily handles tens of thousands of requests per second, dozens of times what a single MySQL machine can.

The difference between Redis and memcached

  • Redis has more data structures and supports richer data operations than Memcached.

  • Redis is faster than Memcached for complex operations. With Memcached you have to fetch the data to the client, modify it there, and set it back, which greatly increases network IO and the volume of data transferred. In Redis these operations run on the server and are usually about as efficient as a plain GET/SET.

  • Redis can persist data

  • Redis natively supports the cluster mode
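
The round-trip difference behind the second bullet can be sketched with a toy in-memory store (this is an illustration, not real client code; `ToyStore` and its method names are invented for the demo):

```python
# Toy illustration of why server-side operations help: a Memcached-style flow
# ships the whole value to the client and back, while a Redis-style
# server-side command (e.g. LPUSH) modifies the value in place.

class ToyStore:
    """Stands in for the server; counts network round trips."""
    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self.data[key] = value

    def lpush(self, key, value):          # server-side list operation
        self.round_trips += 1
        self.data.setdefault(key, []).insert(0, value)

# Memcached style: fetch, modify on the client, write back -> 2 extra trips.
mc = ToyStore()
mc.set("queue", [])
items = mc.get("queue")
items.insert(0, "job-1")
mc.set("queue", items)

# Redis style: one server-side command -> 1 extra trip.
rd = ToyStore()
rd.set("queue", [])
rd.lpush("queue", "job-1")
```

Both stores end up holding the same list, but the Redis-style flow does it in fewer round trips and never moves the whole value over the network.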

Redis persistence

1. RDB persistence mechanism

Redis uses RDB by default. RDB takes point-in-time snapshots of the dataset on a schedule, which keeps its performance impact low.

Advantages of RDB:

  • Persistence has little impact on performance
  • Restarting Redis from an RDB snapshot file is very fast
  • Snapshot files are well suited for cold backup

Disadvantages of RDB:

  • Because snapshots are taken periodically, data written since the last snapshot can be lost
  • If the dataset is very large, taking a snapshot may pause the service for milliseconds or even seconds.
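
For reference, RDB's periodic snapshots are driven by `save` rules in redis.conf; the values below are the stock defaults shipped with Redis:

```conf
# redis.conf (RDB snapshot triggers; these are the stock defaults)
save 900 1        # snapshot if >= 1 key changed within 900 s
save 300 10       # snapshot if >= 10 keys changed within 300 s
save 60 10000     # snapshot if >= 10000 keys changed within 60 s
dbfilename dump.rdb
```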

2. AOF persistence mechanism

AOF persists data by appending every write command to a log file, and offers two main fsync policies: synchronous write, and asynchronous write once per second.

Synchronous write guarantees that the data has reached the file before the result is returned, so no data is lost on a crash; asynchronous write once per second keeps performance high at the cost of losing at most about one second of data.
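
These two policies correspond to the `appendfsync` directive in redis.conf:

```conf
# redis.conf (AOF settings)
appendonly yes
# appendfsync always    -> fsync on every write: safest, slowest
appendfsync everysec    # fsync once per second: lose at most ~1 s on crash
# appendfsync no        -> let the OS decide when to flush
```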

Advantages of AOF:

  • AOF gives better protection against data loss
  • AOF log files are written in append-only mode, so there is no disk-seek overhead and write performance is very high

Disadvantages of AOF:

  • AOF log files are usually larger than RDB data snapshot files for the same data

How to choose between RDB and AOF

  • Don't use RDB alone, because a crash between snapshots can lose a lot of data.

  • Don't use AOF alone either: AOF files are poorly suited for cold backup, and recovery from an RDB snapshot is faster.

  • Use the two mechanisms together: AOF guarantees that data is not lost and is the first choice for recovery; RDB provides cold backups at different points in time and enables fast recovery when the AOF file is lost or corrupted.

Cache avalanche, penetration, breakdown

1. Cache penetration

Cache penetration means querying data that exists neither in the cache nor in the database: for example, a user keeps sending requests with an ID of "-1", or with huge IDs that do not exist, so every request falls through to the database and overloads it.

Solution:

  • Use a Bloom filter: a bitmap large enough to hold every key that may legitimately be accessed, so requests for keys not in the filter are rejected immediately;

  • Add an interceptor that rejects obviously invalid requests, such as id <= 0, before they reach the cache;

  • If a value can be obtained neither from the cache nor from the database, cache the key with a null value and a short expiration time, such as 30 seconds (if the expiration is too long, the key becomes unusable even in normal cases). This prevents an attacker from hammering the same non-existent ID.

2. Cache breakdown

Cache breakdown occurs when a single hotspot key expires and the large number of requests for it all hit the database at once, putting it under huge pressure. Solution:

  • Set hotspot data to never expire
  • Use setnx with mutex
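
A minimal sketch of the mutex idea, with a dict guarded by a `threading.Lock` simulating Redis's atomic SETNX (the key names and `rebuilds` counter are illustrative): only the thread that wins the lock rebuilds the hot value, while the others retry the cache instead of hitting the database.

```python
import threading

cache, locks = {}, {}
guard = threading.Lock()   # gives our fake SETNX the atomicity Redis provides
rebuilds = 0               # how many times the "database" was queried

def setnx(key):
    """Simulated SETNX: returns True only for the first caller."""
    with guard:
        if key in locks:
            return False
        locks[key] = True
        return True

def get_hot(key):
    global rebuilds
    while True:
        if key in cache:
            return cache[key]
        if setnx("lock:" + key):           # winner rebuilds the value
            if key not in cache:           # double-check after taking the lock
                rebuilds += 1
                cache[key] = "rebuilt-value"   # stands in for a slow db query
            locks.pop("lock:" + key)
            return cache[key]
        # losers loop and re-read the cache instead of hitting the db

threads = [threading.Thread(target=get_hot, args=("hot",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all eight threads finish, `rebuilds` is 1: exactly one thread touched the "database", which is the whole point of the mutex.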

3. Cache avalanche

A large number of keys are given the same expiration time, so they all expire at the same moment; every request then goes to the database, and the sudden spike in load causes an avalanche. Solution:

  • The expiration time is set to random

  • Set the hotspot data to never expire

  • If the cache database is deployed in a distributed manner, distribute the hotspot data evenly among different cache databases
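
The first solution amounts to adding jitter to the TTL (the base TTL and spread below are illustrative values):

```python
import random

# Sketch of TTL jitter: instead of giving every key the same lifetime, add a
# random offset so a batch of keys loaded together does not expire together.
BASE_TTL = 3600            # illustrative base expiry, in seconds

def jittered_ttl(base=BASE_TTL, spread=300):
    """Return the base TTL plus a random offset of up to `spread` seconds."""
    return base + random.randint(0, spread)

# 1000 keys loaded at once now expire spread over a 5-minute window.
ttls = [jittered_ttl() for _ in range(1000)]
```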

Cache expiration policy

  • Periodic deletion: every 100 ms, Redis randomly samples some keys that have an expiration set, checks them, and deletes the ones that have expired

  • Lazy deletion: when a key with an expiration is read, Redis checks whether it has expired; if so, the key is deleted on the spot and nothing is returned
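
The two strategies can be sketched on a plain dict (the sampling size and key names are illustrative; real Redis tunes these internally):

```python
import random
import time

store = {}      # key -> value
expires = {}    # key -> absolute expiry timestamp

def set_with_ttl(key, value, ttl):
    store[key] = value
    expires[key] = time.time() + ttl

def periodic_sweep(sample_size=20):
    """Periodic deletion: called on a timer, checks a random sample of keys."""
    keys = random.sample(list(expires), min(sample_size, len(expires)))
    for key in keys:
        if expires[key] <= time.time():
            store.pop(key, None)
            expires.pop(key, None)

def get(key):
    """Lazy deletion: an expired key is removed the moment it is read."""
    if key in expires and expires[key] <= time.time():
        store.pop(key, None)
        expires.pop(key, None)
        return None
    return store.get(key)

set_with_ttl("session:1", "alice", ttl=-1)   # already expired
set_with_ttl("session:2", "bob", ttl=60)
```

Reading `session:1` returns nothing and evicts it; `session:2` is still live. Keys that are neither sampled nor read are exactly the ones the next paragraph worries about.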

Combining the two methods removes most expired keys, but not all of them: what if the periodic sampling keeps missing some expired keys, and those keys are never read either? A large number of expired keys could pile up in memory and exhaust Redis.

This is where the memory eviction policies come in:

  • noeviction: when memory cannot hold new write data, the write returns an error; not recommended

  • allkeys-lru: when memory is insufficient for new writes, remove the least recently used key from the whole key space (this is the most commonly used)

  • allkeys-random: when memory is insufficient for new writes, remove a random key from the whole key space; not recommended

  • volatile-random: when memory is insufficient for new writes, remove a random key from among the keys that have an expiration set

  • volatile-ttl: when memory is insufficient for new writes, remove the key with the nearest expiration time from among the keys that have an expiration set
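
The allkeys-lru policy can be sketched with an `OrderedDict` (note that real Redis uses an approximate, sampled LRU rather than an exact one; the capacity of 3 is just for the demo):

```python
from collections import OrderedDict

class LruCache:
    """Exact-LRU sketch of the allkeys-lru eviction policy."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # a hit makes the key "recent"
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least-recently-used key

c = LruCache(capacity=3)
c.set("a", 1); c.set("b", 2); c.set("c", 3)
c.get("a")          # touch "a" so "b" becomes the oldest entry
c.set("d", 4)       # over capacity: "b" is evicted
```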

Update sequence of cache and database

1. Cache Aside Pattern

  • On a read, if the value is not in the cache, read the database, put the result in the cache, and return the response

  • On a write, delete the cache first and then update the database
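
The read and write paths can be sketched with dicts standing in for Redis and the database (the `product:1` key and the `read`/`write` helpers are illustrative; the write path follows the article's ordering of deleting the cache before updating the database):

```python
cache = {}
db = {"product:1": {"price": 100}}

def read(key):
    if key in cache:                 # cache hit
        return cache[key]
    value = db.get(key)              # miss: read the database...
    if value is not None:
        cache[key] = value           # ...populate the cache...
    return value                     # ...and return the response

def write(key, value):
    cache.pop(key, None)             # delete the cache first
    db[key] = value                  # then update the database

read("product:1")                    # warms the cache
write("product:1", {"price": 80})    # invalidates it again
```

After the write, the stale cached price is gone; the next `read` lazily rebuilds the cache from the database, which is exactly the lazy-loading idea discussed below.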

2. Why delete the cache instead of update it

The cached value is not always a row fetched straight from a database table; it may be the result of computation or a complex query across several tables, in which case recomputing the cache on every write is expensive.

If the tables behind a cached value change frequently, the cache would have to be updated just as frequently; but is the cache actually read that often?

For example, if the fields behind a cached value are modified 20 times in a minute, the cache would be recomputed 20 times, yet it may be read only once in that minute, so most of those updates are wasted. If you merely delete the cache instead, it is recomputed at most once in that minute, when it is next read, and the overhead drops significantly. Deleting rather than updating is also a lazy-loading idea: do the work only when the value is actually needed.

Why is Redis single thread efficient

  • Pure memory operation
  • The core is an IO multiplexing mechanism built on non-blocking IO
  • Being single-threaded avoids the frequent context switching of multithreaded designs
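
The IO-multiplexing idea can be sketched with Python's `selectors` module: one thread watches many connections and only services the ones that are ready, so no call ever blocks the loop (the `+OK` reply format loosely imitates a Redis response; a `socketpair` stands in for a real client connection).

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def serve_ready(conn):
    data = conn.recv(1024)            # safe: select() said conn is readable
    if data:
        conn.sendall(b"+OK " + data)  # reply, like a Redis command response

# A socketpair stands in for one client connection to the server.
client, server = socket.socketpair()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, serve_ready)

client.sendall(b"PING")
for key, _ in sel.select(timeout=1):  # one loop iteration, one thread
    key.data(key.fileobj)             # dispatch the ready connection

reply = client.recv(1024)
```

A real event loop would run `sel.select()` forever with hundreds of registered connections; the single iteration here is just enough to show the dispatch.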

Recommended reading

MySQL high availability architecture with MHA

Sub-database sub-table vs NewSQL database

No longer worried about being asked a high concurrency question in an interview?

[Redis] SpringBoot integration Redis

How a SQL query is executed