Why do we use Redis in distributed systems?

Author: Yao Dengyan

www.cnblogs.com/yaodengyan/

When using Redis in real-world development, most application programmers know only two operations, setting a value and getting a value, and lack an overall understanding of Redis. This article summarizes common Redis questions to fill in those blind spots.

1. Why Redis

Projects use Redis for two main reasons: performance and concurrency. If you only need features such as distributed locks, other middleware such as ZooKeeper can provide them, and Redis is not strictly necessary.

Performance:

As shown in the figure below, caching is especially useful for SQL queries that take a long time to execute and whose results do not change frequently. Subsequent requests are then served from the cache, so they respond quickly.

This is especially true in a flash-sale (seckill) system, where almost everyone is clicking and placing orders at the same moment, and every request performs the same operation: looking up the same data in the database.

There is no fixed standard for response time; it depends on the interaction. Ideally, page navigation should complete almost instantly, and in-page operations should feel immediate.

Concurrency:

As shown in the figure below, under high concurrency, if all requests access the database directly, the database will run into connection exceptions. In this case, use Redis as a buffer so that requests hit Redis first instead of going straight to the database.
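The read path described above (check the cache first, fall back to the database only on a miss) can be sketched as follows. This is a minimal illustration, not a production implementation: `TTLCache` is a hypothetical in-memory stand-in for Redis so the example runs without a server, and `slow_query` stands in for an expensive SQL query.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis GET/SET with expiry (illustration only)."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def read_through(cache, key, load_from_db, ttl_seconds=60):
    """Return the cached value; on a miss, query the database and cache the result."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    cache.set(key, value, ttl_seconds)
    return value


db_calls = []


def slow_query(key):
    """Hypothetical slow database query; records each call so we can count them."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = TTLCache()
first = read_through(cache, "product:1", slow_query)   # miss: goes to the database
second = read_through(cache, "product:1", slow_query)  # hit: served from the cache
```

After both calls, `db_calls` contains a single entry: the second request never reached the database.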

Common problems with Redis

Cache and database double-write consistency problems
Cache avalanche problem
Cache breakdown problem
Concurrency contention issues for caches

2. Why is single-threaded Redis so fast

This question tests your understanding of Redis internals. Many people don’t know that Redis uses a single-threaded working model.

The reasons are mainly as follows:

Pure memory operation
Single-threaded operation, which avoids frequent context switches
A non-blocking I/O multiplexing mechanism is used

To elaborate on the I/O multiplexing mechanism, let’s take an example: Xiao Ming opens a fast-food restaurant in City A and handles local delivery. Because of limited funds, Xiao Ming hires a group of delivery people and then finds that the remaining money is only enough to buy one car for deliveries.

Operation Mode 1

Every time a customer places an order, Xiao Ming assigns one delivery person to watch over it, and that person then drives the car to deliver it. Xiao Ming gradually found the following problems with this mode of operation:

Most of the time is spent competing for the car: most delivery people sit idle, and only the one who gets the car can deliver.
As the number of orders increases, so does the number of delivery people. Xiao Ming found that the store was getting crowded and he could not hire any more.
Coordination among delivery people takes time.

Taking the above shortcomings into consideration, Xiao Ming put forward the second management mode.

Operation Mode 2

Xiao Ming employs only one delivery person. When customers place orders, Xiao Ming labels each order with its delivery destination and puts them all in one place. The delivery person then drives the car to deliver the orders one by one, coming back for the next one after each delivery. Comparing the two modes of operation, the second is clearly more efficient.

In the above metaphor:

Each delivery person → each thread
Each order → each socket (I/O stream)
Delivery destination of an order → the different states of a socket
Customer delivery request → request from the client
Xiao Ming's mode of operation → the code that the server runs
One car → number of CPU cores

The conclusion is as follows:

Operation mode 1 is the traditional concurrency model, where each I/O stream (order) is managed by a new thread (delivery person).
Operation mode 2 is I/O multiplexing. A single thread (one delivery person) manages multiple I/O streams by tracking the state of each stream (the delivery destination of each order).

An analogy to the real Redis threading model is shown below:

During operation, the redis-client generates sockets with different event types. On the server side, an I/O multiplexing program places these sockets into a queue. The file event dispatcher then takes them from the queue one by one and forwards each to the corresponding event handler.
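The same idea can be tried out with Python's standard `selectors` module. This is only a sketch of I/O multiplexing in general, not Redis's actual event loop: one thread registers several sockets, asks the operating system which of them are ready, and dispatches each ready socket to a handler. The names `echo_upper`, `serve_once`, and `demo` are made up for the illustration, and `socket.socketpair()` stands in for real client connections.

```python
import selectors
import socket


def echo_upper(conn):
    """Event handler: read the request and reply with it upper-cased."""
    data = conn.recv(1024)
    if data:
        conn.sendall(data.upper())


def serve_once(sel):
    """One pass of the event loop: dispatch every socket that is ready to read."""
    handled = 0
    for key, _events in sel.select(timeout=1):
        handler = key.data          # the handler registered for this socket
        handler(key.fileobj)
        handled += 1
    return handled


def demo():
    sel = selectors.DefaultSelector()
    pairs = []
    for _ in range(3):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        # A single selector watches all server-side sockets.
        sel.register(server_side, selectors.EVENT_READ, data=echo_upper)
        pairs.append((client, server_side))

    for i, (client, _server_side) in enumerate(pairs):
        client.sendall(f"order-{i}".encode())

    handled = 0
    while handled < len(pairs):     # one thread serves all three "clients"
        handled += serve_once(sel)

    replies = [client.recv(1024).decode() for client, _server_side in pairs]
    for client, server_side in pairs:
        sel.unregister(server_side)
        client.close()
        server_side.close()
    sel.close()
    return replies


replies = demo()
```

No thread is created per connection: the single loop in `demo` plays the role of the one delivery person.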

3. Data types and usage scenarios of Redis

A qualified programmer will use all five types.

String

This is the most common set/get usage. The value can be either a string or a number. Strings are typically used to cache counters for more complex counting features.

Hash

Here the value stores a structured object, and it is convenient to manipulate an individual field within it. I use this data structure to store user information when implementing single sign-on: the CookieId is the key and the cache expiration time is set to 30 minutes, which simulates a session-like effect well.

List

With the List data structure, you can implement a simple message queue. You can also use the LRANGE command to build Redis-based pagination, which performs well and gives a good user experience.

Set

Because a Set is a collection of non-repeating values, you can use it for global deduplication. Our systems are generally deployed as clusters, so using the Set that comes with the JVM for this is cumbersome. In addition, with intersection, union, and difference operations you can compute common preferences, all preferences, and each user's unique preferences.

Sorted Set

A Sorted Set has an extra weight parameter, the score, and its elements are ordered by score. You can use it for leaderboards and top-N queries. A Sorted Set can also be used to implement delayed tasks.
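A hedged sketch of the two Sorted Set uses mentioned above, a top-N leaderboard and delayed tasks. The helper functions call the redis-py method names `zadd`, `zrevrange`, `zrangebyscore`, and `zrem`; `FakeRedis` is a hypothetical in-memory stand-in for those four methods so the example runs without a server (a real client would typically return bytes and float scores). For delayed tasks the assumption is that the score is the time at which the task becomes due.

```python
class FakeRedis:
    """In-memory stand-in for four sorted-set methods (illustration only)."""

    def __init__(self):
        self._zsets = {}  # name -> {member: score}

    def zadd(self, name, mapping):
        self._zsets.setdefault(name, {}).update(mapping)

    def zrevrange(self, name, start, end, withscores=False):
        ranked = sorted(self._zsets.get(name, {}).items(),
                        key=lambda kv: kv[1], reverse=True)
        stop = None if end == -1 else end + 1
        page = ranked[start:stop]
        return page if withscores else [member for member, _score in page]

    def zrangebyscore(self, name, lo, hi):
        ranked = sorted(self._zsets.get(name, {}).items(), key=lambda kv: kv[1])
        return [member for member, score in ranked if lo <= score <= hi]

    def zrem(self, name, *members):
        zset = self._zsets.get(name, {})
        return sum(1 for m in members if zset.pop(m, None) is not None)


def top_n(client, board, n):
    """Leaderboard: the n members with the highest scores."""
    return client.zrevrange(board, 0, n - 1, withscores=True)


def pop_due_tasks(client, queue, now):
    """Delayed tasks: claim every task whose due time (score) has passed."""
    due = []
    for task in client.zrangebyscore(queue, 0, now):
        # ZREM reports how many members it removed, so only the worker
        # that actually removed the task gets to run it.
        if client.zrem(queue, task) == 1:
            due.append(task)
    return due


client = FakeRedis()
client.zadd("scores", {"alice": 120, "bob": 95, "carol": 150})
client.zadd("delayed", {"send-email": 100, "close-order": 200})

leaders = top_n(client, "scores", 2)
due_at_150 = pop_due_tasks(client, "delayed", now=150)
due_at_250 = pop_due_tasks(client, "delayed", now=250)
```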

4. Redis expiration strategy and memory elimination mechanism

This question shows whether you really know how to use Redis. For example, suppose Redis can hold only 5 GB of data but you write 10 GB: 5 GB of data will have to be deleted. How is it deleted? Have you thought about this problem?

Answer: Redis uses a strategy of periodic deletion + lazy deletion.

Why not use a timed deletion policy

Timed deletion uses a timer to monitor each key and deletes the key automatically when it expires. Although memory is released promptly, this consumes a lot of CPU. Under high concurrency, the CPU should spend its time processing requests, not deleting keys, so this strategy is not used.

How does periodic deletion + lazy deletion work

Periodic deletion: by default, Redis checks for expired keys every 100 ms and deletes the ones it finds. Note that Redis does not check every key each time; it randomly samples keys to check. With periodic deletion alone, many expired keys would never be deleted, and this is where lazy deletion comes in: when you read a key, Redis checks whether it has expired and, if so, deletes it at that moment.

Are there no other problems with periodic deletion + lazy deletion?

No, there are. If periodic deletion misses a key and you never request that key, lazy deletion does not take effect either. As a result, Redis memory usage keeps growing. This is where the memory eviction mechanism comes in.
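To make the two mechanisms concrete, here is a small in-memory simulation. It is a sketch of the idea only, not Redis's implementation: `ExpiringStore`, the `sample_size` of 20, and the injected `clock` function are all assumptions chosen for the illustration.

```python
import random


class ExpiringStore:
    """Toy key-value store with lazy deletion and sampled periodic deletion."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._data = {}       # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Lazy deletion: an expired key is removed when someone reads it."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def periodic_sweep(self, sample_size=20, rng=random):
        """Periodic deletion: check a random sample of keys, not all of them."""
        keys = list(self._data)
        sample = rng.sample(keys, min(sample_size, len(keys)))
        now = self._clock()
        removed = 0
        for key in sample:
            if now >= self._data[key][1]:
                del self._data[key]
                removed += 1
        return removed

    def __len__(self):
        return len(self._data)


now = [0.0]                          # fake clock we can move forward by hand
store = ExpiringStore(clock=lambda: now[0])
for i in range(100):
    store.set(f"k{i}", i, ttl=10)

now[0] = 11.0                        # every key is now past its expiry
removed = store.periodic_sweep()     # only the sampled keys are deleted
left_after_sweep = len(store)        # expired keys that the sweep missed
value = store.get("k0")              # a read never returns an expired value
```

After one sweep most expired keys are still in memory; they disappear only when a later sweep samples them or a read touches them.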

There is a line of configuration in redis.conf:

# maxmemory-policy volatile-lru

This line configures the memory eviction policy:

noeviction: When memory is insufficient to accommodate new data, new write operations return an error.
allkeys-lru: When memory is insufficient to accommodate new data, removes the least recently used key from the whole key space. (Recommended; this is what our project currently uses. LRU stands for "least recently used".)
allkeys-random: When memory is insufficient to accommodate new data, randomly removes a key from the whole key space. (Probably no one uses this: why delete a random key instead of the least recently used one?)
volatile-lru: When memory is insufficient to accommodate new data, removes the least recently used key from among the keys that have an expiration time set. This is generally used when Redis serves as both a cache and persistent storage. (Not recommended)
volatile-random: When memory is insufficient to accommodate new data, randomly removes a key from among the keys that have an expiration time set. (Still not recommended)
volatile-ttl: When memory is insufficient to accommodate new data, removes keys from among those that have an expiration time set, starting with the keys that expire soonest. (Not recommended)
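The behavior of allkeys-lru can be illustrated with a toy cache. This is a simplified sketch under stated assumptions: `LRUCache` is a made-up class that limits the number of keys and tracks exact recency, whereas Redis limits memory in bytes and approximates LRU by sampling keys.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU over a fixed number of keys (illustration of allkeys-lru)."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._data = OrderedDict()   # oldest access first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # reading a key marks it as recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)   # evict the least recently used key

    def keys(self):
        return list(self._data)


cache = LRUCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now more recently used than "b"
cache.set("c", 3)     # over capacity: "b" is evicted
remaining = cache.keys()
```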

5. Redis and database double write consistency problem

Consistency can be further divided into eventual consistency and strong consistency. Whenever both the database and the cache are written, inconsistencies are bound to occur. The premise, therefore, is that data with strong consistency requirements must not be cached. All we can do is guarantee eventual consistency.

In addition, the measures we take can only reduce the probability of inconsistency; they cannot eliminate it, which is another reason why data with strong consistency requirements must not be cached. First, adopt the correct update strategy: update the database first, then delete the cache. Second, because deleting the cache may fail, provide a compensating measure, such as a message queue.
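A minimal sketch of this update strategy: write the database, delete the cache entry, and queue the key for a retry if the delete fails. Everything here is hypothetical scaffolding for the illustration: a plain dict plays the database, `FlakyCache` is a stand-in cache whose `delete` can be made to fail, and a `deque` plays the role of the message queue.

```python
from collections import deque


class FlakyCache(dict):
    """Dict-backed stand-in cache whose delete() fails while `down` is True."""

    down = False

    def delete(self, key):
        if self.down:
            raise ConnectionError("cache unreachable")
        self.pop(key, None)


def update_record(db, cache, retry_queue, key, value):
    db[key] = value                  # 1. update the database first
    try:
        cache.delete(key)            # 2. then delete the cache entry
    except ConnectionError:
        retry_queue.append(key)      # 3. on failure, remember to retry later


def drain_retries(cache, retry_queue):
    """Retry each pending delete once; keys that fail again stay queued."""
    for _ in range(len(retry_queue)):
        key = retry_queue.popleft()
        try:
            cache.delete(key)
        except ConnectionError:
            retry_queue.append(key)


db = {"user:1": "old"}
cache = FlakyCache({"user:1": "old"})
retry_queue = deque()

cache.down = True
update_record(db, cache, retry_queue, "user:1", "new")
stale_while_down = cache.get("user:1")   # the cache still holds the old value

cache.down = False
drain_retries(cache, retry_queue)        # the retry removes the stale entry
```

Between the failed delete and the retry, readers can still see the stale value, which is why this approach gives eventual rather than strong consistency.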

6. How to deal with cache penetration and cache avalanche

Small and medium-sized traditional software companies rarely run into these two problems. For a high-concurrency project with traffic in the millions, however, both must be considered carefully. Cache penetration occurs when a hacker intentionally requests data that does not exist in the cache, so that every request goes to the database and the database runs into connection exceptions.

Cache penetration solutions:

Use a mutex. When the cache misses, first try to acquire the lock; only after acquiring it, request the database. If the lock is not acquired, sleep for a while and try again.
Use an asynchronous update strategy. Return directly whether or not a value is found for the key. The value carries a cache expiration time; if the cache has expired, asynchronously start a thread that reads the database and updates the cache. This requires cache preheating (loading the cache before the project starts).
Provide an interception mechanism that can quickly determine whether a request is valid, for example a Bloom filter that internally maintains the set of valid keys. Quickly check whether the key carried by the request is valid; if not, return directly.

A cache avalanche occurs when a large portion of the cache expires at the same moment, just as a new wave of requests arrives; all of these requests go to the database, and the database runs into connection exceptions.
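The Bloom filter interception described in the last item can be sketched in a few lines. This is a toy version under stated assumptions: the bit-array size, the number of hash functions, and the SHA-256-based hashing are arbitrary illustrative choices, and `lookup` is a hypothetical request handler. A Bloom filter never reports a stored key as absent, but it may occasionally report an absent key as present, so a small fraction of invalid requests can still reach the database.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions from differently seeded hashes of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lookup(key, valid_keys, cache, load_from_db):
    """Reject keys the filter has never seen before touching cache or database."""
    if not valid_keys.might_contain(key):
        return None                  # definitely not a valid key
    if key in cache:
        return cache[key]
    value = load_from_db(key)
    cache[key] = value
    return value


db_hits = []


def load_from_db(key):
    db_hits.append(key)
    return f"data-{key}"


valid_keys = BloomFilter()
for item_id in (1, 2, 3):
    valid_keys.add(f"item:{item_id}")

cache = {}
found = lookup("item:1", valid_keys, cache, load_from_db)
```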

Cache avalanche solutions:

Add a random value to the cache expiration time to avoid many keys expiring together.
Use a mutex, although this significantly reduces throughput.
Use a double cache. We keep two caches, cache A and cache B. Cache A expires after 20 minutes; cache B has no expiration time. You handle cache preheating yourself.
The double-cache read path then works as follows: read from cache A first and return directly if it has the data; if A has no data, read from B, return directly, and asynchronously start an update thread that refreshes both cache A and cache B.
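The first solution, adding a random value to the expiration time, can be sketched as below. The function name `ttl_with_jitter` and the specific numbers (a 20-minute base with up to 5 minutes of jitter) are assumptions chosen for the illustration.

```python
import random


def ttl_with_jitter(base_seconds, max_jitter_seconds, rng=random):
    """Return the base TTL plus a random extra, so keys do not expire together."""
    return base_seconds + rng.randint(0, max_jitter_seconds)


# 1000 keys cached at the same moment get spread-out expiration times.
ttls = [ttl_with_jitter(1200, 300) for _ in range(1000)]
```

Each value would be passed as the expiration time when writing the corresponding key, so keys that were loaded together expire over a five-minute window rather than in the same instant.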

7. How to solve the problem of concurrent competing keys in Redis

The problem is that multiple subsystems set the same key at the same time. What should we watch out for in this situation? Most answers recommend the Redis transaction mechanism.

However, I do not recommend using Redis transactions. Our production environments are basically Redis clusters with data sharding. If a transaction involves operations on multiple keys, those keys are not necessarily stored on the same redis-server. The Redis transaction mechanism is therefore of little use here.

If operations on this key do not require ordering

In this case, prepare a distributed lock. Every system competes for the lock, and whoever acquires it performs the set operation. This is relatively simple.

If operations on this key require ordering
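One common way to build such a lock on a single Redis instance is `SET` with the `NX` and `PX` options plus a compare-and-delete release script; the article itself does not prescribe an implementation, so treat this as one possible sketch. The helper functions use the redis-py calls `set(name, value, nx=True, px=...)` and `eval(script, numkeys, *keys_and_args)`. `FakeRedis` is a hypothetical stand-in so the example runs without a server: it ignores the TTL and emulates the release script in Python instead of executing Lua.

```python
import uuid

# Delete the lock only if it still holds our token (runs atomically on the server).
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


class FakeRedis:
    """Stand-in for the two client calls used below (illustration only)."""

    def __init__(self):
        self.data = {}

    def set(self, name, value, nx=False, px=None):
        if nx and name in self.data:
            return None              # key exists: SET NX does nothing
        self.data[name] = value
        return True

    def eval(self, script, numkeys, *keys_and_args):
        name, token = keys_and_args  # emulates RELEASE_SCRIPT, ignores `script`
        if self.data.get(name) == token:
            del self.data[name]
            return 1
        return 0


def acquire_lock(client, name, ttl_ms=10_000):
    """Try once to take the lock; return a token on success, else None."""
    token = uuid.uuid4().hex         # unique per holder, checked on release
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, name, token):
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1


def set_with_lock(client, key, value):
    """Set `key` only while holding its lock; return False if the lock is busy."""
    lock_name = f"lock:{key}"
    token = acquire_lock(client, lock_name)
    if token is None:
        return False
    try:
        client.set(key, value)
        return True
    finally:
        release_lock(client, lock_name, token)


client = FakeRedis()
token = acquire_lock(client, "lock:key1")
second_try = acquire_lock(client, "lock:key1")       # lock already held
blocked = set_with_lock(client, "key1", "valueB")    # cannot set while locked
wrong_release = release_lock(client, "lock:key1", "not-my-token")
released = release_lock(client, "lock:key1", token)
ok = set_with_lock(client, "key1", "valueA")
```

The expiration on the lock key keeps a crashed holder from blocking everyone forever, and the token check keeps one system from releasing a lock that another system now holds.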

Suppose there is a key, key1. System A needs to set key1 to valueA, system B needs to set key1 to valueB, and system C needs to set key1 to valueC.

The value of key1 is expected to change in the order valueA → valueB → valueC. In this case, we need to save a timestamp when the data is written to the database.

Assume the timestamp is as follows:

System A key1 {valueA 3:00}
System B key1 {valueB 3:05}
System C key1 {valueC 3:10}

Now suppose system B grabs the lock first and sets key1 to {valueB 3:05}. Then system A grabs the lock, finds that the timestamp of its valueA is older than the timestamp in the cache, and does not perform the set, and so on. Other approaches also work, such as using a queue to serialize the set operations.
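The timestamp comparison can be sketched as follows. This is an in-process illustration only: `TimestampedCache` is a made-up class, a `threading.Lock` stands in for the distributed lock, and timestamps are written as minutes after 3:00 to match the example above.

```python
import threading


class TimestampedCache:
    """Keeps, per key, only the value with the newest timestamp."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for the distributed lock
        self._data = {}                 # key -> (value, timestamp)

    def set_if_newer(self, key, value, timestamp):
        with self._lock:
            current = self._data.get(key)
            if current is not None and current[1] >= timestamp:
                return False            # cached value is newer: skip this write
            self._data[key] = (value, timestamp)
            return True

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None


cache = TimestampedCache()
b_applied = cache.set_if_newer("key1", "valueB", 5)    # B (3:05) gets the lock first
a_applied = cache.set_if_newer("key1", "valueA", 0)    # A (3:00) is older: rejected
c_applied = cache.set_if_newer("key1", "valueC", 10)   # C (3:10) is newer: applied
final_value = cache.get("key1")
```

Whatever order the systems acquire the lock in, the key ends up holding the value with the latest timestamp.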

8. Summary

Redis is used at major Chinese internet companies such as Sina, Alibaba, Tencent, Baidu, Meituan, and Xiaomi. When learning Redis, the following areas are particularly important: Redis clients, advanced Redis features, Redis persistence and common development and operations problems, the principles and optimization of Redis replication, and Redis distributed solutions.