Wechat official account: Moon chat technology

Hello, I’m Moon.

Redis is one of the most commonly used in-memory databases. It shows up in many places, such as storing login information and implementing distributed locks, and it is often used as a cache.

But after using Redis for so long, do you really understand it?

1. What is Redis? What can it do?

Redis: Redis stands for Remote Dictionary Server. It is a key-value storage system written in C.

Application scenarios: caches, databases, message queues, distributed locks, lists of likes, leaderboards, and so on.

2. What are the eight data types in Redis? What are the application scenarios?

In total, Redis has eight data types: five basic data types and three special data types.

There are five basic data types:

  • 1. String: the string type, often used to store counters, follower counts, and so on. Simple distributed locks also use this type.
  • 2. Hash: a key-value structure in which the value is itself a map of fields to values.
  • 3. List: the list type. In Redis, a list can be used as a stack, a queue, or a blocking queue.
  • 4. Set: an unordered collection with no duplicate elements. It can be used for likes, favorites, and so on.
  • 5. Zset: the sorted set. It has no duplicate elements, and each element is given a score; elements are kept in ascending order of score. It can be used to build leaderboards.

Three special data types:

  • 1. Geospatial: Redis introduced the Geo type in version 3.2. It can store geographic positions and calculate the distance between two places.
  • 2. HyperLogLog: estimates the cardinality of a set, that is, the number of distinct elements in it. This data structure is often used to count a website's unique visitors (UV).
  • 3. Bitmap: a bitmap works on individual bits, each set to 0 or 1 to represent the value or state of an element. In other words, a single bit can distinguish exactly two states. Bitmaps are often used to record per-user yes/no information, such as active or inactive, logged in or logged out, and checked in or not.
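
As a rough illustration of the bitmap idea, here is a minimal pure-Python sketch that records daily check-ins with one bit per day. The class `CheckinBitmap` and its method names are made up for this example; in Redis the corresponding commands are SETBIT, GETBIT, and BITCOUNT.

```python
class CheckinBitmap:
    """Toy bitmap: bit N is 1 if the user checked in on day N (hypothetical helper)."""

    def __init__(self):
        self.bits = 0  # a Python int used as an arbitrarily long bit string

    def set_bit(self, offset, value):
        # Roughly what SETBIT does: set a single bit to 0 or 1.
        if value:
            self.bits |= 1 << offset
        else:
            self.bits &= ~(1 << offset)

    def get_bit(self, offset):
        # Roughly what GETBIT does: read a single bit.
        return (self.bits >> offset) & 1

    def count(self):
        # Roughly what BITCOUNT does: count the bits that are set to 1.
        return bin(self.bits).count("1")


checkins = CheckinBitmap()
for day in (0, 1, 4):       # the user checked in on days 0, 1 and 4
    checkins.set_bit(day, 1)

print(checkins.get_bit(1))  # 1
print(checkins.get_bit(2))  # 0
print(checkins.count())     # 3
```

One bit per user per day is what makes this structure so compact for yes/no statistics.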

3. Why is Redis so quick?

According to the official data, Redis can handle close to 100,000 queries per second (QPS). The reasons for this speed are summarized as follows:

  • 1: Operations are entirely memory based
  • 2: A single-threaded model handles client requests, which avoids context switching
  • 3: An I/O multiplexing mechanism
  • 4: It is written in C and includes many optimizations, such as the Simple Dynamic String (SDS)

4. I heard that redis 6.0 introduced multi-threading. Isn’t there a thread safety issue?

No.

In fact, Redis still executes commands on a single thread. Multiple threads are used only for network reads and writes and for protocol parsing, so command execution has no thread safety problem.

Multithreading was added because Redis's performance bottleneck is network I/O, not the CPU. Using multiple threads makes I/O reads and writes more efficient, which improves Redis's overall performance.

5. What are the persistence mechanisms of Redis? What are their pros and cons?

Redis has two types of persistence, AOF and RDB.

AOF:

  • Every time Redis executes a write command, it appends the original text of the command to an .aof file and persists it to disk according to the configured fsync policy. Read commands are not recorded.

The pros and cons of AOF

  • Advantages of AOF:
    • 1. AOF protects better against data loss. Generally, a background thread performs an fsync operation every second.
    • 2. AOF commands are appended directly to the end of the file, so write performance is very high.
    • 3. The AOF log records commands in a very readable form, which makes it well suited to emergency recovery after a catastrophic deletion. If someone accidentally runs flushall, the data can be restored by removing the flushall command from the log file and replaying the log, as long as a rewrite has not yet run.

  • Disadvantages of AOF:
    • 1. For the same data set, the AOF file is usually larger than the RDB snapshot.
    • 2. Because the .aof file is written on every write command, AOF costs more performance than RDB. AOF rewrite exists to compact the AOF file.
    • 3. Data recovery is slow, and AOF is not well suited to cold backups.
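
The append-and-replay idea behind AOF can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: `TinyStore` and `replay` are hypothetical names, the one-command-per-line log format is invented for readability (the real AOF file uses the Redis protocol format), and fsync is left out entirely.

```python
import io


class TinyStore:
    """Toy key-value store with an append-only command log (illustration only)."""

    def __init__(self, log):
        self.data = {}
        self.log = log  # any text file-like object opened for appending

    def set(self, key, value):
        # Apply the write in memory, then append the command text to the log.
        self.data[key] = value
        self.log.write(f"SET {key} {value}\n")

    def delete(self, key):
        self.data.pop(key, None)
        self.log.write(f"DEL {key}\n")

    def get(self, key):
        # Reads are not logged because they do not change the data.
        return self.data.get(key)


def replay(log_text):
    """Rebuild the in-memory state by re-running every logged command in order."""
    data = {}
    for line in log_text.splitlines():
        parts = line.split(" ", 2)
        if parts[0] == "SET":
            data[parts[1]] = parts[2]
        elif parts[0] == "DEL":
            data.pop(parts[1], None)
    return data


log = io.StringIO()
store = TinyStore(log)
store.set("user:1", "moon")
store.set("user:2", "star")
store.delete("user:2")
store.get("user:1")

restored = replay(log.getvalue())
print(restored)  # {'user:1': 'moon'}
```

Because the log is plain text, removing an unwanted line before replaying it is the toy equivalent of deleting an accidental flushall from the AOF file.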

RDB:

  • RDB is the default Redis persistence method. It stores the data held in Redis memory at a certain point in time, in binary format, in a file with the .rdb extension; in other words, it periodically backs up the entire Redis data set. The snapshot is written by a forked child process.

Advantages and disadvantages of RDB

  • Advantages of RDB:
    • 1. An RDB file holds all the data in Redis at a certain point in time, so recovery is very fast when restoring a large data set.

    • 2. Because RDB writes the snapshot in a forked child process, the impact on serving client reads and writes at the same time is very small.

  • Disadvantages of RDB:
    • 1: A relatively long window of data may be lost
      • For example, suppose we are scheduled to back up data every 5 minutes. Redis backs up data at 10:00; if the service fails at 10:04, all data written between 10:00 and 10:04 is lost.
    • 2: There may be a long pause. The cost of forking the child process depends on how much data Redis holds. With a large amount of data, the fork is likely to make Redis pause for a few seconds.
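
To make the data-loss window concrete, here is a small sketch in which Python's `pickle` stands in for the binary RDB format. That substitution is an assumption made purely for illustration (the real file format is different), and the timestamps in the comments simply mirror the example above.

```python
import pickle

data = {"order:1": "paid"}

# 10:00 - take a point-in-time snapshot of everything in memory.
snapshot = pickle.dumps(data)

# 10:02 - a write arrives after the snapshot was taken.
data["order:2"] = "paid"

# 10:04 - the service crashes; on restart only the snapshot is available.
restored = pickle.loads(snapshot)

print(restored)               # {'order:1': 'paid'}
print("order:2" in restored)  # False
```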

6. What are the deletion policies for Redis expired keys?

There are three types of expiration policies:

  • Timed expiration: A timer is created for every key that has an expiration time, and the key is deleted as soon as its timer fires. This policy clears expired data immediately and is memory friendly. However, handling the timers consumes a large amount of CPU, which affects cache response time and throughput.
  • Lazy expiration: A key is checked for expiration only when it is accessed, and it is deleted if it has expired. This strategy saves the most CPU but is very memory unfriendly. In extreme cases, a large number of expired keys are never accessed again, so they are never deleted and keep occupying memory.
  • Periodic expiration: At regular intervals, a certain number of keys are sampled from the expires dictionaries of a certain number of databases, and the expired ones are deleted. This strategy is a compromise between the first two. By tuning the scan interval and the time limit of each scan, you can balance CPU and memory usage for different workloads.
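
The lazy and periodic strategies can be combined in a small toy cache. The sketch below is an illustration under assumptions, not Redis's actual implementation: the class `ExpiringCache`, the `sweep` method, and the sample size are all invented for the example, and a fake clock is injected so the behaviour is easy to follow.

```python
import random
import time


class ExpiringCache:
    """Toy cache combining lazy and periodic expiration (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self.data = {}
        self.expires = {}  # key -> absolute expiry time, like an "expires" dictionary
        self.clock = clock

    def set(self, key, value, ttl):
        self.data[key] = value
        self.expires[key] = self.clock() + ttl

    def _expired(self, key):
        return key in self.expires and self.expires[key] <= self.clock()

    def _delete(self, key):
        self.data.pop(key, None)
        self.expires.pop(key, None)

    def get(self, key):
        # Lazy expiration: the key is checked only when it is accessed.
        if self._expired(key):
            self._delete(key)
            return None
        return self.data.get(key)

    def sweep(self, sample_size=20):
        # Periodic expiration: inspect a random sample of keys that have a TTL.
        keys = list(self.expires)
        sample = random.sample(keys, min(sample_size, len(keys)))
        removed = 0
        for key in sample:
            if self._expired(key):
                self._delete(key)
                removed += 1
        return removed


now = [0.0]  # fake clock so the example does not need to sleep
cache = ExpiringCache(clock=lambda: now[0])
cache.set("session", "abc", ttl=10)
cache.set("token", "xyz", ttl=10)

now[0] = 11.0                # both keys are now past their expiry time
print(cache.get("session"))  # None: removed lazily on access
print(cache.sweep())         # 1: "token" is removed by the periodic sweep
```

In a real server `sweep` would be called from a timer; raising the sample size trades more CPU for less wasted memory.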

7. What happens when Redis’ memory is full?

In fact, Redis defines eight memory eviction policies that decide what happens when its memory is full:

  • 1. noeviction: returns an error directly and does not evict any existing key
  • 2. allkeys-lru: evicts from all keys using the LRU algorithm
  • 3. volatile-lru: evicts from the keys that have an expiration time using the LRU algorithm
  • 4. allkeys-random: evicts randomly from all keys
  • 5. volatile-random: evicts randomly from the keys that have an expiration time
  • 6. volatile-ttl: evicts the keys that are closest to expiring
  • 7. volatile-lfu: evicts from the keys that have an expiration time using the LFU algorithm
  • 8. allkeys-lfu: evicts from all keys using the LFU algorithm
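
For readers who have not seen LRU eviction before, here is a minimal sketch of the idea behind allkeys-lru, built on `collections.OrderedDict`. It implements exact LRU with a key-count limit, which is a simplification: Redis limits memory rather than key count and approximates LRU by sampling keys. The class name `LRUCache` and the `max_keys` parameter are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy exact-LRU cache bounded by number of keys (illustration only)."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.data = OrderedDict()  # oldest access first, newest access last

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.max_keys:
            self.data.popitem(last=False)  # evict the least recently used key


cache = LRUCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")           # "a" becomes the most recently used key
cache.set("c", 3)        # the cache is full, so "b" is evicted
print(list(cache.data))  # ['a', 'c']
```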

8. How to solve the hot key problem in Redis?

A hot key is a key that receives so many requests at a certain moment that the traffic brings the Redis server down.

Solution:

  • Cache the results in local memory
  • Spread the hot key across different servers
  • Set the key to never expire

9. What are cache penetration, cache breakdown, and cache avalanche? What are the solutions?

Cache penetration:

  • Cache penetration means that the requested data exists neither in the cache nor in the database. As a result, every request for that data has to query the database, which returns null each time.

Solution:

  • Bloom filter
  • Cache an empty object
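
Both ideas can be sketched together in plain Python. This is a simplified illustration under assumptions: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the in-memory `database` dict are all invented for the example and are not tuned for real use.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


database = {"user:1": "moon"}  # stand-in for the real database
cache = {}                     # stand-in for Redis
known_keys = BloomFilter()
for key in database:
    known_keys.add(key)


def lookup(key):
    if key in cache:
        return cache[key]  # may be a cached "empty" result (None)
    if not known_keys.might_contain(key):
        return None        # definitely not in the database, so skip the query
    value = database.get(key)
    cache[key] = value     # cache the result even if it is empty
    return value


print(lookup("user:1"))    # moon
print(lookup("user:999"))  # None
```

The filter rejects most keys that cannot exist, and caching the empty result covers the rare false positive so that it reaches the database only once.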

Cache breakdown:

  • Cache breakdown happens when a single key is very hot and constantly receives a large number of concurrent requests. At the moment the key expires, that stream of concurrent requests misses the cache and goes straight to the database. It is like punching a hole in a barrier.

Solution:

  • Use a mutex so that only one request rebuilds the cache
  • Never expire the hot key
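
The mutex approach can be sketched with threads inside one process. This is a single-process stand-in under assumptions: in a distributed deployment the lock would itself have to be distributed, and `load_from_database` and `get_hot_key` are hypothetical names used only here.

```python
import threading
import time

cache = {}
rebuild_lock = threading.Lock()
stats = {"db_loads": 0}


def load_from_database(key):
    # Stand-in for a slow database query (hypothetical helper).
    stats["db_loads"] += 1
    time.sleep(0.05)
    return f"value-of-{key}"


def get_hot_key(key):
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Check again: another thread may have rebuilt the entry while we waited.
        value = cache.get(key)
        if value is None:
            value = load_from_database(key)
            cache[key] = value
    return value


# Ten concurrent requests arrive right after the hot key has expired.
threads = [threading.Thread(target=get_hot_key, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats["db_loads"])  # 1
```

The second check inside the lock is what keeps the other nine requests from repeating the database query.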

Cache avalanche:

  • Cache avalanche happens when a large amount of different cached data expires at the same time while query volume is high, so the requests fall directly on the database and cause an outage.

Solution:

  • Spread expiration times evenly so that keys do not all expire at once
  • Add a mutex
  • Never expire the cache
  • Use a two-tier cache strategy
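
One common way to spread expiration times is to add a small random offset to each TTL. The sketch below assumes a base TTL of one hour and up to five minutes of jitter; both numbers and the helper name `ttl_with_jitter` are arbitrary choices for illustration.

```python
import random


def ttl_with_jitter(base_ttl, max_jitter, rng=random):
    """Return base_ttl plus a random number of extra seconds in [0, max_jitter]."""
    return base_ttl + rng.randint(0, max_jitter)


rng = random.Random(42)  # seeded only so that repeated runs give the same values
ttls = [ttl_with_jitter(3600, 300, rng) for _ in range(5)]
print(ttls)  # five TTLs, each between 3600 and 3900 seconds
```

Keys written in the same batch then expire over a five-minute window instead of in the same second.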

10. How can Redis be deployed?

  • Single-machine deployment: This is the most basic deployment. It needs only one machine, which handles both reads and writes, and it is usually used only by developers for testing.

  • Sentinel mode: Sentinel mode is a special mode. Redis provides a sentinel command, and a sentinel is a separate process that runs independently. It monitors multiple running Redis instances by sending them commands and waiting for the Redis servers to respond. It provides automatic failover, cluster monitoring, message notification, and other functions.

  • Cluster mode: Redis has supported the cluster deployment mode since version 3.0. In this mode, data is automatically sharded, with a part of the data placed on each master, and high availability is built in. Even if a master fails, the service can still be provided normally.

  • Master-slave replication: In master-slave replication, there are two types of databases: the master database and the slave database. The master database can handle both reads and writes, while a slave database handles reads only. In practice, the usual setup is to have the master do all the writes and the slaves do all the reads, which separates reads from writes and takes pressure off the server.

11. What are the functions of sentinels?

  • 1. Monitor the master database and the slave databases to check whether they are running properly

  • 2. When the master database is abnormal, automatically promote a slave database to master to keep the whole service stable

12. What is the sentinel election process like?

  • 1. The first sentinel to notice that the master has failed sends a request to every other sentinel, asking to be elected lead sentinel

  • 2. Each of the other sentinels, if it has not already voted for someone else, votes for the sentinel that asked

  • 3. That sentinel becomes the lead sentinel if more than half of the sentinels have voted for it and its vote count is at least the quorum parameter

  • 4. If several sentinels stand for election at the same time and none of them wins, the process is repeated until a lead sentinel is selected

Once the lead sentinel is selected, failover begins: a new master is chosen from the slave databases.
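
The voting rules above can be sketched as two small functions. This is a simplified model rather than Sentinel's actual code: the function names and the in-memory request list are invented, and it assumes the winner needs a strict majority of all sentinels and a vote count of at least the quorum value.

```python
def collect_votes(requests):
    """requests: (voter, candidate) pairs in arrival order; each voter votes once."""
    votes = {}
    for voter, candidate in requests:
        # A sentinel votes for the first candidate that asks and ignores later ones.
        votes.setdefault(voter, candidate)
    return votes


def wins_election(candidate, votes, total_sentinels, quorum):
    """Return True if the candidate has a majority and at least `quorum` votes."""
    received = sum(1 for chosen in votes.values() if chosen == candidate)
    has_majority = received > total_sentinels // 2
    return has_majority and received >= quorum


# Three sentinels; s1 and s2 both noticed the failure and ask for votes.
requests = [
    ("s1", "s1"),  # s1 votes for itself
    ("s2", "s2"),  # s2 votes for itself
    ("s3", "s1"),  # s3 receives the request from s1 first
    ("s3", "s2"),  # the request from s2 arrives too late
]
votes = collect_votes(requests)

print(wins_election("s1", votes, total_sentinels=3, quorum=2))  # True
print(wins_election("s2", votes, total_sentinels=3, quorum=2))  # False
```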

13. How is data stored in cluster mode?

A cluster has a total of 16,384 hash slots, and it distributes these 16,384 slots evenly across its nodes. By node I mean each master node.
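
As a hedged sketch of how a key ends up on a master: Redis Cluster maps each key to one of 16,384 hash slots by taking the CRC16 of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The code below uses `binascii.crc_hqx` with an initial value of 0, which computes the same CRC16 variant. The helper names `key_slot` and `slot_ranges` are made up, and the even split is only one possible slot assignment.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key):
    """Return the hash slot for a key, honouring a non-empty {hash tag}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # there is at least one character between the braces
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def slot_ranges(master_count):
    """Split the slots as evenly as possible into one contiguous range per master."""
    base, extra = divmod(SLOT_COUNT, master_count)
    ranges, start = [], 0
    for i in range(master_count):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(key_slot("foo"))  # 12182
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
print(slot_ranges(3))   # [(0, 5461), (5462, 10922), (10923, 16383)]
```

Keys that share a hash tag land in the same slot, which is what allows multi-key operations on them in cluster mode.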

14. How does fault recovery work in a cluster?

In a cluster, each node periodically sends a ping command to the other nodes to determine whether they are offline.

If there is no reply for a long time, the node that sent the ping marks the target node as suspected offline. As with sentinels, this can be called subjectively offline. The target node is only confirmed offline once a certain number of nodes in the cluster also consider it offline:

  • 1. When node A finds that the target node is suspected to be offline, it spreads this message to the other nodes in the cluster, and those nodes then send commands to the target node to check for themselves whether it is offline
  • 2. If more than half of the nodes in the cluster think that the target node is offline, the target node is marked as offline and this is announced to the whole cluster

15. What is the principle of master-slave synchronization?

  • 1. When a slave database starts, it sends a SYNC command to the master database. After receiving the command, the master saves a snapshot in the background, which is the RDB persistence process, and buffers the write commands it receives in the meantime

  • 2. After the snapshot is complete, the snapshot and the buffered commands are sent to the slave node so that the master and slave databases stay consistent

  • 3. After receiving the snapshot and the buffered commands, the slave database writes the data to a temporary file on disk. When the write is complete, it replaces its RDB snapshot file with that file. This operation does not block, so the slave can keep receiving commands; the reason is that a forked child process does the work

Once this initialization is complete, whenever the master database executes a command that changes data, it asynchronously sends that command to the slave. This is called the replication synchronization phase, and it lasts until master/slave synchronization ends.
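
The two phases, initial snapshot plus buffered commands followed by ongoing command propagation, can be modelled with a toy master and replica. Everything here is an assumption made for illustration: the class and method names are invented, everything runs synchronously in one process, and the temporary file and forked child process are left out.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def load(self, snapshot, buffered_commands):
        # Initialization: load the snapshot, then replay the buffered commands.
        self.data = dict(snapshot)
        for command in buffered_commands:
            self.apply(command)

    def apply(self, command):
        op, key, value = command
        if op == "SET":
            self.data[key] = value


class Master:
    def __init__(self):
        self.data = {}
        self.replicas = []  # replicas that have finished initialization
        self.pending = {}   # replica -> (snapshot, commands buffered during the sync)

    def set(self, key, value):
        self.data[key] = value
        command = ("SET", key, value)
        for _, buffered in self.pending.values():
            buffered.append(command)  # buffer for replicas that are still syncing
        for replica in self.replicas:
            replica.apply(command)    # propagate to replicas that are in sync

    def begin_sync(self, replica):
        # Step 1: take a snapshot and start buffering new write commands.
        self.pending[replica] = (dict(self.data), [])

    def finish_sync(self, replica):
        # Steps 2 and 3: send the snapshot and the buffered commands to the replica.
        snapshot, buffered = self.pending.pop(replica)
        replica.load(snapshot, buffered)
        self.replicas.append(replica)


master, replica = Master(), Replica()
master.set("a", "1")
master.begin_sync(replica)
master.set("b", "2")  # arrives while the snapshot is being transferred
master.finish_sync(replica)
master.set("c", "3")  # replication synchronization phase

print(replica.data == master.data)  # True
```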

16. What is diskless replication?

We just said that the master and slave interact through RDB snapshots. Although the logic seems simple, it has some problems.

  • 1. Even when RDB snapshots are disabled on the master, a master-slave synchronization (replication initialization) still generates an RDB snapshot. If the master later restarts, it restores its data from that snapshot, which may be out of date

  • 2. In a master-slave setup, a snapshot is taken every time the master and a slave synchronize data, so RDB files keep being generated on the hard disk

To solve these problems, Redis added the diskless replication feature in a later update. With it, the data is sent directly to the slave over the network, which avoids the interaction with the hard disk, although the transfer itself still consumes I/O.