Preface

Welcome to our GitHub repository, please give it a star: github.com/bin39232820… The best time to plant a tree was ten years ago; the second-best time is now.

Tips

This interview guide series does not go deep into every detail; it is meant as a way to review your knowledge from the interviewee's point of view, so I assume you already know most of the basics.

www.processon.com/view/link/6…

This is the address of the mind map.

Several articles in this series have been published already; today we take a look at Redis. The earlier ones can be found on GitHub, and a summary of previous articles is below:

  • 2021-Java Backend Engineer Interview Guide (Introduction)
  • 2021-Java Backend Engineer Interview Guide
  • 2021-Java Backend Engineer Interview Guide -(Concurrency – Multithreading)
  • 2021-Java Backend Engineer Interview Guide -(JVM)
  • 2021-Java Backend Engineer Interview Guide -(MySQL)

What is Redis

Redis is an open source (BSD licensed) in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, geospatial indexes with radius queries, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and offers high availability through Redis Sentinel and automatic partitioning with Redis Cluster.

Talk about the pros and cons of Redis

Advantages

  • Excellent performance: Redis can handle roughly 110,000 reads/s and 81,000 writes/s.
  • Data persistence: AOF and RDB are supported.
  • Transactions: individual Redis operations are atomic, and Redis also supports executing several operations together atomically.
  • It supports hash, set, zset, and list data structures in addition to string values.
  • In master/slave replication, the master machine automatically synchronizes data to the slave machine, enabling read/write separation.

Disadvantages

  • Database capacity is limited by physical memory, so Redis cannot be used for high-performance reads and writes over truly massive data sets; it is mainly suited to high-performance operations on relatively small data volumes.
  • If the master goes down before some data has been synchronized to the slaves, switching the IP over to a slave can leave the data inconsistent, which reduces availability.
  • Online capacity expansion is difficult: once a cluster approaches its capacity limit, expanding it becomes complicated. To avoid this, O&M staff have to provision enough headroom before going live, which wastes a lot of resources.

Tell me why you use caching

The main purpose is to improve system throughput and to cope with high-concurrency, high-performance scenarios.

Why Redis and not Map/Guava?

  • A Java Map is a local cache: if there are multiple instances (machines), each instance has to keep its own copy, and the caches are not consistent with each other.
  • Redis is a distributed cache: multiple instances (machines) share one cache, so the cached data is consistent.
  • A plain Map is not purpose-built for caching. It lives inside JVM memory (and too much of it causes problems for the JVM), it typically holds temporary data that is destroyed together with the JVM, and features such as expiration have to be hand-written by the programmer.
  • Redis is a dedicated cache that can use tens of gigabytes of memory. It can persist cached data to disk and restore it after a restart, and it natively provides rich data structures, expiration mechanisms, and so on.

Why is Redis so fast

1. It is completely memory-based; most requests are pure memory operations and therefore very fast. Data is stored in memory in a structure similar to a HashMap, which gives O(1) time complexity for both lookups and updates.

2. The data structures are simple and purpose-built, so operations on them are simple as well.

3. Commands are executed by a single thread, which avoids unnecessary context switches and race conditions: no CPU is spent switching between processes or threads, there is no lock acquisition or release to worry about, and no performance cost from possible deadlocks (CPU is rarely the bottleneck for Redis).

4. It uses an I/O multiplexing model with non-blocking I/O.

5. Its underlying model is different: Redis builds its own VM mechanism rather than relying on general-purpose system facilities, and the application protocol used to communicate with clients is its own as well, namely RESP.

Let's talk about RESP

RESP (REdis Serialization Protocol) is the serialization protocol that Redis clients use to communicate with the Redis server. It is a compromise between:

  • Simple implementation
  • Can be quickly interpreted by a computer
  • Simple enough to be manually parsed

In RESP, the type of some data depends on the first byte:

  • "+" stands for Simple Strings
  • "-" stands for Errors
  • ":" stands for Integers
  • "$" stands for Bulk Strings
  • "*" stands for Arrays

Based on this protocol we could actually implement a Redis client ourselves; I may write one in a later article.
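To make the byte layout concrete, here is a minimal sketch in Java of a hand-rolled RESP exchange. It assumes a Redis server on localhost:6379 and only handles single-line replies (+, -, :); bulk strings and arrays are left out, and the class name is made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MiniRespClient {

    // A RESP request is an array header (*N) followed by N bulk strings ($len + payload).
    static byte[] encode(String... parts) {
        StringBuilder sb = new StringBuilder("*").append(parts.length).append("\r\n");
        for (String p : parts) {
            sb.append("$").append(p.getBytes(StandardCharsets.UTF_8).length).append("\r\n")
              .append(p).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("localhost", 6379)) {
            OutputStream out = socket.getOutputStream();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));

            // "SET hello world" becomes: *3\r\n$3\r\nSET\r\n$5\r\nhello\r\n$5\r\nworld\r\n
            out.write(encode("SET", "hello", "world"));
            out.flush();

            String reply = in.readLine(); // e.g. "+OK"
            switch (reply.charAt(0)) {
                case '+': System.out.println("simple string: " + reply.substring(1)); break;
                case '-': System.out.println("error: " + reply.substring(1)); break;
                case ':': System.out.println("integer: " + reply.substring(1)); break;
                default:  System.out.println("bulk/array reply, not handled in this sketch");
            }
        }
    }
}
```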

What if the CPU becomes your Redis bottleneck, or you simply don't want the rest of your server's cores to sit idle?

That's easy: just run more Redis processes. Redis is a key-value database, not a relational one, so there are no constraints between pieces of data; the client only needs to know which key lives on which Redis process. Redis Cluster can help you do this even better.

Talk about the basic data structure of Redis

  • String: a string, integer, or floating-point value
  • Set: an unordered collection of unique strings
  • Zset: a sorted set
  • Hash: a hash (a map of fields to values)
  • List: a list

So what data structure backs a sorted set?

A skip list.

  • When the data set is small, the sorted set is implemented with a ziplist.
  • When the data set is large, it is implemented with a dict plus a skiplist: roughly speaking, the dict is used to look up the score for a given member, while the skiplist is used to query members by score (including range lookups), as the example below shows.
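As a quick illustration with the Jedis client (the key name "rank" and the scores are made up for this example), a point lookup by member goes through the dict, while a range query by score walks the skiplist:

```java
import redis.clients.jedis.Jedis;

public class ZsetExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.zadd("rank", 90, "alice");
            jedis.zadd("rank", 75, "bob");
            jedis.zadd("rank", 88, "carol");

            // point lookup: member -> score, served by the dict
            System.out.println("alice = " + jedis.zscore("rank", "alice"));

            // range lookup: all members with a score between 80 and 100, served by the skiplist
            System.out.println(jedis.zrangeByScore("rank", 80, 100));
        }
    }
}
```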

Talk about the underlying data structure of Redis

SDS

Redis strings are not C strings (character arrays terminated by the null character '\0'). Instead, Redis builds its own abstract type called Simple Dynamic String (SDS) and uses it as its default string representation.

The SDS header records three pieces of information:

  • len records the length of the string stored in the SDS
  • the buf[] array holds the actual characters of the string
  • free records the number of unused bytes in the buf array

Linked list

A linked list is a commonly used data structure. C has no built-in linked list, so Redis implements its own.

The dictionary

A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure for holding key-value pairs. Each key in the dictionary is unique and can be used to look up or modify its value. C has no built-in dictionary either, so Redis builds its own.

Skip list

A skip list is an ordered data structure that allows fast access to nodes by keeping multiple pointers to other nodes in each node. It has the following properties (a search sketch in Java follows the list):

1. It is composed of many layers;

2. Each layer is an ordered linked list, arranged from top to bottom, and contains at least two linked list nodes, namely the head node in front and the nil node in the back.

3. The lowest linked list contains all elements;

4. If an element appears in the list at some level, it also appears in every list below that level (the elements at a higher level are a subset of those at the levels beneath it);

5. Each node contains two pointers: one to the next node in the same level's list and one to the same element's node in the level below.
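Here is a minimal, generic skip-list search sketch in Java that follows the properties above (it is not the Redis zskiplist source; names like SkipListNode are invented, and insertion with random level assignment is omitted):

```java
public class SkipListSketch {

    static class SkipListNode {
        final int score;
        final SkipListNode[] forward; // forward[i] = next node on level i

        SkipListNode(int score, int levels) {
            this.score = score;
            this.forward = new SkipListNode[levels];
        }
    }

    static final int MAX_LEVEL = 32;
    final SkipListNode head = new SkipListNode(Integer.MIN_VALUE, MAX_LEVEL);
    int level = 1; // number of levels currently in use

    // Start at the top level, move right while the next score is still smaller,
    // then drop down a level; on level 0 the next node is the first candidate match.
    boolean contains(int score) {
        SkipListNode x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].score < score) {
                x = x.forward[i];
            }
        }
        x = x.forward[0];
        return x != null && x.score == score;
    }
}
```

Because the upper levels skip over many nodes, a search only inspects O(log n) nodes on average instead of walking the entire bottom-level list.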

Ziplist (compressed list)

The ziplist was developed by Redis to save memory. It is a sequential data structure made up of a series of specially encoded, contiguous memory blocks. A ziplist can contain any number of entries, and each entry can hold a byte array or an integer value.

Principle: a ziplist does not compress data with any compression algorithm; it saves memory by encoding the data compactly, according to certain rules, in one contiguous area of memory.

Talk about cache avalanche

How a cache avalanche unfolds:

  • A large portion of the Redis cluster fails
  • The cache is unavailable, yet a large number of requests keep hitting the Redis cache service
  • After Redis fails at scale, the flood of requests is redirected to the MySQL database
  • MySQL cannot withstand the explosion in call volume and may go down outright
  • Since a large number of application services depend on MySQL and Redis, this quickly snowballs into an avalanche across the server clusters and ultimately a site outage.

How to resolve cache avalanche

The first solution: design the cache layer for high availability so that a large-scale cache failure cannot happen. Service continues even if individual nodes, individual machines, or even an entire machine room goes down. Redis Sentinel and Redis Cluster are both high-availability options.

The second solution: when loading data into Redis in batches, add a random value to each key's expiration time so that the data does not expire over a large area at the same moment; Redis should then be able to withstand the traffic. A sketch of this idea follows.
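A minimal sketch of that idea with Jedis, assuming a made-up key prefix and TTL numbers: every key gets a base TTL plus a random jitter, so keys loaded in the same batch do not all expire in the same instant.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredCacheLoader {
    private static final int BASE_TTL_SECONDS = 30 * 60;  // 30 minutes
    private static final int MAX_JITTER_SECONDS = 5 * 60; // up to 5 extra minutes

    public static void warmUp(Jedis jedis, Map<String, String> batch) {
        for (Map.Entry<String, String> e : batch.entrySet()) {
            int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(MAX_JITTER_SECONDS);
            jedis.setex("product:" + e.getKey(), ttl, e.getValue()); // spread out the expirations
        }
    }
}
```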

So you talk about cache breakdown

My way of understanding "breakdown" is a head-on hit: I am a spear, you are a shield, and I punch straight through the shield. For example, a few hot keys receive millions of concurrent requests that overwhelm Redis, or those hot keys in Redis expire at exactly that moment; either way, all of that traffic lands directly on the database. That is a cache breakdown.

How to solve cache breakdown

Use a distributed lock so that only one request at a time gets through to the database, protecting the last line of defense; and run Redis itself as a Sentinel-managed cluster so the cache layer stays available.

Also, a system's QPS normally has a known peak, and the cache should be given enough memory to absorb that peak. A sketch of the mutex-on-miss idea follows.
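A sketch of that idea with Jedis: on a miss for a hot key, only the caller that wins a short-lived lock rebuilds the cache from the database; everyone else briefly waits and re-reads the cache. The key names and loadFromDb() are placeholders, not real APIs.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class HotKeyLoader {
    public static String get(Jedis jedis, String key) throws InterruptedException {
        String value = jedis.get(key);
        while (value == null) {
            String lockKey = "lock:" + key;
            // NX = only set if absent, EX = auto-expire so a crashed holder cannot block forever
            String ok = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
            if ("OK".equals(ok)) {
                try {
                    value = loadFromDb(key);        // single trip to the database
                    jedis.setex(key, 300, value);   // rebuild the cache
                } finally {
                    jedis.del(lockKey);
                }
            } else {
                Thread.sleep(50);                   // someone else is rebuilding, wait and retry
                value = jedis.get(key);
            }
        }
        return value;
    }

    private static String loadFromDb(String key) {
        return "value-for-" + key;                  // stand-in for a real DB query
    }
}
```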

Tell me about cache penetration

Cache penetration is when requests ask for data that exists neither in the cache nor in the database, and the user keeps making such requests, for example for data with id "-1" or with an id that is implausibly large and does not exist. Such a user is likely an attacker, and the attack overburdens the database.

How to resolve cache penetration

The first defense is similar to the locking idea above: if the database lookup comes back empty, we still cache a null marker for that key, but with a short TTL (say 30 seconds), so the next request for the same key does not hit the database.

Second, validate request parameters when writing code and reject obviously invalid ones; I am sure we all do that already.

The third defense is the advanced bitmap usage we learned about in part 1: check a bitmap of existing ids and only continue if it contains the key. A sketch combining these defenses follows.
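A sketch combining the defenses above, assuming a pre-built bitmap of valid ids at a made-up key "ids:bitmap" and a placeholder loadFromDb():

```java
import redis.clients.jedis.Jedis;

public class PenetrationGuard {
    public static String get(Jedis jedis, String key, long id) {
        // parameter check plus bitmap check: reject ids that cannot exist
        if (id < 0 || !jedis.getbit("ids:bitmap", id)) {
            return null;
        }
        String value = jedis.get(key);
        if (value != null) {
            return "NULL".equals(value) ? null : value; // "NULL" marks a previously cached miss
        }
        value = loadFromDb(id);
        if (value == null) {
            jedis.setex(key, 30, "NULL");               // cache the empty result briefly
            return null;
        }
        jedis.setex(key, 300, value);
        return value;
    }

    private static String loadFromDb(long id) {
        return null;                                     // stand-in for a real DB query
    }
}
```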

Tell me how you solved the cache consistency problem

Causes of several kinds of cache inconsistency, and their solutions

Plan 1: update the database first, then delete the cache

What is the problem with this plan? Suppose the database update succeeds but the cache deletion fails; the cache is then left holding old data and is inconsistent with the database.

So we have to make sure the deletion succeeds. One option is to retry the delete several times if it fails. Another is to use a middleware such as Canal to listen to MySQL's binlog, parse out the keys that need deleting, and perform the deletes as in the first method (this has the extra benefit of being asynchronous and decoupled from the business code).

Plan 2: update the database first, then update the cache

This has even more failure modes: the cache update may fail after the database update succeeds, or the cache may be updated successfully and then the database transaction rolls back; either way the cache ends up inconsistent.

Plan 3: delete the cache first, then update the database

This looks best at first glance: even if the database update fails, the next read will still load fresh data (everything looks great). But it is not. Imagine a concurrent read and write: the writer deletes the cache, then a reader finds no cache, reads the old value from the database, and writes it back into the cache; only then does the writer's database update complete. The cache and the database are now inconsistent for a long time, so this plan is not desirable either.

To sum up, I think the best approach is to update the database first and then delete the cache, combined with subscribing to the binlog to retry the deletion several times. My own experience here is limited, so if you have a better approach, please share it. A sketch of the delete-then-delete-again idea follows.
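A sketch of that conclusion, minus the binlog part: write the database, delete the cache, and schedule one more delete shortly afterwards to clean up anything a racing reader put back. updateDb() and the 500 ms delay are illustrative assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;

public class WriteThroughUpdater {
    private static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public static void update(Jedis jedis, String key, String newValue) {
        updateDb(key, newValue);   // 1. update the database first
        jedis.del(key);            // 2. delete the cache
        // 3. delete again a little later to cover a reader that re-cached stale data
        //    (in real code take a connection from a pool here; a Jedis instance is not thread-safe)
        scheduler.schedule(() -> jedis.del(key), 500, TimeUnit.MILLISECONDS);
    }

    private static void updateDb(String key, String value) {
        // stand-in for the real SQL update
    }
}
```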

Talk about Redis's eviction policies

  • noeviction: new writes return an error when memory is insufficient to hold the new data.
  • allkeys-lru: when memory is insufficient for new writes, remove the least recently used key from the whole key space.
  • allkeys-random: when memory is insufficient for new writes, remove a random key from the whole key space.
  • volatile-lru: when memory is insufficient for new writes, remove the least recently used key from among the keys that have an expiration set.
  • volatile-random: when memory is insufficient for new writes, remove a random key from among the keys that have an expiration set.
  • volatile-ttl: when memory is insufficient for new writes, remove, from among the keys that have an expiration set, the key whose expiration time is nearest.

Personally I think volatile-lru is a good choice: it never has to report errors to callers, you can set up an alarm on memory usage, and if that is still not enough you can adjust as you see fit. A small configuration sketch follows.
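A small sketch of setting that policy at runtime with Jedis; the 2gb limit is an arbitrary example value, and the same two settings can be put in redis.conf instead (maxmemory / maxmemory-policy).

```java
import redis.clients.jedis.Jedis;

public class EvictionConfig {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.configSet("maxmemory", "2gb");
            jedis.configSet("maxmemory-policy", "volatile-lru");
            System.out.println(jedis.configGet("maxmemory-policy"));
        }
    }
}
```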

Let’s talk about Redis’ persistence strategy

Redis provides two ways to persist:

  • RDB: Snapshots can be taken of your data at specified intervals.
  • AOF: Records each write operation to the server. When the server is restarted, these commands are executed again to restore the original data.

Talk about RDB

RDB is the default persistence mode. You configure save points such as "snapshot if at least 1 key changed within 900 seconds", "if at least 10 keys changed within 300 seconds", and so on. If you want to disable RDB it is very easy: just put save "" as the last save line in the configuration.

RDB persistence can be triggered in two ways: manually, or automatically by Redis on the configured schedule. The manual triggers are:

  • SAVE: blocks the Redis server until persistence is complete; it should not be used in production.

  • BGSAVE: forks a child process that performs the persistence, so the server only blocks while the child process is being forked.

Tell me about AOF

AOF is enabled with appendonly yes. The appendfsync setting has three modes:

  • always: fsync every write command to the AOF immediately; slow but the safest
  • everysec: fsync once per second; a compromise (the default)
  • no: Redis does not fsync at all and leaves it to the OS; very fast but the least safe

The AOF process can be roughly divided into two parts: real-time appending of commands (with at most about 1s of loss under the appendfsync everysec configuration) and rewriting of the AOF file.

Appending works as follows: command write → append to aof_buf → sync to the AOF file on disk. Why write to the buffer first instead of syncing straight to disk? Because writing to disk on every command would cause heavy disk I/O and hurt overall performance. A small runtime configuration sketch follows.
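A small sketch of flipping those settings at runtime with Jedis (the same pair can live in redis.conf as appendonly yes / appendfsync everysec):

```java
import redis.clients.jedis.Jedis;

public class AofConfig {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.configSet("appendonly", "yes");        // start appending write commands to the AOF
            jedis.configSet("appendfsync", "everysec");  // fsync once per second (the compromise default)
        }
    }
}
```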

How does Redis recover data on startup

On startup, Redis checks whether an AOF file exists and loads it if so; only if there is no AOF does it load the RDB. Why prefer the AOF? Because the AOF holds the more complete data: as analyzed above, it loses at most about 1s of writes.

Let’s talk about the performance of persistence

Some experience from production:

  • If the data in Redis is not particularly critical, or the data can be regenerated some other way, persistence can simply be turned off; if data is lost, it is rebuilt by other means.
  • Define your own policy that periodically checks on Redis and manually triggers backups and AOF rewrites (see the sketch after this list);
  • Add master/slave machines and dedicate one slave to backup processing, while the other machines keep responding to client commands normally.
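A sketch of the second bullet, with host, port and a daily schedule chosen purely for illustration: a background task that periodically triggers an RDB snapshot and an AOF rewrite.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;

public class BackupScheduler {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.bgsave();          // fork a child and write an RDB snapshot
                jedis.bgrewriteaof();    // compact the AOF file in the background
            }
        }, 0, 24, TimeUnit.HOURS);
    }
}
```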

Talk about the master-slave mode in Redis

Characteristics of master-slave architecture

  • The primary server is responsible for receiving write requests
  • The slave server is responsible for receiving read requests
  • Data is replicated from the master server to the slave servers, so the data on master and slaves stays consistent

Benefits of master-slave architecture

  • Read/write separation (the master handles writes, the slaves handle reads)
  • High availability (if one slave goes down, the other slaves can keep serving requests without affecting the service)
  • More concurrency (every slave can serve read requests, so read QPS goes up)

Talk about master-slave synchronization

One characteristic of the master-slave architecture is that data on the master and slave servers stays consistent. There are two cases of master/slave synchronization:

Full resynchronization

  • The slave server sends the PSYNC command to the master server.
  • On receiving PSYNC, the master executes BGSAVE to generate an RDB file in the background, and uses a buffer to record every write command executed from that point on.
  • When BGSAVE finishes, the master sends the generated RDB file to the slave, which loads it and brings its database to the state the master was in when BGSAVE was executed.
  • The master then sends all the buffered write commands to the slave, and the slave executes them, bringing the data into final consistency.

Partial resynchronization

  • Both master and slave maintain a replication offset: every time the master propagates N bytes, it adds N to its own offset.
  • Every time the slave receives N bytes from the master, it adds N to its own offset.
  • By comparing the master and slave replication offsets, it is easy to tell whether the data on the two servers is in a consistent state.

Tell me about Redis's high-availability solutions

Redis is generally deployed in master/slave mode (in this discussion the slave instance is mainly used for backup, while the master instance serves reads and writes). In this mode there are several schemes for achieving HA:

  • Keepalived: a Keepalived virtual IP provides a single access point for the master and slave. When the master fails, Keepalived runs a script that promotes the slave to master; when the old master recovers, it first synchronizes the data and then automatically becomes the slave. The downside is that introducing Keepalived adds deployment complexity and can, in some cases, lead to data loss.
  • ZooKeeper: use ZooKeeper to monitor the master and slave instances and maintain the latest effective IP address; applications obtain the address from ZooKeeper and then access Redis. This solution requires writing a fair amount of monitoring code.
  • Sentinel: monitor the master/slave instances through Sentinel and recover from failures automatically. The catch is that, because the master and slave instances have different addresses (IP & port), the application would not know the new address after a failover. Sentinel support was therefore added in Jedis 2.2.2: applications that obtain a Jedis instance via redis.clients.jedis.JedisSentinelPool.getResource() are updated to the new master's address automatically. A minimal example follows.
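A minimal Sentinel client sketch with Jedis; the master name "mymaster" and the sentinel addresses are placeholders for a real deployment.

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelExample {
    public static void main(String[] args) {
        Set<String> sentinels = new HashSet<>();
        sentinels.add("192.168.0.11:26379");
        sentinels.add("192.168.0.12:26379");
        sentinels.add("192.168.0.13:26379");

        // The pool asks the sentinels who the current master is, and re-resolves
        // the address after a failover, so the application never hard-codes it.
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.set("hello", "world");
        }
    }
}
```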

What about the concept of Redis hash slots? What are consistent hashing and hash slots, and how do they differ?

This question really asks how Redis decides which node each key is stored on in a cluster. A Redis cluster has 16384 built-in hash slots. When a key-value pair needs to be placed in the cluster, Redis first runs the CRC16 algorithm over the key and then takes the result modulo 16384, so every key maps to a hash slot numbered between 0 and 16383. Redis then assigns the hash slots to the different nodes roughly evenly, according to the number of nodes.

Redis Cluster uses its own simple CRC16-based scheme rather than consistent hashing. CRC16(key) mod 16384 does the job well: it is not as flexible as consistent hashing, but it is easy to implement and makes adding and removing nodes straightforward. A sketch of the slot calculation follows.
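A sketch of the slot calculation in Java. This uses the CRC16/XMODEM variant (polynomial 0x1021, initial value 0), which is the variant the Redis Cluster specification describes; hash tags ({...}) are ignored here for simplicity.

```java
import java.nio.charset.StandardCharsets;

public class SlotCalculator {

    // Bitwise CRC16/XMODEM over the key bytes.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    public static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // the cluster spec lists CRC16("123456789") as 0x31C3, i.e. slot 12739
        System.out.println(slot("123456789"));
    }
}
```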

Talk about distributed locking

In distributed-system interviews you will almost certainly be asked how a distributed lock is implemented. You can answer with the SET key value NX EX command plus a Lua script to make the unlock atomic; that is fine (a sketch follows), but in production we are more likely to use an open-source framework, so it is worth bringing up Redisson for distributed locking.
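A minimal sketch of the SET NX EX plus Lua approach with Jedis; the lock key, TTL and token scheme are illustrative assumptions, not a production-ready lock.

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {
    // Only delete the key if it still holds *our* token, so we never release
    // a lock that has already expired and been re-acquired by someone else.
    private static final String UNLOCK_LUA =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static String tryLock(Jedis jedis, String lockKey, int ttlSeconds) {
        String token = UUID.randomUUID().toString();
        String result = jedis.set(lockKey, token, SetParams.setParams().nx().ex(ttlSeconds));
        return "OK".equals(result) ? token : null;   // the token is the proof of ownership
    }

    public static boolean unlock(Jedis jedis, String lockKey, String token) {
        Object r = jedis.eval(UNLOCK_LUA, Collections.singletonList(lockKey),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(r);
    }
}
```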

How does Redisson implement distributed locking

  • Step 1: try to acquire the lock with a Lua script that returns the lock's remaining expiration time; if it returns null, the lock has been acquired (success is returned). The script also checks whether this key and client already hold the lock, in which case the lock is re-entered; otherwise this attempt fails and the remaining TTL is returned.
  • Step 2: if the first attempt fails, the remaining maximum wait time is checked; if it has already run out, acquisition fails.
  • Step 3: it then subscribes to the Redis unlock event for this lock, so that when someone releases the lock all waiting threads are notified to compete for it again (which reduces CPU consumption).
  • Step 4: it loops trying to acquire the lock; on every iteration it first checks whether the maximum wait time has been exceeded and gives up if so. The call only returns success once the lock is actually acquired, and returns failure when the wait time runs out (inside the loop it also waits on the unlock notification rather than spinning).
  • To summarize, RedissonLock is reentrant, handles retrying after failed attempts, lets you set a maximum wait time, and includes optimizations that cut down useless lock requests and improve resource utilization. A minimal usage sketch follows.
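A minimal Redisson usage sketch; the address, lock name and timing values are assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonLockExample {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RLock lock = redisson.getLock("order:lock");
        // wait up to 10s to acquire, auto-release after 30s in case we never unlock
        if (lock.tryLock(10, 30, TimeUnit.SECONDS)) {
            try {
                // critical section
            } finally {
                lock.unlock();
            }
        }
        redisson.shutdown();
    }
}
```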

The end

So that's Redis; next up, let's review ES.

The daily ask for likes

OK everyone, that's all for this article. If you made it all the way here, you are a true fan.

Writing is not easy; your support and recognition are the biggest motivation for my work. See you in the next article.

Search WeChat for "six pulse Excalibur program life" and reply 888 to get a big bundle of materials I have collected for you.