
1. Have you heard of Redis? What is it?

Redis is a database, but unlike traditional databases, Redis keeps its data in memory, so reads and writes are very fast. This is why Redis is so widely used as a cache.

In addition, Redis is often used for distributed locks, and it provides multiple data types to support different business scenarios. Redis also supports transactions, persistence, Lua scripting, LRU eviction, and a variety of clustering schemes.

2. Five data structures of Redis

Simple Dynamic String (SDS)

Instead of using traditional C strings directly, Redis builds an abstraction called the simple dynamic string (SDS) and uses it as its default string representation.

In effect, SDS plays the role of char * in C, but it can store arbitrary binary data. Because the end of the data cannot be marked with the character '\0' as in a C string, an SDS must carry its own length field.

Definition

struct sdshdr {
    // Number of bytes used in the buf array,
    // equal to the length of the string held by the SDS
    int len;

    // Number of unused bytes in the buf array
    int free;

    // Byte array that holds the string
    char buf[];
};

Advantages

  • The complexity of getting string length is O(1).
  • Prevent buffer overflows.
  • Reduces the number of memory reallocations required to modify the string length.
  • Binary security.
  • Compatible with some C string functions.

Strings support very general set/get operations; a value can be a string or a number, and strings are commonly used for counters and other simple cached values.

Linked list

Redis uses a linked list as the underlying implementation of a list key when the list contains a large number of elements, or when the elements are long strings.

Node underlying structure

typedef struct listNode {
    // Previous node
    struct listNode *prev;
    // Next node
    struct listNode *next;
    // Node value
    void *value;
} listNode;

List underlying structure

typedef struct list {
    // Head node
    listNode *head;
    // Tail node
    listNode *tail;
    // Number of nodes in the linked list
    unsigned long len;
    // Node value copy function
    void *(*dup)(void *ptr);
    // Node value free function
    void (*free)(void *ptr);
    // Node value comparison function
    int (*match)(void *ptr, void *key);
} list;

Features

  • Linked lists are widely used to implement various Redis features, such as list keys, publish/subscribe, the slow query log, and monitor.
  • Each list node is represented by a listNode structure, and each node has pointers to both its previous and next node, so Redis's list implementation is a doubly linked list.
  • Each list is represented by a list structure, which holds pointers to the head node and the tail node, along with the length of the list.
  • Because the prev pointer of the head node and the next pointer of the tail node both point to NULL, Redis's list implementation is acyclic.
  • By setting different type-specific functions on a list, Redis lists can hold values of various types.

Dictionary

The dictionary is implemented on top of a hash table and behaves like a map (a key-value container) in C++.

Hash table

typedef struct dictht {
    // Hash table array
    dictEntry **table;
    // Hash table size
    unsigned long size;
    // Hash table size mask, used to compute index values;
    // always equal to size - 1
    unsigned long sizemask;
    // Number of nodes currently stored in the hash table
    unsigned long used;
} dictht;

The hash algorithm

When a dictionary is used as the underlying implementation of a database or of a hash key, Redis uses the MurmurHash algorithm to hash keys. The advantage of this algorithm is that even when the input keys follow a pattern, it still produces a well-distributed result, and it computes very quickly.
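As a small illustration of how sizemask is used (a sketch, not the exact Redis source): because size is always a power of two, masking the hash value with sizemask is equivalent to taking it modulo size, but faster.

// Sketch: locating the bucket for a key in a dictht.
// size is a power of 2, so (hash & sizemask) == (hash % size).
unsigned long bucket_index(dictht *ht, unsigned long hash) {
    return hash & ht->sizemask;   // sizemask == size - 1
}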

Hash collision resolution

Redis's hash table resolves key collisions with separate chaining: each hash table node has a next pointer, and nodes that hash to the same bucket are joined into a singly linked list.

Features

  1. Dictionaries are widely used to implement various Redis features, including databases and hash keys.
  2. Dictionaries in Redis are implemented using hash tables as the underlying structure, with each dictionary having two hash tables, one for normal use and one for rehash only.
  3. Redis uses the MurmurHash2 algorithm to compute the hash value of the key.
  4. Hash tables use the chained address method to resolve key collisions.

Skip list

Consider an ordinary sorted linked list. To find an element, we must traverse from the head node until we either find the target or reach the first node larger than it (not found). The time complexity is O(N).

This lookup is inefficient. But if we promote some of the nodes one level, say every second node gains a second level, then the second level contains only half as many nodes as the first, and search efficiency improves.

The search starts at the highest level of the head node. Whenever the next node at the current level is larger than the target, we step back to the previous node and continue the search one level down.

Let’s say we want to find 16:

  1. Start at the top level of the head node and move forward to node 7.
  2. The node after 7 at that level is 39, which is greater than 16, so we stay at node 7.
  3. Drop down one level from 7 and continue forward; there we find 16.

In this small example we traverse no fewer nodes than in a one-level list, but the advantage shows when there are more nodes and larger values to find. If we were looking for 39, we would only need to visit two nodes (7, 39), half the number a one-level list requires.

To avoid the O(N) cost that insertions would incur if a strict 2:1 ratio had to be maintained between adjacent levels, a skip list instead assigns each inserted element a random number of levels.

The random number of levels is calculated as follows (see the sketch after this list):

  1. Every node has at least the first level.
  2. The probability that it also has a second level is p, and the probability that it has a third level is p².
  3. The number of levels cannot exceed the maximum.
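A sketch of this level randomization in C (modeled on Redis's zslRandomLevel; p = 0.25 and a cap of 32 levels are the values Redis conventionally uses):

#include <stdlib.h>

#define SKIPLIST_MAXLEVEL 32   // matches the 1..32 range mentioned below
#define SKIPLIST_P 0.25        // probability of promoting a node one more level

// Every node gets at least level 1; each additional level is granted
// with probability SKIPLIST_P, capped at SKIPLIST_MAXLEVEL.
int random_level(void) {
    int level = 1;
    while ((random() & 0xFFFF) < (long)(SKIPLIST_P * 0xFFFF) &&
           level < SKIPLIST_MAXLEVEL)
        level++;
    return level;
}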

zskiplistNode

typedef struct zskiplistNode {
    // Backward pointer
    struct zskiplistNode *backward;
    // Score (weight)
    double score;
    // Member object
    robj *obj;
    // Levels
    struct zskiplistLevel {
        // Forward pointer
        struct zskiplistNode *forward;
        // Span (number of nodes the forward pointer skips over)
        unsigned int span;
    } level[];
} zskiplistNode;

In general, the more levels a node has, the faster other nodes can be reached from it.

zskiplist

typedef struct zskiplist {
    // Head and tail nodes
    struct zskiplistNode *header, *tail;
    // Number of nodes in the table
    unsigned long length;
    // Level of the node with the highest level in the table
    int level;
} zskiplist;

Features

  • Skip lists are one of the underlying implementations of sorted sets.
  • Redis's skip list implementation consists of the zskiplist and zskiplistNode structures: zskiplist stores information about the skip list itself (head node, tail node, length), while zskiplistNode represents a skip list node.
  • The height of each skip list node is a random number between 1 and 32.
  • In the same skip list, multiple nodes may have the same score, but each node's member object must be unique.
  • Nodes in a skip list are sorted by score, and nodes with equal scores are sorted by the member object.
  • A skip list is essentially a multi-level linked list. It is simple to implement, yet its search efficiency rivals that of an optimized balanced binary tree while being easier to implement than a balanced tree.

Compressed list (ziplist)

The ziplist is one of the underlying implementations of list keys and hash keys. When a list key contains only a small number of items, and each item is either a small integer or a short string, Redis uses a ziplist as the underlying implementation of the list key.

Features

As the name suggests, it is a list structure designed to save memory.

3. What are the common data structures and usage scenarios of Redis?

String

The String data structure is a simple key-value type; a value can be a number as well as a string. Common uses: general key-value caching, and counters such as the number of posts or followers on Weibo.
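For example, a counter with the hiredis C client might look like this (a sketch; the key names are made up for illustration):

#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    // Simple key-value caching
    freeReplyObject(redisCommand(c, "SET user:1:name %s", "alice"));

    // Counter (e.g. number of fans), incremented atomically
    redisReply *r = redisCommand(c, "INCR user:1:fans");
    printf("fans = %lld\n", r->integer);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}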

Hash

A hash is a map of string-type fields to values. Hashes are particularly good for storing objects, because you can modify the value of a single field directly. For example, a hash can store user information, product information, and so on.
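A sketch of storing an object in a hash (hiredis; assumes a connected redisContext *c, and the key and field names are illustrative):

#include <hiredis/hiredis.h>

// Store a user object field by field, then change one field in place
// without rewriting the whole object.
void save_user(redisContext *c) {
    freeReplyObject(redisCommand(c, "HSET user:1 name %s age %s", "alice", "20"));
    freeReplyObject(redisCommand(c, "HSET user:1 age %s", "21"));
}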

List

A list is a linked list. Redis lists have many application scenarios and are one of the most important Redis data structures. For example, features such as a Weibo following list, fan list, or message list can all be implemented with the Redis list structure.

Redis's list is implemented as a doubly linked list, so it supports reverse lookup and traversal, which makes it easier to operate on, though it brings some extra memory overhead.

In addition, the lrange command reads a given number of elements starting from a certain position; with it you can implement paging queries on top of a list, which is a great feature.
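A sketch of list-based paging with lrange (hiredis; assumes a connected redisContext *c):

#include <stdio.h>
#include <hiredis/hiredis.h>

// Page through a timeline list: LRANGE key start stop (stop is inclusive).
void show_page(redisContext *c, const char *key, int page, int page_size) {
    int start = page * page_size;
    int stop  = start + page_size - 1;
    redisReply *r = redisCommand(c, "LRANGE %s %d %d", key, start, stop);
    for (size_t i = 0; i < r->elements; i++)
        printf("%s\n", r->element[i]->str);
    freeReplyObject(r);
}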

Set

A set provides features similar to a list, with the special property that a set automatically removes duplicates.

A set is a good choice when you need to store a list of items without duplicates, and sets offer an important operation that lists do not: checking whether a member is in the set. Intersection, union, and difference operations can also be performed easily on sets.

For example, in a microblogging application, all the people a user follows can be stored in one set, and all of the user's fans in another. Redis then makes features such as common followings, common fans, and common interests very convenient to implement; this is essentially an intersection operation. The command is: sinterstore key1 key2 key3, which stores the intersection of key2 and key3 in key1.
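A sketch of the common-following example (hiredis; assumes a connected redisContext *c, and the key names are illustrative):

#include <hiredis/hiredis.h>

// Each user's followings are kept in a set; the intersection of two
// follow sets gives the users followed by both.
void common_following(redisContext *c) {
    freeReplyObject(redisCommand(c, "SADD following:userA %s %s", "tom", "amy"));
    freeReplyObject(redisCommand(c, "SADD following:userB %s %s", "tom", "bob"));
    // Store the intersection in common:AB (here it contains just "tom")
    freeReplyObject(redisCommand(c,
        "SINTERSTORE common:AB following:userA following:userB"));
}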

Sorted Set

Compared with a set, a sorted set has an extra weight parameter, score, so that the elements in the set can be ordered by score.

For example, in a live-streaming system, the real-time ranking data, such as the list of online users in a room, the various gift leaderboards, and the bullet-screen messages (which can be viewed as a message list ordered along the message dimension), is well suited to the sorted set structure in Redis.
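A leaderboard sketch with a sorted set (hiredis; assumes a connected redisContext *c, and the key name is illustrative):

#include <stdio.h>
#include <hiredis/hiredis.h>

// Gift leaderboard: the score is the gift count, highest first.
void top_gifts(redisContext *c) {
    freeReplyObject(redisCommand(c, "ZADD gifts 35 %s", "rocket"));
    freeReplyObject(redisCommand(c, "ZADD gifts 80 %s", "plane"));
    // Top 10 members with their scores, ordered by score descending
    redisReply *r = redisCommand(c, "ZREVRANGE gifts 0 9 WITHSCORES");
    for (size_t i = 0; i + 1 < r->elements; i += 2)
        printf("%s: %s\n", r->element[i]->str, r->element[i + 1]->str);
    freeReplyObject(r);
}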

4. Isn’t MySQL enough? Why use a new database like Redis?

This is mainly because Redis has two characteristics: high performance and high concurrency.

  • High performance: when a user accesses some data in the database for the first time, the process is slow because it must be read from disk. If the accessed data is stored in the cache, the next access can be served directly from the cache. Operating on the cache means operating on memory, so it is very fast. If the corresponding data in the database changes, the corresponding data in the cache can be changed in sync!
  • High concurrency: the cache can withstand far more requests than direct access to the database, so we can move some of the data from the database into the cache; part of the users' requests then go straight to the cache without touching the database.

5. Map in C++ is also a cache type data structure. Why not use Map instead of Redis?

Strictly speaking, caches are divided into local caches and distributed caches.

Take C++ as an example: we can use the STL's map container to implement a cache, but that is only a local cache. Its main strengths are that it is lightweight and fast, but its lifetime ends when the program exits, and with multiple instances, each instance keeps its own copy of the cache, so the caches are not consistent.

Using something like Redis or Memcached is called a distributed cache. With multiple instances, all instances share one copy of the cached data, and the cache is consistent. This is the advantage of Redis and Memcached, but the drawback is the architectural complexity of keeping the Redis or Memcached service highly available.

6. What are the benefits of using Redis?

1. Fast access, because the data is stored in memory, similar to a HashMap in Java or a hash table in C++ (such as unordered_map/unordered_set), both of which offer O(1) lookup and update.

2. Rich data types: supports the five structures String, list, set, sorted set, and hash.

3. Supports transactions: operations in Redis are atomic, meaning changes to data are either all applied or not applied at all.

4. Rich features: Redis can be used for caching and messaging; you can set an expiration time per key, and the key is deleted automatically after it expires.

7. What is the difference between Memcached and Redis?

1. Storage mode

  • Memcached stores all its data in memory; it loses everything after a power failure and has no persistence capability.
  • Redis can save its data to disk, ensuring data durability.

2. Supported data types

  • Memcached's data type support is relatively simple: there is only one type, the string.
  • Redis has rich data types: besides simple K/V data, it also provides list, set, zset, hash, and other data structures.

3. Different underlying models are used

  • Their underlying implementations, and the application protocols they use to communicate with clients, are different.
  • Redis builds its own VM mechanism directly, because normal system calls to system functions waste a certain amount of time moving and requesting memory.

4. Cluster mode: Memcached has no native cluster mode; you must rely on the client to shard data across the cluster. Redis, by contrast, supports cluster mode natively.

5. Memcached uses a multi-threaded, non-blocking IO multiplexing network model; Redis uses a single-threaded IO multiplexing model.

6. Different value size limits: a Redis value can be up to 512MB; Memcached allows only 1MB.

8. What are Redis's advantages over Memcached?

1. All values in Memcached are simple strings; Redis, as an alternative to it, supports richer data types.

2. Redis is much faster than Memcached.

3. Redis can persist its data.

9. What are hot and cold data in caches?

Hot data is data that is accessed frequently, and cold data is data that is accessed rarely or never.

Note that caching is only valuable for hot data. Most cold data may be evicted from memory before it is ever accessed again; it takes up memory while providing little value.

For caching to make sense, the data should be read at least twice before it is updated. This is the most basic consideration: if a cached entry expires or is invalidated before it is ever read, caching it provided little value.

10. Why is Redis single-threaded?

This is mainly an empirical consideration. Since Redis operates entirely on memory, the CPU is not Redis's bottleneck; the bottleneck is most likely the machine's memory size or the network bandwidth. Since single-threading is easy to implement and the CPU is not a bottleneck, the single-threaded solution is the natural choice (multithreading causes a lot of trouble, after all!).

11. Why is single-threaded Redis so fast?

There are three main reasons: 1. all Redis operations are pure in-memory operations; 2. Redis uses a single thread, which effectively avoids frequent context switches; 3. it adopts a non-blocking I/O multiplexing mechanism.

12. Can you explain Redis's threading model?

If you look at the Redis source code, you will find that Redis uses a file event handler internally. This file event handler is single-threaded, which is why Redis is called a single-threaded model. It uses an IO multiplexing mechanism to monitor multiple sockets at the same time and selects the appropriate event handler according to the events on each socket.

The structure of the file event handler consists of four parts:

  • Multiple sockets
  • An IO multiplexing program
  • File event dispatcher
  • Event handler (connection reply handler, command request handler, command reply handler)

The I/O multiplexing program listens on multiple sockets at the same time and associates each socket with a different event handler based on the task the socket is currently performing. When a monitored socket becomes ready for accept, read, write, close, and so on, the file event corresponding to the operation is generated, and the file event dispatcher calls the event handler associated with that socket to handle it.

Multiple sockets may concurrently produce different operations, each corresponding to a different file event, but the IO multiplexing program listens on all of them, queues the events the sockets generate, and the event dispatcher takes events off the queue one at a time, passing each to its corresponding event handler for processing.

In a nutshell: the I/O multiplexing program listens on multiple sockets and delivers the sockets that generated events to the file event dispatcher.
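For intuition, here is a minimal epoll-based event-loop sketch in C (illustrative only: Redis's actual ae event loop abstracts over select/epoll/kqueue, and accept_client/handle_command are hypothetical helpers):

#include <sys/epoll.h>

extern int accept_client(int listen_fd);   // hypothetical helper
extern void handle_command(int client_fd); // hypothetical helper

// One thread watches many sockets; ready sockets are dispatched in turn.
void event_loop(int listen_fd) {
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event ready[64];
        int n = epoll_wait(ep, ready, 64, -1);  // block until events arrive
        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            if (fd == listen_fd) {              // connection event
                int cfd = accept_client(listen_fd);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {                            // command request event
                handle_command(fd);
            }
        }
    }
}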

13. Redis lets you set key expiration times. What are the two ways expired keys get deleted?

Redis has a time expiration feature that allows you to set an expiration time for values stored in the Redis database.

As a cache database, this is very practical. For example, tokens, login information, and especially SMS verification codes are all time-limited. With a traditional database you would have to check the expiry yourself, which would seriously hurt performance.

When we set a key, we can give it an expire time, which specifies how long the key lives (an example follows the list below). Expired keys are removed mainly through periodic deletion and lazy deletion.

  • Periodic deletion: by default, Redis randomly samples some keys that have an expiration time every 100ms, checks whether they have expired, and deletes the expired ones. Note that this is a random sample. Why random? Think about it: if Redis stored hundreds of thousands of keys and scanned every key with an expiration time every 100ms, it would put a heavy load on the CPU!
  • Lazy deletion: periodic deletion can leave many expired keys undeleted when their time is up. So there is also lazy deletion: an expired key is not deleted immediately; its expiry is checked the next time it is accessed, and only then is it deleted. The obvious drawback of lazy deletion is wasted memory: unless your system accesses the key again, Redis never deletes it. That is why it is called lazy!
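For example, setting an expiry when writing a key, such as an SMS verification code (hiredis sketch; assumes a connected redisContext *c, and the key format is illustrative):

#include <hiredis/hiredis.h>

// Cache an SMS code for 5 minutes; Redis removes it afterwards via the
// periodic/lazy deletion mechanisms described above.
void cache_sms_code(redisContext *c, const char *phone, const char *code) {
    // SET key value EX seconds sets the value and the TTL atomically
    freeReplyObject(redisCommand(c, "SET sms:%s %s EX %d", phone, code, 300));
}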

14. Are periodic and lazy deletion guaranteed to delete expired data? If not, how does Redis handle it?

There is no guarantee that all expired data gets deleted. Redis has a memory eviction mechanism to ensure data is eventually removed.

First, let’s look at the workflows for periodic and lazy deletion:

1. Periodic deletion: by default, Redis checks every 100ms for expired keys and deletes the expired ones it finds. Note that Redis does not check all keys every 100ms, but randomly samples them (if every key were checked every 100ms, Redis would grind to a halt). So if you rely only on the periodic deletion policy, many keys will never be deleted.

2. This is where lazy deletion comes in handy: when you access a key, Redis checks whether that key has expired, and deletes it if it has.

3. Are there still problems with periodic + lazy deletion combined? Yes: a key may be missed by periodic deletion, and if you also never request that key again, lazy deletion never takes effect either.

4. Memory usage then grows higher and higher, and the memory eviction mechanism has to take over.

In redis.conf, there is one line of configuration: maxmemory-policy volatile-lru

  • volatile-lru: evict the least recently used data from the set of keys with an expiration time (server.db[i].expires)
  • volatile-ttl: evict data that is about to expire from the set of keys with an expiration time (server.db[i].expires)
  • volatile-random: evict random data from the set of keys with an expiration time (server.db[i].expires)
  • allkeys-lru: evict the least recently used data from the whole dataset (server.db[i].dict)
  • allkeys-random: evict random data from the whole dataset (server.db[i].dict)
  • noeviction: never evict data

PS: if no key has an expiration time set with EXPIRE, the prerequisite is not met, and the volatile-lru, volatile-random, and volatile-ttl policies behave essentially the same as noeviction.

15. How does Redis handle a large number of requests?

1. Redis is a single-threaded program, which means it can only handle one client request at a time.
2. Redis handles multiple client requests through IO multiplexing (select, epoll, kqueue, with different implementations depending on the platform).

16. Cache avalanche, cache penetration, cache warming, cache update, cache breakdown, cache degradation: all in one place!

Cache avalanche

A cache avalanche is a large-scale failure of the cache at the same moment, so that subsequent requests all fall on the database, which collapses under the flood of requests in a short period of time.

Still not clear? Then let me put it plainly.

Put simply: because the old cache entries expire (for example, we set the same expiration time when caching, so a large area of the cache expires at the same moment) while the new cache is not yet populated, all the requests that should have hit the cache go to query the database instead. This puts enormous pressure on the database's CPU and memory, can even bring the database down, and triggers a chain reaction that crashes the whole system.

The solution

  • Beforehand: keep the Redis cluster as highly available as possible, replace failed machines quickly, and choose a suitable memory eviction policy.
  • During: local Ehcache caching + Hystrix rate limiting and degradation to keep MySQL from being overwhelmed; use locks or queues to control the number of threads that read the database and write the cache. For example, for a given key, allow only one thread to query the data and write the cache while the other threads wait.
  • Afterwards: use data saved by the Redis persistence mechanism to restore the cache as quickly as possible.

Cache penetration

Generally, an attacker deliberately requests data that does not exist in the cache, causing all requests to fall on the database, which collapses under the load in a short period of time.

You don’t understand that either? Well, let me rephrase it.

Cache penetration means querying data that does not exist at all: the cache misses, the database is then queried but finds nothing either, so nothing is written back to the cache. As a result, every such query goes to the database, and the cache is penetrated.

The solution

1. Bloom filter

The most common solution is to hash all data that could possibly exist into a sufficiently large bitmap; a query for data that does not exist is then intercepted by the bitmap, which shields the underlying storage system from the query pressure.

All possible query parameters are stored in hashed form and checked at the control layer first; requests that do not match are discarded, again shielding the underlying storage system from query pressure.

Here’s a little bit about the Bloom filter.

A Bloom filter introduces k (k > 1) mutually independent hash functions to decide, within a given space and false-positive rate, whether an element is present. Its advantage is that its space efficiency and query time far exceed those of ordinary algorithms; its disadvantages are a certain false-positive rate and difficulty with deletion.

The core idea of the algorithm is to use multiple different hash functions to resolve "conflicts". A single hash has collision problems: two different URLs may hash to the same value. To reduce conflicts, we introduce several more hash functions. If any one of them concludes an element is not in the set, then the element is definitely not in the set; only when all the hash functions say the element is present do we treat it as (probably) present. This is the basic idea behind Bloom filters, which are commonly used to test whether an element exists in a very large data set.
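A minimal Bloom filter sketch in C (the hash mixing here is an illustrative FNV-1a variant, not the MurmurHash a production implementation would use; the sizes are arbitrary):

#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS (1 << 20)   // bitmap size; must be generous for the key set
#define BLOOM_K 3              // number of independent hash functions

static unsigned char bitmap[BLOOM_BITS / 8];

// Seeded hash standing in for BLOOM_K independent hash functions.
static uint64_t hash_i(const char *s, uint64_t seed) {
    uint64_t h = 1469598103934665603ULL ^ seed;
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ULL; }
    return h % BLOOM_BITS;
}

void bloom_add(const char *key) {
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t bit = hash_i(key, i);
        bitmap[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
}

// false => definitely absent; true => probably present (false positives possible)
bool bloom_maybe_contains(const char *key) {
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t bit = hash_i(key, i);
        if (!(bitmap[bit / 8] & (1u << (bit % 8)))) return false;
    }
    return true;
}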

2. Cache empty objects

When the storage layer finds no match, we cache the returned empty object anyway and give it an expiration time; later accesses to this key are served from the cache, protecting the backend data source. If a query returns empty data (whether because the data does not exist or because of a system failure), we still cache the empty result, but with a short expiration time of no more than five minutes.
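A sketch of the read path with null-object caching (hiredis; assumes a connected redisContext *c and a hypothetical db_query helper):

#include <hiredis/hiredis.h>
#include <stdlib.h>
#include <string.h>

extern char *db_query(const char *key);   // hypothetical database lookup

// Serve from cache when possible; cache an empty sentinel with a short
// TTL on a database miss so repeated misses stop reaching the database.
char *get_with_null_cache(redisContext *c, const char *key) {
    redisReply *r = redisCommand(c, "GET %s", key);
    if (r->type == REDIS_REPLY_STRING) {
        char *v = strdup(r->str);          // may be "" (the cached null)
        freeReplyObject(r);
        return v;
    }
    freeReplyObject(r);
    char *v = db_query(key);
    if (v == NULL) {
        // Cache the miss briefly (60s here, well under five minutes)
        freeReplyObject(redisCommand(c, "SET %s %s EX %d", key, "", 60));
        return NULL;
    }
    freeReplyObject(redisCommand(c, "SET %s %s EX %d", key, v, 600));
    return v;
}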

But there are two problems with this approach:

1. If null values can be cached, this means that the cache needs more space to store more keys, because there may be a lot of null keys;

2. Even if the expiration time is set for null values, data at the cache layer and storage layer will be inconsistent for a period of time, which will affect services that need to maintain consistency.

A simple comparison of the two approaches in terms of application scenarios and maintenance cost:

Application scenarios: caching empty objects suits cases where 1. the data hit rate is low and 2. the data changes frequently and must stay fresh in real time. A Bloom filter suits cases where 1. the data hit rate is low and 2. the data is relatively fixed, i.e. real-time freshness matters less.

Maintenance cost: with cached empty objects, 1. the code is simple to maintain, 2. more cache space is needed, 3. the data may be inconsistent for a while. With a Bloom filter, 1. the code is more complex to maintain, but 2. less cache space is used.

Cache warming

Cache warming means loading the relevant cache data into the cache system right after the system goes live. It avoids the pattern of first querying the database and then caching the result when the user requests the data: users directly query cache data that has been warmed up in advance!

1. Write a cache-refresh page and trigger it manually when going live;
2. If the data volume is small, load it automatically when the project starts;
3. Refresh the cache periodically.

Cache update

In addition to the cache invalidation policies that come with the cache server (Redis has 6 policies to choose from by default), we can also perform custom cache eviction based on specific business requirements. There are two common strategies:

(1) periodically clean out expired cache entries;

(2) when a user request comes in, check whether the cache entry this request uses has expired; if it has, go to the underlying system for fresh data and update the cache.

Both have trade-offs: the drawback of the first is that maintaining a large number of cache keys is troublesome; the drawback of the second is that cache validity has to be checked on every user request, so the logic is relatively complex. Weigh the two against your own application scenario.

Cache breakdown

Cache breakdown refers to a single very hot key that is under sustained heavy concurrency. At the instant this key expires, the continuing flood of concurrent requests punches through the cache and hits the database directly, like cutting a hole in a barrier.

For example, in typical e-commerce projects, some products become "hot sellers"; the cache for such flagship products can simply be set to never expire. Even when some product unexpectedly goes viral, it can also just be set to never expire. The mutex-key approach is rarely needed in this situation; in a word, simplicity wins.
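For completeness, a sketch of the mutex-key approach mentioned above (hiredis; assumes a connected redisContext *c and a hypothetical rebuild_cache helper):

#include <hiredis/hiredis.h>
#include <unistd.h>

extern void rebuild_cache(redisContext *c, const char *key); // hypothetical

// Only the caller that wins the mutex rebuilds the hot key; the others
// back off briefly and retry the cache.
void load_hot_key(redisContext *c, const char *key) {
    // SET ... NX EX 10 succeeds only if the mutex key does not exist yet
    redisReply *r = redisCommand(c, "SET mutex:%s 1 NX EX 10", key);
    if (r->type == REDIS_REPLY_STATUS) {   // we got the lock
        rebuild_cache(c, key);
        freeReplyObject(redisCommand(c, "DEL mutex:%s", key));
    } else {                               // REDIS_REPLY_NIL: lost the race
        usleep(50 * 1000);                 // wait 50ms, then the caller retries
    }
    freeReplyObject(r);
}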

Cache degradation

When traffic surges, when a service has problems (such as slow or unresponsive response times), or when non-core services threaten the performance of the core flow, you still need to keep the service available, even in degraded form. The system can degrade automatically based on key metrics, or be degraded manually via configuration switches. The ultimate goal of degradation is to keep the core service available, even if lossy. Note that some services cannot be degraded (add to cart, checkout).

Reference levels for degradation:

(1) Normal: some services occasionally time out due to network jitter or a service going online; they can be degraded automatically;

(2) Warning: a service's success rate fluctuates over a period of time (say between 95% and 100%); it can be degraded automatically or manually, and an alert should be sent;

(3) Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly jumps to the maximum threshold the system can bear; degrade automatically or manually as the situation requires;

(4) Critical error: for example, the data becomes wrong for some special reason; an emergency manual degrade is needed.

The purpose of service degradation is to prevent a Redis failure from turning into an avalanche on the database. For unimportant cached data, a service degradation strategy can therefore be adopted; a common practice is that when Redis fails, the application does not query the database but directly returns a default value to the user.

17. If MySQL has 10 million rows and Redis serves as an intermediate cache holding 100,000 of them, how do you ensure the data in Redis is hot data?

You can rely on Redis's data eviction strategy, which is applied when the Redis in-memory dataset grows to a certain size. Specifically, there are six main memory eviction policies:

  • volatile-lru: evict the least recently used data from the set of keys with an expiration time (server.db[i].expires)

  • volatile-ttl: evict data that is about to expire from the set of keys with an expiration time (server.db[i].expires)

  • volatile-random: evict random data from the set of keys with an expiration time (server.db[i].expires)

  • allkeys-lru: evict the least recently used data from the whole dataset (server.db[i].dict)

  • allkeys-random: evict random data from the whole dataset (server.db[i].dict)

  • noeviction: never evict data

18. What about Redis persistence?

Redis is an in-memory database that supports persistence. Through its persistence mechanism it synchronizes in-memory data to disk files, ensuring the data survives. When Redis restarts, it reloads the disk files into memory to recover the data.

Most of the time we need persistent data, which is to write data from memory to hard disk, mostly for later reuse (such as restarting the machine, recovering data after a machine failure), or to back up data to a remote location in case of a system failure.

Implementation: fork() a separate child process, which takes a copy of the parent process's database data in its own memory; the child then writes the data to a temporary file. When persistence finishes, the temporary file replaces the previous snapshot file, the child exits, and the memory is released.

There are two types of persistence mechanisms


Snapshot persistence (RDB persistence)

Redis can create snapshots to obtain a point-in-time copy of data stored in memory. After Redis has created a snapshot, you can back it up, copy it to another server to create a copy of the server with the same data (the Redis master-slave structure is used to improve Redis performance), or leave the snapshot in place for use when restarting the server.

Snapshot persistence is the default Redis persistence mode; it is configured by default in the redis.conf configuration file:

save 900 1     # After 900 seconds (15 minutes), Redis automatically triggers BGSAVE to create a snapshot if at least 1 key has changed.
save 300 10    # After 300 seconds (5 minutes), Redis automatically triggers BGSAVE to create a snapshot if at least 10 keys have changed.
save 60 10000  # After 60 seconds (1 minute), Redis automatically triggers BGSAVE to create a snapshot if at least 10,000 keys have changed.

AOF (append-only file) persistence

Compared with snapshot persistence, AOF persistence is closer to real time, so it has become the mainstream persistence scheme. Redis does not enable AOF (append-only file) persistence by default; it can be turned on with the appendonly parameter: appendonly yes

After AOF persistence is enabled, Redis appends every command that changes data to an AOF file on disk. The AOF file is saved in the same location as the RDB file, set with the dir parameter; the default file name is appendonly.aof.

There are three different AOF persistence methods in Redis configuration files:

appendfsync always    # Write to the AOF file every time a data change occurs; this seriously slows Redis down
appendfsync everysec  # Explicitly sync write commands to disk once per second
appendfsync no        # Let the operating system decide when to sync

To balance data safety and write performance, users can consider the appendfsync everysec option: Redis syncs the AOF file once per second with almost no performance impact, and even if the system crashes, users lose at most one second of data. When the disk is busy with writes, Redis also gracefully slows itself down to match the disk's maximum write speed.

Redis 4.0 optimizes persistence

Redis 4.0 begins to support mixed RDB and AOF persistence (disabled by default; enable it with aof-use-rdb-preamble).

If mixed persistence is enabled, the RDB content is written directly at the beginning of the AOF file when the AOF is rewritten. The benefit is that it combines the strengths of RDB and AOF: fast loading without losing too much data. The drawback is that the RDB portion of the AOF file is in compressed format and no longer AOF format, so readability is poor.

19. Do you understand AOF rewriting? Can you explain it briefly?

AOF overwriting can create a new AOF file that holds the same database state as the original AOF file, but is smaller.

AOF rewriting is a somewhat misleading name: it works by reading the key-value pairs in the current database, and does not require the program to read, analyze, or write the existing AOF file at all.

When the BGREWRITEAOF command is executed, the Redis server maintains an AOF rewrite buffer that records all write commands executed by the server while the child process creates a new AOF file. When the child process has finished creating a new AOF file, the server appends all the contents of the rewrite buffer to the end of the new AOF file, making the database state held by the old and new AOF files consistent. Finally, the server completes the AOF file rewrite by replacing the old AOF file with the new one.

20. Do you use a Redis cluster? How does the cluster work?

Redis Sentinel focuses on high availability and automatically promotes the slave to Master in the event of a master outage.

Sentinel monitors the servers in the cluster and, when the master goes offline, automatically elects a new master from among the slaves.

Redis Cluster focuses on scalability: when a single Redis instance's memory is insufficient, a Cluster is used for sharded storage.

21. How do you solve the problem of concurrent key competition in Redis?

The so-called concurrent key competition problem is that multiple systems operate on the same key at the same time, but the order in which the operations execute is not the order we expect, producing wrong results!

One recommended solution: distributed locks (both ZooKeeper and Redis can implement distributed locks). (If concurrent key competition is not actually a problem in your Redis usage, do not use distributed locks, since they cost performance.)

A ZooKeeper-based distributed lock uses temporary sequential nodes. The general idea: when a client wants to lock a method, it creates a unique temporary sequential node under the ZooKeeper directory node for that method. Deciding whether it holds the lock is simple: it only needs to check whether its node has the smallest sequence number. To release the lock, it simply deletes its temporary node. This also avoids deadlocks caused by locks that cannot be released when a service crashes. When the business flow completes, the corresponding child node is deleted to release the lock.
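For the Redis flavor, a sketch of the standard locking idiom: SET key value NX PX ttl to acquire, and a check-then-delete to release (the release pair should be wrapped in a Lua script in production so it runs atomically; owner_id is a unique token per client):

#include <hiredis/hiredis.h>
#include <stdbool.h>
#include <string.h>

// Try to acquire the lock for ttl_ms milliseconds; owner_id must be
// unique per client so we never release someone else's lock.
bool lock_acquire(redisContext *c, const char *name, const char *owner_id,
                  int ttl_ms) {
    redisReply *r = redisCommand(c, "SET lock:%s %s NX PX %d",
                                 name, owner_id, ttl_ms);
    bool ok = (r->type == REDIS_REPLY_STATUS);   // +OK only if we got it
    freeReplyObject(r);
    return ok;
}

// Check ownership before deleting (a Lua script should make this atomic).
void lock_release(redisContext *c, const char *name, const char *owner_id) {
    redisReply *r = redisCommand(c, "GET lock:%s", name);
    if (r->type == REDIS_REPLY_STRING && strcmp(r->str, owner_id) == 0)
        freeReplyObject(redisCommand(c, "DEL lock:%s", name));
    freeReplyObject(r);
}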

In practice, of course, reliability comes first, which is why ZooKeeper is recommended first.

22. How do you ensure data consistency between the cache and the database when double-writing?

First of all, as long as you use a cache, you may have double storage and double writes across the cache and the database, and as soon as you double write, data consistency problems appear. So how do you solve them?

Generally speaking, if your system does not strictly require that the cache and the database be consistent, and the cache can occasionally diverge slightly from the database, it is best not to adopt the following scheme: serialize read and write requests into an in-memory queue, which guarantees that no inconsistency will ever appear.

But serialization drastically reduces the system's throughput, requiring several times more machines than normal to support the same load of requests.

The most classic Cache + database read/write Pattern is the Cache Aside Pattern.

  • When reading: if the cache misses, read the database, put the data into the cache, and return the response.
  • When updating: delete the cache first, then update the database, so that the next read finds nothing in the cache and fetches fresh data from the database. (Note that the cache entry is deleted, not updated in place; a sketch of both paths follows below.)
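A sketch of both paths of Cache Aside (hiredis; assumes a connected redisContext *c and hypothetical db_read/db_write helpers):

#include <hiredis/hiredis.h>
#include <stdlib.h>
#include <string.h>

extern char *db_read(const char *key);                 // hypothetical
extern void  db_write(const char *key, const char *v); // hypothetical

// Read path: try the cache, fall back to the DB, then populate the cache.
char *cache_aside_read(redisContext *c, const char *key) {
    redisReply *r = redisCommand(c, "GET %s", key);
    if (r->type == REDIS_REPLY_STRING) {
        char *v = strdup(r->str);
        freeReplyObject(r);
        return v;
    }
    freeReplyObject(r);
    char *v = db_read(key);
    if (v) freeReplyObject(redisCommand(c, "SET %s %s", key, v));
    return v;
}

// Write path: delete the cache first, then update the database.
void cache_aside_write(redisContext *c, const char *key, const char *v) {
    freeReplyObject(redisCommand(c, "DEL %s", key));
    db_write(key, v);
}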

This is a classic interview question at Internet companies, because caching is used so heavily there.

In high-concurrency business scenarios, the database's performance bottleneck is often the large number of concurrent users. So Redis is generally used as a buffering layer: requests go to Redis first instead of directly hitting MySQL or another database, which reduces response latency.

23. Why does the data become inconsistent?

The problem arises because read and write operations on the cache and the database interleave under concurrent access.

1. The single-database case

Concurrent read and write requests arrive at the same time, say a write request A and a read request B:

  1. Request A performs a write: the first step evicts the cache, but for various reasons the server is then delayed before doing the rest of the work (for example, a large batch of business operations calls another service and takes 1s);

  2. Request B performs a read: it hits the cache, which is empty because it was just evicted;

  3. B goes on to read the DB, reads the old data, and writes that dirty data into the cache;

  4. Request A finally finishes executing and writes its data to the DB.

Conclusion: the written data only reached the DB at the very end and was never synchronized to the cache, so the cache keeps holding dirty data.

Dirty data is data in the source system that falls outside a valid range or is meaningless to the actual business, or data in an illegal format, with non-standard encoding or ambiguous business logic in the source system.

2. The master/slave, read/write-separated case: reading from the slave library produces dirty data

  1. Request A performs a write: the first step evicts the cache;

  2. Request A writes the latest data to the master database;

  3. Request B performs a read: it hits the cache, which is empty because it was evicted;

  4. Request B goes on to read the DB from the slave database, but master/slave synchronization has not completed yet, so it reads dirty data and writes that dirty data into the cache;

  5. The database synchronization finally completes.

Conclusion: in this case the ordering of requests A and B is correct; it is the master/slave synchronization delay (say 1s) that causes the read request to read dirty data from the slave library, producing the inconsistency.

Root causes:

In a single database: business logic takes 1s to process, during which old data may be read into the cache.

With master/slave + read/write separation: within the 1s master/slave synchronization delay, old data from the slave library is read into the cache.

24. Do you know the common data optimization schemes?

First, the double cache-eviction method

  1. Evict the cache;
  2. Write the database;
  3. Send an eviction message to the message bus (ESB) and return immediately. The processing time of the write request barely increases, and the method evicts the cache twice, hence the name "double cache eviction". Downstream of the message bus sits an asynchronous cache-eviction consumer: after receiving the eviction message, it evicts the cache again 1 second later, so even if dirty data slipped into the cache in between, it gets evicted (see the sketch below).
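A sketch of this write path (hiredis; db_update and mq_publish are hypothetical helpers, and the downstream consumer that performs the delayed second eviction is not shown):

#include <hiredis/hiredis.h>

extern void db_update(const char *key, const char *v);      // hypothetical
extern void mq_publish(const char *topic, const char *key); // hypothetical

// Evict, write the database, then hand a second eviction to the bus;
// a consumer DELs the key again about 1 second later.
void write_with_double_evict(redisContext *c, const char *key, const char *v) {
    freeReplyObject(redisCommand(c, "DEL %s", key));   // first eviction
    db_update(key, v);
    mq_publish("cache-evict", key);                    // second, delayed eviction
}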

Second, asynchronously evicting the cache

The steps above all run on the business's critical path. Alternatively, add an offline module that reads the binlog and evicts the cache asynchronously: it reads the full binlog stream and performs the evictions asynchronously.

Here is a quick outline of the idea.

1. The idea:

MySQL binlog incremental publish/subscribe consumption + message queue + incremental data updates pushed to Redis

1) Read requests go to Redis: hot data is basically all in Redis

2) Write requests go to MySQL: inserts, updates, and deletes are performed on MySQL

3) Updating Redis: MySQL's data-change binlog is consumed to update Redis

2. Redis updates

1) Data operations fall into two categories:

  • Full (write all the data into Redis at once)
  • Incremental (real-time updates)

Incremental here means MySQL's update, insert, and delete changes. Once a write, update, or delete is performed in MySQL, the related binlog messages can be pushed to Redis, and Redis updates itself based on the binlog records, so the business code no longer needs to touch the cache at all.