Topics covered

Redis special data structures: HyperLogLog, Geo, Pub/Sub, Redis Modules, Bloom filter, RediSearch, Redis-ML

What is caching

Caches fall into three categories: local cache (in-JVM), distributed cache, and multi-level cache (the local cache holds only the most frequently accessed hot data; other hot data lives in the distributed cache).

Redis review

Redis uses a single-threaded NIO model (epoll), which avoids thread-switching overhead and improves efficiency — so minimize long-running operations that would block Redis. The maximum number of clients is bounded by the maxclients setting (10000 by default).

Redis has 16 databases by default and uses database 0 by default; use select to switch databases.

Common commands: select, flushdb, flushall, type, ttl (remaining time to live), move, expire, exists, ping (returns PONG — tests the connection), append, incr, decr, incrby 10, setex (set with an expiration time), setnx, mset (set values in batch), msetnx (atomic batch set-if-absent), getset (CAS-style get-then-set)

Redis performance issues

Redis (written in C) operates on memory, so CPU is not its bottleneck; memory and network bandwidth are. Official figures cite 100k+ QPS.

Redis is fast because it works in memory, runs a single thread (no context switching), and uses NIO (epoll multiplexing). Redis also has its own VM mechanism. On multi-core machines you can run multiple Redis instances.

Data structures and applications

Redis has five classic data structures (plus Streams), and extended APIs: Bitmaps (built on String), Geo (built on Zset) and HyperLogLog (built on String).

OBJECT ENCODING key shows the encoding of a key's value, as follows:

RedisObject:

1. String (set, mset, incr, incrby, setnx, setex)

  • Data structure: SDS (simple dynamic string). Compared with C strings, its core additions are the len, alloc and flags attributes: O(1) length lookup, buffer overflows avoided thanks to len/alloc, and binary safety — it can store arbitrary bytes, not just text.

The source shows several SDS header types of different alloc sizes; Redis picks, by string length, the smallest SDS structure that can hold the string, so as to save space.

Strings have three encodings: int, embstr and raw. With embstr, the redisObject header and the SDS sit in one contiguous allocation; with raw, the object's ptr points to a separately allocated SDS. embstr and raw are separated by the length of the stored string — long strings get raw.

The selection of coding flow chart is as follows:

Before using the int encoding, Redis also has to determine whether the string can be converted to a long (i.e. whether it fits in long's value range).
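That range check can be sketched directly (a hypothetical helper, not Redis source; the 44-byte embstr boundary applies to Redis 3.2+):

```java
class StringEncoding {
    // Returns "int" if the value fits in a signed 64-bit long,
    // "embstr" if it is short (<= 44 bytes in Redis 3.2+), else "raw".
    static String encodingFor(String value) {
        try {
            Long.parseLong(value);      // same range check as C's long long
            return "int";
        } catch (NumberFormatException e) {
            return value.getBytes().length <= 44 ? "embstr" : "raw";
        }
    }
    public static void main(String[] args) {
        System.out.println(encodingFor("12345"));   // int
        System.out.println(encodingFor("hello"));   // embstr
    }
}
```

Note that "9223372036854775808" (Long.MAX_VALUE + 1) fails the long check and falls back to embstr even though it looks numeric.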

  • Common usage scenarios: caching + counters + shared sessions

1) incr: view counts for articles, like counts on Weibo, etc.

2) incrby: global sequence numbers (the Snowflake algorithm is another way to generate them)

3) setnx: distributed locks

Use setnx to contend for the lock and expire to guarantee the lock is eventually released (even if the holder throws an exception or the service goes down). The set command has extra arguments that combine setnx and expire into a single instruction, so the atomicity between setnx and expire cannot be broken. A new problem then arises: one thread may release a lock placed by another thread. The fix: do not set an arbitrary value — set it to a UUID and check it on every release. Since checking the value and deleting the key are not atomic operations, a Lua script is required. Still, if the lock times out before thread 1 finishes executing, the lock no longer strictly protects the critical section. The answer is a thread that periodically checks whether the lock is still held and renews it — a lock-renewal mechanism. In practice, use a mature API: Redisson's RLock. Its core makes heavy use of Lua scripts, which guarantees atomicity, reduces network overhead, and can even substitute for transactions.

However, the flow above still has a problem: under a Redis cluster architecture, if the master node suddenly goes down before the lock replicates, the lock is lost. This can be solved with RedLock.

// Lock across three Redis instances
RLock lock1 = redissonInstance1.getLock("lock1");
RLock lock2 = redissonInstance2.getLock("lock2");
RLock lock3 = redissonInstance3.getLock("lock3");
RedissonRedLock lock = new RedissonRedLock(lock1, lock2, lock3);
lock.lock();
// do something....
lock.unlock();

ZooKeeper vs Redis for distributed locks:

At the core, Redis is AP while ZooKeeper is CP: ZK only reports the lock acquired after at least part of the follower nodes have synchronized, so it trades lock performance for safety.

  1. Bitmaps are not a special data structure: a bitmap's content is just an ordinary string — a byte array. You can get and set the whole value with the ordinary get/set commands, or treat the byte array as a "bit array" with bit commands such as getbit/setbit.
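The setbit/getbit usage (e.g. for daily check-ins) can be sketched with java.util.BitSet standing in for one user's bitmap key; the comments name the real Redis commands each method mirrors:

```java
import java.util.BitSet;

class SignIn {
    private final BitSet days = new BitSet(); // stand-in for one user's bitmap key

    void checkIn(int dayOfYear)      { days.set(dayOfYear); }        // SETBIT key day 1
    boolean checkedIn(int dayOfYear) { return days.get(dayOfYear); } // GETBIT key day
    int totalCheckIns()              { return days.cardinality(); }  // BITCOUNT key

    public static void main(String[] args) {
        SignIn s = new SignIn();
        s.checkIn(1); s.checkIn(2); s.checkIn(2); s.checkIn(100);
        System.out.println(s.totalCheckIns()); // 3 (setting the same bit twice counts once)
    }
}
```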

2. Hash(field,value)

hset, hlen, hincrby, hdel, hgetall

There are two encoding formats for hash structures: Ziplist (memory saving core) and Hashtable

Ziplist: a sequential storage structure composed of specially encoded, contiguous memory blocks. Like an array, a ziplist is stored contiguously in memory; unlike an array, each element (called an entry) may occupy a different amount of memory, in order to save space. Since entries have different lengths, you cannot simply sizeof(entry) to reach the next node. So how does it access the next node? Each entry records the offset information needed to jump to the previous or the next entry.

The node’s previous_entry_length property records the length of the previous node in the compressed list in bytes, and the encoding property records the type and length of the data stored in the node’s Content property. The content property of a node holds the value of the node, which can be a byte array or an integer. The type and length of the value are determined by the encoding property of the node.
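The previous_entry_length rule just described — one byte for lengths under 254, otherwise a 0xFE marker followed by a 4-byte length — can be sketched as a small encoder/decoder (little-endian layout assumed for illustration):

```java
class ZiplistPrevLen {
    // Encode a previous-entry length the way ziplist does:
    // < 254 -> a single byte; otherwise 0xFE marker plus a 4-byte length.
    static byte[] encode(int prevLen) {
        if (prevLen < 254) return new byte[] { (byte) prevLen };
        return new byte[] {
            (byte) 0xFE,
            (byte) prevLen, (byte) (prevLen >>> 8),
            (byte) (prevLen >>> 16), (byte) (prevLen >>> 24)
        };
    }
    static int decode(byte[] field) {
        int first = field[0] & 0xFF;
        if (first < 254) return first;
        return (field[1] & 0xFF) | (field[2] & 0xFF) << 8
             | (field[3] & 0xFF) << 16 | (field[4] & 0xFF) << 24;
    }
    public static void main(String[] args) {
        System.out.println(decode(encode(100)));   // 100
        System.out.println(encode(300).length);    // 5
    }
}
```

This variable-width field is exactly why inserting a large entry can force the next entry's prevlen field to grow from 1 to 5 bytes, which can then force the one after it to grow, and so on — the cascade update discussed below.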

ziplistInsert — insert an entry into a ziplist: average O(n), worst O(n²)

ziplistDelete — delete an entry from a ziplist: average O(n), worst O(n²)

Why is the worst time complexity O(n²) for both insert and delete? Because of ziplist "cascade updates": in the worst case, one insert or delete triggers n space reallocations of the ziplist, and each reallocation is itself worst-case O(n).

Hashtable (dictionary)

Selection of coding process:

3. List (lpush, rpush, lpop, rpop, lindex, lrange key start stop, blpop key timeout)

OBJECT ENCODING key: quicklist (= ziplist + linked list)

Each quicklist node holds a ziplist — a compact, contiguous block of memory.

Data structure: C has no built-in linked list, so Redis implements its own doubly linked list — every node has prev and next pointers, and a length counter makes getting the list length O(1).

Common usage scenarios:

Implement stack, queue, blocking queue

Microblog message flow and official account message flow
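The stack/queue scenarios above can be sketched with an in-memory deque standing in for the Redis list — a hypothetical, client-free sketch of the command semantics only (rpush appends at the tail, lpop pops the head, giving FIFO):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SimpleQueue {
    // In-memory stand-in for a Redis list key.
    private final Deque<String> list = new ArrayDeque<>();

    void rpush(String msg) { list.addLast(msg); }        // RPUSH queue msg
    String lpop()          { return list.pollFirst(); }  // LPOP queue (null when empty)

    public static void main(String[] args) {
        SimpleQueue q = new SimpleQueue();
        q.rpush("a"); q.rpush("b");
        System.out.println(q.lpop()); // a
    }
}
```

Swapping lpop for rpop (pollLast) would give LIFO, i.e. a stack.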

  • The message queue

rpush produces messages and lpop consumes them; when lpop finds no message, sleep briefly and retry. You can also use blpop, which blocks until a message arrives. If a message must be produced once and consumed many times, use the pub/sub topic-subscriber pattern, which gives a 1:N message queue. For a delay queue, use a sorted set: call zadd with the delivery timestamp as the score and the message content as the member to produce, and have consumers poll zrangebyscore for entries whose score is at least N seconds old.

4. Set (sadd, srem, sismember, smembers — all elements, scard — element count, sinter, sunion, sdiff, srandmember — pick a random element)

Data structure: an unordered collection of strings whose elements cannot repeat (automatic deduplication). There are two encodings: intset and hashtable.

Selection of coding process:

Common usage scenarios (set intersection and union, including across multiple sets):

WeChat-style lucky draws (srandmember)

Follow model (people you both follow) and product filtering by tags

5. Zset (zadd, zScore)

Data structure: ziplist and skiplist (the core idea: trade space for time)

With a single index layer of x nodes over n total nodes, a search costs about x + n/x; this is minimized when x = √n, giving a cost of at least 2√n. Likewise, with k index layers the minimum cost is k·n^(1/k). On insert, each node is promoted one extra level with a fixed probability (1/2 in the classic skip list; Redis uses 1/4).
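A minimal sketch of that level-promotion step, mirroring zslRandomLevel in Redis's t_zset.c (the constants are from the Redis source; the helper name is mine):

```java
import java.util.Random;

class SkiplistLevel {
    static final int MAX_LEVEL = 32;  // Redis ZSKIPLIST_MAXLEVEL
    static final double P = 0.25;     // Redis ZSKIPLIST_P

    // Each inserted node gains one extra level with probability P,
    // capped at MAX_LEVEL.
    static int randomLevel(Random rnd) {
        int level = 1;
        while (rnd.nextDouble() < P && level < MAX_LEVEL) level++;
        return level;
    }
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 8; i++) System.out.print(randomLevel(rnd) + " ");
    }
}
```

With P = 1/4 the expected level is 1/(1−P) = 4/3, so most nodes stay in the bottom level and the index layers stay sparse.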

Common usage scenarios:

Leaderboards (rankings)

A side note: why not use red-black trees?

1. A red-black tree is complex to implement.

2. For range queries, a red-black tree must backtrack after locating a boundary, while a skip list only needs to locate the boundary and then walk the bottom-level links.

In a red-black tree, every newly inserted node is colored red.

Red-black tree insertion logic:

Case 1: consecutive red nodes where the parent and uncle are both red — recolor the parent and uncle black and the grandparent red.

Case 2: consecutive red nodes, the uncle is black, and the current node is a right child — left-rotate the parent.

Left-rotation illustration:

Case 3: consecutive red nodes, the uncle is black, and the current node is a left child — recolor the parent and grandparent first, then right-rotate the grandparent.

Geo (low-level Zset implementation)

geoadd china:city longitude latitude beijing; geopos; geodist

georadius: find elements within a given radius around a given longitude/latitude

Hyperloglog (advantage: tiny memory footprint — counting up to 2^64 distinct elements takes only 12 KB; drawback: a small error rate)

Based on bit-pattern statistics.

Cardinality: the number of distinct elements, e.g. website UV (unique visitors), where each person counts only once. The traditional approach — deduplicating with a set — stores large numbers of user ids, even though all we want is the count.

pfmerge: merge several HyperLogLogs into one.

Redis Publishing subscription (dictionary + linked list)

Examples of Redis usage scenarios

  • Redis can do simple message queues if there is only one set of consumers.
  • Redis bitmap for check-in
  • Redis bitmap + Bloom filter: the spam-filtering feature of mail systems also commonly uses a Bloom filter.

m is the size of the bit array, n the amount of data, p the false-positive rate, and k the number of hash functions.
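The standard sizing formulas tie these parameters together: m = −n·ln p / (ln 2)² and k = (m/n)·ln 2. A small sketch (the helper names are mine):

```java
class BloomParams {
    // Bits needed to hold n items at false-positive rate p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }
    // Optimal number of hash functions for m bits and n items.
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
    public static void main(String[] args) {
        long m = optimalBits(1_000_000, 0.01);
        System.out.println(m + " bits, " + optimalHashes(m, 1_000_000) + " hash functions");
    }
}
```

For a million items at 1% error this comes to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions.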

Bloom filters cannot remove elements, so over time the filter gets more and more crowded until one day the false-positive rate is so high that you have to rebuild it. Query performance is also mediocre, because each lookup probes several different positions via multiple hash functions, and those positions are scattered across memory, giving a low CPU cache-line hit rate.

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>23.0</version>
</dependency>

// Funnel: the data type (factory methods live in the Funnels utility class)
private static BloomFilter<Integer> bf = BloomFilter.create(Funnels.integerFunnel(), total);
bf.mightContain(i);
// expectedInsertions: the number of values expected to be inserted
// fpp: the false-positive rate (default 0.03)
// strategy: the hash strategy
static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) { ...... }
  • HyperLogLog provides approximate deduplicated counting

  • Redis Geo (geohash) implements "people nearby"

  • Redis zset implements delayed/asynchronous queues

  • pub/sub implements simple message queues

  • Pipelines reduce round trips by sending a batch of commands and returning all results at once

  • Lua scripts execute a series of operations atomically

  • If there are 100 million keys in Redis, 10W of them start with a fixed, known prefix, how do you find them all?

The keys command blocks the thread for its entire run, and online services pause until it finishes. Use the scan command instead: it extracts keys matching a pattern without blocking, though with some probability of duplicates — deduplicate on the client. The total time ends up longer than a single keys call would take.

Redis transactions

Redis transactions have no isolation and no atomicity (no rollback); queuing the commands guarantees their ordering.

  • Open the transaction (multi)
  • Queue the commands
  • Execute the transaction (exec)

Redis persistence

Mixed persistence: #aof-use-rdb-preamble yes

Redis expiration policy (set expiration time + out of memory)

Q: How do we keep the database and the cache consistent? If the data is read-only (preloaded/warmed), there is no consistency problem. For orders, just push the order onto MQ (the order service produces, the inventory service consumes) and then decrement the Redis counter by 1; compensate on exceptions.

If the database is also being updated, such as the inventory service adding new items, how does Redis keep consistent with the database?

Select final consistency and set an expiration time for the cache. Redis uses a policy of periodic deletion + lazy deletion.

Redis does not attach a timer to every key (timed deletion would waste CPU on timers). Instead it deletes periodically: by default, every 100 ms Redis randomly samples keys that have a TTL and deletes the expired ones. Note that it does not check every key each round — scanning all keys every 100 ms would stall Redis. So with the periodic policy alone, many expired keys would not be deleted on time.
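One round of this sampling loop can be sketched in memory (a hypothetical model, not the Redis implementation — real Redis samples about 20 keys per round from the keys carrying a TTL):

```java
import java.util.*;

class ExpireCycle {
    // One round of active expiry: sample up to 20 random keys with a TTL
    // and delete the ones already past their deadline.
    // Returns the number of keys deleted this round.
    static int expireRound(Map<String, Long> ttlByKey, long now, Random rnd) {
        List<String> keys = new ArrayList<>(ttlByKey.keySet());
        Collections.shuffle(keys, rnd);
        int deleted = 0;
        for (String k : keys.subList(0, Math.min(20, keys.size()))) {
            if (ttlByKey.get(k) <= now) { ttlByKey.remove(k); deleted++; }
        }
        return deleted;
    }
    public static void main(String[] args) {
        Map<String, Long> ttls = new HashMap<>();
        ttls.put("expired", 10L);
        ttls.put("alive", 10_000L);
        expireRound(ttls, 100L, new Random(1));
        System.out.println(ttls.keySet()); // only "alive" remains
    }
}
```

Because each round only samples, an expired key outside the sample survives until a later round — or until lazy deletion catches it on read.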

A lazy-deletion policy: when a key is fetched, Redis checks whether it has expired (if it has a TTL set) and deletes it if so.

Memory eviction mechanism (allkeys-lru: when memory cannot fit a new write, remove the least recently used keys from the whole keyspace — recommended, and the policy used in this project). Set maxmemory in the Redis configuration file; once it is reached, the eviction policy runs. Besides allkeys-lru there are allkeys-random and noeviction (return an error), plus volatile-lru, volatile-random and volatile-ttl, which consider only keys with an expiration set — six policies in total.
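The allkeys-lru idea can be sketched with a LinkedHashMap in access order (note that real Redis only approximates LRU by sampling maxmemory-samples keys rather than keeping a strict ordering):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);      // access order: gets move entries to the tail
        this.maxEntries = maxEntries;
    }
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least recently used entry
    }
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1); cache.put("b", 2);
        cache.get("a");              // touch "a" so "b" becomes least recent
        cache.put("c", 3);           // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```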

Redis master-slave architecture

To solve the single point problem and achieve high availability, the maximum memory usage of a single Redis should not exceed 20G

Data is synchronized between the primary and secondary nodes

1. Full replication

2. Partial copy

Sentinel Architecture (high availability)

Pros: when the master goes down, no manual reconfiguration is needed — the sentinels elect a new master and switch over automatically. Cons: a brief outage during the election; once the cluster approaches its capacity limit, scaling online is very troublesome, and the sentinel setup itself is cumbersome.

Redis Cluster Cluster architecture(Extensible)

  • How the election gap is mitigated (shard the data so fewer requests hit the downed node; in essence a brief gap still exists)

  • How masters and slaves communicate (the gossip weak-consistency protocol: ping, pong, fail, meet) — with some propagation delay

Unlike a broadcast protocol, gossip has each node in a bounded network exchange state with a few random peers; convergence is slower, but the load per node is smaller.

There are five protocol message types: meet, ping, pong, fail, and publish.

  • Election mechanism: delay = 500 ms + random(0–500) ms + slaveRank × 1000 ms; the lower the rank, the fresher the replica's data.

Redis other strategies related

1. Do not enable AOF persistence on the master node; enable AOF on a slave and sync once per second.

2. Master and slaves should reside on the same LAN.

3. Prefer a chain to a star topology between master and slaves: Master <- Slave1 <- Slave2 <- Slave3. If the Master fails, Slave1 can take over as Master immediately.

Redis performance optimization

1. Multi-level cache architecture (the core keeps the number of mysql requests as small as possible)

2. Cache penetration (queries for data that exists neither in the cache nor in the database)

The solution

1) Cache the null result (too many cached empty objects wastes memory, and there is a window of cache/database inconsistency)

2) Bloom filter

3) Mutex: only one thread is allowed to operate the database at a time

4) Parameter verification

Add validation at the interface layer: user authentication, parameter checks, and an error code for invalid parameters — for example, basic id validation that intercepts id <= 0 outright.

5) Nginx has a configuration item that enables O&M to mask IP addresses whose access times per second exceed the threshold.
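Combining null-caching with a per-key mutex (solutions 1 and 3 above) can be sketched like this — in-memory stand-ins only, with a ConcurrentHashMap playing the role of both cache and per-key lock:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PenetrationGuard {
    // The cache stores Optional.empty() for ids missing from the database,
    // so repeated lookups for nonexistent ids never hit the DB again.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, String> db;   // stand-in for the real DB query
    int dbHits = 0;

    PenetrationGuard(Function<String, String> db) { this.db = db; }

    Optional<String> get(String key) {
        // computeIfAbsent acts as the per-key mutex: only one caller
        // loads a given key from the DB at a time.
        return cache.computeIfAbsent(key, k -> {
            dbHits++;
            return Optional.ofNullable(db.apply(k));
        });
    }
    public static void main(String[] args) {
        PenetrationGuard g = new PenetrationGuard(k -> k.equals("1") ? "item" : null);
        g.get("404"); g.get("404"); g.get("404"); // only the first call reaches the DB
        System.out.println(g.dbHits); // 1
    }
}
```

In production the cached null would carry a short TTL so a later insert of that id becomes visible.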

3. Cache avalanche (a large batch of keys expiring at the same moment, or a Redis outage)

1) Set the random expiration time

setRedis(key, value, ttl + ThreadLocalRandom.current().nextInt(10000)); // random jitter so keys do not all expire together

2) Set to never expire and update the cache

E-commerce home page

4. Cache breakdown (hot key, cache does not exist but database does)

Large concurrent accesses concentrate on this single key; the instant the key expires, the sustained concurrency pierces the cache and hits the database directly, like water bursting through a hole drilled in a barrier.

1) Hotspot data never expires and is fragmented

2) Distributed locks that allow only one thread to operate the database at a time

5. Cache KV design

1) Keep values small (value under 10 KB, collections under 5,000 elements; split big keys)

2) Use scan, zscan and hscan instead of keys, and delete big keys progressively rather than with a single del

6. Connection pool optimization

maxTotal = (expected QPS / QPS of a single connection) × 2

maxIdle = maxTotal / 2

minIdle (minimum idle connections — used to warm up the pool)

Redis vs Memcached

Cache database double-write consistency problem

  • Delete the cache before updating the database.

Deleting the cache first trades one cache miss for freedom from atomicity problems, and the data stays available. But a new problem appears: the database update is slow, and slower still once master/slave sync time is counted (the master has finished the write while the slaves have not caught up).

Thread A deletes the cache. If another thread B requests the data shortly afterwards, B misses the cache, queries the database, and puts the result back into the cache. If B reads before A's update is visible (reading from a replica that has not yet synced), B gets the old value and caches it, and the cache stays wrong until the next update — a dirty read.

Solution: use a tool to subscribe to the replica's relay log; once the write has been applied on the replica, delete whatever may have been written to the cache in the meantime.

This is essentially a delayed double delete, except the "sleep" is bounded by replication to the replica, rather than a duration you would otherwise have to estimate from the business.

3. Update the database before deleting the cache.

While thread A updates the database, thread B may still read the old value from the cache. After A finishes the update it deletes the cache, and the next read repopulates it. So even if a thread reads stale data just before the delete, the staleness is brief rather than persisting until the next update.

But this breaks if the atomicity of the two steps (update the database, delete the cache) is violated.
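The "update the database, then delete the cache" write path can be sketched with two in-memory maps as stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

class CacheAside {
    // Update the database first, then delete (not update) the cache;
    // the next reader that misses reloads from the already-updated DB.
    final Map<String, String> db = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    void write(String key, String value) {
        db.put(key, value);   // 1. update the database
        cache.remove(key);    // 2. invalidate the cache
    }
    String read(String key) {
        String v = cache.get(key);
        if (v == null) {      // cache miss: reload and repopulate
            v = db.get(key);
            cache.put(key, v);
        }
        return v;
    }
    public static void main(String[] args) {
        CacheAside c = new CacheAside();
        c.write("stock", "100");
        System.out.println(c.read("stock")); // 100
        c.write("stock", "99");
        System.out.println(c.read("stock")); // 99 - no stale read after invalidation
    }
}
```

Invalidating instead of updating the cache is the design choice here: it avoids writing a value computed from stale reads back into the cache.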

Springboot integrate redis

Since Spring Boot 2.x, Jedis has been replaced by Lettuce as the default client. Jedis: direct connection; safe multithreaded use requires a JedisPool connection pool — closer to the BIO model. Lettuce: built on Netty, the NIO model.

redisTemplate.opsForValue()

When you need custom generics and serialization for RedisTemplate, write your own RedisConfig.

JDK serialization is used by default. RedisConnectionFactory defaults to Lettuce.

Fixed template:

Example:

// @Component (registers an ordinary POJO as a bean in the Spring container)

// @Data requires Lombok, a library that uses simple annotations to cut boilerplate and speed up development — it auto-generates getters and setters.

// implements Serializable

The test class:

Implement a message queue using Redis

1. Implement the MessageListener interface and override the onMessage method

public interface MessageListener {
    // message: the message body; pattern: the channel pattern that matched
    void onMessage(Message message, @Nullable byte[] pattern);
}

A thread listens on the queue; when the queue has unconsumed messages, it fetches one and spawns a new thread to consume it. The onMessage method requires a lock: a JVM lock if there is a single consumer, a distributed lock for multiple (distributed) consumers. Example:

2. Add listeners to the Redis queue listener container

RedisMessageListenerContainer container = new RedisMessageListenerContainer();
container.setConnectionFactory(factory);
container.addMessageListener(redisListener, new PatternTopic("demo-channel"));
return container;

3. Producers

redisTemplate.convertAndSend() pushes a message (argument 2) to a channel (argument 1).

Redis configuration file

1. You can include other configuration files

2. The network

3. The general

4. RDB snapshots (usually sufficient; data written since the last snapshot is lost on a crash, and the forked child process temporarily costs extra memory)

To restore, just place the dump.rdb file in the Redis startup directory.

5. Security

6. Client restrictions

7. AOF (with a rewrite strategy for when the file grows too big)

Configuration:

8. Master/slave replication (default master node)

Change the configuration file — each Redis service has its own. The slaveof command is not a permanent configuration; putting it in the configuration file makes it permanent.

Configuration file:

When a slave starts, it automatically performs a full backup of the master's data. If sentinel mode is not configured and the master dies, the cluster has no master node and cannot accept writes.

Sentinel mode:

Configure the sentinel conf file.

Install and use

1. The version 5.0.8

2. Install GCC

3.make install

4. The default installation path is usr/local/bin

5. Change the background mode vim redis.conf,daemonize yes

6. Start it: redis-server redis.conf

7. Connect: redis-cli -p 6379

8. shutdown, then exit, to stop and quit

9. redis-benchmark is the bundled benchmarking tool

100 concurrent clients, 100,000 requests, writing 3 bytes at a time, all handled by a single server