First, usage

Use the StringRedisTemplate API that Spring Boot wraps to store data in the basic Redis String type.

Note: if you use RedisTemplate, specify its generic types when autowiring it. Otherwise the API calls do not fail, but no data shows up in Redis, so the command effectively does nothing.

There are also serialization issues to be aware of with RedisTemplate, which uses JDK serialization by default. If you use RedisTemplate directly, the stored data appears as binary when viewed in third-party tools.
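The difference between the two encodings can be seen without Redis at all. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and it only imitates what a JDK-serializing template and a string-serializing template would hand to Redis for the same value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares JDK serialization with plain UTF-8 bytes.
final class SerializationDemo {

    /** Serializes a value with plain JDK serialization. */
    static byte[] jdkSerialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Encodes a string as UTF-8 bytes, which stay readable in a Redis GUI. */
    static byte[] utf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Renders bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The JDK form starts with the stream header AC ED 00 05 before the text.
        System.out.println("JDK:   " + hex(jdkSerialize("hello")));
        System.out.println("UTF-8: " + hex(utf8("hello")));
    }
}
```

The extra header bytes in the JDK form are what third-party viewers show as unreadable binary, which is one reason StringRedisTemplate is the simpler choice for plain string values.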

Second, problem scenarios

2.1 All Redis operations report timeout exceptions

2.1.1 Configuration and Dependencies

spring:
  redis:
    host: XXX # Redis server address
    port: XXX # Redis server connection port
    password: XXX # Redis server connection password (default blank)
    max-active: 1024
    max-wait: 10000 # connection pool maximum block wait time (negative value indicates no limit)
    max-idle: 1000
    min-idle: 200 # Minimum free connection in connection pool
    timeout: 10000 # Request timeout

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
    <version>2.3.1.RELEASE</version>
</dependency>

2.1.2 Problem situation

On the second day after launch, Redis queries started reporting errors, and the service recovered only after repeated retries or a restart. After a while the error came back. The error message was: "Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 10 second(s)"

2.1.3 Cause

The @Import annotation (in the red box in the original screenshot) imports its configuration classes in order, and Lettuce comes before Jedis, so Lettuce is used by default.

After upgrading Spring Boot to 2.x, the Lettuce client is used to manage Redis by default. In Lettuce, adaptive topology refresh and periodic topology refresh are both disabled by default.

The Redis server configuration has a timeout attribute: the number of seconds a client may stay idle before the server closes its connection. A value of 0 disables this behavior. In the sandbox, the Redis timeout is set to 3600 s (1 hour), and our DB colleagues confirmed that the online timeout is also 3600 s. Lettuce does have a reconnect mechanism, but the reconnect task does not run immediately after a disconnect; it is delayed according to a delayed-reconnect policy, which is why the caller succeeds after several retries.
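To make "succeeds after several retries" concrete, here is a small sketch of a capped, doubling reconnect delay. It is not Lettuce's code: the class name, the 100 ms base, and the 1000 ms cap are assumptions chosen only to illustrate how a delayed-reconnect policy spaces out its attempts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a delayed reconnect policy (not the client's real implementation).
final class ReconnectDelaySketch {

    /** Delay before the given 1-based attempt: base * 2^(attempt-1), capped at capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    /** Delays for attempts 1..attempts, in order. */
    static List<Long> schedule(int attempts, long baseMillis, long capMillis) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 1; attempt <= attempts; attempt++) {
            delays.add(delayMillis(attempt, baseMillis, capMillis));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Assumed values: 100 ms base delay, 1000 ms cap.
        System.out.println(schedule(6, 100, 1000)); // prints [100, 200, 400, 800, 1000, 1000]
    }
}
```

Any command issued while the client is still waiting out one of these delays has no live connection to run on, so it can hit the command timeout even though a reconnect is already scheduled.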

No connection timeout showed up during testing, because the sandbox environment was connected to the test Redis instance. Our DB colleagues believe the sandbox did not hit the timeout because its Redis version and driver differ from the online ones. The test after going online did not hit it either, because it ran shortly after service startup, before the automatic disconnect time had elapsed.

2.1.4 Solution

1. Enable periodic topology refresh in the Lettuce client to keep a heartbeat. Disadvantage: it is new, unfamiliar code that could introduce new problems.

2. Replace Lettuce with the Jedis client, which maintains its physical connections automatically. Disadvantage: under high concurrency it does not perform as well as Lettuce.

Because this was a production problem and option 2 only requires changes to the POM dependencies and the configuration file, which takes the least time, we chose option 2 as a temporary fix.

2.1.5 Modified contents

Exclude Lettuce from the dependencies and add the Jedis dependency.

2.1.6 Supplement

In follow-up troubleshooting, we found that some projects had switched to the Jedis client and changed the dependencies as described above, but their configuration file still used the format shown below (without the jedis.pool field). As a result, the Jedis default connection pool configuration is used.

Configuration:

spring:
  redis:
    host: XXX # Redis server address
    port: XXX # Redis server connection port
    password: XXX # Redis server connection password (default blank)
    max-active: 1024
    max-wait: 10000 # connection pool maximum block wait time (negative value indicates no limit)
    max-idle: 1000
    min-idle: 200 # Minimum free connection in connection pool
    timeout: 10000 # Request timeout

Below is a screenshot of debugging with this configuration.

Because Jedis is used, the project initializes the ConnectionFactory at startup by calling the JedisConnectionFactory constructor directly. Redis is configured only in application.yml and no configuration class is overridden, so the actual type of clientConfig shown here is DefaultJedisClientConfiguration.

Looking at the properties of DefaultJedisClientConfiguration, usePooling = false means no connection pool configuration was picked up, which shows that the connection pool settings written above are wrong. Although poolConfig was not configured, it does have a value: a GenericObjectPoolConfig is passed in, which belongs to the commons.pool2 package that Jedis brings in.

Therefore, once Jedis is introduced, it creates a default connection pool with a maximum of 8 connections and a maximum of 8 idle connections, even if the pool is not configured or is configured incorrectly. Do not exclude commons.pool2 from the Jedis dependency, because Lettuce does not include the commons.pool2 package by default. (In fact, the build fails if you do exclude it.)

spring:
  redis:
    host: XXX # Redis server address
    port: XXX # Redis server connection port
    password: XXX # Redis server connection password (default null)
    jedis:
      pool:
        max-active: 1024
        max-wait: 10000
        max-idle: 200
        min-idle: 0 # Minimum free connection in connection pool
    timeout: 10000 # Request timeout

Once the configuration file format is corrected, you can see that the connection pool settings are passed into DefaultJedisClientConfiguration as expected.

To check the actual number of connections, run lsof -i :XXX (where XXX is the Redis port) to see how many Redis connections the service holds.

2.2 Redis query latency spikes

2.2.1 Configuration

For details, see 2.1.1.

2.2.2 Problem situation

Our service monitoring platform shows that response times are uneven. TP99 reaches 100 ms, which means the service is not stable enough. The business scenario has high performance requirements, so the service quality needs to be improved.

2.2.3 Cause

Jedis is built on synchronous blocking I/O (BIO), and every read or write operation occupies a connection. When QPS is too high, latency rises. Our service runs at around 100 QPS, and Jedis performs poorly there.

2.2.4 Solution

Lettuce connects through Netty and executes Redis commands over multiplexed, asynchronous, non-blocking I/O (NIO), which suits scenarios with many short-lived connections. The material service performs few inserts and many queries, so Lettuce can replace Jedis. To keep the Redis connection from timing out, Lettuce's topology refresh must be enabled.

I’ve made a handy note about synchronous and asynchronous versus blocking and non-blocking:

Scenario: you call a bookstore owner to ask whether a certain book is available.
Synchronous: the owner says "don't hang up, let me check", looks for the book while you stay on the line, and tells you the result when done.
Asynchronous: the owner says "I'll look and call you back", hangs up, and calls you once the search is finished.
Blocking: after asking the owner, you sit and wait for the answer and do nothing else.
Non-blocking: after asking the owner, you go off to do something else, but periodically check whether the owner has found the book.

These combine into three I/O models: BIO (synchronous blocking), NIO (synchronous non-blocking), and AIO (asynchronous non-blocking). For details, see "common IO models".
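The bookstore analogy maps onto code fairly directly. The sketch below uses only the JDK; the class and method names are invented for illustration, and searchShelf is a hypothetical stand-in for any slow call such as a Redis round trip.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of the bookstore analogy.
final class BookstoreCall {

    /** The owner's search: a slow lookup standing in for a remote call. */
    static String searchShelf(String title) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return title + ": in stock";
    }

    /** Synchronous and blocking: the caller stays on the line until the answer arrives. */
    static String askAndWait(String title) {
        return searchShelf(title);
    }

    /** Asynchronous: the caller hangs up and the owner reports back through the future. */
    static CompletableFuture<String> askAndHangUp(String title) {
        return CompletableFuture.supplyAsync(() -> searchShelf(title));
    }

    /** Non-blocking polling: the caller does other things and checks back periodically. */
    static String askAndPoll(String title) {
        CompletableFuture<String> answer = askAndHangUp(title);
        while (!answer.isDone()) {
            Thread.yield(); // the caller is free to do other work here
        }
        return answer.join();
    }

    public static void main(String[] args) {
        System.out.println(askAndWait("Book A"));
        // Register a callback instead of waiting, then wait only so the demo can exit cleanly.
        askAndHangUp("Book B").thenAccept(System.out::println).join();
        System.out.println(askAndPoll("Book C"));
    }
}
```

In the first style each caller holds its line (its connection) for the whole wait, which matches the article's description of Jedis; the callback style lets many requests share one line, which is the idea behind multiplexing in Lettuce.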

2.2.5 Modified contents

Restore the spring-boot-starter-data-redis dependency and add the commons.pool2 dependency for connection pooling.

Change Jedis to Lettuce and enable topology refresh to keep the “heartbeat”.

Load testing showed an immediate effect: latency dropped significantly and became stable over the same period, with TP99 under 10 ms.

2.2.6 Supplement

The changes above show that connection pooling requires the commons.pool2 dependency. This is because Lettuce manages Redis connections on the NIO model, so by default it does not use a connection pool, as the source code shows.

When LettuceConnectionFactory is created without the connection pool dependency, none of the connection pool classes in the configuration class can be imported, and the comments indicate that the commons.pool2 package is required. If the project only issues ordinary Redis commands such as set and get, uses no transactions or BLPOP operations, and does not connect to multiple Redis databases, Lettuce can usually serve it without a connection pool (in fact, in this scenario using a pool performs worse). In that case, remove the Lettuce connection pool from the configuration file, for example:

spring:
  redis:
    host: XXX # Redis server address
    port: XXX # Redis server connection port
    password: XXX # Redis server connection password (default null)
    timeout: 10000 # Request timeout
    lettuce:
      cluster:
        refresh:
          adaptive: true
          # Official advice: automatically refresh once every 60 seconds
          period: 60s

If the configuration file contains the lettuce.pool field but the commons.pool2 package is missing from the dependencies, the service fails at startup, because whether getPoolConfig(Pool properties) is called depends on whether the lettuce.pool field is present.

Without the Lettuce connection pool, a SharedConnection is used, which, as the name suggests, is reused. Use the top command in a terminal to find the Java process PID (for example 13015), then run jmap -histo:live 13015 | more | grep LettuceConnection to see the number of live objects in the process. Even when the query and insert interfaces are called at high QPS, there is always only one SharedConnection object and only one Redis connection.

However, if transactions are enabled and Redis commands are again issued at high QPS, multiple LettuceConnection objects appear, and there is more than one actual Redis connection as well.

2.3 Only the insertion operation reports a timeout exception

2.3.1 Configuration

For details, see the "Modified contents" of section 2.2.

Due to a change in business requirements, the Redis value changed from a JsonObject to a JsonArray. To stay compatible with old data, the Redis type was not changed: String is still used instead of hash. The material service may insert several creatives under the same key, and because an insert is a get followed by a set, it is not thread safe. To keep the data consistent, we decided to handle concurrency with a WATCH optimistic lock plus a Redis transaction, so that only one thread can modify the value of a given key at a time and earlier material content is not overwritten by later operations.
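The idea behind the optimistic lock can be shown without Redis. The following is a sketch under stated assumptions: an AtomicReference stands in for the value under one key, compareAndSet plays the role of WATCH plus EXEC (the write succeeds only if nobody changed the value since it was read), and all class and method names are hypothetical. Unlike the real EXEC, which simply reports failure, this sketch retries until it succeeds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of "get, then set" guarded by an optimistic lock.
final class OptimisticAppend {

    // Stand-in for the JSON array stored under a single Redis key.
    private final AtomicReference<List<String>> stored =
            new AtomicReference<List<String>>(Collections.<String>emptyList());

    /** Appends one creative and returns how many attempts it took. */
    int append(String creative) {
        int attempts = 0;
        while (true) {
            attempts++;
            List<String> seen = stored.get();             // like WATCH + GET
            List<String> updated = new ArrayList<>(seen);
            updated.add(creative);
            // Like EXEC: applies only if the value is still the one that was read.
            if (stored.compareAndSet(seen, Collections.unmodifiableList(updated))) {
                return attempts;
            }
        }
    }

    List<String> current() {
        return stored.get();
    }

    /** Runs several writers against one key and returns the final element count. */
    static int runConcurrently(int threads, int perThread) {
        OptimisticAppend store = new OptimisticAppend();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.append("creative-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.current().size();
    }

    public static void main(String[] args) {
        // No append is lost even though every writer does a read followed by a write.
        System.out.println(runConcurrently(4, 250)); // prints 1000
    }
}
```

A plain get followed by set, without the check, would let two writers read the same list and overwrite each other's element, which is exactly the lost-update case the WATCH-based transaction is meant to prevent.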

Because the RedisConfiguration class was not overridden, transactions were not enabled for all Redis operations; they were enabled only in the insert and update interfaces. This kept the query and delete interfaces free of errors and avoided a disaster.

try {
    // Enable transaction support
    stringRedisTemplate.setEnableTransactionSupport(true);
    // Optimistic lock: watch the keys
    stringRedisTemplate.watch(keyList);
    // Query
    stringRedisTemplate.opsForValue().get();
    // Mark the start of the transaction
    stringRedisTemplate.multi();
    // Insert
    stringRedisTemplate.opsForValue().set();
    // Commit the transaction
    resultList = stringRedisTemplate.exec();
} catch (Exception e) {
    XXX
} finally {
    // Release the watch
    stringRedisTemplate.unwatch();
}

2.3.2 Problem situation

The same error as in problem 1 occurred, but not consistently: it appeared only occasionally during inserts, while queries and deletes in the same period reported no errors. The error logs all pointed to the RedisTemplate watch() or unwatch() methods.

2.3.3 Cause

All the failing commands were watch and unwatch, and only inserts failed occasionally, so we suspected the problem was related to the use of transactions.

The possible causes given are: (1) a Redis service failure or network partition; (2) a timed-out command blocking the connection; (3) a timeout that is set too short; (4) a blocked Netty EventLoop. Given our situation, cause 2 was the most likely, so during local debugging I changed the timeout in the configuration to 1000 (1 second) and deliberately held the breakpoint for longer. As expected, the same error occurred. After removing the transaction, the number of commands dropped from 6 to 2, both ordinary GET and SET commands, so the connection is no longer blocked by a command timing out.

2.3.4 Solution

Remove the transaction code from the insert and update interfaces, leaving only the business code. The material service's insert operation has its concurrent requests controlled by IncJob, so removing the transaction code from the material service does not cause data errors.

2.3.5 Modified Contents

Only the code marked as business code in section 2.3.1 is kept.

2.3.6 Supplement

The following assumes that the transaction in the insert operation has not been removed.

During local debugging, a WARN log of unreleased connections in the connection pool is printed when the unit test ends and the service is shut down.

In further study, we found that the connection executing the transaction command will not be released actively after the redis transaction is started.

The releaseConnection method here is called each time the redis command is executed, where the logic for releasing the connection shows that if the connection is executing a transaction command and is not a read-only transaction, the connection will not be released after the command is executed.

Actual log:

During validation, I changed max-active (the maximum number of connections in the connection pool) to 10 and called the insert interface 20 times.

A "failure" result means WATCH detected that the value of the key had changed. A null result means the service reported the error below, indicating that no connection could be obtained from the connection pool.

At this point the number of Redis connections is 10, yet the query interface (which uses no transaction commands) can still be called normally.
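The experiment above can be modeled in a few lines. This is a simplified sketch, not the pool used by Spring or commons.pool2: a Semaphore stands in for the connection pool, the class and method names are hypothetical, and the 10 ms wait is an arbitrary stand-in for max-wait. It only models the transactional path, where a borrowed connection is never returned.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a bounded connection pool whose connections are not released.
final class PoolExhaustionSketch {

    private final Semaphore permits;

    PoolExhaustionSketch(int maxActive) {
        this.permits = new Semaphore(maxActive);
    }

    /** Tries to borrow a connection, waiting at most maxWaitMillis. */
    boolean borrow(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a connection to the pool. */
    void release() {
        permits.release();
    }

    /** Calls the "insert interface" the given number of times and counts the successes. */
    static int successfulCalls(int maxActive, int calls, boolean releaseAfterUse) {
        PoolExhaustionSketch pool = new PoolExhaustionSketch(maxActive);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (pool.borrow(10)) {
                succeeded++;
                if (releaseAfterUse) {
                    pool.release();
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        // Connections kept by the transaction: only the first 10 of 20 calls get one.
        System.out.println(successfulCalls(10, 20, false)); // prints 10
        // Connections returned after each call: all 20 calls succeed.
        System.out.println(successfulCalls(10, 20, true));  // prints 20
    }
}
```

With max-active set to 10, the first run mirrors the observation above: once ten connections are held and never given back, every later insert fails to obtain one.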

Third, summary

The choice of Redis framework depends on the specific business scenario.

Before using a tool wrapped by a popular framework, first learn about the version you are using. If it is a new version, find out what has changed. If it is an old version, check whether it has unresolved issues.

It is best to find the answer to a problem in the source code.

What still puzzles me is why the watch and unwatch commands time out. Readers with experience are welcome to share in the comment section.
