### Problem background

The production application frequently sends alert emails reporting: `redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.`

The stack is shown below:

The frequency and timing of the errors are shown below. The errors seem to follow a pattern: about eight every 10 minutes.

### Troubleshooting steps

1. I had never seen this error before and did not know its cause. The possible causes are:
   1. The output buffer is full.
   2. The server closes connections that have been idle for a long time.
   3. Abnormal concurrent reads and writes: a single Jedis object is operated on by multiple threads at the same time (each thread should use its own Jedis object).

   Checking the three causes against the failing code rules out two of them. Cause 1: the production Redis instances are uniformly configured by the operations team, and only this code path in the project fails. The keys and values it reads and writes are very small, so a full buffer is unlikely. Cause 3: the code is locking logic that runs at very low frequency, so concurrent calls are almost negligible. That leaves cause 2 as the most likely.
2. Checked the keepalive configuration of the Redis server: timeout and tcp-keepalive are both 600 s (10 minutes).
3. Compared this with the error timestamps above: the interval between errors matches the keepalive settings of the Redis server, so this is almost certainly the cause.
4. Dumped the heap with `jmap -dump:format=b,file=out.hprof 1` and analyzed the connection pool configuration. No keepalive configuration was found.

4. Analyze the code: the failing snippet and the Redis connection pool configuration

The lockTemplate in use is an internally packaged component.

@Data
public class RedisProperties extends GenericObjectPoolConfig {
    private String host;
    private int port;
    private String password;
}

/**
 * The keepalive parameters of org.apache.commons.pool2.impl.GenericObjectPoolConfig all default to false
 */
class GenericObjectPoolConfig {
    // ...
    private boolean testOnCreate = DEFAULT_TEST_ON_CREATE; // false

    private boolean testOnBorrow = DEFAULT_TEST_ON_BORROW; // false

    private boolean testOnReturn = DEFAULT_TEST_ON_RETURN; // false

    private boolean testWhileIdle = DEFAULT_TEST_WHILE_IDLE; // false
    // ...
}

/**
 * Because the keepalive parameters of GenericObjectPoolConfig all default to false
 */
private ConnectionFactory buildConnectionFactory(RedisProperties redisProperties) {
  if (redisProperties == null) {
    // No Redis is configured for the lock component, so look up the existing JedisPool from the project
    Pool<Jedis> jedisPool = lookupJediPoolFromSpring();
    return new JedisConnectionFactory(jedisPool);
  } else {
    return new JedisConnectionFactory(
        redisProperties.getHost(),
        redisProperties.getPort(),
        redisProperties.getPassword(),
        redisProperties);
  }
}
5. The keepalive parameters of GenericObjectPoolConfig all default to false, and the component does not override them, so the lockTemplate connection pool has no keepalive policy. When a connection goes unused for longer than the timeout of the Redis server, the server actively closes it, but the connection pool is not aware of this, so the closed connection stays in the pool instead of being released.
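
The failure mode in step 5 can be illustrated with a toy model. This is only a sketch with a simulated clock: the class `StaleConnectionSketch` and its methods are hypothetical names invented for illustration, not part of Jedis or commons-pool2, and the 700 s idle period is an assumed value chosen to exceed the 600 s server timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (hypothetical, not Jedis code) of a pool talking to a server with an idle timeout. */
class StaleConnectionSketch {

    /** A pooled connection that only remembers when it was last used. */
    static final class Conn {
        final long lastUsedMillis;
        Conn(long lastUsedMillis) { this.lastUsedMillis = lastUsedMillis; }
    }

    /** The simulated server drops any connection idle longer than its timeout. */
    static boolean closedByServer(Conn c, long nowMillis, long serverTimeoutMillis) {
        return nowMillis - c.lastUsedMillis > serverTimeoutMillis;
    }

    /**
     * Fills a pool at time 0, lets it sit idle for idleMillis, then borrows every
     * connection once and counts the commands that fail because the server has
     * already closed the socket. When validateOnBorrow is true, dead connections
     * are replaced with fresh ones before use.
     */
    static int failedCommands(int poolSize, long idleMillis,
                              long serverTimeoutMillis, boolean validateOnBorrow) {
        Deque<Conn> idle = new ArrayDeque<>();
        for (int i = 0; i < poolSize; i++) {
            idle.add(new Conn(0L));
        }
        int failures = 0;
        while (!idle.isEmpty()) {
            Conn c = idle.poll();
            if (validateOnBorrow && closedByServer(c, idleMillis, serverTimeoutMillis)) {
                c = new Conn(idleMillis); // discard the dead connection, open a fresh one
            }
            if (closedByServer(c, idleMillis, serverTimeoutMillis)) {
                failures++; // the pool handed out a socket the server had closed
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 8 idle connections, 700 s of silence (assumed), 600 s server timeout
        System.out.println(failedCommands(8, 700_000L, 600_000L, false)); // prints 8
        System.out.println(failedCommands(8, 700_000L, 600_000L, true));  // prints 0
    }
}
```

Without any validation every pooled connection fails on first use after the idle period, which is consistent with errors arriving in small bursts rather than one at a time.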

In fact, because I was not familiar with the Redis components, analyzing the connection pool parameters and code above took some time. Once found, though, the cause is easy to understand.

### Why does the problem not appear in the test environment? (A hard slap in my own face)

test:

spring.redis.host = *.*.*.*
spring.redis.port = 6379
spring.redis.data.expire = 3600
spring.redis.data.max.time = 2592000
spring.redis.jedis.pool.maxActive = 30
spring.redis.jedis.pool.maxWait = 3000
spring.redis.jedis.pool.maxIdle = 8
spring.redis.jedis.pool.minIdle = 0

pro:

spring.redis.host = *.*.*.*
spring.redis.port = 6384
spring.redis.data.expire = 3600
spring.redis.data.max.time = 2592000
spring.redis.jedis.pool.maxActive = 8
spring.redis.jedis.pool.maxWait = 3000
spring.redis.jedis.pool.maxIdle = 8
spring.redis.jedis.pool.minIdle = 0

Default lock-related configuration:

lock.lockWaitMillis = 5000
lock.lockExpireMillis = 600000
lock.retryMillis = 100

The following configuration can be omitted if Spring Data Redis is configured for your project, in which case the component uses the connection pool of your project directly:

lock.redis.host = *.*.*.*
lock.redis.port = 6379
lock.redis.max-idle = 8
lock.redis.min-idle = 4
lock.redis.max-active = 8
lock.redis.max-wait = 10000

The only difference between the test and production environments is that the test environment configures only the Spring Data Redis connection pool. That is, in the test environment both the lock component and ordinary Redis operations use the Spring Data Redis connection pool. Spring Boot sets up that pool in:
org.springframework.boot.autoconfigure.data.redis.JedisConnectionConfiguration

@Configuration
@ConditionalOnClass({ GenericObjectPool.class, JedisConnection.class, Jedis.class })
class JedisConnectionConfiguration extends RedisConnectionConfiguration {
	private JedisPoolConfig jedisPoolConfig(RedisProperties.Pool pool) {
		// Note that JedisPoolConfig is used
		JedisPoolConfig config = new JedisPoolConfig();
		config.setMaxTotal(pool.getMaxActive());
		config.setMaxIdle(pool.getMaxIdle());
		config.setMinIdle(pool.getMinIdle());
		if (pool.getMaxWait() != null) {
			config.setMaxWaitMillis(pool.getMaxWait().toMillis());
		}
		return config;
	}
}


package redis.clients.jedis;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

public class JedisPoolConfig extends GenericObjectPoolConfig {
  public JedisPoolConfig() {
    // defaults to make your life with connection pool easier :)
    setTestWhileIdle(true);
    setMinEvictableIdleTimeMillis(60000);
    setTimeBetweenEvictionRunsMillis(30000);
    setNumTestsPerEvictionRun(-1);
  }
}

Conclusion: the Jedis connection pool configured by Spring Data Redis uses JedisPoolConfig. Although JedisPoolConfig extends GenericObjectPoolConfig, its constructor sets its own defaults for checking the validity of idle connections. So in the test environment the connections are covered by a keepalive policy.
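
A rough way to see why those defaults help is to bound how long a connection can sit idle before the pool removes it. The sketch below is back-of-the-envelope arithmetic, not library code: `IdleAgeSketch` and its methods are hypothetical names, and the bound assumes the evictor examines every idle connection on each run (as with `setNumTestsPerEvictionRun(-1)`) and that a non-positive run interval means the evictor never runs.

```java
/** Back-of-the-envelope check (hypothetical helper): can an idle connection outlive the server timeout? */
class IdleAgeSketch {

    /**
     * Upper bound on how long a connection can stay idle in the pool, assuming
     * every idle connection is examined on each evictor run: it is removed at
     * the first run after it passes minEvictableIdleMillis. A non-positive
     * interval is treated as "evictor disabled", so there is no bound.
     */
    static long worstCaseIdleMillis(long minEvictableIdleMillis, long timeBetweenRunsMillis) {
        if (timeBetweenRunsMillis <= 0) {
            return Long.MAX_VALUE; // no evictor: idle connections are never removed
        }
        return minEvictableIdleMillis + timeBetweenRunsMillis;
    }

    /** True when the pool drops idle connections before the server would close them. */
    static boolean evictedBeforeServerTimeout(long minEvictableIdleMillis,
                                              long timeBetweenRunsMillis,
                                              long serverTimeoutMillis) {
        return worstCaseIdleMillis(minEvictableIdleMillis, timeBetweenRunsMillis) < serverTimeoutMillis;
    }

    public static void main(String[] args) {
        // Values set in the JedisPoolConfig constructor above, server timeout of 600 s
        System.out.println(worstCaseIdleMillis(60_000L, 30_000L));                  // prints 90000
        System.out.println(evictedBeforeServerTimeout(60_000L, 30_000L, 600_000L)); // prints true
        // Evictor disabled (interval of -1): nothing removes idle connections first
        System.out.println(evictedBeforeServerTimeout(60_000L, -1L, 600_000L));     // prints false
    }
}
```

Under these assumptions a connection is idle for at most about 90 s, far below the 600 s server timeout, so the pool discards it long before the server does.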

### Reproducing the problem

import java.time.LocalDateTime
import redis.clients.jedis.Jedis
import redis.clients.jedis.JedisPool
import redis.clients.jedis.JedisPoolConfig

class JedisTest {
    fun testBatchPool() {
        val config = JedisPoolConfig()
        config.numTestsPerEvictionRun = 3
        config.timeBetweenEvictionRunsMillis = 12000
        config.minIdle = 5
        config.maxTotal = 10
        config.testOnBorrow = false
        config.testWhileIdle = true

        val pool = JedisPool(config, "127.0.0.1", 6379)

        val list: List<Int> = listOf(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
        // Initialize 10 connections: borrow each one, then return it to the pool
        list.map { pool.resource }.forEach(Jedis::close)

        println("idle: ${pool.numIdle}")
        // Wait 15 seconds
        Thread.sleep(15000)

        println(LocalDateTime.now())
        println("idle: ${pool.numIdle}")
        // Borrow 10 connections again and run a command on each one
        list.map { pool.resource }
                .forEach {
                    try {
                        it.get("key")
                    } catch (e: java.lang.Exception) {
                        e.printStackTrace()
                    }
                }
    }
}

fun main() {
    JedisTest().testBatchPool()
}

Result: Unexpected end of stream occurs 5 times.

### Further research (the role of the keepalive parameters)

Server-side keepalive parameters (redis.conf):

    # Close the connection after a client is idle for N seconds (0 to disable).
    # With a value of 0 the server never closes idle connections. The value cannot be less than 0.
    timeout 10

    # Used to send ACK packets that check whether the client is still alive.
    # Use SO_KEEPALIVE to send TCP ACKs to clients in absence
    # of communication. This is useful for two reasons:
    #
    # 1) Detect dead peers.
    # 2) Take the connection alive from the point of view of network
    #    equipment in the middle.
    #
    # On Linux, the specified value (in seconds) is the period used to send ACKs.
    # Note that to close the connection the double of the time is needed.
    # On other kernels the period depends on the kernel configuration.
    #
    # A reasonable value for this option is 300 seconds, which is the new
    # Redis default starting with Redis 3.2.1.
    tcp-keepalive 10

Client-side keepalive parameters (org.apache.commons.pool2.impl.BaseGenericObjectPool):

- testOnCreate: whether an object created by the pooledObjectFactory is validated before it is added to the objectPool
- testOnBorrow: whether a pooledObject is validated when it is borrowed from the objectPool
- testOnReturn: whether a pooledObject is validated when it is returned to the objectPool
- testWhileIdle: whether idle objects are validated
- timeBetweenEvictionRunsMillis: the interval between idle checks
- numTestsPerEvictionRun: the number of objects checked in each run
- minEvictableIdleTimeMillis: the minimum time an idle connection survives before it can be evicted

The timer task that runs these checks in BaseGenericObjectPool is org.apache.commons.pool2.impl.BaseGenericObjectPool.Evictor.

When a connection expires, the Jedis connection pool sends the QUIT command.
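
The meaning of numTestsPerEvictionRun differs between positive and negative values, which matters when comparing the reproduction test (3) with the JedisPoolConfig default (-1). The sketch below follows the rule as I understand it from the commons-pool2 documentation (a negative value n means ceil(idle / |n|) tests per run). `EvictionRunSketch` and `testsPerRun` are hypothetical names, not the library's own code.

```java
/** Sketch (hypothetical helper) of how many idle objects one evictor run examines. */
class EvictionRunSketch {

    /**
     * A non-negative setting caps the number of tests per run at that value;
     * a negative setting n means ceil(idleCount / |n|) tests per run,
     * so -1 examines every idle object.
     */
    static int testsPerRun(int numTestsPerEvictionRun, int idleCount) {
        if (numTestsPerEvictionRun >= 0) {
            return Math.min(numTestsPerEvictionRun, idleCount);
        }
        return (int) Math.ceil(idleCount / Math.abs((double) numTestsPerEvictionRun));
    }

    public static void main(String[] args) {
        System.out.println(testsPerRun(3, 10));  // prints 3
        System.out.println(testsPerRun(-1, 10)); // prints 10
        System.out.println(testsPerRun(-2, 10)); // prints 5
    }
}
```

With a setting of 3, a single run checks only part of a 10-connection pool, so some dead connections can still be handed out afterwards.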

Use Wireshark to capture the packets exchanged between the client and the Redis server

The capture shows several connections where the exchange starts with the Redis client sending QUIT and the server replying OK, but the four-way TCP close handshake then fails and ends in an RST.
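
To recognize that exchange in a packet dump, it helps to know what the bytes look like. The sketch below encodes a command in the Redis serialization protocol (an array of bulk strings). `RespSketch` and its methods are hypothetical helpers written for illustration, not Jedis classes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: encodes a Redis command as it appears on the wire (RESP array of bulk strings). */
class RespSketch {

    /** Builds "*<argc>\r\n" followed by "$<byte length>\r\n<arg>\r\n" for each argument. */
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            int byteLen = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLen).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    /** Makes CR and LF visible so a frame can be compared with a packet dump. */
    static String visible(String frame) {
        return frame.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(visible(encode("QUIT"))); // prints *1\r\n$4\r\nQUIT\r\n
        System.out.println(visible("+OK\r\n"));      // the simple-string reply: prints +OK\r\n
    }
}
```

In the capture, the client payload carrying QUIT and the server payload carrying +OK should appear just before the close sequence of the affected connections.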

RST

In TCP, the RST flag marks a reset segment and is used to close a connection abnormally. It is an indispensable part of the design of TCP. Sending an RST segment closes the connection without waiting for the data still in the buffer to be sent, and the receiver of an RST segment does not need to send an ACK to confirm it. An RST segment is also sent in response to data that arrives at a port nobody is listening on, or on an abnormal connection.

When Jedis closes or destroys a connection, it sends a QUIT command and then disconnects immediately. The server still dutifully goes through the four-way close handshake, but by then the connection no longer exists on the client side.