Abnormal field review

On July 7, pressure test was conducted on the video task push function of little Donkey class. During the pressure test, some interface requests failed. The following exceptions occur when the donkey fails to report push:

The exception log was received by the donkey. Abnormal content is the abnormal page returned by the network school gateway, how to identify the network school gateway? Look back to the front of the home page there is a URL, this jump to the URL is the home page of the network school

The class task push is mainly to B, which is used by teachers. The number of concurrent tasks is not large, 10 concurrent tasks, 25 times, a total of 250 tasks. Concurrency is not large, in the end why the exception, how the exception?

First, let’s look at ELK log monitoring.

Query gateway logs and search for the internal gateway. The search criteria are as follows: Internal gateway + request path to view the concurrency.

We see 207 concurrent requests, all visually successful. But there were only 207 when there were 250? Let’s look for exceptions. How do I query? Search criteria:

  1. project.key is cloudlearn-gateway-inner
  2. response is not 200

Step 2: Access the gateway to query the fault cause.

A traceId query to the gateway log based on the failed request in ELK does not yield anything. So I took the traceId to the business server to look it up

Step 3: Find the business server logs.

The service server threw an exception and failed to obtain the lock. Procedure

Abnormal cause analysis and solution

Recall yesterday’s anomaly data, we see several small donkey ID have thrown exceptions. Redisson lock: paperId+lvTaskId+originType. In case, the lvTaskId of pressure measurement should be different. How can there be a lock.

The online log is incomplete. To simulate this problem, start the Jemeter locally for pressure testing. The test range lvTaskId is randomly generated from 100,000-999999 to 20. The probability of id duplication is very low, so it’s going to work very quickly

20 requests, two failed. The problems occurred in the field of pressure survey were reproduced. This is easy to do.

Step 1: View local logs and find exception logs

Same as the anomaly reported on the online manometry.

Step 2: Determine whether the randomly generated lvTaskId is duplicated.

Because the key of the Redisson lock cannot be repeated, if the Redisson lock is repeated, the wait will be performed. If the wait times out, the lock will fail to be acquired. Result: There is no duplicate lvTaskIdCopy the code

Step 3: Check the Redisson source code, break the point in the anomaly, step by step check.

@Around("controllerAspect(lock)")
public Object aroundAdvice(ProceedingJoinPoint proceedingJoinPoint, Lock lock) throws Throwable {
    String[] keys = lock.keys();
    if (keys.length == 0) {
        throw new RuntimeException("Keys cannot be empty.");
    }
    String[] parameterNames = new LocalVariableTableParameterNameDiscoverer().getParameterNames(((MethodSignature) proceedingJoinPoint.getSignature()).getMethod());
    Object[] args = proceedingJoinPoint.getArgs();

    long attemptTimeout = lock.attemptTimeout();
    if (attemptTimeout == 0) {
        attemptTimeout = redissonProperties.getAttemptTimeout();
    }
    long lockWatchdogTimeout = lock.lockWatchdogTimeout();
    if (lockWatchdogTimeout == 0) {
        lockWatchdogTimeout = redissonProperties.getLockWatchdogTimeout();
    }
    LockModel lockModel = lock.lockModel();
    if (lockModel.equals(LockModel.AUTO)) {
        LockModel lockModel1 = redissonProperties.getLockModel();
        if(lockModel1 ! =null) {
            lockModel = lockModel1;
        } else if (keys.length > 1) {
            lockModel = LockModel.MULTIPLE;
        } else{ lockModel = LockModel.REENTRANT; }}if(! lockModel.equals(LockModel.MULTIPLE) && ! lockModel.equals(LockModel.REDLOCK) && keys.length >1) {
        throw new RuntimeException("Multiple parameters, lock mode is ->" + lockModel.name() + ". Unable to lock");
    }
    log.info("Lock mode ->{}, wait lock time ->{} seconds. Maximum lock time ->{} seconds",lockModel.name(),attemptTimeout/1000,lockWatchdogTimeout/1000);
    boolean res = false;
    RLock rLock = null;
    // Wait for the lock.
    switch (lockModel) {
        case FAIR:
            rLock = redissonClient.getFairLock(getVauleBySpel(keys[0],parameterNames,args));
            break;
        case REDLOCK:
            RLock[] locks=new RLock[keys.length];
            int index=0;
            for (String key : keys) {
                locks[index++]=redissonClient.getLock(getVauleBySpel(key,parameterNames,args));
            }
            rLock = new RedissonRedLock(locks);
            break;
        case MULTIPLE:
            RLock[] locks1=new RLock[keys.length];
            int index1=0;
            for (String key : keys) {
                locks1[index1++]=redissonClient.getLock(getVauleBySpel(key,parameterNames,args));
            }
            rLock = new RedissonMultiLock(locks1);
            break;
        case REENTRANT:
            rLock= redissonClient.getLock(getVauleBySpel(keys[0],parameterNames,args));
            break;
        case READ:
            RReadWriteLock rwlock = redissonClient.getReadWriteLock(getVauleBySpel(keys[0],parameterNames,args));
            rLock = rwlock.readLock();
            break;
        case WRITE:
            RReadWriteLock rwlock1 = redissonClient.getReadWriteLock(getVauleBySpel(keys[0],parameterNames,args));
            rLock = rwlock1.writeLock();
            break;
    }

    / / implementation of aop
    if(rLock! =null) {
        try {
            if (attemptTimeout == -1) {
                res = true;
                // Wait for the lock
                rLock.lock(lockWatchdogTimeout, TimeUnit.MILLISECONDS);
            } else {
                res = rLock.tryLock(attemptTimeout, lockWatchdogTimeout, TimeUnit.MILLISECONDS);
            }
            if (res) {
                Object obj = proceedingJoinPoint.proceed();
                return obj;
            }else{
                throw new LockException("Lock acquisition failed"); }}finally {
           if(res) { rLock.unlock(); }}}throw new LockException("Lock acquisition failed");
}
Copy the code

The exception is in this code, so read this code carefully. How to generate keys, how to lock, how to retry. Suddenly, notice the code snippet that parses the key

public String getVauleBySpel(String key, String[] parameterNames, Object[] values) {

    if(! key.startsWith("#")) {return key;
    }
    / / spel parser
    ExpressionParser parser = new SpelExpressionParser();
    / / spel context
    EvaluationContext context = new StandardEvaluationContext();
    for (int i = 0; i < parameterNames.length; i++) {
        context.setVariable(parameterNames[i], values[i]);
    }
    Expression expression = parser.parseExpression(key);
    Object value = expression.getValue(context);
    log.info("key={},value={}",key,value);
    if(value! =null) {return value.toString();
    }
    return "";
}
Copy the code

If (! key.startsWith(“#”)){return key; } If redisson’s key does not start with #, the key parameters will not be parsed. That’s the problem.

Usually, when we define a key, we put a constant in front of the variable. Facilitates log troubleshooting. The diagram below:

It is this problem that causes all concurrent requests to remain a lock. Finally, there is the wait timeout problem.

Step 4: Modify the lock by putting the constant after it

Pressure test again, the result is normal.

conclusion

Redisson lock when setting key, if add constant, do not load before, to add after, if you modify the source code, that is another word.

In fact, the redisson-spring-boot-starter documentation also mentions that constants can be used. On the one hand this hint is written in the comment, on the other hand it does not explicitly say where constants should be placed. For those of us who use Redis and are used to putting constants first, this is a real pit.

At this point, the troubleshooting of this exception is over, and the troubleshooting process is recorded. I hope you can avoid similar pits when using Redisson in the future.