– If there are any inaccuracies or mistakes, please don't hesitate to point them out, gently; I'm still a newbie, hehe ~

Introduction

Recently I have been working overtime and staying up late for several days in a row. My body is a bit overwhelmed and my spirits are low, but the business side is pushing so hard that I have to grit my teeth and push the project through. When your mind is in a muddle, what you write can't be called code; it's bugs. I stayed up late writing a bug and got yelled at.

Our mall business has to deduct product inventory frequently, and the application is deployed as a cluster. To prevent overselling caused by concurrent deductions, we control access with a Redis distributed lock. I figured I would simply wrap the inventory-deduction code in lock.tryLock():

/**
 * @date 2020/4/21 12:10
 */
public String stockLock() {
    RLock lock = redissonClient.getLock("stockLock");
    try {
        // get lock
        if (lock.tryLock(10, TimeUnit.SECONDS)) {
            // subtract inventory
            ...
        } else {
            logger.info("Did not get lock, service end..");
        }
    } catch (Exception e) {
        logger.info("handle Exception", e);
    }
    // Note: lock.unlock() is never called, which is the bug described below
    return "ok";
}

As a result, I forgot to release the lock with lock.unlock() after the business code finished. The Redis connection pool filled up, the Redis service failed across a large area, the inventory deduction data got messed up, and I got scolded by my leader. So much for this month's performance rating ~ ah ~ ~

The more I worked with Redis locks, the more pitfalls I found, far more than I expected. Redis distributed locks also come up often in interviews, with questions like "What problems have you run into with locks?" followed by "How did you solve them?"; they usually arrive as a one-two punch.

Today I'd like to share my diary of Redis distributed lock pitfalls, along with some solutions.

1. The lock is not released

This one is a low-level error, and it is exactly the mistake I made above. If the current thread acquires the Redis lock but does not release it promptly after finishing its business logic, other threads trying to acquire the lock will block indefinitely. For example, the Jedis client reports the following error:

redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool

This means the Jedis connection pool has no idle connections left to serve client commands.

The solution is simple: just be careful, and have the thread that acquired the lock release it promptly once its business logic is done. If the lock is acquired by spinning (retrying in a loop), a thread that fails to get the lock should release its current Redis connection and sleep for a while before retrying.

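public void lock() throws InterruptedException {
    while (true) {
        boolean flag = this.getLock(key);
        if (flag) {
            // TODO business logic ........., then release the lock and leave the loop
            return;
        } else {
            // Release the current redis connection
            redis.close();
            // Sleep 1000 ms before retrying
            Thread.sleep(1000);
        }
    }
}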
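However the lock is acquired, the release itself belongs in a finally block so that it still happens when the business logic throws. A minimal sketch, assuming a local java.util.concurrent.locks.ReentrantLock as a stand-in for the Redisson RLock from stockLock() (RLock implements the java.util.concurrent.locks.Lock interface, so the same try/finally shape applies); StockService and its in-memory stock counter are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical service: a local ReentrantLock stands in for
// redissonClient.getLock("stockLock"), which also implements java.util.concurrent.locks.Lock.
class StockService {
    private final Lock lock;
    private int stock; // hypothetical in-memory inventory, for illustration only

    StockService(Lock lock, int initialStock) {
        this.lock = lock;
        this.stock = initialStock;
    }

    String stockLock() {
        boolean acquired;
        try {
            acquired = lock.tryLock(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for callers
            return "interrupted";
        }
        if (!acquired) {
            return "busy"; // did not get the lock within 10 seconds
        }
        try {
            // subtract inventory
            if (stock > 0) {
                stock--;
            }
            return "ok";
        } finally {
            // runs on normal return and on exceptions, so the lock is never leaked
            lock.unlock();
        }
    }

    int stock() {
        return stock;
    }
}
```

Only the code between tryLock and the finally block runs while the lock is held; a thread that never got the lock never calls unlock().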

2. B's lock is released by A

A Redis lock is built on the SETNX command: if the key does not exist, SETNX sets it to value and returns 1; if the key already exists, SETNX does nothing and returns 0.

SETNX key value

Imagine this scenario: threads A and B both try to lock the key myLock. Thread A gets the lock first (say the lock expires after 3 seconds), and thread B waits to acquire it.

If the business logic is slow and runs longer than the Redis lock's expiration time, thread A's lock is released automatically (the key is deleted). Thread B then finds that myLock no longer exists, executes SETNX, and acquires the lock.

However, thread A still releases the lock (deletes the key) when its business logic finishes, so thread A ends up releasing thread B's lock.

To avoid this, each thread should tag its lock with its own unique value and delete the key only if it still holds that value; otherwise locks get released chaotically.
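A minimal sketch of the unique-value idea, assuming a hypothetical in-memory FakeRedis class in place of a real server. FakeRedis.setNxPx mimics SET key value NX PX ttl (value and expiry set in one command, unlike SETNX followed by EXPIRE), and deleteIfValueMatches mimics the compare-and-delete Lua script kept in UNLOCK_LUA. Against a real Redis the get-compare-delete must run as a single script; otherwise the key can expire and be taken by another client between the GET and the DEL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a single Redis node, with a manually advanced clock.
class FakeRedis {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private long now = 0; // simulated time in milliseconds

    synchronized void advance(long millis) {
        now += millis;
    }

    private void evictIfExpired(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && deadline <= now) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }

    // Mimics SET key value NX PX ttlMillis: value and expiry are set in one step.
    synchronized boolean setNxPx(String key, String value, long ttlMillis) {
        evictIfExpired(key);
        if (values.containsKey(key)) {
            return false;
        }
        values.put(key, value);
        expiresAt.put(key, now + ttlMillis);
        return true;
    }

    // Mimics UNLOCK_LUA: delete the key only if it still holds the caller's value.
    synchronized boolean deleteIfValueMatches(String key, String expected) {
        evictIfExpired(key);
        if (!expected.equals(values.get(key))) {
            return false;
        }
        values.remove(key);
        expiresAt.remove(key);
        return true;
    }
}

class OwnedLock {
    // The compare-and-delete script you would send with EVAL to a real Redis server.
    static final String UNLOCK_LUA =
        "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";

    private final FakeRedis redis;
    private final String key;
    private final String token = UUID.randomUUID().toString(); // unique value per lock holder

    OwnedLock(FakeRedis redis, String key) {
        this.redis = redis;
        this.key = key;
    }

    boolean tryLock(long ttlMillis) {
        return redis.setNxPx(key, token, ttlMillis);
    }

    // Releases the lock only if this holder still owns it.
    boolean unlock() {
        return redis.deleteIfValueMatches(key, token);
    }
}
```

Replaying the A/B scenario: A locks myLock for 3 seconds, its business logic overruns, B takes the expired key, and A's late unlock() returns false instead of deleting B's lock.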

3. Database transaction timeout

Emm, what does a Redis lock have to do with database transactions? Don't worry, take a look at this code:

@Transactional
public void lock() {
    while (true) {
        boolean flag = this.getLock(key);
        if (flag) {
            insert();
            return;
        }
    }
}

Adding the @Transactional annotation to this method starts a transaction, so that, for example, an exception thrown in the code rolls it back. Keep in mind that database transactions usually have a timeout; they do not wait indefinitely for a time-consuming database operation.

For example, if we parse a large file and save its data to the database, the operation may take so long that the transaction times out and is rolled back automatically.

Likewise, if the thread cannot acquire the lock for a long time and the time spent waiting for it exceeds the transaction timeout, the application will throw an exception.

To solve this problem, we usually switch the database transaction to manual commit and rollback.

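@Autowired
DataSourceTransactionManager dataSourceTransactionManager;

// Default propagation and isolation (org.springframework.transaction.support.DefaultTransactionDefinition)
TransactionDefinition transactionDefinition = new DefaultTransactionDefinition();

public void lock() {
    while (true) {
        boolean flag = this.getLock(key);
        if (flag) {
            // Start the transaction manually only after the lock is held,
            // so waiting for the lock does not count against the transaction timeout
            TransactionStatus transactionStatus = dataSourceTransactionManager.getTransaction(transactionDefinition);
            try {
                insert();
                // Commit the transaction manually
                dataSourceTransactionManager.commit(transactionStatus);
            } catch (Exception e) {
                // Roll back the transaction manually
                dataSourceTransactionManager.rollback(transactionStatus);
            }
            return;
        }
    }
}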

4. The lock expires before the business logic finishes

This situation is similar to the second one above, but we approach it differently.

It is the same scenario: the Redis distributed lock expires while the business logic is still running. But let's think about it from another angle: wouldn't a longer lock expiration time solve the problem?

That still has problems. We can set a longer expiration time when acquiring the lock, but how long is appropriate? Business logic execution time is unpredictable, and an expiration that is too long hurts performance: if the holder crashes without releasing the lock, everyone else has to wait out the whole expiration period.

It would be nice if the Redis lock's expiration could be renewed automatically.

To solve this problem, we use the Redis client Redisson. Redisson handles many of the hard problems of using Redis in a distributed environment; its goal is to let users pay less attention to Redis itself and focus more on business logic.

Redisson encapsulates distributed locks nicely; you only need to call its API:

  RLock lock = redissonClient.getLock("stockLock");

After Redisson acquires a lock, it registers a scheduled task that watches the lock: every 10 seconds it checks whether the lock is still held and, if so, extends its expiration. The default expiration time is 30 seconds. This mechanism is known as the "watchdog". (It only kicks in when the lock is acquired without an explicit lease time.)

For example, with a 30-second lock time checked every 10 seconds: whenever a check finds the business logic still running, the lock is renewed and its expiration is reset to 30 seconds.
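The watchdog idea can be sketched without Redis. The code below assumes a hypothetical in-memory ExpiringStore with wall-clock TTLs in place of Redis, and a WatchdogLock that schedules a renewal every ttl/3, mirroring the 30-second timeout checked every 10 seconds. It is an illustration of the technique, not Redisson's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for Redis keys with a TTL.
class ExpiringStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();

    private void evictIfExpired(String key) {
        Long deadline = deadlines.get(key);
        if (deadline != null && deadline <= System.currentTimeMillis()) {
            values.remove(key);
            deadlines.remove(key);
        }
    }

    // Like SET key value NX PX ttl.
    synchronized boolean setIfAbsent(String key, String value, long ttlMillis) {
        evictIfExpired(key);
        if (values.containsKey(key)) return false;
        values.put(key, value);
        deadlines.put(key, System.currentTimeMillis() + ttlMillis);
        return true;
    }

    // Like a renew script: extend the TTL only while the caller still owns the key.
    synchronized boolean renewIfOwner(String key, String value, long ttlMillis) {
        evictIfExpired(key);
        if (!value.equals(values.get(key))) return false;
        deadlines.put(key, System.currentTimeMillis() + ttlMillis);
        return true;
    }

    // Compare-and-delete, like an unlock script.
    synchronized boolean deleteIfOwner(String key, String value) {
        evictIfExpired(key);
        if (!value.equals(values.get(key))) return false;
        values.remove(key);
        deadlines.remove(key);
        return true;
    }

    synchronized boolean isHeldBy(String key, String value) {
        evictIfExpired(key);
        return value.equals(values.get(key));
    }
}

// Hypothetical lock whose watchdog renews the TTL every ttl/3 while the lock is held.
class WatchdogLock {
    private final ExpiringStore store;
    private final ScheduledExecutorService scheduler;
    private final String key;
    private final long ttlMillis;
    private final String token = UUID.randomUUID().toString();
    private ScheduledFuture<?> renewal;

    WatchdogLock(ExpiringStore store, ScheduledExecutorService scheduler, String key, long ttlMillis) {
        this.store = store;
        this.scheduler = scheduler;
        this.key = key;
        this.ttlMillis = ttlMillis;
    }

    boolean tryLock() {
        if (!store.setIfAbsent(key, token, ttlMillis)) return false;
        long period = ttlMillis / 3; // e.g. 30 s timeout -> check every 10 s
        // Each tick resets the TTL; once the key is lost, renewIfOwner just returns false.
        renewal = scheduler.scheduleAtFixedRate(
            () -> store.renewIfOwner(key, token, ttlMillis), period, period, TimeUnit.MILLISECONDS);
        return true;
    }

    void unlock() {
        if (renewal != null) renewal.cancel(false); // stop the watchdog first
        store.deleteIfOwner(key, token);
    }

    boolean isHeld() {
        return store.isHeldBy(key, token);
    }
}
```

With a 300 ms TTL and "business logic" that runs for 900 ms, the key survives because it is renewed roughly every 100 ms. Because renewIfOwner checks the token before extending the TTL, a late tick can never extend a lock that now belongs to someone else.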

Redisson's source code shows that for locking, unlocking and renewal alike, the client packages the compound logic into Lua scripts sent to Redis, so each piece of logic executes atomically. The class below is a hand-rolled lock that follows the same idea on top of Spring's StringRedisTemplate: Lua scripts for locking, unlocking and renewal, plus a timer task that renews locks that are about to expire.

 
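import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.RedisScript;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// LockContent (per-lock state holder) and ThreadPool (async executor) are the author's helper classes, not shown here.
@Slf4j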
@Service
public class RedisDistributionLockPlus {
 
    /**
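     * Default lock timeout in seconds (sent to Redis as EX): the operation should finish within this time, otherwise concurrent execution may occur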
     */
    private static final long DEFAULT_LOCK_TIMEOUT = 30;
 
    private static final long TIME_SECONDS_FIVE = 5 ;
 
    /**
     * Expiration info for each lock key, see {@link LockContent}
     */
    private Map<String, LockContent> lockContentMap = new ConcurrentHashMap<>(512);
 
    /**
     * Return value of a successful Redis script execution
     */
    private static final Long EXEC_SUCCESS = 1L;
 
    /**
     * Lua script to acquire the lock. KEYS[1]: lock key, KEYS[2]: renewal-duration key, ARGV[1]: requestId, ARGV[2]: timeout in seconds
     */
    private static final String LOCK_SCRIPT = "if redis.call('exists', KEYS[2]) == 1 then ARGV[2] = math.floor(redis.call('get', KEYS[2]) + 10) end " +
            "if redis.call('exists', KEYS[1]) == 0 then " +
               "local t = redis.call('set', KEYS[1], ARGV[1], 'EX', ARGV[2]) " +
               "for k, v in pairs(t) do " +
                 "if v == 'OK' then return tonumber(ARGV[2]) end " +
               "end " +
            "return 0 end";
 
    /**
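     * Lua script to release the lock. KEYS[1]: lock key, KEYS[2]: renewal-duration key, ARGV[1]: requestId, ARGV[2]: business duration in seconds, ARGV[3]: timeout set when the business started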
     */
    private static final String UNLOCK_SCRIPT = "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "local ctime = tonumber(ARGV[2]) " +
            "local biz_timeout = tonumber(ARGV[3]) " +
            "if ctime > 0 then  " +
               "if redis.call('exists', KEYS[2]) == 1 then " +
                   "local avg_time = redis.call('get', KEYS[2]) " +
                   "avg_time = (tonumber(avg_time) * 8 + ctime * 2)/10 " +
                   "if avg_time >= biz_timeout - 5 then redis.call('set', KEYS[2], avg_time, 'EX', 24*60*60) " +
                   "else redis.call('del', KEYS[2]) end " +
               "elseif ctime > biz_timeout -5 then redis.call('set', KEYS[2], ARGV[2], 'EX', 24*60*60) end " +
            "end " +
            "return redis.call('del', KEYS[1]) " +
            "else return 0 end";
    /**
     * Lua script to renew the lock
     */
    private static final String RENEW_SCRIPT = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('expire', KEYS[1], ARGV[2]) else return 0 end";
 
 
    private final StringRedisTemplate redisTemplate;
 
    public RedisDistributionLockPlus(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
        ScheduleTask task = new ScheduleTask(this, lockContentMap);
        // Start the scheduled renewal task
        ScheduleExecutor.schedule(task, 1, 1, TimeUnit.SECONDS);
    }
 
    /**
     * Acquire the lock.
     * If the lock is free, take it; otherwise keep waiting until it is acquired.
     *
     * @param lockKey
     * @param requestId globally unique
     * @param expire   lock expiration time, in seconds
     * @return
     */
    public boolean lock(String lockKey, String requestId, long expire) {
        log.info("Start acquiring lock, lockKey={}, requestId={}", lockKey, requestId);
        for (; ; ) {
            // Check locally whether a thread already holds the lock, to reduce load on Redis
            LockContent lockContentOld = lockContentMap.get(lockKey);
            boolean unLocked = null == lockContentOld;
            // If it is not locked, try to acquire the lock
            if (unLocked) {
                long startTime = System.currentTimeMillis();
                // Compute the timeout
                long bizExpire = expire == 0L ? DEFAULT_LOCK_TIMEOUT : expire;
                String lockKeyRenew = lockKey + "_renew";
 
                RedisScript<Long> script = RedisScript.of(LOCK_SCRIPT, Long.class);
                List<String> keys = new ArrayList<>();
                keys.add(lockKey);
                keys.add(lockKeyRenew);
                Long lockExpire = redisTemplate.execute(script, keys, requestId, Long.toString(bizExpire));
                if (null != lockExpire && lockExpire > 0) {
                    // Put the lock into the local map
                    LockContent lockContent = new LockContent();
                    lockContent.setStartTime(startTime);
                    lockContent.setLockExpire(lockExpire);
                    lockContent.setExpireTime(startTime + lockExpire * 1000);
                    lockContent.setRequestId(requestId);
                    lockContent.setThread(Thread.currentThread());
                    lockContent.setBizExpire(bizExpire);
                    lockContent.setLockCount(1);
                    lockContentMap.put(lockKey, lockContent);
                    log.info("Lock acquired, lockKey={}, requestId={}", lockKey, requestId);
                    return true;
                }
            }
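            // Re-entrant acquisition: thread pools reuse threads, so thread equality alone does not prove
            // this is the thread's lock; the requestId must match too. lockContentOld is null when the lock
            // was free locally but the Redis script failed, so check it first to avoid a NullPointerException.
            if (lockContentOld != null
                    && Thread.currentThread() == lockContentOld.getThread()
                    && requestId.equals(lockContentOld.getRequestId())) {
                // Increment the hold count
                lockContentOld.setLockCount(lockContentOld.getLockCount() + 1);
                return true;
            }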
 
            // If the lock is held or acquisition failed, wait 100 ms
            try {
                TimeUnit.MILLISECONDS.sleep(100);
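            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can see the interruption
                Thread.currentThread().interrupt();
                log.error("Failed to acquire redis lock, lockKey={}, requestId={}", lockKey, requestId, e);
                return false;
            }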
        }
    }
 
 
    /**
     * Release the lock
     *
     * @param lockKey
     * @param lockValue
     */
    public boolean unlock(String lockKey, String lockValue) {
        String lockKeyRenew = lockKey + "_renew";
        LockContent lockContent = lockContentMap.get(lockKey);
 
        long consumeTime;
        if (null == lockContent) {
            consumeTime = 0L;
        } else if (lockValue.equals(lockContent.getRequestId())) {
            int lockCount = lockContent.getLockCount();
            // Each release decrements the hold count; the Redis key is deleted only when it reaches 0
            if (--lockCount > 0) {
                lockContent.setLockCount(lockCount);
                return false;
            }
            consumeTime = (System.currentTimeMillis() - lockContent.getStartTime()) / 1000;
        } else {
            log.info("Failed to release lock: it is not ours.");
            return false;
        }
 
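        // Remove the finished key from the local cache first to reduce Redis load; the distributed lock has a single holder, so no local locking is needed here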
        lockContentMap.remove(lockKey);
 
        RedisScript<Long> script = RedisScript.of(UNLOCK_SCRIPT, Long.class);
        List<String> keys = new ArrayList<>();
        keys.add(lockKey);
        keys.add(lockKeyRenew);
 
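        // lockContent is null when this JVM has no local record of the lock; fall back to the default timeout
        long bizExpire = null == lockContent ? DEFAULT_LOCK_TIMEOUT : lockContent.getBizExpire();
        Long result = redisTemplate.execute(script, keys, lockValue, Long.toString(consumeTime),
                Long.toString(bizExpire));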
        return EXEC_SUCCESS.equals(result);
 
    }
 
    /**
     * Renew the lock
     *
     * @param lockKey
     * @param lockContent
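     * @return true: renewal succeeded; false: renewal failed (1. the business finished and the lock was released during renewal, 2. the lock is not ours, 3. the lock expired during renewal (not handled))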
     */
    public boolean renew(String lockKey, LockContent lockContent) {
 
        // Check the state of the thread running the business logic
        Thread.State state = lockContent.getThread().getState();
        if (Thread.State.TERMINATED == state) {
            log.info("Business thread has terminated, stop renewing. lockKey={}, lockContent={}", lockKey, lockContent);
            return false;
        }
 
        String requestId = lockContent.getRequestId();
        long timeOut = (lockContent.getExpireTime() - lockContent.getStartTime()) / 1000;
 
        RedisScript<Long> script = RedisScript.of(RENEW_SCRIPT, Long.class);
        List<String> keys = new ArrayList<>();
        keys.add(lockKey);
 
        Long result = redisTemplate.execute(script, keys, requestId, Long.toString(timeOut));
        log.info("Renewal result (true = success, false = failure), lockKey={}, result={}", lockKey, EXEC_SUCCESS.equals(result));
        return EXEC_SUCCESS.equals(result);
    }
 
 
    static class ScheduleExecutor {
 
        public static void schedule(ScheduleTask task, long initialDelay, long period, TimeUnit unit) {
            long delay = unit.toMillis(initialDelay);
            long period_ = unit.toMillis(period);
            // Run periodically
            new Timer("Lock-Renew-Task").schedule(task, delay, period_);
        }
    }
 
    static class ScheduleTask extends TimerTask {
 
        private final RedisDistributionLockPlus redisDistributionLock;
        private final Map<String, LockContent> lockContentMap;
 
        public ScheduleTask(RedisDistributionLockPlus redisDistributionLock, Map<String, LockContent> lockContentMap) {
            this.redisDistributionLock = redisDistributionLock;
            this.lockContentMap = lockContentMap;
        }
 
        @Override
        public void run() {
            if (lockContentMap.isEmpty()) {
                return;
            }
            Set<Map.Entry<String, LockContent>> entries = lockContentMap.entrySet();
            for (Map.Entry<String, LockContent> entry : entries) {
                String lockKey = entry.getKey();
                LockContent lockContent = entry.getValue();
                long expireTime = lockContent.getExpireTime();
                // Only renew locks that expire within 5 seconds, to keep the number of thread-pool tasks down
                if ((expireTime - System.currentTimeMillis())/ 1000 < TIME_SECONDS_FIVE) {
                    // Renew asynchronously on a thread pool
                    ThreadPool.submit(() -> {
                        boolean renew = redisDistributionLock.renew(lockKey, lockContent);
                        if (renew) {
                            long expireTimeNew = lockContent.getStartTime() + (expireTime - lockContent.getStartTime()) * 2 - TIME_SECONDS_FIVE * 1000;
                            lockContent.setExpireTime(expireTimeNew);
                        } else {
                            // Renewal failed: either the business has finished, or Redis has a problem
                            lockContentMap.remove(lockKey);
                        }
                    });
                }
            }
        }
    }
}


5. Redis master/slave replication pitfall

The most common high-availability setup for Redis is master/slave replication, and it is also what makes Redis distributed locks hard to get right.

In a Redis cluster, when client A wants to acquire the lock, it writes the key mylock to a master node chosen by the routing rules. Once the lock is set, the master replicates the key asynchronously to its slave node.

If the master goes down before the key has been replicated, a master/slave failover promotes the slave to master to keep the cluster available. Client B can then successfully acquire the lock on the new master, while client A still believes it holds the lock.

In this case, multiple clients lock a distributed lock at the same time, resulting in various dirty data.
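The failure can be reproduced with a toy model of asynchronous replication. ToyNode is hypothetical: setNx writes locally and only queues the write for the replica, and replicateTo stands in for Redis's background replication, which may not have run yet when the master crashes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy model of one Redis node with asynchronous replication.
class ToyNode {
    final Map<String, String> data = new HashMap<>();
    private final Queue<String[]> replicationBacklog = new ArrayDeque<>();

    // SETNX: succeeds only if the key is absent; the write is queued for the replica, not sent yet.
    boolean setNx(String key, String value) {
        if (data.containsKey(key)) {
            return false;
        }
        data.put(key, value);
        replicationBacklog.add(new String[] {key, value});
        return true;
    }

    // Ship queued writes to the replica (in real Redis this happens in the background).
    void replicateTo(ToyNode replica) {
        String[] write;
        while ((write = replicationBacklog.poll()) != null) {
            replica.data.put(write[0], write[1]);
        }
    }
}
```

If the master crashes after client A's setNx but before replicateTo runs, the promoted replica has never seen mylock, so client B's setNx on it succeeds while A still thinks it owns the lock.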

As for a solution, there is currently no complete cure; all we can do is keep the machines stable and reduce the probability of this happening.

Conclusion

Those are some of the pitfalls I ran into when using Redis distributed locks. A small reflection: you often fill one pit with some technique, only to find another pit appearing soon after. In fact there is no perfect solution and no silver bullet; you can only weigh the pros and cons and pick a compromise within an acceptable range.

