background

Today under the share recently encountered a case, I believe you have made similar frequency limit, we have similar business scenario, an interface or function need to restrict user traffic over a period of time, our solution is through the Redis, on the one hand is due to Redis quite memory access performance is higher, On the other hand, the system is distributed, and Guava’s RateLimiter can be used if it is standalone or only QPS that need to limit the access of standalone.

The phenomenon of

For example, in such A scenario, interface A limits the user to invoke only three times within 30 seconds. However, A strange phenomenon occurs that the call cannot be invoked even after the time has passed. No exception is found in application logs or external dependencies.

Problem orientation

First, check to see if the app has been released recently, and if that’s the result of a new feature, which it hasn’t. Because this code has not been changed recently, and has not encountered similar problems, so I began to suspect that there is a loophole in the code logic, layer by layer through the fog, I found the most core code, pseudo-code is as follows:

Jedis redis = getRedis();
try {
    redis.set(SafeEncoder.encode(key), SafeEncoder.encode(def + ""), "nx".getBytes(),
    "ex".getBytes(), exp);
    Long count = redis.incrBy(key.getBytes(), val);
} finally {
    redis.close();
}Copy the code

Do is very simple, first set command that is to say if the key does not exist then set the value to def, and set the expiration time, then incrBy command on the val, therefore here if val passed 0 then you can get the current value, but actually there is a problem here, repetition is not very easy, but once appear, the user will not be able to call interface.

The problem

If the application calls this method, executes the set command at t1, and finds that the key exists, it does not set the expiration time, it does not set the default value, and then calls incrBy at T2, but if the key expires between T1 and T2, the key will always exist. This will lead to the problems mentioned above.

  1. When the client runs the set command, the key is not expired. Therefore, the set command does not set the value or the expiration time
  2. When the set command is executed, the key expires
  3. The client executes the incrBy command because the key in the previous step has expired, so the incrBy command increments a new key, but the key here is that no expiration time is set, which means that the key will always exist.

The solution

One solution proposed here is to first analyze what the code wants to do, pass a key and default value and an expiration time. The requirement is to increment and expire. The set command is not needed. Here is a solution:

try (Jedis redis = getRedis()) { Long count = redis.incrBy(key.getBytes(), val); if (count == val) { redis.expire(key, exp); }}Copy the code

If incrBy returns a value equal to val, then this is the first call, so you need to set the expiration time. There are two network calls involved, so you can change it to lua script so there is only one network call. If you want to optimize, you can change it to evalsha, Avoid having to pass lua scripts every time to avoid extra network overhead.

Lessons learned

Distributed, high concurrent systems is a complex field, relevant code also need better consciousness, after writing the code, we need to carefully analyze the code in the performance of each kind of case, such as one of the service timeout, this time how to deal with, is a retry or directly into the upper throw exceptions, etc., and the code will be how well under high concurrency and so on. My advice is to read a lot of good code and think about how they handle various cases, including logging, exception handling, etc. Learn more and step more to grow faster.