background

It has been a long time since I shared any problems related to Java. Recently, I helped my colleagues to check a problem:

When using Pulsar consumption, repeated consumption of the same message occurred.

screening

I was skeptical when he told me this phenomenon. Pulsar explained it in the official document and API based on his previous experience:

Only when set to consumeackTimeoutBy default, the message is turned off, and viewing code is not turned on.

Would that be calling the negativeAcknowledge() method (which also triggers reposting), since we have a third-party library github.com/majusko/pul… This method is called only when an exception is thrown.

There is no place to throw an exception after looking at the code, or even the entire process to see an exception generated; That’s where it gets a little weird.

repetition

In order to make clear the whole thing, a detailed understanding of his use process;

In fact, there is a bug in the service. He debugs the message consumption and then steps through the debugging. After finishing the debugging, he immediately receives the same message again.

Oddly enough, not every debug can be consumed again, as we say that if a bug can be completely reproduced 100% of the time, the majority of the problem is solved.

So the first step in our investigation is to completely reproduce the problem.


In order to rule out the problem of IDEA (although it is highly unlikely), since the problem is generated during debug, it is actually sleep when converting to the code, so we plan to directly sleep for a period of time in the consumption logic to see if it can reproduce.

After the test, sleep could not be repeated for several seconds to tens of seconds, and finally sleep for one minute. Magical things happened, and it was successfully repeated every time!

It is easy to reproduce it successfully, because my own business code has also been used in Pulsar, so I plan to reproduce it again in my own project for the convenience of debugging.

Then something weird happened again, and I can’t repeat it here.

That’s what you expect, but you can’t adjust it.

In keeping with the premise of modern science, the only difference between us was the project, so I compared the codes on both sides.

    @PulsarConsumer( topic = xx, clazz = Xx.class, subscriptionType = SubscriptionType.Shared )
    public void consume(Data msg) {
        log.info("consume msg:{}", msg.getOrderId());
        Lock lock = redisLockRegistry.obtain(msg.getOrderId());
        if (lock.tryLock()) {
            try {
                orderService.do(msg.getOrderId());
            } catch (Exception e) {
                log.error("consumer msg:{} err:", msg.toString(), e);
            } finally{ lock.unlock(); }}}Copy the code

As expected, the colleague’s code was locked; A distributed lock based on Redis. When I tap my thigh, it will not be due to the timeout of the unlock.

In order to verify this problem, ON the basis of reproducibility I framePulsarThere is a break point in the consumption: As expected solved the case, abnormal prompt has been very clear: lock has passed the timeout time.

Negative message was sent directly after entering the exception, and the exception was also eaten, so it was not found before.

Refer to theRedisLockRegistryThe default timeout is exactly one minute, so before wesleepThe problem cannot be repeated in tens of seconds.

conclusion

Later, I asked my colleague why there was a lock, because I saw that there was no need to lock it. Turns out he added it because he copied code from someone else and didn’t think much about it.

So there are some lessons here:

  • CTRL C/V is convenient, but you have to think about your business scenario.
  • When using some third-party apis, you need to fully understand their functions and parameters.

Your likes and shares are the biggest support for me