Preface

The previous article explained why message queues are introduced. Introducing an MQ solves some problems for us, but it also brings complex problems of its own that must be addressed in large projects and, more to the point, are often asked about in interviews. Message queues are not 100% reliable! RabbitMQ only provides mechanisms that reduce the probability of losing messages, or that let us log messages when they are lost. While solving these problems, keep in mind that when business volume and concurrency are low, they almost never occur. Even if one occurs occasionally, developers can repair the data manually. How far to go in solving them should therefore be decided by your company's actual business scenario.

Message reliability

Take order creation as an example. The following business scenarios are possible:

  • MQ is down and the message is never sent. What if the downstream systems that run after an order is created, such as coupons and points, never perform their settlement?
  • What if MQ is highly available and the message is sent, but the coupon settlement service reports an error? Because the flow is asynchronous, it is not easy to roll back.
  • The message is sent normally and received by the consumers, and the merchant system and coupon system execute normally, but the points are not settled because the points service fails. The data for this order is then inconsistent.

To solve these problems, we need to ensure that messages are consumed reliably, so let's analyze the steps at which a message can go wrong. RabbitMQ delivers a message like this:

The producer sends the message to the specified exchange, the exchange routes it to the bound queues according to the routing rules, and the queue pushes it to the consumer. A message can go wrong at several points in this process:

  • The producer's message does not reach the exchange, which counts as the producer losing the message
  • The exchange does not route the message to a queue, which also counts as the producer losing the message
  • RabbitMQ goes down and the queues and messages are lost, which counts as RabbitMQ losing the message
  • The consumer fails while consuming and the business logic is not carried out, which counts as the consumer losing the message

The producer lost the message

RabbitMQ provides publisher confirms and returns. An asynchronous confirm listener is triggered every time a message is sent and reports whether it reached the exchange, and a return listener is triggered when the exchange fails to route a message to a queue. You just need to enable these two listeners. Take Spring Boot integrated with RabbitMQ as an example.

Add the starter dependency

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-amqp</artifactId>
    </dependency>

The configuration file

    spring:
      rabbitmq:
        publisher-returns: true
        publisher-confirm-type: correlated

Then write the listener callback

    @Configuration
    @Slf4j
    public class RabbitMQConfig {

        @Autowired
        private RabbitTemplate rabbitTemplate;

        @PostConstruct
        public void enableConfirmCallback() {
            // Confirm listener: ack = true when the message reached the exchange,
            // ack = false when it did not.
            // correlationData can carry a unique id specified when the message is sent.
            rabbitTemplate.setConfirmCallback((correlationData, ack, cause) -> {
                if (!ack) {
                    // Record a log, send an email notification, and store the message
                    // in the database so a scheduled task can scan and resend it
                }
            });

            // Return listener: triggered when the message reached the exchange
            // but could not be routed to a queue.
            rabbitTemplate.setReturnsCallback(returned -> {
                // Record a log, send an email notification, and store the message
                // in the database so a scheduled task can scan and resend it
            });
        }
    }

To test this, we can send a message with a wrong exchange name or routing key, which triggers the listener callbacks we just wrote. The cause parameter tells us why the message did not reach the exchange, and the returned object contains the details of the message that could not be routed.

In fact, as far as I know, some companies do not implement resending in these two listeners. Why? The cost is too high. First, RabbitMQ itself is very unlikely to lose a message. Second, storing failed messages in a database and having a scheduled task (or a distributed scheduled task) scan and resend them takes a lot of code to develop. Third, scheduled scanning inevitably adds message latency and is usually unnecessary. In real business scenarios it is enough to record a log for troubleshooting and send an email to the relevant people. If, in the rare case, the producer really does lose a message, a developer can fix the data in the database directly.

RabbitMQ lost the message

Without persistence, all queues and messages disappear when RabbitMQ restarts, so we should declare queues as durable when creating them and mark messages as persistent when sending them (set deliveryMode to 2). In real business, persistence is generally a must.

The consumer lost the message

A so-called consumer-side lost message means that the consumer threw an error while executing business code, so the work that should have been done was not actually done. For example, suppose the order is created successfully but coupon settlement fails. By default, RabbitMQ deletes the message from the queue as soon as it is pushed to the consumer, yet the coupon has not been settled, so the message is effectively lost. This is quite common, since we developers can't guarantee that our code is error-free, and the problem has to be fixed. Otherwise the user places an order, the coupon is not deducted, and your performance bonus this month is probably gone.

RabbitMQ provides us with a consumer ack mechanism. By default it is automatic: the message is acked as soon as it is pushed to a consumer, and RabbitMQ then removes it from the queue. With manual ack enabled, RabbitMQ does not remove the message from the queue until we call the API to ack it.

Start by enabling manual ACK in the configuration file

    spring:
      rabbitmq:
        listener:
          simple:
            acknowledge-mode: manual

Then acknowledge the message manually in the consumer code

    @RabbitListener(queues = "queue")
    public void listen(String object, Message message, Channel channel) {
        long deliveryTag = message.getMessageProperties().getDeliveryTag();
        log.info("Message received: {}, message content: {}", deliveryTag, object);
        try {
            // Execute business code...
            channel.basicAck(deliveryTag, false);
        } catch (Exception e) {
            // Catch Exception rather than only IOException so that business
            // errors are also nacked
            log.error("Failed to consume the message", e);
            try {
                // Third argument true = requeue the message
                channel.basicNack(deliveryTag, false, true);
            } catch (IOException exception) {
                log.error("Failed to nack the message", exception);
            }
        }
    }

Pitfalls from experience

If you use the code above in production, you will be in trouble as soon as a consumption error occurs. The third parameter of basicNack indicates whether the message is requeued. If you pass false, the message is discarded. If you pass true, then after the consumer fails, the message is returned to the head of the queue and pushed to the consumer again. An error in the code usually cannot be fixed by retrying, so the message goes round in circles: it is consumed, the consumer reports an error, the message returns to the queue, it is consumed again, and so on in an infinite loop.

So in real scenarios there are usually three choices:

  • When consumption fails, store the message in Redis and record the number of attempts. If it fails three times, discard the message and record the failure in the database
  • Pass false so the message is not requeued, then record a log, send an email, and wait for a developer to handle it manually
  • Do not enable manual ack, and use the message retry provided by Spring Boot instead
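
The first option can be sketched in plain Java. This is only a minimal sketch under stated assumptions: an in-memory map stands in for Redis, and the class name FailureTracker, the method shouldRequeue, and the limit of three attempts are hypothetical names chosen for illustration. In a real consumer, the returned flag would be passed as the third argument of basicNack, and the key would need to be a stable business identifier for the message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: counts failed attempts per message id.
// An in-memory map stands in for Redis here (assumption for illustration).
final class FailureTracker {
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();
    private final int maxAttempts;

    FailureTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Records one failed attempt. Returns true if the message should be
    // requeued, or false once it has failed maxAttempts times.
    boolean shouldRequeue(String messageId) {
        int count = failures.merge(messageId, 1, Integer::sum);
        if (count >= maxAttempts) {
            // Give up: the caller logs the message and stores it for manual handling
            failures.remove(messageId);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        FailureTracker tracker = new FailureTracker(3);
        System.out.println(tracker.shouldRequeue("msg-1")); // true
        System.out.println(tracker.shouldRequeue("msg-1")); // true
        System.out.println(tracker.shouldRequeue("msg-1")); // false
    }
}
```

Capping the attempts this way breaks the infinite loop described above while still giving a transient error a couple of chances to clear.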

Message retry provided by Spring Boot

In many scenarios it is not necessary to enable manual consumer acknowledgement, because Spring Boot provides a retry mechanism that re-executes the consumer's business method when that method throws an error.

Enable the retry mechanism provided by SpringBoot

    spring:
      rabbitmq:
        listener:
          simple:
            retry:
              enabled: true
              max-attempts: 3 # Maximum number of attempts

Consumer code

    @RabbitListener(queues = "queue")
    public void listen(String object, Message message, Channel channel) throws IOException {
        try {
            // Execute business code...
            int i = 1 / 0;
        } catch (Exception e) {
            log.error("Failed to consume the message", e);
            // Rethrow so that the retry is triggered
            throw new RuntimeException("Failed to consume the message");
        }
    }

Be sure to rethrow the exception, because Spring Boot triggers a retry only when an exception escapes the listener method. Note that this retry is performed by Spring Boot, which re-executes the consumer method inside the application. RabbitMQ does not push the message again.
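
To make the "uncaught exception" rule concrete, here is a minimal plain-Java sketch of an in-process retry wrapper. It is not Spring's implementation: the class RetrySketch and the method invokeWithRetry are hypothetical, and the sketch only illustrates the idea that the wrapper can retry solely when an exception escapes the handler.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for an in-process retry wrapper: it re-invokes the
// handler in the same process when an exception escapes it.
final class RetrySketch {

    // Calls handler up to maxAttempts times and returns the number of attempts
    // used. Rethrows the last exception if every attempt fails.
    static <T> int invokeWithRetry(Consumer<T> handler, T message, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return attempt; // no exception escaped: treated as success
            } catch (RuntimeException e) {
                last = e; // an exception escaped: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // This handler swallows its own exception, so the wrapper sees a
        // success on the first attempt and never retries.
        int attempts = invokeWithRetry(msg -> {
            try {
                throw new IllegalStateException("business error");
            } catch (IllegalStateException e) {
                // swallowed: nothing escapes the handler
            }
        }, "hello", 3);
        System.out.println(attempts); // 1
    }
}
```

The handler in main swallows its error and is therefore invoked only once, which is exactly the situation the rethrow in the consumer code avoids.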

Summary of message reliability

In fact, if you look closely, you will find that message reliability itself cannot be fully guaranteed. The various reliability mechanisms mainly provide a queryable log for when a message is lost, although at some (significant) cost they do reduce the probability of message loss.

Message ordering

Some business scenarios require messages to be consumed in order. For example, when Canal subscribes to the MySQL binary log to keep Redis up to date, we usually send the data changes that Canal captures to a message queue.

If the RabbitMQ messages are not consumed in order, dirty data can appear in Redis.
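
The effect of out-of-order consumption can be shown with a small plain-Java sketch. Everything here is an assumption made for illustration: the Change class is a hypothetical stand-in for a change event (not Canal's actual type), and a HashMap plays the role of Redis.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderingSketch {

    // Hypothetical change event: "set key to value".
    static final class Change {
        final String key;
        final String value;

        Change(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Applies the changes to a cache in the order given; the last write wins.
    static Map<String, String> apply(List<Change> changes) {
        Map<String, String> cache = new HashMap<>();
        for (Change change : changes) {
            cache.put(change.key, change.value);
        }
        return cache;
    }

    public static void main(String[] args) {
        Change first = new Change("price", "100");
        Change second = new Change("price", "200");

        // Consumed in the order the database produced them: the cache ends at 200
        System.out.println(apply(Arrays.asList(first, second)).get("price")); // 200

        // Consumed out of order: the cache ends at the stale value 100
        System.out.println(apply(Arrays.asList(second, first)).get("price")); // 100
    }
}
```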

Single consumer instance

A queue itself is ordered, but production services are generally deployed as a cluster. When there is more than one consumer instance, the messages in a queue are distributed across all instances (each message goes to only one instance). The order of consumption is then no longer guaranteed, because you cannot know which machine will finish executing its business code first.

So for services that require sequential consumption, we can deploy a single consumer instance, set RabbitMQ to push one message at a time, and enable manual ACK as follows

    spring:
      rabbitmq:
        listener:
          simple:
            prefetch: 1 # Push one message at a time
            acknowledge-mode: manual

RabbitMQ then pushes only one message from the queue at a time, and only after we finish processing and ack it is the next one delivered, which ensures ordering.

Multiple consumer instances

With multiple consumer instances, it is very difficult to keep RabbitMQ messages in order, and there are a lot of details involved. In short, I don't know how to do it.

Duplicate message consumption (idempotence)

This is also a common situation in production. My blog uses RabbitMQ, and oddly enough the logs often show a message being consumed twice.

There are two ways to solve duplicate consumption. The first is to prevent the consumer from executing twice. The second is to allow duplicate consumption but make sure it does not affect the business data.

Make sure the consumer only executes it once

Generally speaking, duplicates arrive within a short period of time. We can store a unique identifier for each consumed message in Redis and check whether that identifier already exists before executing the consumer's business logic. For example, after a coupon is used, the coupon system is notified to add a usage record. Here the order number plus the coupon ID can serve as the unique identifier. Before the business logic starts, check whether the identifier already exists in Redis. If it does, the message has already been processed. If it does not, store it in Redis with an expiration time and execute the business logic.

    // Atomically store the identifier if it is absent; it expires after 10 seconds
    Boolean flag = stringRedisTemplate.opsForValue()
            .setIfAbsent("orderNo+couponId", "1", Duration.ofSeconds(10L));
    // The identifier already existed, so the message has already been consumed
    if (!Boolean.TRUE.equals(flag)) {
        return;
    }
    // Execute business...
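
The important property is that "check whether it exists" and "store it" happen as one atomic step, otherwise two duplicates can both pass the check. The following plain-Java sketch illustrates that idea without Redis. It is an assumption-laden stand-in: the class DedupSketch and the method tryClaim are hypothetical, an in-memory map replaces Redis, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for "set if absent, with expiry".
final class DedupSketch {
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DedupSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns true if the caller is the first to claim this key within the
    // TTL window and should run the business logic; false for a duplicate.
    boolean tryClaim(String key, long nowMillis) {
        boolean[] claimed = {false};
        // compute() runs atomically per key, so check and store cannot interleave
        expiresAt.compute(key, (k, expiry) -> {
            if (expiry == null || expiry <= nowMillis) {
                claimed[0] = true;             // absent or expired: claim it
                return nowMillis + ttlMillis;
            }
            return expiry;                     // still valid: duplicate
        });
        return claimed[0];
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch(10_000L);
        System.out.println(dedup.tryClaim("orderNo+couponId", 0L));      // true
        System.out.println(dedup.tryClaim("orderNo+couponId", 5_000L));  // false
        System.out.println(dedup.tryClaim("orderNo+couponId", 10_000L)); // true
    }
}
```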

Allow the consumer side to execute multiple times to ensure that data is not affected

  • Database unique key constraint

If the consumer's business logic is an insert, we can rely on a unique key constraint in the database, such as the coupon number in the coupon table. If duplicate consumption tries to insert two records with the same coupon number, the database raises an error, which guarantees that the duplicate is not inserted.

  • Database optimistic locking idea

If the consumer's business logic is an update, you can add a version field to the business table, use the version as a condition in every update, and increment it by 1 after the update. Because MySQL's InnoDB uses row locks, the second request can proceed only after the first has finished its update. By then the version has changed to 2, so the second UPDATE statement affects 0 rows and the data in the database is not changed again.
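
The version check can be illustrated with a plain-Java sketch that mimics a statement of the form UPDATE ... SET balance = ?, version = version + 1 WHERE version = ?. The class VersionedRow, the balance field, and the method names are hypothetical and chosen only for illustration. A synchronized method plays the role of the row lock.

```java
// Hypothetical in-memory row with a version column.
final class VersionedRow {
    private int balance;
    private int version = 1;

    VersionedRow(int balance) {
        this.balance = balance;
    }

    // Mimics a conditional UPDATE: returns the number of "affected rows",
    // 1 if expectedVersion matched and the row was updated, otherwise 0.
    synchronized int update(int newBalance, int expectedVersion) {
        if (version != expectedVersion) {
            return 0; // someone else already updated the row
        }
        balance = newBalance;
        version++;
        return 1;
    }

    synchronized int balance() {
        return balance;
    }

    synchronized int version() {
        return version;
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow(100);
        int seenVersion = row.version(); // both duplicate messages read version 1

        System.out.println(row.update(90, seenVersion)); // 1: first message wins
        System.out.println(row.update(80, seenVersion)); // 0: duplicate is rejected
        System.out.println(row.balance());               // 90
        System.out.println(row.version());               // 2
    }
}
```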

Messages are piling up

A message backlog occurs when a large number of messages in a RabbitMQ queue cannot be consumed in time, because the consumer consumes messages much more slowly than the producer sends them.

Honestly, I don't know why interviewers like to ask about this so much. If the consumers can't keep up with the producers, just make the consumers faster! Personally, I think there are several approaches:

  • Apply appropriate rate limiting to the producer's message-sending interface (not recommended, because it affects user experience)
  • Deploy multiple consumer instances (recommended)
  • Increase the prefetch count appropriately so that the consumer receives more messages at a time (recommended, can be combined with the second option)
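
As a rough back-of-the-envelope aid for the second option, the arithmetic can be written down in a few lines of Java. This is only a sketch: the class BacklogSketch and its methods are hypothetical, the rates are made-up numbers, and it assumes that throughput scales linearly with the number of consumer instances, which real systems only approximate.

```java
final class BacklogSketch {

    // Messages added to the backlog per second; 0 if the consumers keep up.
    static long backlogGrowthPerSecond(long produceRate, long consumeRatePerInstance, int instances) {
        return Math.max(0L, produceRate - consumeRatePerInstance * instances);
    }

    // Smallest number of instances whose combined rate is at least the produce rate.
    static int instancesNeeded(long produceRate, long consumeRatePerInstance) {
        if (consumeRatePerInstance <= 0) {
            throw new IllegalArgumentException("consumeRatePerInstance must be positive");
        }
        // Integer ceiling division
        return (int) ((produceRate + consumeRatePerInstance - 1) / consumeRatePerInstance);
    }

    public static void main(String[] args) {
        long produceRate = 1000; // messages per second (made-up figure)
        long perInstance = 300;  // messages per second per consumer (made-up figure)

        System.out.println(backlogGrowthPerSecond(produceRate, perInstance, 1)); // 700
        System.out.println(instancesNeeded(produceRate, perInstance));           // 4
        System.out.println(backlogGrowthPerSecond(produceRate, perInstance, 4)); // 0
    }
}
```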

Conclusion

If this post helped you, be sure to like and follow. Your support motivates me to keep writing!