1. A large number of messages have been stuck in MQ for hours

Scenario: Tens of millions of messages have piled up in MQ for seven or eight hours, from a little after 4 p.m. until past 10 or even 11 p.m. When a production incident like this happens, the naive response is to fix the consumer, get it back to its normal consumption speed, and then wait several hours for it to work through the backlog. That is definitely not acceptable. Suppose one consumer processes 1,000 messages per second. Three consumers then process 3,000 per second, or 180,000 per minute, which is about 10.8 million per hour. So with a backlog of millions to tens of millions of messages, even after the consumers recover, draining it takes roughly an hour per 10 million messages.

Solution: The only way out is a temporary scale-out so the data can be consumed faster. The steps are as follows:

① First fix the consumer's problem, make sure it can consume at full speed again, and then stop all existing consumers.

② Temporarily create 10 or 20 times as many queues as before (for example, create a new topic with 10 times as many partitions as the original).

③ Write a temporary dispatcher program and deploy it to consume the backlogged messages. It does no time-consuming processing; it simply writes each message, round-robin, into the 10× temporary queues.

④ Then requisition 10 times as many machines to deploy consumers, each consuming the messages of one temporary queue.

⑤ This effectively expands queue and consumer resources tenfold for a while, so messages are consumed at 10 times the normal rate.

⑥ Once the backlog has been drained, restore the original deployment architecture and go back to consuming messages with the original consumer machines.
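The scale-out steps above can be sketched as a small in-memory simulation. This is a minimal illustration under stated assumptions, not production code: `collections.deque` objects stand in for the backlogged queue and the temporary queues (in Kafka these would be the old topic and a new topic with 10× the partitions), and the names `dispatch_round_robin` and `drain_hours` are hypothetical. The dispatcher only forwards messages round-robin, as in step ③, and `drain_hours` reproduces the scenario's arithmetic, assuming a constant 1,000 messages per second per consumer.

```python
from collections import deque
from itertools import cycle


def dispatch_round_robin(backlog, num_temp_queues):
    """Drain `backlog` (a deque) and spread its messages over temporary queues.

    Like the temporary dispatcher in step 3, this does no business processing:
    each message is forwarded to the next temporary queue in turn.
    """
    temp_queues = [deque() for _ in range(num_temp_queues)]
    targets = cycle(temp_queues)          # round-robin over the temporary queues
    while backlog:
        message = backlog.popleft()       # consume from the backlogged queue
        next(targets).append(message)     # forward without processing
    return temp_queues


def drain_hours(backlog_size, consumers, rate_per_consumer):
    """Hours to drain a backlog, assuming a fixed rate (msgs/s) per consumer."""
    return backlog_size / (consumers * rate_per_consumer) / 3600


if __name__ == "__main__":
    temp = dispatch_round_robin(deque(range(25)), 10)
    print([len(q) for q in temp])                                # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
    print(round(drain_hours(10_000_000, 3, 1000), 2))            # 0.93 (hours, 3 consumers)
    print(round(drain_hours(10_000_000, 30, 1000) * 60, 1))      # 5.6 (minutes, 30 consumers)
```

In this idealized model, going from 3 consumers to 30 cuts the drain time for 10 million messages from under an hour to a few minutes. In practice the real speedup also depends on how fast the dispatcher can forward messages and on whether downstream systems can absorb the extra load.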

2. What if messages expire and are lost?

If you use RabbitMQ, you can set an expiration time (TTL) on messages. If a message stays in a queue longer than that, RabbitMQ discards it and the data is lost. That is the second pitfall: the data does not pile up in MQ, it simply disappears. Solution: In this case there is no backlog as such, but a large number of messages have been lost, so the first approach (adding consumers) does not apply. Instead, use a “batch re-import” approach: during off-peak hours (for example, late at night), write a program that manually retrieves the lost data and re-publishes it to MQ to make up for what was lost.
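The “batch re-import” idea can be sketched as a reconciliation job. This assumes you have a source of truth (for example, an orders table) and some record of which IDs downstream consumers actually processed; both are modeled here as in-memory Python structures. The names `find_lost` and `batch_reimport`, and the `publish` callback standing in for a real MQ producer, are hypothetical.

```python
import time
from typing import Callable, Dict, List, Set


def find_lost(source_records: Dict[str, dict], processed_ids: Set[str]) -> List[str]:
    """Return IDs that exist in the source system but were never processed."""
    return sorted(rid for rid in source_records if rid not in processed_ids)


def batch_reimport(
    source_records: Dict[str, dict],
    processed_ids: Set[str],
    publish: Callable[[dict], None],
    batch_size: int = 500,
    pause_seconds: float = 0.0,
) -> int:
    """Re-publish lost records in batches and return how many were re-sent.

    `publish` is a hypothetical callback standing in for an MQ producer.
    `pause_seconds` throttles between batches to keep off-peak load gentle.
    """
    lost = find_lost(source_records, processed_ids)
    for start in range(0, len(lost), batch_size):
        for rid in lost[start:start + batch_size]:
            publish(source_records[rid])
        # Pause between batches, but not after the last one.
        if pause_seconds and start + batch_size < len(lost):
            time.sleep(pause_seconds)
    return len(lost)


if __name__ == "__main__":
    orders = {"o1": {"id": "o1"}, "o2": {"id": "o2"}, "o3": {"id": "o3"}}
    handled = {"o1"}  # assume o2 and o3 expired in the queue via TTL
    resent = []
    count = batch_reimport(orders, handled, resent.append, batch_size=1)
    print(count, [m["id"] for m in resent])  # 2 ['o2', 'o3']
```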

3. What if the backlog goes unprocessed for a long time and MQ fills up?

What if messages pile up in MQ and are left unprocessed for so long that MQ is about to fill up? Is there another way out? Solution: There is no clever trick here. By this point the first solution would be too slow, so the only option is the “discard + batch re-import” approach.

First, write a temporary program that connects to MQ, consumes the messages, and discards them as soon as they are received. This drains the backlog quickly and relieves the pressure on MQ. Then fall back on the second approach: late at night, manually retrieve the lost data and re-publish it to MQ.
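The discard step can be sketched as follows. As before, a `collections.deque` stands in for the backlogged queue, and the function name `drain_and_discard` is hypothetical; with a real broker the equivalent is a consumer that acknowledges each message without doing any work. The sketch also returns the discarded message IDs, which is an assumption beyond the original description, meant to give the later re-import job a list of what to look up.

```python
from collections import deque
from typing import Deque, List, Optional


def drain_and_discard(queue: Deque[dict], max_messages: Optional[int] = None) -> List[str]:
    """Consume messages and drop them immediately, without any processing.

    Returns the IDs of the discarded messages. Recording IDs is an added
    assumption so the late-night re-import knows what to retrieve.
    """
    discarded_ids: List[str] = []
    while queue and (max_messages is None or len(discarded_ids) < max_messages):
        message = queue.popleft()  # with a real broker: receive, then ack without processing
        discarded_ids.append(message["id"])
    return discarded_ids


if __name__ == "__main__":
    backlog = deque({"id": f"m{i}"} for i in range(5))
    print(drain_and_discard(backlog, max_messages=2), len(backlog))  # ['m0', 'm1'] 3
    print(drain_and_discard(backlog), len(backlog))                  # ['m2', 'm3', 'm4'] 0
```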
