As a key component for achieving scalability in distributed systems, a distributed messaging system needs high throughput and high availability. Two problems are unavoidable when designing a messaging system:

  1. The order of messages
  2. Duplication of messages

RocketMQ is a high-performance, high-throughput messaging middleware from Alibaba. How does RocketMQ solve these two problems? What are the key features of RocketMQ? How does it work?

Key features and implementation principles

1. Sequential messages

Message ordering means that messages are consumed in the order in which they were sent. For example, an order generates three messages: order creation, order payment, and order completion. These only make sense if they are consumed in that order, while different orders can still be consumed in parallel. Let’s start with the following example:

If a producer produces two messages, M1 and M2, what can be done to ensure the order of the two messages? Something like this might come to mind:

You might use this method to ensure message order

Suppose that M1 is sent to S1 and M2 is sent to S2. To ensure that M1 is consumed before M2, we need to notify S2 when M1 arrives at the consumer and is consumed, and S2 then sends M2 to the consumer.

The problem with this model is that when M1 and M2 are sent to two different servers, there is no guarantee that M1 reaches the MQ cluster first, and therefore no guarantee that it is consumed first. If M2 reaches the MQ cluster before M1, or is even consumed before M1 reaches the consumer, the messages are out of order, so this model cannot guarantee message order. How can message order be guaranteed in an MQ cluster? A simple way is to send M1 and M2 to the same server:

An improved method for guaranteeing message order

This ensures that M1 arrives at the MQ Server before M2 (the producer sends M2 only after M1 has been sent successfully), and that M1 is consumed before M2 on a first-arrived, first-consumed basis, thus preserving the order of the messages.

This model can only guarantee the order of messages in theory. In practical scenarios, the following problems may be encountered:

Network Latency

As soon as messages are sent from one server to another, network latency comes into play. As shown in the figure above, if delivering M1 takes longer than delivering M2, M2 will still be consumed first and the order is still not guaranteed. Even if M1 and M2 reach the consumers at the same time, M2 may still be consumed before M1, because the relative load on consumer 1 and consumer 2 is unknown.
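The latency effect can be illustrated with a toy calculation. This is only a sketch: the function name `consumption_order` and the timing numbers are made up for illustration, and it assumes each message is consumed the moment it arrives.

```python
def consumption_order(messages):
    """Return message names in the order they would be consumed on arrival.

    Each entry is (name, send time in ms, network latency in ms).
    """
    arrivals = [(sent_at + latency, name) for name, sent_at, latency in messages]
    return [name for _, name in sorted(arrivals)]


# M1 is sent first but travels over a slower path (hypothetical numbers).
order = consumption_order([("M1", 0, 50), ("M2", 10, 5)])
```

Here M1 arrives at 50 ms and M2 at 15 ms, so `order` puts M2 ahead of M1 even though M1 was sent first.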

So how do you solve this problem? Send M1 and M2 to the same consumer, and send M2 only after the consumer has acknowledged M1 successfully.

If M1 is sent to consumer 1 and consumer 1 does not respond, should the server go on to send M2 or resend M1? Generally, to make sure the message is consumed, the server resends M1 to another consumer, consumer 2, as shown in the figure below.

The correct way to ensure message order

In this model, the order of messages is strictly guaranteed. If you look carefully, however, you will notice that there are two cases in which consumer 1 does not respond to the server. In the first case, M1 never reached the consumer. In the second case, the consumer has consumed M1 and sent a response, but the MQ Server never received it. In the second case, resending M1 results in M1 being consumed twice. This brings us to our second problem, message duplication, which we’ll cover in more detail later.
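The acknowledge-then-send model and its duplication side effect can be sketched as follows. This is a simplified simulation, not RocketMQ code: the `Consumer` class, the `lose_next_ack` flag, and `deliver_in_order` are hypothetical names introduced for illustration.

```python
class Consumer:
    """Hypothetical consumer; `lose_next_ack` simulates one lost response."""

    def __init__(self, name, lose_next_ack=False):
        self.name = name
        self.lose_next_ack = lose_next_ack
        self.consumed = []

    def deliver(self, message):
        self.consumed.append(message)   # the message IS processed here
        if self.lose_next_ack:
            self.lose_next_ack = False
            return False                # ...but the server never sees the ack
        return True


def deliver_in_order(messages, consumers):
    """Send each message only after the previous one has been acknowledged.

    When no ack arrives, the same message is resent to the next consumer.
    Returns the log of (message, consumer name) delivery attempts.
    """
    log = []
    for message in messages:
        for consumer in consumers:
            log.append((message, consumer.name))
            if consumer.deliver(message):
                break                   # acknowledged: move on to the next message
        else:
            raise RuntimeError(f"no consumer acknowledged {message}")
    return log


c1 = Consumer("consumer1", lose_next_ack=True)  # consumes M1, but its ack is lost
c2 = Consumer("consumer2")
log = deliver_in_order(["M1", "M2"], [c1, c2])
```

In this run M2 is only sent after M1 has been acknowledged, so the order holds, but M1 ends up in the `consumed` list of both consumers, which is exactly the duplication described above.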

Looking back at message ordering, strictly ordered messages are easy to understand and can be handled in the manner described above. To sum up, a simple and feasible way to implement strictly sequential messages is:

Ensure that producer-MQServer-consumer relationships are one-to-one

Although this design is simple and easy to implement, it also has some serious problems, such as:

  1. Parallelism becomes a bottleneck in the messaging system (insufficient throughput)
  2. More exception handling: whenever there is a problem on the consumer side, the entire processing flow is blocked, and we have to spend more effort resolving the blockage.

But our ultimate goal is a cluster with high fault tolerance and high throughput. This seems like an irreconcilable contradiction, so how did Alibaba solve it?

The easiest way in the world to solve a computer problem is to “just happen” not to need to solve it! – Shen Xun

Some problems that seem important can be avoided by design or by breaking them down. Time spent attacking the problem head-on is not only inefficient but also wasted. From this perspective, we can draw two conclusions about message ordering:

  1. Many applications do not care whether messages arrive out of order
  2. An unordered queue does not mean that the messages are out of order

So is it more sensible to ensure message order at the business level, rather than relying on the messaging system alone?

Finally, we analyze RocketMQ’s implementation of sending sequential messages from a source code perspective.

By default, RocketMQ rotates through all queues in round-robin fashion to decide which queue a message is sent to (its load-balancing policy). For sequential messages, messages with the same order ID are sent to the same queue one after another.

After obtaining the routing information, the producer selects a queue using the algorithm implemented by the MessageQueueSelector. The same OrderId must always map to the same queue.
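The difference between the two selection policies can be sketched in a few lines. This is an illustrative simulation rather than RocketMQ's actual implementation: the function names, the queue names, and the use of a CRC32 hash are all assumptions made for the example.

```python
import itertools
import zlib


def make_round_robin_selector():
    """Round-robin policy: rotate through the queues, ignoring the order ID."""
    counter = itertools.count()

    def select(queues, order_id):
        return queues[next(counter) % len(queues)]

    return select


def select_by_order_id(queues, order_id):
    """Ordered policy: a stable hash of the order ID picks the queue."""
    # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str.
    return queues[zlib.crc32(str(order_id).encode("utf-8")) % len(queues)]


queues = ["queue-0", "queue-1", "queue-2", "queue-3"]  # hypothetical queue names
steps = ["create", "pay", "complete"]

round_robin = make_round_robin_selector()
spread = [round_robin(queues, 1001) for _ in steps]         # three different queues
pinned = [select_by_order_id(queues, 1001) for _ in steps]  # one queue, three times
```

With round-robin, the three messages of order 1001 land on three different queues. With the hash-based selector they all land on one queue, so their relative order is preserved, while other orders can still be spread across the remaining queues. Note that a modulo mapping like this only stays stable as long as the number of queues does not change.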

2. Message duplication

Solving the message ordering problem above introduced a new problem: message duplication. So how does RocketMQ solve the problem of message duplication? Again, it “just happens” not to solve it.

The root cause of message duplication is that the network can be unreachable. As long as data is exchanged over a network, this problem cannot be avoided, so the way to solve it is to work around it. The question then becomes: what should the consumer do if it receives the same message twice?

  1. Keep the consumer-side business logic for processing messages idempotent
  2. Ensure that each message has a unique ID and that successful processing of a message is recorded in a deduplication log table

The first item is easy to understand: as long as processing is idempotent, the result is the same no matter how many duplicate messages arrive. The second item uses a log table to record the IDs of messages that have been processed successfully. If the ID of a new message is already in the log table, the message is not processed again.
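A minimal sketch of the log-table approach is shown below, assuming an in-memory SQLite database stands in for the log table. The table name `processed_log`, the helper `make_dedup_consumer`, and the message IDs are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_dedup_consumer(process):
    """Wrap `process` so that each message ID is handled at most once."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed_log (message_id TEXT PRIMARY KEY)")

    def consume(message_id, body):
        try:
            # One transaction: if `process` raises, the logged ID is rolled
            # back so the message can be retried later. Side effects of
            # `process` outside this database are NOT rolled back.
            with db:
                db.execute(
                    "INSERT INTO processed_log (message_id) VALUES (?)",
                    (message_id,),
                )
                process(body)
        except sqlite3.IntegrityError:
            return False  # ID already in the log table: duplicate, skip it
        return True

    return consume


balance = {"total": 0}


def add_payment(amount):
    balance["total"] += amount  # not idempotent on its own


consume = make_dedup_consumer(add_payment)
first = consume("msg-1", 100)
second = consume("msg-1", 100)  # redelivery of the same message
```

The second delivery of `msg-1` hits the primary-key constraint and is skipped, so the payment is applied only once. In a real system the business update and the log insert would need to share one database transaction for this guarantee to hold.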

Item 1 obviously has to be implemented on the consumer side and is not part of the messaging system. Item 2 can be implemented either in the messaging system or on the business side. Under normal circumstances the probability of duplicate messages is very small, and implementing deduplication in the messaging system would hurt its throughput and availability, so it is best to let the business side handle the problem itself. This is why RocketMQ does not solve the problem of duplicate messages.

RocketMQ does not guarantee that messages are never duplicated. If your business strictly requires no duplicates, you need to deduplicate on the business side yourself.