Official account: Mufeng Technical Notes

A true master always has the heart of an apprentice

Introduction

There is no need to explain the concept of a transaction; readers of this article will already understand it well. We all know that MQ enables asynchronous processing and decoupling between microservices. Once MQ is introduced, however, we have to think about how to keep data consistent across those microservices. RocketMQ transaction messages are a solution to this problem, and they also help address message loss.

Scenarios in which message loss occurs

Before analyzing RocketMQ transaction messages, let's look at where messages can be lost along the whole message link once messaging middleware is introduced.

When we pay for an order, the shopping points in our account are adjusted accordingly. Using the following simplified interaction diagram of the order service, RocketMQ, and the points service, we analyze the message loss problems that can occur along the whole link.

Scenario 1:

When the order service sends the order-success message to RocketMQ, network jitter may prevent the message from reaching RocketMQ, and the message is lost.
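One common way to reduce the risk in Scenario 1 is for the producer to treat a send as successful only once the broker acknowledges it, and to retry otherwise. The following is a minimal Python sketch of that idea. `FlakyBroker`, `send_with_retry`, and the `"SEND_OK"` string are hypothetical names for an in-memory stand-in; this is not the RocketMQ client API.

```python
class FlakyBroker:
    """Hypothetical stand-in for a broker: the first `failures` sends fail."""

    def __init__(self, failures):
        self.failures = failures
        self.stored = []

    def send(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated network jitter")
        self.stored.append(message)
        return "SEND_OK"


def send_with_retry(broker, message, max_attempts=3):
    """Return True only once the broker has acknowledged the message."""
    for _ in range(max_attempts):
        try:
            if broker.send(message) == "SEND_OK":
                return True
        except ConnectionError:
            continue  # no acknowledgement: the message may not have arrived
    return False  # the caller must handle this, e.g. roll back the order


broker = FlakyBroker(failures=2)
delivered = send_with_retry(broker, "order-1001 paid")
```

In this toy model a failed send never stores the message. In a real system the acknowledgement itself can be lost, so a retry may produce a duplicate, and the consumer side should be prepared for that.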

Scenario 2:

Suppose the network between the order service and RocketMQ is fine and RocketMQ receives the message normally. Can the message still be lost? Yes, and the reason lies in RocketMQ's persistence mechanism. When a message arrives at RocketMQ, it is not immediately written to disk but is first stored in the page cache. If the server loses power or crashes, message data that has not yet been flushed to disk may be lost. Moreover, even after the data reaches the disk, a bad disk can still cause message data to be lost.
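The difference between "acknowledged" and "on disk" in Scenario 2 can be illustrated with a toy model. The sketch below is only an illustration in Python: `ToyBroker`, `page_cache`, `disk`, and `power_failure` are hypothetical names, and the lists merely stand in for memory and persistent storage.

```python
class ToyBroker:
    """Toy model of a broker's storage; not RocketMQ code."""

    def __init__(self, sync_flush):
        self.sync_flush = sync_flush
        self.page_cache = []  # written, but not yet on disk
        self.disk = []        # survives a power failure

    def put(self, message):
        self.page_cache.append(message)
        if self.sync_flush:
            self.flush()  # flush before acknowledging
        return "ack"

    def flush(self):
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def power_failure(self):
        self.page_cache.clear()  # memory contents are gone
        return list(self.disk)   # what is left after a restart


async_broker = ToyBroker(sync_flush=False)
async_broker.put("order-1001 paid")
survivors_async = async_broker.power_failure()

sync_broker = ToyBroker(sync_flush=True)
sync_broker.put("order-1001 paid")
survivors_sync = sync_broker.power_failure()
```

Here the asynchronous broker acknowledges the message and then loses it, while the synchronous one keeps it. RocketMQ's broker configuration has a `flushDiskType` option (`ASYNC_FLUSH` or `SYNC_FLUSH`) that governs this trade-off. Neither setting protects against the bad-disk case, which needs a copy of the data on another machine.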

Scenario 3:

Suppose neither of the first two problems occurs and the points service receives the order message. Can the message still be lost? Again, the answer is yes. If the points service automatically commits the message offset to RocketMQ and then crashes or hangs before it has actually added the points, RocketMQ considers the message consumed and it is effectively lost.
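The ordering problem in Scenario 3 can be made concrete with a small simulation. This is a hedged sketch in Python: `consume`, `commit_before_processing`, and `crash_on` are hypothetical names, and the offset is just an index into a list rather than a real RocketMQ consumer offset.

```python
def consume(messages, commit_before_processing, crash_on):
    """Toy consumer: returns (processed, committed_offset) after one run.

    `crash_on` is the message on which the simulated consumer dies.
    """
    processed = []
    committed_offset = 0
    for offset, message in enumerate(messages):
        if commit_before_processing:
            committed_offset = offset + 1  # offset advanced first
        if message == crash_on:
            break  # crash before the points are added
        processed.append(message)
        if not commit_before_processing:
            committed_offset = offset + 1  # offset advanced after success
    return processed, committed_offset


orders = ["order-1", "order-2", "order-3"]

# Commit-first style: order-2 is marked consumed but never processed.
done, offset = consume(orders, commit_before_processing=True, crash_on="order-2")
lost = orders[len(done):offset]

# Commit-after-processing: a restart would resume at order-2.
done2, offset2 = consume(orders, commit_before_processing=False, crash_on="order-2")
redelivered = orders[offset2:]
```

In the first run `lost` contains `order-2`, because the committed offset has already moved past it. In the second run the offset still points at `order-2`, so it would be delivered again after a restart.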

Principle of the transaction message mechanism

Half messages

RocketMQ's transaction mechanism is built on what it calls half messages. When the order service receives the order payment information, it sends a half message to RocketMQ, and this message is invisible to consumers. As I understand it, a half message implements only half of the messaging functionality: it is visible on the producer side but not on the consumer side. The half message also acts as an availability probe for RocketMQ: if it cannot be sent, there is no point in continuing with the downstream operations.

If the half message fails to be sent, something is wrong between the order service and RocketMQ, and the order created earlier has to be rolled back. If the half message is delivered successfully, the order service then executes its local transaction to update the order status.

If the local transaction fails, the order service sends a ROLLBACK request so that RocketMQ discards the earlier half message and nothing is delivered downstream.
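The producer-side decisions described above can be sketched as a small function. This is a minimal Python illustration that only mirrors the control flow: `pay_order` and the three callables passed into it (`send_half`, `run_local_transaction`, `end_transaction`) are hypothetical hooks, not RocketMQ APIs.

```python
def pay_order(order, send_half, run_local_transaction, end_transaction):
    """Toy producer-side flow for one order."""
    try:
        half_id = send_half(order)  # also probes broker availability
    except ConnectionError:
        return "ORDER_ROLLED_BACK"  # broker unreachable: undo the order
    try:
        run_local_transaction(order)  # e.g. update the order status
    except Exception:
        end_transaction(half_id, "ROLLBACK")
        return "ROLLBACK"
    end_transaction(half_id, "COMMIT")
    return "COMMIT"


decisions = []  # records what was reported to the broker

def send_half(order):
    return "half-" + order

def unreachable(order):
    raise ConnectionError("broker down")

def ok_transaction(order):
    pass

def failing_transaction(order):
    raise RuntimeError("order status update failed")

def end_transaction(half_id, decision):
    decisions.append((half_id, decision))


r1 = pay_order("1001", send_half, ok_transaction, end_transaction)
r2 = pay_order("1002", send_half, failing_transaction, end_transaction)
r3 = pay_order("1003", unreachable, ok_transaction, end_transaction)
```

In the RocketMQ Java client the equivalent flow is driven by `TransactionMQProducer` together with a `TransactionListener`. The sketch leaves out the case where the COMMIT or ROLLBACK request itself is lost.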

Half message principle analysis

As mentioned earlier, a half message is not visible to consumers. How does RocketMQ keep it invisible to the points service?

When the order service sends a half message, RocketMQ does not deliver it to the topic that the points service subscribes to. Instead, it stores the message in a MessageQueue of the internal topic RMQ_SYS_TRANS_HALF_TOPIC. Because the points service does not subscribe to this topic, it cannot see the message.

A second topic, OP_TOPIC, is used to record the COMMIT/ROLLBACK status of each half message. The general interaction is as follows:

If the order service fails to send the half message, whether because of network problems or because RocketMQ is down, it has to perform a rollback, such as closing the order, because the order information cannot be passed on to the downstream service.

What if the half message has been written to RocketMQ but the local transaction fails? That is, the order service has received the response confirming that the half message was written, but an exception occurs while it updates the order information and the status update cannot be completed. In this case the order service sends a ROLLBACK request to RocketMQ to delete the original half message. If the local transaction succeeds, the order service sends a COMMIT request, and RocketMQ re-delivers the message from RMQ_SYS_TRANS_HALF_TOPIC to the topic that the points service subscribes to. The points service can then consume the message normally and carry out the points operation.
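The broker-side routing can be modelled in a few lines. The sketch below is a toy in Python under stated assumptions: `ToyBroker`, its methods, and the business topic name `order_paid` are hypothetical, and `OP_TOPIC` is used as a placeholder name exactly as written in this article. Only `RMQ_SYS_TRANS_HALF_TOPIC` is taken from the text.

```python
HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC"
OP_TOPIC = "OP_TOPIC"  # placeholder name, as used in the article


class ToyBroker:
    """Toy model of half-message routing; not RocketMQ's implementation."""

    def __init__(self):
        self.topics = {HALF_TOPIC: [], OP_TOPIC: []}

    def send_half(self, real_topic, body):
        # Park the message in the internal topic, remembering its real target.
        self.topics[HALF_TOPIC].append({"real_topic": real_topic, "body": body})
        return len(self.topics[HALF_TOPIC]) - 1  # index used as a handle

    def end_transaction(self, handle, decision):
        half = self.topics[HALF_TOPIC][handle]
        self.topics[OP_TOPIC].append((handle, decision))  # record the outcome
        if decision == "COMMIT":
            # Only now does the message reach the topic consumers subscribe to.
            self.topics.setdefault(half["real_topic"], []).append(half["body"])

    def poll(self, topic):
        """What a consumer subscribed to `topic` can see."""
        return list(self.topics.get(topic, []))


broker = ToyBroker()
h1 = broker.send_half("order_paid", "order-1001")
visible_before = broker.poll("order_paid")
broker.end_transaction(h1, "COMMIT")
visible_after = broker.poll("order_paid")

h2 = broker.send_half("order_paid", "order-1002")
broker.end_transaction(h2, "ROLLBACK")
visible_final = broker.poll("order_paid")
```

A consumer of `order_paid` sees nothing until the COMMIT arrives, and never sees the rolled-back order. The model records the ROLLBACK in the op list rather than erasing the half message, which is a simplification chosen for the sketch.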

Now consider the case where the COMMIT or ROLLBACK request sent by the order service does not reach RocketMQ. RocketMQ then has no way of knowing whether the local transaction behind the half message succeeded or failed. To handle this, the order service has to provide a status check interface. RocketMQ periodically scans for half messages that have not yet been committed or rolled back, and when it finds one, it calls the check interface to confirm the result of the local transaction. If the local transaction failed, the half message is deleted; if it succeeded, the message is delivered to the target topic.
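The check-back loop can be sketched as follows. This is a minimal Python model under stated assumptions: `ToyBroker`, `check_pending`, `check_local_transaction`, the `order_status` table, and the three state strings are hypothetical names chosen for the example, and a single call to `check_pending` stands in for one periodic scan.

```python
class ToyBroker:
    """Toy model of the transaction check-back; not RocketMQ code."""

    def __init__(self, check_local_transaction):
        self.check = check_local_transaction  # callback into the producer
        self.pending = {}    # half messages with no COMMIT/ROLLBACK yet
        self.delivered = []  # messages visible to consumers

    def send_half(self, order_id):
        self.pending[order_id] = "order %s paid" % order_id

    def check_pending(self):
        """One periodic scan over unresolved half messages."""
        for order_id in list(self.pending):
            state = self.check(order_id)
            if state == "COMMIT":
                self.delivered.append(self.pending.pop(order_id))
            elif state == "ROLLBACK":
                self.pending.pop(order_id)
            # "UNKNOWN": leave the half message for the next scan


# Hypothetical order table standing in for the order service's database.
order_status = {"1001": "PAID", "1002": "FAILED"}

def check_local_transaction(order_id):
    status = order_status.get(order_id)
    if status == "PAID":
        return "COMMIT"
    if status == "FAILED":
        return "ROLLBACK"
    return "UNKNOWN"  # local transaction still in progress


broker = ToyBroker(check_local_transaction)
for oid in ("1001", "1002", "1003"):
    broker.send_half(oid)
broker.check_pending()
```

After one scan, order 1001 has been delivered, order 1002 has been dropped, and order 1003 remains pending because its local transaction has no recorded outcome yet. The check must be answered from durable state such as the order table, since the producer instance that sent the half message may have restarted in the meantime.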

Conclusion

Based on the analysis above, the transaction message mechanism makes message delivery between the order service and RocketMQ reliable. At the very least, messages are no longer lost between the order service and RocketMQ.