MQ messaging ultimate consistency solution

With the popularity and popularity of distributed service architecture, multiple logical operations performed in a single application are now split into remote calls between multiple services. Although service into our system brings the ability of horizontal scaling, however the resulting challenge is distributed transaction problem, use their own maintenance between multiple service database, between each other is not in the same transaction, if the execution is successful, A B is defeated, and A transaction has been submitted at this time, cannot be rolled back, Then it will eventually lead to data inconsistency on both sides; Although there have been XA distributed transactions based on two-phase commit for a long time, such schemes have poor performance because they require global locking of resources. Therefore, the distributed transaction schemes of flexible transactions such as message final consistency and TCC are gradually derived. This paper mainly analyzes the final consistency scheme based on message.

Common message processing flow

The message producer sends the message
MQ receives the message, persists it, and adds a record to the store
Returns an ACK to the producer
MQ pushes the message to the corresponding consumer and waits for the consumer to return an ACK
If the message consumer successfully returns an ACK within the specified time, MQ considers the message consumption successful and deletes the message from the store, performing step 6. If MQ does not receive an ACK within the specified time, it considers the message consumption failed and attempts to push the message again, repeating steps 4, 5, and 6
MQ Delete message

Consistency issues in normal message processing

Take order creation as an example. The order system first creates an order (a local transaction) and then sends a message to downstream processing. If the order is created successfully but the message is not sent, then all downstream systems will not be aware of the event and dirty data will appear.

public void processOrder() {// Order processing (business operation) orderService.process(); // Send the order processing success message (send the message); }Copy the code

If the order message is sent first, then the order is created; It is possible that the message was sent successfully, but failed when the order was created, at which point the downstream system thought the order had been created, and the dirty data would appear.

public void processOrder() {// Send order processing success message (send message) sendBizMsg (); // Order processing (business operation) orderService.process(); }Copy the code

A wrong idea

At this point, some students may wonder whether we can put the message sending and business processing in the same local transaction for processing. If the business message sending fails, then the local transaction will be rolled back. Would this solve the problem of message sending consistency?

@Transactionnal
public void processOrder() {try{// Order processing (business operation) orderService.process(); // Send the order processing success message (send the message); }catch(Exception e){transaction rollback; }}Copy the code

Analysis of message sending exceptions

Possible situation	consistency
The order is processed successfully, then suddenly it goes down, the transaction is not committed, and the message is not sent	consistent
The order is processed successfully, the message is not sent due to network problems or MQ outage, and the transaction is rolled back	consistent
The order was processed successfully and the message was sent successfully, but MQ failed to store the message and the transaction rolled back due to other reasons	consistent
Order processing was successful, message storage was successful, but MQ processing timed out, resulting in ACK confirmation failure, causing the sender’s local transaction to roll back	Don’t agree

From the above analysis, we can see that the use of ordinary processing, in any case, can not ensure the consistency of business processing and message sending, the fundamental reason lies in: remote call, the result may be success, failure, timeout; In the case of timeout, the handler’s final result may be success or failure, and the caller has no way of knowing. The author had a similar situation in the project, the caller to write data in the local first, then launch the RPC service call, but the deal because the DB data volume is large, lead to deal with the timeout, the caller, in the wake of a timeout exception directly rolled back local transactions, which leads to the caller side data, and processing data has written over there, Service data on the two sides is inconsistent. In order to ensure the consistency of data on both sides, we have to look for a new breakthrough from other places.

Transaction message

Couldn’t solve the messages generated due to the traditional way of handling the local transaction consistency problem successfully and the message is sent successfully, so the transaction message was born, it implements the message producer local transactions and messages atomicity, ensure the message sending messages generated local transaction success and eventual consistency problems of success.

The flow of transactional message processing

Transactional messages differ from normal messages in the message production phase, where the producer first presends a message to MQ(also known as sending half messages)
After MQ receives the message, it persists it first, and a message in the waiting state is added to the store
An ACK is then returned to the message producer, at which point MQ does not trigger a message push event
After the producer presends the message successfully, the local transaction is performed
Execute the local transaction, and when it is complete, send the result to MQ
MQ deletes or updates the message status to deliverable based on the results
If the message status is updated to deliverable, MQ pushes the message to the consumer, and the subsequent messages are consumed as normal messages

Note: Since MQ generally guarantees that messages will be delivered successfully, it is possible to create duplicate message delivery problems for MQ if the business does not return ACK results in a timely manner. Therefore, for the ultimate message consistency scheme, the consumers of the message must support the consumption of the message idempotent, not causing the situation of repeated consumption of the same message.

Transaction message exception analysis

Abnormal situation	consistency	Exception handling method
The message was not stored and the business operation was not performed	consistent	There is no
storage`sent`The message succeeds, but the ACK fails. As a result, the service is not executed (MQ processing timeout or network jitter may cause).	Don’t agree	MQ validates the results of business operations and processes messages (delete messages)
storage`sent`The message succeeds, the ACK succeeds, the business executes (which may or may not succeed), but MQ does not receive the final result of the producer business processing	Don’t agree	MQ confirms the result of the service operation and processes the message (based on the result of the service processing, updates the message status, delivers the message if the service execution is successful, and deletes the message if the service fails).
The business was successfully processed and the result was sent to MQ, but MQ failed to update the message, causing the message status to remain as`sent`	Don’t agree	Same as above

MQ to support transactional messages

Currently, only RocketMQ supports transactional messages among the more mainstream MQS such as ActiveMQ, RabbitMQ, Kafka, RocketMQ, etc. According to the author understand, early ali to MQ increase business message is also because of alipay there because of business demand and generation. Therefore, if we want to rely heavily on an MQ transaction message for ultimate message consistency, at this point RocketMQ is the only technology to choose. We also analyzed the anomaly of transaction messages above, where MQ stores messages to be sent, but MQ is not aware of the end result of upstream processing. For RocketMQ, the solution is simple: its internal implementation has a scheduled task to drill the producer for messages that are waiting to be sent, and then sends a check request to the producer. The producer must implement a check listener. The listener’s job is usually to check that the corresponding local transaction is successful (typically a DB query), and if so, MQ sets the message to sendable, or deletes it otherwise.

Frequently asked Questions

Q: If sending a message fails, does the service stop?

A: Yes, for schemes based on final consistency of messages, this step is heavily relied upon, and if this step is not guaranteed, then final consistency is ultimately impossible.
Q: Why add a message sending mechanism in advance and a retry mechanism for two outgoing messages? Why not use a retry mechanism for sending failed messages after successful services?

A: If a message is sent after the service is successfully executed, the service system breaks down before the message is sent. After the system restarts, the system does not record whether a message has been sent before. As a result, the service is successfully executed but the message is not sent.
If the consumer fails, does the producer need to rollback?

A: In this case, the producer does not roll back the transaction messages if the consumer fails to consume them. The application that uses the transaction messages is pursuing high availability and ultimate consistency. If the message consumption fails, MQ itself is responsible for repushing the message until the consumption succeeds. Thus, transaction messages are addressed to the production side and to the consumer side, where consistency is achieved through MQ’s retry mechanism.
If the consumer side is rolled back due to a business exception, then the consistency between the two sides is not guaranteed.

A: A message-based ultimate consistency scheme must ensure that the consumer side can operate on the business without hinderance. It only allows system exception failures, not business failures, such as throwing an NPE on your business and causing your consumer side to fail to execute a transaction, which is very difficult to achieve consistency.

Since not all MQ supports transactional messages, what if we didn’t choose RocketMQ as the system’s MQ? The answer is yes.

Final consistency based on local messages

The core of the final consistency scheme based on local messages is to record a message data to DB during the execution of business operations, and the recording of message data and business data must be completed in the same transaction, which is the core guarantee of the premise of the scheme. After recording the post-message data, we can then rotate the messages in DB through a scheduled task for the status of the messages to be delivered to MQ. During this process, there may be a message delivery failure. At this time, the retry mechanism is used to ensure that the message status is updated or the message is cleared after receiving a successful ACK from MQ. If the subsequent message fails to be consumed, it relies on the retry of MQ itself to achieve final consistency between the two systems. Although the scheme based on local message service can achieve the final consistency of messages, it has a serious disadvantage. When each business system uses this scheme, it needs to create a message table in the corresponding business library to store messages. To solve this problem, we can separate this function into a message service for unified processing, thus deriving the solution we will discuss below.

Final consistency of standalone messaging services

Eventual consistency independent news service and local news service eventual consistency of the biggest differences is that the message is stored separately make a RPC services, this process is to simulate the transaction message message pretest to process, if pretest send messages fails, then the producers would not have to carry out business, so for the producers of the business, It is strongly dependent on the messaging service. However, the standalone messaging service supports horizontal scaling, so the reliability can be guaranteed by deploying multiple servers in HA cluster mode. In the messaging service, there is a separate timed task that periodically trains messages that have been waiting for delivery for a long time, verifies the success of the corresponding service through a check compensation mechanism, modifies the message to be deliverable if the corresponding service processing is successful, and then delivers it to MQ. If the service processing fails, update or delete the corresponding message. Therefore, when using this scheme, the message producer must also implement a check service for the message service to confirm the message. For message consumption, this scenario is the same as above, using MQ’s own retransmission mechanism to ensure that the message is consumed.

Summary: After an upstream transaction commits, rollback is not considered in an MQ-based scenario. The failure may be caused by network and service outages, as mentioned in the article, business execution is barrier-free. If the downstream service for a long time have not recovered, then you should set the alarm, here there are several mechanisms to solve the problem of some types of psoriasis, if the message is always sent upstream failure (basic does not exist unless the possibility that the code is false) this kind of situation we can set the alarm mechanism can print log when an exception occurs, for example, send SMS, email, Save abnormal orders to the database. These measures can be used for some abnormal orders downstream at the same time. At the same time, we can also create a message prompt of abnormal Topic when abnormal occurs, so that manual intervention of data correction.

Please follow this article if it helps you