The introduction

Hello, everyone, I am South orange, from contact with Java to now also have almost two years, two years, from a Java have several data structures do not understand super small white, to now understand a little bit of advanced small white, learned a lot of things. The more knowledge is shared, the more valuable, I this period of time summary (including from other big guy over there to learn, quote) some of the focus in the ordinary study and interview (self think), hope to bring some help to everyone

In the previous chapter, we discussed flexible transaction solutions in distributed systems and introduced 2PC, 3PC, and TCC solutions. This time, I introduce a reliable message ultimate consistency solution for implementing distributed transactions.

  • This article refers to the architectural notes of hufir

1, reliable message ultimate consistency transaction

The reliable message final consistency scheme means that when a transaction initiates a fully local transaction and sends a message, the transaction participant (message consumer) must be able to receive the message and process the transaction successfully. This scheme emphasizes that the final transaction should be consistent as long as the message is sent to the transaction participant.

2. Implement the ultimate consistency scheme for reliable messages

Since RocketMQ supports distributed messaging, this scheme is implemented with RocketMQ. In fact, what message middleware is used is only a business option, the important is the principle.

At the request of one of my classmates, I drew a picture of myself.

  • Producer production message
  • Message Acknowledgement The server acknowledges the message status
  • RocketMQ delivers the message
  • Consumer Spending News


  • 1. The producer sends a message to the message acknowledgement server, which stores the message and changes the state of the message to be acknowledged.
  • 2. The producer executes the local database after sending the message. If the execution succeeds, the message is confirmed; if the execution fails, the message is deleted.
  • 3. If it is a confirmation message, the acknowledgement server updates the message status in the database to “sent” and sends the message to MQ at the same time.

If the database fails to update the status of the message, then throw an exception and exit, do not post to MQ.

If the post MQ fails and an error is reported, an exception is thrown to roll back the local database transaction.

The two operations have to work together or fail together.

  • Consumers are waiting to consume messages from MQ and, if consumed, operate on their local databases
  • 5. If the operation is successful, it will in turn notify the message acknowledgement server that it has processed successfully, and then the message acknowledgement server will set the status of the message to “completed”.

Is the process simple? But there is still a lot of work to be done to ensure the ultimate consistency and reliability of the messages.

3. How to ensure reliable delivery of messages by producer services

To ensure reliable delivery of producer messages, the producer needs to store the messages in its own database and invoke the interface of the reliable message service based on its execution results.

If the local database operation performed successfully, the reliable messaging service is asked to confirm that message. If the local database operation fails, the reliable messaging service is asked to delete the message.

At the same time, in the message confirmation server to develop a background scheduled running thread, non-stop polling to check the status of each message.

If there is a problem when the upstream service notifies the reliable messaging service of a confirmation message or a deletion message after the upstream service has operated the local database. In this case, the message in the message server is always in the “to be confirmed” state, so something is wrong with the message and an interface provided by the producer can be called back to confirm the operational status of the producer database.

If the upstream server executes successfully, the Reliable Messaging service changes the message status to “sent” and delivers the message to MQ.

If the upstream service fails, the reliable Message Service simply deletes the messages from the database.

With this mechanism, it is guaranteed that the reliable messaging service will attempt to complete the delivery of messages to MQ.

4. How to ensure reliable reception of messages by consumer servers

To ensure reliable acceptance of consumer messages, it is also necessary to develop a thread that runs regularly in the background in the message acknowledgement server, through which the status of each message is constantly checked by polling.

If the message status remains “sent” and never becomes “done,” then the consumer service has not been successfully processed, at which point the confirmation server can try again to repost the message to MQ for the consumer service to process again.

Of course, in order to ensure the consistency of data and avoid repeated consumption, the realization of idempotence is essential.

5. Ensure that the messaging mechanism is highly available

1. High availability of RabbitMQ

As mentioned in our previous article,RibbitMQ can be clustered to achieve high availability, but there are different advantages and disadvantages to building different clusters, so I won’t repeat the words here.

Message Queues — Principles of RabbitMQ (2)

High availability of RocketMQ

RocketMQ is a messaging middleware that was born in the era of high concurrency distribution, so it is inherently high concurrency and transaction support. Meanwhile, Name Server is stateless, linearly scalable, and naturally supports high availability.

  • 1. Multi-master mode

In a cluster consisting of multiple Master nodes, even if one node goes down, there is no impact on the entire cluster.

Disadvantages: When a single Master node is down, messages that are not consumed are not available until the node is recovered, which affects the real-time performance of the messages. Of course, using synchronization allows messages to be synchronized across nodes, but with a significant reduction in performance and space utilization.

  • 2. Multiple Master and Slave asynchronous replication mode

On a multi-master basis, each node has at least one Slave. The Master node can read and write, but the Slave node can only read and write. This situation is similar to the Master/Slave mode of MySQL.

Advantage: When a Master node is down, the Slave node can still read messages.

Disadvantages: The synchronous mode of asynchronous replication may cause message loss.

  • 3. Multiple Master and Slave synchronous dual-write mode

This mode is similar to the multi-master and multi-slave asynchronous replication mode, except that data between masters and slaves is transmitted synchronously.

Advantages: No single point of data or service, no message latency in case of a Master failure, high service availability and data availability.

Disadvantages: The performance is about 10% lower than that of the asynchronous replication mode. The RT that sends a single message is slightly higher. After the active server breaks down, the standby server cannot be automatically switched over to the host.

Three, Kafka high availability

Kafka itself is made up of multiple brokers, each of which is a node; Create a topic that can be divided into multiple partitions. Each partition can reside on a different broker and each partition can hold data.

So Kafka is a distributed message queue, where the data for a topic is distributed across multiple machines, with each machine holding a portion of the data.

(Find a picture to show it)

After Kafka 0.8, the HA mechanism is provided, namely, the Replica replica mechanism. Kafka evenly distributes all replicas of a partition on different machines to improve fault tolerance. If a broker is down and a partition has a leader on it, Kafka will automatically elect a new leader and continue reading and writing to the new leader.

Data writing: The producer acts as the leader and writes data to the local disk. Then other followers actively pull data from the leader. Once all the followers have synchronized data, an ACK is sent to the leader. After receiving the ACK from all the followers, the leader returns a write success message to the producer.

Consumption time: Only read from the leader, but only if a message has been synchronized by all followers and successfully returned as an ACK, that is, the message is read on both the leader and all followers.


This article is basically a summary of distributed transactions. I think if you want to learn more about it, you can draw the architecture diagram like I did, and then try to build a simple service yourself. This effect is really good, although in the actual work most of the company has set up their own framework, but always want to update the ~ one day will we also take charge of one side?

At the same time, if you need a mind map, you can contact me, after all, the more knowledge is shared, the more fragrant!