Data loss can occur at the producer, inside the MQ itself, or at the consumer, so let's analyze RabbitMQ and Kafka separately.

RabbitMQ

The producer loses data

When a producer sends data to RabbitMQ, the message can be lost in transit due to network problems and the like.

To guard against this, you can use RabbitMQ transactions: the producer calls channel.txSelect before sending a message. If RabbitMQ fails to receive the message, the producer gets an exception; at that point you can roll back the transaction with channel.txRollback and retry the send. If the message is received, you commit the transaction with channel.txCommit.

In the RabbitMQ Java client, that looks roughly like this:

```java
// enable transactions on this channel
channel.txSelect();
try {
    // publish the message
    channel.basicPublish(exchange, routingKey, null, payload);
    // commit the transaction
    channel.txCommit();
} catch (Exception e) {
    // something went wrong: roll back, then retry the send
    channel.txRollback();
}
```

The problem is that RabbitMQ transactions are synchronous, so throughput basically plummets because they cost too much performance.

So in general, if you want to make sure messages sent to RabbitMQ are not lost, you can enable confirm mode instead. With confirm mode enabled on a channel, every message you publish is assigned a unique ID. If RabbitMQ accepts the message, it sends back an ack with that ID, telling you the message arrived; if RabbitMQ fails to handle the message, it invokes your nack callback, telling you the message failed so you can retry. On top of this mechanism you can keep the state of each message ID in memory yourself, and if you haven't received a confirmation for a message after a certain amount of time, resend it.

The big difference between transactions and confirm is that a transaction is synchronous: you block after committing it. Confirm is asynchronous: after sending one message you can immediately send the next, and RabbitMQ later calls back one of your interfaces to notify you that the message was received.

So on the producer side, the confirm mechanism is generally used to avoid losing data.
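The in-memory tracking described above can be sketched as a toy simulation (this is not real RabbitMQ client code; the class and method names are invented for illustration):

```python
import time

class ConfirmTracker:
    """Tracks unconfirmed message IDs so they can be resent on timeout."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.pending = {}  # message id -> (payload, send time)

    def on_publish(self, msg_id, payload):
        # remember the message until the broker confirms it
        self.pending[msg_id] = (payload, time.monotonic())

    def on_ack(self, msg_id):
        # broker confirmed receipt: stop tracking this message
        self.pending.pop(msg_id, None)

    def on_nack(self, msg_id):
        # broker failed to handle it: hand the payload back for a retry
        return self.pending.pop(msg_id, None)

    def expired(self):
        # messages with no ack within the timeout should be resent
        now = time.monotonic()
        return [mid for mid, (_, sent) in self.pending.items()
                if now - sent > self.timeout_s]
```

A background task would periodically call `expired()` and republish whatever it returns.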

RabbitMQ loses data

To stop RabbitMQ itself from losing data, you must enable RabbitMQ persistence: messages are persisted to disk after being written, so even if RabbitMQ dies, it reads them back automatically after it recovers, and no data is lost. The rare exception is that RabbitMQ dies before it has persisted a message; then a small amount of data can be lost, but the probability is tiny.

There are two steps to setting up persistence:

The first is to mark the queue as durable when creating it. This ensures RabbitMQ persists the queue's metadata, but not the messages in it. The second is to set a message's deliveryMode to 2 when sending it, which tells RabbitMQ to persist the message to disk. Both must be set together; then even if RabbitMQ hangs and restarts, it will recover the queue and its messages from disk.

Note that even with persistence enabled there is still a window: a message may have reached RabbitMQ but not yet been persisted to disk when RabbitMQ goes down, losing that small amount of in-memory data.

Therefore, persistence can be combined with the producer confirm mechanism: the ack is sent to the producer only after the message has been persisted to disk. Then even if RabbitMQ dies before persisting and the data is lost, the producer never receives the ack and can simply resend the message.
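The persist-before-ack idea can be modeled with a toy broker (a Python sketch of the concept, not RabbitMQ internals; the `disk` list here just stands in for durable storage):

```python
class ToyBroker:
    """Toy model: a message is acked only after it reaches 'disk'."""

    def __init__(self):
        self.memory = []   # messages received but not yet persisted
        self.disk = []     # messages safely persisted

    def receive(self, msg):
        # message arrived over the network; NOT acked yet
        self.memory.append(msg)

    def fsync(self):
        # persist everything in memory, then (and only then) ack each message
        acked = []
        while self.memory:
            msg = self.memory.pop(0)
            self.disk.append(msg)
            acked.append(msg)  # the ack goes out only after the disk write
        return acked

    def crash(self):
        # everything still in memory is lost; disk contents survive
        self.memory.clear()
        return self.disk
```

If the broker crashes between `receive()` and `fsync()`, the producer never saw an ack for that message, so it knows to resend it.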

The consumer loses data

On the consumer side, data is lost mainly when your process dies partway through consuming a message (for example, during a restart): RabbitMQ thinks you have already consumed it, and the data is lost.

The answer is RabbitMQ's ack mechanism. In short, you turn off RabbitMQ's automatic ack (this can be done through the API) and send the ack from your own code each time processing actually completes. Then, if you die before finishing, no ack is sent; RabbitMQ assumes the message has not been fully consumed and redelivers it to another consumer, so the message is not lost.
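The manual-ack flow can be simulated with an in-memory queue (a Python sketch; `ToyQueue` and its methods are invented names, not the RabbitMQ API):

```python
import collections

class ToyQueue:
    """Simulates RabbitMQ redelivering unacked messages to another consumer."""

    def __init__(self, messages):
        self.ready = collections.deque(messages)
        self.unacked = {}   # delivery tag -> message awaiting an ack
        self.next_tag = 0

    def deliver(self):
        # hand a message to a consumer; it stays 'unacked' until acked
        msg = self.ready.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = msg
        return self.next_tag, msg

    def ack(self, tag):
        # consumer finished processing: the message is gone for good
        del self.unacked[tag]

    def consumer_died(self, tag):
        # no ack ever arrived: requeue the message for another consumer
        self.ready.append(self.unacked.pop(tag))
```

The key point the sketch captures: a message only leaves the broker permanently on an explicit `ack`, so a crash before the ack means redelivery, not loss.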

Kafka

The consumer loses data

The only way a consumer loses data is this: you pull the message, the consumer automatically commits the offset so Kafka believes you have consumed it, but you are only about to process the message, and you die before you actually do. That message is lost.

This is just like RabbitMQ. Kafka auto-commits offsets by default, so simply turn auto-commit off and commit the offset manually after processing, and data will not be lost. Duplicate consumption is still possible, though: if you die after processing a message but before committing its offset, you will definitely consume it again after a restart, so you just have to guarantee idempotency yourself.

One problem we hit in production: our Kafka consumers wrote consumed data into an in-memory queue for buffering. Sometimes a message had only just been written to the in-memory queue when the consumer auto-committed its offset; then we restarted the system, and the data still sitting in the queue, not yet processed, was lost.
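Manual offset commits plus idempotent processing can be sketched like this (a Python simulation, not the real Kafka client; `seen` is an invented stand-in for a durable dedup store such as a database unique key):

```python
def consume(log, start_offset, seen, handle):
    """Process records from start_offset, committing only after handling.

    `seen` is a set of already-processed record keys; checking it makes
    the handler idempotent across a crash-and-restart.
    """
    committed = start_offset
    for offset in range(start_offset, len(log)):
        key, value = log[offset]
        if key not in seen:        # skip duplicates after a restart
            handle(key, value)
            seen.add(key)
        committed = offset + 1     # commit only AFTER processing
    return committed
```

If the process crashes after handling a record but before returning the committed offset, a restart re-reads that record, and the `seen` check absorbs the duplicate.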

Kafka loses data

A common scenario is this: a Kafka broker goes down, triggering re-election of the partition leader. If some followers have not yet finished syncing data when the leader dies, and one of those followers is then elected leader, the unsynced data is lost.

We have seen exactly this in production: a Kafka leader machine crashed, a follower was switched to leader, and we found that some data had been lost.

Therefore, you generally have to set at least the following four parameters:

- Set replication.factor on the topic: this value must be greater than 1, so that each partition has at least 2 replicas.
- Set min.insync.replicas on the Kafka server: this value must be greater than 1, requiring the leader to see at least one follower that is still in contact and not lagging behind. That guarantees there is still a follower available if the leader dies.
- Set acks=all on the producer: every piece of data must be written to all replicas before the write is considered successful.
- Set retries=MAX on the producer (a very large number, meaning retry without limit): if a write fails, keep retrying indefinitely.

Our production environment is configured this way, and it guarantees that, at least on the Kafka broker side, data will not be lost during a leader switch caused by the leader broker failing.
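Concretely, the four settings above might look like the following (illustrative values only; the replication factor is actually set when the topic is created rather than in a properties file, and the numbers are examples, not recommendations):

```properties
# topic level: at least 2 replicas per partition
# (set at topic creation, e.g. kafka-topics.sh --create ... --replication-factor 3)
replication.factor=3

# broker (server.properties) or topic level: the leader must have
# at least one in-sync follower for writes to be accepted
min.insync.replicas=2

# producer: wait for all in-sync replicas before treating a write as successful
acks=all

# producer: retry (effectively) forever on transient write failures
retries=2147483647
```

With replication.factor=3 and min.insync.replicas=2, one broker can fail without either losing acknowledged data or halting writes.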

Will the producer lose data?

If you set acks=all, writes will definitely not be lost: the requirement is that the leader has received the message and all in-sync followers have replicated it before the write counts as successful. If that condition is not met, the producer automatically retries, an unlimited number of times.