Moment For Technology

Distributed Message Queues: How to ensure reliable transmission of messages

Posted on Aug. 8, 2023, 2:02 p.m. by Fiona Lewis
Category: The back-end Tag: distributed


RabbitMQ

(1) The producer loses data

When a producer sends data to RabbitMQ, the message can be lost in transit, for example because of network problems.

To address this, you can use RabbitMQ transactions: open a transaction on the channel (channel.txSelect) before sending a message. If RabbitMQ does not receive the message successfully, the producer gets an exception, at which point you can roll back the transaction (channel.txRollback) and retry the send; if RabbitMQ does receive the message, you commit the transaction (channel.txCommit). The problem, however, is that as soon as you use RabbitMQ transactions, throughput drops sharply, because transactions consume too much performance.

So, in general, if you want to make sure messages sent to RabbitMQ are not lost, you can enable confirm mode. With confirm mode enabled, each message you write to RabbitMQ is assigned a unique ID. If RabbitMQ handles the message successfully, it sends you back an ack saying the message arrived; if it fails to handle the message, it invokes your nack callback, telling you the message failed and letting you retry. On top of this mechanism you can maintain the status of each message ID in memory yourself; if you have not received a callback for a message after a certain amount of time, you resend it.
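The producer-side bookkeeping that confirm mode requires can be sketched in plain Python. This is an in-memory simulation, not the real client API; the class and method names (ConfirmTracker, on_publish, and so on) are made up for illustration:

```python
import time

class ConfirmTracker:
    """Tracks unconfirmed messages so the producer can resend them.

    Simulates the state a producer keeps alongside RabbitMQ confirm
    mode; the broker's ack/nack callbacks are stubbed out as methods.
    """

    def __init__(self, resend_timeout=5.0):
        self.resend_timeout = resend_timeout
        self.pending = {}  # message id -> (payload, time sent)

    def on_publish(self, msg_id, payload):
        # Record every message before it goes on the wire.
        self.pending[msg_id] = (payload, time.time())

    def on_ack(self, msg_id):
        # Broker confirmed the message: forget it.
        self.pending.pop(msg_id, None)

    def on_nack(self, msg_id):
        # Broker failed to handle the message: hand it back for a retry.
        payload, _ = self.pending.pop(msg_id)
        return payload

    def timed_out(self, now=None):
        # Messages with no callback after resend_timeout should be resent.
        now = time.time() if now is None else now
        return [mid for mid, (_, sent) in self.pending.items()
                if now - sent > self.resend_timeout]

tracker = ConfirmTracker(resend_timeout=5.0)
tracker.on_publish("m1", b"order created")
tracker.on_publish("m2", b"order paid")
tracker.on_ack("m1")                       # m1 confirmed by the broker
stale = tracker.timed_out(now=time.time() + 10)
print(stale)                               # -> ['m2']
```

The key point the sketch shows is that confirm mode only prevents loss if the producer keeps its own record of what is still unconfirmed; the resend-on-timeout scan is what turns a missing callback into a retry.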

The big difference between transactions and confirm is that transactions are synchronous: you commit a transaction and it blocks until it completes. Confirm is asynchronous: after sending one message you can immediately send the next, and RabbitMQ, once it has received a message, asynchronously invokes your callback to tell you it arrived.

Therefore, the confirm mechanism is generally used to avoid data loss on the producer side.

(2) RabbitMQ loses data

To stop RabbitMQ itself from losing data, you must enable RabbitMQ persistence: messages are written to disk after being received, so even if RabbitMQ dies, it will read them back automatically after recovery and no data is lost. In the rare event that RabbitMQ dies before a message has been persisted, a small amount of data can be lost, but this is unlikely.

Setting up persistence takes two steps. First, declare the queue as durable when creating it; this makes RabbitMQ persist the queue's metadata, but not the messages in it. Second, set each message's deliveryMode to 2, which marks the message as persistent, so RabbitMQ writes it to disk. Both must be set together; only then will RabbitMQ restore the queue and its messages from disk after it crashes and restarts.
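With the Python pika client, the two settings look roughly like this. This is a configuration sketch only, assuming a broker on localhost and a queue name ("orders") invented for the example:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Step 1: declare the queue as durable, so its metadata survives a restart.
channel.queue_declare(queue="orders", durable=True)

# Step 2: mark each message as persistent (delivery_mode=2),
# so RabbitMQ also writes the message itself to disk.
channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=b"order created",
    properties=pika.BasicProperties(delivery_mode=2),
)

connection.close()
```

Note that either setting alone is not enough: a durable queue with transient messages loses the messages on restart, and a persistent message in a non-durable queue is lost along with the queue.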

Persistence can be combined with producer confirms: the ack is sent to the producer only after the message has been persisted to disk. That way, even if RabbitMQ dies before persisting the message and the data is lost, the producer never receives an ack and can simply resend it.

Even with persistence enabled, there is a small window in which a message has been written to RabbitMQ but not yet persisted to disk; if RabbitMQ dies at that moment, that small amount of in-memory data is lost.

(3) The consumer loses the data

A consumer mainly loses data when its process dies (for example, during a restart) right after receiving a message: RabbitMQ believes you have consumed it, and the data is lost.

To address this, use RabbitMQ's ack mechanism. In simple terms, turn off RabbitMQ's automatic ack and call the ack API from your own code, acking only once you are sure processing is complete. Then, if you die before finishing, there is no ack; RabbitMQ assumes you have not finished processing the message and redelivers it to another consumer, so the message is not lost.
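The redelivery behavior that manual acks buy you can be shown with a toy broker model in Python. This is not the real client or broker API; ToyBroker and its methods are invented purely to illustrate the mechanism:

```python
class ToyBroker:
    """Toy model of a broker with manual acks: a message is only
    removed for good once the consumer explicitly acks it."""

    def __init__(self, messages):
        self.queue = list(messages)
        self.unacked = {}          # delivery tag -> message
        self.next_tag = 0

    def deliver(self):
        # Hand the next message to a consumer, but keep it as unacked.
        msg = self.queue.pop(0)
        self.next_tag += 1
        self.unacked[self.next_tag] = msg
        return self.next_tag, msg

    def ack(self, tag):
        # Consumer finished processing: now the message is truly gone.
        del self.unacked[tag]

    def consumer_crashed(self):
        # Connection lost with unacked messages: requeue them all.
        self.queue = list(self.unacked.values()) + self.queue
        self.unacked.clear()

broker = ToyBroker([b"m1", b"m2"])
tag, msg = broker.deliver()     # consumer gets m1 ...
broker.consumer_crashed()       # ... and dies before acking
tag, msg = broker.deliver()     # m1 is redelivered, not lost
print(msg)                      # -> b'm1'
```

With automatic ack, the crash in the middle would have left m1 acked and gone; with manual ack, the unacked message goes back on the queue.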


Kafka

(1) The consumer loses data

The only way a consumer can lose data is this: you pull the message, the consumer automatically commits the offset, so Kafka believes you have consumed it, but you die before actually processing it, and the message is lost.

Kafka commits offsets automatically by default. If you turn off automatic offset commits and commit the offset manually after processing each message, you can ensure data is not lost. However, you can then get duplicate consumption: if you die after processing a message but before committing its offset, that message will certainly be consumed again, so you just need to make processing idempotent.
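The process-then-commit loop and the idempotency it requires can be sketched in plain Python. This is a simulation of the at-least-once pattern, not the Kafka client API; `consume` and its parameters are invented for illustration, with `committed` and `seen_ids` standing in for durable state:

```python
def consume(messages, start_offset, committed, seen_ids, crash_after=None):
    """At-least-once consumption: process first, commit the offset second.

    A crash between processing and commit causes a redelivery on
    restart, which the `seen_ids` check makes harmless (idempotency).
    """
    processed = []
    for i in range(start_offset, len(messages)):
        msg_id, payload = messages[i]
        if msg_id not in seen_ids:       # skip duplicates on redelivery
            processed.append(payload)
            seen_ids.add(msg_id)
        if crash_after is not None and i >= crash_after:
            return processed, committed  # died before committing offset i+1
        committed = i + 1                # manual commit, after processing
    return processed, committed

msgs = [("id1", "a"), ("id2", "b"), ("id3", "c")]
seen = set()
# First run crashes right after processing offset 1, before committing it.
out1, committed = consume(msgs, 0, 0, seen, crash_after=1)
# Restart resumes from the last committed offset and re-reads id2.
out2, committed = consume(msgs, committed, committed, seen)
print(out1 + out2)   # -> ['a', 'b', 'c']  (id2 processed exactly once)
```

The crash run re-reads "id2" after restart, but the seen-ID check turns the duplicate delivery into a no-op, which is exactly what idempotent processing means here.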

One problem we hit in production: our Kafka consumers consumed data and wrote it into an in-memory queue to buffer it. Sometimes a message had only just been written to the in-memory queue when the consumer auto-committed its offset; then we restarted the system, and the messages in the queue that had not yet been processed were lost.

(2) Kafka loses data

A common scenario: a broker in Kafka goes down and a partition leader is re-elected. If some data on the followers has not yet been synchronized when the leader dies, and one of those followers is elected as the new leader, that unsynchronized data is lost. We have actually hit this: the leader's machine crashed, a follower was switched to leader, and we found that data had been lost.

Therefore, you should generally set at least the following four parameters:

Set replication.factor for the topic: this value must be greater than 1, so that each partition has at least 2 replicas.

Set min.insync.replicas on the Kafka broker side: this value must be greater than 1, which requires a leader to perceive at least one follower still in contact with it.

Set acks=all on the producer side: each piece of data must be written to all in-sync replicas before the write is considered successful.

Set retries=MAX (a very large value) on the producer side: if a write fails, retry indefinitely.

Our production environment is configured this way; with these settings, at least on the Kafka broker side, data will not be lost when a leader switch happens because the leader's broker fails.
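Put together, the four settings look roughly like this. This is a configuration sketch only: the topic name, partition counts, and broker address are made up, and replication.factor is set when the topic is created rather than in a config file:

```shell
# Topic: more than 1 replica per partition (here 3).
kafka-topics.sh --create --topic orders \
  --partitions 3 --replication-factor 3 \
  --bootstrap-server localhost:9092

# Broker side (server.properties): a write only counts if the leader
# still sees at least one in-sync follower.
#   min.insync.replicas=2

# Producer side (producer config):
#   acks=all            # wait for all in-sync replicas
#   retries=2147483647  # retry (practically) forever on failure
```

With replication.factor=3 and min.insync.replicas=2, the cluster tolerates one broker failure without either losing acknowledged writes or refusing new ones.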

(3) Will the producer lose data?

With acks=all set as above, writes will not be lost: a write is considered successful only after the leader has received the message and all in-sync followers have synchronized it. If that condition is not met, the producer automatically retries, effectively without limit.

More articles in series

Distributed message queues: How can message queues be highly available

Distributed message queues: How to ensure that messages are not re-consumed

Distributed Message Queues: How to ensure reliable transmission of messages

Finally

We will keep publishing articles on distributed systems. If you found this helpful, please like and follow; more articles are on the way!

About Moment For Technology: a global community where thousands of techies from across the globe hang out! Passionate technologists, whether gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you will find them all here.