This article looks at RabbitMQ, Kafka, and RocketMQ from a high availability perspective and compares how each of them implements it.

1. RabbitMQ

RabbitMQ can be deployed in three modes:

  • Stand-alone mode
  • Common Cluster mode
  • Mirroring cluster mode

The standalone mode has nothing to do with high availability at all, so let’s go ahead and look at the two cluster modes.

1.1 Common Cluster Mode

In this mode, each Queue lives on a single Broker in the cluster; the brokers synchronize metadata with each other, but not the message data inside the Queue.

If a Broker fails, its Queues become unavailable, and any messages that were not configured for persistence are lost.

As you can see, this approach does not achieve high availability, but it scales well: adding brokers lets the cluster host more queues and increases throughput.
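Whether a message survives a Broker restart in this mode depends on persistence settings made by the client. Below is a minimal sketch using the RabbitMQ Java client; the host, queue name, and message body are placeholders, not values from this article.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class DurablePublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq-host"); // placeholder host
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // durable = true: the queue definition survives a Broker restart
            channel.queueDeclare("orders", true, false, false, null);
            // PERSISTENT_TEXT_PLAIN marks the message itself as persistent
            channel.basicPublish("", "orders",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes());
        }
    }
}
```

Note that persistence only protects against data loss after the Broker comes back; while the Broker is down, the Queue on it is still unavailable.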

1.2 Mirroring Cluster Mode

Both the metadata and the message data of a Queue on one Broker are synchronized to the other brokers, giving a full copy on every node, hence "mirrored mode".

This gives high availability: if one Broker fails, the other brokers can keep serving the Queue, and no message data is lost.

Availability goes up, but scalability does not.

Because every Broker stores a full copy of a Queue's data, the Queue's message capacity and processing capacity are still limited by a single Broker.
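Classic queue mirroring is enabled with a policy rather than from producer or consumer code. As a hedged illustration, the sketch below sets an "ha-mode: all" policy through the RabbitMQ management HTTP API (the same thing `rabbitmqctl set_policy` does); the host, port, credentials, and policy name are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class MirrorPolicy {
    public static void main(String[] args) throws Exception {
        // Policy: mirror every queue ("^" matches all names) to all nodes in the cluster
        String body = "{\"pattern\":\"^\",\"definition\":{\"ha-mode\":\"all\"},\"apply-to\":\"queues\"}";
        String auth = Base64.getEncoder().encodeToString("guest:guest".getBytes()); // assumed credentials
        HttpRequest request = HttpRequest.newBuilder()
                // %2F is the default vhost "/"; "ha-all" is an arbitrary policy name
                .uri(URI.create("http://rabbitmq-host:15672/api/policies/%2F/ha-all"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("status: " + response.statusCode()); // 201/204 on success
    }
}
```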

The common cluster mode is not highly available but scales well.

The mirrored cluster mode is highly available but scales poorly.

2. Kafka

Kafka divides a Topic (its equivalent of a queue) into Partitions. A Topic is just a logical concept; Partitions are the actual units of message storage.

Multiple partitions of a Topic are scattered across multiple brokers, each holding a portion of the Topic’s data.

With partitioning, a Topic becomes highly scalable: you can give it as many partitions as you need.

To make partitions highly available, you can give each one multiple replicas ("copies"), scattered across different brokers.
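For illustration, this is roughly how a topic with several partitions and replicas could be created with Kafka's Java AdminClient; the bootstrap address, topic name, and counts are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions spread across brokers, each with 3 replicas (1 leader + 2 followers)
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```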

When a Broker fails, the partitions stored on it become unavailable, but that is fine: the replicas on other brokers can take over.

A Partition's replicas are divided into two roles: Leader and Follower.

The Leader is elected by Kafka and handles reads and writes of messages. After receiving new messages, the Leader synchronizes them to followers.

Followers act as candidates, and when the Leader fails, Kafka elects a new Leader from the followers.
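To see the Leader/Follower layout, the AdminClient can describe a topic and report which broker leads each partition and where its replicas live. A rough sketch, reusing the placeholder names from the example above:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("orders"))
                    .all().get().get("orders");
            for (TopicPartitionInfo p : desc.partitions()) {
                // leader() is the broker currently serving this partition;
                // replicas() lists all brokers holding a copy (leader + followers)
                System.out.printf("partition %d: leader=%s replicas=%s%n",
                        p.partition(), p.leader(), p.replicas());
            }
        }
    }
}
```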

When a write is considered complete is configurable (see the sketch after this list):

  • Acknowledge once the Leader has the message: writes are fast, but messages may be lost, for example if the Broker fails before they are synchronized to the Followers.
  • Acknowledge only after the Followers have synchronized the message: messages are highly reliable, but writes are slower.
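A minimal sketch of the corresponding producer setting, assuming the Kafka Java client; `acks=1` waits only for the Leader, while `acks=all` waits until the in-sync Followers also have the message. The broker address and topic are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // "1"  : fast, acknowledged once the Leader has written the message
        // "all": slower, acknowledged only after the in-sync Followers have it
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key", "hello"));
        }
    }
}
```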

3. RocketMQ

RocketMQ's official architecture places Producers and Consumers around the middleware itself, which is divided into two parts:

  • NameServer cluster – holds metadata
  • Broker cluster – holds queue (message) data

Both parts need to be highly available.

Each NameServer runs independently and holds the cluster's complete metadata, such as routing information and Broker information.

To ensure high availability, you can run multiple NameServers. They do not need to synchronize with each other: every Broker registers with all of them, so each NameServer holds the complete data on its own.

This way, as long as at least one NameServer is available, the cluster keeps operating normally.
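On the client side this just means listing every NameServer address. A minimal sketch with the RocketMQ Java client; the addresses and producer group name are placeholders.

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;

public class NameServerHa {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        // Semicolon-separated list: the client keeps working as long as one NameServer is up
        producer.setNamesrvAddr("nameserver1:9876;nameserver2:9876");
        producer.start();
        // ... send messages ...
        producer.shutdown();
    }
}
```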

Broker clusters can be deployed in three ways.

  • Multi-Master mode

Multiple brokers are deployed in the role of Master, and the data for the Topic is distributed among them.

If a single Master fails, the data on that Master becomes unavailable until the node is repaired.

To improve data reliability, you can combine RAID 10 disks with the synchronous disk-flush mechanism.

  • Multi-Master Multi-Slave mode

Each Master is configured with a Slave, and the Master synchronizes its data to the Slave.

When the Master fails, the Slave can take its place, so data and service are preserved. The switchover is not automatic, though: you have to modify the configuration and restart the Slave, which causes a short pause.

Data synchronization modes are as follows:

1) Asynchronous replication – data is written to the Master and copied to the Slave asynchronously. Writes are fast, but replication may lag, so data can be lost if the Master fails.

2) Synchronous replication – a write succeeds only after both the Master and the Slave have stored the message. No messages are lost, but write speed drops (see the sketch after this list).
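With synchronous replication (brokerRole set to SYNC_MASTER on the Broker), the producer can tell from the send result whether the Slave actually stored the message. A rough sketch continuing the producer example above; the topic, group, and addresses are placeholders.

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.client.producer.SendStatus;
import org.apache.rocketmq.common.message.Message;

public class SyncReplicationSend {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("nameserver1:9876;nameserver2:9876"); // placeholders
        producer.start();
        SendResult result = producer.send(new Message("orders", "hello".getBytes()));
        // SEND_OK             : Master (and, in sync mode, Slave) stored the message
        // FLUSH_SLAVE_TIMEOUT : Master wrote it, but syncing to the Slave timed out
        // SLAVE_NOT_AVAILABLE : Master runs as SYNC_MASTER but no Slave is reachable
        if (result.getSendStatus() != SendStatus.SEND_OK) {
            System.out.println("degraded send: " + result.getSendStatus());
        }
        producer.shutdown();
    }
}
```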

  • DLedger Group

In DLedger mode, at least two Slaves must be configured for each Master, and the three nodes together form a DLedger Group.

DLedger still uses master-slave replication, but its advantage is that the Master is elected automatically and the switch to a Slave happens automatically.

In the event of a Master failure, RocketMQ can select a new Master from the group for automatic switchover, further improving cluster availability.

To wrap up: RabbitMQ achieves high availability with mirrored queues, Kafka with partition replicas and Leader/Follower switching, and RocketMQ with NameServer redundancy plus Master-Slave replication or DLedger groups.
