RocketMQ high availability mechanism

1. Distributed cluster

A RocketMQ distributed cluster achieves high availability by combining Master and Slave Brokers.

The difference between Master and Slave: in the Broker configuration file, a brokerId value of 0 indicates that the Broker is a Master, and a value greater than 0 indicates that the Broker is a Slave. The brokerRole parameter also states whether the Broker is a Master or a Slave.
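The brokerId convention above can be checked with a few lines of Java. This is a minimal sketch that only parses key=value text with java.util.Properties; the class name BrokerRoleCheck, the helper names, and the sample values such as broker-a are hypothetical, and the code is not part of RocketMQ itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class BrokerRoleCheck {

    /** Parses broker.conf-style text (key=value lines) into Properties. */
    public static Properties parse(String text) {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return conf;
    }

    /** Returns true when brokerId is 0 (Master), false when it is greater than 0 (Slave). */
    public static boolean isMaster(Properties conf) {
        String raw = conf.getProperty("brokerId");
        if (raw == null) {
            // This sketch refuses to guess a default value.
            throw new IllegalArgumentException("brokerId is not set");
        }
        return Long.parseLong(raw.trim()) == 0L;
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        String master = "brokerName=broker-a\nbrokerId=0\nbrokerRole=SYNC_MASTER\n";
        String slave = "brokerName=broker-a\nbrokerId=1\nbrokerRole=SLAVE\n";
        System.out.println("first config is Master: " + isMaster(parse(master)));
        System.out.println("second config is Master: " + isMaster(parse(slave)));
    }
}
```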

Master Brokers support both reads and writes, while Slave Brokers support reads only. That is, a Producer can only connect to a Master Broker to write messages, while a Consumer can connect to either a Master or a Slave Broker to read messages.

2. High availability

2.1 High availability of message consumption

The Consumer configuration does not need to specify whether to read from the Master or the Slave. When the Master is unavailable or busy, the Consumer is automatically switched to read from the Slave. Thanks to this automatic switching, when a Master machine fails the Consumer can still read messages from the Slave, and the Consumer program is not affected. This is what makes the consumer side highly available.
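The switching behaviour described above can be pictured as a simple address choice within one Broker group. The following is a simplified model, not RocketMQ's client code: the class PullAddressSelector, its select method, and the sample addresses are all hypothetical, and the "busy" case is not modelled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PullAddressSelector {
    static final long MASTER_ID = 0L;

    /**
     * Picks the address to read from within one Broker group.
     * brokerAddrs maps brokerId to address; unavailable holds addresses known to be down.
     * Returns null when no Broker in the group is reachable.
     */
    public static String select(Map<Long, String> brokerAddrs, Set<String> unavailable) {
        String master = brokerAddrs.get(MASTER_ID);
        if (master != null && !unavailable.contains(master)) {
            return master; // prefer the Master while it is reachable
        }
        // Fall back to the reachable Slave with the lowest brokerId.
        for (Map.Entry<Long, String> e : new TreeMap<>(brokerAddrs).entrySet()) {
            if (e.getKey().longValue() != MASTER_ID && !unavailable.contains(e.getValue())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Long, String> group = new HashMap<>();
        group.put(0L, "10.0.0.1:10911"); // hypothetical Master address
        group.put(1L, "10.0.0.2:10911"); // hypothetical Slave address
        Set<String> down = new HashSet<>();
        System.out.println("read from: " + select(group, down));
        down.add("10.0.0.1:10911");      // the Master fails
        System.out.println("read from: " + select(group, down));
    }
}
```

The caller never names a role; it only supplies the group's addresses, which mirrors the point that the Consumer needs no Master/Slave setting.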

2.2 High availability of message sending

When a Topic is created, its message queues are created on multiple Broker groups (machines with the same Broker name and different brokerId values form one Broker group). When the Master of one Broker group becomes unavailable, the Masters of the other groups are still available, so the Producer can still send messages. RocketMQ does not currently support automatic conversion of a Slave to a Master. If you need to convert a Slave to a Master, for example because machine resources are insufficient, manually stop the Slave Broker, change its configuration file, and start the Broker with the new configuration file.
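To illustrate why spreading a Topic's queues over several Broker groups keeps sending available, here is a small model of queue selection that skips groups whose Master is down. It is a sketch under stated assumptions: QueueSelector, Queue, and the round-robin policy are hypothetical and are not taken from RocketMQ's Producer implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class QueueSelector {

    /** A queue is identified by its Broker group name and its id within that group. */
    public static final class Queue {
        public final String brokerName;
        public final int queueId;

        public Queue(String brokerName, int queueId) {
            this.brokerName = brokerName;
            this.queueId = queueId;
        }

        @Override
        public String toString() {
            return brokerName + "#" + queueId;
        }
    }

    private int index = 0;

    /**
     * Round-robins over the Topic's queues, skipping queues that live on a
     * Broker group whose Master is unavailable. Returns null if none is usable.
     */
    public Queue select(List<Queue> queues, Set<String> failedBrokerNames) {
        for (int i = 0; i < queues.size(); i++) {
            Queue q = queues.get(Math.floorMod(index++, queues.size()));
            if (!failedBrokerNames.contains(q.brokerName)) {
                return q;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Queue> queues = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            queues.add(new Queue("broker-a", i)); // hypothetical group names
            queues.add(new Queue("broker-b", i));
        }
        QueueSelector selector = new QueueSelector();
        Set<String> failed = Collections.singleton("broker-a"); // Master of broker-a is down
        for (int i = 0; i < 4; i++) {
            System.out.println("send to " + selector.select(queues, failed));
        }
    }
}
```

If the Topic had queues on only one Broker group, select would return null as soon as that group's Master failed, which is the situation multiple groups are meant to avoid.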

2.3 Message Primary/Secondary Replication

If a Broker group has both a Master and a Slave, messages need to be copied from the Master to the Slave, either synchronously or asynchronously.

1) Synchronous replication

In synchronous replication, the write success status is reported to the client only after both the Master and the Slave have written the message successfully.

In synchronous replication mode, if the Master fails, a complete copy of the data is already on the Slave, so recovery is easy. However, synchronous replication increases write latency and reduces system throughput.

2) Asynchronous replication

In asynchronous replication, the Master reports the write success status to the client as soon as its own write succeeds, without waiting for the Slave.

Asynchronous replication gives the system low latency and high throughput, but if the Master fails, some data may be lost because it has not yet been written to the Slave.
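The difference between the two modes comes down to when the client is acknowledged relative to the Slave write. The toy model below makes that visible. It is an in-memory sketch only: ReplicationModel and its methods are hypothetical, and real replication involves network transfer and disk storage that are not modelled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReplicationModel {
    public enum Mode { SYNC, ASYNC }

    private final Mode mode;
    private final List<String> masterLog = new ArrayList<>();
    private final List<String> slaveLog = new ArrayList<>();
    private final Deque<String> pending = new ArrayDeque<>();

    public ReplicationModel(Mode mode) {
        this.mode = mode;
    }

    /** Stores a message; returning from this method stands for the ack to the client. */
    public void write(String msg) {
        masterLog.add(msg);
        if (mode == Mode.SYNC) {
            slaveLog.add(msg); // ack only after the Slave has the message too
        } else {
            pending.add(msg);  // ack immediately; the Slave catches up later
        }
    }

    /** Simulates the background transfer that brings the Slave up to date. */
    public void replicatePending() {
        while (!pending.isEmpty()) {
            slaveLog.add(pending.poll());
        }
    }

    /** Messages that would survive if the Master were lost right now. */
    public List<String> survivingOnSlave() {
        return new ArrayList<>(slaveLog);
    }

    public static void main(String[] args) {
        ReplicationModel sync = new ReplicationModel(Mode.SYNC);
        ReplicationModel async = new ReplicationModel(Mode.ASYNC);
        sync.write("m1");
        async.write("m1");
        // The Master fails before the asynchronous transfer has run.
        System.out.println("sync survives:  " + sync.survivingOnSlave());
        System.out.println("async survives: " + async.survivingOnSlave());
    }
}
```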

3) Configuration

Synchronous or asynchronous replication is selected through the brokerRole parameter in the Broker configuration file, which can be set to one of three values: ASYNC_MASTER, SYNC_MASTER, or SLAVE.

4) Summary

In practice, choose the disk flush mode and the Master/Slave replication mode to suit the workload. Be especially careful with SYNC_FLUSH: because it triggers disk writes frequently, performance deteriorates significantly. In general, a good choice is to configure both the Master and the Slave with ASYNC_FLUSH and to replicate synchronously between them by setting the Master's brokerRole to SYNC_MASTER. This way, no data is lost even if one machine fails.
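As a rough illustration of that recommendation, the sketch below prints configuration text for the two Brokers of one group. The keys brokerClusterName, brokerName, brokerId, brokerRole, and flushDiskType are Broker configuration parameters; the class RecommendedBrokerConf and the values DefaultCluster and broker-a are assumptions chosen for the example, and a real configuration file needs further settings not shown here.

```java
public class RecommendedBrokerConf {

    /**
     * Renders broker.conf-style text for one Broker of a group.
     * brokerId 0 becomes the SYNC_MASTER; any other id becomes a SLAVE.
     * Both use ASYNC_FLUSH, as suggested in the summary above.
     */
    public static String render(String clusterName, String brokerName, long brokerId) {
        String role = (brokerId == 0L) ? "SYNC_MASTER" : "SLAVE";
        StringBuilder sb = new StringBuilder();
        sb.append("brokerClusterName=").append(clusterName).append('\n');
        sb.append("brokerName=").append(brokerName).append('\n');
        sb.append("brokerId=").append(brokerId).append('\n');
        sb.append("brokerRole=").append(role).append('\n');
        sb.append("flushDiskType=ASYNC_FLUSH").append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Cluster and Broker names are illustrative only.
        System.out.println("# Master");
        System.out.print(render("DefaultCluster", "broker-a", 0));
        System.out.println();
        System.out.println("# Slave");
        System.out.print(render("DefaultCluster", "broker-a", 1));
    }
}
```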