This article is part of the Architecture Technology column, which also collects videos, materials, and technical articles.

I. Background

1. As each business system keeps iterating, the JDK, SpringBoot, the RocketMQ client, and other frameworks have been upgraded as well. When a newer RocketMQ client sends messages to the older broker version, the message details cannot be viewed on the console, which makes day-to-day troubleshooting very difficult.

2. With the old version, it is hard for business services to keep message sending consistent with their local transactions, and guarding against data loss and data inconsistency takes a lot of development effort.

3. Our reliance on MQ keeps growing, and its importance and stability now matter as much as the database's. v4.x adds new features and monitoring tools that let us monitor MQ usage more closely.

4. Since 4.x, RocketMQ has been donated by Alibaba to the Apache community, which now maintains it. This brings wider adoption, more contributors, higher reliability, and faster responses to issues.

5. The new version has better throughput and supports newer technologies. Based on these factors, we decided to upgrade and rework our MQ.

6. Upgrade path: V3_2_6 -> V4.6.0

II. Process

Given the nature of our services, the RocketMQ cluster has to be upgraded incrementally, node by node, rather than in one cut-over.

Have an architect review the upgrade document and close any gaps before they cause an irreversible incident.

Reference material used for the upgrade:

Official documentation:

Rocketmq.apache.org/docs/quick-…

Rocketmq.apache.org/dowloading/…

Dledger Quick Setup Guide:

Github.com/apache/rock…

Apache RocketMQ Developer Guide:

Github.com/apache/rock…

Two architecture diagrams to read before upgrading:

1. Message storage

2. Message flushing to disk

III. Preparation

1. Current environment version status

DEV: http://10.0.254.191:7080/ V3_5_8 2m

TEST: http://10.185.240.76:8081/ V3_5_8 2m

PRO: rocketmq.pro.siku.cn/ admin/secoo V3_2_6 2m

2. JRE versions supported by each component, per release

Version             Client    Broker    NameServer
4.0.0-incubating    >= 1.7    >= 1.8    >= 1.8
4.1.0-incubating    >= 1.6    >= 1.8    >= 1.8
4.2.0               >= 1.6    >= 1.8    >= 1.8
4.3.x               >= 1.6    >= 1.8    >= 1.8
4.4.x               >= 1.6    >= 1.8    >= 1.8
4.5.x               >= 1.6    >= 1.8    >= 1.8
4.6.x               >= 1.6    >= 1.8    >= 1.8
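Before touching a node it is worth confirming that its JRE meets the table's minimum. The sketch below is one way to check that, assuming the version string has the usual `1.8.0_231` or `11.0.2` shape; the function names `java_at_least_8` and `current_java_version` are made up for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) when a Java version string is at least 1.8,
# the minimum the table lists for Broker and NameServer.
java_at_least_8() {
    ver=$1
    major=${ver%%.*}              # "1" for 1.8.0_231, "11" for 11.0.2
    major=${major%%[!0-9]*}       # drop suffixes such as "-ea"
    if [ "$major" = "1" ]; then
        rest=${ver#1.}            # "8.0_231"
        minor=${rest%%[!0-9]*}    # "8"
        [ "${minor:-0}" -ge 8 ]
    else
        [ "${major:-0}" -ge 8 ]
    fi
}

# Extracts the quoted version from `java -version`, which prints to stderr.
current_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example (run on the broker host):
#   java_at_least_8 "$(current_java_version)" || echo "JRE too old for the 4.x broker"
```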

4. Commands used during the upgrade

    # Start the NameServer and a broker
    nohup sh bin/mqnamesrv &
    nohup sh bin/mqbroker -c conf/2m-noslave/broker-b.properties &

    # Disable write permission on a broker
    bin/mqadmin updateBrokerConfig -b 192.168.x.x:10911 -n 192.168.x.x:9876 -k brokerPermission -v 4

    # Restore write permission on that node
    bin/mqadmin updateBrokerConfig -b 192.168.x.x:10911 -n 192.168.x.x:9876 -k brokerPermission -v 6

    # Stop the broker and the NameServer
    bin/mqshutdown broker
    bin/mqshutdown namesrv

    # View cluster information: cluster name, BrokerId, TPS, and so on
    ./bin/mqadmin clusterList -n localhost:9876

    # Get all topics
    ./bin/mqadmin topicList -n localhost:9876 -c DevCluster > topiclist

    # Get a topic's routing information
    ./bin/mqadmin topicRoute -t demo-cluster -n localhost:9876

    # Get a topic's offsets
    ./bin/mqadmin topicStatus -t demo-cluster -n localhost:9876

    # Print topic subscriptions, TPS, backlog, and total reads and writes over 24 hours
    ./bin/mqadmin statsAll -n localhost:9876

    # Modify a broker parameter
    ./bin/mqadmin updateBrokerConfig -n localhost:9876 -b 10.0.xxx.2:10911 -k waitTimeMillsInSendQueue -v 500 -c TestCluster

    # Send a message
    ./bin/mqadmin sendMessage -n localhost:9876 -t lqtest -p "this is test"

    # Consume a message
    ./bin/mqadmin consumeMessage -n localhost:9876 -t lqtest
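The `-v 4` and `-v 6` values passed to `brokerPermission` are bit flags. As far as I know, RocketMQ uses 4 for read and 2 for write, so 4 means read-only and 6 means read and write; treat those bit values as an assumption to check against your version. A small helper (the name `describe_perm` is made up for this sketch) makes the value readable in scripts:

```shell
#!/bin/sh
# Decodes a brokerPermission value into R (read) and W (write) flags.
# Assumed bit values: read = 4, write = 2.
describe_perm() {
    perm=$1
    flags=""
    if [ $(( perm & 4 )) -ne 0 ]; then flags="${flags}R"; fi
    if [ $(( perm & 2 )) -ne 0 ]; then flags="${flags}W"; fi
    # Print "-" when neither bit is set.
    echo "${flags:--}"
}

# Example:
#   describe_perm 4   # read-only: producers can no longer write to the broker
#   describe_perm 6   # read and write: normal operation
```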

5. Notable features by official release

Only the most important features are listed here. If you are interested in the rest, see rocketmq.apache.org/release_not…

4.0.0 (incubating): enters Apache

4.4.0: adds message trace and ACL support

4.5.0: introduces DLedger multi-replica technology

6. Choosing the cluster mode for the new 4.6.0 cluster

  1. Single-Master mode

    This is risky: if the Broker restarts or goes down, the entire service becomes unavailable. It is not recommended for production, but it can be used for local testing.

  2. Multi-Master mode

    The cluster has no Slaves, only Masters, for example two or three Masters. The advantages and disadvantages of this mode are as follows:

    Advantages: simple configuration; one Master going down or restarting for maintenance has no impact on applications. When the disks are configured as RAID10, messages are not lost even if the machine goes down and cannot be recovered, because RAID10 is very reliable (a small number of messages are lost with asynchronous flush, none with synchronous flush).

    Disadvantages: while a machine is down, unconsumed messages on it cannot be consumed until the machine recovers, which hurts message timeliness.

  3. Multi-Master, Multi-Slave mode: asynchronous replication

    Each Master is configured with a Slave, giving multiple Master-Slave pairs. HA uses asynchronous replication, so the Slave lags the Master only briefly (milliseconds). The advantages and disadvantages of this mode are as follows:

    Advantages: even if a disk is damaged, very few messages are lost, and message timeliness is not affected. When the Master is down, consumers can still consume from the Slave. This is transparent to the application, needs no manual intervention, and performance is almost the same as in Multi-Master mode.

    Disadvantages: if the Master goes down and its disk is damaged, a small number of messages are lost.

  4. Multi-Master, Multi-Slave mode: synchronous dual-write (the mode we chose, based on our current business and concurrency)

    Each Master is configured with a Slave, giving multiple Master-Slave pairs. HA uses synchronous dual-write: success is returned to the application only after both the Master and the Slave have written the message.

    Advantages: no single point of failure for data or service, no message delay when the Master is down, and high service and data availability.

    Disadvantages: performance is slightly lower than with asynchronous replication (about 10%), the RT for sending a single message is slightly higher, and the Slave cannot be promoted to Master automatically when the Master goes down.

Choose the mode according to your own cluster's characteristics and stability requirements; the newest cluster mode is not necessarily the best fit for you. Make a smooth upgrade the priority.
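In configuration terms, the modes above differ mostly in `brokerRole` and `brokerId`. As a rough sketch: a Master uses `brokerId=0` with `ASYNC_MASTER` or `SYNC_MASTER`, and its Slave shares the same `brokerName` with a non-zero `brokerId` and role `SLAVE`. The helper `render_broker_conf` is hypothetical, and the cluster and broker names are placeholders:

```shell
#!/bin/sh
# Prints a minimal broker properties fragment for one node.
# Arguments: broker name, broker id (0 = Master, >0 = Slave), broker role.
render_broker_conf() {
    broker_name=$1
    broker_id=$2
    broker_role=$3
    cat <<EOF
brokerClusterName=MQCluster
brokerName=${broker_name}
brokerId=${broker_id}
brokerRole=${broker_role}
flushDiskType=ASYNC_FLUSH
EOF
}

# Example: one synchronous dual-write Master-Slave pair (placeholder names)
#   render_broker_conf broker-a 0 SYNC_MASTER > broker-a.properties
#   render_broker_conf broker-a 1 SLAVE       > broker-a-s.properties
```

The fragment leaves out storage paths, ports, and the NameServer address, which depend on your hosts.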

7. Sorting out topics

You can write a script that records the existing topic list before the upgrade, then compares the topic list and queues against it once the upgrade is complete.
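A minimal sketch of such a comparison, assuming you have saved one topic name per line to a file before and after the upgrade (for example by trimming the `topicList` output). The function name `missing_topics` and the file names are made up for this example:

```shell
#!/bin/sh
# Prints topics that appear in the "before" list but not in the "after" list.
# Both files are expected to hold one topic name per line.
missing_topics() {
    before=$1
    after=$2
    tmp_before=$(mktemp)
    tmp_after=$(mktemp)
    # comm needs sorted input; use the C locale so sort and comm agree.
    LC_ALL=C sort -u "$before" > "$tmp_before"
    LC_ALL=C sort -u "$after" > "$tmp_after"
    LC_ALL=C comm -23 "$tmp_before" "$tmp_after"
    rm -f "$tmp_before" "$tmp_after"
}

# Example:
#   missing_topics topiclist.before topiclist.after
```

An empty result means every topic that existed before the upgrade is still present afterwards.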

In RocketMQ, a topic groups messages that share the same business logic. It is purely a logical concept: a topic contains several logical queues (message queues), the message content is actually stored in those queues, and the queues live on brokers.

Be sure to review two specific scenarios: 1) topics that are consumed in order, and 2) topics that are configured on a single broker.

If a topic has only one queue, messages sent to it can be lost while the broker holding that queue is down.
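One rough way to spot single-broker topics is to count the distinct broker names in the `topicRoute` output. The sketch below assumes that output is JSON containing `"brokerName":"..."` entries, which is how I remember it; check it against your version. The function name `broker_count` is made up.

```shell
#!/bin/sh
# Reads topicRoute output on stdin and prints how many distinct
# brokers it mentions. Assumes JSON with "brokerName":"<name>" pairs.
broker_count() {
    grep -oE '"brokerName"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/.*"\([^"]*\)"$/\1/' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Example: flag a topic that lives on a single broker
#   n=$(./bin/mqadmin topicRoute -t lqtest -n localhost:9876 | broker_count)
#   [ "$n" -le 1 ] && echo "lqtest is on a single broker"
```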

8. Configure the new cluster

    brokerClusterName=MQCluster
    brokerName=broker-ali-76
    brokerId=0
    deleteWhen=04
    fileReservedTime=360
    brokerRole=ASYNC_MASTER
    flushDiskType=ASYNC_FLUSH
    storePathCommitLog=/data/rocketmq/store/commitlog
    storePathConsumerQueue=/data/rocketmq/store/consumequeue
    storePathRootDir=/data/rocketmq/store
    autoCreateSubscriptionGroup=true
    ## if msg tracing is open, the flag will be true
    traceTopicEnable=true
    listenPort=10911
    namesrvAddr=10.48.xx.76:9876;10.48.xx.77:9876

IV. Upgrade steps

Once the dry run is done, we can get started. The cluster mode I finally chose: Multi-Master, Multi-Slave with synchronous dual-write.

Process Overview:

  1. Modify the 2m-2s-sync, runbroker, and runserver configuration parameters

  2. Stop the 3.2.6 NameServer and start the 4.6.0 NameServer on the original IP and port, replacing the nodes one at a time

  3. Stop the 3.2.6 broker and start the 4.6.0 broker (check first for topics that exist on only one broker), replacing the nodes one at a time

  4. Test cluster stability, then add Slaves to the new cluster. The upgrade is complete
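For step 3, the per-broker sequence can be written down as a plan before anything is executed. The sketch below only prints commands and runs nothing. The ordering (make the broker read-only, stop it, start the new one, restore write permission, check the cluster) is my reading of the overview rather than an official procedure, and `plan_broker_swap` is a made-up name:

```shell
#!/bin/sh
# Prints the command sequence for swapping one broker; executes nothing.
# Arguments: broker address (ip:port), NameServer address (ip:port),
# path of the properties file for the new broker.
plan_broker_swap() {
    broker_addr=$1
    namesrv_addr=$2
    conf_file=$3
    cat <<EOF
bin/mqadmin updateBrokerConfig -b ${broker_addr} -n ${namesrv_addr} -k brokerPermission -v 4
bin/mqshutdown broker
nohup sh bin/mqbroker -c ${conf_file} &
bin/mqadmin updateBrokerConfig -b ${broker_addr} -n ${namesrv_addr} -k brokerPermission -v 6
bin/mqadmin clusterList -n ${namesrv_addr}
EOF
}

# Example:
#   plan_broker_swap 192.168.x.x:10911 192.168.x.x:9876 conf/2m-2s-sync/broker-b.properties
```

In practice you would wait between the first and second command until consumers have drained the broker, and run the shutdown from the old installation directory.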

Detailed steps:

Preparation

1. Download the 4.6.0 deployment package

    cd /data/xxx_tomcat
    wget http://mirrors.tuna.tsinghua.edu.cn/apache/rocketmq/4.6.0/rocketmq-all-4.6.0-bin-release.zip
    unzip rocketmq-all-4.6.0-bin-release.zip

2. Modify the configuration

    cd /data/xxx_tomcat/rocketmq-4.6.0/conf/2m-2s-sync
    # Change the broker configuration in 2m-2s-sync on machines 51 and 50
    # Change the runbroker JVM settings so the defaults do not exhaust memory
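For the runbroker JVM change, the usual target is the heap flags in `bin/runbroker.sh`. As far as I remember, the stock 4.6.0 script ships with `-Xms8g -Xmx8g -Xmn4g`, which is too much for small machines; verify that in your own copy. The sketch below rewrites those flags and prints the result to stdout so you can review it before replacing the file. The function name `set_broker_heap` and the sizes are illustrative only:

```shell
#!/bin/sh
# Prints the given script with its -Xms/-Xmx set to $2 and -Xmn set to $3.
# The input file is not modified.
set_broker_heap() {
    file=$1
    heap=$2
    young=$3
    sed -e "s/-Xms[0-9]*[kmgKMG]/-Xms${heap}/" \
        -e "s/-Xmx[0-9]*[kmgKMG]/-Xmx${heap}/" \
        -e "s/-Xmn[0-9]*[kmgKMG]/-Xmn${young}/" "$file"
}

# Example: review the change, then replace the script
#   set_broker_heap bin/runbroker.sh 4g 2g > /tmp/runbroker.sh
#   diff bin/runbroker.sh /tmp/runbroker.sh
```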

3. The configuration of the two Masters is as follows:

    # brokerName
    brokerClusterName=MQCluster
    brokerName=broker-60-50
    brokerId=0
    deleteWhen=04
    fileReservedTime=48
    brokerRole=ASYNC_MASTER
    flushDiskType=ASYNC_FLUSH
    storePathCommitLog=/data/alibaba-rocketmq/store/commitlog
    storePathConsumerQueue=/data/rocketmq/store/consumequeue
    storePathRootDir=/data/alibaba-rocketmq/store
    autoCreateSubscriptionGroup=true
    ## if msg tracing is open, the flag will be true
    traceTopicEnable=true
    listenPort=10911
    namesrvAddr=192.168.xxx.xx:9876;192.168.xxx.xx:9876

4. Replace the NameServer

    jps -l
    sh bin/mqshutdown namesrv
    cd /data/xxx_tomcat/rocketmq-4.6.0
    nohup sh bin/mqnamesrv &
    ./bin/mqadmin clusterList -n localhost:9876
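The new NameServer takes a moment to start listening, so a scripted swap should wait for it instead of checking immediately. A generic retry helper is enough for that; `wait_until` is a made-up name, and the `nc -z` probe in the example assumes netcat is installed on the host.

```shell
#!/bin/sh
# Runs a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait up to 60 seconds for the NameServer port to accept connections
#   wait_until 30 2 nc -z localhost 9876 || echo "NameServer did not come up"
```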

5. Replace the broker

    jps -l
    sh bin/mqshutdown broker
    cd /data/xxx_tomcat/rocketmq-4.6.0
    # check that the old broker process has stopped
    ps -ef | grep mq
    # start the new broker with the chosen configuration file
    nohup sh bin/mqbroker -c conf/2m-2s-sync/broker-b.properties &
    ./bin/mqadmin clusterList -n localhost:9876

Finally, let’s test it by sending a message

./bin/mqadmin sendMessage -n localhost:9876 -t lqtest -p "this is test"

./bin/mqadmin consumeMessage -n localhost:9876 -t lqtest

Congratulations! If you have made it this far, your cluster upgrade is complete.