Starting independent nodes

Clusters are set up by reconfiguring existing RabbitMQ nodes into a cluster configuration, so the first step is to start RabbitMQ on all nodes in the normal way:

# on rabbit1
rabbitmq-server -detached
# on rabbit2
rabbitmq-server -detached
# on rabbit3
rabbitmq-server -detached

This creates three independent RabbitMQ brokers, one on each node, as confirmed by the cluster_status command:

# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
# => ... done.

Create the cluster

To link the three nodes into a cluster, we tell two of the nodes, say rabbit@rabbit2 and rabbit@rabbit3, to join the cluster of the third, rabbit@rabbit1. Prior to that, both newly joining members must be reset.

We first join rabbit@rabbit2 in a cluster with rabbit@rabbit1. To do that, on rabbit@rabbit2 we stop the RabbitMQ application, reset the node, join it to the rabbit@rabbit1 cluster, and then restart the RabbitMQ application. Note that a node must be reset before it can join an existing cluster. Resetting a node removes all resources and data that were previously present on it, which means a node cannot be made a member of a cluster and keep its existing data at the same time.

# on rabbit2
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit2 ... done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit2 ...
rabbitmqctl join_cluster rabbit@rabbit1
# => Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ... done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit2 ... done.

Running the cluster_status command on either node shows that the two nodes are now in the same cluster:

# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# =>  {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
# => ... done.

Now we’ll add rabbit@rabbit3 to the same cluster. The steps are identical to those above, except that this time we cluster to rabbit2, to demonstrate that the node chosen to cluster to does not matter: it is enough to provide any one online node, and the joining node will be clustered to the cluster that the specified node belongs to.

# on rabbit3
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit3 ... done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit3 ...
rabbitmqctl join_cluster rabbit@rabbit2
# => Clustering node rabbit@rabbit3 with rabbit@rabbit2 ... done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit3 ... done.

By running the cluster_status command on any node, we can see that three nodes have joined the cluster:

# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit3,rabbit@rabbit2,rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit3,rabbit@rabbit1,rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit3,rabbit@rabbit2,rabbit@rabbit1]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
# => ... done.

By following the steps above, we can add new nodes to the cluster at any time while it is running.
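
For instance, a hypothetical fourth node (the name rabbit@rabbit4 is assumed here purely for illustration) could join the running cluster with the same sequence of commands, pointing join_cluster at any online member:

# on rabbit4 (hypothetical new node)
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@rabbit1
rabbitmqctl start_app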

Restart the node

Nodes that have joined a cluster can be stopped at any time. They can also fail or be terminated by the operating system. In all cases, the rest of the cluster continues to operate, and the stopped nodes automatically "sync" with the other cluster nodes when they start up again. Note that some partition handling strategies may behave differently and affect other nodes. We shut down the nodes rabbit@rabbit1 and rabbit@rabbit3 and check the cluster status at each step:

# on rabbit1
rabbitmqctl stop
# => Stopping and halting node rabbit@rabbit1 ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit3,rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit3]}]
# => ... done.
# on rabbit3
rabbitmqctl stop
# => Stopping and halting node rabbit@rabbit3 ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit2]}]
# => ... done.

Now let’s start the nodes again and check the cluster status:

# on rabbit1
rabbitmq-server -detached
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmq-server -detached
# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
# => ... done.

It is important to understand the process nodes go through when they are stopped and restarted.

A stopping node picks a cluster member (only disc nodes are considered) to sync with after restart. On restart, the node will by default try to contact that peer 10 times, with a 30-second response timeout for each attempt. If the peer becomes available within that window, the node starts successfully, syncs what it needs from the peer, and keeps going. If the peer does not become available, the restarted node gives up and voluntarily stops.

When a node had no online peers during shutdown, it starts without attempting to sync with any known peers, and waits for peers to rejoin it.

Therefore, when an entire cluster is shut down, the last node to go down is the only one that had no running peers at the time of shutdown. That node can be started without first contacting any peers. Since nodes will try to contact a known peer for up to 5 minutes by default, nodes can be restarted in any order within that window; in that case they will rejoin each other successfully. This time window can be adjusted with two configuration settings:

# wait for 60 seconds instead of 30
mnesia_table_loading_retry_timeout = 60000

# retry 15 times instead of 10
mnesia_table_loading_retry_limit = 15

By adjusting these settings, and thus the time window in which a known peer has to come back, it is possible to account for cluster-wide redeployment scenarios that take longer than 5 minutes to complete; with the example values above, a restarting node will wait up to 15 × 60 seconds = 15 minutes for a peer. During an upgrade, sometimes the last node to stop must be the first node to start afterwards. That node will be designated to perform a cluster-wide schema migration that the other nodes can sync from and apply when they rejoin.

In some cases the last node to go offline cannot be brought back up. It can be removed from the cluster using the forget_cluster_node rabbitmqctl command.
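
A minimal sketch, assuming rabbit@rabbit3 is the node that cannot be recovered and that the command is run on a surviving cluster member:

# on a surviving node, e.g. rabbit2
rabbitmqctl forget_cluster_node rabbit@rabbit3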

Alternatively, the force_boot rabbitmqctl command can be used on a node to make it boot without trying to sync with any peers, as if it had been the last node to shut down. This is usually necessary only if the last node to shut down, or a set of nodes, will never come back online.
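
A sketch of the force_boot route, run on a node that must come up even though its known peers will not return:

# on the node that must start without waiting for peers
# (run while the node is down; the next start then skips peer sync)
rabbitmqctl force_boot
rabbitmq-server -detached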

Breaking up a cluster

Sometimes it is necessary to remove a node from a cluster. The operator has to do this explicitly using a rabbitmqctl command. Some peer discovery mechanisms support node health checks and forced removal of nodes not known to the discovery backend; that feature is opt-in (disabled by default).
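
A sketch of opting in via rabbitmq.conf, assuming a discovery backend that supports node cleanup; verify these keys against the cluster formation guide for your RabbitMQ version:

# how often (in seconds) to check for nodes unknown to the discovery backend
cluster_formation.node_cleanup.interval = 30
# remove unknown nodes instead of only logging a warning
cluster_formation.node_cleanup.only_log_warning = false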

We first remove rabbit@rabbit3 from the cluster to make it a standalone node. The steps are as follows:

# on rabbit3
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit3 ... done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit3 ... done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit3 ... done.

Running the cluster_status command on the nodes confirms that rabbit@rabbit3 is no longer part of the cluster and operates independently:

# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# =>  {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# =>  {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
# => ... done.

Nodes can also be removed remotely, which is useful, for example, when dealing with unresponsive nodes. For instance, we can remove rabbit@rabbit1 from rabbit@rabbit2.

# on rabbit1
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit1 ... done.
# on rabbit2
rabbitmqctl forget_cluster_node rabbit@rabbit1
# => Removing node rabbit@rabbit1 from cluster ...
# => ... done.

Note that rabbit1 still thinks it is clustered with rabbit2, and trying to start it will result in an error. We will need to reset it before it can be started again.

# on rabbit1
rabbitmqctl start_app
# => Starting node rabbit@rabbit1 ...
# => Error: inconsistent_cluster: Node rabbit@rabbit1 thinks it's clustered with node rabbit@rabbit2, but rabbit@rabbit2 disagrees
rabbitmqctl reset
# => Resetting node rabbit@rabbit1 ... done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit1 ...
# => ... done.

The cluster_status command now shows all three nodes running as independent RabbitMQ brokers:

# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
# => ... done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
# => ... done.
# on rabbit3
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit3 ...
# => [{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
# => ... done.

Note that rabbit@rabbit2 retains the residual state of the cluster, while rabbit@rabbit1 and rabbit@rabbit3 are freshly initialized RabbitMQ brokers. If you want to reinitialize rabbit@rabbit2, follow the same steps as for any other node:

# on rabbit2
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit2 ... done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit2 ... done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit2 ... done.

Reset the node

Sometimes it may be necessary to reset a node (wipe all of its data) and later make it rejoin the cluster. Generally speaking, there are two possible scenarios: when the node is running, and when the node cannot start or does not respond to CLI tool commands (for example, due to an issue such as ERL-430). Resetting a node deletes all of its data, cluster membership information, configured runtime parameters, users, virtual hosts, and any other node data. It also permanently removes the node from its cluster.

To reset running and responding nodes, stop RabbitMQ on them using rabbitmqctl stop_app and then reset them using rabbitmqctl reset:

# on rabbit1
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit1 ... done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit1 ... done.

Unresponsive nodes must first be stopped by any means necessary; for nodes that fail to start, this is already the case. Then either override the node’s data directory location or remove (delete or move aside) the existing data store. This will make the node start as a blank one. It must then be instructed to rejoin its original cluster, if there was one. A node that has been reset and rejoined its original cluster will sync all virtual hosts, users, permissions, topologies (queues, exchanges, bindings), runtime parameters, and policies. It may be able to sync mirrored queue contents if elected to host a replica. Non-mirrored queue contents on a reset node will be lost.
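
A minimal sketch of the unresponsive-node path, assuming a Linux package installation whose data directory lives under /var/lib/rabbitmq/mnesia (the path is an assumption; check RABBITMQ_MNESIA_DIR for your setup):

# with the node fully stopped, move the existing data store aside
mv /var/lib/rabbitmq/mnesia/rabbit@rabbit1 /var/lib/rabbitmq/mnesia/rabbit@rabbit1.bak
# the node now starts as a blank one
rabbitmq-server -detached
# instruct it to rejoin its original cluster
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rabbit2
rabbitmqctl start_app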

There is no guarantee that queue data on a reset node that has synced its schema from a peer will be recovered in a way that makes it available to clients, because the queue master location of the affected queues may have changed.

Single-machine cluster

In some cases it can be useful to run a cluster of RabbitMQ nodes on a single machine, for example to experiment with clustering on a desktop or laptop without starting multiple virtual machines. To run multiple RabbitMQ nodes on one machine, you must ensure the nodes have distinct node names, data store locations, and log file locations, and that they bind to different ports, including ports used by plugins. See RABBITMQ_NODENAME, RABBITMQ_NODE_PORT, and RABBITMQ_DIST_PORT in the Configuration guide, as well as RABBITMQ_MNESIA_DIR, RABBITMQ_CONFIG_FILE, and RABBITMQ_LOG_BASE in the File and Directory Locations guide.

You can manually start multiple nodes on the same host by repeatedly invoking rabbitmq-server (rabbitmq-server.bat on Windows). For example:

RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
rabbitmqctl -n hare stop_app
rabbitmqctl -n hare join_cluster rabbit@`hostname -s`
rabbitmqctl -n hare start_app

This sets up a two-node cluster, with both nodes as disc nodes. Note that if a node listens on any ports other than AMQP 0-9-1 and AMQP 1.0, those ports must also be configured to avoid collisions. This can be done on the command line:

RABBITMQ_NODE_PORT=5672 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]" RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached
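
To confirm that the two nodes formed a cluster, query either one by node name using the -n flag:

rabbitmqctl -n rabbit cluster_status
rabbitmqctl -n hare cluster_status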