1. Description of Storm nodes

1.1 Master node and worker nodes

Storm divides the nodes in a cluster into two roles: the master node and the worker nodes. A cluster has exactly one master node and any number of worker nodes.

1.2 Nimbus

The master node runs the Nimbus daemon, which plays a role similar to the JobTracker in Hadoop: it distributes code across the cluster, assigns tasks to nodes, and monitors hosts for failures.

1.3 Supervisor

Each worker node runs the Supervisor daemon, which listens for the work assigned to its host and starts and stops Worker processes as directed by Nimbus. The Supervisor periodically reads topology information, task assignments, and heartbeat data from ZooKeeper and acts on them: during each synchronization it starts new Workers or shuts down old ones according to the latest assignment, keeping the load balanced.
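Nimbus and the Supervisors coordinate through this shared state in ZooKeeper, so the assignment and heartbeat data can be inspected directly with a plain ZooKeeper client. The sketch below only lists the relevant znodes; the paths assume the default storm.zookeeper.root of /storm, and the connection string localhost:2181 is a placeholder, neither of which comes from this article.

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch: peek at the znodes Nimbus and the Supervisors coordinate through.
// Assumes the default storm.zookeeper.root ("/storm") and a local ZooKeeper ensemble.
public class StormZkPeek {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        // One child per running topology; each holds that topology's task assignment.
        List<String> assignments = zk.getChildren("/storm/assignments", false);
        System.out.println("assignments: " + assignments);

        // One child per live Supervisor, written as part of its heartbeat.
        List<String> supervisors = zk.getChildren("/storm/supervisors", false);
        System.out.println("supervisors: " + supervisors);

        zk.close();
    }
}
```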

1.4 Worker

A Worker is a process that runs the Spout/Bolt logic. The number of Workers allocated to a topology is set when the topology is submitted, for example with conf.setNumWorkers(3), and Storm distributes the topology's tasks evenly across its Workers. A Worker belongs to exactly one topology, but it can run multiple task threads.
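As a rough sketch of where that setting lives (the SentenceSpout and SplitBolt classes and the topology name are hypothetical placeholders, and the org.apache.storm package names assume Storm 1.x or later):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Hypothetical components; a real topology needs at least one spout.
        builder.setSpout("sentences", new SentenceSpout());
        builder.setBolt("split", new SplitBolt()).shuffleGrouping("sentences");

        Config conf = new Config();
        // Run this topology in 3 Worker processes (JVMs); Storm spreads its tasks evenly across them.
        conf.setNumWorkers(3);

        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```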

1.5 Task

A Task is an instance of a Spout or Bolt running inside a Worker. Each Spout and Bolt runs as many tasks spread across the cluster, and each task is executed within an executor thread (by default, one task per thread). The parallelism of a Spout or Bolt is set through the parallelism hint passed to the setSpout() and setBolt() methods of the TopologyBuilder class.
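A compilable sketch of those parallelism hints is below; the WordSpout and CountBolt classes are trivial placeholders written only for illustration, and the package names again assume Storm 1.x or later.

```java
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ParallelismExample {

    // Placeholder spout: endlessly emits the same word.
    public static class WordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector out) { this.collector = out; }
        public void nextTuple() { collector.emit(new Values("storm")); }
        public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("word")); }
    }

    // Placeholder bolt: acknowledges every tuple it receives.
    public static class CountBolt extends BaseRichBolt {
        private OutputCollector collector;
        public void prepare(Map conf, TopologyContext ctx, OutputCollector out) { this.collector = out; }
        public void execute(Tuple tuple) { collector.ack(tuple); }
        public void declareOutputFields(OutputFieldsDeclarer d) { }
    }

    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        // 2 executor threads for the spout, 4 tasks in total: 2 tasks share each thread.
        builder.setSpout("words", new WordSpout(), 2).setNumTasks(4);
        // 3 executor threads for the bolt, one task per thread by default.
        builder.setBolt("count", new CountBolt(), 3).fieldsGrouping("words", new Fields("word"));
        return builder;
    }
}
```

The parallelism hint controls the number of executor threads, while setNumTasks fixes the number of component instances, so the two can be tuned independently (for example, leaving room to rebalance onto more executors later).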

1.6 Reference Materials

How does Storm ensure at-least-once semantics? How does Storm allocate tasks and balance load?

2. Storm's fault tolerance mechanism

2.1 Worker Process Dies

When only the Worker process dies, the Supervisor on that host tries to restart it. If the restarts keep failing and the number of failures exceeds a certain threshold, Nimbus restarts the Worker on another host.

When the Supervisor dies, a Worker that dies on that host cannot be restarted locally, because there is no Supervisor to restart it; instead, the Worker is restarted on other hosts. Once the Supervisor comes back, the Workers on its host are restored. For example, suppose node2 and node3 each run three Workers. If node2's Supervisor dies and one of node2's Workers is killed, node3 ends up with four Workers. After node2's Supervisor is restarted, node2 starts one Worker and returns to three, and node3 kills the extra Worker and also returns to three.

When Nimbus dies, the existing Workers keep running, but a Worker that dies is not moved to another host the way it is when a Supervisor dies. Therefore, if all the Workers die while Nimbus is down, the job fails.

Workers are distributed evenly across the nodes of the cluster. For example, if a job needs three Workers, two may be placed on one node (say node2) and one on another (say node3). When a second job that also needs three Workers is started, it gets one Worker on node2 and two on node3.

2.2 Nimbus or Supervisor Process Dies

Nimbus and the Supervisors are designed to be fail-fast and stateless; their state is kept in ZooKeeper or on local disk. If either process dies, it is not restarted automatically the way a Worker is, but the jobs already running on the cluster keep running in their Workers, and when the daemon is restarted it resumes as if nothing had happened.

2.3 ZooKeeper Stops

Stopping ZooKeeper likewise does not affect jobs that are already running; in this case, if a Worker is killed, it is restarted on the same machine after some time. To sum up, jobs on the cluster are affected only when Nimbus has failed and all of a job's Workers have failed as well; otherwise, Storm's fault tolerance mechanisms keep jobs running reliably.