Storm startup and operation steps

Here is a summary of the steps to create a Storm cluster:

  1. Set up a ZooKeeper cluster
  2. Install Nimbus and Worker dependencies
  3. Download a Storm release and extract it on each machine in the cluster
  4. Fill in the cluster configuration in storm.yaml
  5. Start the daemons with the “storm” script under the monitor of your choice
  6. (Optional) Set up DRPC servers

1. Set up a ZooKeeper cluster

Storm uses ZooKeeper to coordinate the cluster. ZooKeeper is not used for message passing, so the load Storm places on it is quite low. In most cases a single-node ZooKeeper deployment is sufficient, but if you need a more reliable failover mechanism or are deploying a large-scale Storm cluster, you are better off running a ZooKeeper cluster. For details on configuring a ZooKeeper cluster, see the ZooKeeper administration documentation.

Some notes about ZooKeeper deployment:

  1. ZooKeeper must run under supervision, because ZooKeeper is a fail-fast system: it exits whenever it hits an error. See the ZooKeeper documentation for more details.
  2. It is important to set up a cron job to periodically compact ZooKeeper's snapshots and transaction logs. ZooKeeper does not purge these by default, so without such a job the logs can quickly fill up disk space. See the ZooKeeper documentation for more details; a minimal setup is sketched below.
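As one possible way to set this up (a sketch only; the retention count, schedule, and install path are assumptions, taken from the practice section below and to be adapted to your environment), you can either enable ZooKeeper's built-in autopurge settings in zoo.cfg or add a cron entry that calls the zkCleanup.sh script shipped in ZooKeeper's bin directory:

# zoo.cfg: let ZooKeeper purge old snapshots and transaction logs itself
autopurge.snapRetainCount=3
autopurge.purgeInterval=24

# or a cron entry (crontab -e) that keeps the 3 most recent snapshots, daily at 03:00
0 3 * * * /data/zookeeper/apache-zookeeper-3.7.0-bin/bin/zkCleanup.sh -n 3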

Practice:

  1. Configure a single ZooKeeper service

    1. Download ZooKeeper, create a ZooKeeper folder, and extract the archive.
    tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz
    2. Create and configure the myid file.
    cd apache-zookeeper-3.7.0-bin
    mkdir data
    cd data
    touch myid      # write 1 into the myid file
    3. Modify the ZooKeeper configuration file, for example:
    cd ../conf
    cp zoo_sample.cfg zoo.cfg
    vi zoo.cfg      # modified as follows:
    dataDir=/data/zookeeper/apache-zookeeper-3.7.0-bin/data
    server.1=192.168.52.106:2888:3888
    server.2=192.168.52.100:2888:3888
    server.3=192.168.52.105:2888:3888
    # The port at which the clients will connect
    clientPort=2186
  2. Configure the ZooKeeper cluster by copying the ZooKeeper directory to the other two servers. Note: on each server, change the myid value to the x of its corresponding server.x entry (a copy sketch follows the validation step below).

  3. Start the ZooKeeper service.

# Go to the ZooKeeper bin folder
cd /data/zookeeper/apache-zookeeper-3.7.0-bin/bin
./zkServer.sh start
  4. Validate the cluster.
./zkServer.sh status    # each node should report its mode (leader or follower)
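As referenced in step 2 above, copying the installation to the other two nodes and setting their myid values can be as simple as the following sketch (the IP addresses are the server.1/2/3 hosts from zoo.cfg above; adjust users and paths as needed):

# on 192.168.52.106 (server.1)
scp -r /data/zookeeper/apache-zookeeper-3.7.0-bin 192.168.52.100:/data/zookeeper/
scp -r /data/zookeeper/apache-zookeeper-3.7.0-bin 192.168.52.105:/data/zookeeper/

# on 192.168.52.100 (server.2)
echo 2 > /data/zookeeper/apache-zookeeper-3.7.0-bin/data/myid

# on 192.168.52.105 (server.3)
echo 3 > /data/zookeeper/apache-zookeeper-3.7.0-bin/data/myid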

2. Install Nimbus and Worker dependencies

Next you need to install the required dependencies on Nimbus and Worker machines, including:

  1. Java 8+ (Apache Storm 2.x is tested through Travis CI against a Java 8 JDK)
  2. Python 2.7.x or Python 3.x

These are the dependency versions that Storm has been tested with; Storm may not work with other versions of Java and/or Python.

Practice:

# Configure the Java directory
mkdir -p /data/java
tar -zxvf jdk-8u301-linux-x64.tar.gz
vi /etc/profile
# Add:
export JAVA_HOME=/data/java/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
java -version       # if the corresponding version is displayed, it succeeded

# Configure the Python directory
mkdir -p /data/python
tar -zxvf Python-2.7.16.tgz
cd Python-2.7.16
./configure --prefix=/data/python/python2
make
make install
rm -f /usr/bin/python                                     # delete the original link
ln -s /data/python/python2/bin/python /usr/bin/python     # create a new link
python --version    # if the corresponding version is displayed, it succeeded

3. Download the Storm release and extract it on each machine in the cluster

Next, download a Storm release and extract it on each machine. All Storm releases can be downloaded from the Apache Storm download page.

Practice:

mkdir -p /data/storm
tar -zxvf apache-storm-2.2.0.tar.gz

4. Configure cluster information in storm.yaml

Storm reads its configuration from the conf/storm.yaml file. You can see the default values in defaults.yaml; any setting you put in storm.yaml overrides the corresponding default. For the cluster to work properly, the following settings are required:

  1. storm.zookeeper.servers: this is the address list of the ZooKeeper cluster used by your Storm cluster. The configuration is as follows:
storm.zookeeper.servers:
    - "111.222.333.444"
    - "555.666.777.888"

If the port used by your Zookeeper cluster is not the default port, you should set the storm.zookeeper.port parameter accordingly.
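For example, the ZooKeeper practice section above used clientPort=2186 instead of the default 2181, so the matching Storm setting would be:

storm.zookeeper.port: 2186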

  2. storm.local.dir: the Nimbus and Supervisor daemons need a local directory to store small amounts of state (jars, confs, and the like). This directory should be created on each machine, given the appropriate permissions, and written into the configuration file, as shown in the following example:
storm.local.dir: "/mnt/storm"

If Storm is running on Windows, the following is an example:

storm.local.dir: "C:\storm-local"

If you use a relative path, it is interpreted relative to the Storm installation root ($STORM_HOME). You can also leave the setting empty to use the default $STORM_HOME/storm-local.

  3. nimbus.seeds: the Worker nodes need to know which machines in the cluster are the master candidates in order to download topology jars and configuration files, as shown in the following example:
nimbus.seeds: ["111.222.333.44"]

You are encouraged to fill in this value with a list of machine FQDNs. If you want to set up Nimbus H/A, you have to list the FQDN of every machine that runs Nimbus. You may want to leave it at the default value when you just want to set up a ‘pseudo-distributed’ cluster, but you are still encouraged to use FQDNs.
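For example, with Nimbus H/A you would list the FQDN of every machine that runs Nimbus (the host names below are hypothetical placeholders):

nimbus.seeds: ["nimbus1.example.com", "nimbus2.example.com"]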

  4. supervisor.slots.ports: for each worker machine (Supervisor), this setting controls how many workers can run on it. Each worker uses one port to receive messages, and this item lists the ports that are open for use. If you define five ports, Storm can assign up to five workers to the machine; if you define three, Storm will run at most three. By default, this setting opens four workers on ports 6700, 6701, 6702, and 6703, as shown in the following example:
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
  5. drpc.servers: if you want to set up DRPC servers, specify their addresses so that the workers can find them. This should be a list of DRPC servers. The following is an example:
drpc.servers: ["111.222.333.44"]

Practice:

cd /data/storm/apache-storm-2.2.0/conf
vi storm.yaml

5. Monitor the health of the Supervisors

Storm provides a mechanism by which administrators can configure the Supervisor to periodically run administrator-supplied scripts to determine whether a node is healthy. Administrators can have the Supervisor perform any checks they need via scripts in the storm.health.check.dir directory. If a script determines that the node is unhealthy, it should exit with a non-zero status (in versions prior to Storm 2.x there was a bug in how a failing script's exit code was handled; this has since been fixed). The Supervisor periodically runs the scripts in the health check directory and inspects their output: if a script's output contains the string ERROR, the Supervisor shuts down any workers and exits. A sample script is sketched after the configuration examples below.

If the Supervisor runs under supervision, you can call bin/storm node-health-check to determine whether the Supervisor should be launched or whether the node is unhealthy.

The following is an example of the health check directory configuration:

storm.health.check.dir: "healthchecks"

The scripts must have execute permission. The following setting controls how long a health check script is allowed to run before it is considered failed due to timeout:

storm.health.check.timeout.ms: 5000
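As a concrete illustration, the script referenced above could look like the following sketch (the file name, mount point, and 95% threshold are arbitrary assumptions); place it in the configured healthchecks directory and make it executable with chmod +x:

#!/bin/bash
# healthchecks/check-disk.sh (hypothetical example)
# Print a line beginning with ERROR when the node is unhealthy.
usage=$(df --output=pcent / | tail -1 | tr -dc '0-9')
if [ "$usage" -gt 95 ]; then
    echo "ERROR: root filesystem is ${usage}% full"
fi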

6. Configure external libraries and environment variables (optional)

If you need to use external libraries or custom plugins, you can put the jar packages in the extlib/ and extlib-daemon/ directories. Note that extlib-daemon/ only holds jars needed by the daemons themselves (Nimbus, Supervisor, DRPC, UI, Logviewer), for example HDFS support or custom scheduling libraries. In addition, the two environment variables STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON can be used to set the classpath of normal external libraries and of "daemon-only" external libraries, respectively. See Classpath handling for more details.
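For example, a sketch of pointing Storm at jars that live outside the installation tree (the directory paths are hypothetical):

# jars visible to workers and daemons alike
export STORM_EXT_CLASSPATH=/data/storm/ext-jars/*

# jars visible only to the daemons (Nimbus, Supervisor, DRPC, UI, Logviewer)
export STORM_EXT_CLASSPATH_DAEMON=/data/storm/ext-daemon-jars/*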

7. Start the daemons with the “storm” script under the monitor of your choice

The last step is to start all the Storm daemons. It is important to run each of them under a monitor. Storm is a fail-fast system, which means a process stops whenever an unexpected error occurs. Storm is designed so that a daemon can safely halt at any point and recover correctly once restarted; this is why Storm keeps no state in-process, and why a running topology is unaffected if Nimbus or a Supervisor is restarted. Here is how to start the Storm daemons:

  1. Nimbus: run the bin/storm nimbus command on the master machine under the monitor.
  2. Supervisor: run the bin/storm supervisor command on each worker machine under the monitor. The Supervisor daemon is responsible for starting and stopping worker processes on that machine.
  3. UI: run the bin/storm ui command under the monitor to start the Storm UI (accessible from a browser at http://{ui host}:8080, it provides diagnostic information about the cluster and topologies).

As you can see, starting the daemons is very simple. Each daemon writes its logs to the logs/ directory under the Storm installation.
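For example, if you use supervisord as the monitoring program, a minimal program definition for the Supervisor daemon could look like the following sketch (paths and log locations are assumptions based on the install directory used above):

[program:storm-supervisor]
command=/data/storm/apache-storm-2.2.0/bin/storm supervisor
directory=/data/storm/apache-storm-2.2.0
autostart=true
autorestart=true
stdout_logfile=/var/log/storm/supervisor.out
stderr_logfile=/var/log/storm/supervisor.err

Similar entries can be added for the nimbus and ui commands on the machines that run them.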

8. Set up DRPC servers (Optional)

Just like with Nimbus or the Supervisors, you will need to launch the DRPC server. To do this, run the command bin/storm drpc on each of the machines you listed in the drpc.servers config.

DRPC HTTP setup

DRPC optionally offers a REST API as well. To enable it, set the config drpc.http.port to the port you want it to run on before launching the DRPC server. See the REST documentation for more information on how to use it.
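For example (the port number is only an illustration; pick any free port on the DRPC machines):

drpc.http.port: 3774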

It also supports SSL by setting drpc.https.port along with the keystore and optional truststore similar to how you would configure the UI.

This blog is just a beginner's self-study record; my understanding is shallow, so if anything here is wrong, please kindly point it out.

References

Apache Storm documentation: Setting up a Storm Cluster