1. Introduction

Before a Storm topology can be submitted to run on a server cluster, the project must be packaged. This article compares the common packaging methods and explains what to watch out for during packaging. There are three main methods:

  • Use mvn package without any plug-ins;
  • Use the maven-assembly-plugin;
  • Use the maven-shade-plugin.

Each method is explained in detail below.

2. mvn package

2.1 Limitations of mvn package

If no plug-ins are configured in the POM and the project is packaged directly with mvn package, this works for projects that have no external dependencies.

However, if the project uses a third-party JAR, there is a problem: the JAR produced by mvn package does not contain the dependencies. If you submit it to the server as it is, it fails with an exception saying the third-party dependency cannot be found.
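One way to confirm this is to look inside the JAR that mvn package produced. The minimal sketch below uses only the JDK's java.util.jar API; the class name JarContentCheck, the JAR path, and the class you pass in (for example redis.clients.jedis.Jedis from the Jedis client) are assumptions for illustration.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks whether a built JAR actually bundles a given class. */
public class JarContentCheck {

    /** Converts a binary class name such as "a.b.C" into its JAR entry path "a/b/C.class". */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the class file is present in the JAR at jarPath. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // verify=false: only the entry table is read, so no signature checking is needed
        try (JarFile jar = new JarFile(jarPath, false)) {
            return jar.getEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical paths): java JarContentCheck target/my-topology.jar redis.clients.jedis.Jedis
        String jarPath = args[0];
        String className = args[1];
        boolean found = containsClass(jarPath, className);
        System.out.println(className + (found ? " is bundled in " : " is NOT bundled in ") + jarPath);
    }
}
```

Pointed at a JAR built by plain mvn package, a check for a third-party class should report it as not bundled, which is why the topology fails once it reaches the cluster.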

Is there a way to package like this and still use third-party JARs? Yes. The Command Line Client section of the official documentation describes the following solutions.

2.2 Solutions

When submitting a topology with the storm jar command, you can specify third-party dependencies as follows:

  • If the third-party JARs are available locally, specify them with --jars;
  • If the third-party JARs are in the remote central repository, specify them with --artifacts; to exclude certain dependencies, use the ^ symbol. Storm downloads them from the central repository automatically and caches them locally;
  • If the third-party JARs are in another repository, you must also specify that repository with --artifactRepositories; the repository name and its address are separated by ^.

Here is an example command that covers all three cases:

./bin/storm jar example/storm-starter/storm-starter-topologies-*.jar \
  org.apache.storm.starter.RollingTopWords blobstore-remote2 remote \
  --jars "./external/storm-redis/storm-redis-1.1.0.jar,./external/storm-kafka/storm-kafka-1.1.0.jar" \
  --artifacts "redis.clients:jedis:2.9.0,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12" \
  --artifactRepositories "jboss-repository^http://repository.jboss.com/maven2,HDPRepo^http://repo.hortonworks.com/content/groups/public/"
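To make the ^ syntax in this command easier to read, the sketch below splits the two option values into their parts, following the format used in the example: comma-separated entries, where ^ separates an artifact from the dependencies to exclude, or a repository name from its URL. It is an illustration of the string format only, not Storm's own parsing code; the class StormDependencyArgs and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the format of --artifacts and --artifactRepositories. */
public class StormDependencyArgs {

    /** One --artifacts entry: "group:artifact:version" plus any "^group:artifact" exclusions. */
    static final class Artifact {
        final String coordinates;
        final List<String> exclusions;

        Artifact(String coordinates, List<String> exclusions) {
            this.coordinates = coordinates;
            this.exclusions = exclusions;
        }

        @Override
        public String toString() {
            return coordinates + " excluding " + exclusions;
        }
    }

    /** Splits an --artifacts value such as "a:b:1.0,c:d:2.0^e:f". */
    static List<Artifact> parseArtifacts(String value) {
        List<Artifact> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split("\\^");
            // everything after the first ^-separated part is an exclusion
            List<String> exclusions = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            result.add(new Artifact(parts[0], exclusions));
        }
        return result;
    }

    /** Splits an --artifactRepositories value such as "name^url,name2^url2" into name -> URL. */
    static Map<String, String> parseRepositories(String value) {
        Map<String, String> repos = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split("\\^", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected name^url but got: " + entry);
            }
            repos.put(parts[0], parts[1]);
        }
        return repos;
    }

    public static void main(String[] args) {
        parseArtifacts("redis.clients:jedis:2.9.0,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12")
                .forEach(System.out::println);
        parseRepositories("jboss-repository^http://repository.jboss.com/maven2,"
                + "HDPRepo^http://repo.hortonworks.com/content/groups/public/")
                .forEach((name, url) -> System.out.println(name + " -> " + url));
    }
}
```

Read this way, the example asks Storm for jedis 2.9.0 and kafka_2.10 0.8.2.2, drops slf4j-log4j12 from Kafka's dependencies, and searches two extra repositories named jboss-repository and HDPRepo.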

If your servers have no Internet access, or you would rather package the project as a single all-in-one JAR containing every dependency, you can use one of the two plug-ins described below.

3. maven-assembly-plugin

The maven-assembly-plugin is the packaging method described in the official documentation, Running Topologies on a Production Cluster:

If you’re using Maven, the Maven Assembly Plugin can do the packaging for you. Just add this to your pom.xml:

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.path.to.main.Class</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>

Then run mvn assembly:assembly to get an appropriately packaged jar. Make sure you exclude the Storm jars since the cluster already has Storm on the classpath.

The official documentation makes the following points:

  • With the maven-assembly-plugin, all dependencies can be bundled into the final JAR in one step;
  • The Storm JARs already provided by the Storm cluster environment must be excluded;
  • The main entry class is specified with the <mainClass> tag;
  • The packaging configuration is specified with the <descriptorRef> tag.

jar-with-dependencies is a basic packaging descriptor predefined by the maven-assembly-plugin. Its XML content is as follows:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

This descriptor can be extended to do more, such as excluding specific JARs. For example:

1. Add the plug-in

Add the plug-in to pom.xml and point it at a custom descriptor file, assembly.xml (the name is up to you):

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptors>
                    <descriptor>src/main/resources/assembly.xml</descriptor>
                </descriptors>
                <archive>
                    <manifest>
                        <mainClass>com.heibaiying.wordcount.ClusterWordCountApp</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>

assembly.xml extends jar-with-dependencies.xml and excludes the Storm JARs with the <excludes> tag:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    
    <id>jar-with-dependencies</id>

    <!-- Specify the packaging format -->
    <formats>
        <format>jar</format>
    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <!-- exclude storm-core, which the Storm environment already provides -->
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </dependencySet>
    </dependencySets>
</assembly>

Besides dependencies, the descriptor can also exclude specific files. For the full set of configuration rules, see the official Descriptor Format documentation.

2. Packaging command

To package with the maven-assembly-plugin, run the following command:

# mvn assembly:assembly 

Packaging produces two JARs. The one whose name ends in jar-with-dependencies contains the third-party dependencies; this suffix comes from the <id> tag in assembly.xml. That JAR can be submitted to the cluster and run directly.

4. maven-shade-plugin

4.1 Official documentation

The third method uses the maven-shade-plugin. Why is it needed when the maven-assembly-plugin is already available? The official Storm HDFS Integration documentation explains:

When packaging your topology, it’s important that you use the maven-shade-plugin as opposed to the maven-assembly-plugin.

The shade plugin provides facilities for merging JAR manifest entries, which the hadoop client leverages for URL scheme resolution.

If you experience errors such as the following:

java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs

it’s an indication that your topology jar file isn’t packaged properly.

If you are using maven to create your topology jar, you should use the following maven-shade-plugin configuration to create your topology jar.

In other words, when integrating with HDFS you must use the maven-shade-plugin instead of the maven-assembly-plugin; otherwise the RuntimeException shown above is thrown.

Packaging with the maven-shade-plugin has other advantages as well. Suppose your project depends on many JARs, which in turn depend on other JARs. When several of these JARs (possibly different versions of the same library) contain resource files with the same name, the shade plugin can merge those files through its resource transformers, whereas the assembly plugin simply lets one copy overwrite the others.
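The HDFS failure quoted above is a concrete case of this. hadoop-common and the HDFS JAR each ship a file named META-INF/services/org.apache.hadoop.fs.FileSystem, which Hadoop reads through java.util.ServiceLoader to map a URL scheme such as hdfs to a FileSystem class. If only one copy survives in the fat JAR, the hdfs entry can be lost, which surfaces as "No FileSystem for scheme: hdfs"; the shade plugin's ServicesResourceTransformer concatenates the copies instead. The sketch below models the two strategies in memory. It is a simplified illustration, not either plugin's real code: each "JAR" is just a map from resource path to lines, the class name is hypothetical, and the provider lists are shortened.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model (not plugin code) of overwriting vs. merging duplicate service files. */
public class ServiceFileMergeDemo {

    /** Overwrite strategy: for each resource path, a later JAR's copy replaces an earlier one. */
    static Map<String, List<String>> overwrite(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    /** Merge strategy: concatenate provider lines from every copy, dropping duplicates. */
    static Map<String, List<String>> merge(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> acc = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> e : jar.entrySet()) {
                acc.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>()).addAll(e.getValue());
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        acc.forEach((path, lines) -> out.put(path, new ArrayList<>(lines)));
        return out;
    }

    public static void main(String[] args) {
        String svc = "META-INF/services/org.apache.hadoop.fs.FileSystem";
        // Shortened, illustrative contents of the two service files
        Map<String, List<String>> hdfs = Map.of(svc, List.of("org.apache.hadoop.hdfs.DistributedFileSystem"));
        Map<String, List<String>> common = Map.of(svc, List.of("org.apache.hadoop.fs.LocalFileSystem"));
        List<Map<String, List<String>>> jars = List.of(hdfs, common);
        System.out.println("overwrite: " + overwrite(jars).get(svc));
        System.out.println("merge:     " + merge(jars).get(svc));
    }
}
```

With overwriting, only the LocalFileSystem line remains and the hdfs scheme has no provider; with merging, both providers stay registered.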

4.2 Configuration

A sample maven-shade-plugin configuration is shown below:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.sf</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.dsa</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>META-INF/*.rsa</exclude>
                    <exclude>META-INF/*.EC</exclude>
                    <exclude>META-INF/*.ec</exclude>
                    <exclude>META-INF/MSFTSIG.SF</exclude>
                    <exclude>META-INF/MSFTSIG.RSA</exclude>
                </excludes>
            </filter>
        </filters>
        <artifactSet>
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </artifactSet>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

The configuration above is taken from Storm's GitHub repository. Some notes on it:

Some files are excluded in this configuration because certain JARs are signed with jarsigner (for integrity verification), which produces two kinds of files in the META-INF directory:

  • A signature file, with a .SF extension;
  • A signature block file, with a .DSA, .RSA, or .EC extension.

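When such signed packages are merged (sometimes more than once) into a single fat JAR, their signature files no longer match the repackaged contents, which can cause an Invalid signature file digest for Manifest main attributes exception when the resulting JAR is used. That is why the configuration excludes these files.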
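If you are unsure whether a built JAR still contains signature files, you can list them. The sketch below mirrors the exclude patterns in the filter above (top-level META-INF files ending in .SF, .DSA, .RSA, or .EC, in any case) using the JDK's java.util.jar API; the class name SignatureFileScanner and the JAR path in the usage comment are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists jarsigner signature files left in a JAR. */
public class SignatureFileScanner {

    /** True for top-level META-INF files ending in .SF, .DSA, .RSA or .EC (case-insensitive). */
    static boolean isSignatureFile(String entryName) {
        if (!entryName.startsWith("META-INF/")) {
            return false;
        }
        String rest = entryName.substring("META-INF/".length());
        if (rest.contains("/")) {
            return false; // like the META-INF/*.SF pattern, do not descend into subdirectories
        }
        String lower = rest.toLowerCase(Locale.ROOT);
        return lower.endsWith(".sf") || lower.endsWith(".dsa")
                || lower.endsWith(".rsa") || lower.endsWith(".ec");
    }

    /** Returns the entries in the JAR that the shade filter above would strip. */
    static List<String> findSignatureFiles(String jarPath) throws IOException {
        List<String> hits = new ArrayList<>();
        // verify=false so that a JAR with a broken signature can still be inspected
        try (JarFile jar = new JarFile(jarPath, false)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isSignatureFile(name)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical path): java SignatureFileScanner target/my-topology.jar
        for (String name : findSignatureFiles(args[0])) {
            System.out.println(name);
        }
    }
}
```

An empty result for the shaded JAR suggests the filter did its job; any hits point to signed dependencies whose signature files slipped through.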

4.3 Packaging command

When packaging with the maven-shade-plugin, the command is the usual one:

# mvn package

Packaging produces two JARs. Submit the one whose name does not start with original- to the server cluster.
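The original- prefixed file is the unshaded JAR that the shade plugin keeps alongside the shaded one. If you script your deployments, the rule can be sketched as below; the class name ShadedJarFinder and the default directory name target (Maven's usual output directory) are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the shaded JAR(s) in a Maven output directory. */
public class ShadedJarFinder {

    /** Returns the JARs in dir whose file name does not start with "original-". */
    static List<Path> shadedJars(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                if (!jar.getFileName().toString().startsWith("original-")) {
                    result.add(jar);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumes Maven's default output directory "target" unless a directory is given
        Path dir = Paths.get(args.length > 0 ? args[0] : "target");
        shadedJars(dir).forEach(System.out::println);
    }
}
```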

5. Conclusion

Having walked through all three packaging methods, the conclusion is: use the maven-shade-plugin, because it is the most versatile and the easiest to work with, and all the examples in Storm's GitHub repository are packaged this way.

6. Packaging notes

Whichever packaging method you use, you must exclude the Storm JARs already provided by the cluster environment. The typical example is storm-core, which already exists in the lib directory of the Storm installation.

If storm-core is not excluded, the following exception is usually thrown:

Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources.
You're probably bundling the Storm jars with your topology jar.
[jar:file:/usr/app/apache-storm-1.2.2/lib/storm-core-1.2.2.jar!/defaults.yaml,
jar:file:/usr/appjar/storm-hdfs-integration-1.0.jar!/defaults.yaml]
        at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:384)
        at org.apache.storm.utils.Utils.readDefaultConfig(Utils.java:428)
        at org.apache.storm.utils.Utils.readStormConfig(Utils.java:464)
        at org.apache.storm.utils.Utils.<clinit>(Utils.java:178)
        ... 39 more
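As the trace shows, Storm looks up defaults.yaml on the classpath and refuses to continue when it finds more than one copy: one in the cluster's storm-core JAR and one bundled into the topology JAR. Before submitting, you can reproduce that lookup against specific JARs. The sketch below is not Storm's code; it asks a URLClassLoader (with no parent, so only the listed JARs are searched) for every copy of a resource. The class name DuplicateResourceCheck and the paths in the usage comment are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: finds every copy of a resource across a set of JARs. */
public class DuplicateResourceCheck {

    /** Returns the URL of each copy of the named resource found in the given JARs. */
    static List<URL> findResource(String name, List<Path> jars) throws IOException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        // Parent = null: search only the listed JARs, not the application classpath
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return new ArrayList<>(Collections.list(loader.getResources(name)));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical paths):
        // java DuplicateResourceCheck /usr/app/apache-storm-1.2.2/lib/storm-core-1.2.2.jar target/my-topology.jar
        List<Path> jars = new ArrayList<>();
        for (String arg : args) {
            jars.add(Paths.get(arg));
        }
        List<URL> hits = findResource("defaults.yaml", jars);
        hits.forEach(System.out::println);
        if (hits.size() > 1) {
            System.out.println("defaults.yaml found " + hits.size() + " times: storm-core is probably bundled.");
        }
    }
}
```

Passing the cluster's storm-core JAR together with your topology JAR should report a single copy once storm-core has been excluded.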

References

For more on configuring the maven-shade-plugin, see the maven-shade-plugin Getting Started guide.