Just reading source code is dry. To single-step through the Hadoop source code in a debugger, it is best to compile the source in the deployment environment, which avoids all sorts of environment problems.

This post records the process of compiling the Hadoop source code on the monkey's own Mac. Combined with a previous compilation, it covers most of the problems you are likely to run into when compiling Hadoop.

As debugging goes deeper it may also involve modifying the source code, which means recompiling more than once, so there is no getting around this step.

Versions

  • Source: Apache Hadoop 2.6.0
  • System: macOS 10.12.4
  • Dependencies:
    • Oracle JDK 1.7.0_79
    • Apache Maven 3.5.0
    • libprotoc 2.5.0

Compile

Core commands

mvn install -DskipTests -Pdist,native -Dtar

Pitfalls

Long compile time

Hadoop has a large amount of source code, many dependencies and a long compilation time.

Downloading JAR packages and compiling protoc are the two biggest time sinks. Last time it took about an hour to compile protoc and more than two hours to download JARs and compile Hadoop; even this time, with most of that already done, it still took about an hour to compile successfully.

Fortunately, in order to read the Yarn state machine, the monkey already compiled once in the first half of the year. That compile never fully finished, but it downloaded most of the dependent JAR packages and compiled and installed protoc (compiling and installing it is strongly recommended; the details of what went wrong back then are forgotten). This time it was just a matter of picking up where that left off.

That said, most people end up compiling several times before their first successful build, so the duration of a single compile doesn't matter much. Have a cup of tea and take your time.

JDK version

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hadoop-annotations: Compilation failure: Compilation failure:
[ERROR] /Volumes/Extended/Users/msh/IdealProjects/Git-Study/hadoop/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsJDiffDoclet.java:[20,22] error: package com.sun.javadoc does not exist

It wasn't obvious at first why this package didn't exist; it looked like a JDK version problem (the error shows up when compiling hadoop-annotations, which uses com.sun.javadoc, on Mac OS).

Validation

Search every pom.xml for where JDK 1.7 is specified:

find . -name pom.xml > tmp/tmp.txt

# for each pom.xml, print the file name once, then the matching lines (with 2 lines of context)
while read file
do
    cnt=0
    grep '1.7' $file -C2 | while read line; do
        if [ -n "$line" ]; then
            if [ $cnt -eq 0 ]; then
                echo "+++file: $file"
            fi
            cnt=$((cnt+1))
            echo "$line"
        fi
    done
    cnt=0
done < tmp/tmp.txt

Output:

+++file: ./hadoop-common-project/hadoop-annotations/pom.xml
    </profile>
    <profile>
      <id>jdk1.7</id>
      <activation>
--
      <activation>
        <jdk>1.7</jdk>
      </activation>
      <dependencies>
--
--
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <version>1.7</version>
        <scope>system</scope>
        <systemPath>${java.home}/../lib/tools.jar</systemPath>
+++file: ./hadoop-project/pom.xml
... (abbreviated)

The key one is ./hadoop-common-project/hadoop-annotations/pom.xml.

Solution

The default JDK on the monkey's Mac is Oracle JDK 1.8.0_102. Note that the root cause is not that com.sun.javadoc is missing from the JDK itself: the jdk1.7 profile in the pom is what puts tools.jar (which provides that package) on the build path, and it only activates under JDK 1.7.

You could try changing the JDK version restriction in the pom. However, to avoid other problems (such as methods deprecated between versions), the monkey switched directly to JDK 1.7 and left the pom unchanged.
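To switch, point JAVA_HOME at the 1.7 JDK before running Maven. A minimal sketch, assuming the Oracle JDK 1.7.0_79 from the versions list above is installed:

# macOS ships /usr/libexec/java_home for locating installed JDKs
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
java -version   # should now report 1.7.0_79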

OpenSSL environment variables

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec dir="/Volumes/Extended/Users/msh/IdealProjects/Git-Study/hadoop/hadoop-tools/hadoop-pipes/target/native" executable="cmake" failonerror="true">... @ 5:153 in /Volumes/Extended/Users/msh/IdealProjects/Git-Study/hadoop/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml

After switching to JDK 1.7, the brew-installed Ant could not parse the class files, so Ant was reinstalled via brew and then worked normally again.

Even so, the Ant BuildException here is misleading: the build is not failing inside Ant itself, but in the command Ant executes.

Looking at build-main.xml, the failing <exec> runs cmake, so running the same cmake command by hand exposes the real problem:

cmake /Volumes/Extended/Users/msh/IdealProjects/Git-Study/hadoop/hadoop-tools/hadoop-pipes/src/ -DJVM_ARCH_DATA_MODEL=64

Output:

... (abbreviated)
-- Check for working CXX compiler: .../CommandLineTools/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at /usr/local/Cellar/cmake/3.6.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.6.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.6.2/share/cmake/Modules/FindOpenSSL.cmake:380 (find_package_handle_standard_args)
  CMakeLists.txt:20 (find_package)
... (abbreviated)

cmake cannot find OpenSSL because OPENSSL_ROOT_DIR and OPENSSL_INCLUDE_DIR are not set; echoing both variables confirms they are empty.

Solution

The Mac ships with OpenSSL, but the monkey has no idea which directory counts as the root and which holds the includes. Besides, macOS reportedly plans to drop the bundled OpenSSL, so it is simplest to install it yourself:

brew install openssl

Then configure the environment variables:

export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2n
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include
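The Cellar path above depends on the exact OpenSSL version brew installed; a slightly more robust sketch asks brew for the prefix instead of hard-coding it:

# `brew --prefix openssl` prints the installed prefix, e.g. /usr/local/opt/openssl
export OPENSSL_ROOT_DIR=$(brew --prefix openssl)
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include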

The Maven repository is unstable

[ERROR] Failed to execute goal on project hadoop-aws: Could not resolve dependencies for project org.apache.hadoop:hadoop-aws:jar:2.6.0: Could not transfer artifact com.amazonaws:aws-java-sdk:jar:1.7.4 from/to central (https://repo.maven.apache.org/maven2): GET request of: com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar from central failed: SSL peer shut down incorrectly -> [Help 1]

If "Could not resolve dependencies" is accompanied by a network error such as "SSL peer shut down incorrectly", the connection to the central repository simply flaked; just retry the build.
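When a download fails partway through like this, there is no need to start the whole build over: Maven can resume from the failing module. A sketch, using hadoop-aws (the module from the error above) as the resume point:

mvn install -DskipTests -Pdist,native -Dtar -rf :hadoop-aws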

Other

A historical pitfall

The previous compilation hit a small pitfall that is a long-standing issue in the Hadoop source code.

During compilation, the build looks for a nonexistent classes.jar under $JAVA_HOME/Classes; what it actually needs is $JAVA_HOME/lib/tools.jar. Adding a symlink fixes it (note the differences between creating symlinks on the Mac and on Linux).
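For reference, a minimal sketch of that symlink fix, assuming $JAVA_HOME points at the JDK's Home directory (paths vary between JDK builds):

sudo mkdir -p $JAVA_HOME/Classes
sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar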

The monkey already fixed this during the previous compilation, so it isn't repeated here; see the earlier article on compiling Hadoop on the Mac.

Without skipTests

Without skipTests, you will get at least the following errors during the tests:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on project hadoop-auth: There are test failures.
[ERROR]
[ERROR] Please refer to /Volumes/Extended/Users/msh/IdealProjects/Git-Study/hadoop/hadoop-common-project/hadoop-auth/target/surefire-reports for the individual test results.

Results :

Tests in error:
  TestKerberosAuthenticator.testAuthenticationHttpClientPost:157 » ClientProtocol
  TestKerberosAuthenticator.testAuthenticationHttpClientPost:157 » ClientProtocol

Tests run: 92, Failures: 0, Errors: 2, Skipped: 0

These errors can be ignored for now: skipping the tests with skipTests is enough, since the goal is to trace the source code and understand the main flow.
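If you later want to rerun just the failing tests instead of the whole suite, Maven can run a single module and a single test class. A sketch, using the module and test class from the error above:

mvn test -pl hadoop-common-project/hadoop-auth -Dtest=TestKerberosAuthenticator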

Start a pseudo-distributed “cluster”

After a successful compilation, the distribution packages are generated under the hadoop-dist module's target directory. Take hadoop-2.6.0.tar.gz and unpack it somewhere convenient.

Configuration

Following the official pseudo-distributed setup guide, configure core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml under the etc/hadoop directory.
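A minimal sketch of the two HDFS-related files, using the values from the official single-node guide (yarn-site.xml and mapred-site.xml follow the same pattern); run it from the unpacked hadoop-2.6.0 directory:

cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF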

Startup

  • Format the NameNode (only needed the first time):
bin/hdfs namenode -format
  • Start the daemons (you will be asked for the user password several times when starting DFS):
# start NameNode, SecondaryNameNode, DataNode
sbin/start-dfs.sh
# start ResourceManager and NodeManager
sbin/start-yarn.sh
# start TimelineServer (ApplicationHistoryServer)
sbin/yarn-daemon.sh start timelineserver

After startup, visit the NameNode and ResourceManager web UIs and the Timeline RESTful interface, create directories, upload files, and run the sample MapReduce job to verify that the deployment succeeded.
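A sketch of those verification steps; the ports are the Hadoop 2.x defaults and the examples jar version matches the build above:

open http://localhost:50070                  # NameNode web UI
open http://localhost:8088                   # ResourceManager web UI
curl http://localhost:8188/ws/v1/timeline/   # Timeline RESTful interface
bin/hdfs dfs -mkdir -p /user/$(whoami)/input
bin/hdfs dfs -put etc/hadoop/*.xml /user/$(whoami)/input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep /user/$(whoami)/input /user/$(whoami)/output 'dfs[a-z.]+'
bin/hdfs dfs -cat /user/$(whoami)/output/*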


This article is published under the Creative Commons Attribution-ShareAlike 4.0 International license. Attribution and a link to this article must be retained when reproducing it.