Main content
- Hadoop installation
Prerequisites
- Zookeeper is up and working properly
- The JAVA_HOME environment variable is set
Installation package
- Weiyun download (in the tar packages directory)
- Hadoop 2.7.7
Role division
Role assignment | NN | DN | SNN |
---|---|---|---|
cluster-master | yes | no | no |
cluster-slave1 | no | yes | yes |
cluster-slave2 | no | yes | no |
cluster-slave3 | no | yes | no |
1. Environment preparation
Upload the installation package to the Docker container
docker cp hadoop-2.7.7.tar.gz cluster-master:/root/tar
Unpack it
tar xzvf hadoop-2.7.7.tar.gz -C /opt/hadoop
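Note that `tar -C` does not create the destination directory, so create it first; it is also convenient to put the Hadoop scripts on the PATH. A minimal sketch, assuming the tarball unpacks to /opt/hadoop/hadoop-2.7.7:

```bash
# create the destination directory before unpacking
mkdir -p /opt/hadoop
tar xzvf hadoop-2.7.7.tar.gz -C /opt/hadoop

# put the Hadoop commands on PATH (the HADOOP_HOME path is an assumption)
echo 'export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile
```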
2. Configuration files
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://jinbill</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>cluster-master:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop</value>
  </property>
</configuration>
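To confirm Hadoop picks the file up, you can read a key back with `hdfs getconf` (assuming the Hadoop bin directory is on the PATH):

```bash
# should print hdfs://jinbill, the nameservice defined above
hdfs getconf -confKey fs.defaultFS
```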
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>mr_jinbill</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>cluster-slave2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>cluster-slave3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>192.168.11.46:12181</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
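Once YARN is running (step 4 below), the ResourceManager HA state can be checked with the stock admin tool:

```bash
# each command prints "active" or "standby" for that ResourceManager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```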
hadoop-env.sh
export JAVA_HOME=/opt/jdk/jdk1.8.0_221
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>jinbill</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.jinbill</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.jinbill.nn1</name>
    <value>cluster-master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.jinbill.nn2</name>
    <value>cluster-slave1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.jinbill.nn1</name>
    <value>cluster-master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.jinbill.nn2</name>
    <value>cluster-slave1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://cluster-slave1:8485;cluster-slave2:8485;cluster-slave3:8485/jinbill</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.jinbill</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
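Similarly, once HDFS is running (step 4 below), the NameNode HA state can be verified:

```bash
# one NameNode should report "active", the other "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```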
Create a new slaves file; if it already exists, edit it directly. Every node needs the same configuration, as shown in the sketch after the file contents.
cluster-slave1
cluster-slave2
cluster-slave3
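All four containers need identical copies of these configuration files. A minimal sketch for pushing them from cluster-master (the etc/hadoop path is an assumption based on the unpack location above):

```bash
# copy the configuration directory to every slave container
for node in cluster-slave1 cluster-slave2 cluster-slave3; do
  scp -r /opt/hadoop/hadoop-2.7.7/etc/hadoop/ $node:/opt/hadoop/hadoop-2.7.7/etc/
done
```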
3. Initialization
Start the JournalNode on each JournalNode host (cluster-slave1/2/3)
hadoop-daemon.sh start journalnode
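To confirm each JournalNode came up, jps can be run from the Docker host (assuming jps is on each container's PATH):

```bash
# each container should list a JournalNode process
for node in cluster-slave1 cluster-slave2 cluster-slave3; do
  docker exec $node jps | grep JournalNode
done
```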
Initialize metadata on NN
hdfs namenode -format
Copy the formatted metadata to the SNN
scp -r /opt/hadoop/dfs cluster-slave1:/opt/hadoop
Start the NN on the master node
hadoop-daemon.sh start namenode
Run on SNN
hdfs namenode -bootstrapStandby
Start the SNN
hadoop-daemon.sh start namenode
Initialize ZKFC on NN or SNN
hdfs zkfc -formatZK
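If you want to verify the formatZK step, it creates a znode under /hadoop-ha in Zookeeper (assuming zkCli.sh is available):

```bash
# should list the "jinbill" nameservice
zkCli.sh -server cluster-master:2181 ls /hadoop-ha
```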
Stop the daemons started above
stop-dfs.sh
4. Start up
start-dfs.sh
start-yarn.sh
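After both scripts finish, the Java daemons in each container should match the role table above; a quick check from the Docker host (again assuming jps is on each container's PATH):

```bash
# compare the listed daemons against the role assignment table
for node in cluster-master cluster-slave1 cluster-slave2 cluster-slave3; do
  echo "== $node =="
  docker exec $node jps
done
```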
5. Test whether it worked
Because the Docker network is on a different subnet, a route must be added on the host before the cluster can be reached
- Open CMD with administrator privileges
- route add 172.15.0.0 mask 255.255.0.0 192.168.11.38 -p
Accessing the UI
- Hadoop cluster access address: the active NameNode UI (cluster-master:50070 or cluster-slave1:50070, per hdfs-site.xml)
- Hadoop job address: the active ResourceManager UI on cluster-slave2 or cluster-slave3 (port 8088 by default)