Hadoop has two main components: the distributed file system HDFS and the MapReduce computing model. Below I walk through the process of building a Hadoop environment.

Hadoop Test Environment

Four test machines in total: 1 namenode and 3 datanodes.

OS version: RHEL 5.5 x86_64
Hadoop: 0.20.203.0
JDK: 1.7.0

namenode   192.168.57.75
datanode1  192.168.57.76
datanode2  192.168.57.78
datanode3  192.168.57.79

Preparations before deploying Hadoop

1 Hadoop depends on Java and SSH. Java 1.5.x or above must be installed, and sshd must always be running so that the Hadoop scripts can manage the remote Hadoop daemons.

2 Create a common hadoop account. All nodes must have the same user name. Run the following commands to add the user:

useradd hadoop
passwd hadoop

3 Configure the host names:

tail -n 4 /etc/hosts
192.168.57.75 namenode
192.168.57.76 datanode1
192.168.57.78 datanode2
192.168.57.79 datanode3

4 All of the above must be configured identically on every node (namenode and datanodes); a combined sketch follows this list.
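Taken together, steps 2 and 3 can be applied to each machine in one pass. A minimal sketch, assuming you are root on each node; passwd --stdin is RHEL-specific, and CHANGE_ME is a placeholder you should replace with a real password:

# Run as root on every node (namenode and all datanodes).
useradd hadoop
echo 'CHANGE_ME' | passwd --stdin hadoop    # RHEL-specific; set a real password
cat >> /etc/hosts <<'EOF'
192.168.57.75 namenode
192.168.57.76 datanode1
192.168.57.78 datanode2
192.168.57.79 datanode3
EOF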

SSH configuration

1 Generate the private key id_rsa and public key id_rsa.pub

[hadoop@hadoop1 ~]$ ssh-keygen -t rsa 

Generating public/private rsa key pair. 

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 

Enter passphrase (empty for no passphrase): 

Enter same passphrase again: 

Your identification has been saved in /home/hadoop/.ssh/id_rsa. 

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub. 

The key fingerprint is: 

d6:63:76:43:e2:5b:8e:85:ab:67:a2:7c:a6:8f:23:f9 hadoop@hadoop1.test.com 
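If you prefer to skip the prompts, the same key pair can be generated non-interactively; a one-line sketch (-N "" sets an empty passphrase, -f the output file):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa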

 

2 Confirm that the private key id_rsa and public key id_rsa.pub were generated

[hadoop@hadoop1 ~]$ ls .ssh/ 

authorized_keys  id_rsa  id_rsa.pub  known_hosts 

 

3 Upload the public key file to each datanode server

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1 

hadoop@datanode1's password: 

Now try logging into the machine, with "ssh 'hadoop@datanode1'", and check in: 

  .ssh/authorized_keys 

to make sure we haven't added extra keys that you weren't expecting. 

 

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode2 

hadoop@datanode2's password: 

Now try logging into the machine, with "ssh 'hadoop@datanode2'", and check in: 

  .ssh/authorized_keys 

to make sure we haven't added extra keys that you weren't expecting. 

 

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode3 

hadoop@datanode3's password: 

Now try logging into the machine, with "ssh 'hadoop@datanode3'", and check in: 

  .ssh/authorized_keys 

to make sure we haven't added extra keys that you weren't expecting. 

 

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost 

hadoop@localhost's password: 

Now try logging into the machine, with "ssh 'hadoop@localhost'", and check in: 

  .ssh/authorized_keys 

to make sure we haven't added extra keys that you weren't expecting. 
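The four copies above can also be written as a loop; a sketch that prompts for the hadoop password once per node:

for h in datanode1 datanode2 datanode3 localhost; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$h
done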

 

 

4 Verify passwordless login

[hadoop@hadoop1 ~]$ ssh datanode1 

Last login: Thu Feb  2 09:01:16 2012 from 192.168.57.71 

[hadoop@hadoop2 ~]$ exit 

logout 

 

[hadoop@hadoop1 ~]$ ssh datanode2 

Last login: Thu Feb  2 09:01:18 2012 from 192.168.57.71 

[hadoop@hadoop3 ~]$ exit 

logout 

 

[hadoop@hadoop1 ~]$ ssh datanode3 

Last login: Thu Feb  2 09:01:20 2012 from 192.168.57.71 

[hadoop@hadoop4 ~]$ exit 

logout 

 

[hadoop@hadoop1 ~]$ ssh localhost 

Last login: Thu Feb  2 09:01:24 2012 from 192.168.57.71 

[hadoop@hadoop1 ~]$ exit 

logout 
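On a larger cluster a loop makes the same check less tedious; a sketch using BatchMode so that ssh fails instead of prompting if a key is missing:

for h in datanode1 datanode2 datanode3 localhost; do
    ssh -o BatchMode=yes hadoop@$h 'echo "$(hostname): ok"'
done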

Configure the Java environment

1 Download the appropriate JDK

// This file is the RPM package for 64-bit Linux

wget download.oracle.com/otn-pub/jav…

 

2 Install the JDK

rpm -ivh jdk-7-linux-x64.rpm 

 

3 Verify the Java installation

[root@hadoop1 ~]# java -version 

java version "1.7.0"

Java(TM) SE Runtime Environment (build 1.7.0-b147) 

Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) 

[root@hadoop1 ~]# ls /usr/java/ 

default  jdk1.7.0  latest

 

4 Configure Java environment variables

# vim /etc/profile // Add the following lines to the end of the file:

 

#add for hadoop 
export JAVA_HOME=/usr/java/jdk1.7.0
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/
export PATH=$PATH:$JAVA_HOME/bin

 

// Make the environment variables take effect

source /etc/profile 
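A quick sanity check that the variables are now visible in the current shell (a sketch):

echo $JAVA_HOME        # expect /usr/java/jdk1.7.0
java -version          # expect the 1.7.0 banner shown above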

 

5 Copy the /etc/profile file to each datanode

[root@hadoop1 src]# scp /etc/profile root@datanode1:/etc/ 

The authenticity of host 'datanode1 (192.168.57.86)' can't be established. 

RSA key fingerprint is b5:00:d1:df:73:4c:94:f1:ea:1f:b5:cd:ed:3a:cc:e1. 

Are you sure you want to continue connecting (yes/no)? yes 

Warning: Permanently added 'datanode1,192.168.57.86' (RSA) to the list of known hosts. 

root@datanode1's password: 

profile                                       100% 1624     1.6KB/s   00:00

[root@hadoop1 src]# scp /etc/profile root@datanode2:/etc/ 

The authenticity of host 'datanode2 (192.168.57.87)' can't be established. 

RSA key fingerprint is 57:cf:96:15:78:a3:94:93:30:16:8e:66:47:cd:f9:cd. 

Are you sure you want to continue connecting (yes/no)? yes 

Warning: Permanently added 'datanode2,192.168.57.87' (RSA) to the list of known hosts. 

root@datanode2's password: 

profile                                       100% 1624     1.6KB/s   00:00

[root@hadoop1 src]# scp /etc/profile root@datanode3:/etc/ 

The authenticity of host 'datanode3 (192.168.57.88)' can't be established. 

RSA key fingerprint is 31:73:e8:3c:20:0c:1e:b2:59:5c:d1:01:4b:26:41:70. 

Are you sure you want to continue connecting (yes/no)? yes 

Warning: Permanently added 'datanode3,192.168.57.88' (RSA) to the list of known hosts. 

root@datanode3's password: 

profile                                       100% 1624     1.6KB/s   00:00

  

6 Copy the JDK installation package to each datanode and install it there

[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode1:/home/hadoop/ 

hadoop@datanode1's password: 

hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01

jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01

[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode2:/home/hadoop/ 

hadoop@datanode2's password: 

hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01

jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01

[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode3:/home/hadoop/ 

hadoop@datanode3's password: 

hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01

jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01
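The copied RPM still has to be installed on every datanode. A sketch, assuming root logins over SSH are permitted (you will be prompted for each root password):

for h in datanode1 datanode2 datanode3; do
    ssh root@$h 'rpm -ivh /home/hadoop/src/jdk-7-linux-x64.rpm'
done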

Hadoop Configuration // Perform all of the following operations as the hadoop user

1 Configuration Directory

[hadoop@hadoop1 ~]$ pwd 

/home/hadoop 

[hadoop@hadoop1 ~]$ ll 

total 59220 

lrwxrwxrwx  1 hadoop hadoop       17 Feb  1 16:59 hadoop -> hadoop-0.20.203.0
drwxr-xr-x 12 hadoop hadoop     4096 Feb  1 17:31 hadoop-0.20.203.0
-rw-r--r--  1 hadoop hadoop 60569605 Feb  1 14:24 hadoop-0.20.203.0rc1.tar.gz 

 

 

2 Edit hadoop-env.sh to specify the Java location

vim hadoop/conf/hadoop-env.sh 

export JAVA_HOME=/usr/java/jdk1.7.0

 

3 Configure core-site.xml // Specifies the namenode of the file system

 

[hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

4 Configure mapred-site.xml // Specifies the jobtracker node

 

[hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>

5 Configure hdfs-site.xml // Sets the number of HDFS block replicas

  

[hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
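With dfs.replication set to 2 and three datanodes available, each block is stored on two of the three nodes (the replication factor also appears as the "2" column in the fs -lsr listings later). Once the cluster is running (step 9 below), you can confirm replication per file with fsck; a usage sketch:

bin/hadoop fsck / -files | head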

 

 

 

 

6 Configure the masters and slaves files

[hadoop@hadoop1 ~]$ cat hadoop/conf/masters 

namenode 

[hadoop@hadoop1 ~]$ cat hadoop/conf/slaves 

datanode1 

datanode2 

datanode3 

 

7 Copy the Hadoop directory to all datanodes

[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/ 

[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/ 

[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop 

 

8 Format HDFS

[hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format 

12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode 

STARTUP_MSG:   host = hadoop1.test.com/127.0.0.1 

STARTUP_MSG:   args = [-format] 

STARTUP_MSG:   version = 0.20.203.0 

STARTUP_MSG: build = svn.apache.org/repos/asf/h…

************************************************************/

Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y // Enter Y here

12/02/02 11:31:17 INFO util.GSet: VM type       = 64-bit 

12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB 

12/02/02 11:31:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries 

12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152 

12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop 

12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup 

12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true 

12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100 

12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 

12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times 

12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds. 

12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted. 

12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down the NameNode at hadoop1.test.com/127.0.0.1

************************************************************/

[hadoop@hadoop1 hadoop]$ 
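Note from the log above that the image was written under /tmp, because no storage directories were configured. On many systems /tmp is cleared at reboot, which would wipe the namenode metadata. For anything longer-lived than a test, point hadoop.tmp.dir at a persistent directory in core-site.xml; a sketch, assuming /home/hadoop/tmp exists on every node:

<property>
  <name>hadoop.tmp.dir</name>
  <!-- assumed path; any directory that survives reboots will do -->
  <value>/home/hadoop/tmp</value>
</property>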

 

9 Start the Hadoop daemons

[hadoop@hadoop1 hadoop]$ bin/start-all.sh 

starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out

datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out

datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out

datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out

starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out

datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out

datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out

datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
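If one of the daemons does not show up in jps below, its log file is the first place to look; a sketch using the namenode log path implied by the output above (each .out file has a matching .log):

tail -n 50 /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-hadoop1.test.com.log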

 

10 Verification

//namenode 

[hadoop@hadoop1 logs]$ jps 

2883 JobTracker 

3002 Jps 

2769 NameNode 

 

//datanode 

[hadoop@hadoop2 ~]$ jps 

2743 TaskTracker 

2670 DataNode 

2857 Jps 

 

[hadoop@hadoop3 ~]$ jps 

2742 TaskTracker 

2856 Jps 

2669 DataNode 

 

[hadoop@hadoop4 ~]$ jps 

2742 TaskTracker 

2852 Jps 

2659 DataNode 

 

Hadoop monitoring web page

http://192.168.57.75:50070/dfshealth.jsp 
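The same health information is available from the command line; a sketch using dfsadmin, run as the hadoop user from the hadoop directory:

bin/hadoop dfsadmin -report | head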

Simple HDFS verification

The hadoop file command format is as follows:

hadoop fs -cmd <args>

// Create a directory

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -mkdir /test-hadoop 

// Check the directory

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls / 

Found 2 items 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:32 /test-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp 

// View directories including subdirectories

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr / 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:32 /test-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred 

drwx------   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system 

-rw-------   2 hadoop supergroup          4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info 

// Add a file

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -put /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /test-hadoop

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr / 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:34 /test-hadoop 

-rw-r--r--   2 hadoop supergroup   60569605 2012-02-02 13:34 /test-hadoop/hadoop-0.20.203.0rc1.tar.gz

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred 

drwx------   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system 

-rw-------   2 hadoop supergroup          4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info 

// Get the file

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -get /test-hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/

[hadoop@hadoop1 hadoop]$ ls /tmp/*.tar.gz 

/tmp/1.tar.gz  /tmp/hadoop-0.20.203.0rc1.tar.gz

// Delete files

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rm /test-hadoop/hadoop-0.20.203.0rc1.tar.gz

Deleted hdfs://namenode:9000/test-hadoop/hadoop-0.20.203.0rc1.tar.gz

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr / 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:57 /test-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred 

drwx------   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system 

-rw-------   2 hadoop supergroup          4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:36 /user 

-rw-r--r--   2 hadoop supergroup        321 2012-02-02 13:36 /user/hadoop 

// Delete the directory

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rmr /test-hadoop 

Deleted hdfs://namenode:9000/test-hadoop 

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr / 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred 

drwx------   - hadoop supergroup          0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system 

-rw-------   2 hadoop supergroup          4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info 

drwxr-xr-x   - hadoop supergroup          0 2012-02-02 13:36 /user 

-rw-r--r--   2 hadoop supergroup        321 2012-02-02 13:36 /user/hadoop 

 

// hadoop fs help (partial output)

[hadoop@hadoop1 hadoop]$ bin/hadoop fs -help 

hadoop fs is the command to execute fs commands. The full syntax is: 

 

hadoop fs [-fs <local | file system URI>] [-conf <configuration file>] 

    [-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>] 

    [-dus <path>] [-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>] 

    [-rmr [-skipTrash] <src>] [-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>] 

    [-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>] 

    [-getmerge <src> <localdst> [addnl]] [-cat <src>] 

    [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>] 

    [-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>] 

    [-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>] 

    [-tail [-f] <path>] [-text <path>] 

    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...] 

    [-chown [-R] [OWNER][:[GROUP]] PATH...] 

    [-chgrp [-R] GROUP PATH...] 

    [-count[-q] <path>] 

    [-help [cmd]] 

Setting up a Hadoop environment is a tedious procedure that requires some working knowledge of Linux. Note that the environment built through the steps above is only enough to give you a general understanding of Hadoop; if you want to use HDFS for online services, you will need to tune the Hadoop configuration files further. Follow-up documents will continue to be published as blog posts, so stay tuned.