Building a pseudo-distributed Hadoop system with Docker

  • Installing Docker on CentOS
  • Configuring the pseudo-distributed system

Installing Docker on CentOS

Update yum packages to the latest

[root@ecs-kc1-large-2-linux-20200802175157 ~]# yum update

Uninstall the old version (if one has been installed)

[root@ecs-kc1-large-2-linux-20200802175157 ~]# yum remove docker  docker-common docker-selinux docker-engine

Install the required packages: yum-utils provides the yum-config-manager utility, and the other two are dependencies of the devicemapper storage driver

[root@ecs-kc1-large-2-linux-20200802175157 ~]# yum install -y yum-utils device-mapper-persistent-data lvm2

Set up the Docker yum repository

[root@ecs-kc1-large-2-linux-20200802175157 ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Install Docker

[root@ecs-kc1-large-2-linux-20200802175157 ~]# yum install docker-ce 

Start Docker and enable it to start at boot

[root@ecs-kc1-large-2-linux-20200802175157 ~]# systemctl start docker
[root@ecs-kc1-large-2-linux-20200802175157 ~]# systemctl enable docker

Verify that the installation succeeded: run docker version; if both the Client and Server sections are displayed, Docker has been installed and started successfully.
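A quick check (the exact versions in the output will differ):

[root@ecs-kc1-large-2-linux-20200802175157 ~]# docker version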

Configuring the pseudo-distributed system

Preparation: pull the centos:7 image

[root@ecs-kc1-large-2-linux-20200802175157 yum.repos.d]# docker pull centos:7

Because I built the distributed system using Docker on a Huawei Cloud host, I ran into a small pitfall: the server is ARM-based, so the JDK must be the ARM64 build.
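You can confirm the host architecture before picking a JDK download; on an ARM64 machine this prints aarch64:

uname -m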

1. After downloading the JDK, upload it to a directory on the host with an FTP tool



Then use the following command to copy the JDK archive into the CentOS container started by Docker (1b is the container ID prefix).

[root@ecs-kc1-large-2-linux-20200802175157 centos-ssh-root-jdk-hadoop]# docker cp jdk-8u261-linux-arm64-vfp-hflt.tar.gz 1b:/opt/
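As an optional check that the copy landed, list the target directory from the host:

docker exec 1b ls /opt/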

Enter the CentOS container

docker exec -it 1b /bin/bash

Go to the directory where the JDK file is stored and decompress it

tar -zxvf jdk-8u261-linux-arm64-vfp-hflt.tar.gz

Configure environment variables

vi /etc/profile

Add the following information at the end of the file:

export JAVA_HOME=/opt/java/jdk1.8.0_261    # write the JDK path here
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

Press Esc to exit insert mode and type :wq to save the changes. Then run the following command to make the changes take effect immediately

source /etc/profile

You can then run java -version to check whether the installation succeeded.
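For example, the first line of output should report the version you installed (1.8.0_261 here):

java -version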

Next, configure Hadoop. Upload the Hadoop archive into the container the same way as the JDK, then decompress it in the Hadoop installation path inside the container

tar -zxf hadoop-2.7.3.tar.gz

Configure hadoop environment variables

vi ${hadoop installation path}/etc/hadoop/hadoop-env.sh

Set JAVA_HOME in this file to the JDK path
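Using the same JDK path as in /etc/profile above, the line would look like this:

export JAVA_HOME=/opt/java/jdk1.8.0_261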



Once the container is configured, commit it as a new image

[root@ecs-kc1-large-2-linux-20200802175157 centos-ssh-root-jdk-hadoop]# docker commit 1b centos7-hadoop

Then start three containers from the image

[root@ecs-kc1-large-2-linux-20200802175157 centos-ssh-root-jdk-hadoop]# docker run -it --name hadoop0 --hostname hadoop0 -d -P -p 50070:50070 -p 8088:8088 centos7-hadoop
docker run -it --name hadoop1 --hostname hadoop1 -d -P centos7-hadoop
docker run -it --name hadoop2 --hostname hadoop2 -d -P centos7-hadoop
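Check that all three containers are up (hadoop0 should also show the published ports 50070 and 8088):

docker ps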

Run the following command in each of the three containers

source /etc/profile

This applies the environment variables, and with that all three containers are configured.
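As a convenience, the same step can be run from the host in one loop (a sketch; variables sourced this way only apply to that shell invocation, so interactive sessions still need their own source):

for c in hadoop0 hadoop1 hadoop2; do
    docker exec $c /bin/bash -c 'source /etc/profile && java -version'
done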

Set fixed IP addresses for the three containers.
1: Download pipework. Download address: github.com/jpetazzo/pi…
2: Upload the downloaded ZIP package to the host server, decompress it, and rename it

unzip pipework-master.zip
mv pipework-master pipework
cp -rp pipework/pipework /usr/local/bin/ 

3: Install bridge-utils

yum -y install bridge-utils

4: Create a network

brctl addbr br0
ip link set dev br0 up
ip addr add 192.168.2.1/24 dev br0
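You can verify the bridge with the tools installed above:

brctl show br0
ip addr show br0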

5: Set a fixed IP address for the container

pipework br0 hadoop0 192.168.2.10/24
pipework br0 hadoop1 192.168.2.11/24
pipework br0 hadoop2 192.168.2.12/24

Verify that the three IP addresses can be pinged
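For example:

ping -c 3 192.168.2.10
ping -c 3 192.168.2.11
ping -c 3 192.168.2.12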

To configure the Hadoop cluster, first connect to hadoop0 and run the following command

docker exec -it hadoop0 /bin/bash

The following steps are the Hadoop cluster configuration procedure. 1: Set the mapping between host names and IP addresses. In each of the three containers, run vi /etc/hosts and add the following configuration

192.168.2.10    hadoop0
192.168.2.11    hadoop1
192.168.2.12    hadoop2
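After this change, the containers should also answer by host name, for example from inside hadoop0:

ping -c 1 hadoop1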

2: Modify the Hadoop configuration files on hadoop0. Go to the /usr/local/hadoop/etc/hadoop directory and modify core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml in that directory.

core-site.xml:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop0:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/hadoop-2.7.3</value>
        </property>
        <property>
                <name>fs.trash.interval</name>
                <value>1440</value>
        </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property> 
                <name>yarn.log-aggregation-enable</name> 
                <value>true</value> 
        </property>
</configuration>

mapred-site.xml: first change the file name with mv mapred-site.xml.template mapred-site.xml, then edit it with vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Go to the /usr/local/hadoop directory. 1. Run the formatting command

bin/hdfs namenode -format

If the which command is missing, run the following command to install it

yum install -y which

Start pseudo-distributed Hadoop

sbin/start-all.sh
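Because ports 50070 and 8088 were published when hadoop0 was created, the NameNode and ResourceManager web UIs should also be reachable from outside the container; <host-ip> below is a placeholder for your server address:

curl http://<host-ip>:50070
curl http://<host-ip>:8088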



Run jps to check whether the processes started successfully
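One way to check all three containers from the host (a sketch; on hadoop0 look for NameNode, SecondaryNameNode, and ResourceManager, on hadoop1/hadoop2 for DataNode and NodeManager; the exact split depends on your slaves configuration):

for c in hadoop0 hadoop1 hadoop2; do
    echo "== $c =="
    docker exec $c /bin/bash -c 'source /etc/profile && jps'
done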



With that, the pseudo-distributed Hadoop cluster has been set up successfully.