1 Pseudo-distributed mode

Pseudo-distributed mode runs on a single node, with each Hadoop daemon in its own Java process. Compared with local mode, it requires more setup: additional configuration files, SSH, and YARN settings.

2 Hadoop configuration files

Modify three configuration files in the Hadoop installation directory:

  • etc/hadoop/core-site.xml
  • etc/hadoop/hdfs-site.xml
  • etc/hadoop/hadoop-env.sh

2.1 core-site.xml

First modify core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
  • fs.defaultFS: sets the HDFS address; here HDFS runs locally on port 9000
  • hadoop.tmp.dir: sets the temporary directory. The default is /tmp/hadoop-${user.name}, whose contents are lost after a system restart, so change it to a persistent path

Then create the temporary directory:

mkdir -p /usr/local/hadoop/tmp

2.2 hdfs-site.xml

Then modify hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

dfs.replication sets the number of replicas HDFS keeps for each block. Since there is only one node in pseudo-distributed mode, set it to 1.
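Once the configuration is in place, you can sanity-check the value with hdfs getconf (run from the Hadoop installation directory):

bin/hdfs getconf -confKey dfs.replication
# 1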

2.3 hadoop-env.sh

Modify this file to add the JAVA_HOME environment variable. Even if JAVA_HOME is already set in one of the following files:

  • ~/.bashrc
  • ~/.bash_profile
  • /etc/profile

Hadoop may still fail to find it. Therefore, you need to set JAVA_HOME manually in hadoop-env.sh:
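For example (the JDK path below is only an assumption; point it at your own installation):

# in etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk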

3 Passwordless SSH to localhost

The next step is to set up a passwordless SSH connection to the local machine. First, make sure the sshd service is running:

systemctl status sshd
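If it is not running, start and enable it (assuming a systemd-based Linux; the service may be named ssh on Debian/Ubuntu):

sudo systemctl enable --now sshd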

Then try connecting to localhost:

ssh localhost

You can log in after entering your user password. However, a passwordless connection is required, so configure key-based authentication:

ssh-keygen -t ed25519 -a 100
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

After the key pair is generated, append the public key to authorized_keys and fix its permissions. Note that authorized_keys must be writable only by the local user.

Now ssh localhost should connect without a password.
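As a quick non-interactive check, BatchMode makes ssh fail instead of prompting when key authentication is not working:

ssh -o BatchMode=yes localhost echo ok
# ok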

4 Running

4.1 Formatting HDFS

Before running in single-node mode, format HDFS first:

# HADOOP is the Hadoop installation directory
HADOOP/bin/hdfs namenode -format

Formatting sets up the block storage structure for the DataNodes and records the initial metadata in the NameNode.

After formatting succeeds, a dfs directory is generated under the temporary directory configured above. The tmp/dfs/name/current directory contains the following files:

  • fsimage: a persistent checkpoint of the metadata held in NameNode memory
  • fsimage*.md5: checksum file used to verify the integrity of fsimage
  • seen_txid: stores the transaction ID; it is 0 right after formatting and matches the numeric suffix of the NameNode's edits_* files
  • VERSION: stores the creation time plus namespaceID, blockpoolID, storageType, cTime, clusterID, and layoutVersion
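A typical listing right after formatting looks like this (the fsimage sequence numbers are illustrative):

ls /usr/local/hadoop/tmp/dfs/name/current
# VERSION
# fsimage_0000000000000000000
# fsimage_0000000000000000000.md5
# seen_txid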

Note about VERSION:

  • namespaceID: unique identifier of the HDFS instance, generated when HDFS is formatted for the first time
  • blockpoolID: identifies a block pool; globally unique across clusters
  • storageType: indicates which process's data structures are stored here
  • cTime: creation time
  • clusterID: the cluster ID, generated by the system or specified with -clusterid
  • layoutVersion: the version of HDFS's persistent data structures
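For reference, a freshly formatted VERSION file looks roughly like this (all values below are illustrative):

cat /usr/local/hadoop/tmp/dfs/name/current/VERSION
# namespaceID=1861449357
# clusterID=CID-0f18be2e-72b6-4a9a-9b5c-2e41bfc45e21
# cTime=1604000000000
# storageType=NAME_NODE
# blockpoolID=BP-1551507895-127.0.0.1-1604000000000
# layoutVersion=-66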

4.2 Starting the NameNode

HADOOP/sbin/start-dfs.sh
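You can confirm the daemons are up with jps; in pseudo-distributed mode a NameNode, DataNode, and SecondaryNameNode should appear (PIDs are illustrative):

jps
# 11666 NameNode
# 11789 DataNode
# 11983 SecondaryNameNode
# 12105 Jps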

Then you can visit the NameNode web UI in a browser at:

localhost:9870

4.3 Testing

Create an input directory and use the configuration files as input:

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/USER_NAME # USER_NAME is your user name
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
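You can confirm the upload with an HDFS listing:

bin/hdfs dfs -ls input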

Testing:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z.]+'
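The grep example counts the strings matching the regular expression dfs[a-z.]+ in the input files and writes the counts to the output directory. You can also view the results directly on HDFS:

bin/hdfs dfs -cat output/*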

Get the output:

bin/hdfs dfs -get output output  # copy the output from HDFS to a local output directory
cat output/*
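With the configuration above as input, the result looks something like this (exact matches depend on your config files):

1	dfsadmin
1	dfs.replication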

Stop:

sbin/stop-dfs.sh

5 Configuring YARN

In addition to running a single node in pseudo-distributed mode, you can let YARN schedule jobs in a unified manner; only two configuration files need to be modified.

5.1 Configuration files

Modify the following files:

  • HADOOP/etc/hadoop/mapred-site.xml
  • HADOOP/etc/hadoop/yarn-site.xml

5.1.1 mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
  • mapreduce.framework.name: specifies that MapReduce runs on YARN
  • mapreduce.application.classpath: specifies the classpath for MapReduce applications

5.1.2 yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
  • yarn.nodemanager.aux-services: the auxiliary services that run on the NodeManager
  • yarn.nodemanager.env-whitelist: the environment variables that containers may inherit from the NodeManagers

5.2 Running

sbin/start-yarn.sh
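jps can again confirm that the YARN daemons started; a ResourceManager and a NodeManager should appear alongside the HDFS processes (PIDs are illustrative):

jps
# 13345 ResourceManager
# 13456 NodeManager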

You can then visit the ResourceManager web UI at:

localhost:8088

Stop:

sbin/stop-yarn.sh
