The Hadoop environment

Hadoop version:

Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650

Client Development

  • Introducing dependencies (using Maven)
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.10.1</version>
</dependency>
  • Write the code
package com.definesys.hadoop;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * @description:
 * @author: jianfeng.zheng
 * @since: 2020/12/14 12:36 AM
 * @history: 1.2020/12/14 created by jianfeng.zheng
 */
public class HDFS {

    public static void main(String[] cmd) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000/");
        // conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fs = FileSystem.get(conf);
        // Destination path on HDFS
        Path dst = new Path("hdfs://master:9000/demo/hello.txt");
        FSDataOutputStream os = fs.create(dst);
        // Local file to upload
        FileInputStream is = new FileInputStream("/root/hello.txt");
        IOUtils.copy(is, os);
        is.close();
        os.close();
        fs.close();
    }
}
  • Packaging

If it is a web application, it is usually packaged as a WAR or EAR. Both of these package formats bundle the dependent JARs, so no special handling is needed. If you want to run it locally as a plain JAR, you need two plug-ins; copy the following configuration into your pom.xml.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.6</version>
            <configuration>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <classpathPrefix>lib/</classpathPrefix>
                        <mainClass>com.definesys.hadoop.HDFS</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <executions>
                <execution>
                    <id>copy-dependencies</id>
                    <phase>package</phase>
                    <goals>
                        <goal>copy-dependencies</goal>
                    </goals>
                    <configuration>
                        <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        <overWriteReleases>false</overWriteReleases>
                        <overWriteSnapshots>false</overWriteSnapshots>
                        <overWriteIfNewer>true</overWriteIfNewer>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

The maven-jar-plugin generates a MANIFEST.MF file based on this configuration. The MANIFEST.MF file records the main class and the classpath of the dependencies, similar to the following:

Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Built-By: asan
Class-Path: lib/hadoop-client-2.10.1.jar ...
Created-By: Apache Maven 3.6.3
Build-Jdk: 1.8.0_161
Main-Class: com.definesys.hadoop.HDFS

The maven-dependency-plugin copies the dependency JARs into ${project.build.directory}/lib, which matches the lib/ classpathPrefix configured in the maven-jar-plugin. When packaging is complete, execute one of the following commands:

$ java -jar hadoop-hdfs-1.0.jar
# or specify the main class manually
$ java -cp hadoop-hdfs-1.0.jar com.definesys.hadoop.HDFS

There is also a plugin called maven-assembly-plugin, which is not recommended for packaging here because it unpacks all dependencies and repacks them into a single JAR. Some of Hadoop's mechanisms are implemented through SPI, and unpacking can cause the SPI configuration files from different JARs to overwrite one another.
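As an aside, the SPI mechanism referred to here is Java's ServiceLoader: Hadoop discovers FileSystem implementations through service files named META-INF/services/org.apache.hadoop.fs.FileSystem inside each JAR, and a fat JAR that keeps only one of these files loses the entries from the others. The following is a minimal sketch, not part of the original article (the class name ListFileSystems is only illustrative), that prints the implementations visible on the current classpath, assuming the dependencies declared above are available:

package com.definesys.hadoop;

import org.apache.hadoop.fs.FileSystem;

import java.util.ServiceLoader;

/**
 * Minimal sketch: list the FileSystem implementations that the SPI mechanism
 * can discover on the current classpath. If a fat JAR has overwritten the
 * META-INF/services entries, some implementations will be missing here.
 */
public class ListFileSystems {
    public static void main(String[] args) {
        for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
            // e.g. org.apache.hadoop.hdfs.DistributedFileSystem,
            // org.apache.hadoop.fs.LocalFileSystem, ...
            System.out.println(fs.getClass().getName());
        }
    }
}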

A simple HDFS operation class

package com.definesys.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.FileInputStream;
import java.io.IOException;


/**
 * @Description:
 * @author: jianfeng.zheng
 * @since: 2020/12/14 12:36 AM
 * @history: 1.2020/12/14 created by jianfeng.zheng
 */
public class HDFS {

    public static void main(String[] cmd) throws IOException {
        HDFS hdfs = new HDFS();
        hdfs.mkdir("/hdfsDemo");
        hdfs.putFile("/root/hello.txt", "/hdfsDemo");
        hdfs.downloadFile("/hdfsDemo/hello.txt", "/root/hello-hdfs.txt");
        hdfs.deleteFile("/hdfsDemo");
    }

    public boolean mkdir(String path) throws IOException {
        FileSystem fs = this.getHDFSFileSystem();
        return fs.mkdirs(new Path(path));
    }

    public void putFile(String localPath, String hdfsPath) throws IOException {
        this.getHDFSFileSystem().copyFromLocalFile(new Path(localPath), new Path(hdfsPath));
    }

    public void deleteFile(String path) throws IOException {
        this.getHDFSFileSystem().delete(new Path(path), true);
    }

    public void downloadFile(String hdfsPath, String localPath) throws IOException {
        this.getHDFSFileSystem().copyToLocalFile(new Path(hdfsPath), new Path(localPath));
    }

    private FileSystem getHDFSFileSystem() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000/");
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        try {
            FileSystem fs = FileSystem.get(conf);
            return fs;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

Problems

Permission problems

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)

HDFS file permissions are similar to Linux permissions: different users have different permissions on files and directories. If you do not specify a user name in your code, the operating system user running the program is used as the HDFS user. In this case, it is root.

$ hadoop fs -ls /
Found 5 items
drwxr-xr-x   - asan   supergroup          0 2020-12-16 10:07 /001
drwx-w----   - hadoop supergroup          0 2020-12-07 10:54 /tmp
drwxr-xr-x   - hadoop supergroup          0 2020-12-07 11:05 /user

# root path access
$ hadoop fs -ls -d /
drwxr-xr-x   - hadoop supergroup          0 2020-12-18 00:42 /
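To check which user name the client will actually present to the NameNode, a quick sketch, not part of the original article (the class name WhoAmI is only illustrative), is to ask Hadoop's UserGroupInformation directly. Without Kerberos and without HADOOP_USER_NAME set, it falls back to the operating system user running the JVM:

package com.definesys.hadoop;

import org.apache.hadoop.security.UserGroupInformation;

import java.io.IOException;

/**
 * Sketch: print the user name the HDFS client will send to the NameNode.
 * With simple authentication and no HADOOP_USER_NAME override, this is the
 * operating system user running the JVM (root in the example above).
 */
public class WhoAmI {
    public static void main(String[] args) throws IOException {
        // Uncomment to override the client user before any HDFS call:
        // System.setProperty("HADOOP_USER_NAME", "hadoop");
        System.out.println(UserGroupInformation.getCurrentUser().getUserName());
    }
}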

There are several solutions

  • Change the root path permissions or other folder permissions to 777
$ hadoop fs -chmod 777 /demo

$ hadoop fs -ls -d /demo
drwxrwxrwx   - hadoop supergroup          0 2020-12-18 00:46 /demo
  • Disable permission checking

Add the following configuration to hdfs-site.xml on the master node:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
  • Adding a username configuration to your code (recommended)
System.setProperty("HADOOP_USER_NAME", "hadoop");

This line must be executed before any HDFS operation is performed.
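An alternative to the system property, shown here only as a sketch (the class name HDFSAsUser is illustrative), is to pass the user name directly to FileSystem.get, which scopes the user to that one FileSystem instance instead of the whole JVM:

package com.definesys.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Sketch: obtain a FileSystem as a specific user without setting the
 * HADOOP_USER_NAME system property. The "hadoop" user and the master:9000
 * address follow the examples above.
 */
public class HDFSAsUser {
    public static FileSystem getFileSystem() throws IOException {
        Configuration conf = new Configuration();
        try {
            // The third argument is the user name presented to the NameNode
            return FileSystem.get(new URI("hdfs://master:9000/"), conf, "hadoop");
        } catch (InterruptedException | URISyntaxException e) {
            throw new IOException(e);
        }
    }
}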