Learning HDFS

1. Know the properties of the Hadoop Distributed File System (HDFS)
2. Know the methods for accessing HDFS
3. Use HDFS commands or the Ambari Files View to manage files and directories
4. Use the REST API to access HDFS

1. Hadoop and Storage

Hadoop is extensible and designed to work with different file systems; users and programs can access a specific file system from the command line by using the corresponding URI prefix. For example, `hdfs dfs -ls` lists the contents of a directory, and the same command can use different prefixes to reach different file systems:

1. Display the contents of /bin on the local file system:

```
hdfs dfs -ls file:///bin
```

2. Display the contents of /root on HDFS:

```
hdfs dfs -ls hdfs:///root
```

3. The default file system is HDFS, which is determined by the fs.defaultFS property in core-site.xml:

fs.defaultFS is set in core-site.xml, for example fs.defaultFS=hdfs://<NameNode_host>:8020. Unless otherwise specified, the HDFS client attempts to connect to the HDFS NameNode on port 8020 using RPC.
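As a sketch, the same setting expressed as the property it would be in core-site.xml; the host name namenode.example.com is a placeholder:

```
<!-- core-site.xml: the default file system used by HDFS clients -->
<property>
  <name>fs.defaultFS</name>
  <!-- namenode.example.com is a placeholder host name -->
  <value>hdfs://namenode.example.com:8020</value>
</property>
```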

Note: even though the default file system is HDFS, `hdfs dfs -ls file:///bin` still displays the contents of /bin on the local file system.

   

2. Accessing HDFS

Hadoop provides many ways to access HDFS, including: the HDFS shell, WebHDFS (a REST API), the HDFS NFS Gateway, the Java API, the Ambari Files View, and Hue (Hadoop User Experience).
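As a quick illustration of the WebHDFS REST API listed above, the sketch below lists a directory and reads a file over HTTP. The host name, port, and paths are placeholders (9870 is the default NameNode HTTP port in Hadoop 3.x; older clusters often use 50070):

```
# List a directory via WebHDFS (roughly equivalent to hdfs dfs -ls /user/saad)
curl -i "http://namenode.example.com:9870/webhdfs/v1/user/saad?op=LISTSTATUS"

# Read a file via WebHDFS; -L follows the redirect to the DataNode that serves the data
curl -i -L "http://namenode.example.com:9870/webhdfs/v1/user/saad/data.txt?op=OPEN"
```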

“`

3. NameNode and DataNode Introduction

The HDFS service has two components:

NameNode (active) -> the master node

DataNode (slave) -> a worker node

The NameNode manages the file system namespace and metadata:

File and directory names

Hierarchy of the file system

Permissions and owners

Last modification times

Access control lists (ACLs)

A DataNode only holds data blocks; it knows nothing about the files they belong to:

By default, the size of a data block is 128 MB. Each block is assigned a unique ID, and the mapping from blocks to file names is maintained by the NameNode; only the NameNode knows which blocks make up which file.
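To see how a particular file is split into blocks and which DataNodes hold the replicas, one option is `hdfs fsck`; the path below is a placeholder used only for illustration:

```
# Show the blocks, their IDs, and the DataNodes that store them for one file
hdfs fsck /user/saad/data.txt -files -blocks -locations
```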

The following diagram shows how the NameNode and DataNodes fit together:

4. File and Directory Attributes

The attributes of files and directories in HDFS are basically the same as in Linux. Use the `hdfs dfs -ls` command to view the contents of a directory together with the attributes of its files and subdirectories, as shown below:
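Illustrative output only (the paths, owners, sizes, and timestamps are made up); the columns are permissions, replication factor, owner, group, size, modification time, and name:

```
$ hdfs dfs -ls /user/saad
Found 2 items
drwxr-xr-x   - saad hdfs          0 2024-05-01 10:15 /user/saad/reports
-rw-r--r--   3 saad hdfs       4096 2024-05-01 10:20 /user/saad/data.txt
```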

5. HDFS Home Directories

Home directories control data access through HDFS permissions. An HDFS administrator can create home directories for all users, and some applications (Hive, for example) can also create home directories.

For example, the user saad has read and write access to his home directory. This means that only the HDFS superuser and saad can create files and subdirectories in saad's home directory.

Once saad has his own home directory, he can create files and subdirectories inside it and set different permissions on them to control access by other users.
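A minimal sketch of the workflow just described, assuming the user saad (the first three commands would be run as the HDFS superuser):

```
# Administrator: create the home directory and hand it over to saad
hdfs dfs -mkdir -p /user/saad
hdfs dfs -chown saad:saad /user/saad
hdfs dfs -chmod 750 /user/saad              # only saad (and the superuser) can write here

# saad: create content and restrict a subdirectory further
hdfs dfs -mkdir /user/saad/private
hdfs dfs -chmod 700 /user/saad/private
```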

6. HDFS Superuser

Many Hadoop services have the concept of a superuser. Similar to root on Linux and Administrator on Windows, Hadoop grants special rights to particular service accounts. The most common superuser is hdfs, which can perform any HDFS operation.

The HDFS superuser account is determined by the `dfs.cluster.administrators` property in `hdfs-site.xml`.

Run `su - hdfs` to obtain the privileges of the HDFS superuser.

Run `exit` to give up the superuser privileges.
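For example, a cluster-wide status report normally requires superuser privileges; a sketch of the round trip:

```
su - hdfs                  # switch to the hdfs superuser account
hdfs dfsadmin -report      # privileged operation: report capacity and DataNode status
exit                       # return to the original user
```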

7. HDFS Shell Operations

7.1 Help
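A sketch of how to get usage information from the HDFS shell:

```
hdfs dfs -help             # list every hdfs dfs subcommand with a description
hdfs dfs -help ls          # detailed help for a single subcommand
hdfs dfs -usage mkdir      # short usage line for a subcommand
```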

7.2 Directory Creation and Viewing
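A sketch of creating and listing directories; /user/saad is a placeholder path:

```
hdfs dfs -mkdir /user/saad/demo          # create a directory
hdfs dfs -mkdir -p /user/saad/a/b/c      # create nested directories, parents included
hdfs dfs -ls /user/saad                  # list a directory
hdfs dfs -ls -R /user/saad               # list a directory recursively
```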

7.3 Home Directory and Current Directory
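HDFS has no real notion of a current working directory; relative paths are resolved against the user's home directory, /user/<username>. A sketch, assuming the user saad:

```
hdfs dfs -ls                 # no path given: lists the home directory /user/saad
hdfs dfs -mkdir demo         # relative path: creates /user/saad/demo
hdfs dfs -ls demo            # lists /user/saad/demo
```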

7.4 Creating Files
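A sketch of creating files, both empty files and files uploaded from the local file system; the file names are placeholders:

```
hdfs dfs -touchz empty.txt                 # create a zero-length file in the home directory
hdfs dfs -put localfile.txt /user/saad/    # upload a local file into HDFS
hdfs dfs -copyFromLocal report.csv demo/   # same idea as -put
```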

7.5 Viewing File Contents
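A sketch of reading file contents from the shell; the path is a placeholder:

```
hdfs dfs -cat /user/saad/data.txt          # print the whole file
hdfs dfs -tail /user/saad/data.txt         # print the last kilobyte of the file
hdfs dfs -cat /user/saad/data.txt | head   # pipe to local tools for the beginning of the file
```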

7.6 File Copy and Movement
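A sketch of copying and moving data, both within HDFS and between HDFS and the local disk; all paths are placeholders:

```
hdfs dfs -cp /user/saad/data.txt /user/saad/backup/    # copy within HDFS
hdfs dfs -mv /user/saad/old.txt /user/saad/archive/    # move or rename within HDFS
hdfs dfs -get /user/saad/data.txt /tmp/                # download from HDFS to the local disk
```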

7.7 Deletion of Files and Directories
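A sketch of deleting files and directories; whether deleted items go to the trash depends on the cluster's trash configuration:

```
hdfs dfs -rm /user/saad/data.txt             # delete a file (moved to trash if trash is enabled)
hdfs dfs -rm -r /user/saad/demo              # delete a directory recursively
hdfs dfs -rm -skipTrash /user/saad/tmp.txt   # delete immediately, bypassing the trash
```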