Address: pengtuo.tech/2018/09/10/…

The Hadoop ecosystem is large and fully featured, but it still revolves around the distributed computing infrastructure named Hadoop. Hadoop's core is composed of four components: Common, HDFS, MapReduce, and YARN.

  1. Common: the common utilities that support the other Hadoop modules;
  2. HDFS: the Hadoop distributed file storage system;
  3. MapReduce: a programming model for parallel processing of large data sets;
  4. YARN: the resource manager in widespread use since the Hadoop architecture upgrade.

A small goal of mine is to write a blog post on each of the core components. This post takes a close look at HDFS.

1. Introduction

HDFS (the Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware. It differs from other distributed file systems in that HDFS is highly fault-tolerant and can be deployed on inexpensive hardware. In addition, HDFS provides high-throughput access to application data and is suitable for applications with large data sets.

Currently, HDFS is the core project of Apache Hadoop. The URL is hadoop.apache.org/

2. Advantages of HDFS

2.1 Hardware Failure Tolerance

An HDFS instance may contain hundreds or thousands of servers, each of which stores a portion of the file system data. In this case, hardware failures are normal. HDFS can detect faults and recover from them quickly and automatically.

2.2 Stream Data Access

Designed for batch processing rather than interactive use by users, HDFS focuses on high throughput for data access rather than low latency for data access.

2.3 Processing large data sets

The core purpose of HDFS is to support applications with large amounts of data. Files on HDFS typically range from gigabytes to terabytes in size. HDFS provides high aggregate data bandwidth, scales to hundreds of nodes in a single cluster, and supports tens of millions of files for a single application.

2.4 Simple consistent model

HDFS applications use a "write once, read many" file access model. This model simplifies data-consistency problems and enables high-throughput data access. The official documentation indicates that support for appending to files is planned.

2.5 Moving Computation Instead of Moving Data

"Moving computation is cheaper than moving data." A computation is most efficient when it executes on the same physical node as the data it operates on, especially when the data volume is very large. Moving the computation minimizes network congestion and increases overall system throughput. HDFS is therefore designed to move computation closer to where the data resides, rather than moving data to where the application runs, and it provides interfaces for applications to do exactly that.

2.6 Portability across heterogeneous hardware and software platforms

HDFS is designed to be portable from one platform to another. This has contributed to the widespread adoption of HDFS as the preferred big data processing platform for a large number of applications.

3. NameNode & DataNodes

NameNode and DataNode are key concepts in HDFS. HDFS has a master/slave architecture: an HDFS cluster consists of a single NameNode and multiple DataNodes. Files are divided into one or more blocks, which are stored on a set of DataNodes.

Because HDFS is built in Java, the NameNode and DataNodes can run on any machine that supports Java, and thanks to Java's portability HDFS can be deployed very widely. In a typical deployment, one physical host is dedicated to running the NameNode, while each of the other hosts runs one DataNode.

The existence of a single NameNode greatly simplifies the system's architecture. The NameNode is the arbiter and repository for all HDFS metadata; the system is designed so that user data never flows through the NameNode.

The architecture is shown below:

First, a rack can be understood as a group of machines with its own internal network; a cluster is typically spread across multiple racks. Second, DataNodes store blocks rather than whole files: in HDFS, each large file is divided into multiple blocks, and these blocks are distributed across different DataNodes. In addition, each block has multiple replicas, stored on other DataNodes.

The figure above illustrates the two operations, "read" and "write":

  1. When a client writes a file to HDFS, the file is split into blocks as shown in the diagram and written to DataNodes in both racks (in general, two physical hosts in two different racks). Note that the file data does not pass through the NameNode. See section 7 (Data Replication Pipeline) for the write process.
  2. When a client reads a file from HDFS, the request goes to the NameNode, which translates it into commands for the DataNodes holding the corresponding blocks; those DataNodes then return the required data to the client.

There is also a node not shown in the figure, called the Secondary NameNode. It is an auxiliary background process that communicates with the NameNode and periodically merges the EditLog into a new snapshot of the HDFS metadata (a checkpoint). Despite its name, it is not a hot standby: it does not automatically take over when the NameNode fails.

NameNode

NameNode is the primary server that manages file system namespaces. It is used to manage clients’ access to files and perform file system namespace operations, such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to Datanodes.

NameNode makes all the decisions about block replication and periodically receives Heartbeat and block reports from each DataNode in the cluster. Receiving a Heartbeat means that the DataNode is running properly, and the Blockreport contains a list of all blocks on the DataNode.
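As a toy illustration of how heartbeats can drive liveness decisions, here is a sketch. The names are invented and this is not Hadoop code; the 10-minute timeout is an assumption based on common Hadoop configurations:

```python
# Illustrative model of a NameNode-like process deciding a DataNode is
# dead once heartbeats stop arriving, while block reports keep its view
# of each node's stored blocks up to date.

HEARTBEAT_TIMEOUT = 10 * 60  # seconds (assumed; configurable in Hadoop)

class HeartbeatMonitor:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}   # datanode id -> last heartbeat time
        self.blocks = {}      # datanode id -> set of block ids

    def heartbeat(self, node_id, now):
        self.last_seen[node_id] = now

    def block_report(self, node_id, block_ids):
        # A full block report replaces the previous view of the node.
        self.blocks[node_id] = set(block_ids)

    def dead_nodes(self, now):
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.block_report("dn1", ["blk_1", "blk_3"])
monitor.heartbeat("dn1", now=500)
# dn2 has been silent for 700 s > 600 s; dn1 checked in 200 s ago.
print(monitor.dead_nodes(now=700))  # ['dn2']
```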

DataNode

There is usually one DataNode per node in the cluster. DataNodes store the data and serve read and write requests from file system clients. They also create, delete, and replicate blocks according to instructions from the NameNode.

4. HDFS File System Namespace and Metadata

HDFS supports a traditional hierarchical file organization. The file system namespace hierarchy is similar to that of most existing file systems: a user or application can create folders and store files in them. However, HDFS does not support hard links or soft links as Linux does. The NameNode maintains the file system namespace, records any change to the namespace or its properties, and stores each file's replication factor.

The number of copies of a data block is called the replication factor of that block.

File system metadata is also stored in the NameNode, which uses a transaction log called the EditLog to persist every change to the metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog. The NameNode stores the EditLog as a file in its local host OS file system.

The entire file system namespace (including block-to-file and file system attributes mapping) is stored in a file named FsImage. FsImage is also stored as a file in NameNode’s local file system.

Persistence of metadata

The NameNode keeps the entire file system namespace and file-to-block mapping in memory. When the NameNode starts, or when a checkpoint is triggered by a configurable threshold, it reads the FsImage and EditLog from disk, loads the file system metadata from the FsImage into memory, and then applies all transactions from the EditLog to the in-memory FsImage. Finally, it flushes this new version back out to the FsImage on disk. It can then truncate the old EditLog, because its transactions have been applied to the persistent FsImage. This whole process is called a checkpoint.
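The checkpoint procedure described above can be sketched in miniature. This is a simplified model, not Hadoop's implementation; the transaction format and function name are invented for illustration:

```python
# Minimal model of a checkpoint: load a point-in-time image of the
# namespace, replay the EditLog's transactions on top of it, keep the
# merged result as the new image, and truncate the log.

def checkpoint(fsimage, editlog):
    """fsimage: dict of path -> metadata; editlog: list of transactions."""
    namespace = dict(fsimage)          # load FsImage into memory
    for op, path, value in editlog:    # apply EditLog transactions in order
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "set_replication":
            namespace[path]["replication"] = value
    new_fsimage = dict(namespace)      # persist the merged image
    editlog.clear()                    # old log can now be truncated
    return new_fsimage

fsimage = {"/a.txt": {"replication": 3}}
editlog = [("create", "/b.txt", {"replication": 2}),
           ("set_replication", "/a.txt", 2),
           ("delete", "/b.txt", None)]
print(checkpoint(fsimage, editlog))  # {'/a.txt': {'replication': 2}}
```

Note that reads and edits continue against the in-memory namespace; the checkpoint only reconciles the on-disk image with the accumulated log.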

The purpose of a checkpoint is to ensure that HDFS has a consistent view of the file system metadata by taking a snapshot of it and saving it to the FsImage. Although reading the FsImage from memory is efficient, incrementally rewriting the FsImage file on every change would not be; so instead of mutating the FsImage for each edit, we accumulate the edits in the EditLog.

During a checkpoint, the EditLog changes are applied to the FsImage. A checkpoint can be triggered after a given time interval in seconds (`dfs.namenode.checkpoint.period`), or after a given number of file system transactions have accumulated (`dfs.namenode.checkpoint.txns`). If both properties are set, a checkpoint is triggered as soon as either threshold is met.
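In `hdfs-site.xml`, these two thresholds are set like so (the values shown are the usual defaults, one hour and one million transactions):

```xml
<!-- hdfs-site.xml: the two checkpoint triggers -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>  <!-- checkpoint at least every hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>  <!-- or after one million transactions -->
</property>
```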

5. Data Replication

HDFS is designed to reliably store very large files across machines in a large cluster. Each file is stored as a sequence of blocks; all blocks of a file except the last are the same size. The default block size in HDFS is 128 MB. Blocks are replicated for fault tolerance, and replicas of a block are generally stored on different DataNodes. The block size and replication factor are configurable per file.
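The fixed-block layout can be illustrated with a small sketch. Units are scaled down for readability, and this is an illustration rather than HDFS code:

```python
# How a file's byte range maps onto fixed-size blocks, with only the
# last block allowed to be smaller.

BLOCK_SIZE = 128  # pretend "MB" units; the real HDFS default is 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 "MB" file becomes two full blocks plus a 44 "MB" tail block.
print(split_into_blocks(300))  # [(0, 128), (128, 128), (256, 44)]
```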

Files in HDFS are write-once, and there is strictly one writer at any time.

Explanation: as shown in the figure, the part-0 file has a replication factor of 2 and is split into blocks {1, 3}: block 1 is stored on the first and third DataNodes, and block 3 on the fifth and sixth DataNodes. The same applies to the part-1 file. All of this mapping information is stored in the NameNode.

HDFS Replica Placement Policy

This is just a brief introduction based on the figure. The actual HDFS replica placement policy is a topic worth studying in its own right, because it really matters for HDFS reliability and performance; an optimized replica placement policy is one advantage HDFS holds over other distributed file systems.

In the common case, when the replication factor is r = 3, the HDFS placement policy puts one replica on the writer's own DataNode, the second replica on a node in another (remote) rack, and the last replica on a different node in that same remote rack.

This placement policy effectively reduces inter-rack traffic and improves system performance: nodes in different racks communicate through switches, and in most cases the network bandwidth between machines in the same rack is greater than between machines in different racks.
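The default r = 3 placement can be sketched as follows. This is a simplified model with invented names, not Hadoop's actual placement code:

```python
import random

# Simplified r = 3 placement: first replica on the writer's node,
# second on a node in a different rack, third on a different node
# in that same remote rack.

def place_replicas(writer_node, topology):
    """topology: dict of rack name -> list of nodes; writer_node must appear."""
    local_rack = next(r for r, nodes in topology.items()
                      if writer_node in nodes)
    remote_racks = [r for r in topology if r != local_rack]
    remote = random.choice(remote_racks)
    second, third = random.sample(topology[remote], 2)  # distinct nodes
    return [writer_node, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4", "dn5"]}
targets = place_replicas("dn1", topology)
# e.g. ['dn1', 'dn4', 'dn3']: one local copy, two on the remote rack
```

One consequence visible even in this sketch: a whole-rack failure loses at most two of the three replicas, while a write still crosses the rack boundary only once.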

If the replication factor is greater than 3, the placement of the fourth and subsequent replicas is determined randomly, while keeping the number of replicas per rack below an upper limit.

Generally, the upper limit is (replicas - 1) / racks + 2.

Because the NameNode does not allow a DataNode to hold multiple replicas of the same block, the maximum possible replication factor is the total number of DataNodes.
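The per-rack cap quoted above can be written out as a function (integer division assumed; the function name is mine):

```python
# Upper limit on replicas per rack, per the formula above:
# (replicas - 1) / racks + 2, with integer division.

def max_replicas_per_rack(replicas, racks):
    return (replicas - 1) // racks + 2

# With 10 replicas spread over 3 racks, no rack holds more than 5 copies.
print(max_replicas_per_rack(10, 3))  # 5
```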

When a client requests a read, HDFS preferentially serves it from the replica closest to the client, minimizing global bandwidth consumption and read latency.

6. Communication protocol

All HDFS communication protocols are layered on top of TCP/IP. A client connects to a configurable TCP port on the NameNode machine and speaks the ClientProtocol; DataNodes talk to the NameNode using the DataNode Protocol. Both are wrapped in a remote procedure call (RPC) abstraction, and by design the NameNode only responds to RPC requests issued by clients and DataNodes, never initiating them itself.

7. Data replication pipeline

When the client writes data to the HDFS file whose replication factor is R = 3, NameNode uses the Replication target choosing algorithm to retrieve the DataNode list. This list contains the Datanodes that will host the block copy.

The client then writes to the first DataNode, which receives the data in small portions, writes each portion to its local storage, and transfers that portion to the second DataNode in the list. The second DataNode likewise receives each portion, writes it to its storage, and then forwards it to the third DataNode. Finally, the third DataNode writes the data to its local storage.

As can be seen, datanodes receive data from the previous DataNode in the pipeline and forward the data to the next DataNode in the pipeline. Data flows from one DataNode to the next DataNode.
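The store-and-forward pipeline can be modeled in a few lines. This is a toy simulation with invented names, not the real DataNode transfer protocol:

```python
# Each node in the NameNode-supplied target list stores the chunk it
# receives, then forwards it to the next node in the pipeline.

def pipeline_write(chunks, targets):
    """chunks: list of data pieces; targets: ordered DataNode names."""
    storage = {t: [] for t in targets}
    for chunk in chunks:
        for node in targets:       # store locally, then forward downstream
            storage[node].append(chunk)
    return storage

result = pipeline_write(["part1", "part2"], ["dn1", "dn2", "dn3"])
print(result["dn3"])  # ['part1', 'part2']: every replica holds every chunk
```

The point of the pipeline is that the client pushes each byte over the network only once; replication bandwidth is paid by the DataNodes forwarding along the chain.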

8. Operating on HDFS

Applications can access HDFS files in a variety of ways. The FS Shell lets users interact with HDFS much like a Linux file system. Common commands include:

| Action | Command |
| --- | --- |
| Create the /foodir folder | `bin/hadoop fs -mkdir /foodir` |
| Delete a folder | `bin/hadoop fs -rm -R /foodir` |
| View the contents of a file | `bin/hdfs dfs -cat /foodir/myfile.txt` |
| Upload a file | `bin/hdfs dfs -copyFromLocal ~/a.txt /foodir/` |

There are two command prefixes: `hadoop fs` and `hdfs dfs`.

The difference is that `hadoop fs` works with file systems other than HDFS as well, so it has a wider range of use, while `hdfs dfs` is specific to the HDFS distributed file system.

There is also a `hadoop dfs` prefix, which is deprecated and 🙅 not recommended.

9. Space Reclamation

9.1 Deleting and Undeleting files

If the trash configuration is enabled, files deleted through the FS Shell are not immediately removed from HDFS; instead, HDFS moves them to a trash directory (/user/<username>/.Trash).

Once a deleted file's time in the trash expires, the NameNode removes it from the HDFS namespace, which causes the blocks associated with the file to be freed.

Note: There may be a significant time delay between the time the user deletes the file and the corresponding increase in available space in HDFS.

Even with the trash configuration enabled, a file can be deleted immediately and permanently, bypassing the trash, with `hadoop fs -rm -r -skipTrash a.txt`.

9.2 Reducing replication Factors

When a file's replication factor is reduced, the NameNode selects the surplus replicas to delete and passes this information to the DataNodes on the next heartbeat. The DataNodes then delete the corresponding blocks, and the freed space becomes available in the cluster.
