Basic introduction


  • Client

    • Splits files into blocks before they are stored
    • Interacts with the NameNode to obtain a file's location information
    • Interacts with DataNodes to read or write data
    • Provides commands to manage and access HDFS
  • NameNode: the master of the cluster

    • Manages the HDFS namespace
    • Manages block-to-DataNode mapping information in memory; this mapping is not persisted because it would be too large
    • Configures the replica policy
    • Handles client read and write requests
  • DataNode: Executes NameNode directives

    • Stores the actual data blocks
    • Performs read and write operations on data blocks
  • SecondaryNameNode (SNN)

    • Assists the NameNode
    • Periodically merges the fsimage and edits files and pushes the result back to the NameNode
    • In an emergency, the NameNode's metadata can be recovered from the SNN, to the state of up to one hour ago
  • A distributed file system solves the problem of massive, cross-machine storage: files spread across a cluster are managed as one unified file system
  • Suitable for storing large amounts of data; it provides a unified access interface and can be operated like an ordinary file system
  • Distributed storage scheme

    • Split large files into blocks and place them on different servers
  • No matter how large a file is, it is stored in blocks. The default block size is 128 MB and the default replication factor of each block is 3
  • Metadata is stored in NameNode memory, with a backup of the metadata on disk
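The block-splitting rule above can be sketched in Python (a minimal illustration; `split_into_blocks` is a made-up helper, not a real HDFS API):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block;
# the last block is stored at its actual size, not padded to 128 MB.
print([s // (1024 * 1024) for s in split_into_blocks(300 * 1024 * 1024)])  # [128, 128, 44]
```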
  • Design goals

    • Hardware failure handling: detection and quick recovery are the core
    • Streaming data access: suited to batch processing rather than interactive use, emphasizing high-throughput data access
    • Files should be as large as possible: the metadata of many small files uses up NameNode memory
    • Moving computation (computing inside HDFS) is cheaper than moving data (pulling data to the local machine)
  • Application scenarios

    • GB-, TB-, and PB-scale data storage with batch, high-throughput access
    • Write once, read many; no modification needed. HDFS does not support random modification of file data, which suits append-only historical data (e.g., weather records)
    • Can run on ordinary cheap machines
    • High fault tolerance
    • Easy to scale out
  • Inapplicable scenario

    • Scenarios requiring random modification of data
    • Interactive (low-latency) scenarios
    • Scenarios storing large numbers of small files


  • Heartbeat mechanism (hdfs-default.xml): every DataNode sends a heartbeat to the NameNode every 3 seconds. If the NameNode receives no heartbeat for 30 seconds (10 missed heartbeats), it marks the DataNode as stale and starts actively checking it; the check interval is 5 minutes, and after two checks with no heartbeat the node is declared dead. With the default configuration this works out to 2 × 300 s + 10 × 3 s = 630 seconds before a DataNode is pronounced dead. Independently of heartbeats, each DataNode reports its full block list to the NameNode every six hours
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Determines datanode heartbeat interval in seconds.</description>
</property>
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value>
  <description>
    This time decides the interval to check for expired datanodes. With this
    value and dfs.heartbeat.interval, the interval of deciding the datanode
    is stale or not is also calculated. The unit of this configuration is
    millisecond.
  </description>
</property>
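The 630-second figure follows from HDFS's timeout formula, 2 × recheck-interval + 10 × heartbeat-interval. A small sketch of the arithmetic (the function name is illustrative):

```python
def datanode_timeout_seconds(recheck_interval_ms=300000, heartbeat_interval_s=3):
    """Timeout before a DataNode is declared dead, per the HDFS formula:
    2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval."""
    return 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s

print(datanode_timeout_seconds())  # 630.0 seconds with the defaults
```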
  • Load balancing: the NameNode keeps the number of blocks on each DataNode roughly equal; the balancer (hdfs balancer -threshold 10) keeps disk usage within 10% across nodes
  • Replica mechanism: each block has 3 copies by default. When the NameNode finds a block with fewer than 3 replicas, it instructs a chosen node to create another copy until 3 exist; when a block has more than 3, it instructs a node to delete the surplus. If replication cannot be maintained (e.g., more than half the machines are down), the NameNode puts HDFS into safe mode and refuses writes. Each DataNode reports its block information to the NameNode; once the NameNode verifies data integrity, it exits safe mode

    • Matters needing attention

      • Blocks are 128 MB when stored, but a block smaller than 128 MB is saved at its actual size
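The NameNode's replica-maintenance decision can be sketched as follows (an illustrative helper, not real HDFS code):

```python
TARGET_REPLICAS = 3  # default replication factor

def replica_action(current_replicas, target=TARGET_REPLICAS):
    """Decide what the NameNode should do for a block with the given replica count."""
    if current_replicas < target:
        return "create %d replica(s)" % (target - current_replicas)
    if current_replicas > target:
        return "delete %d replica(s)" % (current_replicas - target)
    return "ok"

print(replica_action(1))  # create 2 replica(s)
print(replica_action(4))  # delete 1 replica(s)
```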
  • Replica storage (rack aware)

    - The first replica is stored on a machine on the rack nearest the client; if racks are equally near, one is chosen at random
    - The second replica is stored on a node in a different rack from the first
    - The third replica is stored on another node of the second replica's rack
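The placement policy above can be sketched like this (an illustrative helper, not the real BlockPlacementPolicyDefault; it assumes at least two racks, each with at least two nodes):

```python
import random

def place_replicas(nodes_by_rack, client_rack):
    """nodes_by_rack: {rack: [node, ...]}; returns three (rack, node) placements."""
    racks = list(nodes_by_rack)
    # First replica: on the client's rack if known, otherwise a random rack.
    first_rack = client_rack if client_rack in nodes_by_rack else random.choice(racks)
    first = (first_rack, random.choice(nodes_by_rack[first_rack]))
    # Second replica: a node on a different rack from the first.
    second_rack = random.choice([r for r in racks if r != first_rack])
    second = (second_rack, random.choice(nodes_by_rack[second_rack]))
    # Third replica: another node on the second replica's rack.
    third = (second_rack,
             random.choice([n for n in nodes_by_rack[second_rack] if n != second[1]]))
    return [first, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(topology, "rack1"))
```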

HDFS operations

  • hdfs dfs -put localpath hdfspath: upload to HDFS
  • hdfs dfs -moveFromLocal localpath hdfspath: move a local file to HDFS
  • hdfs dfs -get hdfspath localpath: download from HDFS
  • hdfs dfs -getmerge hdfspath1 hdfspath2 localpath: download from HDFS and merge into one local file
  • hdfs dfs -rm [-r] [-f] [-skipTrash] path: delete; without -skipTrash, data is moved to the trash, where it is kept for a configured time
  • hdfs dfs -du [-h] file: show space used
  • hdfs dfs -chown [-R] user:group file: change owner
  • hdfs dfs -appendToFile srclocalfilepath hdfsfilepath: append a local file to an HDFS file

HDFS advanced shell

  • hdfs dfsadmin -safemode get/enter/leave

    • About safe mode: a Hadoop protection mechanism. While in safe mode, data integrity is checked; the default minimum replica ratio is 0.999, DataNodes delete redundant copies, and only read requests are accepted
    • If blocks are missing, HDFS cannot exit safe mode on its own; leave it manually and then clean up: 1. if the files are unimportant, delete them all, skipping the recycle bin; 2. if the data is important, download the files to local disk, delete the corresponding HDFS files while skipping the recycle bin, and then upload them again
  • Benchmarking: test the overall throughput of a newly built HDFS cluster; take the maximum value, and average several runs with different parameters
hadoop jar /export/server/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB

HDFS principle

The NameNode stores the metadata process

  • The entire system's namespace and block address map are held in memory, so the system's total storage capacity is limited by the memory size of the NameNode
  • When recording metadata, the NameNode first writes the operation to the edits log and then applies it to memory; only then does the operation succeed

    • When an edits file grows to 64 MB, it is closed and a new edits file is opened
    • A new edits file is also opened every hour
    • A restart also opens a new edits file
    • Edits files are small and should not accumulate; they are merged, and the merged result is the fsimage file. As merging continues, this file grows, approaching the size of the in-memory metadata
    • Fsimage is a relatively complete snapshot of the metadata. When HDFS starts, it first loads the fsimage file into memory, then replays the edits files into memory; at that point all metadata has been restored
    • SecondaryNameNode: helps the NameNode merge edits into the fsimage
  • File data streams do not pass through the NameNode; the NameNode only tells the client which DataNodes should handle the data
  • The NameNode makes replica placement decisions based on a global view of the cluster
  • For the heartbeat mechanism and status reports, see the architecture section above
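The startup recovery sequence (load fsimage, then replay edits) can be sketched as follows (illustrative, not real HDFS code):

```python
def restore_namespace(fsimage, edits):
    """fsimage: {path: metadata}; edits: list of ("create"/"delete", path, meta)."""
    namespace = dict(fsimage)          # 1. load the fsimage checkpoint into memory
    for op, path, meta in edits:       # 2. replay each logged operation on top of it
        if op == "create":
            namespace[path] = meta
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

fsimage = {"/a.txt": {"blocks": 1}}
edits = [("create", "/b.txt", {"blocks": 2}), ("delete", "/a.txt", None)]
print(restore_namespace(fsimage, edits))  # {'/b.txt': {'blocks': 2}}
```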

HDFS read and write process

  • write

  • read

SNN (secondarynamenode)

The first step is to understand what the fsimage and edits files do

Edits: holds the operation log

Fsimage: generated by merging edits files; the merging is done by the SNN

  • When an edits file exceeds 64 MB, the SNN notifies the NameNode to create a new one
  • A new edits file is created every hour
  • A new edits file is created on restart
  • In special circumstances, the SNN can be used for metadata recovery, restoring the state of up to one hour ago
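The three roll conditions above can be sketched as a single check (a minimal illustration; the 64 MB and one-hour thresholds are the values from these notes):

```python
EDITS_SIZE_LIMIT = 64 * 1024 * 1024   # roll when the edits file reaches 64 MB
ROLL_INTERVAL_S = 3600                # roll at least once per hour

def should_roll_edits(edits_size, seconds_since_roll, restarting=False):
    """True if a new edits file should be opened."""
    return (edits_size >= EDITS_SIZE_LIMIT
            or seconds_since_roll >= ROLL_INTERVAL_S
            or restarting)

print(should_roll_edits(70 * 1024 * 1024, 120))  # True: size limit reached
print(should_roll_edits(1024, 4000))             # True: hourly roll due
print(should_roll_edits(1024, 120))              # False
```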