HDFS profile

HDFS

Hadoop Distributed File System

Characteristics

1. It keeps multiple copies of data and provides a fault-tolerance mechanism: if a copy is lost or a node goes down, it recovers automatically. Three copies are kept by default.
2. It runs on cheap machines.
3. It is suitable for big data processing. By default, HDFS splits files into blocks; 64MB is one block. Blocks are stored on HDFS, and the mapping from files to blocks is kept in memory on the NameNode. If there are too many small files, the memory burden becomes very heavy. (Note: HDFS is not suitable for storing small files: many small files create a large memory burden.)
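The block-splitting arithmetic can be sketched in Python (an illustration, not Hadoop code; the 64MB size is the default block size described above):

```python
# Sketch (not HDFS source): splitting a file into fixed-size blocks,
# as HDFS does with its default 64MB block size.
BLOCK_SIZE = 64 * 1024 * 1024  # 64MB default block size
REPLICAS = 3                   # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size of each block a file of file_size bytes occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 100MB file becomes one 64MB block and one 36MB block:
sizes = split_into_blocks(100 * 1024 * 1024)
print([s // (1024 * 1024) for s in sizes])  # [64, 36]
```

Note that a 1-byte file still occupies one block entry in the NameNode's memory, which is why many small files are a burden.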

System structure

Master and Slave structures.

There are three roles: NameNode, SecondaryNameNode, and DataNode.

NameNode

The Master node, the big leader.

1. Manages the mapping of data blocks;
2. Handles clients' read and write requests;
3. Configures the replica policy;
4. Manages the HDFS namespace;
5. In memory, the NameNode holds metadata = fsimage + edits.

SecondaryNameNode

The younger brother, who shares the workload of the elder brother NameNode. By default, every hour the SecondaryNameNode fetches the fsimage and edits from the NameNode, merges them, and sends the result back to the NameNode. This reduces the NameNode's workload, and it serves as a cold backup of the NameNode.
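The checkpoint merge can be pictured as replaying the edits log on top of the fsimage. A minimal Python sketch (illustrative, not Hadoop code; the dict-based namespace and the create/delete operations are invented for the example):

```python
# Sketch of the SecondaryNameNode's checkpoint merge:
# fsimage is a snapshot of the namespace; edits is a log of changes.
# Merging = replaying the edits log on top of a copy of the fsimage.

def merge_checkpoint(fsimage, edits):
    """Replay the edits log onto a copy of the fsimage; return the new fsimage."""
    new_image = dict(fsimage)
    for op, path in edits:
        if op == "create":
            new_image[path] = {}
        elif op == "delete":
            new_image.pop(path, None)
    return new_image

fsimage = {"/user": {}, "/user/a.txt": {}}
edits = [("create", "/user/b.txt"), ("delete", "/user/a.txt")]
print(sorted(merge_checkpoint(fsimage, edits)))  # ['/user', '/user/b.txt']
```

After the merge, the NameNode can start a fresh, short edits log, which keeps its in-memory state and restart time manageable.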

DataNode

The Slave node, the worker. 1. Stores the data blocks sent by the client; 2. Performs read and write operations on data blocks.

Hot backup

B is a hot backup of A: if A fails, B immediately takes over A's work.

Cold backup

B is a cold backup of A: if A breaks down, B cannot take over A's work immediately, but B stores some of A's information, which reduces the loss when A fails.

fsimage

Metadata image file (the directory tree of the file system). (? Still don't fully get it)

edits

Metadata operation log (a record of changes made to the file system).

How it works

Write operation

Write operation schematic diagram

Write operation scenario

There is a file FileA with a size of 100MB. The client writes FileA to HDFS, which uses its default configuration. HDFS is distributed across three racks: Rack1, Rack2, and Rack3.

Write operation flow

  1. The client splits the 100MB FileA into 64MB blocks: Block1 (64MB) and Block2 (36MB);
  2. The client sends a write request to the NameNode, as shown by the blue dotted line ① in the figure;
  3. The NameNode records the block information and returns the available DataNodes, as shown by the pink dotted line ②:

    Block1: host2, host1, host3
    
    Block2: host7, host8, host4

    Placement principle

    The NameNode has RackAware rack awareness, which can be configured. If the client is itself a DataNode, the rules for placing a block's replicas are: Replica 1 goes on the same node as the client; Replica 2 on a node in a different rack; Replica 3 on another node in the same rack as Replica 2; any further replicas are placed randomly. If the client is not a DataNode, the rules are: Replica 1 goes on a randomly chosen node; Replica 2 on a node in a rack different from Replica 1's; Replica 3 on another node in the same rack as Replica 2; any further replicas are placed randomly.
  4. The client sends Block1 to the DataNodes; the sending process is a streaming write.

    Stream write process:
    
    1. The 64MB Block1 is divided into 64KB packets;
    2. The client sends the first packet to host2;
    3. host2 forwards the first packet to host1, while the client sends the second packet to host2;
    4. host1 forwards the first packet to host3, while receiving the second packet from host2;
    5. And so on, as shown by the solid red line in the figure, until Block1 is fully sent;
    6. host2, host1, and host3 then notify the NameNode, and host2 notifies the client that Block1 is "received completely", as shown by the solid pink lines;
    7. On receiving host2's message, the client tells the NameNode that Block1 is written. This completes Block1, as shown by the thick yellow line;
    8. After Block1 is sent, the client sends Block2 to host7, host8, and host4, as shown by the solid blue line;
    9. After Block2 is sent, host7, host8, and host4 notify the NameNode, and host7 notifies the client, as shown by the light green solid line;
    10. The client tells the NameNode "I'm done", as shown by the thick yellow line. The write is complete.
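The rack-aware placement rules in step 3 can be sketched in Python. This is an illustration, not Hadoop's actual placement code, and the cluster layout (which hosts sit on which racks) is invented for the example:

```python
import random

# Invented cluster layout for illustration: three racks, three hosts each.
RACKS = {
    "rack1": ["host1", "host2", "host3"],
    "rack2": ["host4", "host5", "host6"],
    "rack3": ["host7", "host8", "host9"],
}

def rack_of(host):
    """Return the rack a host belongs to."""
    return next(r for r, hosts in RACKS.items() if host in hosts)

def place_replicas(client_host=None, replicas=3):
    """Choose DataNodes for one block, following the rack-aware rules above."""
    all_hosts = [h for hosts in RACKS.values() for h in hosts]
    # Replica 1: the client's own node if the client is a DataNode, else random.
    first = client_host if client_host in all_hosts else random.choice(all_hosts)
    # Replica 2: a node on a different rack than replica 1.
    other_racks = [h for h in all_hosts if rack_of(h) != rack_of(first)]
    second = random.choice(other_racks)
    # Replica 3: another node on the same rack as replica 2.
    same_rack = [h for h in RACKS[rack_of(second)] if h != second]
    third = random.choice(same_rack)
    chosen = [first, second, third]
    # Any further replicas: random among the remaining nodes.
    remaining = [h for h in all_hosts if h not in chosen]
    chosen += random.sample(remaining, replicas - 3)
    return chosen

nodes = place_replicas(client_host="host2")
```

This placement spreads replicas across two racks: losing one node, or even one whole rack, still leaves a live copy.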

Write Operation Summary

1. To write a 1TB file, you need 3TB of storage and 3TB of network bandwidth.
2. During a read or write, the NameNode and the DataNodes communicate via heartbeat to confirm that each DataNode is alive. If a DataNode is found to have died, the data on the dead DataNode is re-replicated to other nodes, and reads are redirected to those other nodes.
3. Losing a node does not matter: other nodes hold backups. Even losing an entire rack does not matter: there are also replicas on other racks.
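The streaming write in step 4 can be sketched as a packet relay: the block is cut into 64KB packets, and each packet flows down the pipeline of DataNodes, so all three replicas are written while the client keeps streaming. A toy Python simulation (illustrative only; real DataNodes forward packets concurrently over the network):

```python
# Sketch of the pipeline write: a block is cut into 64KB packets and each
# packet is relayed down the chain client -> host2 -> host1 -> host3.
PACKET_SIZE = 64 * 1024  # 64KB packets

def pipeline_write(block, pipeline):
    """Simulate relaying packets through a DataNode pipeline.

    Returns a dict mapping each node to the bytes it stored.
    """
    stores = {node: bytearray() for node in pipeline}
    for start in range(0, len(block), PACKET_SIZE):
        packet = block[start:start + PACKET_SIZE]
        # Each node appends the packet, then forwards it to the next node.
        for node in pipeline:
            stores[node].extend(packet)
    return stores

block1 = bytes(200 * 1024)  # a 200KB toy "block"
stores = pipeline_write(block1, ["host2", "host1", "host3"])
```

After the loop, every node in the pipeline holds a complete copy of the block, which is why a 1TB write costs 3TB of storage and bandwidth.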

Read operation

Read operation schematic diagram

Read operation scenario

The client reads FileA from the DataNodes. FileA consists of Block1 and Block2.

Read operation flow

  1. The client sends a read request to the NameNode;
  2. The NameNode looks up the metadata and returns the locations of FileA's blocks:

    Block1: host2, host1, host3
    
    Block2: host7, host8, host4
    
  3. Blocks are read in order: Block1 first, then Block2. Block1 is read from host2; then Block2 is read from host7.

Note: In the example above, the client is outside the racks. If the client is on a DataNode inside a rack, say host6, then when reading it follows the rule: preferentially read the data on the local rack.
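The read-locality rule can be sketched as follows (illustrative only; the rack layout is invented, matching the hosts in the scenario above):

```python
# Sketch of the read-locality rule: prefer a replica on the client's rack.
# Invented cluster layout for illustration.
RACKS = {
    "rack1": ["host1", "host2", "host3"],
    "rack2": ["host4", "host5", "host6"],
    "rack3": ["host7", "host8", "host9"],
}

def rack_of(host):
    """Return the rack a host belongs to."""
    return next(r for r, hosts in RACKS.items() if host in hosts)

def pick_replica(client_host, replica_hosts):
    """Read from a replica on the client's rack if one exists, else the first listed."""
    client_rack = rack_of(client_host)
    for host in replica_hosts:
        if rack_of(host) == client_rack:
            return host
    return replica_hosts[0]

# Block2 lives on host7, host8, host4; a client on host6 (rack2) reads from host4.
print(pick_replica("host6", ["host7", "host8", "host4"]))  # host4
```

Reading from the local rack avoids cross-rack network traffic, which is typically the scarcer resource in a cluster.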

This blog post was compiled from an expert's notes.