HDFS write data

The flow chart

HDFS write data

Specific steps

  1. The client calls the DistributedFileSystem module to ask NameNode for permission to upload the file.
  2. NameNode checks whether the target file already exists and whether the parent directory exists, then replies whether the upload is allowed.
  3. Assume the file size is 200M. The client requests to upload the first Block and asks which DataNode servers it should write to.
  4. NameNode returns three DataNodes, namely DN1, DN2, and DN3, to store the data.
  5. Through the FSDataOutputStream module, the client asks DN1 to establish a data-upload channel. On receiving the request, DN1 forwards it to DN2, and DN2 forwards it to DN3, until the communication pipeline is established.
  6. DN3, DN2, and DN1 acknowledge back along the pipeline until the client receives the reply.
  7. The client starts uploading the first Block to DN1 (data is read from disk into a local memory cache and sent in units of Packet). When DN1 receives a Packet it passes it on to DN2, and DN2 passes it on to DN3. Each time DN1 sends a Packet, it also places the Packet in an ack queue and waits for the acknowledgment.
  8. When one Block (0-128M) has been transferred, the client again requests NameNode for the servers for the second Block. Steps 3-7 are repeated.
  9. Report to NameNode that the upload is complete (a minimal client-side sketch of this flow follows the list).
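
For orientation, here is a minimal client-side sketch of this write flow using the standard Hadoop Java API. The NameNode URI, the user name, and both file paths are assumptions for illustration, not values from the text:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address and user are placeholders; adjust for your cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf, "hadoop");

        // DistributedFileSystem asks NameNode for permission; FSDataOutputStream
        // then streams the data in Packet units through the DataNode pipeline.
        try (InputStream in = new BufferedInputStream(new FileInputStream("/tmp/local-200m.dat"));
             FSDataOutputStream out = fs.create(new Path("/user/hadoop/200m.dat"))) {
            IOUtils.copyBytes(in, out, conf);
        }
        fs.close();
    }
}
```

Roughly, fs.create() corresponds to the NameNode request in steps 1-2, and the byte copy drives the block and Packet pipeline of steps 3-8.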

Matters needing attention

  • DataNode placement, using the default of 3 replicas as an example: the first replica goes to the node nearest the client, generally the client's own node; the second replica goes to a different node on the same rack (same route); the third replica goes to a random node on another rack. (The sketch after this list shows how to inspect where a file's replicas actually landed.)
  • Data is transferred in Packet units. As soon as the first node receives a Packet it forwards it to the next DataNode; it does not wait for the whole block to arrive before forwarding.
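
To see where the replicas of a file's blocks actually landed, the client can ask NameNode for the block locations. A small sketch, with the URI and path as placeholder assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.util.Arrays;

public class HdfsBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration(), "hadoop");
        FileStatus status = fs.getFileStatus(new Path("/user/hadoop/200m.dat"));

        // One BlockLocation per block; getHosts() lists the DataNodes holding its replicas.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    loc.getOffset(), loc.getLength(), Arrays.toString(loc.getHosts()));
        }
        fs.close();
    }
}
```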

HDFS read data

The flow chart

HDFS read data

Specific steps

  1. The client calls the DistributedFileSystem module to ask NameNode to download the file.
  2. NameNode checks whether the target file exists and, by querying its metadata, returns the DataNode addresses where the file's blocks reside.
  3. The client uses the FSDataInputStream module to request Block1 from DN1 (chosen by proximity).
  4. The DataNode starts transmitting data to the client (it reads the data from disk as an input stream and verifies it in Packet units). The client receives the data Packet by Packet, caches it locally first, and then writes it to the target file.
  5. When one Block (0-128M) has been transferred, the client requests Block2. Repeat steps 2-4.
  6. Report to NameNode that the download is complete (a minimal client-side sketch of this flow follows the list).
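
Similarly, a minimal client-side sketch of the read flow; the NameNode URI, user, and paths are again placeholder assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.net.URI;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf, "hadoop");

        // FSDataInputStream fetches block locations from NameNode, then reads each
        // block from a nearby DataNode, verifying Packet checksums as it goes.
        try (FSDataInputStream in = fs.open(new Path("/user/hadoop/200m.dat"));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("/tmp/200m.dat"))) {
            IOUtils.copyBytes(in, out, conf);
        }
        fs.close();
    }
}
```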

Matters needing attention

  • If reading the first replica of a block fails, the client requests the second replica, and so on.

The NameNode and SecondaryNameNode

The flow chart

The NameNode and SecondaryNameNode

Specific steps

Phase 1: NameNode

  1. When NameNode starts for the first time (after formatting), the Fsimage (image file) and Edits (edit log) files are created. On later startups, the image file and edit logs are loaded directly into memory and merged.
  2. Suppose that the client makes a request to add, delete, or change a file.
  3. NameNode records the operation in the edit log and updates the rolling log (edits_inprogress_n); earlier, closed segments are kept as edits_n files.
  4. After the log is recorded, NameNode performs the add, delete, or modify operation on the metadata in memory.

Phase 2: SecondaryNameNode

  1. Secondary NameNode asks NameNode whether CheckPoint is required.
  2. If necessary, the Secondary NameNode requests CheckPoint execution.
  3. NameNode rolls the current edit log, so new operations go into a fresh edits_inprogress file.
  4. The edit logs from before the roll (e.g. edits_001) and the image file (fsimage) are copied to the Secondary NameNode.
  5. The Secondary NameNode loads the edit logs and image file into memory and merges them.
  6. After the merge, a new image file fsimage.chkpoint is generated.
  7. Copy fsimage.chkpoint to NameNode.
  8. NameNode renames fsimage.chkpoint to fsimage.

Matters needing attention

  • What are the Fsimage and Edits files?

Fsimage is a serialized copy of the metadata in NameNode memory. Edits records every metadata update the client makes. Each add, delete, or modify is written to the log first; the advantage is that if NameNode crashes midway, the operation is not lost and recovery is straightforward.

  • Secondary NameNode

The edit log is merged into the fsimage file only when NameNode restarts, which causes three problems: the next NameNode restart takes a long time; the fsimage file grows increasingly stale over time; and if NameNode dies partway through, a large backlog of edit log has to be replayed.

To solve this, the SecondaryNameNode was introduced: at regular intervals it assists NameNode in merging the edit log into the fsimage file, which is exactly what the flow chart above shows.

  • When to CheckPoint?

(1) On a timer (1 hour by default); (2) when the edit log fills up (1,000,000 transactions by default).
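
These triggers are governed by a few standard configuration keys. The sketch below shows them with their commonly documented defaults, which you should verify against your Hadoop version; in a real cluster they are normally set in hdfs-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Trigger 1: checkpoint on a timer, in seconds (3600s = 1 hour is the usual default).
        conf.setLong("dfs.namenode.checkpoint.period", 3600);
        // Trigger 2: checkpoint once this many uncheckpointed transactions accumulate.
        conf.setLong("dfs.namenode.checkpoint.txns", 1_000_000);
        // How often the SecondaryNameNode polls NameNode to see whether either trigger fired.
        conf.setLong("dfs.namenode.checkpoint.check.period", 60);

        System.out.println("checkpoint period (s) = "
                + conf.getLong("dfs.namenode.checkpoint.period", 3600));
    }
}
```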

  • Is the Secondary NameNode a hot backup?

No. As the flow shows, the Secondary NameNode only merges the edits from before the roll, so its metadata always lags behind what NameNode holds in the current edit log.

The DataNode and the NameNode

The flow chart

The NameNode and DataNode

Specific steps

  1. After DataNode starts, it registers with NameNode.
  2. NameNode replies that the registration is successful.
  3. DataNode reports all block information to NameNode periodically (1 hour).
  4. The DataNode sends a heartbeat every 3 seconds. The heartbeat response carries commands from NameNode back to the DataNode, such as copying a block to another machine or deleting a block.
  5. If no heartbeat is received for more than 10 minutes (plus a 30-second margin under the default timeout formula), NameNode marks the node as unavailable; the sketch after this list shows where that timeout comes from.
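
Both intervals are configurable. The sketch below shows the standard keys and the commonly documented dead-node timeout formula (2 x recheck interval + 10 x heartbeat interval); the defaults are assumptions to verify for your version.

```java
import org.apache.hadoop.conf.Configuration;

public class HeartbeatSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // DataNode heartbeat interval, in seconds (3 by default).
        long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3);
        // NameNode recheck interval, in milliseconds (5 minutes by default).
        long recheckMs = conf.getLong("dfs.namenode.heartbeat.recheck-interval", 300_000);

        // Commonly cited dead-node timeout: 2 * recheck + 10 * heartbeat.
        long timeoutMs = 2 * recheckMs + 10 * heartbeatSec * 1000;
        System.out.println("dead-node timeout = " + timeoutMs / 1000 + "s"); // 630s = 10min 30s with defaults
    }
}
```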

Matters needing attention

  • On the DataNode, each data block is stored on disk as a pair of files: one file holds the data itself, and a companion metadata file holds the block's length, checksums, and timestamp.
  • Node addition: once a new node is configured and started, it automatically registers with NameNode.
  • Node retirement: nodes that should stay in the cluster are listed in a whitelist; nodes to be removed are specified through a blacklist (see the sketch below).
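
A sketch of the standard whitelist and blacklist properties; the file paths are placeholders, and in practice these are set in hdfs-site.xml and applied with `hdfs dfsadmin -refreshNodes`.

```java
import org.apache.hadoop.conf.Configuration;

public class HostListSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Whitelist: only hosts listed in this file may register with NameNode.
        conf.set("dfs.hosts", "/opt/hadoop/etc/hadoop/dfs.hosts");
        // Blacklist: hosts listed in this file are decommissioned.
        conf.set("dfs.hosts.exclude", "/opt/hadoop/etc/hadoop/dfs.hosts.exclude");

        System.out.println("whitelist file = " + conf.get("dfs.hosts"));
    }
}
```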