As mentioned earlier, when a client needs to write data, it first tells the NameNode that it is uploading a file to a given directory, and the NameNode begins updating its metadata. Because the metadata update is critical, the client retries under certain conditions until it succeeds.
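
The retry behaviour can be sketched roughly as follows. This is only an illustration: `update_metadata_with_retry`, the `update` callback, the use of `ConnectionError` as the retryable condition, and the bounded attempt count are all assumptions made for the example, not the actual HDFS client logic.

```python
import time

def update_metadata_with_retry(update, max_attempts=5, delay_seconds=0.0):
    """Call update() until it succeeds or max_attempts is exhausted.

    `update` is a hypothetical callable standing in for the metadata
    request sent to the NameNode. Only ConnectionError is treated as
    retryable here, which is an assumption for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return update()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            time.sleep(delay_seconds)  # wait before trying again
```

Passing a callable that fails twice and then succeeds makes the function return on the third attempt.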



Once the metadata has been updated successfully, the client creates two queues: a DataQueue and an AckQueue. The purpose of these two queues is discussed later.



Each chunk is 512 bytes, and its checksum is 4 bytes. The checksum is used to verify the contents of the chunk, so each chunk takes up 516 bytes in total.
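
To make the chunk layout concrete, here is a minimal sketch that appends a 4-byte checksum to 512 bytes of data and verifies it afterwards. `zlib.crc32` is used only as a stand-in 32-bit checksum, and the names `make_chunk` and `verify_chunk`, as well as placing the checksum directly after the data, are assumptions for illustration.

```python
import zlib

CHUNK_DATA_SIZE = 512  # bytes of data per chunk (from the text)
CHECKSUM_SIZE = 4      # bytes of checksum per chunk (from the text)

def make_chunk(data: bytes) -> bytes:
    """Return the data followed by its 4-byte big-endian CRC32 (hypothetical layout)."""
    if len(data) > CHUNK_DATA_SIZE:
        raise ValueError("chunk data exceeds 512 bytes")
    checksum = zlib.crc32(data)  # unsigned 32-bit value in Python 3
    return data + checksum.to_bytes(CHECKSUM_SIZE, "big")

def verify_chunk(chunk: bytes) -> bool:
    """Recompute the checksum over the data and compare it with the stored one."""
    data, stored = chunk[:-CHECKSUM_SIZE], chunk[-CHECKSUM_SIZE:]
    return zlib.crc32(data).to_bytes(CHECKSUM_SIZE, "big") == stored

chunk = make_chunk(b"x" * CHUNK_DATA_SIZE)
print(len(chunk))  # 516
```

If any byte of the data is altered in transit, `verify_chunk` returns `False`, which is how the receiver can detect corruption.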



When chunks are filled, they are stored in a packet. A packet is 64 KB in size, that is, 65,536 bytes, which holds 127 chunks. Each time a packet fills up, or the 128 MB block size is reached, a new packet is created to receive the following chunks.
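
The arithmetic behind these numbers can be checked with a few lines. This sketch uses only the sizes given in the text and ignores any packet header, which is a simplifying assumption.

```python
DATA_PER_CHUNK = 512                 # bytes of data per chunk
CHUNK_SIZE = DATA_PER_CHUNK + 4      # data plus 4-byte checksum = 516 bytes
PACKET_SIZE = 64 * 1024              # 64 KB = 65,536 bytes
BLOCK_SIZE = 128 * 1024 * 1024       # 128 MB

# Whole chunks that fit in one packet (header ignored in this sketch).
chunks_per_packet = PACKET_SIZE // CHUNK_SIZE

# Data bytes carried by one full packet.
data_per_packet = chunks_per_packet * DATA_PER_CHUNK

# Packets needed to carry one full block, using ceiling division.
packets_per_block = -(-BLOCK_SIZE // data_per_packet)

print(chunks_per_packet)   # 127
print(data_per_packet)     # 65024
print(packets_per_block)   # 2065
```

Under these assumptions the last packet of a block is only partly filled, because 128 MB is not an exact multiple of 65,024 bytes.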



Each filled packet is placed in the DataQueue mentioned above.



The client runs a separate thread that monitors the DataQueue. When it sees that the DataQueue contains data, it asks the NameNode for block information.

The NameNode selects DataNodes based on load balancing and rack awareness and returns their information to the client.



After receiving the DataNode information, the client establishes a data channel with one of the DataNodes, which in turn establishes a channel with the next DataNode, and so on.



The purpose of establishing a data channel is to transfer data over a socket. The client does not establish a data channel with each of the three DataNodes directly. This is because the DataNodes are usually in the same machine room, so communication among them is relatively fast.

If the channel is set up successfully, the client removes the packet at the head of the DataQueue and sends it to the DataNode through the socket. The client also puts the packet into the AckQueue.



If the DataNode fails, the packet in the AckQueue is put back into the DataQueue. The thread monitoring the DataQueue then takes the packet out of the DataQueue and transmits it again. If the transfer succeeds, the packet is removed from the AckQueue.
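
The interplay of the two client-side queues described above can be sketched as follows. The class `ClientQueues` and its method names are hypothetical, and the real client handles far more detail; the sketch only shows how packets move between the queues on send, acknowledgement, and failure.

```python
from collections import deque

class ClientQueues:
    """Hypothetical model of the client's DataQueue and AckQueue."""

    def __init__(self, packets):
        self.data_queue = deque(packets)  # packets waiting to be sent
        self.ack_queue = deque()          # packets sent but not yet acknowledged

    def send_next(self):
        """Take the head packet off the DataQueue and park it in the AckQueue."""
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        return packet  # the caller would write this to the socket

    def on_ack(self):
        """The oldest in-flight packet was acknowledged, so drop it."""
        return self.ack_queue.popleft()

    def on_failure(self):
        """Move unacknowledged packets back to the front of the DataQueue,
        preserving their original order, so they are sent again."""
        while self.ack_queue:
            self.data_queue.appendleft(self.ack_queue.pop())
```

For example, after sending `p1` and `p2` and receiving an acknowledgement for `p1` only, a call to `on_failure()` leaves `p2` at the head of the DataQueue, ahead of any packets that were never sent.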

The DataNode receives the packet and places it in its own AckQueue. It then forwards the packet to the downstream DataNode and writes it to disk.



As on the client, the DataNode's AckQueue guards against transmission failure. If the transmission succeeds, the packet is removed from the AckQueue. If it fails, the packet is transmitted again.

If the DataNode is not the last node, the next DataNode repeats the above operation.
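
The chained forwarding between DataNodes can be illustrated with a small in-memory model. The `DataNode` class below is hypothetical: a method call stands in for the socket transfer, a list stands in for the disk, and the assumption that success is reported back up the chain as a return value is made only for this sketch.

```python
from collections import deque

class DataNode:
    """Hypothetical in-memory stand-in for a DataNode in the chain."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream  # next DataNode, or None for the last one
        self.ack_queue = deque()      # packets received but not yet confirmed
        self.disk = []                # stands in for the local disk

    def receive(self, packet):
        """Queue the packet, forward it downstream, write it locally,
        and report whether this node and all downstream nodes succeeded."""
        self.ack_queue.append(packet)
        ok = True
        if self.downstream is not None:
            ok = self.downstream.receive(packet)  # same steps on the next node
        self.disk.append(packet)
        if ok:
            self.ack_queue.remove(packet)  # confirmed, no longer in flight
        return ok

# The client talks only to dn1; dn1 forwards to dn2, and dn2 to dn3.
dn3 = DataNode("dn3")
dn2 = DataNode("dn2", downstream=dn3)
dn1 = DataNode("dn1", downstream=dn2)
acked = dn1.receive("packet-0")
```

After the call, every node in the chain holds a copy of the packet even though the client sent it only once, which is the point of chaining the DataNodes instead of connecting to each of them directly.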