We mentioned the checkpoint mechanism earlier, which essentially merges multiple edits files into the fsimage. The NameNode is already under heavy load, so it does not do the merging itself; the SecondaryNameNode does it instead. In a high-availability or federated cluster, the Standby NameNode does the merging.

SecondaryNameNode

As covered in "HDFS – What is Metadata", the metadata on disk consists of fsimage, edits_inprogress, edits, seen_txid, VERSION, and so on.

When the NameNode first starts, there are no edits files yet.



After the NameNode has been running for a while, it gradually generates multiple edits files named edits_<start txid>-<end txid>, with the transaction IDs increasing from one file to the next (the numbers are 19 digits long; I drew only 4 digits here).
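To make the naming scheme concrete, here is a small sketch that builds and parses such file names. The helper names `edits_name` and `parse_edits_name` are made up for this illustration and are not part of any HDFS API; the only thing taken from the text is the 19-digit, zero-padded number format.

```python
import re
from typing import Tuple

TXID_WIDTH = 19  # transaction IDs in the file name are zero-padded to 19 digits

def edits_name(first_txid: int, last_txid: int) -> str:
    """Build the name of a finalized edits file covering first_txid..last_txid."""
    return f"edits_{first_txid:0{TXID_WIDTH}d}-{last_txid:0{TXID_WIDTH}d}"

_EDITS_RE = re.compile(r"^edits_(\d{19})-(\d{19})$")

def parse_edits_name(name: str) -> Tuple[int, int]:
    """Return (first_txid, last_txid) from an edits file name."""
    match = _EDITS_RE.match(name)
    if match is None:
        raise ValueError(f"not a finalized edits file name: {name!r}")
    return int(match.group(1)), int(match.group(2))

# Consecutive files pick up where the previous one ended.
print(edits_name(1, 19))
print(edits_name(20, 45))
print(parse_edits_name(edits_name(20, 45)))
```

Because the numbers are zero-padded to a fixed width, sorting the file names alphabetically also sorts them by transaction ID.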



As mentioned in the previous article, when the NameNode restarts it loads the edits_inprogress, edits, and fsimage files, because together they make up the complete metadata. As edits files accumulate, they need to be consolidated to keep startup fast, but the NameNode is already under heavy load, so the SecondaryNameNode is needed.
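A toy model shows why more edits mean a slower start: the NameNode has to load the fsimage snapshot and then replay every operation recorded after it. The data layout below (a dict for the fsimage, tuples for edits) and the function name `load_namespace` are assumptions for illustration only, not the real on-disk format.

```python
def load_namespace(fsimage, edit_segments):
    """Rebuild the namespace from a snapshot plus the edits recorded after it.

    fsimage:       {"last_txid": int, "paths": iterable of str}
    edit_segments: list of segments, each a list of (txid, op, path),
                   given in increasing txid order.
    """
    paths = set(fsimage["paths"])
    last_txid = fsimage["last_txid"]
    for segment in edit_segments:
        for txid, op, path in segment:
            if txid <= last_txid:
                continue  # already reflected in the fsimage
            if op == "create":
                paths.add(path)
            elif op == "delete":
                paths.discard(path)
            last_txid = txid
    return paths, last_txid

fsimage = {"last_txid": 2, "paths": ["/a", "/b"]}
segments = [
    [(1, "create", "/a"), (2, "create", "/b")],  # already in the fsimage
    [(3, "delete", "/a"), (4, "create", "/c")],  # must be replayed
]
paths, last_txid = load_namespace(fsimage, segments)
print(sorted(paths), last_txid)
```

The replay loop runs once per recorded operation, so the fewer edits there are after the latest fsimage, the faster startup is.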

The SecondaryNameNode checks with the NameNode at regular intervals. If more than 1 hour has passed since the last checkpoint, or the number of operations recorded since the last checkpoint exceeds the configured limit (1,000,000 by default), the SecondaryNameNode requests the fsimage and edits files from the NameNode. The NameNode creates an empty edits.new file to receive new operations and sends the existing edits and fsimage files to the SecondaryNameNode.
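The trigger is simply an "either condition" check, which can be sketched as follows. The function name and parameters are hypothetical; in a real cluster the two thresholds come from `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns` in hdfs-site.xml, so they are passed in explicitly here rather than hard-coded. The transaction limit in the demo is deliberately tiny and purely illustrative.

```python
def should_checkpoint(seconds_since_last, txns_since_last, period_s, txn_limit):
    """Return True when either threshold has been reached or exceeded."""
    return seconds_since_last >= period_s or txns_since_last >= txn_limit

# Illustrative thresholds: a one-hour period and a small transaction limit.
PERIOD_S = 3600
TXN_LIMIT = 1000

print(should_checkpoint(4000, 10, PERIOD_S, TXN_LIMIT))   # time threshold reached
print(should_checkpoint(60, 5000, PERIOD_S, TXN_LIMIT))   # transaction threshold reached
print(should_checkpoint(60, 10, PERIOD_S, TXN_LIMIT))     # neither threshold reached
```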



The SecondaryNameNode then merges the edits files into the fsimage locally, producing a new fsimage.ckpt file.



The generated fsimage.ckpt is sent back to the NameNode, which replaces its original fsimage with it. The new image, for example fsimage_0006, corresponds to the metadata covered by edits_0001 through edits_0006. At this point the old edits files are deleted and the edits.new file is renamed to edits.
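The whole round trip can be simulated in a few lines. This is a toy model under my own assumptions: `ToyNameNode`, `merge`, and the in-memory lists standing in for the edits and edits.new files are all invented for illustration and do not mirror the real HDFS classes.

```python
class ToyNameNode:
    def __init__(self):
        self.fsimage = {"last_txid": 0, "paths": set()}
        self.edits = []        # operations waiting to be checkpointed
        self.edits_new = []    # receives operations while a checkpoint is running
        self.checkpointing = False

    def log(self, txid, op, path):
        """Record an operation in whichever edits file is currently active."""
        target = self.edits_new if self.checkpointing else self.edits
        target.append((txid, op, path))

    def roll_edits(self):
        """Start a checkpoint: switch writes to edits.new, hand out fsimage + edits."""
        self.checkpointing = True
        snapshot = {"last_txid": self.fsimage["last_txid"],
                    "paths": set(self.fsimage["paths"])}
        return snapshot, list(self.edits)

    def install_checkpoint(self, new_fsimage):
        """Finish a checkpoint: replace fsimage, drop old edits, rename edits.new."""
        self.fsimage = new_fsimage
        self.edits = self.edits_new
        self.edits_new = []
        self.checkpointing = False


def merge(fsimage, edits):
    """What the SecondaryNameNode does locally: apply edits to the fsimage."""
    paths = set(fsimage["paths"])
    last_txid = fsimage["last_txid"]
    for txid, op, path in edits:
        if op == "create":
            paths.add(path)
        elif op == "delete":
            paths.discard(path)
        last_txid = txid
    return {"last_txid": last_txid, "paths": paths}


nn = ToyNameNode()
nn.log(1, "create", "/a")
nn.log(2, "create", "/b")
image, edits = nn.roll_edits()      # checkpoint starts
nn.log(3, "delete", "/a")           # lands in edits.new, not in the checkpoint
nn.install_checkpoint(merge(image, edits))
print(nn.fsimage, nn.edits)
```

Note that the operation logged during the checkpoint is not lost: it stays in what used to be edits.new and will be picked up by the next checkpoint.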

If the latest fsimage is unavailable, the NameNode falls back to the state of the previous fsimage, and any metadata recorded after that image is lost.
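The fallback can be pictured as "pick the newest image that still checks out". The sketch below assumes each image comes with an expected MD5 digest to validate against; the function name and the dict layout are hypothetical and only meant to illustrate the idea.

```python
import hashlib

def newest_valid_fsimage(images):
    """Return the highest txid whose image data matches its expected digest.

    images: {txid: (data_bytes, expected_md5_hex)}
    Returns None when no image is usable.
    """
    for txid in sorted(images, reverse=True):
        data, expected_md5 = images[txid]
        if hashlib.md5(data).hexdigest() == expected_md5:
            return txid
    return None

older = b"snapshot up to txid 4"
images = {
    4: (older, hashlib.md5(older).hexdigest()),   # intact
    6: (b"truncated", "0" * 32),                  # digest mismatch: unusable
}
print(newest_valid_fsimage(images))
```

In this example the image at transaction 6 is rejected, so the state falls back to transaction 4 and whatever happened in transactions 5 and 6 is gone unless the corresponding edits are still around.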

The cluster

In a single-NameNode (non-HA) setup, metadata is persisted to local disk only; in an HA cluster, the active NameNode writes metadata to the JournalNodes as well as to disk. So initially, the metadata on the active NameNode and on the JournalNodes is the same.



The Standby NameNode reads new metadata from the JournalNodes every 60 seconds.
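Conceptually, each read only needs to fetch what was written since the previous one. Here is a minimal sketch of that idea, with hypothetical `ToyJournal` and `ToyStandby` classes; the 60-second timer is left out so the example runs instantly, and each call to `tail_once` stands for one periodic read.

```python
class ToyJournal:
    """Stand-in for the shared edit log kept by the JournalNodes."""
    def __init__(self):
        self.entries = []

    def append(self, txid, op, path):
        self.entries.append((txid, op, path))

    def read_after(self, txid):
        return [entry for entry in self.entries if entry[0] > txid]


class ToyStandby:
    """Stand-in for the Standby NameNode's in-memory namespace."""
    def __init__(self, journal):
        self.journal = journal
        self.paths = set()
        self.last_txid = 0

    def tail_once(self):
        """Apply everything written since the last read; return how many ops."""
        new_entries = self.journal.read_after(self.last_txid)
        for txid, op, path in new_entries:
            if op == "create":
                self.paths.add(path)
            elif op == "delete":
                self.paths.discard(path)
            self.last_txid = txid
        return len(new_entries)


journal = ToyJournal()
standby = ToyStandby(journal)

journal.append(1, "create", "/a")      # written by the active NameNode
journal.append(2, "create", "/b")
print(standby.tail_once(), sorted(standby.paths))

journal.append(3, "delete", "/a")
print(standby.tail_once(), sorted(standby.paths))
print(standby.tail_once())             # nothing new since the last read
```

Because the standby keeps replaying the journal, its in-memory state stays close to the active node's, which is what lets it produce checkpoints and take over quickly.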



When a checkpoint is triggered, the Standby NameNode merges the edits files with the fsimage file to create a new fsimage file.



The Standby NameNode then sends the new fsimage to the active NameNode. The active NameNode replaces its old fsimage with the new one, and the old edits files are deleted.



At this point, the merging of Edits files and fsimage is complete.