When a file is uploaded to HDFS, it is split into multiple blocks, and each block is stored as multiple replicas across the cluster. How do we know the name of the file? How many blocks is the file split into? Which DataNodes store each block?

So when a file is uploaded to HDFS, HDFS saves this information alongside the file itself. This information is called metadata.

Metadata in HDFS exists in two forms, one on disk and one in memory, and the two are exactly the same. The on-disk copy is for persistence; the in-memory copy is for read and write performance.

When we read and write files in HDFS, the directory layout is similar to the Linux directory structure, so the metadata also holds a directory tree. For example, in the following figure, an INodeDirectory is a directory and an INodeFile is a file; a directory can contain multiple subdirectories and multiple files.

So metadata includes:

  • The directory tree structure
  • The relationship between files and blocks
  • The relationship between blocks and DataNodes
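The three kinds of metadata above can be sketched in a few lines of Python. This is a minimal illustration, not the actual NameNode data structures; all class, field, and node names here are made up for the example.

```python
# Minimal sketch of the three kinds of metadata the NameNode keeps in memory.
# Class and field names are illustrative, not the real HDFS source names.

class INodeFile:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks          # file -> ordered list of block IDs

class INodeDirectory:
    def __init__(self, name):
        self.name = name
        self.children = []            # subdirectories and files

    def add(self, node):
        self.children.append(node)
        return node

# 1. Directory tree structure
root = INodeDirectory("/")
user = root.add(INodeDirectory("user"))
f = user.add(INodeFile("data.txt", blocks=["blk_1001", "blk_1002"]))

# 2. File -> block mapping, stored on the file's inode
print(f.blocks)

# 3. Block -> DataNode mapping (in real HDFS this part is rebuilt at
# runtime from DataNode block reports rather than persisted in fsimage)
block_locations = {
    "blk_1001": ["datanode1", "datanode2", "datanode3"],
    "blk_1002": ["datanode2", "datanode3", "datanode4"],
}
print(block_locations["blk_1001"])
```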

The composition of the metadata file

In the Hadoop cluster and the high-availability cluster we set up earlier, an fsimage file is generated in the disk directory; it is used to store the metadata.

When the NameNode starts, it reads the fsimage file from the directory specified in the configuration and loads it into memory.

While the NameNode is serving requests, whenever a client uploads a file, it adds metadata to the in-memory fsimage. To ensure the data is not lost, it also persists the change to disk; the file it writes to is called edits_inprogress. At this point, the in-memory fsimage = the on-disk fsimage + edits_inprogress.

As the edits_inprogress file accumulates more and more records, or after a certain amount of time, it is renamed to a file like edits_000000000n-000000000m, and new changes are written to a fresh edits_inprogress file. This repeats over and over, producing many edits files. At this point, the in-memory fsimage = the on-disk fsimage + all the edits files + edits_inprogress.
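The rolling of the edit log can be sketched as follows. The file-naming pattern mirrors what real HDFS produces (finalized segments named by their first and last transaction IDs), but the function itself is an illustration, not NameNode code.

```python
# Sketch of "rolling" the edit log: the current edits_inprogress segment is
# finalized under a name that records its transaction-ID range, and a new
# segment begins at the next transaction ID. Illustrative logic only.

def roll_edit_log(finalized, segment_start_txid, next_txid):
    """Finalize the active segment [segment_start_txid, next_txid - 1]."""
    last_txid = next_txid - 1
    finalized.append(f"edits_{segment_start_txid:019d}-{last_txid:019d}")
    return finalized, next_txid   # the new segment starts at next_txid

finalized = []
# First roll: transactions 1..100 are finalized; 101 starts a new segment.
finalized, seg_start = roll_edit_log(finalized, 1, 101)
# Second roll: transactions 101..250 are finalized.
finalized, seg_start = roll_edit_log(finalized, seg_start, 251)
print(finalized)
```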

When the NameNode restarts, the on-disk fsimage, all the edits files, and edits_inprogress are read into memory and merged back into the in-memory fsimage.
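The startup merge amounts to loading the last fsimage and then replaying every logged operation on top of it, in transaction order. Below is a toy replay with simplified stand-in operations (MKDIR, CREATE, DELETE), not the real edit-log opcodes.

```python
# Sketch of NameNode startup: load the on-disk fsimage, then replay all
# finalized edits files and edits_inprogress on top of it. The operation
# names and the dict-based namespace are simplified stand-ins.

def load_fsimage(fsimage):
    return dict(fsimage)              # start from the last checkpoint

def replay(namespace, edit_logs):
    for log in edit_logs:             # finalized segments first, then inprogress
        for op, path, value in log:
            if op == "MKDIR":
                namespace[path] = "dir"
            elif op == "CREATE":
                namespace[path] = value   # value: the file's block IDs
            elif op == "DELETE":
                namespace.pop(path, None)
    return namespace

fsimage_on_disk = {"/user": "dir"}
edits = [[("CREATE", "/user/a.txt", ["blk_1"])]]   # one finalized segment
edits_inprogress = [("MKDIR", "/tmp", None)]

namespace = replay(load_fsimage(fsimage_on_disk), edits + [edits_inprogress])
print(namespace)
```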

As metadata grows without bound, there would be more and more edits files, and each NameNode restart would take longer and longer to load them. So HDFS has a CheckPoint mechanism that periodically merges historical edits files into the fsimage. On the next restart, we only need to load the merged fsimage file, the edits_inprogress file, and a few edits files, greatly reducing NameNode startup time. To record the transaction ID (TXID) of each CheckPoint, there is a seen_txid file.

There is also a file called VERSION, which holds information about the HDFS cluster.
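For reference, a VERSION file is a small properties file and typically looks like the following. The values here are made up, and the exact set of fields can vary by Hadoop version:

```
#Thu Jan 01 10:00:00 CST 2024
namespaceID=123456789
clusterID=CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
cTime=1704067200000
storageType=NAME_NODE
blockpoolID=BP-123456789-192.168.1.1-1704067200000
layoutVersion=-65
```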

In summary, the metadata files include fsimage, edits_inprogress, edits, seen_txid, and VERSION.