1. Working mechanism of DataNode

1. On the DataNode, a data block is stored on disk as two files: one holds the data itself, and the other holds metadata, including the length of the data block, the block checksum, and a timestamp (see the directory listing sketched after this list).

2. After a DataNode starts, it registers with the NameNode and then periodically (every hour) reports all of its block information to the NameNode.

3. The heartbeat occurs every three seconds. The heartbeat reply carries commands from the NameNode to the DataNode, such as deleting a data block. If the NameNode receives no heartbeat from a DataNode for longer than the timeout (10 minutes 30 seconds by default; see the next section), the node is considered unavailable.

4. Machines can be safely added to and removed from the cluster while it is running.
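For reference, the sketch below shows roughly what those block files look like on disk. The path is an assumption; the real location is whatever dfs.datanode.data.dir points to in hdfs-site.xml.

# A minimal sketch: listing one DataNode's block storage directory.
# The path below is an assumption; check dfs.datanode.data.dir for the real one.
ls /opt/module/hadoop-3.1.3/data/dfs/data/current/BP-*/current/finalized/subdir0/subdir0/
# Typical output: a data file plus a companion .meta file per block, e.g.
#   blk_1073741825            <- the block data itself
#   blk_1073741825_1001.meta  <- length/checksum metadata for that block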

2. Parameter settings for the offline time limit

The NameNode and DataNodes communicate with each other through a heartbeat mechanism that fires every 3 seconds. However, a DataNode may fail. What does the NameNode do when it detects that a DataNode has failed?

1. The DataNode process dies, or a network fault prevents the DataNode from communicating with the NameNode.

2. The NameNode does not immediately judge the node to be dead; it waits for a period of time called the timeout. The calculation formula is:

TimeOut = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval = 2 * 5 min + 10 * 3 s = 10 min 30 s

3. In hdfs-site.xml, the default parameters are as follows. Note the units: dfs.namenode.heartbeat.recheck-interval is in milliseconds, while dfs.heartbeat.interval is in seconds.

<property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>300000</value>
</property>
<property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
</property>

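As a quick check, the sketch below reads both values with hdfs getconf and computes the resulting timeout. It assumes the values come back as plain numbers (milliseconds and seconds respectively).

# Read the effective values from the cluster configuration.
recheck_ms=$(hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval)   # milliseconds
heartbeat_s=$(hdfs getconf -confKey dfs.heartbeat.interval)                   # seconds

# TimeOut = 2 * recheck-interval + 10 * heartbeat-interval
# With the defaults: 2 * 300000 ms + 10 * 3 s = 630 s = 10 min 30 s
timeout_s=$(( 2 * recheck_ms / 1000 + 10 * heartbeat_s ))
echo "A DataNode is declared dead after ${timeout_s} seconds without a heartbeat"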

3. New data nodes are put into service

If the capacity of the existing data nodes can no longer meet storage requirements, new data nodes can be added dynamically while the cluster is running.

1. Prepare a new node machine and install the environment. Start the DataNode directly and it will associate itself with the cluster:

hdfs --daemon start datanode
yarn --daemon start nodemanager
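To confirm that the new node has registered with the NameNode, a cluster report can be requested from any machine that can reach it (a quick check, not a required step):

# Lists every live DataNode with its capacity and usage;
# the newly started machine should appear here.
hdfs dfsadmin -report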

2. If data is unbalanced across nodes, you can run a command to rebalance the cluster.

After a new machine is added, use this command if the other machines hold a disproportionately large share of the data.

sbin/start-balancer.sh
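The balancer also accepts a threshold argument: the maximum allowed deviation (in percent of disk usage) between any DataNode and the cluster average. The value below is only an illustrative choice.

# Rebalance until every DataNode's usage is within 10% of the cluster average.
sbin/start-balancer.sh -threshold 10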

4. Retire old data nodes

Whitelists and blacklists should be planned at the beginning of cluster construction.

4.1 Adding a Whitelist and Blacklist (Security)

Hosts added to the whitelist are allowed to access the NameNode; hosts not on the whitelist are removed from the cluster.

A host added to the blacklist can no longer access the NameNode and exits the cluster after its data has been migrated away.

In practice, the whitelist determines which DataNodes are allowed to access the NameNode; its contents are generally kept consistent with the workers file. The blacklist is used to decommission DataNodes while the cluster is running.

To configure a whitelist and blacklist, perform the following steps:

[lei@hadoop102 hadoop]$ cd /opt/module/hadoop-3.1.3/etc/hadoop
[lei@hadoop102 hadoop]$ touch whitelist
[lei@hadoop102 hadoop]$ touch blacklist

Add host names to the whitelist. The nodes that are working properly in the cluster are:

hadoop100
hadoop101
hadoop102
hadoop103
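For comparison, the workers file normally lists the same healthy nodes. A quick check, assuming the standard Hadoop 3 layout used elsewhere in this article:

cat /opt/module/hadoop-3.1.3/etc/hadoop/workers
# Expected to match the whitelist:
# hadoop100
# hadoop101
# hadoop102
# hadoop103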

Add the dfs.hosts and dfs.hosts.exclude properties to the NameNode's hdfs-site.xml configuration file:

<property>
    <name>dfs.hosts</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/whitelist</value>
</property>
<property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/blacklist</value>
</property>

Distribute the configuration file to every machine in the cluster; each node needs the same configuration.

xsync hdfs-site.xml

Restart the cluster.
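A restart is needed here because new properties were added to hdfs-site.xml. If only the contents of the whitelist or blacklist files change later, refreshing the NameNode is enough (the same command used in the decommissioning steps below):

# Make the NameNode re-read dfs.hosts / dfs.hosts.exclude without a restart.
hdfs dfsadmin -refreshNodes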

4.2 Blacklist Retirement

1. Prepare to retire hadoop105 by adding it to the blacklist.

Edit the blacklist file and add hadoop105:

[atguigu@hadoop102 hadoop]$ vim blacklist
hadoop105

2. Refresh the NameNode:

[atguigu@hadoop102 hadoop]$ hdfs dfsadmin -refreshNodes

3. Check the DataNode status in the web UI: hadoop105 shows as decommissioning while its data is migrated to the other nodes.
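The same progress can be followed from the command line, and once the node is reported as fully decommissioned its daemons can be stopped on that machine. A minimal sketch; the stop commands mirror the start commands used in section 3:

# Each node's "Decommission Status" moves from "Decommission in progress"
# to "Decommissioned" once all of its replicas have been copied elsewhere.
hdfs dfsadmin -report

# After hadoop105 shows "Decommissioned", stop its daemons on that machine.
hdfs --daemon stop datanode
yarn --daemon stop nodemanager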