“This is the 27th day of my participation in the November Writing Challenge. See details of the event: The Last Writing Challenge of 2021.”

Previous Section: Hadoop Enterprise Production Tuning Manual (I)

5. HDFS storage optimization

Note: demonstrating erasure coding and heterogeneous storage requires five virtual machines, so prepare a separate five-server cluster if you can.

5.1 Erasure coding

5.1.1 Principles of erasure codes

By default, HDFS keeps three replicas of every file, which improves data reliability but also incurs a 200% storage overhead. Hadoop 3.x introduces erasure coding, which can save about 50% of storage space at the cost of extra computation.
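For example, under triple replication a 600 MB file occupies roughly 600 MB × 3 = 1800 MB on disk; with the RS-6-3-1024k policy it is stored as data units plus parity units in a 6:3 ratio, roughly 600 MB × 9/6 = 900 MB, i.e. about half the replicated footprint.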



Commands related to erasure code operations

[Tom@hadoop102 hadoop-3.1.3]$ hdfs ec
Usage: bin/hdfs ec [COMMAND]
          [-listPolicies]
          [-addPolicies -policyFile <file>]
          [-getPolicy -path <path>]
          [-removePolicy -policy <policy>]
          [-setPolicy -path <path> [-policy <policy>] [-replicate]]
          [-unsetPolicy -path <path>]
          [-listCodecs]
          [-enablePolicy -policy <policy>]
          [-disablePolicy -policy <policy>]
          [-help <command-name>]

View the supported erasure code policies

[Tom@hadoop102 hadoop-3.1.3]$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED

Explanation of erasure code policy

(1) RS-3-2-1024k: uses RS encoding to generate 2 parity units for every 3 data units, 5 units in total. Any 3 of these 5 units (whether data or parity units, as long as the total is 3) are enough to reconstruct the original data. Each cell is 1024k = 1024 * 1024 = 1048576 bytes.

(2) RS-10-4-1024k: uses RS encoding to generate 4 parity units for every 10 data units, 14 units in total. Any 10 of these 14 units (whether data or parity units, as long as the total is 10) are enough to reconstruct the original data. Each cell is 1024k = 1048576 bytes.

(3) RS-6-3-1024k: uses RS encoding to generate 3 parity units for every 6 data units, 9 units in total. Any 6 of these 9 units (whether data or parity units, as long as the total is 6) are enough to reconstruct the original data. Each cell is 1024k = 1048576 bytes.

(4) RS-LEGACY-6-3-1024k: the same layout as RS-6-3-1024k above, except that it uses the rs-legacy codec.

(5) XOR-2-1-1024k: uses XOR encoding (faster than RS) to generate 1 parity unit for every 2 data units, 3 units in total. Any 2 of these 3 units (whether data or parity units, as long as the total is 2) are enough to reconstruct the original data. Each cell is 1024k = 1048576 bytes.

5.1.2 Erasure coding hands-on case



An erasure coding policy is set on a specific path and applies to all files stored under that path. By default only the RS-6-3-1024k policy is enabled; to use another policy you must enable it first.

Requirement: set the /input directory to the RS-3-2-1024k policy

(1) Enable support for the RS-3-2-1024k policy

[Tom@hadoop102 hadoop-3.1.3]$ hdfs ec -enablePolicy -policy RS-3-2-1024k
Erasure coding policy RS-3-2-1024k is enabled

(2) Create a directory in HDFS and set the RS-3-2-1024k policy on it

[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfs -mkdir /input
[Tom@hadoop102 hadoop-3.1.3]$ hdfs ec -setPolicy -path /input -policy RS-3-2-1024k
Set RS-3-2-1024k erasure coding policy on /input

(3) Upload the file and check the storage condition after the file encoding

[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfs -put web.log /input



(4) Check the data units and parity units in the DataNode storage directories, then run a corruption experiment: delete some block files and verify that the file can still be read. A sketch of this step is shown below.
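A minimal sketch of step (4), assuming the default local DataNode data directory layout; the block pool and block file names are placeholders that must be replaced with values reported by fsck:

# Find out which DataNodes hold which data/parity units
hdfs fsck /input/web.log -files -blocks -locations
# On one of those DataNodes, delete the local block file found above, e.g.:
# rm /opt/module/hadoop-3.1.3/data/dfs/data/current/BP-xxxx/current/finalized/subdir0/subdir0/blk_xxxx
# Reading the file back still works, because the missing unit is reconstructed from the remaining data and parity units
hdfs dfs -get /input/web.log ./web.log.recovered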

5.2 Heterogeneous Storage (Separating Hot and Cold Data)

Heterogeneous storage provides the best performance when different data is stored on different types of disks.



About Storage Types

RAM_DISK: memory-backed (tmpfs) file system

SSD: solid-state drive

DISK: ordinary disk. In HDFS, if no storage type is specified for a data directory, the default is DISK.

ARCHIVE: does not refer to a particular storage medium; it denotes media with weak compute capability but high storage density, used to cope with growing data volumes and generally reserved for archiving.

About storage policies. Note: from LAZY_PERSIST down to COLD, the access speed of the underlying device goes from fastest to slowest.

5.2.1 Heterogeneous Storage Shell Operations

(1) View available storage policies

[Tom@hadoop102 ~]$ hdfs storagepolicies -listPolicies
Block Storage Policies:
	BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
	BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
	BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
	BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
	BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
	BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
	BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

(2) Set a storage policy for the specified path (data store directory)

hdfs storagepolicies -setStoragePolicy -path xxx -policy xxx

(3) Obtain the storage policy of the specified path (data store directory or file)

hdfs storagepolicies -getStoragePolicy -path xxx

(4) Unset the storage policy. After unsetting, the directory or file follows its parent directory's policy; the root directory defaults to HOT

hdfs storagepolicies -unsetStoragePolicy -path xxx

(5) View the distribution of file blocks

bin/hdfs fsck xxx -files -blocks -locations

(6) View cluster nodes

hdfs dfsadmin -report

5.2.2 Preparing the Test Environment

Test Environment Description

Number of servers: 5. Cluster configuration: replication factor 2; the directories with storage types are created in advance (see the sketch after the planning table). Cluster planning:

Node        Storage type allocation
hadoop102   RAM_DISK, SSD
hadoop103   SSD, DISK
hadoop104   DISK, RAM_DISK
hadoop105   ARCHIVE
hadoop106   ARCHIVE
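The storage-type directories referenced in the configuration below must exist on each node before the DataNodes are started. A minimal sketch for hadoop103 (adjust the set of directories per node according to the plan above; the paths are assumptions matching the configuration that follows):

[Tom@hadoop103 ~]$ mkdir -p /opt/module/hadoop-3.1.3/hdfsdata/ssd
[Tom@hadoop103 ~]$ mkdir -p /opt/module/hadoop-3.1.3/hdfsdata/disk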

Configuration file Information

(1) Add the following to hdfs-site.xml on hadoop102

<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.storage.policy.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>[RAM_DISK]file:///opt/module/hadoop-3.1.3/hdfsdata/ram_disk,[SSD]file:///opt/module/hadoop-3.1.3/hdfsdata/ssd</value>
</property>

(2) Add the following to hdfs-site.xml on hadoop103

<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.storage.policy.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>[SSD]file:///opt/module/hadoop-3.1.3/hdfsdata/ssd,[DISK]file:///opt/module/hadoop-3.1.3/hdfsdata/disk</value>
</property>

(3) Add the following to hdfs-site.xml on hadoop104

<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.storage.policy.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>[RAM_DISK]file:///opt/module/hdfsdata/ram_disk,[DISK]file:///opt/module/hadoop-3.1.3/hdfsdata/disk</value>
</property>

(4) Add the following to hdfs-site.xml on hadoop105

<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.storage.policy.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>[ARCHIVE]file:///opt/module/hadoop-3.1.3/hdfsdata/archive</value>
</property>

(5) Add the following to hdfs-site.xml on hadoop106

<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.storage.policy.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>[ARCHIVE]file:///opt/module/hadoop-3.1.3/hdfsdata/archive</value>
</property>

Data preparation

(1) Start the cluster

[Tom@hadoop102 hadoop-3.1.3]$ hdfs namenode -format
[Tom@hadoop102 hadoop-3.1.3]$ myhadoop.sh start

(2) Create a file directory on the HDFS

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /hdfsdata

(3) Upload files and materials

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -put /opt/module/hadoop-3.1.3/NOTICE.txt /hdfsdata

5.2.3 HOT Storage Policy Cases

(1) Before setting any storage policy, check the directory's current policy

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -getStoragePolicy -path /hdfsdata

(2) We check the distribution of uploaded file blocks

[Tom@hadoop102 hadoop-3.1.3]$ hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.104:9866,DS-0b133854-7f9e-48df-939b-5ca6482c5afb,DISK], DatanodeInfoWithStorage[192.168.10.103:9866,DS-ca1bd3b9-d9a5-4101-9f92-3da5f1baa28b,DISK]]

No storage policy is configured. All file blocks are stored in DISK. Therefore, the default storage policy is HOT.

5.2.4 Testing the WARM storage policy

(1) Next we cool the data

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy WARM

(2) Looking at the file block distribution again, we can see that the file block is still in the same place.

[Tom@hadoop102 hadoop-3.1.3]$ hdfs fsck /hdfsdata -files -blocks -locations

(3) We need to let HDFS move file blocks according to the storage policy

[Tom@hadoop102 hadoop-3.1.3]$ hdfs mover /hdfsdata

(4) Check the file block distribution again

[Tom@hadoop102 hadoop-3.1.3]$ hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.105:9866,DS-d46d08e1-80c6-4fca-b0a2-4a3dd7ec7459,ARCHIVE], DatanodeInfoWithStorage[192.168.10.103:9866,DS-ca1bd3b9-d9a5-4101-9f92-3da5f1baa28b,DISK]]

One replica is now on DISK and the other on ARCHIVE, which conforms to the WARM policy we set.

5.2.5 COLD Policy Testing

(1) We continue to cool the data down to cold

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy COLD

Note: When we set the directory to COLD and we have not configured the ARCHIVE storage directory, we cannot upload files directly to that directory, and an exception will be reported.

(2) Manual transfer

[Tom@hadoop102 hadoop-3.1.3]$ hdfs mover /hdfsdata

(3) Check the distribution of file blocks

[Tom@hadoop102 hadoop-3.1.3]$ bin/hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.105:9866,DS-d46d08e1-80c6-4fca-b0a2-4a3dd7ec7459,ARCHIVE], DatanodeInfoWithStorage[192.168.10.106:9866,DS-827b3f8b-84d7-47c6-8a14-0166096f919d,ARCHIVE]]

All file blocks are archived, conforming to the COLD storage policy.

5.2.6 Testing the ONE_SSD policy

(1) Next we change the storage policy to ONE_SSD

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy One_SSD

(2) Manually transfer file blocks

[Tom@hadoop102 hadoop-3.1.3]$ hdfs mover /hdfsdata

(3) After the transfer, check the distribution of file blocks

[Tom@hadoop102 hadoop-3.1.3]$ bin/hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.104:9866,DS-0b133854-7f9e-48df-939b-5ca6482c5afb,DISK], DatanodeInfoWithStorage[192.168.10.103:9866,DS-2481a204-59dd-46c0-9f87-ec4647ad429a,SSD]]

One replica is stored on SSD and the other on DISK, which complies with the ONE_SSD storage policy.

5.2.7 Testing ALL_SSD policy

(1) Next we change the storage policy to ALL_SSD

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy All_SSD

(2) Manually transfer file blocks

[Tom@hadoop102 hadoop-3.1.3]$ hdfs mover /hdfsdata

(3) After the transfer, check the distribution of file blocks

[Tom@hadoop102 hadoop-3.1.3]$ bin/hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.102:9866,DS-c997cfb4-16dc-4e69-a0c4-9411a1b0c1eb,SSD], DatanodeInfoWithStorage[192.168.10.103:9866,DS-2481a204-59dd-46c0-9f87-ec4647ad429a,SSD]]

All replicas are stored on SSDs, in accordance with the ALL_SSD storage policy.

5.2.8 LAZY_PERSIST Policy Testing

(1) Continue by changing the storage policy to LAZY_PERSIST

[Tom@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy lazy_persist

(2) Manually transfer file blocks

[Tom@hadoop102 hadoop-3.1.3]$ hdfs mover /hdfsdata

(3) After the transfer, check the distribution of file blocks

[Tom@hadoop102 hadoop-3.1.3]$ bin/hdfs fsck /hdfsdata -files -blocks -locations
[DatanodeInfoWithStorage[192.168.10.104:9866,DS-0b133854-7f9e-48df-939b-5ca6482c5afb,DISK], DatanodeInfoWithStorage[192.168.10.103:9866,DS-ca1bd3b9-d9a5-4101-9f92-3da5f1baa28b,DISK]]

Here we find that all replicas are stored on DISK, although in theory one replica should be in RAM_DISK and the others on DISK. This is because the dfs.datanode.max.locked.memory and dfs.block.size parameters also need to be configured appropriately.

When the storage policy is LAZY_PERSIST, file block replicas may still all end up on DISK for two reasons: (1) the DataNode where the client resides has no RAM_DISK, so the data is written to that DataNode's DISK and the remaining replicas to disks on other nodes; (2) the DataNode does have RAM_DISK, but dfs.datanode.max.locked.memory is unset or set too small (smaller than dfs.block.size), so the data is likewise written to the local DISK and the remaining replicas to disks on other nodes.

On this virtual machine, the OS max locked memory limit is 64 KB, so setting dfs.datanode.max.locked.memory larger than that causes an error:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 209715200 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.

You can query this limit with the following command:

[Tom@hadoop102 hadoop-3.1.3]$ ulimit -a
max locked memory       (kbytes, -l) 64
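If you want LAZY_PERSIST to actually place one replica in RAM_DISK, the OS memlock limit has to be raised first, and dfs.datanode.max.locked.memory must then be set to a value no larger than that limit and no smaller than dfs.blocksize. A sketch of raising the limit, assuming the Tom user and a CentOS-style /etc/security/limits.conf (takes effect after logging in again):

[Tom@hadoop102 ~]$ sudo sh -c 'echo "Tom - memlock unlimited" >> /etc/security/limits.conf'
[Tom@hadoop102 ~]$ ulimit -l    # verify the new limit after re-login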

6. HDFS troubleshooting

6.1 Cluster Security Mode

Safe mode: the file system accepts only read requests and rejects change requests such as deletion or modification.

Scenarios in which the NameNode enters safe mode:

The NameNode is in safe mode while it loads the fsimage and edit log;

The NameNode is in safe mode while it receives DataNode registrations and block reports.



Conditions for exiting safe mode:

dfs.namenode.safemode.min.datanodes: Minimum number of available Datanodes. Default: 0

dfs.namenode.safemode.threshold-pct: minimum fraction of blocks that must meet the minimum replication requirement. The default value is 0.999f (i.e., only about one block in a thousand may be missing).

dfs.namenode.safemode.extension: Stability time. The default value is 30000 milliseconds, that is, 30 seconds

Basic syntax: while the cluster is in safe mode, it cannot perform important (write) operations. After startup completes, the cluster exits safe mode automatically.

(1) bin/hdfs dfsadmin -safemode get    (view safe mode status)
(2) bin/hdfs dfsadmin -safemode enter  (enter safe mode)
(3) bin/hdfs dfsadmin -safemode leave  (leave safe mode)
(4) bin/hdfs dfsadmin -safemode wait   (wait until safe mode is off)

Case: immediately after the cluster starts, attempting to delete data from it fails with a message saying the cluster is in safe mode. A quick way to reproduce this is shown below.
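A sketch of reproducing the behaviour with the commands above (the file path is just an example from the earlier exercises):

[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfsadmin -safemode enter
[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfs -rm /hdfsdata/NOTICE.txt     # fails: Name node is in safe mode
[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfsadmin -safemode leave         # or -safemode wait in scripts, to block until safe mode is off
[Tom@hadoop102 hadoop-3.1.3]$ hdfs dfs -rm /hdfsdata/NOTICE.txt     # now succeeds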

6.2 Slow Disk Monitoring

A slow disk is a disk that writes data slowly. Slow disks are actually not uncommon: when a machine has been running for a long time under heavy load, its disk read/write performance degrades, and in severe cases data writes are noticeably delayed.

How do I find a slow disk?

Normally it takes less than 1 second to create a directory in HDFS. If creating a directory occasionally takes more than a minute (not every time, only sporadically), there is a good chance a slow disk is the cause. You can use the following methods to find out which disk is slow.

Method 1: check the DataNode heartbeat last-contact time

A slow disk can delay the heartbeat between a DataNode and the NameNode. The normal heartbeat interval is 3 s; if the last-contact time greatly exceeds 3 s, something is wrong.
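One way to check the last-contact time from the command line is the dfsadmin report, which prints a "Last contact" line for every DataNode (a minimal sketch):

[Tom@hadoop102 ~]$ hdfs dfsadmin -report | grep -E "^Name:|Last contact"

A DataNode whose last-contact time is persistently far above 3 s is worth examining for a slow disk.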



Method 2: use the fio command to test disk read/write performance

(1) Sequential read test

[Tom@hadoop102 ~]$ sudo yum install -y fio
[Tom@hadoop102 ~]$ sudo fio -filename=/home/Tom/test.log -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r
Run status group 0 (all jobs):
READ: bw=360MiB/s (378MB/s), 360MiB/s-360MiB/s (378MB/s-378MB/s), io=20.0GiB (21.5GB), run=56885-56885msec

The result shows that the overall sequential read speed of the disk is 360MiB/s

(2) Sequential write test

[Tom@hadoop102 ~]$ sudo fio -filename=/home/Tom/test.log -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_w
Run status group 0 (all jobs):
WRITE: bw=341MiB/s (357MB/s), 341MiB/s-341MiB/s (357MB/s-357MB/s), io=19.0GiB (21.4GB), run=60001-60001msec

The result shows that the overall sequential write speed of the disk is 341MiB/s

(3) Random write test

[Tom@hadoop102 ~]$ sudo fio -filename=/home/Tom/test.log -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_randw
Run status group 0 (all jobs):
WRITE: bw=309MiB/s (324MB/s), 309MiB/s-309MiB/s (324MB/s-324MB/s), io=18.1GiB (19.4GB), run=60001-60001msec

The result shows that the total random write speed of the disk is 309MiB/s.

(4) Mixed random read/write test

[Tom@hadoop102 ~]$ sudo fio -filename=/home/Tom/test.log -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r_w -ioscheduler=noop
Run status group 0 (all jobs):
READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s), io=12.9GiB (13.9GB), run=60001-60001msec
WRITE: bw=94.6MiB/s (99.2MB/s), 94.6MiB/s-94.6MiB/s (99.2MB/s-99.2MB/s), io=5674MiB (5950MB), run=60001-60001msec

The result shows that under mixed random read/write the disk reads at 220 MiB/s and writes at 94.6 MiB/s.

6.3 Archiving small files

Disadvantages of HDFS storing small files



Each file is stored as a block, and the metadata of each block is stored in NameNode memory. Therefore, HDFS is inefficient to store small files. Because a large number of small files will use up most of the memory in NameNode. Note, however, that the disk capacity required to store small files is independent of the size of the data block. For example, a 1MB file set to 128MB block storage actually uses 1MB of disk space, not 128MB.

One solution to storing small files

HDFS archive files, or HAR files, are a more efficient archiving tool: they pack files into HDFS blocks while still allowing transparent access to the individual files, and they reduce NameNode memory usage. Internally a HAR file still contains the individual files, but to the NameNode it appears as a single file, which shrinks the NameNode's metadata footprint.



A practical example

(1) Start the YARN process

[Tom@hadoop102 hadoop-3.1.3]$ start-yarn.sh

(2) Archive all files in /input into an archive called input.har and store it under /output.

[Tom@hadoop102 hadoop-3.1.3]$ hadoop archive -archiveName input.har -p /input /output

(3) View the archive

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -ls /output/input.har
Found 4 items
-rw-r--r--   3 Tom supergroup          0 2021-06-26 17:26 /output/input.har/_SUCCESS
-rw-r--r--   3 Tom supergroup        268 2021-06-26 17:26 /output/input.har/_index
-rw-r--r--   3 Tom supergroup         23 2021-06-26 17:26 /output/input.har/_masterindex
-rw-r--r--   3 Tom supergroup         74 2021-06-26 17:26 /output/input.har/part-0

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -ls har:///output/input.har
2021-06-26 17:33:50,362 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Found 3 items
-rw-r--r--   3 Tom supergroup         38 2021-06-26 17:24 har:///output/input.har/shu.txt
-rw-r--r--   3 Tom supergroup         19 2021-06-26 17:24 har:///output/input.har/wei.txt
-rw-r--r--   3 Tom supergroup         17 2021-06-26 17:24 har:///output/input.har/wu.txt

(4) Unpack archive files

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -cp har:///output/input.har/* /
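Copying with -cp extracts the files sequentially; for large archives, DistCp can be used to unarchive in parallel. A sketch, assuming the same archive and a target directory /unarchive:

[Tom@hadoop102 hadoop-3.1.3]$ hadoop distcp har:///output/input.har/* /unarchive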

7. MapReduce production experience

Causes of slow MapReduce jobs: (1) machine performance: CPU, memory, disk, and network; (2) I/O issues: data skew, Map tasks that run too long and keep Reduce waiting, and too many small files.

Common tuning parameters of MapReduce





MapReduce data skew problem

Data frequency skew – the amount of data in one area is much greater than in another.

Data size skew – Some records are much larger than average.

Methods to reduce data skew: (1) First check whether the skew is caused by a large number of null keys. In production, null keys can often be filtered out directly; if they must be kept, add a random prefix to them in a custom partitioner and then aggregate a second time. (2) Handle data as early as possible in the Map stage, for example with a Combiner or a Map-side join (MapJoin). (3) Adjust the number of Reduce tasks.

8. Hadoop comprehensive tuning

8.1 Hadoop Small File Optimization Method

8.1.1 Disadvantages of Hadoop small files

Each file in HDFS requires corresponding metadata on the NameNode, roughly 150 bytes per file. A large number of small files therefore produces a large amount of metadata: on the one hand it occupies NameNode memory, and on the other hand it slows down metadata addressing and indexing. Too many small files also generate too many input splits during MapReduce, which starts too many MapTasks; each MapTask then processes very little data, so its processing time is shorter than its startup time, wasting resources.
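To get a sense of the scale: at roughly 150 bytes of metadata per file, 10 million small files consume about 10,000,000 × 150 B ≈ 1.4 GB of NameNode memory for metadata alone, regardless of how little data the files actually contain.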

8.1.2 Hadoop Small file solution

1) During data collection, small files or small batches of data are combined into large files and uploaded to HDFS (data source)

2) Hadoop Archive is an efficient file Archive tool that puts small files into HDFS blocks. It can package multiple small files into a HAR file, thus reducing NameNode memory usage.

3) CombineTextInputFormat (computation direction): CombineTextInputFormat combines multiple small files into a single split, or a small number of splits, during input splitting.

4) Enable uber mode to reuse the JVM (computation direction). By default each Task starts its own JVM; if the tasks compute only small amounts of data, we can run all tasks of the same job inside a single JVM instead of starting a JVM for every Task. (1) Without uber mode enabled, upload a few small files to the /input path and run the wordcount program

[Tom@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output2

(2) Observe the console

2021-06-26 16:18:07,607 INFO mapreduce.Job: Job job_1613281510851_0002 running in uber mode : false

(3) Observe http://hadoop103:8088/cluster



(4) Enable uber mode by adding the following configuration to mapred-site.xml

<!-- Enable uber mode (disabled by default) -->
<property>
	<name>mapreduce.job.ubertask.enable</name>
	<value>true</value>
</property>
<!-- Maximum number of MapTasks allowed in uber mode; may only be decreased -->
<property>
	<name>mapreduce.job.ubertask.maxmaps</name>
	<value>9</value>
</property>
<!-- Maximum number of ReduceTasks allowed in uber mode; may only be decreased -->
<property>
	<name>mapreduce.job.ubertask.maxreduces</name>
	<value>1</value>
</property>
<!-- Maximum input size allowed in uber mode; defaults to dfs.blocksize and may only be decreased -->
<property>
	<name>mapreduce.job.ubertask.maxbytes</name>
	<value></value>
</property>

(5) Distribute the configuration

[Tom@hadoop102 hadoop]$ xsync mapred-site.xml

(6) Execute wordcount program again

[Tom@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output2

(7) Observe the console

2021-06-27 16:28:36,198 INFO mapreduce.Job: Job job_1613281510851_0003 running in uber mode : true

(8) Observe http://hadoop103:8088/cluster

8.2 Testing MapReduce Computing Performance

Use the Sort program to evaluate MapReduce. Note: do not run this test if a virtual machine has less than 150 GB of free disk space.

(1) Use RandomWriter to generate random data: each node runs 10 Map tasks, and each Map generates about 1 GB of random binary data

[Tom@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar randomwriter random-data

(2) Execute Sort

[Tom@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar sort random-data sorted-data

(3) Verify whether the data is really sorted

[Tom@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data

8.3 Enterprise Development Scenario

8.3.1 Requirements

(1) Requirement: count the frequency of each word in 1 GB of data, using three servers, each with 4 GB of memory and a 4-core, 4-thread CPU. (2) Analysis: 1 GB / 128 MB = 8 MapTasks, plus 1 ReduceTask and 1 MrAppMaster, i.e. 10 tasks in total; on average 10 / 3 ≈ 3 tasks per node (distributed as 4, 3, 3).

8.3.2 Tuning HDFS Parameters

(1) Modify hadoop-env.sh

export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"

(2) Modify hdfs-site.xml

<!-- The NameNode has a worker thread pool; the default size is 10 -->
<property>
	<name>dfs.namenode.handler.count</name>
	<value>21</value>
</property>
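The value 21 presumably follows the common rule of thumb dfs.namenode.handler.count ≈ 20 × ln(cluster size); for a 3-node cluster this can be checked quickly (assuming Python is installed):

[Tom@hadoop102 ~]$ python -c "import math; print(int(20 * math.log(3)))"
21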

(3) Modify core-site.xml

<!-- Set the trash (recycle bin) retention time to 60 minutes -->
<property>
	<name>fs.trash.interval</name>
	<value>60</value>
</property>

(4) Distribute the configuration

[Tom@hadoop102 hadoop]$ xsync hadoop-env.sh hdfs-site.xml core-site.xml

8.3.3 Tuning MapReduce Parameters

(1) Modify mapred-site.xml

<!-- Ring buffer size, default 100m -->
<property>
	<name>mapreduce.task.io.sort.mb</name>
	<value>100</value>
</property>

<!-- Ring buffer spill threshold, default 0.8 -->
<property>
	<name>mapreduce.map.sort.spill.percent</name>
	<value>0.80</value>
</property>

<!-- Number of streams merged at once, default 10 -->
<property>
	<name>mapreduce.task.io.sort.factor</name>
	<value>10</value>
</property>

<!-- MapTask memory, default 1 GB; by default the MapTask heap size (mapreduce.map.java.opts) follows this value -->
<property>
	<name>mapreduce.map.memory.mb</name>
	<value>-1</value>
	<description>The amount of memory to request from the scheduler for each map task. If this is not specified or is non-positive, it is inferred from mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio. If java-opts are also not specified, we set it to 1024.</description>
</property>

<!-- MapTask CPU cores, default 1 -->
<property>
	<name>mapreduce.map.cpu.vcores</name>
	<value>1</value>
</property>

<!-- Maximum number of retries when a MapTask fails, default 4 -->
<property>
	<name>mapreduce.map.maxattempts</name>
	<value>4</value>
</property>

<!-- Number of parallel copies each Reduce uses to fetch Map output, default 5 -->
<property>
	<name>mapreduce.reduce.shuffle.parallelcopies</name>
	<value>5</value>
</property>

<!-- Ratio of the shuffle buffer size to available Reduce memory, default 0.7 -->
<property>
	<name>mapreduce.reduce.shuffle.input.buffer.percent</name>
	<value>0.70</value>
</property>

<!-- Threshold at which buffered data is merged and spilled to disk, default 0.66 -->
<property>
	<name>mapreduce.reduce.shuffle.merge.percent</name>
	<value>0.66</value>
</property>

<!-- ReduceTask memory, default 1 GB; by default the ReduceTask heap size (mapreduce.reduce.java.opts) follows this value -->
<property>
	<name>mapreduce.reduce.memory.mb</name>
	<value>-1</value>
	<description>The amount of memory to request from the scheduler for each    reduce task. If this is not specified or is non-positive, it is inferred
from mapreduce.reduce.java.opts and mapreduce.job.heap.memory-mb.ratio.
If java-opts are also not specified, we set it to 1024.
</description>
</property>

<!-- ReduceTask CPU cores, default 1, set to 2 here -->
<property>
	<name>mapreduce.reduce.cpu.vcores</name>
	<value>2</value>
</property>

<!-- Maximum number of retries when a ReduceTask fails, default 4 -->
<property>
	<name>mapreduce.reduce.maxattempts</name>
	<value>4</value>
</property>

<!-- Resources are requested for ReduceTasks only after this fraction of MapTasks has completed, default 0.05 -->
<property>
	<name>mapreduce.job.reduce.slowstart.completedmaps</name>
	<value>0.05</value>
</property>

<!-- Task timeout: if a task neither reads input nor writes output within this time (default 600000 ms, i.e. 10 minutes), it is forcibly terminated -->
<property>
	<name>mapreduce.task.timeout</name>
	<value>600000</value>
</property>

(2) Distribute the configuration

[Tom@hadoop102 hadoop]$ xsync mapred-site.xml

8.3.4 Tuning Yarn Parameters

(1) Modify the yarn-site.xml parameters as follows:

<!-- Scheduler to use, default CapacityScheduler -->
<property>
	<description>The class to use as the resource scheduler.</description>
	<name>yarn.resourcemanager.scheduler.class</name>
	<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- Number of ResourceManager threads that handle scheduler requests, default 50. Increase it if many jobs are submitted concurrently, but do not exceed 3 nodes * 4 threads = 12 threads (in practice no more than 8, leaving room for other applications) -->
<property>
	<description>Number of threads to handle scheduler interface.</description>
	<name>yarn.resourcemanager.scheduler.client.thread-count</name>
	<value>8</value>
</property>

<!-- Whether to let YARN detect hardware automatically for its configuration, default false. If other applications also run on the node, configure YARN manually; if the node runs nothing else, automatic detection can be used -->
<property>
	<description>Enable auto-detection of node capabilities such as memory and CPU.</description>
	<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
	<value>false</value>
</property>

<!-- Whether to count hyper-threads (logical processors) as cores, default false, i.e. count physical CPU cores -->
<property>
	<description>Flag to determine if logical processors(such as hyperthreads) should be counted as cores. Only applicable on Linux when yarn.nodemanager.resource.cpu-vcores is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true.</description>
	<name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
	<value>false</value>
</property>

<!-- Multiplier used to convert physical cores to vcores, default 1.0 -->
<property>
	<description>Multiplier to determine how to convert phyiscal cores to vcores. This value is used if yarn.nodemanager.resource.cpu-vcores is set to -1(which implies auto-calculate vcores) and yarn.nodemanager.resource.detect-hardware-capabilities is set to true. Thenumber of vcores will be calculated asnumber of CPUs * multiplier.</description>
	<name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
	<value>1.0</value>
</property>

<!-- NodeManager memory, default 8 GB, changed to 4 GB here -->
<property>
	<description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated(in case of Windows and Linux).In other cases, the default is 8192MB.</description>
	<name>yarn.nodemanager.resource.memory-mb</name>
	<value>4096</value>
</property>

<!-- Number of CPU cores available to the NodeManager, default 8, changed to 4 here -->
<property>
	<description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux.In other cases, number of vcores is 8 by default.</description>
	<name>yarn.nodemanager.resource.cpu-vcores</name>
	<value>4</value>
</property>

<!-- Minimum container memory, default 1 GB -->
<property>
	<description>The minimum allocation for every container request at the RMin MBs. Memory requests lower than this will be set to the value of thisproperty. Additionally, a node manager that is configured to have less memorythan this value will be shut down by the resource manager.</description>
	<name>yarn.scheduler.minimum-allocation-mb</name>
	<value>1024</value>
</property>

<!-- Maximum container memory, default 8 GB, changed to 2 GB here -->
<property>
	<description>The maximum allocation for every container request at the RMin MBs. Memory requests higher than this will throw anInvalidResourceRequestException.</description>
	<name>yarn.scheduler.maximum-allocation-mb</name>
	<value>2048</value>
</property>

<!-- Minimum number of CPU cores per container, default 1 -->
<property>
	<description>The minimum allocation for every container request at the RMin terms of virtual CPU cores. Requests lower than this will  be set to thevalue of this property. Additionally, a node manager that is configured tohave fewer virtual cores than this value will be shut down by the resourcemanager.</description>
	<name>yarn.scheduler.minimum-allocation-vcores</name>
	<value>1</value>
</property>

<!-- Maximum number of CPU cores per container, default 4, changed to 2 here -->
<property>
	<description>The maximum allocation for every container request at the RMin terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
	<name>yarn.scheduler.maximum-allocation-vcores</name>
	<value>2</value>
</property>

<!-- Virtual memory check, enabled by default, disabled here -->
<property>
	<description>Whether virtual memory limits will be enforced for containers.</description>
	<name>yarn.nodemanager.vmem-check-enabled</name>
	<value>false</value>
</property>

<!-- Ratio of virtual memory to physical memory, default 2.1 -->
<property>
<description>Ratio between virtual memory to physical memory whensetting memory limits for containers. Container allocations areexpressed in terms of physical memory, and virtual memory usageis allowed to exceed this allocation by this ratio.</description>
	<name>yarn.nodemanager.vmem-pmem-ratio</name>
	<value>2.1</value>
</property>

(2) Distribute the configuration

[Tom@hadoop102 hadoop]$ xsync yarn-site.xml

8.3.5 Executing the program

(1) Restart the cluster

[Tom@hadoop102 hadoop-3.1.3]$ sbin/stop-yarn.sh
[Tom@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

(2) Execute the WordCount program

[Tom@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
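After the job finishes, the result can be checked directly from HDFS; the part file name below assumes the default single reducer:

[Tom@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /output/part-r-00000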

(3) Observe the Yarn task page: http://hadoop103:8088/cluster/apps