Preface

We’ve finished the ZooKeeper updates for the high concurrency from scratch series, and those ZooKeeper articles never touched on big data. On the one hand, I’ve long wanted an excuse to summarize what I know about big data; on the other hand, it follows the trend of the times, and after all it benefits me too. So let’s get started without further ado.

Notes before reading

This reads like a set of study notes, but it has a proper beginning and end. It describes the material in the clearest language I can manage, and hopefully you will get something out of it.

Key point: big data comes with a lot of conceptual material, so this article will not pile up code the way earlier ones did; it focuses more on explaining concepts.

For big data, be sure to read the commands and the logs!

Links to the previous ZooKeeper articles (Java only)

High concurrency from scratch (1) – basic concept of Zookeeper

High concurrency from scratch (2) – Zookeeper implements distributed locking

High concurrency from scratch (3) – Setting up a Zookeeper cluster and electing a leader

High concurrency from scratch (4) – Distributed queue of Zookeeper

High concurrency from scratch (5) – The configuration center application of Zookeeper

High concurrency from scratch (6) – Zookeeper Master election and official website overview

1. HDFS Concept

Let’s quickly go over the basic concepts first, so we at least know what we are going to talk about and what this thing is for.

1.1 The Hadoop architecture

Hadoop consists of three modules: distributed storage (HDFS, the Hadoop Distributed File System), distributed computing (MapReduce), and resource scheduling (YARN).

A large number of files can be distributed on different servers

A single file may be too large to fit on one disk, so the file can be divided into many blocks and stored on different servers; the servers are connected over the network and work together as a whole.

1.2 Core Concept Block

On HDFS 3.x, files are split into 128 MB blocks and stored on different DataNodes in the cluster. Note that this splitting is performed automatically by HDFS.

So suppose we want to store a 300 MB file; that 300 MB will be cut into:

DataNode1: 128 MB + DataNode2: 128 MB + DataNode3: 44 MB

At this point we need to know that even though HDFS logically divides the file by 128 MB, the 44 MB block on DataNode3 will only occupy 44 MB of actual disk space, not 128 MB.
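
To make the arithmetic concrete, here is a tiny sketch, plain Java rather than anything from HDFS itself, that reproduces how the 300 MB file above maps onto 128 MB blocks:

    // Minimal sketch: how a 300 MB file maps onto 128 MB blocks.
    // This only reproduces the arithmetic HDFS performs internally; it is not HDFS code.
    public class BlockSplitSketch {
        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;   // the default dfs.blocksize
            long fileSize  = 300L * 1024 * 1024;   // our 300 MB example file

            long fullBlocks = fileSize / blockSize;   // 2 full 128 MB blocks
            long lastBlock  = fileSize % blockSize;   // one trailing 44 MB block

            System.out.printf("%d full blocks of %d MB + 1 block of %d MB%n",
                    fullBlocks, blockSize >> 20, lastBlock >> 20);
            // prints: 2 full blocks of 128 MB + 1 block of 44 MB
        }
    }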

1.3 Block replicas

One reason Hadoop is so popular today is that it was designed from the start to run on commodity servers, and we know commodity servers are cheap; cheap servers fail easily, whether it is the CPU, I/O, memory, and so on.

Following what we just said in 1.2, a file is divided into three blocks stored on different DataNodes. If one of those DataNodes fails, the file can no longer be reconstructed, so Hadoop also replicates every data block to guarantee data reliability.

The number of replicas can be set manually; it is generally 3.

hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    </configuration>

You can clearly see that the value is 3, so our replica count is 3.

How do we know what these properties do? The Hadoop website (the 2.7.3 docs) has the answer: open the official documentation and scroll to the bottom of the left sidebar to find the Configuration section.

Of course, we need to find the right file: for HDFS it is hdfs-default.xml, for MapReduce it is mapred-default.xml, and for YARN it is yarn-default.xml.

Click hdfs-default.xml, press Ctrl + F to search, type dfs.replication, and press Enter.

You will see that the default value of dfs.replication is 3, described as the default block replication (the actual replication factor can be specified when a file is created), and that the companion property dfs.replication.max caps the maximum at 512.

The same 128 MB block size we mentioned in 1.2 is also defined in this file, as the dfs.blocksize property.

So the size of each data block can be configured independently as well.
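
To see these knobs from the client side, here is a hedged sketch using the HDFS Java API: FileSystem.create lets you override the replication factor and block size for a single file. The path and the concrete sizes below are made up for illustration.

    // Sketch: create one file with its own replication factor and block size.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithCustomBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up hdfs-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out = fs.create(
                    new Path("/demo/big-file.dat"),      // hypothetical path
                    true,                                // overwrite if it already exists
                    4096,                                // I/O buffer size in bytes
                    (short) 2,                           // replication for this file only
                    256L * 1024 * 1024)) {               // 256 MB blocks for this file only
                out.writeUTF("hello hdfs");
            }
            fs.close();
        }
    }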

1.3.1 Rack Storage Policies

In a real server room there are racks, and each rack holds several servers.

Generally we will store three copies of a block as follows:

The first replica is stored on a server in rack A; the second replica is stored on a server in a different rack from the first (say, rack B)

The second replica is deliberately placed on a different rack to guard against a whole rack losing power: if it lived on another server in the same rack, the data could still be lost.

The third replica is stored on another server in rack B (note that replicas 2 and 3 both live in rack B)

Why choose this layout? If replica 3 were placed on yet another rack C, traffic between replicas 2 and 3 would have to travel from replica 2 up through its rack switch to the core switch and then down to rack C’s switch. That path is long, and bandwidth inside the server room is precious: under high concurrency it is easy to exhaust it, the response time of the whole cluster drops sharply, and the service runs into trouble.
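
To make the policy concrete, here is a small illustrative sketch of the placement just described. It is not Hadoop’s actual BlockPlacementPolicy, just the “one replica on the local rack, two more together on one remote rack” idea, with made-up rack and server names.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    // Toy model of the default-style placement: replica 1 on the writer's rack,
    // replicas 2 and 3 on two different servers of one other rack.
    public class RackPlacementSketch {
        static Map<String, List<String>> racks = Map.of(
                "rackA", List.of("a1", "a2", "a3"),
                "rackB", List.of("b1", "b2", "b3"),
                "rackC", List.of("c1", "c2", "c3"));

        static List<String> place(String localRack) {
            Random r = new Random();
            List<String> chosen = new ArrayList<>();
            chosen.add(pick(localRack, r, chosen));            // replica 1: local rack
            List<String> others = new ArrayList<>(racks.keySet());
            others.remove(localRack);
            String remoteRack = others.get(r.nextInt(others.size()));
            chosen.add(pick(remoteRack, r, chosen));           // replica 2: a remote rack
            chosen.add(pick(remoteRack, r, chosen));           // replica 3: same remote rack, different server
            return chosen;
        }

        static String pick(String rack, Random r, List<String> used) {
            List<String> candidates = new ArrayList<>(racks.get(rack));
            candidates.removeAll(used);
            return candidates.get(r.nextInt(candidates.size()));
        }

        public static void main(String[] args) {
            System.out.println(place("rackA"));   // e.g. [a2, b1, b3]
        }
    }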

Of course, the replica count can also be increased manually with a command; when a file gets more client traffic, adding replicas helps spread the load.

$ hadoop fs -setrep -R 4 path+FileName

Here setrep is short for “set replication”: this command sets the replica count of path+FileName to 4.
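
For completeness, a hedged sketch of doing the same thing through the HDFS Java API; the path is made up for illustration. FileSystem.setReplication only records the new target, and the NameNode re-replicates the blocks in the background.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IncreaseReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Ask the NameNode to keep 4 replicas of this file from now on;
            // the actual copying happens asynchronously.
            boolean accepted = fs.setReplication(new Path("/demo/hot-file.dat"), (short) 4);
            System.out.println("request accepted: " + accepted);
            fs.close();
        }
    }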

2. The three components of HDFS

Again, most big data frameworks are master-slave architectures, that is, one master with many slaves. HDFS, as we will see in a moment, has one NameNode and multiple DataNodes; MapReduce has one JobTracker and multiple TaskTrackers; YARN has one ResourceManager and multiple NodeManagers; and Spark has one Master and multiple slaves.

We can skip a detailed introduction to the DataNode; all we need to know is that DataNodes are what store the blocks.

2.1 Introduction to NameNode

Big data frameworks are distributed: each role may run on a different server, and communication between them relies on the network. When a client needs to read a file, it must know how that file was split into blocks and which server stores each block. The information that describes the file in this way is called the file’s metadata, and the metadata lives in the NameNode’s memory.

2.2 Introduction to metadata

Size of metadata: each file, block, and directory takes up roughly 150 bytes of metadata, which is why HDFS suits large files rather than small ones. Think of it this way: storing one large file costs only a handful of 150-byte metadata entries, while N small files drag along N sets of 150-byte entries. That is not a good deal, as the back-of-the-envelope sketch below shows.
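
A quick back-of-the-envelope sketch of that trade-off, using the rough 150-byte figure above; the file sizes are made up for illustration.

    // Rough estimate of NameNode memory for metadata, at ~150 bytes per object.
    public class MetadataEstimate {
        public static void main(String[] args) {
            long perObject = 150;                        // bytes, the rough figure quoted above

            // One 1 GB file stored as 8 x 128 MB blocks: 1 file object + 8 block objects.
            long bigFile = (1 + 8) * perObject;          // ~1.35 KB of NameNode memory

            // The same 1 GB as 10,000 small files of ~100 KB each:
            // 10,000 file objects + 10,000 block objects.
            long smallFiles = (10_000 + 10_000) * perObject;   // ~3 MB of NameNode memory

            System.out.println("one big file : " + bigFile + " bytes");
            System.out.println("many small   : " + smallFiles + " bytes");
        }
    }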

Metadata information is stored in a namespace image file (hereinafter called fsimage) and an edit log (hereinafter called edits log)

fsimage: the metadata image file, which stores the file system’s directory tree and the mapping between files and blocks
edits log: the log file, which records the changes made to files

Memory is volatile: if the NameNode goes down, the metadata held in its memory can no longer be read. To guard against this, and to speed up recovery after a NameNode failure, the SecondaryNameNode role was designed.

Log buffering: when a client writes to HDFS, the operation is recorded in the edits log and applied to the NameNode’s in-memory metadata. Two buffers are prepared in advance: when the first one fills up, its contents are flushed to the on-disk edits log while the second buffer keeps accepting new records. This ensures client writes can be logged at all times.
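
Here is a deliberately simplified sketch of that double-buffer idea. The real NameNode implementation is more involved, but the swap-then-flush pattern is the same.

    import java.util.ArrayList;
    import java.util.List;

    // Simplified double buffer: clients append to `current` while `syncing` is flushed.
    public class DoubleBufferSketch {
        private List<String> current = new ArrayList<>();  // buffer clients append to
        private List<String> syncing = new ArrayList<>();  // buffer being written to disk

        // Clients keep appending edits here; this call never waits on disk I/O.
        public synchronized void logEdit(String edit) {
            current.add(edit);
        }

        // A single flush thread swaps the buffers, then writes the full one out,
        // so appends to `current` can continue while the old buffer is persisted.
        public void flush() {
            List<String> toFlush;
            synchronized (this) {
                toFlush = current;
                current = syncing;
                syncing = toFlush;
            }
            for (String edit : toFlush) {
                System.out.println("write to edits log: " + edit);  // stand-in for a disk append
            }
            toFlush.clear();
        }
    }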

2.3 Introduction to SecondaryNameNode

Its role basically comes down to the following points:

1. Back up the NameNode’s metadata
2. Speed up NameNode restarts
3. Serve as the new NameNode if necessary
Why does SecondaryNameNode speed up the recovery of NameNode?

When the cluster starts, the system records the start time; once a set period has passed, or the edits log in the NameNode has accumulated enough records, a checkpoint operation is triggered. (Checkpointing is also used in Spark to back up important data.)
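
As a reference, a hedged sketch of reading the two properties commonly used to control when a checkpoint fires; the defaults in the comments are the usual hdfs-default.xml values.

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointTriggerConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // checkpoint at least every N seconds (default 3600, i.e. one hour)
            long period = conf.getLong("dfs.namenode.checkpoint.period", 3600);
            // ...or earlier, once this many un-checkpointed transactions pile up (default 1,000,000)
            long txns = conf.getLong("dfs.namenode.checkpoint.txns", 1_000_000);
            System.out.println("checkpoint every " + period + "s or " + txns + " txns");
        }
    }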

The procedure is laid out point by point below:

1. The SecondaryNameNode pulls the edits log and fsimage from the NameNode over HTTP GET

2. The SecondaryNameNode merges the edits log into the fsimage, producing a new file called fsimage.ckpt

3. Once the merge is complete, the SecondaryNameNode sends fsimage.ckpt back to the NameNode

4. Meanwhile, some clients are most likely still reading and writing through the NameNode, so new log records keep being generated; these go into a separate edits.new file

5. The fsimage.ckpt that was just sent back then replaces the original fsimage, and edits.new takes the place of the old edits log

Why does the SecondaryNameNode speed up a NameNode restart?

First, how does a NameNode recover from a crash on its own?

It reads the fsimage image file into memory, then replays every operation recorded in the edits log to rebuild all the metadata, and thereby returns to the state it was in before the shutdown. This process is very slow.

With a SecondaryNameNode, however, a large part of the metadata can be restored from the fsimage.ckpt it produced; the NameNode then only needs to replay the newer operations that were recorded in edits.new and merged into the edits log to catch up.

If it is determined that the NameNode cannot be restarted at all, the SecondaryNameNode can take over as the new NameNode by running the following command

hadoop-daemon.sh start namenode

Of course, it is not hard to see that this approach is rather inelegant: there is inevitably a gap between the NameNode going down and the SecondaryNameNode host taking over, so Hadoop’s HA (high availability) mechanism was introduced to solve this problem.

3. HDFS mechanism

3.1 Heartbeat Mechanism

The heartbeat mechanism solves the communication problem between the nodes of an HDFS cluster, and it is also how the NameNode commands DataNodes to perform operations.

1. After the master NameNode starts, it opens an IPC server
2. When a slave DataNode starts, it connects to the NameNode and sends it a heartbeat every 3 seconds
3. The NameNode sends task instructions to the DataNode in the return value of that heartbeat

The role of the heartbeat mechanism

1. The NameNode has full control over data block replication: it periodically receives heartbeat signals and block status reports from every DataNode in the cluster

2. A DataNode registers with the NameNode when it starts, periodically sends block reports, and sends a heartbeat every 3 seconds; the NameNode returns instructions in reply, such as copying a data block to another machine. If a DataNode sends no heartbeat for about 10 minutes, the NameNode marks it as unavailable, and client reads and writes are no longer routed to that DataNode

3. When the Hadoop cluster starts, it enters safe mode (the block-report threshold is 99.9% by default), and this also relies on the heartbeat mechanism. As the cluster starts, every DataNode sends its block report to the NameNode, which counts the total number of blocks reported; while the ratio of reported blocks to total blocks stays below the threshold, the cluster remains in safe mode, where clients can only read from HDFS and cannot write to it. The sketch after this list shows the related configuration properties.
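
For reference, a hedged sketch of the configuration values behind the numbers above; the property names and defaults are the ones found in hdfs-default.xml, and the dead-node timeout formula is the commonly cited one.

    import org.apache.hadoop.conf.Configuration;

    public class HeartbeatConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // heartbeat every 3 seconds by default
            long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3);
            // NameNode re-checks liveness every 5 minutes by default (value in milliseconds)
            long recheckMs = conf.getLong("dfs.namenode.heartbeat.recheck-interval", 5 * 60 * 1000);
            // a DataNode is declared dead after roughly
            // 2 * recheck-interval + 10 * heartbeat-interval (about 10 minutes 30 seconds)
            long deadTimeoutMs = 2 * recheckMs + 10 * heartbeatSec * 1000;
            // safe mode ends once this fraction of blocks has been reported (default 0.999)
            float threshold = conf.getFloat("dfs.namenode.safemode.threshold-pct", 0.999f);

            System.out.println("dead-node timeout ~ " + deadTimeoutMs / 1000 + "s, "
                    + "safe-mode threshold = " + threshold);
        }
    }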

3.2 Load Balancing

In essence, when nodes are added or removed, or when disk utilization differs too much between nodes, data is migrated over the network to rebalance the cluster and keep it highly available.

The command to trigger it:

$HADOOP_HOME/sbin/start-balancer.sh -t 5%

The 5% is the disk-utilization gap mentioned above: once the difference between nodes exceeds 5%, the load-balancing policy kicks in.

Finally

The next article will continue with the HDFS read and write flows and fault tolerance, HA (high availability), and federation.