Each component in the big data ecosystem has specific version-compatibility requirements, and mismatched versions easily lead to all kinds of problems.

Hadoop 1.X, 2.X, and 3.X differ considerably from one another, so when building a Hive data warehouse or an HBase database on top of Hadoop HDFS, version selection comes first.

For the commonly used version combinations there are many articles on the Internet, but ones based on Hadoop 2.10.0 are rare, so here is a summary:

Resource addresses (Hive, Hadoop, ZooKeeper, HBase, MySQL database driver, etc.):

Link: https://pan.baidu.com/s/1n4wRfi9G5Ff9yfcKlMdVLg extraction code: s8yx

One: Hadoop 2.10.0 installation

Reference environment:

  1. Mac Pro with 16 GB of RAM
  2. Parallels Desktop
  3. CentOS 7 installed on the VMs
  4. JDK version: Sun JDK 1.8
  5. Hadoop 2.10.0 in high-availability (HA) mode, Hive 2.3.7 single-node, HBase 2.2.4 cluster (no standby Master configured), ZooKeeper 3.4.14 (three-node cluster), Hive metadata stored in MySQL
  6. The Hadoop cluster starts both the HDFS cluster and YARN
  7. Four CentOS virtual machines

Pre-preparation:

  1. Install JDK 1.8 on the four VMs and configure the JAVA_HOME and PATH environment variables in /etc/profile. For details, see the following:

    export JAVA_HOME=/usr/local/jdk1.8.0_65
    export HADOOP_HOME=/home/hadoop/hadoop-2.10.0
    export HIVE_HOME=/home/hadoop/apache-hive-2.3.7-bin
    export HBASE_HOME=/home/hadoop/hbase-2.2.4
    export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
     
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
     
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
  2. Set the node names and add node01 through node04 with their IP addresses to /etc/hosts. The settings are the same on all four machines; use the scp command to distribute the file. Reference /etc/hosts:

    19.211.55.3  node01
    19.211.55.4  node02
    19.211.55.5  node03
    19.211.55.6  node04
  3. Configure passwordless SSH login among the four VMs (on each machine, generate a key pair under /root/.ssh/ and copy the public key to every node):
    ssh-keygen
    ssh-copy-id -i /root/.ssh/id_rsa.pub node01   # repeat for each node name
  4. Synchronize the time on the four VMs against the Aliyun NTP server:
    yum install ntpdate
    ntpdate ntp1.aliyun.com
  5. Upload the Hadoop 2.10.0 tar.gz package to the /home/hadoop directory on the VMs and decompress it. (For convenience, no dedicated user is configured; the root user performs the startup operations.)
  6. Set the /etc/profile environment variables, run source /etc/profile, and distribute the file to the other nodes for the same operation.
  7. Prepare the ZooKeeper cluster on node02, node03, and node04. The /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg configuration is as follows. (Create a myid file under /var/zfg/zookeeper on each node and write 1, 2, or 3 as that server's ZooKeeper ID, matching the server.N entries.)

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/var/zfg/zookeeper
    server.1=node02:2888:3888
    server.2=node03:2888:3888
    server.3=node04:2888:3888
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the 
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
  8. Set the Hadoop configuration files (write both the HDFS and YARN configuration files so they can be distributed together): add and modify hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml, and slaves, and set the hadoop-env.sh parameters, including the JDK directory.
  9. hdfs-site.xml configuration file reference:

    <? xml version="1.0" encoding="UTF-8"? > <? xml-stylesheettype="text/xsl" href="configuration.xsl"? > <! -- Licensed under the Apache License, Version 2.0 (the"License");
      you may not use this file except inThe compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed toin writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License forthe specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <! -- Put site-specific property overridesinthis file. --> <configuration> <property> <name>dfs.nameservices</name> <value>mycluster</value> </property> <property> <name>dfs.ha.namenodes.mycluster</name> <value>nn1,nn2</value> </property> <property> <name>dfs.namenode.rpc-address.mycluster.nn1</name> <value>node01:8020</value> </property> <property> <name>dfs.namenode.rpc-address.mycluster.nn2</name> <value>node02:8020</value> </property> <property> <name>dfs.namenode.http-address.mycluster.nn1</name> <value>node01:50070</value> </property> <property> <name>dfs.namenode.http-address.mycluster.nn2</name> <value>node02:50070</value> </property> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://node01:8485; node02:8485; node03:8485/mycluster</value> </property> <property> <name>dfs.journalnode.edits.dir</name> <value>/var/sxt/hadoop/ha/jn</value> </property> <property> <name>dfs.client.failover.proxy.provider.mycluster</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> <property> <name>dfs.ha.fencing.methods</name> <value>sshfence</value> </property> <property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/root/.ssh/id_rsa</value> </property> <property> <name>dfs.ha.automatic-failover.enabled</name> <value>true</value>
      </property>
       <property>
            <name>dfs.replication</name>
            <value>3</value>
       </property>
    </configuration>
    Copy the code
  10. core-site.xml configuration file reference:

    <? xml version="1.0" encoding="UTF-8"? > <? xml-stylesheettype="text/xsl" href="configuration.xsl"? > <! -- Licensed under the Apache License, Version 2.0 (the"License");
      you may not use this file except inThe compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed toin writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License forthe specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <! -- Put site-specific property overridesinthis file. --> <configuration> <property> <name>hadoop.tmp.dir</name> <value>/var/abc/hadoop/cluster</value> </property>  <property> <name>fs.defaultFS</name> <value>hdfs://mycluster</value> </property> <property> <name>ha.zookeeper.quorum</name> <value>node02:2181,node03:2181,node04:2181</value> </property> </configuration>Copy the code
  11. yarn-site.xml configuration file reference:

    <? xml version="1.0"? > <! -- Licensed under the Apache License, Version 2.0 (the"License");
      you may not use this file except inThe compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed toin writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License forthe specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <! -- Site specific YARN configuration properties --> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.zk-address</name> <value>node02:2181,node03:2181,node04:2181</value> </property> <property> <name>yarn.resourcemanager.cluster-id</name> <value>mashibing</value> </property> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </property> <property> <name>yarn.resourcemanager.hostname.rm1</name> <value>node03</value> </property> <property> <name>yarn.resourcemanager.hostname.rm2</name> <value>node04</value> </property> <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>2048</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>1</value> </property> <property> <! - the client through this address to submit the application to the RM operations - > < name > yarn. The resourcemanager. Address. Rm1 < / name > < value > master: 8032 < value > / < / property > <property> <! -ResourceManager access address exposed to ApplicationMaster. ApplicationMaster uses this address to apply for or release resources from RM. --> <name>yarn.resourcemanager.scheduler.address.rm1</name> <value>node03:8030</value> </property> <property> <! - RM HTTP access address, check the cluster information - > < name > yarn. The resourcemanager. Webapp. Address. Rm1 < / name > < value > node03:8088 < value > / < / property > <property> <! - NodeManager exchange information through the address - > < name > yarn. The resourcemanager. Resource - tracker. Address. Rm1 < / name > < value > node03:8031 < / value > </property> <property> <! - administrators through the address send administrative commands to the RM - > < name > yarn. The resourcemanager. Admin. Address. Rm1 < / name > < value > node03:8033 < value > / < / property > <property> <name>yarn.resourcemanager.ha.admin.address.rm1</name> <value>node03:23142</value> </property> <property> <name>yarn.resourcemanager.address.rm2</name> <value>node04:8032</value> </property> <property> <name>yarn.resourcemanager.scheduler.address.rm2</name> <value>node04:8030</value> </property> <property> <name>yarn.resourcemanager.webapp.address.rm2</name> <value>node04:8088</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address.rm2</name> <value>node04:8031</value> </property> <property> <name>yarn.resourcemanager.admin.address.rm2</name> <value>node04:8033</value> </property> <property> <name>yarn.resourcemanager.ha.admin.address.rm2</name> <value>node04:23142</value> </property> </configuration>Copy the code
  12. slaves file reference:

    node02
    node03
    node04
  13. mapred-site.xml file reference:

    <? xml version="1.0"? > <? xml-stylesheettype="text/xsl" href="configuration.xsl"? > <! -- Licensed under the Apache License, Version 2.0 (the"License");
      you may not use this file except inThe compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed toin writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License forthe specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <! -- Put site-specific property overridesin this file. -->
    
    <configuration>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
    </configuration>
  14. Distribute the unzipped Hadoop directory to the node02, node03, and node04 machines.
  15. Start the ZooKeeper cluster.
  16. Initialize the Hadoop cluster.
  17. Start the HDFS cluster and YARN (or use start-all.sh); see the command sketch below.
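
Steps 15 to 17 are not spelled out command by command above; the sketch below shows a typical first-start sequence for a Hadoop 2.x HA cluster. The node assignments (JournalNodes on node01/node02/node03, active NameNode on node01, standby NameNode on node02, ResourceManagers on node03/node04) follow the configuration files above; adjust them to your own layout.

    # On node02, node03, node04: start ZooKeeper first
    zkServer.sh start

    # On node01, node02, node03: start the JournalNodes (per dfs.namenode.shared.edits.dir)
    hadoop-daemon.sh start journalnode

    # On node01, first start only: format the NameNode, format the failover znode, start the NameNode
    hdfs namenode -format
    hdfs zkfc -formatZK
    hadoop-daemon.sh start namenode

    # On node02: copy the formatted metadata for the standby NameNode
    hdfs namenode -bootstrapStandby

    # On node01: start the remaining HDFS daemons and YARN
    start-dfs.sh
    start-yarn.sh

    # On node03 and node04 (RM HA): make sure both ResourceManagers are running
    yarn-daemon.sh start resourcemanager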

Two: Hive 2.3.7 installation and startup (single-node + MySQL)

Pre-preparation:

  1. Install MySQL first (the MySQL installation itself is not covered here). Also note that it must be configured to allow access from machines other than the local one.
  2. Upload the Hive 2.3.7 tar.gz package to /home/hadoop/ and decompress it.
  3. Configure hive-site.xml, including the MySQL connection parameters and the HDFS warehouse path.
  4. hive-site.xml reference:

    <? xml version="1.0" encoding="UTF-8" standalone="no"? > <? xml-stylesheettype="text/xsl" href="configuration.xsl"? > <! -- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this workfor additional information regarding copyright ownership.
       The ASF licenses this file to You under the Apache License, Version 2.0
       (the "License"); you may not use this file except inThe compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed toin writing, software
       distributed under the License is distributed on an "AS IS" BASIS,
       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       See the License forthe specific language governing permissions and limitations under the License. --><configuration> <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> </property> <property> < name > javax.mail. Jdo. Option. ConnectionURL < / name > < value > JDBC: mysql: / / 46.77.56.200:3306 / hive? createDatabaseIfNotExist=true</value>
                            </property>
                            <property>
                                    <name>javax.jdo.option.ConnectionDriverName</name>
                                    <value>com.mysql.jdbc.Driver</value>
                            </property>
                            <property>
                                    <name>javax.jdo.option.ConnectionUserName</name>
                                    <value>root</value>
                            </property>
                            <property>
                                    <name>javax.jdo.option.ConnectionPassword</name>
                                    <value>123456</value>
                            </property>
    </configuration>Copy the code
  5. Configure the /etc/profile environment variables and copy the MySQL JDBC driver to the Hive lib directory.
  6. Initialize the Hive metadata and store it in MySQL; see the sketch below.
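
The initialization command itself is not listed above; assuming the /etc/profile variables from the Hadoop section are in place, a typical sequence with Hive 2.x's schematool looks like this:

    # Initialize the Hive metastore schema in MySQL
    schematool -dbType mysql -initSchema

    # Start the Hive CLI and verify the metastore connection
    hive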

Three: HBase 2.2.4 cluster installation

Pre-preparation:

  1. Upload the HBase 2.2.4 tar.gz package to the /home/hadoop directory and decompress it.
  2. Configure hbase-site.xml and hbase-env.sh, and copy the hdfs-site.xml configuration file from the Hadoop cluster into HBase's conf directory. hbase-site.xml reference:

    <? xml version="1.0"? > <? xml-stylesheettype="text/xsl" href="configuration.xsl"? > <! -- /** * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this workfor additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except inThe compliance * have the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 * * * * Unless required by applicable law or agreed toin writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    -->
    <configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node02,node03,node04</value>
      </property>
      <property>
      <name>hbase.unsafe.stream.capability.enforce</name>
      <value>false</value>
    </property>
      <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
    </property>
    </configuration>
  3. Set the environment variables in /etc/profile, run source /etc/profile, and distribute them to the other nodes for the same operation. In hbase-env.sh, set the JDK path and configure HBase not to use its bundled ZooKeeper (see the sketch after this list).
  4. Modify the regionservers file so that regions are distributed across node02, node03, and node04.
  5. Distribute the HBase directory to the other nodes (all four machines use the same directory layout).
  6. Start the HBase cluster and perform basic operations. After entering the hbase shell, run the status and processlist commands to check whether the cluster started correctly; a command sketch follows this list.
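
The exact hbase-env.sh lines and start commands for steps 3 and 6 are not shown above; a minimal sketch, reusing the JDK path from the earlier /etc/profile example, could look like this:

    # hbase-env.sh
    export JAVA_HOME=/usr/local/jdk1.8.0_65
    export HBASE_MANAGES_ZK=false        # use the external ZooKeeper cluster instead of the bundled one

    # Start the cluster on the Master node, then check it from the shell
    start-hbase.sh
    hbase shell
    hbase(main):001:0> status
    hbase(main):002:0> processlist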

Possible problems during installation are listed and solved as follows:

Problems during the Hadoop cluster installation and configuration:

  • Different Hadoop versions can cause various problems during cluster setup, because the versions differ considerably from one another. What you need to do is read the error logs carefully and check the port numbers to locate the problem quickly, and then fix it.

Faults during the installation and configuration of the ZooKeeper cluster:

  • When configuring the ZooKeeper cluster, ensure that the value in each node's myid file matches its server.N entry in the zoo.cfg configuration file; otherwise the nodes cannot form a quorum and the cluster fails to start.
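
For example, matching the zoo.cfg above (server.1=node02, server.2=node03, server.3=node04, dataDir=/var/zfg/zookeeper), the myid files would be written like this:

    echo 1 > /var/zfg/zookeeper/myid    # on node02
    echo 2 > /var/zfg/zookeeper/myid    # on node03
    echo 3 > /var/zfg/zookeeper/myid    # on node04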

Problems during the Hive installation and configuration:

  • Note that the MySQL driver must be copied to the Hive lib directory, and the driver version must match the MySQL version you connect to.
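
For example (the connector jar name below is illustrative; pick the version that matches your MySQL server):

    cp mysql-connector-java-5.1.47.jar $HIVE_HOME/lib/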

MySQL installation and configuration:

  • Note that MySQL installed on Linux is case-sensitive for table names by default, while on Windows it is case-insensitive by default. If this setting is not made consistent, various problems can occur during web development or automatic table creation.
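
The behavior is controlled by the lower_case_table_names variable; an assumed /etc/my.cnf fragment such as the following (set before any databases are created) makes Linux behave like Windows:

    [mysqld]
    lower_case_table_names=1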

Faults during HBase cluster installation and configuration:

  • The HMaster process disappears a few seconds after it is started (you can check with the jps command), or the HMaster process on the standby node exists at first but then disappears.
  • On the command line, entering the hbase shell and typing status or processlist and pressing Enter fails.
  • hbase error: KeeperErrorCode = NoNode for /hbase/master
  • In the log: java.lang.RuntimeException: HMaster Aborted
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2120)
  • A null or 500 error appears after the web UI is opened.

These problems are related: either the default znode path /hbase/master was never created in ZooKeeper (which can be caused by a startup failure), or the stream capability check and synchronization failed. The details can be found in the log files.

Add this parameter to hbase-site.xml (to resolve the stream capability check failure):

       <property>
         <name>hbase.unsafe.stream.capability.enforce</name>
         <value>false</value>
       </property>

To enable the HBase web UI, add the following to hbase-site.xml:

  <property>
     <name>hbase.master.info.port</name>
     <value>60010</value>
  </property>

Use port 60010 to access the HBase web UI.

If the KeeperErrorCode = NoNode for /hbase/master error occurs, stop the HBase cluster. If stop-hbase.sh cannot stop it (it takes too long), use kill -9 to kill the processes, and then delete the /hbase directory path in the ZooKeeper cluster.
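
A sketch of that forced shutdown (the process names are what jps reports; the PIDs are placeholders):

    stop-hbase.sh                         # try a normal shutdown first
    jps                                   # find the HMaster / HRegionServer PIDs if it hangs
    kill -9 <HMaster-pid> <HRegionServer-pid>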

Use the zkCli.sh command to enter the ZooKeeper command line, enter ls / to view the existing znodes, and then run rmr /hbase to delete the node path, as shown below.
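
For example, connecting to one of the ZooKeeper nodes from the cluster above:

    zkCli.sh -server node02:2181
    [zk: node02:2181(CONNECTED) 0] ls /
    [zk: node02:2181(CONNECTED) 1] rmr /hbase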

Check the logs to see whether other errors, such as the sync check, are recorded. If so, add the first configuration item above and restart HBase. After entering the hbase shell, run processlist and status again.