The basic principles of Hive

  • Hive is a Hadoop-based data warehouse tool that maps structured data files onto database tables and provides a simple SQL query capability, converting SQL statements into MapReduce tasks for execution. Its learning cost is low: simple MapReduce statistics can be produced quickly through SQL-like statements without developing a dedicated MapReduce application, so it is well suited to statistical analysis over a data warehouse.
  • Hive is built on Hadoop and inherits its static batch-processing character: Hadoop usually has high latency and incurs substantial overhead when jobs are submitted and scheduled. Hive is therefore not suitable for applications that require low latency; it is best suited to batch jobs over large amounts of immutable data, such as web log analysis.
  • Hive is scalable (devices can be added dynamically to the Hadoop cluster), extensible, fault tolerant, and loosely coupled to its input and output formats.
  • Hive stores metadata in relational databases (RDBMSs), such as MySQL and Derby.
  • Hive has three modes of connecting to its metadata store: single-user mode, multi-user mode, and remote-service mode (that is, embedded mode, local mode, and remote mode).
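To make the "SQL-like statements instead of hand-written MapReduce" point concrete, here is a minimal HiveQL sketch; the table and column names are made up for illustration. Once Hive is installed, you could run the script with `hive -f /tmp/weblog.hql`.

```shell
# Write a small HiveQL script to a file; the weblog table and its
# columns are hypothetical examples, not part of any real dataset.
cat > /tmp/weblog.hql <<'EOF'
CREATE TABLE IF NOT EXISTS weblog (ip STRING, url STRING, ts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- This simple GROUP BY is compiled by Hive into a MapReduce job.
SELECT ip, COUNT(*) AS hits FROM weblog GROUP BY ip;
EOF
cat /tmp/weblog.hql
```

No MapReduce code is written by hand: Hive plans and submits the job itself, which is exactly the low-learning-cost property described above.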

Environment: Linux Ubuntu 20.04, OpenJDK 11.0.11, Hadoop 3.2.2, MySQL 8.0.25, mysql-connector-java-8.0.25.jar

The prerequisite is to build Hive based on the completion of Hadoop and MySQL installation.


1. Create a /data/hive1 folder on your Linux machine to store the installation packages.

mkdir -p /data/hive1

Switch to /data/hive1 and use wget to download apache-hive-2.3.8-bin.tar.gz and mysql-connector-java-8.0.25.jar.

cd /data/hive1
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
wget https://dev.mysql.com/downloads/file/?id=504646

2. Extract /data/hive1/apache-hive-2.3.8-bin.tar.gz into /apps.

tar -xzvf /data/hive1/apache-hive-2.3.8-bin.tar.gz -C /apps/

Rename the extracted directory to hive so that it matches the HIVE_HOME path used below.

cd /apps
mv /apps/apache-hive-2.3.8-bin /apps/hive
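If you want to dry-run the extract-and-rename steps without touching /apps, the same pattern can be exercised in a throwaway directory; all paths below are temporary stand-ins created for the demonstration.

```shell
# Simulate step 2 in a temp dir: pack a dummy apache-hive-2.3.8-bin,
# extract it with tar -C, then rename the result to plain "hive".
htmp=$(mktemp -d)
mkdir -p "$htmp/src/apache-hive-2.3.8-bin/bin"
tar -czf "$htmp/apache-hive-2.3.8-bin.tar.gz" -C "$htmp/src" apache-hive-2.3.8-bin
mkdir -p "$htmp/apps"
tar -xzf "$htmp/apache-hive-2.3.8-bin.tar.gz" -C "$htmp/apps"
mv "$htmp/apps/apache-hive-2.3.8-bin" "$htmp/apps/hive"
ls "$htmp/apps"    # prints: hive
```

The `-C` flag makes tar change into the target directory before extracting, which is why nothing lands in the current working directory.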

3. Use Vim to open user environment variables.

vim ~/.bashrc

Add the Hive bin directory to the user PATH environment variable, then save and exit.

#hive config  
export HIVE_HOME=/apps/hive  
export PATH=$HIVE_HOME/bin:$PATH

Run the source command to make the Hive environment variables take effect.

source ~/.bashrc 
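A quick way to confirm the two exports resolved as intended; this is just a sketch that inspects the current shell's environment.

```shell
# Re-create the two exports from ~/.bashrc and verify that
# /apps/hive/bin actually landed on PATH.
export HIVE_HOME=/apps/hive
export PATH=$HIVE_HOME/bin:$PATH
case ":$PATH:" in
    *":/apps/hive/bin:"*) echo "PATH ok" ;;
    *)                    echo "PATH is missing $HIVE_HOME/bin" ;;
esac
```

The `:$PATH:` wrapping avoids false matches on partial directory names such as /apps/hive/bin2.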

4. Because Hive stores its metadata in MySQL, you need to copy mysql-connector-java-8.0.25.jar from /data/hive1 into Hive's lib directory.

cp /data/hive1/mysql-connector-java-8.0.25.jar /apps/hive/lib/

5. Switch to the /apps/hive/conf directory and create the hive-site.xml configuration file.

cd /apps/hive/conf  
touch hive-site.xml    

Open the hive-site.xml file using Vim.

vim hive-site.xml

Add the following configuration items to the hive-site.xml file.

<configuration>
    <!-- javax.jdo.option.ConnectionURL: the database connection string
         (adjust host, port, and database name to your installation). -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <!-- javax.jdo.option.ConnectionDriverName: the JDBC driver class. -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- javax.jdo.option.ConnectionUserName: the database user name. -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- javax.jdo.option.ConnectionPassword: the database password. -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>127.0.0.1</value>
    </property>
</configuration>



Since Hive metadata is stored in the MySQL database, you need to specify MySQL information in the Hive configuration file.

  • javax.jdo.option.ConnectionURL: the database connection string.
  • javax.jdo.option.ConnectionDriverName: the JDBC driver class.
  • javax.jdo.option.ConnectionUserName: the database user name.
  • javax.jdo.option.ConnectionPassword: the database password.

The user name and password here must be set to those of your own MySQL installation.
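The connection properties can also be generated from shell variables, which makes the XML harder to mistype. Here is a sketch that writes the file to a temporary location; DB_USER and DB_PASS are placeholders for your own credentials, and the connection URL is a typical default you would adjust to your setup.

```shell
# Placeholder credentials; substitute your own MySQL account.
DB_USER=root
DB_PASS=123456
conf=$(mktemp -d)
cat > "$conf/hive-site.xml" <<EOF
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>${DB_USER}</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>${DB_PASS}</value>
    </property>
</configuration>
EOF
grep -c "<property>" "$conf/hive-site.xml"    # prints: 4
```

In a real run you would write to /apps/hive/conf/hive-site.xml instead of the temp directory.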

6. In addition, Hive needs to know about the Hadoop environment, so you need to modify the hive-env.sh file.

First we’ll rename the hive-env.sh.template to hive-env.sh.

mv /apps/hive/conf/hive-env.sh.template  /apps/hive/conf/hive-env.sh 

Open the hive-env.sh file using Vim.

vim hive-env.sh

Append the path to Hadoop and the path to the Hive configuration file to the file.

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/apps/hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/apps/hive/conf



7. The next step is to configure MySQL to store Hive metadata.

First, you need to ensure that MySQL is started. Execute the following command to see the status of MySQL.

sudo service mysql status 



From the output, you can see that MySQL has started. If not, you need to execute the startup command.

sudo service mysql start

If you do not have MySQL installed, execute the installation command; if MySQL is already installed in your environment, skip this step.

sudo apt-get install mysql-server 
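The check-then-start logic of step 7 can be wrapped in a small helper. The sketch below takes the probe and start commands as arguments, so the pattern itself can be exercised with stand-in commands and no real MySQL service.

```shell
# ensure_running PROBE_CMD START_CMD — run START_CMD only if PROBE_CMD fails.
ensure_running() {
    if sh -c "$1" >/dev/null 2>&1; then
        echo "already running"
    else
        sh -c "$2" >/dev/null 2>&1
        echo "started"
    fi
}

# Real use would be:
#   ensure_running "sudo service mysql status" "sudo service mysql start"
# Demo with stand-in commands:
ensure_running "true"  "true"    # prints: already running
ensure_running "false" "true"    # prints: started
```

This mirrors the manual procedure: check the status first, and only issue the start command when the service is not up.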

8. Log in to the MySQL database.

mysql -u root -p

List the existing databases.

show databases;

Next, type exit to exit MySQL.

exit 
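One step this walkthrough glosses over: Hive 2.x normally requires the metastore schema to be initialized once before first use. A hedged sketch, assuming `schematool` is on PATH thanks to the earlier export; it skips gracefully when the tool is absent.

```shell
# Initialize the MySQL-backed metastore schema once (Hive 2.x).
init_metastore() {
    if command -v schematool >/dev/null 2>&1; then
        schematool -dbType mysql -initSchema
    else
        echo "schematool not found; check that HIVE_HOME/bin is on PATH"
        return 1
    fi
}
init_metastore || true
```

If you skip this and Hive refuses to start with schema-related errors, running schematool manually is the usual fix.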

9. Execute tests. Since Hive relies on the MapReduce computing model for data processing, you need to ensure that Hadoop related processes have been started.

Enter jps to check the process status; if the Hadoop processes are not running, start Hadoop.

/apps/hadoop/sbin/start-all.sh  

After starting Hadoop, enter Hive directly in the terminal command line interface to start Hive command-line mode.

hive 

Enter the HQL statement to query the database to test if Hive works.

show databases;  

Pitfalls

Error on Hive startup:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

Cause: the Guava JARs shipped in the Hadoop and Hive lib directories have mismatched versions.

Check the JARs beginning with guava under both paths. Hadoop:

cd /apps/hadoop/share/hadoop/common/lib/

Hive:

cd /apps/hive/lib/

Solution: compare the guava JARs under the two paths and use the higher version to overwrite the lower one.

cp /apps/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /apps/hive/lib/

Then delete the lower version from the Hive lib directory.

rm -f /apps/hive/lib/guava-19.0.jar
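The "keep the higher version" rule can be automated with `sort -V`. Here is a sketch against dummy JAR files in a temporary directory; the real paths are the two lib directories above.

```shell
# Simulate the fix: two guava jars, keep the newer, delete the older.
gtmp=$(mktemp -d)
mkdir -p "$gtmp/hadoop_lib" "$gtmp/hive_lib"
touch "$gtmp/hadoop_lib/guava-27.0-jre.jar" "$gtmp/hive_lib/guava-19.0.jar"
# sort -V orders version strings numerically; tail -1 picks the newest.
newest=$(printf '%s\n' guava-27.0-jre.jar guava-19.0.jar | sort -V | tail -1)
echo "$newest"    # prints: guava-27.0-jre.jar
cp "$gtmp/hadoop_lib/$newest" "$gtmp/hive_lib/"
rm -f "$gtmp/hive_lib/guava-19.0.jar"
ls "$gtmp/hive_lib"    # prints: guava-27.0-jre.jar
```

Copying rather than moving keeps the Hadoop lib directory intact, so Hadoop itself is unaffected by the fix.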

Done!