1. Install Hive

1.1 Download and decompress the package

Download the required version of Hive; here I use the CDH 5.15.2 release. Download address: archive.cloudera.com/cdh5/cdh/5/

# Download and decompress the package
tar -zxvf hive-1.1.0-cdh5.15.2.tar.gz

1.2 Configuring Environment Variables

# vim /etc/profile

Add environment variables:

export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
export PATH=$HIVE_HOME/bin:$PATH

Make configured environment variables take effect immediately:

# source /etc/profile
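After sourcing the profile, it is worth checking that the variables actually took effect. The following is a minimal, self-contained sketch of the same prepend logic, using the install path assumed in this guide:

```shell
# The same setup as in /etc/profile, shown standalone for a quick sanity check.
export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
export PATH=$HIVE_HOME/bin:$PATH

# PATH should now start with $HIVE_HOME/bin, so `hive` resolves there first.
case "$PATH" in
  "$HIVE_HOME/bin":*) echo "HIVE_HOME is on PATH" ;;
  *)                  echo "PATH was not updated" ;;
esac
```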

1.3 Modifying the Configuration

1. hive-env.sh

Go to the conf/ directory of the installation directory and copy the Hive environment configuration template hive-env.sh.template:

cp hive-env.sh.template hive-env.sh

Edit hive-env.sh to specify the Hadoop installation path:

HADOOP_HOME=/usr/app/hadoop-server-cdh5.15.2
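The edit can also be done non-interactively instead of in vim. A small sketch, working on a scratch copy of the file so it is safe to try anywhere (the path is the one assumed above):

```shell
# Stand-in for the conf/ directory and the copied hive-env.sh template.
conf_dir=$(mktemp -d)
touch "$conf_dir/hive-env.sh"

# Append the Hadoop installation path, then confirm it is present.
echo 'HADOOP_HOME=/usr/app/hadoop-server-cdh5.15.2' >> "$conf_dir/hive-env.sh"
grep '^HADOOP_HOME=' "$conf_dir/hive-env.sh"
```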

2. hive-site.xml

Create the hive-site.xml file with the following content, specifying the address, driver, user name, and password of the MySQL database that stores the metadata:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/hadoop_hive?createDatabaseIfNotExist=true</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>

</configuration>

1.4 Copying database drivers

Copy the MySQL driver JAR into the lib directory of the Hive installation directory. The MySQL driver download address is dev.mysql.com/downloads/c… ; I have also uploaded a copy to the resources directory of this repository, which you can download if needed.

1.5 Initializing the Metadata Database

  • If Hive 1.x is used, initialization is not required: Hive automatically initializes the metadata tables when it starts for the first time, but only the necessary tables are created; the remaining tables are created automatically when they are first used.

  • If Hive 2.x is used, you must initialize the metadata database manually. Initialization command:

    # The schematool command is in the bin directory of the installation directory.
    # Since the environment variables have been configured, it can be run from anywhere.
    schematool -dbType mysql -initSchema

Here I use the CDH package hive-1.1.0-cdh5.15.2.tar.gz, which corresponds to Hive 1.1.0, so this step can be skipped.
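The decision above hinges only on the major version. As a small sketch, the version can be derived from the tarball name used in this guide with plain parameter expansion:

```shell
# Derive the Hive version from the tarball name, then decide whether
# manual schema initialization is needed (required from Hive 2.x on).
tarball=hive-1.1.0-cdh5.15.2.tar.gz
version=${tarball#hive-}      # strip leading "hive-"
version=${version%%-cdh*}     # strip the CDH suffix -> 1.1.0
major=${version%%.*}          # major version     -> 1

if [ "$major" -ge 2 ]; then
  echo "run: schematool -dbType mysql -initSchema"
else
  echo "Hive ${version}: metadata tables are created automatically on first start"
fi
```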

1.6 Starting Hive

Since the bin directory of Hive has already been added to the environment variables, you can start Hive directly with the following command. In the interactive CLI, you can run commands such as show databases;.

# hive

You can also see the databases and tables that Hive uses to store metadata information in MySQL.
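To look at that metadata directly, you can query the hadoop_hive database configured in hive-site.xml above. DBS and TBLS are two of the metastore tables, holding Hive databases and Hive tables respectively. A sketch that writes the query to a file, to be run with the MySQL client:

```shell
# Queries against the metastore tables: Hive databases (DBS) and tables (TBLS).
cat > /tmp/inspect_metastore.sql <<'EOF'
SELECT NAME FROM DBS;
SELECT TBL_NAME, DB_ID FROM TBLS;
EOF

# Run it against MySQL with the credentials configured in hive-site.xml:
#   mysql -u root -p hadoop_hive < /tmp/inspect_metastore.sql
```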

2. HiveServer2 and Beeline

Hive provides both the HiveServer and HiveServer2 services. Both allow clients to connect using multiple programming languages, but HiveServer cannot handle concurrent requests from multiple clients. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results using a variety of programming languages, and supports concurrent access and authentication from multiple clients. HS2 is a single process composed of multiple services, including a Thrift-based Hive service (TCP or HTTP) and a Jetty web server for the Web UI.

HiveServer2 comes with its own CLI tool, Beeline, a JDBC client based on SQLLine. Since HiveServer2 is the focus of Hive development and maintenance, Beeline is recommended over the Hive CLI. The following describes how to configure and use Beeline.

2.1 Modifying Hadoop Configurations

Modify the core-site.xml configuration file of the Hadoop cluster and add the following configuration, specifying that the root user of Hadoop is allowed to impersonate all users on all hosts:

<property>
 <name>hadoop.proxyuser.root.hosts</name>
 <value>*</value>
</property>
<property>
 <name>hadoop.proxyuser.root.groups</name>
 <value>*</value>
</property>

The reason for this step is that Hadoop 2.0 introduced an impersonation (proxy user) mechanism: Hadoop does not allow upper-layer systems (such as Hive) to pass the real user directly to the Hadoop layer. Instead, the real user must be proxied by a superuser that performs the operations on Hadoop, which prevents arbitrary clients from manipulating Hadoop. If this step is not configured, an AuthorizationException may be thrown when connecting later.

For details about Hadoop's proxy user mechanism, see Superusers Acting On Behalf Of Other Users in the Hadoop documentation.

2.2 Starting HiveServer2

Since the environment variables have been configured above, HiveServer2 can be started directly:

# nohup hiveserver2 &

2.3 Using Beeline

Run the following command to enter the Beeline interactive command line. If Connected is displayed, the connection is successful.

# beeline -u jdbc:hive2://hadoop001:10000 -n root
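The connection string above decomposes as jdbc:hive2://&lt;host&gt;:&lt;port&gt;, where 10000 is HiveServer2's default Thrift port. A small sketch pulling the pieces apart with plain parameter expansion, using the same URL as in the command:

```shell
# Decompose the HiveServer2 JDBC URL used above.
url='jdbc:hive2://hadoop001:10000'
rest=${url#jdbc:hive2://}   # hadoop001:10000
host=${rest%%:*}            # hadoop001
port=${rest##*:}            # 10000 (HiveServer2's default Thrift port)

echo "host=$host port=$port"
```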