An overview of Hive

The Apache Hive data warehouse software makes it easy to read, write, and manage large datasets in distributed storage using SQL. Hive maps structured data files to database tables, provides SQL query capabilities, and converts SQL statements into MapReduce jobs. A command-line tool and a JDBC driver are provided to connect users to Hive. The framework was open-sourced by Facebook to solve the problem of computing statistics over massive structured data, and later became a top-level Apache project.

Installation and deployment

1. Prerequisites

Before deploying Hive, ensure that the JDK, MySQL, and Hadoop components are available. Deployment guides for each are listed below.

  • JDK
  • MySQL
  • Hadoop (CDH)

2. Prepare the installation package and JDBC driver

  • hive-1.1.0-cdh5.16.2.tar.gz (download address: _ , extraction code: 87ew)
  • mysql-connector-java-5.1.47.jar
[hadoop@xinxingdata001 software]$ tar -zxvf hive-1.1.0-cdh5.16.2.tar.gz -C ../app/
[hadoop@xinxingdata001 software]$ cd ../app/
[hadoop@xinxingdata001 app]$ ln -s hive-1.1.0-cdh5.16.2/ hive

3. Add Hive environment variables, configure the conf files, and copy the JDBC driver

Configuring environment variables

[hadoop@xinxingdata001 ~]$ vim ~/.bashrc

#set hive environment
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$HIVE_HOME/bin:$PATH

[hadoop@xinxingdata001 ~]$ source ~/.bashrc

Configuring the conf files

[hadoop@xinxingdata001 ~]$ vim /home/hadoop/app/hive/conf/hive-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://xinxingdata:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        <description>the URL of the MySQL database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
</configuration>
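The configuration above assumes a MySQL user `hive` with password `123456` that can access a `hive` database. If that account does not exist yet, a minimal sketch of creating it on the MySQL side (MySQL 5.x syntax; the `'%'` host scope is an assumption, tighten it for production):

```sql
-- assumption: user/password match ConnectionUserName/ConnectionPassword in hive-site.xml
CREATE USER 'hive'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```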
[hadoop@xinxingdata001 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@xinxingdata001 conf]$ vim hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/app/hadoop


The full list of Hive configuration parameters: cwiki.apache.org/confluence/…

Copy the MySQL JDBC driver jar

[hadoop@xinxingdata001 software]$ cp mysql-connector-java-5.1.47.jar ../app/hive/lib/mysql-connector-java.jar

Start the Metastore and HiveServer2 services

[hadoop@xinxingdata001 software]$ nohup hive --service  metastore > ~/log/metastore.log 2>&1 &

[hadoop@xinxingdata001 software]$ nohup  hiveserver2  > ~/log/hiveserver2.log 2>&1 &
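Once both services are up, one way to verify them (the hostname and the default HiveServer2 port 10000 are assumptions based on this deployment):

```
jps                                           # should show a RunJar process for each service
beeline -u jdbc:hive2://xinxingdata001:10000 -n hadoop   # connect via Beeline, HiveServer2's JDBC client
```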

Common commands, basics, and exercises

Common commands

hive> !clear;                                 -- run a shell command from inside the CLI
hive> exit;                                   -- exit the CLI
hive> show databases;
hive> show databases like '%student%';        -- filter database names by pattern
hive> create database if not exists xx_hive;
hive> use xx_hive;                            -- switch to the xx_hive database
hive> create table student(id int, name string, age int);
hive> set hive.cli.print.current.db;          -- show the current value of a property
hive> set hive.cli.print.current.db=true;     -- show the current database name in the prompt
hive> set hive.cli.print.header=true;         -- print column names in query output
hive> desc formatted student;
hive (default)> create database if not exists hive2 location '/hive/directory';
hive (hive2)> drop database if exists hive2 cascade;   -- cascade also drops the tables inside (use with caution)
hive (default)> select current_database();
hive> alter table xinxing_emp2 rename to xinxing_emp2_new;
hive> truncate table xinxing_emp2_new;        -- unlike drop, truncate deletes the data but keeps the table structure
hive> msck repair table emp_partition [add/drop/sync partitions];  -- sync partition metadata into the metastore
hive> set hive.fetch.task.conversion=none|minimal|more;  -- controls which queries can skip MapReduce
hive> set hive.execution.engine;              -- choose the execution engine: mr or spark
hive (default)> select explode(col) from tbl;      -- explode function
hive (default)> select split(col, ',') from tbl;   -- split function

Running SQL from the command line

[hadoop@xinxingdata001 conf]$ hive -e "select * from student;"   # run a SQL string
[hadoop@xinxingdata001 conf]$ hive -f <sql-file>                 # run the SQL statements in the given file

Common data types and delimiters

Numeric types: int, float, double, bigint
String types: string, varchar, char
Boolean type: boolean
Complex types: array, map, struct
Common delimiters: ',' and '\t'
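As an illustration of the complex types and delimiters above, a hypothetical table (the table name, columns, and delimiter choices are made up for the example):

```sql
create table person(
    name string,
    friends array<string>,     -- stored in the file as e.g. tom_jerry
    scores map<string,int>     -- stored in the file as e.g. math:90_english:80
) row format delimited
  fields terminated by ','
  collection items terminated by '_'
  map keys terminated by ':';
```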

Create a table

CREATE [EXTERNAL] TABLE table_name
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [ROW FORMAT row_format]
  [AS select_statement]
  [LIKE existing_table_or_view_name]
  [LOCATION hdfs_path];

EXTERNAL                          -- create an external table
column_constraint_specification:
  [ PRIMARY KEY | UNIQUE | NOT NULL | DEFAULT [default_value] | CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
COMMENT col_comment               -- column comment
constraint_specification:
  [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
  [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
  [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
  [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
row_format:
  DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]   -- field / collection-item separators
AS select_statement               -- use the rows returned by the query as the new table's data
LIKE existing_table_or_view_name  -- copy the schema of an existing table or view
LOCATION hdfs_path                -- use the file data under hdfs_path as the table's data

emp.txt

7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.00

Create a table with \t delimiter

create table xinxing_emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Create a table and copy the data structure of the table

create table xinxing_emp2 like xinxing_emp;

Specify the HDFS path for data during table creation

CREATE TABLE tbl_name like xinxing_emp LOCATION '/user/hive/warehouse/xinxing_emp';

Create a table and copy its data structure and data

create table xinxing_emp3 as select * from xinxing_emp;

Load data into a table

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

With LOCAL, filepath is a local path; without LOCAL, filepath is an HDFS path. With OVERWRITE, the existing data in the target table (or partition) is overwritten; without it, the data is appended.

Load local data to the Hive table

LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE xinxing_emp;
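By contrast, loading without LOCAL reads from HDFS, and the source file is moved (not copied) into the table's directory. A sketch with an illustrative HDFS path:

```sql
LOAD DATA INPATH '/tmp/emp.txt' OVERWRITE INTO TABLE xinxing_emp;
```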

Insert query results into a Hive table

Basic syntax:

INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

With [OVERWRITE] the target is overwritten; without it, the rows are appended.

Multi-insert extension syntax:

FROM from_statement
INSERT [OVERWRITE] TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
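The multi-insert form scans the source table once and feeds several targets. A sketch assuming hypothetical target tables emp_high and emp_low already exist with matching columns:

```sql
FROM xinxing_emp
INSERT OVERWRITE TABLE emp_high SELECT empno, ename, sal WHERE sal >= 2000
INSERT INTO TABLE emp_low SELECT empno, ename, sal WHERE sal < 2000;
```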

Write query results to a local directory

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format]  SELECT ... FROM ...

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/data/tmp/' select empno,ename from xinxing_emp;

Insert data returned from an SQL query into the Hive table

insert into table xinxing_emp2 select empno,ename,job,mgr,hiredate,sal,comm,deptno from xinxing_emp;

Querying built-in functions

hive (default)> show functions;                  -- list all built-in functions
hive (default)> desc function [extended] lower;  -- show usage (and examples with extended)
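For instance, after checking a function's signature with desc function, it can be applied directly in a query; a sketch against the emp table created earlier:

```sql
hive (default)> select lower(ename), upper(job) from xinxing_emp limit 3;
```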


