1. Data types

1.1 Basic data types

type HIVE MySQL JAVA The length of the example
The integer TINYINT TINYINT byte 1-byte integer 2
SMALINT SMALINT short 2-byte integer 20
INT/INTEGER INT int 4-byte integer 20
BIGINT BIGINT long 8-byte integer 20
String type STRING VARCHAR string String type “Hello world”
VARCHAR VARCHAR string String type
CHAR VARCHAR string String type
floating-point FLOAT FLOAT float Single-precision floating point number 3.14159
DOUBLE DOUBLE double A double – precision floating – point number 3.14159
Time to type TIMESTAMP TIMESTAMP Time to type
DATE DATE DATE Time to type
Other types of BINARY BINARY binary
BOOLEAN There is no boolean Boolean type TRUE/FALSE

1.2 Collection data types

The data type describe Syntax examples
MAP MAP<primitive_type, data_type> The map (” key1 “, 1)
ARRAY ARRAY<data_type> Array (” aaa “, “BBB”, “CCC”)
STRUCT STRUCT<col_name : data_type1, data_type2,data_type3> Struct (” a “, 1, 2, 3)

Hive arrays and maps are similar to Arrays and maps in Java, and structs are similar to STRUCts in C. They encapsulate a collection of named fields. Complex data types can be nested at any level.

Declarations of complex data types must use < Angle brackets > to indicate the type of the data fields in them:

create table t_eg(
         col1 ARRAY<INT>,
         col2 MAP<STRING,INT>,
         col3 STRUCT<a:STRING,b:INT,c:DOUBLE>);
Copy the code

We use JSON format to represent the data structure:

{" name ":" jack ", "friends" : [" Sam ", "van"], / / list Array, "children" : {/ / key/value Map, "chale" : 18, "davil" : 19} "address" : Struct, "street": "Beijing road ", "phone": 13631230887, "money": 246.4}}Copy the code

1.3 Type Conversion

Hive atomic data types can be converted implicitly. For example, if an expression uses an INT, TINYINT is automatically converted to an INT.

However, Hive does not perform the reverse conversion unless the CAST operation is used.

1.3.1 Implicit Type Conversion rules

  1. Any integer type can be implicitly converted to a wider range of integer types, such as TINYINT — >INT, INT — >BIGINT.
  2. All integer types, floats, and strings can be implicitly converted to DOUBLE.
  3. TINYINT, SMALLINT, INT can all be converted to FLOAT.
  4. BOOLEAN types cannot be converted to any other type.

1.3.2 Use CAST to display conversion data

CAST(‘1’ AS INT), will convert the string ‘1’ to the integer 1

2. Common DDL operations

2.1 the Database

2.1.1 Viewing the Data List

show databases;
Copy the code

2.1.2 Using the Database

USE database_name;
Copy the code

2.1.3 Creating a Database

Grammar:

The CREATE DATABASE | SCHEMA [IF NOT EXISTS] database_name [COMMENT database_comment] - DATABASE annotation [LOCATION hdfs_path] - stored in the HDFS [WITH DBPROPERTIES (property_name=property_value,...)] ; -- Specify additional attributesCopy the code

Example:

CREATE DATABASE IF NOT EXISTS hive_test COMMENT 'Hive DATABASE for test' WITH DBPROPERTIES (' CREATE '=' DBPROPERTIES ');Copy the code

2.1.4 Viewing database Information

Grammar:

DESC DATABASE [EXTENDED] database_name; EXTENDED indicates whether additional attributes are displayedCopy the code

Example:

DESC DATABASE EXTENDED hive_test;
Copy the code

2.1.5 Deleting a Database

Grammar:

DROP DATABASE|SCHEMA [IF EXISTS] database_name RESTRICT|CASCADE; The default behavior is RESTRICT. If a table exists in the database, the drop fails. To drop a library and its tables, use CASCADE.Copy the code

Example:

DROP DATABASE IF EXISTS hive_test CASCADE;
Copy the code

2.2 create a table

2.2.1 internal table

CREATE TABLE emp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code

2.2.2 the external table

CREATE EXTERNAL TABLE emp_external(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_external';Copy the code

2.2.3 the partition table

CREATE EXTERNAL TABLE emp_partition( empno INT, ename STRING, job STRING, mgr INT, hiredate TIMESTAMP, Sal DECIMAL (7, 2), Comm DECIMAL(7,2)) -- PARTITIONED BY (deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_partition';Copy the code

2.2.4 barrel table

CREATE EXTERNAL TABLE emp_bucket(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL (7, 2), Deptno INT) - Hash the employee numbers INTO four BUCKETS CLUSTERED BY(empno) SORTED BY(empno ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_bucket';Copy the code

2.2.5 tilt table

By specifying one or more frequently skewed values for columns, Hive automatically splits the data involving those values into separate files.

When querying, if skew values are involved, it gets the data directly from individual files, rather than scanning all files, which improves performance.

But do not specify the number of skewed tables. But do not specify the number of skewed tables. Comm DECIMAL(7,2)) -- specify the SKEWED value 66,88,100 but BY (empno) ON (66,88,100) ROW FORMAT DELIMITED FIELDS TERMINATED BY  "\t" LOCATION '/hive/emp_skewed';Copy the code

2.2.6 temporary table

The temporary table is visible only to the current session. The data in the temporary table is stored in the user’s temporary directory and deleted after the session ends.

If the temporary table has the same name as the permanent table, any reference to the table name resolves to a temporary table, not a permanent table.

Temporary tables also have the following two limitations:

  • Partitioned columns are not supported;
  • Index creation is not supported.
CREATE TEMPORARY TABLE emp_temp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code

2.2.7 CTAS Creating a table

Create a table from the result of the query:

CREATE TABLE emp_copy AS SELECT * FROM emp WHERE deptno='20';
Copy the code

2.2.8 Copying the Table Structure

Grammar:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] target_table_name -- CREATE the TABLE name LIKE src_table_name -- The TABLE name of the replicated TABLE [LOCATION hdfs_path]; -- Storage locationCopy the code

Example:

CREATE TEMPORARY EXTERNAL TABLE  IF NOT EXISTS  emp_co  LIKE emp
Copy the code

2.2.9 Loading Data to a table

Load local data into a table:

load data local inpath "D:/project/emp.txt" into table emp;
Copy the code

The contents of emp.txt:

7369	SMITH	CLERK	7902	1980-12-17 00:00:00	800.00		20
7499	ALLEN	SALESMAN	7698	1981-02-20 00:00:00	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-02-22 00:00:00	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-04-02 00:00:00	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-09-28 00:00:00	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-05-01 00:00:00	2850.00		30
7782	CLARK	MANAGER	7839	1981-06-09 00:00:00	2450.00		10
7788	SCOTT	ANALYST	7566	1987-04-19 00:00:00	1500.00		20
7839	KING	PRESIDENT		1981-11-17 00:00:00	5000.00		10
7844	TURNER	SALESMAN	7698	1981-09-08 00:00:00	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-05-23 00:00:00	1100.00		20
7900	JAMES	CLERK	7698	1981-12-03 00:00:00	950.00		30
7902	FORD	ANALYST	7566	1981-12-03 00:00:00	3000.00		20
7934	MILLER	CLERK	7782	1982-01-23 00:00:00	1300.00		10
Copy the code

2.3 modify the table

2.3.1 Renaming a Table

Grammar:

ALTER TABLE table_name RENAME TO new_table_name;
Copy the code

Example:

ALTER TABLE emp_temp RENAME TO new_emp; Rename the emp_temp table to new_empCopy the code

2.3.2 modify column

Grammar:

ALTER TABLE table_name CHANGE 
[COLUMN]col_old_name [COLUMN]col_new_name column_type [COMMENT]col_comment;
Copy the code

Example:

ALTER TABLE emp_temp CHANGE EMPno EMPno_new INT; ALTER TABLE emp_temp CHANGE sal sal_new decimal(7,2) AFTER ename; ALTER TABLE emp_temp CHANGE MGR mgr_new INT COMMENT 'this is column MGR ';Copy the code

2.3.3 the new column

Example:

ALTER TABLE emp_temp ADD COLUMNS (address STRING COMMENT 'home address');
Copy the code

2.4 Clearing or deleting a table

Against 2.4.1 empty table

Grammar:

TRUNCATE TABLE table_name [PARTITION (PARTItion_column = PARTItion_col_value, ...)] ;Copy the code

Example:

TRUNCATE TABLE emp_mgt_ptn PARTITION (deptno=20);
Copy the code

Delete table 2.4.2

Grammar:

DROP TABLE [IF EXISTS] table_name [PURGE]; 
Copy the code
  • Internal table: the metadata of the table and data in the HDFS are deleted.
  • External table: only metadata of the table is deleted, but data in the HDFS is not deleted.

2.5 Other Commands

2.5.1 the Describe

View database:

Desc DATABASE [EXTENDED] db_name; -- Whether EXTENDED shows additional attributesCopy the code

See table:

Desc [EXTENDED|FORMATTED] table_name; Display the table details in a friendly mannerCopy the code

2.5.2 Show

Viewing the database list

Grammar:

SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
Copy the code

Example:

SHOW DATABASES like 'hive*';
Copy the code

View the list of tables

Grammar:

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
Copy the code

Example:

SHOW TABLES IN default;
Copy the code

View the partition list for the table

SHOW PARTITIONS table_name;
Copy the code

View the table/view creation statement

SHOW CREATE TABLE (table_name|view_name);
Copy the code

3. Common DML operations

3.1 Loading file data into tables

Grammar:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Copy the code
  • The LOCAL keyword indicates that the file is loaded from the LOCAL file system. If omitted, the file is loaded from HDFS.

  • Filepath can be either a filepath (in which case Hive moves files to a table) or a directory path (in which case Hive moves all files in that directory to a table). When loading a file from a local file system, filepath can be an absolute path or a relative path (absolute path is recommended). When loading a file from HDFS, filepath is the complete URL of the file: Such as HDFS: / / the namenode: port/user/hive/project/data1

  • If the OVERWRITE keyword is used, the contents of the target table (or partition) are deleted and populated with new data; If this keyword is not used, it is appended.

  • The loading target can be a table or partition. If it is a partitioned table, you must specify the partition to load the data.

  • The format of the loading file must be the same AS that used in STORED AS when the table was built.

Example:

CREATE TABLE emp_ptn(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2) deptno INT) PARTITIONED BY (deptno INT) -- partition BY department number ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code

Load data from HDFS to partition table:

LOAD DATA  INPATH "hdfs://hadoop001:8020/mydir/emp.txt" OVERWRITE INTO TABLE emp_ptn PARTITION (deptno=20);
Copy the code

Select * from deptno; select * from deptno;

3.2 Inserting query results into a table

Presentation data preparation:

CREATE TABLE emp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"; Load data local inpath "/usr/file/emp. TXT "into table emp;Copy the code

3.2.1 Static Partition

Grammar:

INSERT OVERWRITE/INTO TABLE table_name PARTITION (partcol1[=val1], partcol2[=val2] ...) select statement FROM from_statement;
Copy the code

Partition column names are required when inserting data into partitioned tables. Having partitioned column values is called static partitioning.

Example:

INSERT OVERWRITE TABLE emp_ptn PARTITION (deptno=20) SELECT empno,ename,job,mgr,hiredate,sal,comm FROM emp WHERE deptno=20;
Copy the code

3.2.2 Dynamic Partitioning

Grammar:

INSERT OVERWRITE/INTO TABLE table_name PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Copy the code

Partition column names are required when inserting data into partitioned tables. The absence of partitioned column values is called dynamic partitioning. Dynamically partitioned columns must be specified last in the column of the SELECT statement and in the same order as they appear in the PARTITION() clause.

Example:

INSERT OVERWRITE TABLE emp_ptn PARTITION (deptno) SELECT empno,ename,job,mgr,hiredate,sal,comm,deptno FROM emp WHERE deptno=30;
Copy the code

3.3 Inserting Data using SQL Statements

Grammar:

INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)]  VALUES ( value... ] )Copy the code

3.4 Updating and Deleting Data

Grammar:

UPDATE tablename SET column = value [column = value...] [WHERE expression] -- DELETE DELETE FROM tablename [WHERE expression]Copy the code

The update and delete syntax is simple and consistent with a relational database, but it is important to note that both operations can only be performed on tables that support ACID, namely transaction tables.

3.5 Examples of Inserting, updating, and Deleting Data

Modify hive-site. XML and add the following configuration to enable transaction support. After the configuration, restart the Hive service.

<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.enforce.bucketing</name> <value>true</value> </property> <property> <name>hive.exec.dynamic.partition.mode</name> <value>nonstrict</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> </property> <property> <name>hive.compactor.initiator.on</name> <value>true</value> </property> <property> <name>hive.in.test</name> <value>true</value> </property> Copies the codeCopy the code

The transactional table is then created for testing using the transactional = true attribute. Note that transaction tables in Hive have the following restrictions according to the official documentation:

  • They must be buckets Table.
  • Only ORC file format is supported.
  • LOAD DATA is not supported… Statements.
CREATE TABLE emp_ts(  
  empno int,  
  ename String
)
CLUSTERED BY (empno) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Copy the code
INSERT INTO TABLE emp_ts VALUES (1," Ming "),(2,"hong");Copy the code

The MapReduce job is used to insert data. After the job is successfully executed, the data is as follows:

UPDATE emp_ts SET ename = "LAN" WHERE empno=1; DELETE FROM emp_ts WHERE empno=2;Copy the code

The MapReduce job is also used to update and delete data. After the MapReduce job is successfully executed, the data is as follows:

3.6 Writing Query Results to the File System

Grammar:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] 
Copy the code
  • OVERWRITE indicates that if the output file exists, delete it first and then write it again.
  • As with the Load statement, it is recommended that both the local path and the URL address be used fully;
  • Data written to the file system is serialized as text, with columns separated by ^A and rows separated by newline characters by default.
  • If the column is not a basic type, serialize it to JSON format.
  • Row delimiters are not allowed to be customized, but column delimiters are.

Example:

Here we export the emp_pTN table created above to the local file system with the following statement:

INSERT OVERWRITE LOCAL DIRECTORY '/usr/file-ouput' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- define column separators as '\t' SELECT * FROM emp_ptn;Copy the code