Hive HQL is based on SQL, and THERE are several ways to create tables in SQL. Hive also supports creation of these tables.

[Hive common table Building methods](LanguageManual DDL – Apache Hive – Apache Software Foundation)

1. Create table

Use create table from scratch, according to the table format and requirements to achieve table. In the actual development of the more common way, mainly can be customized to build tables.

CREATE TABLE DEDUCT_SIGN_D '(' id' bigint COMMENT 'primary key ',' sign_no 'string COMMENT' DEDUCT_SIGN_D ', 'bp_no' string COMMENT 'id ',' comments' string COMMENT 'comments ',' etl_create_time 'string COMMENT' ID ', 'etl_update_time' string COMMENT 'ETL update date ') COMMENT 'PARTITIONED BY (' statis_date' string COMMENT 'time partition ')Copy the code

2. Create table… as select.. (CTAS)

create table …… As is a common way to create a table. In Hive projects, there will be a lot of logical aggregations and temporary tables.

2.1 Benefits of using CTAS to create tables

1. Use the query result to create and fill the table in the CTAS statement. The table created by CTAS is atomic, which means that it will not be seen by other users until all the query results are populated. As a result, other users either see the table with the full query result, or they don’t see the table at all.

2. There are two sections in CTAS. The select section can be any selection statement supported by HiveQL. The CREATE creation section of CTAS gets the result schema from the Select selection section and can create the specified target table using other table attributes such as SerDe and storage format. Such as specifying the row and column sharding format.

However, when using CTAS to create a table, there will be changes in the table structure, as well as defects and limitations of using CTAS to create a table, which should be paid special attention to.

2.2 Demo of creating tables using CTAS

1. Create a table using CTAS without specifying constraints on the creation part

create table finance.tmp_dm_rpt_110015_fsa_app_d_23 
as 
select  a.stat_date as stat_date    
  ,a.intfc_tp as intfc_tp
  ,a.trmnl_tp_id as trmnl_tp_id
  ,a.app_version  as app_version 
  ,a.clnt_chnl  as clnt_chnl
  ,a.trmnl_clnt as  trmnl_clnt 
from  finance.tmp_dm_rpt_110015_fsa_app_d_20_04  a
inner join finance.tmp_dm_rpt_110015_fsa_app_d_22 b
on b.trmnl_clnt=concat(a.trmnl_tp_id,a.clnt_id);
Copy the code

2. In the create section, specify the storage format of the table. For example, specify the column separator and storage format of the new table.

CREATE TABLE T_DEDUCT_SIGN_D_CTAS
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '&'
   STORED AS ORC
AS
SELECT *  FROM T_DEDUCT_SIGN_D WHERE STATIS_DATE IS NOT NULL;
Copy the code

Table structure after CTAS is used:

hive (fdm_sor)> SHOW CREATE TABLE T_DEDUCT_SIGN_D_CTAS;
OK
CREATE TABLE `T_DEDUCT_SIGN_D_CTAS`(
  `id` bigint, 
  `sign_no` string, 
  `bp_no` string, 
  `statis_date` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '&' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t_deduct_sign_d_ctas'
TBLPROPERTIES (
  'transient_lastDdlTime'='1546422100')
Copy the code

View the data in the table

hive (fdm_sor)> hive (fdm_sor)> select count(*) from  T_DEDUCT_SIGN_D_CTAS;
438
Copy the code

Summary: The table created using CTAS above can also specify storage format, field names, etc. For example, if you alias a column during select, the new table renames the columns of the source table. In addition, data can be converted from the storage format of the source table to the storage format of the target table by specifying the specific storage format of the target table during the creation of the table. This is hive a very powerful function, data storage format conversion. This can also be done via insert.

3. Semi-automatic mode create table like…..

A table created in this way is: The LIKE form of CREATE TABLE allows you to copy an existing table definition exactly (without copying its data). In contrast to CTAS, the statement below creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows. CREATE TABLE creates a TABLE in a form that allows you to copy exactly the existing TABLE definition (without copying its data), with all details the same except that the TABLE name is different from the source TABLE. But there is no data for the source table. So the create Table like format is ideal for copying the source table schema. In fact, the actual development is not too much.

Hive (FDM_SOR)> show create table t__sign_d_like; OK CREATE TABLE 't__sign_d_like' (' id 'bigint COMMENT' id ', 'sign_no' string COMMENT 'id ', 'bp_NO' string COMMENT 'vendor number ') PARTITIONED BY (' Statis_date' string COMMENT 'time partition ') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' LOCATION  'hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t__sign_d_like' TBLPROPERTIES ( 'transient_lastDdlTime'='1546420764') # show table structure: hive (fdM_SOR)> show create table T_DEDUCT_SIGN_D; CREATE TABLE 'T_DEDUCT_SIGN_D' (' id 'bigint COMMENT' deduct_sign_NO 'string COMMENT' deduct_sign_D ', 'bP_NO' string COMMENT 'vendor id ') COMMENT' Table 'PARTITIONED BY (' Statis_date' string COMMENT 'Time partition ') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' LOCATION  'hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t_deduct_sign_d' TBLPROPERTIES ( 'transient_lastDdlTime'='1546417837')Copy the code