Basic Operations on Hive Tables (4)

1. Create the tables

The CREATE table statement follows SQL syntax conventions, but Hive syntax is more flexible. For example, you can define the data file storage location of the table, the storage format to use, and so on.

CREATE TABLE pokes (foo INT, bar STRING); Use the default information for all but essential information

The partition table

create table if not exists test.user1(
name string comment 'name',
salary float comment 'salary',
address struct<country:string, city:string> comment 'home address'
)
comment 'description of the table'
partitioned by (age int)
row format delimited fields terminated by '\t'
stored as orc;
Copy the code

If not exists The operation is not performed if the external keyword is not specified; otherwise, a new table is created. The comment can do it for their comments, partition as the age, the age between column separators is \ t, storage format for the column type storage orc, storage location for the default location, the parameters of the hive. Metastore. Warehouse. Dir (default: /user/hive/warehouse) specifies the HDFS directory.

Points barrel table

In fact, we can roughly divide the table into normal table, partition table, and bucket table, let’s demonstrate the bucket table, as for the difference between partition table and bucket table we will talk about separately later, let’s create a bucket table

CREATE TABLE par_table(
  viewTime INT,
  userid BIGINT,
  page_url STRING, referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY(date STRING, pos STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED ‘\t’
  FIELDS TERMINATED BY '\n'
STORED AS SEQUENCEFILE;
Copy the code

2. Copy the table

Use like to copy an empty table with the same structure as the original table, with no data in it.

create table if not exists test.user2 like test.user1;
Copy the code

Of course, you can also create tables based on the queried data

create table user_behavior_copy as select * from user_behavior limit 10;
Copy the code

3. View the table structure

User1 is the same as test.user1. Run desc [optional parameter] tableName to view the table structure.

hive> desc test.user2;
OK
name                	string              	name                
salary              	float               	salary              
address             	struct<country:string,city:string>	home address        
age                 	int                 	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
Copy the code

Formatted display: Formatted display: Formatted display: Formatted display: Formatted display: Formatted display: Formatted display: Formatted display: Formatted display: formatted

hive> desc formatted test.user2;
OK
# col_name            	data_type           	comment             
	 	 
name                	string              	name                
salary              	float               	salary              
address             	struct<country:string,city:string>	home address        
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	test                	 
Owner:              	hdfs                	 
CreateTime:         	Mon Dec 21 16:37:57 CST 2020	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://nameservice2/user/hive/warehouse/test.db/user2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
	numFiles            	0                   
	numPartitions       	0                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	0                   
	transient_lastDdlTime	1608539877          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde	 
InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	- 1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	\t                  
	serialization.format	\t                  
Copy the code

4. Delete table

Drop table; drop table;

drop table if exists table_name;
Copy the code

For management tables (internal tables), delete the table completely; For external tables, you need to delete the HDFS file to delete the table completely. For security, the Recycle bin function is usually enabled in Hadoop clusters. Data deleted from external tables is stored in the recycle bin.

5. Modify the table

Most table attributes can be modified using ALTER TABLE.

5.1 Table Renaming

alter table test.user1 rename to test.user3;
Copy the code

5.2 Adding, Modifying, and Deleting Partitions

Alter table table_name add partition(…) location hdfs_path

alter table test.user2 add if not exists
partition (age = 101) location '/user/hive/warehouse/test.db/user2/part-0000101'
partition (age = 102) location '/user/hive/warehouse/test.db/user2/part-0000102'
Copy the code

Alter table… set … The command

alter table test.user2 partition (age = 101) set location '/user/hive/warehouse/test.db/user2/part-0000110'
Copy the code

Alter table tableName drop if exists partition(…)

alter table test.user2 drop if exists partition(age = 101)
Copy the code

5.3 Modifying Column Information

You can rename a field and change its position, type, or comment:

Modify before:

hive> desc user_log;
OK
userid              	string              	                    
time                	string              	                    
url                 	string              	                    
Copy the code

Change the column name to times and use after to place the position after the URL, which was before.

alter table test.user_log
change column time times string
comment 'salaries'
after url;
Copy the code

Let’s look at the table structure:

hive> desc user_log;
OK
userid              	string              	                    
url                 	string              	                    
times               	string              	salaries            
Copy the code

Time -> times, after the URL.

Increase in 5.4

Hive can also add columns:

alter table test.user2 add columns (
birth date comment 'birthday',
hobby string comment 'hobby'
);
Copy the code

5.5 delete columns

Delete column (s); delete column (s);

hive> desc test.user3;
OK
name                	string              	name                
salary              	float               	salary              
address             	struct<country:string,city:string>	home address        
age                 	int                 	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
Copy the code

To delete the salary column, just write:

alter table test.user3 replace columns(
name string,
address struct<country:string,city:string>
);
Copy the code

Here is an error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table test.user3. SerDe may be incompatible
Copy the code

The test.user3 table is in ORC format and cannot be deleted. In textFile format, the replace method can delete columns. While deleting columns is usually not easy, adding columns is.

5.6 Modifying Table Attributes

You can add additional table attributes, or modify attributes, but cannot delete attributes:

alter table tableName set tblproperties(
    'key' = 'value'
);
Copy the code

For example, create a new table:

create table t8(time string,country string,province string,city string)
row format delimited fields terminated by The '#' 
lines terminated by '\n' 
stored as textfile;
Copy the code

This statement changes the field separator ‘#’ in table T8 to ‘\t’;

alter table t8 set serdepropertyes('field.delim'='\t');
Copy the code

6 Other Operations

6.1. Look at the table

View all indexes in the current database

showTables;Copy the code

View all tables under a particular library

 show tables in ods;
Copy the code

Displays tables on positive conditions (regular expressions)

 SHOW TABLES '.*s';
Copy the code

6.2 the load data

This is used to load data from other places into hive tables, such as external text data, or buried data

create table ods.user_info (
    user_id int,
    cid string,
    ckid string,
    username string)
    row format delimited
    fields terminated by '\t'
    lines terminated by '\n';
Copy the code

Importing Local Data

load data local inpath '/Users/liuwenqiang/workspace/hive/user.txt' overwrite into table ods.user_info

The following is the data. The data format of the imported data table is: the fields are TAB split, and the rows are newlines.

100636 100890 C5C86F4CDDC15eb7 Wang 1 100612 100865 97CC70D411c18B6F Wang 2 100078 100087 ECD6026A15FFDDF5 WangCopy the code

Importing HDFS Data

Delete the local keyword, but note that the data imported to the HDFS is a move operation, while the local data is a copy operation

load data inpath '/Users/liuwenqiang/workspace/hive/user.txt' overwrite into table ods.user_info

For local files, use HDFD dfS-put. For cluster files, use HDFS dfs-mv