Database operations

  • Database creation

    -- Create a database
    create database python;
    -- Add a comment and property values
    create database python_db comment 'python_database' with dbproperties('name'='python');
    -- Specify the storage location of the database
    create database python_location location '/ming';
    
    
    -- Combine comment, location, and dbproperties
    create database python_db comment 'python_database' location '/ming' with dbproperties('name'='python');
  • Database Viewing

    -- Details
    describe database extended python;
    describe database extended python_db;
    describe database extended python_location;
    -- Show the CREATE statement
    show create database python_db;
  • Database Deletion

    -- Drop an empty database
    drop database python_location;
    -- Dropping a database that contains tables requires cascade
    use python_db;
    create table test_tb(
        id int
    );
    drop database python_db cascade;
  • Database Modification

    -- Alter a database
    -- Modify attribute values
    alter database python set dbproperties('age'='18');
    -- Change path
    alter database python set location 'hdfs:///user/hive/warehouse/python.db';
    -- Modify owner
    alter database python set owner user python;

Data table operations

  • Create (create)

    The basic CREATE form does not specify a field delimiter. The default is \001, which displays as SOH in a text editor and as ^A in vim.
    create table test_tb(
        id int
    );
    
    row format delimited tells Hive to use its native delimited SerDe; fields terminated by ',' sets the field delimiter to a comma.
    create table test_tb_f(
        id int,
        name string,
        age int,
        gender string
    )row format delimited
    fields terminated by ',';
    
  • View

    -- View details
    desc test_tb_f;
    desc extended test_tb_f;
    -- Display information after formatting
    desc formatted test_tb_f;
    -- Show the CREATE statement
    show create table test_tb_f;
  • Delete

    -- Drop the entire table (metadata and data)
    drop table python.test_tb;
    -- Remove only the table data; the table definition remains
    select * from test_tb_f;
    truncate table test_tb_f;
  • Modify

    -- Alter table properties
    alter table test_tb_f set tblproperties('name'='itcast');
    -- Rename the table
    alter table test_tb_f rename to test_tb;
    -- Modify table field attributes
    desc python.test_tb;
    alter table test_tb change id id string;
    -- Change the field name
    alter table test_tb change name username string;
    -- Add new fields
    alter table test_tb add columns(address string);
    -- Changes the storage path of the table
    alter table test_tb set location '/ming';
    -- The field delimiter can also be changed to a custom SerDe

Internal and external tables

  • Internal table

    • By default, tables are created as internal (managed) tables
    create table test_tb_n(
        name string
    )row format delimited
    fields terminated by ',';
  • External table: created with the external keyword; dropping an external table deletes only the metadata, leaving the data files in place

    create external table test_tb_ext(
        name string
    )row format delimited
    fields terminated by ',';
    
    
    -- An external table can also point at an existing HDFS path
    create external table test_tb_ext(
        name string
    )row format delimited
    fields terminated by ','
    location '/ming';

The location description

Location specifies the HDFS path where data is stored; it can be used when creating either databases or tables.

If location is not specified, Hive creates a directory under the default warehouse path (/user/hive/warehouse/) named after the database or table to store the data.
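As a sketch of that rule (the table names here are made up, and the paths assume the default warehouse configuration):

```sql
-- No location given: data lands under the warehouse directory,
-- e.g. /user/hive/warehouse/python_db.db/t_default
create table python_db.t_default(id int);

-- Explicit location: data is stored at the given HDFS path instead
create table python_db.t_located(id int) location '/ming';
```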

The partition table

  • create

    A partitioned by column cannot have the same name as a table column.

    Static partitioning: data must be imported manually, with data files assigned to partitions by hand.

    Dynamic partitioning: requires an existing source table; the partitioned table keeps the same field definitions as the source table.

    Hive then automatically routes the source table's rows into partitions according to the partition field.

    create table test_tb_part(
        id int,
        name string,
        age int,
        gender string
    )partitioned by(sex string)
    row format delimited
    fields terminated by ',';
    -- Static partition: load data files into a specified partition by hand
    load data local inpath '/root/boy.txt' into table test_tb_part partition(sex='boy');
    load data local inpath '/root/girl.txt' into table test_tb_part partition(sex='girl');
    -- Dynamic partition: the last selected column (t.gender) supplies the partition value
    -- (requires: set hive.exec.dynamic.partition.mode=nonstrict;)
    insert into table test_tb_part_D partition(sex) select t.*,t.gender from test_tb_f t;
    
  • View

    -- View details
    desc formatted test_tb_part;
    -- List the partitions that have been created
    show partitions test_tb_part;
  • delete

    alter table test_tb_part drop partition (sex='girl');
  • Modify

    -- Add a partition
    alter table test_tb_part add partition(sex='aa');
    -- Change the partition name
    alter table test_tb_part partition(sex='aa') rename to partition(sex='bb');
    -- Change the partition path
    alter table test_tb_part partition(sex='bb') set location '/ming';
  • Partition repair

    A partition directory created directly on HDFS (rather than with alter table test_tb_part add partition(sex='aa');) is not registered in the metastore. Repair the table to add such partitions to the metadata:

msck repair table test_tb_part;
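For example, if a partition directory is created on HDFS outside of Hive (the path assumes the default warehouse location, and the directory name is illustrative), the repair makes it visible:

```sql
-- Directory created outside of Hive, e.g.:
--   hdfs dfs -mkdir -p /user/hive/warehouse/python_db.db/test_tb_part/sex=aa
-- The new directory is invisible to queries until the table is repaired:
msck repair table test_tb_part;
-- The recovered partition now appears here:
show partitions test_tb_part;
```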

Bucket table

  • create
    1. Source data table
    2. Create a bucket table
    3. Divide the source table's data into the buckets
    -- Essentially, the data is evenly distributed by one field of the source table

-- Source data table
select * from test_tb_f;
-- Create the bucket table
create table test_tb_ft(
    id int,
    name string,
    age int,
    gender string
)clustered by(gender) sorted by(age desc) into 2 buckets
row format delimited fields terminated by ',';
-- Write the source data into the bucket table; rows are automatically assigned to buckets by the bucket field
insert into test_tb_ft select * from test_tb_f;

-- Bucket calculation process
-- The bucket field is gender: for each row, the hash of the gender value is computed,
-- then taken modulo the number of buckets; rows with the same remainder go to the same bucket file
select hash('woman');
select abs(hash('male'));
select 28845 % 2;
select 27446 % 2;

Bucket table and partition table

  • A partition table divides the data as a whole: data is split by a field such as time or region, each partition corresponds to a directory (partitions can be nested in multiple levels), and each directory stores that partition's data files
  • A bucket table groups existing data by the hash of a field; each bucket is a single file and cannot be bucketed again
  • Data inside a partition can be further divided into buckets by another field
  • Partitioning and bucketing are both ways to optimize queries and are not required when creating tables.
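The third point can be sketched as a table that is both partitioned and bucketed (a minimal illustration; the table name and bucket count are made up):

```sql
-- Each sex=... partition directory is further split into 2 bucket files by id
create table test_tb_part_bkt(
    id int,
    name string,
    age int
)partitioned by(sex string)
clustered by(id) into 2 buckets
row format delimited
fields terminated by ',';
```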