Database operations

  • Database creation

    -- Create a database
    create database python;
    -- Add a comment and property values
    create database python_db comment 'python_database' with dbproperties('name'='python');
    -- Specify the storage location of the database
    create database python_location location '/ming';
    
    
    -- Combine comment, location, and dbproperties
    create database python_db comment 'python_database' location '/ming' with dbproperties('name'='python');
  • Database Viewing

    -- Details
    describe database extended python;
    describe database extended python_db;
    describe database extended python_location;
    -- Show the CREATE statement
    show create database python_db;
  • Database Deletion

    -- Drop an empty database
    drop database python_location;
    -- Dropping a database that contains tables requires cascade
    use python_db;
    create table test_tb(
        id int
    );
    drop database python_db cascade;
  • Database Modification

    -- Alter a database
    -- Modify attribute values
    alter database python set dbproperties('age'='18');
    -- Change path
    alter database python set location 'hdfs:///user/hive/warehouse/python.db';
    -- Modify owner
    alter database python set owner user python;

Data table operations

  • Create (create)

    The basic CREATE form does not specify a field delimiter. The default is \001, which displays as SOH in a text editor and as ^A in vim.
    create table test_tb(
        id int
    );
    
    row format delimited tells Hive to use its native delimited SerDe; fields terminated by ',' sets the field delimiter to a comma.
    create table test_tb_f(
        id int,
        name string,
        age int,
        gender string
    )row format delimited
    fields terminated by ',';
    
  • View

    -- View details
    desc test_tb_f;
    desc extended test_tb_f;
    -- Display information after formatting
    desc formatted test_tb_f;
    -- Show the CREATE statement
    show create table test_tb_f;
  • Delete

    -- Drop the entire table (metadata and data)
    drop table python.test_tb;
    -- Remove only the table data; the table definition remains
    select * from test_tb_f;
    truncate table test_tb_f;
  • Modify

    -- Alter table properties
    alter table test_tb_f set tblproperties('name'='itcast');
    -- Rename the table
    alter table test_tb_f rename to test_tb;
    -- Modify table field attributes
    desc python.test_tb;
    alter table test_tb change id id string;
    -- Change the field name
    alter table test_tb change name username string;
    -- Add new fields
    alter table test_tb add columns(address string);
    -- Changes the storage path of the table
    alter table test_tb set location '/ming';
    -- The field delimiter can also be changed to a custom SerDe

Internal and external tables

  • Internal table

    • By default, tables are created as internal (managed) tables
    create table test_tb_n(
        name string
    )row format delimited
    fields terminated by ',';
  • External table: created with the external keyword; dropping an external table deletes only the metadata, leaving the data files in place

    create external table test_tb_ext(
        name string
    )row format delimited
    fields terminated by ',';
    
    
    -- An external table can also point at an existing HDFS path
    create external table test_tb_ext(
        name string
    )row format delimited
    fields terminated by ','
    location '/ming';

The location description

Location specifies the HDFS path where data is stored; it can be used when creating either databases or tables.

If location is not specified, Hive creates a directory under the default warehouse path (/user/hive/warehouse/) named after the database or table to store the data.
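As a sketch of that rule (the table names here are made up, and the paths assume the default warehouse configuration):

```sql
-- No location given: data lands under the warehouse directory,
-- e.g. /user/hive/warehouse/python_db.db/t_default
create table python_db.t_default(id int);

-- Explicit location: data is stored at the given HDFS path instead
create table python_db.t_located(id int) location '/ming';
```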

The partition table

  • create

    A partitioned by column cannot have the same name as a table column.

    Static partitioning: data must be imported manually, with data files assigned to partitions by hand.

    Dynamic partitioning: requires an existing source table; the partitioned table keeps the same field definitions as the source table.

    Hive then automatically routes the source table's rows into partitions according to the partition field.

    create table test_tb_part(
        id int,
        name string,
        age int,
        gender string
    )partitioned by(sex string)
    row format delimited
    fields terminated by ',';
    -- Static partition: load data files into a specified partition by hand
    load data local inpath '/root/boy.txt' into table test_tb_part partition(sex='boy');
    load data local inpath '/root/girl.txt' into table test_tb_part partition(sex='girl');
    -- Dynamic partition: the last selected column (t.gender) supplies the partition value
    -- (requires: set hive.exec.dynamic.partition.mode=nonstrict;)
    insert into table test_tb_part_D partition(sex) select t.*,t.gender from test_tb_f t;
    
  • View

    -- View details
    desc formatted test_tb_part;
    -- List the partitions that have been created
    show partitions test_tb_part;
  • delete

    alter table test_tb_part drop partition (sex='girl');
  • Modify

    -- Add a partition
    alter table test_tb_part add partition(sex='aa');
    -- Change the partition name
    alter table test_tb_part partition(sex='aa') rename to partition(sex='bb');
    -- Change the partition path
    alter table test_tb_part partition(sex='bb') set location '/ming';
  • Partition repair

    A partition directory created directly on HDFS (rather than with alter table test_tb_part add partition(sex='aa');) is not registered in the metastore. Repair the table to add such partitions to the metadata:

msck repair table test_tb_part;
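For example, if a partition directory is created on HDFS outside of Hive (the path assumes the default warehouse location, and the directory name is illustrative), the repair makes it visible:

```sql
-- Directory created outside of Hive, e.g.:
--   hdfs dfs -mkdir -p /user/hive/warehouse/python_db.db/test_tb_part/sex=aa
-- The new directory is invisible to queries until the table is repaired:
msck repair table test_tb_part;
-- The recovered partition now appears here:
show partitions test_tb_part;
```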

Bucket table

  • create
    1. Source data table
    2. Create a bucket table
    3. Divide the source table's data into the buckets
    -- Essentially, the data is evenly distributed by one field of the source table

-- Source data table
select * from test_tb_f;
-- Create the bucket table
create table test_tb_ft(
    id int,
    name string,
    age int,
    gender string
)clustered by(gender) sorted by(age desc) into 2 buckets
row format delimited fields terminated by ',';
-- Write the source data into the bucket table; rows are automatically assigned to buckets by the bucket field
insert into test_tb_ft select * from test_tb_f;

-- Bucket calculation process
-- The bucket field is gender: for each row, the hash of the gender value is computed,
-- then taken modulo the number of buckets; rows with the same remainder go to the same bucket file
select hash('woman');
select abs(hash('male'));
select 28845 % 2;
select 27446 % 2;

Bucket table and partition table

  • A partition table divides the data as a whole: data is split by a field such as time or region, each partition corresponds to a directory (partitions can be nested in multiple levels), and each directory stores that partition's data files
  • A bucket table groups existing data by the hash of a field; each bucket is a single file and cannot be bucketed again
  • Data inside a partition can be further divided into buckets by another field
  • Partitioning and bucketing are both ways to optimize queries and are not required when creating tables.
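The third point can be sketched as a table that is both partitioned and bucketed (a minimal illustration; the table name and bucket count are made up):

```sql
-- Each sex=... partition directory is further split into 2 bucket files by id
create table test_tb_part_bkt(
    id int,
    name string,
    age int
)partitioned by(sex string)
clustered by(id) into 2 buckets
row format delimited
fields terminated by ',';
```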