Preface

Sqoop's import tool's main function is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data's layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line. You don't need the official documentation to look up the import options: just run the sqoop help command in your shell.

If you're not familiar with Sqoop, read the Apache Sqoop article first. Solutions to errors that occur during the import process can also be found there.

Table structure and data source

  • MySQL table structure
create table mysql_student(
id int,
name varchar(10),
age int,
sex varchar(5),
create_date DATE,
primary key (id)
);
  • Insert data into the MySQL table
insert into mysql_student values
(1,'xinxing', 22,'boy','2020-06-12'),
(2,'xiaohong', 26,'girl','2020-06-17'),
(3,'xiaoming', 22,'boy','2020-06-13'),
(4,'shishi', 17,'girl','2020-05-12'),
(5,'xiaoli', 30,'boy','2020-07-12');
  • Insert additional rows into the MySQL table (the source data for the incremental import)
insert into mysql_student values
(6,'liudehua', 22,'boy','2020-06-12'),
(7,'yangmi', 26,'girl','2020-06-17'),
(8,'yangyang', 22,'boy','2020-06-13'),
(9,'hejiong', 17,'girl','2020-05-12'),
(10,'xiena', 30,'boy','2020-07-12');
  • Hive table structure
create table hive_student(
id int,
name string,
age int,
sex string,
create_date string
)
comment 'Import student table in full'
row format delimited fields terminated by ',';

Extract MySQL data to a single Hive partition (HDFS)

Full extraction

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_student \
--hive-import \
--delete-target-dir \
--fields-terminated-by ',' \
--mapreduce-job-name Xinxing00001Job;

Query the hive_student table

hive (xinxing)> select * from hive_student;
OK
hive_student.id	hive_student.name	hive_student.age	hive_student.sex	hive_student.create_date
1	xinxing	22	boy	2020-06-12
2	xiaohong	26	girl	2020-06-17
3	xiaoming	22	boy	2020-06-13
4	shishi	17	girl	2020-05-12
5	xiaoli	30	boy	2020-07-12
Time taken: 0.04 seconds, Fetched: 5 row(s)

Incremental extraction

  • --check-column specifies the column checked during import to decide whether a row counts as incremental data. Note: character-type columns cannot be used as the incremental identifying field.
  • --incremental specifies the incremental import mode: append or lastmodified.
  • --last-value specifies the maximum value the check column reached in the previous import.
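As a sketch of how these three options interact, here is a hypothetical re-implementation of append mode in Python (this is illustrative only, not Sqoop's actual code): any row whose check column exceeds --last-value counts as incremental data, and the largest value seen becomes the next --last-value.

```python
# Hypothetical sketch (not Sqoop source code) of append-mode selection:
# rows whose check column exceeds last_value are treated as incremental data.

def incremental_append(rows, check_column, last_value):
    """Return rows where check_column > last_value, plus the new last value."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

# All ten rows currently in mysql_student (ids only, for brevity).
students = [{"id": i} for i in range(1, 11)]

# The previous import recorded --last-value 5, so only ids 6..10 are pulled.
new_rows, new_last = incremental_append(students, "id", 5)
```

On the next run you would pass new_last (here 10) as --last-value, which is exactly the bookkeeping a Sqoop saved job automates.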

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_student \
--hive-import \
--fields-terminated-by ',' \
--check-column id \
--incremental append \
--last-value 5 \
--mapreduce-job-name Xinxing00001Job;

Query the hive_student table

hive (xinxing)> select * from hive_student;
OK
hive_student.id	hive_student.name	hive_student.age	hive_student.sex	hive_student.create_date
1	xinxing	22	boy	2020-06-12
6	liudehua	22	boy	2020-06-12
2	xiaohong	26	girl	2020-06-17
7	yangmi	26	girl	2020-06-17
3	xiaoming	22	boy	2020-06-13
8	yangyang	22	boy	2020-06-13
4	shishi	17	girl	2020-05-12
9	hejiong	17	girl	2020-05-12
5	xiaoli	30	boy	2020-07-12
10	xiena	30	boy	2020-07-12
Time taken: 0.031 seconds, Fetched: 10 row(s)

Incremental extraction has a drawback: updates to existing rows are missed. If, say, xiaohong (row 2) undergoes a sex-change operation and her sex field is updated to boy, append-mode incremental import cannot pick up the change, because her id is unchanged. A later article will introduce the zipper table to solve this kind of problem.
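The blind spot is easy to demonstrate with a small Python sketch (hypothetical, not Sqoop code): an in-place update leaves the check column (id) unchanged, so the row never exceeds --last-value again and is never re-imported.

```python
# Hypothetical sketch (not Sqoop code) of why append mode misses updates:
# updating a row in place does not change its id, so the append filter
# (check_column > last_value) never selects it again.

def incremental_append(rows, check_column, last_value):
    return [r for r in rows if r[check_column] > last_value]

table = [{"id": 2, "name": "xiaohong", "sex": "girl"}]
table[0]["sex"] = "boy"                        # in-place update; id stays 2

missed = incremental_append(table, "id", 10)   # last import already reached id 10
# missed == [] -> the changed row is invisible to append mode
```

This is why append mode works for insert-only tables, while tables that receive updates need either lastmodified mode with a timestamp column or the merge pattern shown at the end of this article.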

Extract MySQL data to a multi-partition Hive table (HDFS)

Full extraction

Create a Hive partition table structure

create table hive_Partition_student(
id int,
name string,
age int,
sex string,
create_date string
)
comment 'Import student table in full'
partitioned by (day string)
row format delimited fields terminated by ',';

Extraction

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_Partition_student \
--hive-import \
--delete-target-dir \
--fields-terminated-by ',' \
--hive-partition-key day \
--hive-partition-value 12 \
--mapreduce-job-name Xinxing00001Job;

Query the hive_Partition_student table

hive (xinxing)> select * from hive_Partition_student where day=12;
OK
hive_partition_student.id	hive_partition_student.name	hive_partition_student.age	hive_partition_student.sex	hive_partition_student.create_date   hive_partition_student.day
1	xinxing	22	boy	2020-06-12	12
2	xiaohong	26	girl	2020-06-17	12
3	xiaoming	22	boy	2020-06-13	12
4	shishi	17	girl	2020-05-12	12

Incremental extraction

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-import \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_Partition_student \
--fields-terminated-by ',' \
--check-column id \
--incremental append \
--last-value 4 \
--hive-partition-key day \
--hive-partition-value 13 \
--mapreduce-job-name Xinxing00001Job;

Query the hive_Partition_student table

hive (xinxing)> select * from hive_partition_student;
OK
hive_partition_student.id	hive_partition_student.name	hive_partition_student.age	hive_partition_student.sex	hive_partition_student.create_date	hive_partition_student.day
1	xinxing	22	boy	2020-06-12	12
2	xiaohong	26	girl	2020-06-17	12
3	xiaoming	22	boy	2020-06-13	12
4	shishi	17	girl	2020-05-12	12
6	liudehua	22	boy	2020-06-12	13
7	yangmi	26	girl	2020-06-17	13
8	yangyang	22	boy	2020-06-13	13
9	hejiong	17	girl	2020-05-12	13
10	xiena	30	boy	2020-07-12	13
Time taken: 0.03 seconds, Fetched: 9 row(s)
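The effect of --hive-partition-key and --hive-partition-value can be sketched in Python (hypothetical, not Sqoop code): every row of a given import run is stamped with the same static partition value, which is why the first run lands entirely in day=12 and the incremental run entirely in day=13.

```python
# Hypothetical sketch (not Sqoop code): a static partition value is attached
# to every imported row, mirroring
#   --hive-partition-key day --hive-partition-value 13

def tag_partition(rows, partition_key, partition_value):
    """Stamp each row with the static partition column for this import run."""
    return [{**row, partition_key: partition_value} for row in rows]

incremental_rows = [{"id": 6, "name": "liudehua"}, {"id": 7, "name": "yangmi"}]
tagged = tag_partition(incremental_rows, "day", "13")
# every row now carries day='13', matching the hive_partition_student.day column
```

Because the value is fixed per run, routing rows into different partitions within one import is not possible this way; a daily job would typically pass the run date as the partition value.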

Updating Hive data incrementally

Create Hive table structures

create table hive_student(
id int,
name string,
age int,
sex string,
create_date string
)
comment 'Import student table in full'
row format delimited fields terminated by ',';

Extract full data

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_student \
--hive-import \
--delete-target-dir \
--fields-terminated-by ',' \
--mapreduce-job-name Xinxing00001Job;

Insert new rows into MySQL and update field values of previously existing rows

Import the updated mysql_student table into the ODS layer table hive_ods_student

sqoop import \
--connect jdbc:mysql://xinxingdata001:3306/mysqlXinxing \
--username root \
--password 123456 \
--hive-database xinxing \
--table mysql_student \
--hive-table hive_ods_student \
--hive-import \
--delete-target-dir \
--fields-terminated-by ',' \
--mapreduce-job-name Xinxing00001Job;

Left join the two tables, then union, so that the modified data in hive_ods_student is written back into hive_student

insert overwrite table hive_student
select
  a.*
from
(
   select * from hive_student
) as a
left join
(
   select * from hive_ods_student
) as b
on a.id = b.id
where b.id is null
union all
select
  *
from hive_ods_student;
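The merge logic above can be simulated in a few lines of Python (a sketch assuming id is the join key, not the HiveQL itself): keep the rows of hive_student whose id is absent from hive_ods_student (the left join ... where b.id is null part), then append every ODS row (the union all part).

```python
# Sketch of the HiveQL merge above (assumes `id` is the join key).

def merge_tables(hive_student, hive_ods_student):
    """Rows of hive_student untouched by the ODS load, plus all ODS rows."""
    ods_ids = {row["id"] for row in hive_ods_student}
    # left join ... where b.id is null: rows with no match in the ODS table
    kept = [row for row in hive_student if row["id"] not in ods_ids]
    # union all: every ODS row (updates and new inserts) wins
    return kept + hive_ods_student

hive_student = [{"id": 1, "sex": "boy"}, {"id": 2, "sex": "girl"}]
hive_ods_student = [{"id": 2, "sex": "boy"},   # updated row replaces the old one
                    {"id": 6, "sex": "boy"}]   # newly inserted row
merged = merge_tables(hive_student, hive_ods_student)
# merged holds ids 1, 2, 6, and id 2 now has sex='boy'
```

Because insert overwrite rewrites the whole table, this pattern picks up both new rows and in-place updates, which is exactly what append-mode incremental import could not do.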