Article source code: GitHub | GitEE

I. Overview of SQOOP

SQOOP is an open-source big data component used to transfer data between Hadoop (Hive, HBase, etc.) and traditional relational databases (MySQL, PostgreSQL, Oracle, etc.).

Like most data transfer components, its basic functions are import and export.

Since SQOOP belongs on the big data side of the architecture, moving data from a relational database into the Hadoop storage system is called an import, and the reverse direction is called an export.

SQOOP is a command-line tool that translates an import or export command into a MapReduce job; the generated job mainly customizes InputFormat and OutputFormat.

II. Environment deployment

To test the SQOOP component you need at least a basic environment: the Hadoop family of services, a relational database, and the JDK.

Since SQOOP is a tooling component, a single-node installation is sufficient.

1. Upload the installation package

Installation package and version: sqoop-1.4.6

[root@hop01 opt]# tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hop01 opt]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop1.4.6

2. Modify the configuration file

File location: sqoop1.4.6/conf

[root@hop01 conf]# pwd
/opt/sqoop1.4.6/conf
[root@hop01 conf]# mv sqoop-env-template.sh sqoop-env.sh

Configuration content: paths for the common Hadoop family components and the coordination component ZooKeeper.

[root@hop01 conf]# vim sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop2.7
export HADOOP_MAPRED_HOME=/opt/hadoop2.7
export HIVE_HOME=/opt/hive1.2
export HBASE_HOME=/opt/hbase-1.3.1
export ZOOKEEPER_HOME=/opt/zookeeper3.4
export ZOOCFGDIR=/opt/zookeeper3.4

3. Configure environment variables

[root@hop01 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
[root@hop01 opt]# source /etc/profile

4. Introduce the MySQL driver

[root@hop01 opt]# cp mysql-connector-java-5.1.27-bin.jar sqoop1.4.6/lib/

5. Environment check

Key points: import and export

Check the usage with the help command and the version number with the version command. SQOOP is a command-line tool, and the commands shown here are the ones used below.
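A minimal sanity check, assuming sqoop is already on the PATH after sourcing /etc/profile:

sqoop help
sqoop version

If both commands print the usage text and the version string, the basic installation is working.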

6. Related environment

At this point, check the related services on the SQOOP deployment node; they basically run in cluster mode:
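As a rough sketch, one way to confirm the surrounding services is to list the running Java processes on the node; the exact process names depend on which components are deployed there:

[root@hop01 opt]# jps
# expected processes include NameNode/DataNode (HDFS), ResourceManager/NodeManager (YARN),
# HMaster/HRegionServer (HBase) and QuorumPeerMain (ZooKeeper), depending on the node's role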

7. Test MySQL connection

sqoop list-databases --connect jdbc:mysql://hop01:3306/ --username root --password 123456

This command lists the MySQL databases; the result is printed correctly, as shown in the figure:

III. Data import cases

1. MySQL data script

CREATE TABLE `tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';

INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (1, 'spring');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (2, 'c++');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (3, 'java');

2. SQOOP import script

Specify a table in the database and import it in full into the Hadoop system; note that the Hadoop services must be started first:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \ 
--table tb_user \
--target-dir /hopdir/user/tbuser0 \
-m 1

3. Hadoop queries

[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbuser0/part-m-00000
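Given the three rows inserted by the MySQL script and Sqoop's default comma field delimiter, the file content should look roughly like this (an expected sketch, not captured output):

1,spring
2,c++
3,java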

4. Specify columns and conditions

The SQL statement in the query must contain WHERE $CONDITIONS:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--target-dir /hopdir/user/tbname0 \
--num-mappers 1 \
--query 'select user_name from tb_user where 1=1 and $CONDITIONS;'

To view the import results:

[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbname0/part-m-00000

5. Import into the Hive component

If no Hive database is specified, the data is imported into the default library and a table with the same name is created automatically:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--hive-import \
-m 1

Two steps can be observed in the SQOOP execution log:

Step 1: Import data from MySQL into the default path of HDFS;

Step 2: Migrate the data from the temporary directory to the Hive table.
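A quick way to verify the result is a query in the Hive CLI; this is a sketch and assumes the table landed in the default library under the same name tb_user:

hive -e "select * from tb_user;"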

6. Import into the HBase component

The current HBase cluster version is 1.3, and the target table must be created in advance for the data import to run properly.
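A minimal sketch of creating that table in the HBase shell, assuming the table name tb_user and the column family info used by the import command below:

hbase shell
create 'tb_user', 'info'

With the table in place, run the import: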

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--columns "id,user_name" \
--column-family "info" \
--hbase-table tb_user \
--hbase-row-key id \
--split-by id

View table data in HBase:
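A sketch of checking the imported rows in the HBase shell, assuming the tb_user table created above:

scan 'tb_user'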

IV. Data export cases

Create a new MySQL database and table, then export data from HDFS back to MySQL. Here you can reuse the data generated by the first import script.
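A sketch of the target database and table, assuming the same two columns as tb_user in sq_import:

CREATE DATABASE sq_export;
CREATE TABLE `sq_export`.`tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then run the export: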

sqoop export \
--connect jdbc:mysql://hop01:3306/sq_export \
--username root \
--password 123456 \
--table tb_user \
--num-mappers 1 \
--export-dir /hopdir/user/tbuser0/part-m-00000 \
--input-fields-terminated-by ","

Check the data in MySQL again; the records have been exported completely. The comma here is the separator between data fields, and it must match the delimiter seen in the HDFS data produced by the first import script.
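A quick verification query in MySQL, assuming the sq_export.tb_user table sketched above:

SELECT * FROM sq_export.tb_user;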

V. Source code address

GitHub address: https://github.com/cicadasmile/big-data-parent | GitEE address: https://gitee.com/cicadasmile/big-data-parent
