Disclaimer: original by the author, reproduced with credit.

Author: Handsome Chen eats an apple

First, install SQOOP

1. Download SQOOP, unzip and rename the folder

Wget tar ZXVF - http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz Alpha.tar.gz -c /root/hadoop/ mv sqoop-1.4.6.bin_hadoop-2.0.4.alpha sqoop-1.4.6

2. Configure environment variables

vim /etc/profile
Export PATH = $PATH:$SQOOP_HME/bin

3, validation,

If SQOOP version information is displayed normally, the installation configuration is successful:

[root@localhost ~ ] # sqoop version

Create the database and table

After installing MySQL, create the test database and test tables.

Database name: test

Point (pointId int(10) primaryKey,pointName varchar(16),pointValue int(10))

The shell script generates test data

Write a Shell script where the insert statement needs to be modified based on the table structure created:

#! /bin/bash i=1; MAX_INSERT_ROW_COUONT=$1; while(( $i <= $MAX_INSERT_ROW_COUNT )) do mysql -uhive -phive test -e "insert into test.point(pointId,pointName,pointValue) values($i,'point"$i"',$i);" i=(($i+1)) done exit 0

The above script is very slow to generate test data, the author generated 10 million pieces of data like pregnancy, welcome suggestions, thank you!

4. MySQL data import

MySQL as a data source, Sqoop need to rely on the MySQL database connection driver package, download address: https://dev.mysql.com/get/Dow…

After downloading it, unzip it and copy the mysql-connector-java-5.1.44-bin.jar from mysql-connector-java-5.1.45 to $SQOOP_HOME/lib.

1. Import HDFS

sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive  --table 
Parameter analysis:

Import: import data from traditional database into HDFS/HIVE/HBASE, etc.

–connect: establish a database connection;

JDBC: mysql: / / localhost: 3306 / test: the way of using the JDBC connection mysql database, database named test;

Mysql > specify database username;

–password: specify database password;

–table: Specify the table name


A) The HDFS output directory cannot already exist;

B) When -m or splite-by is not specified, that is, when parallelism is not specified, the table to be imported must have a primary key, otherwise an error will occur.

C) Import to the specified directory: sqoop import –connect jdbc:mysql://localhost:3306/test –username hive –password hive –table point –target-dir /directory

If the output directory is not specified, create a subdirectory under /user/root/ with the same table name as the output directory by default. After the import operation, check to see if there are imported files in HDFS:

hdfs dfs -ls /user/root/point/

2. Import HBase

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table 
point --hbase-table HPoint --column-family info --hbase-row-key pointId --hbase-create-table
Parameter analysis:

–hbase-table: Specifies the table to be imported into Hbase database;

–column-family: Specifies the column family name;

–hbase-row-key: Specify rowKey;

–hbase-create-table: create table in hbase;

Finally, corrections are welcome. If you like it, give it a “like” and invite you to an apple.