1. Introduction to Sqoop

  • Apache Sqoop is a tool designed for efficient data transfer between Hadoop and structured data stores such as relational databases.
  • It acts as a porter of data between the two systems.
  • Sqoop relies on the database's schema information to automate most of the data transfer process, which is implemented as MapReduce jobs.
  • There are currently two completely incompatible versions, Sqoop1 and Sqoop2. They can be distinguished by version number: 1.4.x is Sqoop (Sqoop1) and 1.99.x is Sqoop2.

2. Sqoop data import

  • Data from a traditional RDBMS is imported into Hadoop (HDFS, HBase, Hive, etc.)
  • The metadata (schema, table, field names, field types) is obtained from the traditional database, and the import is turned into a map-only MapReduce job. The job runs many map tasks; each map reads one slice of the data, so the slices are copied in parallel.
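The splitting step above can be sketched as follows; the id bounds (1 to 100) and the mapper count (4) are made-up values for illustration, mirroring how Sqoop divides a numeric split column's range evenly among map tasks:

```shell
# Sketch of Sqoop's default splitting: divide the [min, max] range of the
# split column evenly among the map tasks (hypothetical values).
split_ranges() {
  local min=$1 max=$2 mappers=$3
  local step=$(( (max - min + 1) / mappers ))
  local i lo hi
  for i in $(seq 0 $((mappers - 1))); do
    lo=$(( min + i * step ))
    # The last map task picks up any remainder rows.
    if [ "$i" -eq $((mappers - 1)) ]; then hi=$max; else hi=$(( lo + step - 1 )); fi
    echo "map $i reads id in [$lo, $hi]"
  done
}

split_ranges 1 100 4
# map 0 reads id in [1, 25]
# map 1 reads id in [26, 50]
# map 2 reads id in [51, 75]
# map 3 reads id in [76, 100]
```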

3. Sqoop data export

  • The data in Hadoop (HDFS) is exported to the RDBMS
  • Sqoop gets the schema and metadata of the target table and matches the fields against the data in Hadoop; multiple map-only jobs run simultaneously to copy the data from HDFS into the relational database.

4. Sqoop installation

  • Prepare the installation file

    • sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  • Unpack the archive

    • tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/
  • Rename the directory

    • mv /opt/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /opt/sqoop
  • Configure environment variables

    export SQOOP_HOME=/opt/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin
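To make these variables survive a new shell session, they can be appended to a profile file; the exact file (e.g. /etc/profile or ~/.bashrc) depends on your setup:

```shell
# Persist the Sqoop variables in the system profile, then reload it.
echo 'export SQOOP_HOME=/opt/sqoop' >> /etc/profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
source /etc/profile
```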
  • Modifying configuration files

    cd $SQOOP_HOME/conf
    mv sqoop-env-template.sh sqoop-env.sh

    ==vim sqoop-env.sh==

    export HADOOP_COMMON_HOME=/opt/hadoop
    export HBASE_HOME=/opt/hbase
    export HIVE_HOME=/opt/hive
    export ZOOCFGDIR=/opt/zookeeper/conf
  • Because Sqoop connects to an RDBMS, the JDBC driver jar for that database must be on Sqoop's classpath, so upload the MySQL jar to $SQOOP_HOME/lib

    • MySQL requires mysql-connector-java-5.1.32.jar.
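Assuming the connector jar has already been downloaded to the current directory, copying it into place looks like:

```shell
# Put the MySQL JDBC driver on Sqoop's classpath.
cp mysql-connector-java-5.1.32.jar $SQOOP_HOME/lib/
```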

5. Basic operations of Sqoop

  • List databases

    sqoop-list-databases --connect jdbc:mysql://uplooking05:3306/ --username root --password root

  • List all tables

    sqoop-list-tables --connect jdbc:mysql://uplooking05:3306/up1 --username root --password root

6. Import data from MySQL into HDFS

sqoop-import --connect jdbc:mysql://uplooking05/up2 --username root --password root --table user

  • During the import, make sure the tables in the RDBMS have a ==primary key==
  • --target-dir: import into a custom HDFS directory
  • --append: append to existing data instead of failing when the target directory exists
  • -m: specifies the number of map tasks
  • --username: the database user name
  • --password: the database password
  • --connect: the database URL
  • --delete-target-dir: normally used with --target-dir; ==mutually exclusive== with --append
  • --last-value: used together with --incremental append and --check-column; skips all rows up to the given value and appends only the newer ones
  • --where: import according to a condition

    • sqoop-import --connect jdbc:mysql://uplooking05/up2 --username root --password root --table user --target-dir /sqoop/user --delete-target-dir --where 'id>2'
  • Import the result of a MySQL query into HDFS

    • sqoop-import --connect jdbc:mysql://uplooking05/up2 --username root --password root --target-dir /sqoop/user --delete-target-dir --query "select name,age from user where age>10 and \$CONDITIONS" --split-by age

      Note: the literal token $CONDITIONS must appear in the query; inside double quotes it has to be escaped as \$CONDITIONS so the shell does not expand it.
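The --last-value option described above is part of an incremental import; a sketch with a hypothetical check column (id) and last value (4):

```shell
# Incremental import: append only rows whose id is greater than 4,
# using the id column as the check column (values are assumptions).
sqoop-import --connect jdbc:mysql://uplooking05/up2 --username root --password root \
  --table user --target-dir /sqoop/user \
  --incremental append --check-column id --last-value 4
```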

7. Import data from MySQL into HBase

sqoop-import --connect jdbc:mysql://uplooking05/up2 --username root --password root --table user --hbase-row-key id --hbase-table ns1:t2 --column-family f1

--hbase-row-key: by default the HBase row key is the primary key column of the MySQL table

Note: Before importing data into HBase, you need to ==create the table and column family in HBase== first
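For the command above, the table and column family could be created in the HBase shell like this (the names ns1:t2 and f1 come from the example; the namespace must exist before the table):

```shell
# Create the namespace (if missing), then the table with its column family.
echo "create_namespace 'ns1'" | hbase shell
echo "create 'ns1:t2', 'f1'" | hbase shell
```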

8. Export data from HDFS to MySQL

sqoop-export --connect jdbc:mysql://uplooking05:3306/up2 --username root --password root --table user --export-dir /sqoop/user2

Note: Before exporting, you first need to ==create the table in MySQL==
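A hypothetical schema for the user table in the example; the column names and types are assumptions and must match the fields in the HDFS files being exported:

```shell
# Create the target table before running sqoop-export (schema is assumed).
mysql -h uplooking05 -uroot -proot -e "
CREATE TABLE up2.user (
  id   INT PRIMARY KEY,
  name VARCHAR(64),
  age  INT
);"
```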

9. Export data from HBase to MySQL

This is not directly supported by Sqoop. Data in HBase must first be dumped to HDFS (or Hive), and then exported to MySQL with sqoop-export.