Recently I have been doing some data-migration work and evaluated several tools. DataX turned out to be a good one, so I'd like to recommend it. So what is DataX? DataX is an offline data synchronization tool widely used within Alibaba Group. It provides efficient synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, and more.

Main features

As a data synchronization framework, DataX abstracts synchronization between data sources into two kinds of plug-ins: a Reader plug-in, which reads data from the source, and a Writer plug-in, which writes data to the target. In principle, the framework can therefore synchronize data between any types of data source. The plug-ins also form an ecosystem: every newly connected data source can immediately exchange data with all of the existing ones. For details, see DataX.
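To make the Reader/Writer split concrete, here is a conceptual model in Python. It is not DataX's plug-in API (real DataX plug-ins are written in Java), and the `Reader`, `Writer` and `run_job` names are hypothetical. What it illustrates is the design described above: the framework only moves records through a bounded channel from a reader to a writer, so any reader can be paired with any writer.

```python
"""Conceptual model of a Reader -> channel -> Writer pipeline (hypothetical names)."""
import queue
import threading
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]  # one row, keyed by column name
_END = object()  # sentinel that marks the end of the stream


class Reader:
    """Hypothetical source plug-in: yields records from some data source."""

    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = rows

    def read(self) -> Iterator[Record]:
        yield from self._rows


class Writer:
    """Hypothetical target plug-in: here it just collects records in memory."""

    def __init__(self) -> None:
        self.written: List[Record] = []

    def write(self, record: Record) -> None:
        self.written.append(record)


def run_job(reader: Reader, writer: Writer, capacity: int = 128) -> int:
    """Move every record from reader to writer through a bounded channel.

    Returns the number of records written.
    """
    channel: "queue.Queue[object]" = queue.Queue(maxsize=capacity)

    def produce() -> None:
        try:
            for record in reader.read():
                channel.put(record)  # blocks while the channel is full
        finally:
            channel.put(_END)  # always signal the end, even on error

    producer = threading.Thread(target=produce)
    producer.start()
    count = 0
    while True:
        item = channel.get()
        if item is _END:
            break
        writer.write(item)  # type: ignore[arg-type]
        count += 1
    producer.join()
    return count
```

Because `run_job` only depends on the two small interfaces, adding one new reader makes it usable with every existing writer, and vice versa.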

System requirements

  • Linux
  • JDK (1.8 or above; 1.8 recommended)
  • Python (Python 2.6.x recommended)
  • Apache Maven 3.x (for compiling DataX)
  • JVM heap size: the heap must be at least 1 GB, otherwise DataX will not start:

    export JAVA_OPTS="-Xms1024m -Xmx1024m"
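Because DataX will not start with less than 1 GB of heap, a small pre-flight check can save a failed launch. The following is a hypothetical helper, not part of DataX: it parses the `-Xmx` flag from a JVM options string such as the `JAVA_OPTS` value above and compares it with a 1024 MB minimum. It assumes the standard `k`/`m`/`g`/`t` size suffixes and does not inspect any defaults that DataX's own launcher may apply.

```python
"""Pre-flight check that a JVM options string requests at least 1 GB of heap."""
import os
import re
from typing import Optional

# Multipliers that convert a JVM size suffix into megabytes.
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def heap_mb(opts: str, flag: str = "-Xmx") -> Optional[float]:
    """Return the size set by the last `flag` option in `opts`, in MB, or None."""
    pattern = re.escape(flag) + r"(\d+)([kKmMgGtT]?)(?=\s|$)"
    matches = re.findall(pattern, opts)
    if not matches:
        return None
    number, suffix = matches[-1]  # later flags usually override earlier ones
    return int(number) * _TO_MB[suffix.lower()]


def heap_ok(opts: str, minimum_mb: int = 1024) -> bool:
    """True if the maximum heap (-Xmx) is at least `minimum_mb` megabytes."""
    size = heap_mb(opts)
    return size is not None and size >= minimum_mb


if __name__ == "__main__":
    opts = os.environ.get("JAVA_OPTS", "")
    verdict = "OK" if heap_ok(opts) else "missing or below 1 GB"
    print(f"JAVA_OPTS={opts!r}: {verdict}")
```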

Quick start

Deploy DataX

  • Method 1: download the DataX toolkit directly (DataX download address).

    Extract the archive to a local directory, go to its bin directory, and start a synchronization job:

    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}

  • Method 2: build DataX from source. (1) Download the DataX source code:

    $ git clone git@github.com:alibaba/DataX.git

    (2) Package it with Maven:

    $ cd {DataX_source_code_home}
    $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

    If packaging succeeds, output like the following is displayed:

    [INFO] BUILD SUCCESS
    [INFO] -----------------------------------------------------------------
    [INFO] Total time: 08:12 min
    [INFO] Finished at: 2018-06-05T16:26:48+08:00
    [INFO] Final Memory: 133M/960M
    [INFO] -----------------------------------------------------------------

The resulting DataX package is located in {DataX_source_code_home}/target/datax/datax/.
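Whichever way DataX was deployed, a job can also be launched from a Python script instead of the shell. This is a minimal sketch with hypothetical helper names; it assumes the `{YOUR_DATAX_HOME}/bin/datax.py` layout described above and defaults to the `python` interpreter used in the commands here (the system requirements recommend Python 2.6.x, so you may need to point it at a Python 2 binary).

```python
"""Launch a DataX job from Python (hypothetical helpers, minimal sketch)."""
import subprocess
from pathlib import Path
from typing import List


def datax_command(datax_home: str, job_file: str, python: str = "python") -> List[str]:
    """Build the argv for `python {datax_home}/bin/datax.py {job_file}`.

    The job path is made absolute because the job is started from bin/.
    """
    script = Path(datax_home) / "bin" / "datax.py"
    return [python, str(script), str(Path(job_file).resolve())]


def run_datax(datax_home: str, job_file: str, python: str = "python") -> int:
    """Run one DataX job from its bin directory and return the exit code."""
    cmd = datax_command(datax_home, job_file, python)
    completed = subprocess.run(cmd, cwd=Path(datax_home) / "bin")
    return completed.returncode
```

For example, `run_datax("/opt/datax", "oracle2mysql2.json")` (the install path is hypothetical) mirrors the two shell commands. Treat a non-zero return code as a likely failure, but confirm against the job log for your DataX version.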

Generating a Configuration File

  • Step 1: Create a configuration file (JSON format)

    You can run the following command to generate a configuration template:

    python datax.py -r oraclereader -w mysqlwriter > oracle2mysql2.json

    Edit the generated template so that DataX reads data from Oracle and writes it to MySQL, for example:

    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "oraclereader",
                        "parameter": {
                            "column": ["*"],
                            "connection": [
                                {
                                    "jdbcUrl": ["*"],
                                    "table": ["tb1"]
                                }
                            ],
                            "password": "***",
                            "username": "***"
                        }
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "column": ["*"],
                            "connection": [
                                {
                                    "jdbcUrl": "*",
                                    "table": ["tb1"]
                                }
                            ],
                            "password": "**",
                            "preSql": [],
                            "session": [],
                            "username": "**",
                            "writeMode": "insert"
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": "3"
                }
            }
        }
    }

  • Finally, start DataX:

    $ cd {YOUR_DATAX_DIR_BIN}
    $ python datax.py ./oracle2mysql2.json
  • When synchronization completes, a summary like the following is logged:

    2018-06-05 11:20:25.263 [job-0] INFO JobContainer -
    Task start time               : 2018-06-05 11:20:15
    Task end time                 : 2018-06-05 11:20:25
    Total task time               : 10s
    Average task traffic          : 205B/s
    Record write speed            : 5rec/s
    Number of read records        : 50
    Number of read/write failures : 0
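When jobs are run from scripts, it helps to check this summary automatically. A minimal sketch with hypothetical function names: it assumes the labels match the translated log shown above. The wording your DataX version actually prints may differ (it may be in Chinese, for instance), so treat `LABELS` as something to adjust.

```python
"""Extract record and failure counts from a DataX job summary (minimal sketch)."""
import re
from typing import Dict

# Labels as they appear in the translated summary above; adjust them to match
# the wording your DataX version actually prints.
LABELS = {
    "records_read": "number of read records",
    "failures": "number of read/write failures",
}


def parse_summary(log: str) -> Dict[str, int]:
    """Return the integer value found after each known label, if present."""
    summary: Dict[str, int] = {}
    for key, label in LABELS.items():
        match = re.search(re.escape(label) + r"\s*:\s*(\d+)", log, re.IGNORECASE)
        if match:
            summary[key] = int(match.group(1))
    return summary


def job_succeeded(log: str) -> bool:
    """True if the summary was found and reports zero read/write failures."""
    summary = parse_summary(log)
    return "records_read" in summary and summary.get("failures") == 0
```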

Summary

DataX is fairly easy to use, but a single configuration file cannot easily cover tables from different databases: to read and write multiple tables, you have to configure each one separately. Still, it solved several of the problems I had.
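One way to reduce the per-table work is to keep a single job template (for example the oracle2mysql2.json file generated earlier) and stamp out one job file per table. The sketch below uses hypothetical helper names and is not a DataX feature. It assumes the source and target tables share the same name, and it leaves the column lists untouched, so it suits templates that use `["*"]`.

```python
"""Generate one DataX job file per table from a single template (sketch)."""
import copy
import json
from pathlib import Path
from typing import Any, Dict, List


def job_for_table(template: Dict[str, Any], table: str) -> Dict[str, Any]:
    """Return a copy of `template` whose reader and writer both target `table`."""
    job = copy.deepcopy(template)  # never mutate the shared template
    for step in job["job"]["content"]:
        for side in ("reader", "writer"):
            for conn in step[side]["parameter"]["connection"]:
                conn["table"] = [table]
    return job


def write_jobs(template_path: str, tables: List[str], out_dir: str) -> List[Path]:
    """Write one <table>.json job file per table and return their paths."""
    template = json.loads(Path(template_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for table in tables:
        path = out / f"{table}.json"
        job = job_for_table(template, table)
        path.write_text(json.dumps(job, indent=4), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_jobs("oracle2mysql2.json", ["tb1", "tb2"], "jobs")` (the table list and output directory are hypothetical) produces jobs/tb1.json and jobs/tb2.json, each of which can then be run with `python datax.py` as before.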

www.hchstudio.cn/article/201…