Alibaba's offline data synchronization tool DataX: pitfalls and notes

Recently I have been working on data migration and evaluated several tools. DataX turned out to be a good one, so I am recommending it here. So what is DataX? DataX is an offline data synchronization tool widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, and others.

Main features

As a data synchronization framework, DataX abstracts synchronization between different data sources into a Reader plug-in, which reads data from the source, and a Writer plug-in, which writes data to the target. In theory, the DataX framework can therefore support synchronization for any type of data source. The DataX plug-in system also works as an ecosystem: each time a new data source is connected, it can immediately interoperate with all existing data sources. For details, see the DataX project.
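The Reader/Writer split described above can be illustrated with a small sketch. This is not DataX's actual plug-in API (DataX plug-ins are written in Java); the names `Reader`, `Writer`, `ListReader`, `ListWriter`, and `sync` are hypothetical and only show why any reader can be paired with any writer. It assumes Python 3.8 or later.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]  # hypothetical record type for this sketch


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListReader:
    """Toy reader that yields rows from an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class ListWriter:
    """Toy writer that appends rows to an in-memory list."""

    def __init__(self) -> None:
        self.sink: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.sink.append(record)
            count += 1
        return count


def sync(reader: Reader, writer: Writer) -> int:
    """Move every record from the reader to the writer; return the count."""
    return writer.write(reader.read())


writer = ListWriter()
copied = sync(ListReader([{"id": 1}, {"id": 2}]), writer)
```

Because `sync` depends only on the two interfaces, adding one new reader makes it usable with every existing writer, which is the ecosystem effect the paragraph describes.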

System requirements

Linux
JDK (1.8 or above; 1.8 recommended)
Python (Python 2.6.x recommended)
Apache Maven 3.x (to compile DataX)

Set the JVM heap size. The heap must be at least 1 GB, otherwise DataX will not start:

export JAVA_OPTS="-Xms1024m -Xmx1024m"
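As a quick sanity check of the heap setting, the following sketch parses a `JAVA_OPTS` string and tests the `-Xmx` value against the 1 GB floor mentioned above. The helper names `max_heap_mb` and `heap_is_sufficient` are hypothetical and are not part of DataX; the sketch only assumes the standard JVM meaning of `-Xmx` suffixes (no suffix means bytes).

```python
import re

MIN_HEAP_MB = 1024  # the 1 GB floor stated in the requirements above


def max_heap_mb(java_opts):
    """Return the -Xmx value in megabytes, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    # Convert every unit to megabytes; a bare number is a byte count.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor


def heap_is_sufficient(java_opts):
    """True when -Xmx is present and at least MIN_HEAP_MB."""
    heap = max_heap_mb(java_opts)
    return heap is not None and heap >= MIN_HEAP_MB
```

For example, `heap_is_sufficient(os.environ.get("JAVA_OPTS", ""))` (after `import os`) could be called before launching a job.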

Quick start

Deploy DataX

Method 1: download the DataX toolkit directly: DataX download address

Decompress the file to a local directory and go to the bin directory to start the synchronization job:

$ cd  {YOUR_DATAX_HOME}/bin    
$ python datax.py {YOUR_JOB.json}

Method 2: build from the DataX source code. (1) Download the DataX source code:

$ git clone git@github.com:alibaba/DataX.git

(2) Package through Maven:

$ cd {DataX_source_code_home}
$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

If the build succeeds, output like the following is displayed:

[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2018-06-05T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------

The built DataX package is located at {DataX_source_code_home}/target/datax/datax/.

Generating a Configuration File

Step 1: Create a configuration file (JSON format)

You can run the following command to generate a configuration template:

python datax.py -r oraclereader -w mysqlwriter > oracle2mysql2.json

Then fill in the generated template for your environment. The example below reads data from Oracle and writes data to MySQL.

{
    "job": {
        "content": [{
            "reader": {
                "name": "oraclereader",
                "parameter": {
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": ["*"],
                        "table": ["tb1"]
                    }],
                    "password": "***",
                    "username": "***"
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": "*",
                        "table": ["tb1"]
                    }],
                    "password": "**",
                    "preSql": [],
                    "session": [],
                    "username": "**",
                    "writeMode": "insert"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}
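Before starting a job, it can help to check that the file parses and that the fields used in the example above are filled in. The sketch below is a hypothetical pre-flight check, not DataX's own validation: `check_job` only looks for the keys that appear in the example configuration, and the JDBC URLs in `SAMPLE` are made-up placeholders.

```python
import json

# Hypothetical sample with the same shape as the configuration above.
SAMPLE = """
{"job": {"content": [{
  "reader": {"name": "oraclereader", "parameter": {
    "username": "u", "password": "p", "column": ["*"],
    "connection": [{"jdbcUrl": ["jdbc:oracle:thin:@//host:1521/svc"], "table": ["tb1"]}]}},
  "writer": {"name": "mysqlwriter", "parameter": {
    "username": "u", "password": "p", "column": ["*"],
    "connection": [{"jdbcUrl": "jdbc:mysql://host:3306/db", "table": ["tb1"]}]}}
}], "setting": {"speed": {"channel": "3"}}}}
"""


def check_job(job_text):
    """Return a list of problems found in a DataX-style job JSON string."""
    problems = []
    job = json.loads(job_text)["job"]  # raises ValueError on invalid JSON
    for index, item in enumerate(job["content"]):
        for side in ("reader", "writer"):
            plugin = item.get(side, {})
            params = plugin.get("parameter", {})
            if not plugin.get("name"):
                problems.append(f"content[{index}].{side}: missing name")
            # Keys taken from the example above, not from a DataX schema.
            for key in ("username", "password", "column", "connection"):
                if key not in params:
                    problems.append(f"content[{index}].{side}: missing {key}")
            for conn in params.get("connection", []):
                if not conn.get("jdbcUrl") or not conn.get("table"):
                    problems.append(f"content[{index}].{side}: incomplete connection")
    return problems
```

An empty list means the file has the same shape as the example; it says nothing about whether the credentials or URLs actually work.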

Finally: Start DataX

$ cd {YOUR_DATAX_DIR_BIN}
$ python datax.py ./oracle2mysql2.json
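If the job needs to be launched from a script rather than by hand, the same command can be wrapped with `subprocess`. This is a minimal sketch: `build_command` and `run_job` are hypothetical helper names, the interpreter name `python` mirrors the command above, and treating a non-zero exit code as a failed job is an assumption rather than documented DataX behavior.

```python
import os
import subprocess


def build_command(datax_home, job_file, python_exe="python"):
    """Build the same command line as shown above, as an argument list."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file):
    """Run the job and return the process exit code."""
    # Assumption: a non-zero exit code indicates that the job failed.
    return subprocess.run(build_command(datax_home, job_file)).returncode
```

For example, `run_job("/opt/datax", "./oracle2mysql2.json")` would run the job from this article, where `/opt/datax` is a placeholder install path.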

When the synchronization is complete, the following logs are displayed:

...
2018-06-05 11:20:25.263 [job-0] INFO  JobContainer -
Task start time               : 2018-06-05 11:20:15
Task end time                 : 2018-06-05 11:20:25
Total task time               : 10s
Average task traffic          : 205B/s
Record write speed            : 5rec/s
Number of read records        : 50
Number of read/write failures : 0

Summary

DataX is relatively easy to use, but it is hard to handle tables from different databases in a single configuration file. To read multiple tables and write each of them to its target, you have to configure them separately. Still, it solved some of my problems.
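One way to reduce the per-table configuration work is to generate one job file per table from a shared template. The sketch below is only an illustration under assumptions: `job_for_table` and `jobs_for_tables` are hypothetical helpers, the template is a trimmed copy of the example configuration, and it assumes the source and target tables share the same name.

```python
import copy
import json

# Trimmed, hypothetical template with the same shape as the example job.
TEMPLATE = {
    "job": {
        "content": [{
            "reader": {"name": "oraclereader", "parameter": {
                "column": ["*"],
                "connection": [{"jdbcUrl": ["*"], "table": ["tb1"]}]}},
            "writer": {"name": "mysqlwriter", "parameter": {
                "column": ["*"],
                "connection": [{"jdbcUrl": "*", "table": ["tb1"]}]}},
        }],
        "setting": {"speed": {"channel": "3"}},
    }
}


def job_for_table(template, table):
    """Return a copy of the template that reads and writes the given table."""
    job = copy.deepcopy(template)  # leave the shared template untouched
    for item in job["job"]["content"]:
        for side in ("reader", "writer"):
            for conn in item[side]["parameter"]["connection"]:
                conn["table"] = [table]
    return job


def jobs_for_tables(template, tables):
    """Map a file name such as 'orders.json' to the JSON text of its job."""
    return {
        f"{table}.json": json.dumps(job_for_table(template, table), indent=4)
        for table in tables
    }


jobs = jobs_for_tables(TEMPLATE, ["orders", "users"])
```

Each generated string can be written to its file name and passed to datax.py one at a time.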
