Background

During the construction of our data warehouse, some business systems stored their raw data in the Daemon database. This data now needs to be synchronized through DataX into the cloud database MemfireDB for analysis. MemfireDB is a representative NewSQL database with high concurrency and elastic scalability, and it serves as the storage layer for the data warehouse. I ran into a number of problems along the way and record them here.

Download the DataX toolkit

wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

After downloading, unzip the package and enter the bin directory

tar -zxvf datax.tar.gz -C /opt
cd /opt/datax/bin

Execute the self-check script to verify that the environment is configured correctly

python2.7 datax.py ../job/job.json

If nothing abnormal is printed to the screen, the environment is configured correctly; otherwise, check whether the runtime environment meets the following requirements:

Linux
JDK (1.8 or above, 1.8 recommended)
Python (Python 2.6.x recommended)
Apache Maven 3.x (only needed to compile DataX from source)

Data source types supported by DataX

Source: https://github.com/alibaba/DataX

Type                                      Data source                               Reader  Writer  Document
RDBMS (relational databases)              MySQL                                     √       √       read, write
                                          Oracle                                    √       √       read, write
                                          SQLServer                                 √       √       read, write
                                          PostgreSQL                                √       √       read, write
                                          DRDS                                      √       √       read, write
                                          Generic RDBMS (all relational databases)  √       √       read, write
Alibaba Cloud data warehouse storage      ODPS                                      √       √       read, write
                                          ADS                                               √       write
                                          OSS                                       √       √       read, write
                                          OCS                                       √       √       read, write
NoSQL data stores                         OTS                                       √       √       read, write
                                          Hbase0.94                                 √       √       read, write
                                          Hbase1.1                                  √       √       read, write
                                          Phoenix4.x                                √       √       read, write
                                          Phoenix5.x                                √       √       read, write
                                          MongoDB                                   √       √       read, write
                                          Hive                                      √       √       read, write
                                          Cassandra                                 √       √       read, write
Unstructured data storage                 TxtFile                                   √       √       read, write
                                          FTP                                       √       √       read, write
                                          HDFS                                      √       √       read, write
                                          Elasticsearch                                     √       write
Time series databases                     OpenTSDB                                  √               read
                                          TSDB                                      √       √       read, write

View the configuration template by command

As the table above shows, neither the source database nor MemfireDB has a dedicated plugin in DataX; both can only be reached through JDBC, so the only option is the generic RDBMS reader and writer. View the configuration template with the following command

python2.7 datax.py --reader rdbmsreader --writer rdbmswriter

Save the command-line output to a file named load.json and adjust the parameters to your own environment.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "rdbmsreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "rdbmswriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
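For reference, here is a hypothetical filled-in load.json. Every connection detail below — JDBC URLs, host names, tables, columns, and credentials — is a placeholder to be replaced with values from your own environment; the PostgreSQL-style writer URL assumes MemfireDB is reached through a PostgreSQL-compatible JDBC driver. The problematic writeMode and session keys are omitted here, and channel is a number, for the reasons covered in the debugging section below.

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "rdbmsreader",
                    "parameter": {
                        "column": ["id", "name", "created_at"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:dm://192.168.1.10:5236/SOURCEDB"],
                                "table": ["t_order"]
                            }
                        ],
                        "username": "reader_user",
                        "password": "reader_pass",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "rdbmswriter",
                    "parameter": {
                        "column": ["id", "name", "created_at"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:postgresql://memfiredb-host:5432/targetdb",
                                "table": ["t_order"]
                            }
                        ],
                        "username": "writer_user",
                        "password": "writer_pass",
                        "preSql": []
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            }
        }
    }
}
```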

The available parameters are documented on GitHub — Reader: https://github.com/alibaba/Da… Writer: https://github.com/alibaba/Da…

With load.json configured, start the synchronization

python2.7 datax.py load.json

The following is a screenshot of the successful execution

Debugging process

No suitable driver found



The database driver is not registered with DataX. You need to register the driver class in the "drivers" array of the file ../plugin/writer/rdbmswriter/plugin.json, and also copy the driver's jar package into the ../lib/ directory. Note that this differs from the official GitHub description, which says to copy the jar into ../plugin/writer/rdbmswriter/libs/; if you copy it there, the error can still occur. Looking at datax.py, you can see that class_path is set to the ../lib directory.
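As a sketch, the registration in ../plugin/writer/rdbmswriter/plugin.json amounts to adding your driver class to the "drivers" array — only that key is shown below, and the two class names are examples (the Dameng and PostgreSQL JDBC driver classes); substitute whatever driver classes your source and target actually use:

```json
{
    "drivers": [
        "dm.jdbc.driver.DmDriver",
        "org.postgresql.Driver"
    ]
}
```

Then copy the corresponding driver jar into ../lib/ (not the plugin's libs/ directory), since that is the directory datax.py puts on the class path.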

Incorrect configuration of writeMode



The generated template sets writeMode to an empty string. The generic RDBMS writer, however, only checks whether the writeMode key is present (it reads it with getString), not whether the value it gets back is empty, so the empty string is then rejected as an illegal write mode.

The fix is to delete the writeMode line from load.json entirely.
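After the fix, the writer's parameter block carries no writeMode key at all. A minimal sketch, with placeholder connection details:

```json
"writer": {
    "name": "rdbmswriter",
    "parameter": {
        "column": ["id", "name"],
        "connection": [
            {
                "jdbcUrl": "jdbc:postgresql://memfiredb-host:5432/targetdb",
                "table": ["t_order"]
            }
        ],
        "username": "writer_user",
        "password": "writer_pass",
        "preSql": []
    }
}
```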

Illegal job.setting.speed.channel value



The generated template sets channel to an empty string, but a numeric value is expected, e.g. "channel": 3. Changing it to a number solves the problem.

"exception": "Value conversion failed"



When creating the table on the destination side, a column whose source type was datetime was mistakenly given an incompatible type, so DataX threw a value-conversion exception. Rebuilding the table with the correct column type fixed the problem.