Kettle is an open-source ETL tool implemented in pure Java. It runs on Windows, UNIX, and Linux, provides a graphical interface, and lets you define data transfer topologies by dragging and dropping controls. This article briefly introduces how to move data to the cloud with the Kettle-based MaxCompute plug-in.

Kettle version: 8.2.0.0-342

MaxCompute JDBC Driver version: 3.2.8

Setup

  1. Download and install Kettle
  2. Download the MaxCompute JDBC Driver
  3. Place the MaxCompute JDBC Driver in the lib subdirectory under the Kettle installation directory (data-integration/lib)
  4. Download and compile the MaxCompute Kettle plugin: https://github.com/aliyun/ali…
  5. Place the compiled MaxCompute Kettle Plugin in the lib subdirectory under the Kettle installation directory (data-integration/lib)
  6. Start Spoon.

Job

We can use Kettle together with the MaxCompute JDBC Driver to organize and execute tasks in MaxCompute.

First, you need to do the following:

  1. Create a new Job
  2. Create a new Database Connection

The JDBC connection string format is: jdbc:odps:<endpoint>?project=<project_name>

The JDBC driver class is: com.aliyun.odps.jdbc.OdpsDriver

Username: your Alibaba Cloud AccessKey ID

Password: your Alibaba Cloud AccessKey Secret

For more JDBC configuration options, see: https://help.aliyun.com/docum…
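Putting the connection settings above together, here is a minimal sketch of opening a MaxCompute connection over JDBC. The endpoint, project name, and credentials are placeholders, and the actual connection code is left commented out because it requires the driver jar on the classpath and a live project:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OdpsJdbcDemo {
    // Builds a MaxCompute JDBC URL of the form jdbc:odps:<endpoint>?project=<project>
    static String buildUrl(String endpoint, String project) {
        return "jdbc:odps:" + endpoint + "?project=" + project;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and project name -- replace with your own.
        String url = buildUrl("http://service.odps.example.com/api", "my_project");
        System.out.println(url);

        // With the MaxCompute JDBC Driver on the classpath:
        // Class.forName("com.aliyun.odps.jdbc.OdpsDriver");
        // try (Connection conn = DriverManager.getConnection(url,
        //         "<AccessKey ID>", "<AccessKey Secret>");
        //      Statement stmt = conn.createStatement()) {
        //     stmt.execute("SELECT 1;");
        // }
    }
}
```

These are the same values Kettle's Database Connection dialog asks for: the URL, the driver class, and the AccessKey pair as username/password.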

MaxCompute can then be accessed through SQL nodes as required by the business. Let’s take a simple ETL procedure as an example:

The CREATE TABLE node is configured as follows:

Note:

  1. Select the Connection configured above
  2. Do not check Send SQL as Single Statement

The Load from OSS node is configured as follows:

The same notes as for the CREATE TABLE node apply. For more on LOAD usage, see: https://help.aliyun.com/docum…
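As an illustration of the kind of statement such a node might send through the JDBC connection, the sketch below assembles a MaxCompute LOAD statement for CSV files in OSS. The table name, OSS location, and SerDe choice are all placeholder assumptions, not values from this article:

```java
public class LoadStatementDemo {
    // Assembles a MaxCompute LOAD statement for CSV files stored in OSS.
    // The table name and OSS location here are placeholders.
    static String buildLoad(String table, String ossLocation) {
        return "LOAD OVERWRITE TABLE " + table
                + " FROM LOCATION '" + ossLocation + "'"
                + " ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'"
                + " STORED AS textfile;";
    }

    public static void main(String[] args) {
        System.out.println(buildLoad("my_table",
                "oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/data/"));
    }
}
```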

The Processing node is configured as follows:

The same notes as for the CREATE TABLE node apply.

Transformation

We can use the MaxCompute Kettle Plugin to move data out of or into MaxCompute.

Create a new Aliyun MaxCompute Input node with the following configuration:

Create an empty table in MaxCompute with the same schema as test_partition_table.

Create a new Aliyun MaxCompute Output node with the following configuration:

When the Transformation runs, data is downloaded from test_partition_table and uploaded to test_partition_table_2.

Other

Set the MaxCompute flags

Before executing DDL/DML/SQL statements, configure flags by prepending "set key=value;" statements.

Script mode

Not supported yet.


This article is original content from Aliyun and may not be reproduced without permission.