"Little knowledge, big challenge!" This article is participating in the "Essentials for Programmers" creative challenge.

Kettle is an open-source ETL tool written in Java and used to migrate data between databases. It runs on Linux, Windows, and Unix, provides both a graphical interface and command-line scripts, and supports secondary development.

Kettle's official website: community.hitachivantara.com/docs/DOC-10… Source code: github.com/pentaho/pen…

Installation

The following uses Windows as an example. The configuration on Linux is similar.

Install the JDK and configure environment variables

Because Kettle is Java-based, you need to install a Java environment and configure the JAVA_HOME environment variable.

JDK 1.8 or later is recommended; Kettle 7.0 and later do not support earlier JDK versions.
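Before launching Kettle, it can help to confirm which Java it will find. The following is a minimal Python sketch, not part of Kettle: it prefers `JAVA_HOME/bin/java` and falls back to the `java` on `PATH`, then checks that the major version is at least 8. The function names are hypothetical. It assumes only that `java -version` prints a line such as `java version "1.8.0_292"` to stderr, which is the usual JDK behavior.

```python
import os
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version: str) -> int:
    """Return the major version: "1.8.0_292" -> 8, "11.0.2" -> 11, "17" -> 17."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Versions before Java 9 use the legacy "1.x" scheme.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def find_java() -> Optional[str]:
    """Prefer JAVA_HOME/bin/java, then fall back to the java on PATH."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("java")


def installed_java_version(java: str) -> Optional[str]:
    """Run `java -version` (which writes to stderr) and extract the quoted version."""
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    match = re.search(r'version "([^"]+)"', result.stderr)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))
    java = find_java()
    version = installed_java_version(java) if java else None
    if version is None:
        print("No usable java found")
    elif parse_java_major(version) < 8:
        print(f"Java {version} is too old for Kettle 7.0+; install JDK 1.8 or later")
    else:
        print(f"Java {version} at {java} is OK")
```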

Download Kettle

Download the Kettle package from the official website and extract it on your local machine.

Download the database driver

Kettle needs to connect to databases, so you must download the corresponding JDBC driver.

For example, MySQL requires mysql-connector-java.jar and Oracle requires ojdbc.jar. After extracting Kettle, place the driver JAR in its lib folder.
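Copying drivers by hand works fine; if you set up Kettle on several machines, the step can be scripted. This is a hypothetical Python helper, not a Kettle tool. It assumes the extracted Kettle directory (for example `C:\pdi-ce-7.0.0.0-25\data-integration`, an illustrative path) contains the `lib` folder mentioned above, and it only copies `.jar` files into it.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def install_drivers(kettle_home: str, jar_paths: Iterable[str]) -> List[Path]:
    """Copy JDBC driver JARs into <kettle_home>/lib and return the copied paths."""
    lib_dir = Path(kettle_home) / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"no lib folder under {kettle_home}; is this the extracted Kettle directory?")
    installed = []
    for jar in map(Path, jar_paths):
        if jar.suffix.lower() != ".jar":
            raise ValueError(f"not a JAR file: {jar}")
        target = lib_dir / jar.name
        shutil.copy2(jar, target)  # copy2 also preserves the file timestamps
        installed.append(target)
    return installed
```

Typical use would be `install_drivers(r"C:\pdi-ce-7.0.0.0-25\data-integration", [r"C:\Downloads\mysql-connector-java.jar"])`, with both paths adjusted to your machine. Drivers are picked up when Kettle starts, so restart Spoon if it is already running.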

Note: This article is based on pdi-ce-7.0.0.0-25; earlier versions may differ.

Start Kettle

Double-click Spoon.bat to start Kettle.
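On Linux the launcher is spoon.sh rather than Spoon.bat. As a hedged sketch, the choice can be wrapped in a small Python helper; `spoon_script` and `start_spoon` are hypothetical names. Running from the Kettle directory is a precaution, on the assumption that the launcher resolves some paths relative to its own folder.

```python
import subprocess
import sys
from pathlib import Path


def spoon_script(kettle_home: str, platform: str = sys.platform) -> Path:
    """Return the Spoon launcher for the given platform."""
    name = "Spoon.bat" if platform.startswith("win") else "spoon.sh"
    return Path(kettle_home) / name


def start_spoon(kettle_home: str) -> subprocess.Popen:
    """Launch Spoon without waiting for it to exit."""
    script = spoon_script(kettle_home)
    if not script.exists():
        raise FileNotFoundError(script)
    if sys.platform.startswith("win"):
        cmd = ["cmd", "/c", str(script)]
    else:
        cmd = ["sh", str(script)]
    # Use the Kettle directory as the working directory.
    return subprocess.Popen(cmd, cwd=kettle_home)
```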

Transformation

A transformation consists of one or more steps connected by hops. A hop defines a one-way channel that lets data flow from one step to another. In Kettle, the unit of data is the row, and data flow is the movement of rows from one step to the next.
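To make the step/hop/row model concrete, here is a toy Python analogy, not Kettle code. Each step is a generator that consumes an iterator of rows and yields rows, and each hop is modeled by passing one step's output to the next. The step names (`uppercase_name`, `only_active`) and the dict-based rows are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Step = Callable[[Iterator[Row]], Iterator[Row]]


def run_transformation(source: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Chain the steps so rows flow one way through them, one row at a time."""
    stream: Iterator[Row] = iter(source)
    for step in steps:
        stream = step(stream)  # the "hop": this step's output feeds the next step
    return list(stream)


def uppercase_name(rows: Iterator[Row]) -> Iterator[Row]:
    """A step that transforms each row."""
    for row in rows:
        yield {**row, "name": str(row["name"]).upper()}


def only_active(rows: Iterator[Row]) -> Iterator[Row]:
    """A step that filters rows."""
    for row in rows:
        if row["active"]:
            yield row
```

One difference matters in practice: this sketch pulls rows lazily in a single thread, whereas Kettle runs each step in its own thread with buffered row sets on the hops, so all steps of a transformation work concurrently.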

1. Open Kettle and click File > New > Transformation.

2. In the left panel, click New under DB connections to create a database connection.

3. Configure the database connection as prompted. When you are done, click Test to verify it.

4. Locate Table input in the left panel (Core Objects > Input > Table input) and drag it onto the canvas on the right.

5. Double-click the Table input step on the canvas to configure it: select the database connection and enter the SQL query. Click Preview to preview the data.

6. Locate Insert / Update in the left panel (Core Objects > Output > Insert / Update) and drag it onto the canvas.

7. Hold down the Shift key and drag from Table input to Insert / Update to connect them with a hop.

8. Double-click Insert / Update to configure it (target table, lookup keys, and fields to update).

9. Click Run to run the transformation.

10. After the run, the results panel at the bottom shows the logs, a data preview, and step metrics, including how many rows were read, inserted, and updated.

This completes one of the simplest transformations: reading data from one table and inserting or updating it in another.
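To show what the two steps do with the data, here is a simplified Python analogy using the standard `sqlite3` module, not Kettle's implementation. `table_input` runs a query and emits rows. `insert_update` looks each row up by its key columns, updates it if found, inserts it otherwise, and returns read/inserted/updated counts like the metrics shown after the run. Source and target are separate connections, mirroring two databases. The function names and the counting scheme are assumptions made for this sketch.

```python
import sqlite3
from typing import Any, Dict, Iterable, Iterator, List


def table_input(conn: sqlite3.Connection, sql: str) -> Iterator[Dict[str, Any]]:
    """Run a query and emit each result row as a dict (column name -> value)."""
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]
    for values in cursor:
        yield dict(zip(names, values))


def insert_update(conn: sqlite3.Connection, table: str, key_cols: List[str],
                  rows: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Update rows whose key already exists in the target and insert the rest.

    table and key_cols are interpolated into the SQL, so they must be trusted names.
    """
    counts = {"read": 0, "inserted": 0, "updated": 0}
    where = " AND ".join(f"{col} = ?" for col in key_cols)
    for row in rows:
        counts["read"] += 1
        key_vals = [row[col] for col in key_cols]
        found = conn.execute(f"SELECT 1 FROM {table} WHERE {where}", key_vals).fetchone()
        if found is None:
            cols = list(row)
            placeholders = ", ".join("?" * len(cols))
            conn.execute(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                         [row[c] for c in cols])
            counts["inserted"] += 1
        else:
            others = [col for col in row if col not in key_cols]
            if others:  # nothing to update when the row has only key columns
                assignments = ", ".join(f"{col} = ?" for col in others)
                conn.execute(f"UPDATE {table} SET {assignments} WHERE {where}",
                             [row[c] for c in others] + key_vals)
                counts["updated"] += 1
    conn.commit()
    return counts
```

For example, if the source `users` table holds ids 1 and 2 and the target already holds id 1, then `insert_update(dst, "users", ["id"], table_input(src, "SELECT id, name FROM users"))` reads 2 rows, updates 1, and inserts 1. This row-by-row lookup is easy to follow but issues one query per row, which is worth keeping in mind for large tables.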

Job

To run the transformation periodically, you need a job.

1. Create a new job (File > New > Job).

2. Drag the START, Transformation, and Success entries from the left panel onto the canvas and connect them with hops.

3. Double-click START to configure the job's run interval; in this example it is set to run once per hour.

4. Double-click the Transformation entry and select the transformation created earlier.

5. Click Run to start the job and Stop to stop it. The run log appears in the results panel at the bottom.

This completes the simplest job: migrating data from the source table to the target table every hour.
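The control flow of this job can be approximated in a few lines of Python. This is an analogy rather than how Kettle schedules jobs internally. `run_job` models one pass of START, then Transformation, then Success, and `schedule` models START's repeat interval. Both are hypothetical names. The `sleep` parameter is injectable so the loop can be exercised without waiting an hour, and `max_runs` bounds a loop that, like the job in Spoon, would otherwise run until stopped.

```python
import time
from typing import Callable, List


def run_job(transformation: Callable[[], bool]) -> str:
    """One pass of START -> Transformation -> Success."""
    ok = transformation()
    return "Success" if ok else "Failed"


def schedule(job: Callable[[], str], interval_seconds: float, max_runs: int,
             sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run the job, wait interval_seconds, and repeat up to max_runs times."""
    results = []
    for run in range(max_runs):
        results.append(job())
        if run < max_runs - 1:  # no need to wait after the last run
            sleep(interval_seconds)
    return results
```

With `interval_seconds=3600`, `schedule(lambda: run_job(my_transformation), 3600, 24)` would run a hypothetical `my_transformation` once per hour for a day.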

Kettle is a powerful ETL tool: data migration can be configured entirely through its graphical interface, without writing code.

With jobs, Kettle can run transformations automatically.

Reference: blog.csdn.net/qqfo24/arti…