Input components

Input steps in Kettle extract or generate data. They are the E (Extract) of an ETL operation.

CSV file input

A CSV file is a text file with a fixed format. Note: you can adjust each field's type and format to suit your needs.

Text file input: extracting data from text files such as logs is a common task in development, because log data is essentially plain text.

Operation example:

The first step is to select the text file to extract from.

You can choose your own delimiter.
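Outside Kettle, the same idea (reading delimited text with a chosen delimiter and then fixing the field types) can be sketched in Python. This is only a minimal illustration, not how Kettle implements the step; the sample data, the field names and the `read_delimited` helper are all hypothetical.

```python
import csv
import io

# Hypothetical sample data; a semicolon is used as the delimiter here.
SAMPLE = "id;name;amount\n1;alice;10.5\n2;bob;7\n"


def read_delimited(text, delimiter=";"):
    """Parse delimited text into a list of dicts, one per data row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        # Every field arrives as a string; convert types explicitly,
        # much like adjusting the field type in the step's field list.
        row["id"] = int(row["id"])
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


rows = read_delimited(SAMPLE)
```

Changing the `delimiter` argument to a comma or a tab is all that is needed to read other common layouts.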

Get the fields as follows:

Excel input: Microsoft Excel currently has two file extensions, XLS and XLSX. Note: XLS is used by versions before 2007; XLSX is used from 2007 onward.

You can choose which sheet to read.

After getting the Excel input fields, you can adjust the field types manually.

Multi-file merge: data often arrives as multiple files, and some of it may be scattered across several sub-folders, so merging data is also a very common operation in development.
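As a rough sketch of what merging files from several sub-folders involves, the following Python walks a folder tree and concatenates the rows of every matching file. The folder layout, the file names and the `merge_csv_files` helper are assumptions made up for the example; Kettle does this through the step's file and wildcard settings instead.

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_files(root, pattern="*.csv"):
    """Collect rows from every matching file under root, including sub-folders."""
    merged = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name  # remember where the row came from
                merged.append(row)
    return merged


# Build a small hypothetical folder tree to demonstrate the merge.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2023").mkdir()
    (root / "a.csv").write_text("id,name\n1,alice\n", encoding="utf-8")
    (root / "2023" / "b.csv").write_text("id,name\n2,bob\n", encoding="utf-8")
    merged_rows = merge_csv_files(root)
```

Keeping the source file name on each row makes it easier to trace a bad record back to the file it came from.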

XML input: XML stands for EXtensible Markup Language.

XML is designed to transfer and store data.

XPath stands for XML Path Language.

It is a language for locating parts of an XML document. XPath is based on the tree structure of XML and provides the ability to find nodes in that tree.

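XPath syntax

Selecting nodes: XPath uses path expressions to select nodes in an XML document. Nodes are selected by following a path or steps. The most useful path expressions are: nodename (selects all nodes with that name), / (selects from the root node), // (selects matching nodes anywhere below the current node), . (selects the current node), .. (selects the parent of the current node), and @ (selects attributes).

Examples of path expressions are as follows: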
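For readers who want to try path expressions outside Kettle, Python's standard `xml.etree.ElementTree` module supports a limited subset of XPath. The sketch below is only an illustration under that limitation; the sample document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML = """<bookstore>
  <book category="web"><title lang="en">Book A</title><price>39.95</price></book>
  <book category="children"><title lang="en">Book B</title><price>29.99</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# .//title : every title element anywhere below the current node
all_titles = [t.text for t in root.findall(".//title")]

# ./book/price : price elements reached by following the path step by step
prices = [p.text for p in root.findall("./book/price")]

# [@category='web'] : filter nodes by an attribute value
web_books = root.findall("./book[@category='web']")

# ElementTree cannot select an attribute with a trailing @name step,
# so the attribute is read from the selected element instead.
title_lang = root.find("./book/title").get("lang")
```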

The Get Data from XML component is used as follows:

Get all the paths in the XML document. If namespaces need to be taken into account, tick the corresponding option, as shown below:

Fields

Get the fields here. If the result is not what you want, enter the field name and XML path manually. Remember that each field's XML path is relative to the path selected for reading the data (the loop path). The node, result type, data type and so on can be set as needed.
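The relationship between the path that selects the repeating node and the relative field paths can be sketched as follows. This is a simplified Python illustration, not Kettle's implementation; `LOOP_PATH`, `FIELDS`, `extract_rows` and the sample XML are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML = """<orders>
  <order id="1"><customer>alice</customer><total>10.5</total></order>
  <order id="2"><customer>bob</customer><total>7</total></order>
</orders>"""

# The repeating node: one output row is produced per match.
LOOP_PATH = "./order"

# Field name -> path written relative to the loop node, not to the document root.
FIELDS = {"customer": "customer", "total": "total"}


def extract_rows(xml_text):
    """Return one dict per loop node, resolving each field path relative to it."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(LOOP_PATH):
        row = {"id": node.get("id")}  # attribute of the loop node itself
        for name, relative_path in FIELDS.items():
            row[name] = node.findtext(relative_path)
        rows.append(row)
    return rows


order_rows = extract_rows(XML)
```

If a field path were written from the document root instead, it would no longer line up with the current loop node, which is the usual reason for empty fields.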

Generate rows

The vast majority of the data in a data warehouse is dynamic data generated by business systems, but some dimension data is not dynamic, such as the date dimension. Static dimension data can be generated ahead of time.
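To make the idea of pre-generating a static dimension concrete, here is a minimal Python sketch that builds date dimension rows. The column names (`date_key`, `weekday` and so on) and the `generate_date_dimension` helper are assumptions for illustration; a real date dimension usually carries more attributes.

```python
from datetime import date, timedelta


def generate_date_dimension(start, days):
    """Build one row per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        current = start + timedelta(days=offset)
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240101
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
    return rows


# One leap year of dates, generated once and loaded ahead of time.
date_dim = generate_date_dimension(date(2024, 1, 1), 366)
```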

JSON input: JSON (JavaScript Object Notation) is a lightweight data interchange format. Its core concepts are arrays, objects, and properties. Array: []; object: {}; property: key:value.

JSONPath is to JSON what XPath is to XML: JSONPath expressions are typically used to retrieve or set values in JSON by path. Expressions can be written in either dot notation or bracket notation.

Dot notation (recommended): $.store.book[0].title

Bracket notation: $['store']['book'][0]['title']
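The equivalence of the two notations can be checked with a small Python sketch. The `json_path_get` function below is a hypothetical helper that handles only plain keys and numeric indexes; it is not a full JSONPath implementation and is not what Kettle uses internally. The sample document is also made up.

```python
import json
import re

# One path step: .name, ['name'] or [0]
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[\s*'([^']*)'\s*\]|\[\s*(\d+)\s*\]")


def json_path_get(document, path):
    """Resolve a simple JSONPath (keys and indexes only) against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    current = document
    pos = 1
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        dot_key, bracket_key, index = match.groups()
        if index is not None:
            current = current[int(index)]  # array element
        else:
            current = current[dot_key if dot_key is not None else bracket_key]
        pos = match.end()
    return current


doc = json.loads('{"store": {"book": [{"title": "Book A"}, {"title": "Book B"}]}}')
dot_result = json_path_get(doc, "$.store.book[0].title")
bracket_result = json_path_get(doc, "$['store']['book'][0]['title']")
```

Both calls walk the same route through the document: the object `store`, the array `book`, its first element, and the property `title`.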

The JSONPath operators are as follows:

Basic use of JSON Input is as follows:

Fields: fill in the fields according to your needs. Pay special attention to writing the paths correctly.

You can use the JSON Input multiple times to query the desired field.

The second JSON Input parses the JSON value output by the first JSON Input.
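Chaining two parsing passes matters when one field of the outer document is itself a JSON string. A minimal Python sketch of that situation follows; the `payload` field and the sample values are hypothetical.

```python
import json

# Hypothetical record whose "payload" field holds JSON encoded as a string.
inner = {"user": {"name": "alice"}, "tags": ["a", "b"]}
outer_text = json.dumps({"id": 1, "payload": json.dumps(inner)})

# First pass: pull out the top-level fields. "payload" is still plain text.
first = json.loads(outer_text)
payload_text = first["payload"]

# Second pass: parse the value produced by the first pass.
second = json.loads(payload_text)
user_name = second["user"]["name"]
```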

Table input: copy the database driver JAR package into Kettle's lib directory, for example \pdi-ce-8.2.0.0-342\data-integration\lib.

A database driver is a program that each database vendor (Oracle, MySQL, and so on) provides for a particular programming language (such as Java) so that databases can be called in a uniform way. It acts as a translator.

Table input in Kettle is used as follows:

Output components

Output is the second category of steps in a transformation. It is the L (Load) of ETL.

Microsoft Excel currently has two file extensions: XLS and XLSX. XLS is used by versions before 2007; XLSX is used from 2007 onward.

Excel output is different from Microsoft Excel output. Excel output can only generate files with the XLS extension, while Microsoft Excel output can generate files with either the XLS or the XLSX extension.

Operation example:

In Excel output, you can get the fields, as shown below:

In Microsoft Excel output, specify the output file format; you can choose XLSX or XLS.

Get the fields as follows:

Text file output: the common formats for this kind of data work are TXT and CSV.

Get the fields as follows:

SQL file output exports the structure and data of a database table.
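As a loose analogy for exporting both structure and data as SQL text, Python's built-in `sqlite3` module offers `Connection.iterdump()`. The sketch below uses an in-memory SQLite database and a hypothetical `customer` table; it illustrates the concept only and says nothing about the exact SQL that Kettle writes.

```python
import sqlite3

# In-memory database with a hypothetical table and two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.commit()

# iterdump() yields the CREATE TABLE statement followed by INSERT statements.
sql_script = "\n".join(conn.iterdump())
conn.close()
```

Running the resulting script against an empty database recreates the table together with its rows.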

Table output writes data to the specified table.

Update

It compares existing records in the database with those in the data stream and updates them if they differ. Note: if the record does not exist, an error occurs!

Insert/Update

It compares existing records in the database with those in the data stream and updates them if they differ. If the record does not exist, the data is inserted!
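The difference between the two behaviours can be sketched with Python's `sqlite3` module. The helpers `update_row` and `insert_or_update_row`, the `customer` table and the choice of `LookupError` are all assumptions for illustration; they mimic the behaviour described above rather than reproduce Kettle's code.

```python
import sqlite3


def update_row(conn, row_id, name):
    """Update only: a missing record is treated as an error."""
    existing = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (row_id,)
    ).fetchone()
    if existing is None:
        raise LookupError(f"no record with id {row_id}")
    if existing[0] != name:  # write only when the values differ
        conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, row_id))


def insert_or_update_row(conn, row_id, name):
    """Insert/update: a missing record is inserted instead of failing."""
    existing = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (row_id,)
    ).fetchone()
    if existing is None:
        conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (row_id, name))
    elif existing[0] != name:
        conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, row_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'alice')")

insert_or_update_row(conn, 1, "alicia")  # existing key, different value: updated
insert_or_update_row(conn, 2, "bob")     # new key: inserted

update_failed = False
try:
    update_row(conn, 3, "carol")         # new key with update only: error
except LookupError:
    update_failed = True

final_rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```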

Custom constant data

Generates constant data in key-value form.

Custom constant data

Specifies the value of a field constant.

Delete

Deletes data from a database table for the specified condition.
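A conditional delete driven by incoming rows might look like the following Python sketch, again using `sqlite3` with a hypothetical `customer` table and `status` column. Each row in the stream supplies the value used in the delete condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO customer (id, status) VALUES (?, ?)",
    [(1, "active"), (2, "closed"), (3, "closed")],
)

# Hypothetical incoming rows; each one carries the condition value.
stream = [{"status": "closed"}]

deleted = 0
for row in stream:
    cursor = conn.execute("DELETE FROM customer WHERE status = ?", (row["status"],))
    deleted += cursor.rowcount  # number of table rows removed by this condition

remaining = conn.execute("SELECT id, status FROM customer ORDER BY id").fetchall()
```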