StreamSets provide a powerful expression language that can be used to fetch source fields, source properties, and better organize data passing to target components for processing. Therefore, mastering the expression language will improve your StreamSets development ability.

Learning series

  1. Data docking -ETL’s StreamSet Learning Journey I
  2. Data docking -ETL’s StreamSet Learning Journey II
  3. Data docking -ETL’s StreamSet Learning Journey III
  4. Data docking -ETL’s StreamSet Learning Journey IV
  5. Data docking -ETL’s StreamSet Learning Journey V
  6. Data docking -ETL’s StreamSet Learning Journey vi
  7. Data docking -ETL’s StreamSet Learning Journey seven micro services
  8. Data docking -ETL’s StreamSet learning journey starts at 8 o ‘clock
  9. Data Docking -ETL’s StreamSet Learning Journey 10 Face to face with RabbitMq
  10. – StreamSet Learning In ETL – WebHook up to Nine Pipelines
  11. Data docking -ETL StreamSet Learning Journey 11 Mysql sync to Snowflake

1. Expression language introduction

The Expression language of StreamSets is based on the expression syntax of JSP 2.0, so if you are familiar with JSP, this article should be a quick read. If not, read the following chapters to help you learn.

Expressions can be applied to processing components, such as “Expression Evaluator” or “Stream Selector” components, as well as other target components that support expressions. When using expressions, StreamSets support expression completion, so you can pop up a selection box to choose which expression to use, which is a huge help library for beginners.

The following can be used in an expression:

  • constant
  • The column name of the record
  • function
  • Predefined character
  • Sign of operation
  • Runtime parameters
  • Runtime properties
  • Runtime resources

1.1 constant

  • Data type constant
  • NULLconstant

You can use it in expressionsNULL, you can checkNULLValue or sets the value toNULLFor example, we added the following Settings to the expression evaluation component:Set the following expression in the flow selection component

The overall process is as follows:

If we click the eye button to preview the data, we can see that the data field of the data record is set to null, and the data flows to branch 1 when the component is selected by clicking stream.

1.2 Predefined Characters

The expression language includes the following text:

  • Boolean characters – “true” and “false”
  • The integer
  • Floating point Numbers
  • String with single or double quotes
  • Escape character of string:\, such as:\ \Escape to the \ character
  • NULL- Identifies a NULL value

Such as:

#returntrue
${"true" == (record:value("/data")==NULL)}
Copy the code

1.3 Operation Symbol

  • Arithmetic operations: +, -, *, /, %
  • Logic operations, &&, | |,!
  • Compare: ==,! =, <, >, <=, >=
  • Empty: Determines whether the value is null or an empty string.
  • If the else then:? :

Such as:

#Check whether it is empty
${ empty record:value("/data")}
Copy the code

Precedence of operators [] > () > unary operators (-,! , empty) >* / %> + – > <, >, <=, >= >=,! = > & > | | >? :

1.4 the function

  • Record operation: torecord:At the beginning, for example:record:value(<field path>)
  • Base64 encoding: inbase64:At the beginning, for example:base64:decodeBytes(<string>)
  • Column function: tof:At the beginning, for example:f:index()
  • File function: tofile:At the beginning, for example:file:fileExtension(<filepath>)
  • Mathematical function: tomath:At the beginning, for example:math:abs(<number>)
  • Pipe function: topipeline:At the beginning, for example:pipeline:name()
  • String function: tostr:At the beginning, for example:str:length(<string>)
  • Time function: totime:At the beginning, for example:time:now()
  • Credential function: tocredentialAt the beginning, sensitive information can be retrieved from a secure credential store.
  • Data generation function: can generate address, financial related, email, URL, people related information, Xeger related function (can generate random data according to the re)
  • Other functions:

alert:info() emptyList() emptyMap() every(, <mm() | ss()>) field:field() isEmptyList() isEmptyMap() jvm:maxMemoryMB() length() list:join(, ) list:joinSkipNulls(, ) offset:column() runtime:availableProcessors() runtime:conf() runtime:loadResource(, <restricted: true | false>) runtime:loadResourceRaw(, <restricted: true | false>) sdc:hostname() sdc:id() size() uuid:uuid() vault:read(, ) – Deprecated vault:readWithDelay(, , ) – Deprecated

2. Expression completion

To make it easier to use StreamSets, expression completion is provided for expression writing by pressing ${} as you typeCtrl+ space hot keyThe expression support function or parameter will pop up, but because the hotkey is usually occupied by the input method, so it may not be used, but it doesn’t matter, as long as you remember the first letter of the function to call, type the first letter to pop up the selection drop-down box.The expression list uses color and initials to distinguish different categories:

  • The blue F identifies the record field type
  • The red C identifies constant types
  • The green M identifies the function type
  • The purple P identifies the defined runtime parameters

When you use the down arrow to select the corresponding expression, a function prompt is displayed on the right.

3. Expression evaluator modification scope

  • Fields can be added or modified
  • Record header properties can be added or modified
  • Field attributes can be added or modified

Field names start with a slash, property names do not. Expressions contain the qualification in ${}, and all expressions are written inside {}.

4. Expression examples

4.1 Conditional Expressions

The serial number expression meaning
1 ${record:value('/payment_type') == 'CRD'} When used in the “Stream Selector” component, this condition routes records whose payment type is credit card to the Stream branch.
2 ${record:value('[3]/State') == "MD"} When used in the “Stream Selector” component, this condition routes records in the list set state MD to the Stream branch.
3 ${record:errorCode()=="FIELD_MERGER_02"} When used in the “Stream Selector” component, this condition routes records with error code FIELD_MERGER_02 to the Stream branch.
4 ${record:value("[20]/value") == ""} When used in alerts, an alert is triggered when the specified field does not contain any data.
5 ${record:type('/ID')=STRING} When used in the “Stream Selector” component, a record with an ID of a string is routed to a branch that contains a field type converter to convert the field to a numeric data type.

4.2 If-then-else? :

${record:value('/gender') = ='1'?'M':(record:value('/gender') = ='2'?'F':'U')}
Copy the code

4.3 String Operations

Replace the null value with an unknown string

${record:valueOrDefault('/Payment'.'unknown')}
Copy the code

4.4 Digital Operations

This expression calculates travel revenue by subtracting tips, taxes, and tolls from the total fare

${record:value(' /total_amount ') - (record:value(' /tip_amount ') + Record :value(' /tolls') + record:value(' /mta_tax '))}
Copy the code

5, summary

Using expressions, you can complete rich data modification and organization functions. So mastering expression writing is one of the necessary skills for advanced StreamSets.