This is the 7th day of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge 2021

Translation reference

This article is mainly based on "Updating Rows in Incremental Loads – Level 4 of the Stairway to Integration Services", translated from the Stairway to Integration Services series, which aims to build a comprehensive and clear understanding of SSIS. All content follows the original text, but because of version differences, personal doubts, and other reasons it is not a strict word-for-word translation. Hopefully the result is more conducive to learning and understanding SSIS.

Thanks for your support!

Incremental loading of update rows

Updating Rows in Incremental Loads

In the previous section (incremental loading of new data), we implemented a re-executable SSIS package that loads only new rows by building a Data Flow Task.

This section describes how to implement incremental loading of updated rows (loading rows that were updated in the source table into the target).

Simulated update code

As in the previous article, "Incremental loading of new rows", we modify the target table dbo.FullName with a script to simulate updated data.

Start SSMS, connect to SQL Server, and run the following statement:

Use AdventureWorks2012
go
Update dbo.FullName
 Set MiddleName = 'Ray'
 Where MiddleName Is NULL

When this statement is executed (F5), 8,499 records are updated in the target table. The target table now differs from the source table, simulating a situation in which the source contains updated data that must be applied to the target.

Implement incremental loading of updated data

Modify the columns returned from the target table by the Lookup Transformation

Open SQL Server Data Tools and the previous SSIS solution, FirstSSIS. On the Data Flow tab, we need to modify the Lookup Transformation.

Double-click the Lookup Transformation to open the Lookup Transformation Editor:

On the "Columns" page, we previously defined the mapping of the BusinessEntityID column between the Available Input Columns and the Available Lookup Columns. A lookup is similar to a Join query: the line joining the two columns corresponds to the ON clause of the Join and defines the matching rule that the lookup performs.

Previously we did not check any of the Available Lookup Columns. If the Lookup Transformation is like a Join, these check boxes are the mechanism for adding columns from the joined table to the SELECT clause.

Click the check box to the left of the "Name" header in the Available Lookup Columns to select all columns. A grid with Lookup Column, Lookup Operation, and Output Alias now appears below the Available Input Columns and Available Lookup Columns.

The Output Alias replaces the name of the column returned from the Available Lookup Columns. Continuing the Join analogy, it is like aliasing a column with AS in the SELECT clause of a JOIN statement. To identify columns returned by the lookup operation, the author typically uses alias prefixes such as "LkUp_" or "Dest_" to distinguish columns coming from the Lookup Transformation from those coming from the OLE DB source. Otherwise, SSIS appends "(1)" to duplicate column names.
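
To make the analogy concrete, here is a rough T-SQL sketch of what the Lookup Match Output now logically resembles. This is only an illustration of the Join comparison, not the query SSIS actually generates; the "src" and "dest" aliases are assumptions for this sketch:

-- Checked lookup columns act like columns added to the SELECT list,
-- the Output Alias acts like AS, and the mapped BusinessEntityID columns
-- act like the ON clause of the Join.
SELECT src.BusinessEntityID,
       src.FirstName,
       src.MiddleName,
       src.LastName,
       dest.BusinessEntityID AS LkUp_BusinessEntityID,
       dest.FirstName        AS LkUp_FirstName,
       dest.MiddleName       AS LkUp_MiddleName,
       dest.LastName         AS LkUp_LastName
FROM Person.Person AS src
JOIN dbo.FullName AS dest
  ON dest.BusinessEntityID = src.BusinessEntityID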

Here is a screenshot with the prefix “LkUp_” :

Let's step back and review.

The OLE DB source loads rows from the Person.Person table into the Data Flow, and the rows travel through the data flow pipeline into the Lookup Transformation, which accesses the target table dbo.FullName through a T-SQL query. The Lookup Transformation opens the target table and attempts to match the records in the pipeline with the records in the target table. When no match is found, the unmatched row is sent to the "No Match Output".

We changed the configuration of the Lookup Transformation so that when it finds a match between the BusinessEntityID column of the source (Available Input Columns) and that of the target table, the values of the BusinessEntityID, FirstName, LastName, and MiddleName columns are all returned from the target table.

Note: If matching were performed on all columns directly, both newly added rows and updated rows could be filtered out through the No Match Output.

However, updates and inserts require different statements to execute. Taking that route would mean changing the insert mode of the OLE DB destination to a SQL statement that first checks whether the row exists and then either updates or inserts it (one more judgment per row). More importantly, that approach cannot take advantage of set-based (batch) updates, which significantly improve execution speed and reduce runtime.
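
For illustration only, here is a hedged sketch of the per-row "check, then update or insert" statement that the single-path approach would require. The ? placeholders follow the OLE DB parameter style used later in this article; the original article does not actually build this statement:

-- Hypothetical per-row upsert for the single-path approach;
-- every row pays for an extra existence check.
IF EXISTS (SELECT 1 FROM dbo.FullName WHERE BusinessEntityID = ?)
BEGIN
    UPDATE dbo.FullName
       SET FirstName = ?, MiddleName = ?, LastName = ?
     WHERE BusinessEntityID = ?
END
ELSE
BEGIN
    INSERT INTO dbo.FullName (BusinessEntityID, FirstName, MiddleName, LastName)
    VALUES (?, ?, ?, ?)
END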

Therefore, the Conditional Split described below is both necessary and correct, and it is also a chance to learn more about other components!

Add OLE DB Command and Conditional Split transformations for incremental updates

Drag an OLE DB Command Transformation and a Conditional Split Transformation from the toolbox onto the data flow canvas.

Click the Lookup Transformation and drag the green data flow path to the Conditional Split. The only green data flow path remaining on the Lookup Transformation is the Lookup Match Output, so no selection dialog pops up this time. Besides the rows they contain, the Match Output and No Match Output of a Lookup Transformation differ in other ways, the biggest being the columns.

Lookup columns added to the source rows

Let's look at the lookup columns that the Lookup Transformation's output adds to the source data rows.

To look at the input columns, right-click the data flow path between the OLE DB source and the Lookup Transformation and click "Edit". Click the Metadata page of the Data Flow Path Editor to see a table of metadata for the columns entering the Lookup Transformation from the OLE DB source.

Close the data flow path editor.

Right-click the data flow path between the Lookup No Match Output and the OLE DB destination (named "FullName"), click "Edit", and then click the Metadata page in the Data Flow Path Editor.

As you can see, the metadata of the Lookup Transformation's No Match Output is identical to the metadata of the Lookup Transformation's input. This is by design: only rows for which no match is found pass through to this output.

Close the data flow path editor.

Because of the way we configured the Lookup Transformation above, matched records look a little different: additional columns from the target are returned. Right-click the "Lookup Match Output" data flow path between the Lookup Transformation and the Conditional Split Transformation and click "Edit".

On the Metadata page, it is easy to see the additional columns that have been added to the data flow path, especially since we added an alias prefix to identify the columns returned from the Lookup Transformation's match operation. Then close the editor.

In the incremental load of new rows, we configured the Lookup Transformation's No Match Output and did not look at the columns returned from the lookup table, because they are not required for loading new rows. Here, however, we need them to compare the column values of the source and target tables so that we can check for changes.

Change detection (Change Detection)

Change detection is a sub-science of ETL. The approach discussed here is a good place to start. Remember, we are demonstrating a principle.

What does "Match" mean in the "Lookup Match Output"?

It means that the BusinessEntityID column in the source table (Person.Person, loaded into the Data Flow Task through the OLE DB source adapter) and the BusinessEntityID column in the target table (dbo.FullName, accessed by the Lookup Transformation) have the same value. We know the BusinessEntityID values match, but we do not yet know whether the other columns match.

If all the source and target column values match, the source record has not changed, and we can consider excluding it. If the source has changed, we need to capture those changes and copy them to the target. That is the goal of incrementally loading updates.

There are two parts to this: first detect the difference, then apply the update.
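
As a purely illustrative aid (not part of the SSIS package we are building), the detection half can be pictured as the following T-SQL query. It mirrors the change-detection logic, including the NULL handling for MiddleName, that we build in SSIS below:

-- Rows whose key matches but whose name columns differ between source and target.
SELECT src.BusinessEntityID, src.FirstName, src.MiddleName, src.LastName
FROM Person.Person AS src
JOIN dbo.FullName AS dest
  ON dest.BusinessEntityID = src.BusinessEntityID
WHERE src.FirstName <> dest.FirstName
   OR src.LastName <> dest.LastName
   OR (src.MiddleName IS NULL AND dest.MiddleName IS NOT NULL)
   OR (src.MiddleName IS NOT NULL AND dest.MiddleName IS NULL)
   OR src.MiddleName <> dest.MiddleName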

Complete change detection in SSIS

Edit conditional split

Open the Conditional Split Transformation Editor (double-click the Conditional Split Transformation, or right-click it and select Edit).

The Conditional Split Transformation Editor is divided into three sections. The upper left section contains two virtual folders: "Variables and Parameters" and "Columns".

Expand the "Columns" folder. We need to detect differences between the source and target tables by comparing the FirstName, LastName, and MiddleName columns; the BusinessEntityID column has already been matched inside the Lookup Transformation.

First compare the FirstName column: click "FirstName" in the "Columns" list and drag it to the "Condition" column in the lower section of the Conditional Split Transformation Editor.

After you drop FirstName into the "Condition" column, validation occurs as soon as you click elsewhere. The text is shown in red here because the current case condition fails validation.

Why did validation fail here? The condition must be a Boolean (True or False), but FirstName is a string value. If you click OK to close the Conditional Split Transformation Editor, you get a very useful error message:

The expression is not finished yet. The upper right section of the Conditional Split Transformation Editor contains the SSIS Expression Language syntax.

The SSIS Expression Language is a difficult language to learn.

We need to detect rows where FirstName is not equal to LkUp_FirstName. In the SSIS Expression Language section, expand the Operators folder and locate the inequality operator !=.

Click the inequality operator and drag it to the right of the FirstName column in the condition expression (the lower section of the editor).

Next, drag LkUp_FirstName from the "Columns" folder to the right of the inequality operator in the condition expression.

Because the expression in the Condition field now evaluates to a Boolean value, the text is shown in black and passes validation.

What does this condition check? For a row whose BusinessEntityID matches between the source and target tables, it compares the FirstName and LkUp_FirstName values; if the FirstName values differ, the source and destination differ. Because the target usually lags behind the source, the source is assumed to hold the newer, better, more accurate data. In that case, we need to get the data into the target.

Because the test setup query above changed only the MiddleName column (no changes were made to FirstName), the inequality test on the FirstName column will always return False here; the values are never unequal because they are all equal.

We need to extend the condition expression to capture all changes.

Next, wrap the existing condition expression in parentheses to isolate this part of the change-detection expression.

Now you need to add a test condition for the MiddleName column.

Let's think about what we are looking for. We are checking for any difference between the source and target column values. If a change occurs, an update to the target should be triggered; if more than one change occurs, any one of them is sufficient to trigger the update. Therefore, we should test for inequality in one column or another. The key word in that last sentence is "or".

To represent a logical OR, scroll to the bottom of the SSIS Expression Language operators list and drag the || (Logical OR) operator to the right of the closing parenthesis in the condition expression.

Then type a pair of parentheses () after the logical OR operator.

Drag the MiddleName column from the "Columns" folder in the upper left part of the editor into the new parentheses, drag the inequality operator (!=) to the right of the MiddleName column, and then drag the LkUp_MiddleName column between the inequality operator and the closing parenthesis, as follows:

Again, the condition passes validation. The condition now tests for differences between the FirstName and LkUp_FirstName columns, or between the MiddleName and LkUp_MiddleName columns.

Next, add the logic for "or LastName is not equal to LkUp_LastName". The final condition expression looks like this:

(FirstName != LkUp_FirstName) || ([MiddleName] != [LkUp_MiddleName]) || ([LastName] != [LkUp_LastName])

Finally, before closing the Conditional Split Transformation Editor, rename the output from "Case 1" to "Updated Rows". (You can also remove unnecessary square brackets and spaces from the expression, according to your preference.)

(Translator's note: the "Case" in "Case Condition" or "Case Condition Expression" refers to a case, i.e. one branch of the Conditional Split.)

Fix the case condition expression (Case Condition Expression)

The condition expression above fails at run time because MiddleName is nullable in the table definition and the column does contain NULL values. When a NULL value is compared for equality (or inequality), the result is still NULL rather than True or False, so the condition expression evaluates to NULL, which raises an error and prevents execution. The other two columns are defined as NOT NULL and are not affected.
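
The same three-valued logic can be seen in plain T-SQL. This little illustration is not part of the package; it simply shows why a NULL comparison never yields True or False (with the default ANSI_NULLS behavior):

-- Neither message prints: NULL = NULL and NULL <> NULL both evaluate to UNKNOWN.
DECLARE @a nvarchar(50) = NULL, @b nvarchar(50) = NULL;
IF @a = @b  PRINT 'equal';
IF @a <> @b PRINT 'not equal';

The SSIS Expression Language differs in the details (the Conditional Split raises an error when a condition evaluates to NULL), but the underlying problem is the same.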

Handle the NULL values in MiddleName with the ISNULL function of the SSIS Expression Language, as follows:

(FirstName != LkUp_FirstName) ||
( (ISNULL(MiddleName) && !ISNULL(LkUp_MiddleName)) ||
  (!ISNULL(MiddleName) && ISNULL(LkUp_MiddleName)) ||
  (!ISNULL(MiddleName) && !ISNULL(LkUp_MiddleName) && MiddleName != LkUp_MiddleName)
) ||
(LastName != LkUp_LastName)

A small summary

What have we accomplished? The rows from the Lookup Transformation's Match Output flow into the Conditional Split Transformation; in each of those rows the BusinessEntityID column and the LkUp_BusinessEntityID column contain the same value (these two columns performed the matching operation in the Lookup Transformation). In the Conditional Split we configured a condition named "Updated Rows". The Updated Rows condition captures rows in which one (or more, or all) of the FirstName, MiddleName, or LastName column values differ from the corresponding "LkUp_"-prefixed copy.

It is important to note that the Conditional Split Transformation routes rows to different outputs. When a condition is defined as above, a new output (a new path) is created from the Conditional Split Transformation.

What about rows whose source column values are the same as the matched target column values? Good question. Below the "Updated Rows" condition grid you just defined, you will see "Default output name:".

Rows that do not satisfy any condition are sent to this default output of the Conditional Split Transformation.

Click “OK” to close the editor.

Click the Conditional Split Transformation, drag the green data flow path to the OLE DB Command, and select the "Updated Rows" output of the Conditional Split when prompted.

The data flow task now looks like this:

OLE DB Command Transformation

The OLE DB Command Transformation can be used to execute a SQL command for each row that flows through it.

Next, use the OLE DB Command Transformation to update the rows that matched between the source and target tables but whose column values differ.

Above, we detected the rows in the source table that differ from those in the target table and sent these rows to the OLE DB Command Transformation via the "Updated Rows" output. Now configure the "OLE DB Command" to perform the update.

Double-click "OLE DB Command" to open the "Advanced Editor for OLE DB Command".

On the Connection Managers tab, set the Connection Manager drop-down list to "[server-name].AdventureWorks2012.sa". Click the Component Properties page and scroll to the SqlCommand property. Click the ellipsis in the SqlCommand property's Value text box to open the String Value Editor and enter the following T-SQL statement:

Update dbo.FullName 
  Set FirstName = ?
, MiddleName = ?
, LastName = ?
 Where BusinessEntityID = ?

Note: make sure each line of the SQL command in the editor ends with a space or a separator character, otherwise the lines run together when the statement is assembled.

In OLE DB, the question mark (?) represents a parameter placeholder. We will map these parameters on the next tab. Click OK to close the String Value Editor.

Click the "Column Mappings" tab.

The question mark (?) placeholders form a zero-based array: Param_0 represents the first question mark, and so on. We map them by dragging fields one by one from the Available Input Columns onto the parameters in the Available Destination Columns. The Available Input Columns contain columns from both the source and the target tables; the columns with the "LkUp_" prefix hold the target data returned by the Lookup Transformation. We only want to map columns from the source table (the usual assumption being that the source contains the data that must be applied). The first question mark represents FirstName, so map the FirstName column to Param_0, MiddleName to Param_1, LastName to Param_2, and BusinessEntityID to Param_3:

The parameter mappings and the OLE DB Command are now configured. Click OK to close the editor.

The Data Flow Task should now look like this:

Execute the incremental load of updated rows

Click the “Start Debugging” button (or F5) to run the SSIS package.

On the "Progress" tab of the SSIS project (the tab name changes to "Execution Results" when you exit debug mode), you can see the status of the debug run.

As above, it took more than four seconds from start to finish.

The UPDATE statement in the OLE DB Command takes the most time. The reason is that the OLE DB Command processes one row at a time, much like a cursor, and row-based operations are slow.
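
To picture why this is slow: conceptually, for every row in the "Updated Rows" output the server receives one parameterized execution of the UPDATE statement, roughly like the sketch below. The parameter names and sample values are made up for illustration; this is not the exact batch SSIS sends:

-- One round trip like this per pipeline row, roughly 8,499 times in this test.
EXEC sp_executesql
    N'Update dbo.FullName Set FirstName = @p1, MiddleName = @p2, LastName = @p3 Where BusinessEntityID = @p4',
    N'@p1 nvarchar(50), @p2 nvarchar(50), @p3 nvarchar(50), @p4 int',
    @p1 = N'Ken', @p2 = NULL, @p3 = N'Sanchez', @p4 = 1;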

Set-based updates (Set-Based Updates)

Is there a way to avoid row-based operations? The answer is yes.

Add an OLE DB Destination

Stop the SSIS package, then click the "OLE DB Command" in the Data Flow Task and delete it.

Drag an "OLE DB Destination" from the toolbox to that location and connect the "Updated Rows" output of the "Conditional Split" to it.

Right-click the OLE DB Destination, rename it to "StageUpdates", and double-click it to open the destination editor. For the connection manager, select [server-name].AdventureWorks2012. Set the data access mode to "Table or view - fast load".

Then click the New button next to the table or view name drop-down to open the Create Table dialog box.

First remove the columns prefixed with "LkUp_" (we only need to stage the source data for the update), and adjust the other column definitions:

CREATE TABLE [StageUpdates] (
    [BusinessEntityID] [int] PRIMARY KEY,
    [FirstName] [nvarchar](50) NOT NULL,
    [MiddleName] [nvarchar](50),
    [LastName] [nvarchar](50) NOT NULL
)

Click OK, and the StageUpdates table is created in the database that the selected OLE DB connection manager points to (in this case, the AdventureWorks2012 database).

Click on the “Mappings” page.

You can see that the columns are mapped automatically because the column names and data types match. Click OK to close the editor.

Add an Execute SQL Task

Next we need to manage the rows in StageUpdates. The updated rows above go into the StageUpdates table, and we need to apply them to the dbo.FullName table.

Click the Control Flow tab and add an Execute SQL Task to the canvas from the toolbox on the left. Click the Data Flow Task and attach its green precedence constraint to the Execute SQL Task.

We perform a set-based update in the Execute SQL Task. Instead of looping through the records and updating them one at a time, it updates them all at once.

Double-click "Execute SQL Task" to open the Execute SQL Task Editor. On the General page, change the Name property to "Apply Staged Updates" and the Connection property to "[server-name].AdventureWorks2012.sa". Click the ellipsis of the SQLStatement property to open the "Enter SQL Query" window and enter the following statement:

Update dest
 Set dest.FirstName = stage.FirstName
  , dest.MiddleName = stage.MiddleName
  , dest.LastName = stage.LastName
 From dbo.FullName dest
   Join dbo.StageUpdates stage
    On stage.BusinessEntityID = dest.BusinessEntityID

This statement joins the StageUpdates table to the dbo.FullName table, applying the updates from the staging table to the destination.

Execute the set-based update

Before executing the current SSIS package, let's reset the target table to re-create the difference between the source and target tables. As at the beginning, execute the following T-SQL statement in SSMS:

Use AdventureWorks2012
go
Update dbo.FullName
 Set MiddleName = 'Ray'
 Where MiddleName Is NULL

Return to SQL Server Data Tools and execute the SSIS package. The control flow and data flow look like this:

On the Progress tab, you can see the improvement in execution time: it dropped from 4.812 seconds to 0.718 seconds.

Managing the StageUpdates table

Once the staged rows have been used, the staging table needs to be cleared.

We still need to manage the StageUpdates table. With the current configuration, records to be updated keep piling up in the StageUpdates table.

One option is to delete the records from StageUpdates right after applying the updates to the dbo.FullName target table. Here is why I don't do it that way: if something "bad" happens during execution, keeping (persisting) the records between executions gives me another data point to examine for clues.

Therefore, we truncate the table just before the Data Flow Task loads it, which keeps the records in the StageUpdates table between Data Flow Task executions.

Truncate the Stage before loading

Stop SSIS debugging. Then drag another "Execute SQL Task" onto the control flow canvas and connect the green precedence constraint from the new Execute SQL Task to the Data Flow Task.

Double-click the new SQL task to open the editor and configure the General page as follows:
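
The original screenshot is not reproduced here. Presumably the General page gets a descriptive name (something like "Truncate StageUpdates", a name chosen here for illustration), the same [server-name].AdventureWorks2012 connection, and a SQLStatement that truncates the staging table, for example:

-- Empties dbo.StageUpdates before the Data Flow Task loads it again.
Truncate Table dbo.StageUpdates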

This prepares the package for the next test.

Press F5 or the “Start debugging” button to execute the SSIS package.

Conclusion

Building on the incremental loading of new rows, this article discussed how to detect differences between the source and the target and how to apply the updates to the target. Both row-based updates and set-based updates (which are much faster) were covered.

The next step is to delete from the target those rows that have already been deleted from the source.