This is the 27th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021
Translation reference
Multiple Flexible Source Locations — Level 17 of The Stairway to Integration Services aims to have a comprehensive and clear understanding of SSIS. All contents are carried out on the basis of the original text. Due to various reasons such as version differences and personal doubts, the principle of complete translation is not adopted. Hopefully eventually it will be more conducive to learning and understanding SSIS,
Thanks for your support!
introduce
In this paper, we realized the comprehensive use of SSIS parameters and variables by loading data from multiple files into different topic areas.
Before you begin, please click this link (andyweather.com/data/Weathe…
Data in the compressed file is stored in each MonthYear folder in MMMYY format. The weather data is stored in a subfolder of MonthYear. The MonthYear folders of Dec08 and Feb09 contain a subfolder named TH. The MonthYear folders of Apr09, Jun09, and Aug09 contain two subfolders: TH and WIND.
Each TH folder contains a file named sensor1-all.csv. The files represent temperature and humidity data collected between December 2008 and August 2009. The files are cumulative — the February 2009 file includes all data from December 2008 and records added between December 2008 and February 2009, while the April 2009 file includes all data from December 2008 and records added between December 2008 and April 2009. All three files have records similar to the following:
Date,Time,Min T,Max T,Average T,Min H,Max H,Average H,Comfort zone,Min DP,Max DP,Average DP,Min HI,Max HI,Average HI,Low At 19:00 Batt, 2008-12-25, 8.5, 10.9, 9.71, 32,36,33,2, 6.0, 5.0, 5.71, -, -, - 2008-12-25, 8, 6.3, 8.5, 7.21, 36,40,38,2, 6.0, 5.0, 5.95, -, -, - 2008-12-25, pits, 5.3, 6.7, 6.37, 39,43,40,0, 6.0, 6.0, 6.00, -, -, -Copy the code
Each WIND subfolder contains two files named 1day.csv and all.csv,
We will add a new SSIS package to load data from the all.csv file in each MonthYear\WIND folder.
But first, let’s take a look at the loadWeatherData.dtsx package. I suspect the new data will not work well in the loadWeatherData.dtsx package. We first make sure that the project parameter named SourceFolder points to the correct folder.
I am building my version of the SSIS project in separate folders so I can save the state of the project at the completion of each article. You are probably not doing this, and that’s ok.
Open the Project parameter project.params and make sure the SourceFolder parameter values point to the correct path:
If you’ve been following the walkthrough since the beginning, the “control panel” of the loadWeatherData.dtsx package now looks something like this:
The brown dots in the “DFT Stage Temperature and Humidity” data flow task indicate breakpoints.
An error occurred when loading temperature and humidity data due to WIND file data. Procedure
If you unzip the weatherDatA_DEC08_aug09.zip file into the previous data file (the folder specified by the SourceFolder project parameter), The FOREACH loop container will find the new WIND File and this will cause the DFT Stage Temperature and Humidity data stream task to fail.
Why?” FOREACH Temperature The Foreach loop container uses the SourceFolder project parameter to dynamically set the Directory property of the Foreach File enumerator (which can be found on the Collection — Collection page of the Foreach loop container editor). The File Specification is set to retrieve folders specified in the SourceFolder project parameter and any subfolders, Find the fully qualified file name of a file (“*.csv” file) that satisfies all comma-separated values, as shown in the figure below:
As currently configured, the “FOREACH Temperature File” FOREACH loop container will return files in the WIND subfolder. This causes problems for the “DFT Stage Temperature and Humidity” data flow task because it is configured to load only Temperature and Humidity data.
Let’s see how it breaks and then explain step by step why it breaks.
Execute loadWeatherData.dtsx in the SSIS debugger. When the breakpoint is reached, look in the “local variables” window (” Debug “->” Windows “->” Locals “- in Debug mode) as you did in “SSIS Learning to use TEN: Flexible source locations”. Expand the Variables node and scroll until you can see the value of the User::SourceFileName variable.
Note that when the “FOREACH Temperature File” FOREACH loop encounters the first CSV File in the WIND subfolder, the package fails to execute — the full path of the File is pushed into the User::SourceFileName SSIS variable — as shown:
The file in the WIND file name has only one entry and does not match the layout of the temperature and humidity file. In the loadWeatherData.dtsxssis package, errors are generated by the “FFSrc Temperature and Humidity” flat file source adapter. The error text is “An error occurred while skipping data rows”.
WIND data cannot be read by the “FFSrc Temperature and Humidity” flat file source adapter. Running the SSIS package in the current state will generate an error.
Why loading failed?
When the flat file connection manager is created at design time, the format of the flat file (its schema) is read into the flat file connection manager. This only happens at design time or during development.
Flat file connection managers can be described at runtime using the term Immutable_object (Immutable_object), familiar to object-oriented developers. Declaring an object to be immutable means that it cannot change (in most cases… Allow some property changes, but don’t distract…) .
The flat file connection manager is not the only immutable object in the data flow. Data flow pipelines are also immutable. As SSIS developers, we (typically) interact with data flow pipes using data flow paths to connect data flow objects.
Therefore, there is a question: if another file has the same format (mode), can that file be loaded by the same flat file connection manager and data flow task, just as the connection manager and data flow were designed for that file?
The answer is yes. I can hear you thinking, “Andy, define the equivalents.” Define equivalent:
- The fields must have the same number.
- The data type must match or implicitly enforce the data type defined in the flat file connection manager.
- The data length must be equal to or less than the length defined in flat File Connection Manager.
Keep in mind that unless the data in each column has the same meaning in all flat files, loading another file with the same connection manager and data stream will contaminate the database in a very clever way.
Correct mistakes
You might be thinking, “How do I fix this, Andy?” “This is an excellent question. We had to modify the file specification to exclude WIND comma-separated files and only temperature and humidity comma-separated files.
Only temperature and humidity data is loaded
Stop the debugger if the loadWeatherData.dtsxssis package is still running in the debugger. Open the FOREACH Temperature File FOREACH loop container editor, and navigate to the Collections page. Change the “Files” property from “*.csv” to “sensor1-all.csv”, as shown in 8:
This change configures the “FOREACH Temperature File” FOREACH loop container to return only files named “sensor1-all.csv”, rather than any files with the “CSV” extension. The name of the temperature and humidity file is sensor1-all.csv. The name of other weather data files is different.
Close the editor and re-execute the loadWeatherData.dtsx SSIS package in the SSIS debugger. This package is a success. It also loads data from two new temperature and humidity files into the database.
The following error occurs during execution:
Error: Variable “User::SourceFileName” is already in the write list. A variable can only be added once to the read lock list or write lock list. Error: Variable “(null)” is already in the write list. A variable can only be added once to the read lock list or write lock list.
I was at a loss at first. Look for the “User::SourceFileName” variable. Open the Foreach loop editor. On the “Variable Mapping” page, you will find that User::SourceFileName is specified multiple times. Delete the excess and keep only one. (The reason is unknown)
Error: The variable “User::SourceFileName” is already in the write list. A variable can only be added once to the read lock list or write lock list.
The loadWeatherData.dtsxssis package is a feature that can be used again, but now with the wrong name. Rename it to loadTemperatureData.dtsx
Loading WIND Data
design
Before adding the package to load the WIND data, let’s consider the design. I’m a big believer in right-to-left design — start with the output and work my way back to the input (one or more) because it helps me identify the major steps (or modules) in my design. Starting with the output from the new package, find the target database WeatherData, as shown
I “temporary store” WIND data in a table in the WeatherData database. Moving backwards, the staging is done by the forward-only incremental load data stream task in the new SSIS package, as shown in the figure below.
Further down the line, this data flow task will be in the Foreach cyclic container, just as it is in the LoadTemperatureData.dtsx SSIS package, as shown in the figure.
The structure diagrams in the above 3 diagrams are functionally inaccurate (most block diagrams are functionally inaccurate).
Promote the project connection manager
The loadtemperatureData.dTSx package has a Package Connection Manager for the WeatherData database. SSIS 2012 provides Project Connection Managers when developing packages in the project deployment model. Also, it is very easy to convert the package connection manager into a project connection manager.
To upgrade the package Connection manager to a Project Connection manager, simply right-click the package Connection manager and click Convert to Project Connection, as shown.
After clicking “Convert to Project Connection”, the connection manager is promoted to the Project Connection Manager, whose name is prefixed with “(project)” — “(Project)”
Also, newly promoted project Connection Managers are listed under the “Connection Managers” node in Solution Explorer:
This saves steps in the new SSIS package. Which step? Let’s create the package and have a look!
buildLoadWindData.dtsx
In Solution Explorer, right-click the SSIS Package node, then click New SSIS Package, and rename the new package to loadWindData.dtsx.
The new package will open in the SQL Server Data Tools integrated development environment.
Note that a project connection manager named “(Project) WIN-FR5grQscdpo.weatherData” can be used in the loadWindData.dtsxssis package, and we don’t need to do anything to do this. Why is that? This is because all SSIS packages in a project can use all project connection managers. How cool!
Add a Foreach loop container to the control flow of the loadWindData.dtsxssis package and rename it to “Foreach Wind File”.
Open the editor for Foreach Loop Containers, and navigate to the Collections page. Click the Enumerator Expressions property.
Click on the ellipsis of the value text box for the Expressions properties collection, open the Property expression editor, and select “Directory” from the “Property” drop down list.
Then, click the ellipsis of the “Expression” text box to display the Expression builder, expand the “Variables and Parameters” node, click the $Project::SourceFolder and drag it to the “Expression” text box
If you click the “Evaluate expression” button, you will see the current design-time value of the Project parameter $Project::SourceFolder.
This is another benefit of project-level objects in SSIS 2012’s project deployment model. Similar to the project connection manager, project parameters are available to each package in an SSIS project. Also cool!
Click OK to close the expression generator, then click OK to close the property expression editor and return to the Foreach loop editor. What we want to use this Foreach loop container to retrieve is a file containing multiple days of Wind data. These files are called “all.csv” and can be found in the \WIND subfolder of the MonthYear folder of our weather data files.
In the Files property of the Foreach loop editor, type all.csv and check the Traverse Subfolders check box. As is shown in
We configured the Foreach loop container to search for a file named all.csv in the path (or subfolder) contained in the Project parameter named $Project::SourceFolder. The Foreach loop finds and returns one of these files at a time.
When we find a file named “all.csv”, we need to send the fully qualified path to those files somewhere. Let’s configure an SSIS variable that contains the source file path Foreach iteration of the Foreach loop container.
Click the “Variable Mapping “page, click the “Variable” drop – down box, and then click” New Variable”.
In the “Add Variable” window, make sure the “container” property is set to “package Container” — loadWindData.dtsx
Enter the variable Name in the “Name “(Name) property: SourceFileName.
Click OK to close the Add Variable window and return to the Foreach loop editor.
Click ok to close the Foreach Loop Editor.
Drag a Data flow task from the Control flow SSIS toolbox into the Foreach loop container and rename the Data flow task “DFT Load Wind Data” :
Double-click the “DFT Load Wind Data” Data stream task to open the editor, add a flat file source adapter and rename it “FFSrc Wind Data”.
Double-click “FFSrc Wind Data” flat file source adapter to open editor:
Click the “New” button next to the “Flat File Connection Manager “drop-down list to create a new Flat file connection manager and open its editor. Click the “Browse” button next to the “File Name “property text box to display the open dialog box, change the File type filtering to “*.csv”, navigate to the \data\Apr09\WIND folder and select the “all.csv” File
Change the name of flat file connection manager to “FFCM Wind Data”
Click on the “Columns” page to preview the data in the file.
Click the “OK” button to close the Flat File Connection Manager editor. In the “Columns” page of the “Flat File Source Editor”, the columns returned from the Flat File Connection Manager to the Flat File Source Adapter are displayed.
In the lower-right grid, find the External Columns and Output Columns. External columns are supplied from the flat file connection manager to the flat file source adapter. We can change the names of these columns, but not here, only in flat file connection manager. However, we can change the alias of the column by editing the output column name.
I want to remove Spaces and the “(m s)” text from the column name. You can do this by changing the column names in the Output Columns column, and when you are done, the column table looks like this:
Click OK to close the flat file source editor.
Drag a Lookup Transformation from the SSIS toolkit and connect the Data flow path from the “FFSrc Wind Data” flat file source adapter to the “Lookup Transformation”.
Open the Find editor, and on the General page, change the “Specify how to process rows with no matches” drop-down list to “Redirect rows to no matches output,” as shown.
Click on “Connection” page
Make sure the “OLE DB Connection Manager “drop-down list is set to” win-FR5GRQscdpo. WeatherData”
We will use some of the capabilities of the Find transform to create the target table. Click the New button next to the Use table or View drop-down menu.
As when you create a table from an OLE DB target adapter, the default table name displayed in the Create Table window is the name of the Lookup transform — in this case, Lookup.
Note that the column names reflect the output column names changed to the flat file source adapter (without Spaces and with the “(m s)” text removed), and the table name in the modified statement is “StageWind”, as shown in the figure.
Click OK to create the table and return to find the Transformation Editor.
I’m not sure why this happens, but when following the steps just outlined, the Use Results of an SQL Query option is selected and the query is select * from and the name of the table you just created using the New button.
Good, because I do have to select that option. But let’s edit the query so that it reads:
Select [Date], [Time]
From [dbo].[StageWind]
Copy the code
Remember: our data is created hourly and never updated. If a row with the same date/time combination exists in the target table, it indicates that we have loaded the row previously.
If you click the Preview button, you won’t see any data in the Date and time column returned (we just created the table, which is empty), but you can test the query to see if it’s constructed enough to return an empty data set.
Since our data is never updated, we only need to check for new or existing rows. If the date/time is found in the target table, it is an existing row. Therefore, we only check for matches on the “Date” and “Time” columns.
Click the Columns page and join the [Available Input Columns].[Date] column to the [Available Lookup Columns].[Date] column. [Available Input Columns].[Time] join [Available Lookup Columns].[Time]
We configure lines that do not match date/time to flow out of the “no match output” of the “Find transform”. Simply discard the matching rows — the matching rows flow to the “matching output” of the Lookup Transformation, but we will not connect the “matching output” to the data flow.
Click OK to close the Find Transformation Editor. Drag the “OLE DB target adapter “onto the data stream and rename it” OLEDBDest Stage Wind “.
Connect the data flow path from the “lookup transform” to the “OLEDBDest Stage Wind” OLEDB target adapter. When prompted, from input Output Selection, select Lookup No Match Output from the Lookup transformation, as shown.
Open the editor for the OLEDB target adapter with “OLEDBDest Stage Wind” and make sure the “OLE DB Connection Manager “drop-down list is set to” win-FR5grQscdpo. WeatherData”. And set the drop-down list of “Name of the table or the view” to StageWind as follows:
Click the Mappings page to complete the automatic mapping of Available Input Columns to Available Destination Columns, as shown:
Click ok to complete the OLE DB target adapter configuration. The data flow should look like the figure below.
The test of time
Before executing the loadWindData.dtsx SSIS package in the SSDT-BI debugger, open SSMS, connect to the WeatherData database, and perform the following query:
Select * From dbo.StageWind
Copy the code
The query result should be null.
Execute the loadWindData.dtsx SSIS package in the SSDT-BI debugger, and when done, the data flow should look like the following:
When the package is finished executing but the debugger is still running, click the Progress TAB in SSDT-BI. If you scroll to the node labeled “Task DFT Load Wind Data”, you will see that the Data flow executes (“Start”) three times, as shown in the figure.
If you go back to SSMS and re-execute the test query, you will find the data in the DBO.stageWind table, as shown in the figure.
conclusion
In this article, we use SSIS parameters, variables, and Foreach loop containers to load data from multiple different sources. We reconfigured (and renamed) the original package to load only temperature and humidity data.
The Foreach loop in the new SSIS package (loadWinddata.dtsx) allows us to zero out wind data from three sources, but we can easily load data from 30 sources (either unknown or variable).