When you develop a SQL Server Integration Services (SSIS) package, you can usually use the predefined tasks to perform your data extraction, transformation, and loading (ETL) operations. However, there are some actions you can’t accomplish by using the predefined tasks. For these situations, you can often use the Script task to extend SSIS with custom code that runs within the context of the SSIS package.
You can use the Script task to perform many operations, such as querying Active Directory (AD) or monitoring system performance. You can also use the Script task to interface with and act on files and directories. For example, suppose you’re developing an ETL solution that provides data to a series of reports. The ETL operation will pull data from a set of text files saved to a local directory, but the reports should be based only on data that has been generated within the last 48 hours. So before you extract data from those files, you want to delete any files that haven’t been updated within the past two days. To accomplish this operation, you can use the Script task to determine when the files were modified and delete files outside the acceptable range.
Note that the Script task works as part of your package’s control flow; you can’t use it for data flow. Control flow manages your package’s workflow, or how the package organizes the ETL operation. Data flow, for the most part, performs the actual ETL operations that move and modify data. To extend data flow, you can use the Script component, which is the data-flow counterpart to the Script task. (For information about the Script component, see “Extending the Data Flow with the Script Component” in SQL Server 2005 Books Online—BOL.)
To use the Script task to extend your package, you add the task to the control flow as you would a predefined task. Next, you configure the task and create the script associated with that task. The script then runs as part of the package workflow. This article walks you through these steps in the context of building our sample file-deletion solution. Keep in mind that many of the principles I describe, especially about creating a custom script, can apply to almost any scripting solution.
Note that before you can develop the solution’s control flow, you must first create an SSIS project in SQL Server Business Intelligence Development Studio (BIDS), then create an SSIS package. For information about accessing BIDS and creating an SSIS project and package, see “Business Intelligence Development Studio” and “Integration Services in Business Intelligence Development Studio” in BOL.
Configuring the Control Flow
When working with files, you often have to loop through a directory to access the files one at a time, as with our file-deletion scenario. SSIS makes this process easy by providing the ForEach Loop container, a mechanism for iterating through directories and then passing that information to tasks within the container. Alternatively, you can implement the looping logic within your script, but the ForEach Loop container simplifies this process.
To implement the looping logic, you need to add the ForEach Loop container to the control flow, define any necessary variables, and then configure the container’s properties.
1. Add the container. Add the ForEach Loop container as you would any control-flow task. Before you configure the container, create any variables the container will use. This will save you an extra step because you’ll need to access the Foreach Loop Editor only once.
2. Define variables. Next, define any variables that the container will use. You must first add the container because you should create the variables within the container’s scope, which means the container must exist first. For the file-deletion solution, you need to define only one variable. First, select the container so that it’s highlighted, then you can add the variable to your package at the container’s scope. To add the variable, select the ForEach Loop container, point to the SSIS menu, then click Variables. In the Variables pane, create a string variable named RptFile (in the User namespace) and set the value to an empty string (double quotes). For more information about adding variables, see “Integration Services Variables” in BOL.
3. Configure the container. Double-click the container to open the Foreach Loop Editor, as noted above. At a minimum, you should configure the following properties:
• In the Collection tab’s Folder property, specify the folder where the text files are located.
• In the Collection tab’s Files property, type *.txt in the text box.
• In the variable list on the Variable Mapping tab, add the RptFile variable (listed as User::RptFile), and leave the index set to 0. Each time the container loops through the directory, it assigns the next file name to the variable. You can then use the variable within the Script task to perform operations on the identified file.
Close the Foreach Loop Editor and save your package. For more information about the ForEach Loop container, see “Adding Enumeration to a Control Flow” in BOL.
Configuring the Script Task
After you configure the ForEach Loop container, you can set up the Script task. First, add the task to the container, as Figure 1 shows. Then, double-click the Script task to open the Script Task Editor, and click the Script page, which Figure 2 shows.
The only property you need to configure to support the file-deletion solution is the ReadOnlyVariables property. You should add the following two variables (separated by a comma) to that property:
• StartTime. StartTime is an SSIS system variable that holds the date and time a package execution begins. You’ll use this value to determine the time interval between the file’s last modification and the package’s start time. (For a list of SSIS system variables, see “System Variables” in BOL.)
• RptFile. RptFile is the user-defined variable you created for the ForEach Loop container. The variable holds the name of the current file each time the container iterates through the directory.
When configuring the Script task, you should add any variables that you plan to use within your script. For the file-deletion solution, you need add only these two read-only variables, but you can use as many read-only and read-write variables as necessary. Just be sure to separate the variable names with commas. After you add the variables, you’re ready to create your script.
Creating the Script
Now that you've prepared the control flow, you can create the core of this solution—the script. To create the script, you must access the Visual Studio for Applications (VSA) development environment by clicking the Design Script button on the Script tab of the Script Task Editor, which Figure 2 shows.
The VSA window. When the VSA window opens, it displays the default components and script that the Script task uses. Figure 3 shows the window’s two main panes. The left pane, Project Explorer, provides details about the default project. Notice that the Project Explorer tree includes a References node. Any namespaces you reference within your script must be included under the References node. For the file-deletion solution, you don’t need to add any namespaces.
The right pane provides the scripting environment you use to write your code. By default, the script pane includes the initial script you need to get started. Whenever you access the VSA window through the Script Task Editor, you can view and modify only the script associated with that particular task. If you want to access the script for a different Script task, you must access it through that task.
For most projects, you’ll simply write your script in the right pane and ignore the other elements in the VSA window, unless you need to add a reference. Note that you must write your script in Visual Basic .NET (VB.NET); the VSA environment doesn’t support any other language. However, because you’re using VB.NET, you have access to any .NET component registered in the machine’s Global Assembly Cache (GAC).
The default script. By default, SSIS populates the script pane with initial code. Listing 1 shows the default code in its entirety. As you can see, the script includes several comments (preceded by apostrophes). You should review the comments; however, after you’ve written a few scripts, you’ll probably want to delete the default comments and include only your own.
The actual code in Listing 1 begins at callout A with the Imports statements, which call several namespaces from the .NET Framework system libraries and the SSIS runtime library. By including these statements, you don’t have to use fully qualified names within your script. For example, if you add an Imports statement to call the System.IO namespace, you can reference the FileInfo function within your script without preceding it with System.IO.
Callout B shows the next element in the default script: the ScriptMain class. A class provides a common structure for creating a set of related objects. At runtime, SSIS creates an instance of the ScriptMain class and calls the Main method. You can include other classes in your script and even store class-level variables between method calls if you break your code into multiple methods that you call from the Main method.
The ScriptMain class includes the public subroutine Main, which callout C shows. Sub is the generic name for a method that doesn’t return a result to the calling code as part of its execution. It can—and, in this example, does—set an external variable that identifies its final state. The Main sub is the entry point for this ScriptMain class to execute custom code. The Main sub can perform one or more actions. You should reference any custom subroutines or functions you create from this method.
By default, the Main subroutine defines the Dts.TaskResult property, which callout D shows. The property determines the success or failure of the script and is typically the last statement before you return from the subroutine (by using the Sub keyword). In this case, the property is set to the Success value of the Dts.Results enumeration. An enumeration is an object that contains a finite list of values. Dts.Results includes two values: Success and Failure.
The Custom Script
You’re now ready to add your own file-deletion script to the VSA window. Listing 2 shows the modified script in its entirety. Notice that the script doesn’t include the default comments but does include the other default elements: the Imports statements, the ScriptMain and Main definitions, and the Dts.TaskResult property.
The first script element you need to add is an Imports statement to call the System.IO namespace, as callout A in Listing 2 shows. As I noted earlier, adding this statement simplifies calling objects from that namespace.
You enclose the rest of your script in a Try/Catch statement. The statement consists of a Try block and a Catch block. The Try block contains the logic that performs the script’s actions, and the Catch block processes any errors that the Try block generates. The Try/Catch statement ends with the End Try keywords. (You can learn more about the Try/Catch statement in the MSDN article “How to catch exceptions in Visual Basic 2005 or in Visual Basic .NET” at http://support.microsoft.com/kb/301283.)
The Try block. In the Try block, you define your variables, as callout B shows. You don’t have to define variables within the Try block. If you don’t want to include the variables in the Try/Catch process, you define them before you begin the Try block. For the file-deletion solution, however, you should define the variables within the Try block to catch any errors that the DirectCast or DateDiff functions might raise.
The first variable definition captures the file name, which comes from the RptFile variable you created for the ForEach Loop container. To use this variable, you must specify the Value property of the Dts.Variables collection. Note that you must explicitly convert the variable to a String value, even though you created it as a string, because Option Strict is on by default in the Script task to avoid the possible loss of data during conversions. To convert the variable to a String value, you can use the DirectCast function. You can then assign the returned value to the new variable. In this case, the new variable shares the same name as the SSIS variable, but you can provide any name for the new variable. (Note that the first line of the variable definition ends with an underscore, which shows that the statement continues to the next line.)
After you define the new RptFile variable, you create a FileInfo object to hold details about the file in RptFile. When you call the FileInfo object, you pass RptFile as a parameter. You then assign this information to the FileObj variable, which you can use to retrieve specific information about the file.
Next, you declare and set the FileTime variable. The variable definition uses FileObj to retrieve the LastWriteTime property value for the file. The value is a date/time value that shows the file’s last modified date. Alternatively, you can use the GetLastWriteTime method to retrieve the time the file was last updated. GetLastWriteTime is a shared method available through the File object. By using this method, you don’t have to define both the FileObj variable and the FileTime variable; instead you define only the FileTime variable, as the following statement shows:
Dim FileTime As Date = File.GetLastWriteTime(TptFile)
This approach reduces the number of objects you need to create. However, the sample solution in Listing 2 defines a variable based on the FileInfo class.
The next variable also holds a date/time value. In this case, you want to capture the value from the StartTime SSIS system variable. As with the RptFile variable, you use the Value property of the Dts.Variables collection, except that you convert it to a Date value rather than a String value.
After you define the FileTime and StartTime variables, you can use the DateDiff function to calculate the interval between the two times. You assign the result to the DateHours variable, which is defined as the Long data type (a large integer type).
Now that you’ve set the DateHours variable to the calculated time interval, you can use that variable to determine whether the file should be deleted. You use an If statement, as callout C shows, to delete any files that have a time interval greater than 48 hours. First, specify that the value in the DateHours variable must be greater than 48 (If DateHours > 48). If this condition evaluates to true, the script will delete the file.
To delete the file, you use the Delete method of the File class, which is part of the System.IO namespace. When calling the method, pass the RptFile variable as an argument. As you saw earlier, the RptFile variable contains the name of the file captured by the ForEach Loop container.
If the last-modified date is within the past 48 hours, the script takes no action on the file. Be sure to end the If statement with the End If keywords. For this script, the end of the If statement also ends the Try block.
The Catch block. Now, you can use a Catch block to capture any errors that might be generated when trying to delete a file. When defining the Catch block, assign the exception to the variable e. You can then use this variable within the script’s Catch block.
The first statement in the Catch block (at callout D in Listing 2) fires a FireError event if the code raises an exception. To fire an event, you must use the Dts.Events property’s FireError function to specify details about the error message the script should report to SSIS logging. If an error occurs when the Script task runs, the task raises an OnError event in SSIS, which records the event to the error log if you’ve enabled logging on your package. You can also view the error on the Execution Results tab of Visual Studio (VS) or BIDS. For details about logging SSIS package execution, see “Implementing Logging In Packages” in BOL.
Because the FireError function is more complex than the other functions in the script, let’s take a closer look at it. The function takes five parameters. The first is an integer that identifies the error; use -1 if you don’t have a system for identifying custom errors. The second parameter identifies the event source (Script task), and the third parameter specifies the error message. In this case, the third parameter includes custom text as well as text saved to the e variable when the exception is raised. The fourth parameter can contain a path to a Help file, and the fifth parameter identifies the relevant topic within the Help file. This solution doesn’t use the last two parameters.
Listing 3 shows an example of an XML error log entry that SSIS logging generates when the Catch block fires an event. Notice that the entry includes a number of categories, such as the event name (OnError) and the start time. The start time in this example is the same as that stored in the StartTime system variable. The
The second statement in the Catch block, at callout E in Listing 2, is the Dts.TaskResult property, which is set to the Failure value in the Dts.Results enumeration. As a result, if an exception occurs, the task will fail. If there’s no exception, the script skips the Catch block and runs the last statement, which sets the Dts.TaskResult property to Success.
Testing Your Script
After writing your script, you need to test it. First, copy some text files to the directory specified in your ForEach Loop container. Be sure some files have been modified within the past 48 hours. When you run the script, it should delete only files not modified within the past two days. You can check the script’s error handling by setting one of the nonmodified files to read-only. Now when you run the script, the task should fail because it can’t delete a read-only file.
As you can see, the Script task provides a rich environment for creating custom code within the control flow of an SSIS package. Although the sample solution focuses on deleting files, you can use many of the principles I covered here to build scripts for other purposes. By using the Script task, you can implement dynamic packages that let you interact with text files as well as with a wide range of objects—including JPEG images, network printers, and AD records—and extend SSIS far beyond its built-in capabilities.