Now that SQL Server 7.0 has been available for a few months, developers have had a chance to use its many new features. As a result, many questions relating to the use of Data Transformation Services (DTS) are beginning to surface. This month, we address some of the more common questions. Specifically, we examine how DTS deals with error handling, a package's execution location versus storage location, passing messages between tasks, and DTS extensibility. In addition, we will look at another area that is causing confusion: data-driven queries.
DTS Error Handling
The most common questions that we hear relate to error handling within DTS. How does DTS handle errors? What errors does DTS raise? How do you implement error handling? Beyond the built-in error handling of the Package Designer, how else can you handle errors? To begin, let's look at how DTS handles errors.
Within DTS, think of errors as the status of the work being accomplished rather than a definitive right or wrong result. You have the ability to control the execution of a package, task, step, or transformation by monitoring the state or return value of the operation being executed. DTS provides this ability by passing a series of return codes between objects in the DTS object model. For example, while performing a transformation via an ActiveX script, developers can control the processing by setting the return code in the script to any value in the DTSTransformStatus enumeration list, which you see in Table 1. In addition to the DTSTransformStatus enumeration list, there are several other lists, as Table 2 shows. Each list of predefined constants lets you determine whether to continue processing, and if so, in what manner.
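For example, a minimal ActiveX transformation script might skip rows with missing data by setting the return code. This is an illustrative sketch, not one of the article's listings; the table and column names are assumptions:

```vbscript
'* Hypothetical ActiveX transformation script. The DTSSource and
'* DTSDestination collections and the DTSTransformStat_* constants
'* are supplied by the DTS data pump at runtime.
Function Main()
    If IsNull(DTSSource("LastName")) Then
        '* Route the row to the exception file and keep processing.
        Main = DTSTransformStat_SkipRow
    Else
        DTSDestination("LastName") = UCase(DTSSource("LastName"))
        Main = DTSTransformStat_OK
    End If
End Function
```

Because the script runs inside the data pump, the enumeration names are available to it directly; no declarations are necessary.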
At this point, you are probably wondering how you've overlooked these lists in the DTS Package Designer. You haven't. With the exception of the DTSTransformStatus enumeration list, which is available to your ActiveX scripts, the lists are fully exposed only when you write directly to the DTS object model. When you design a package, the Package Designer lets you set up error-handling functionality in four places, each corresponding to a logical control level within the package:
- Set up error handling at the Package level in the DTS Package Properties dialog box on the General tab (which Screen 1 shows). You can specify a file to log runtime errors, whether to terminate processing when DTS encounters the first error, and whether to log the package's completion status to the Windows NT event log.
- Set up error handling at the Transformation level with the Advanced tab in the Data Transformation Properties dialog box (which Screen 2 shows) or with the Options tab in the Data Driven Query Properties dialog box. You establish the maximum error count before DTS stops task execution, and define an exception file to receive any records that fail to process.
- Set up additional error handling at the Transformation level in your ActiveX script. Use DTSTransformStatus return values to control row-by-row data processing.
- Set up error handling at the Task level by manipulating precedence constraints in the Workflow Properties window, as in Screen 3. Precedence constraints control task execution.
These four limited options handle some developer needs, but many application scenarios require more sophistication. The best way to circumvent the error and message-passing limitations of the Package Designer is to use Visual Basic (VB) or a similar programming language to develop a custom package. A custom package provides access to the entire DTS object hierarchy and all of the built-in error-handling functionality.

However, if you are limited to the use of the Package Designer, here are some recommendations that might help. Consider using global variables to pass messages between tasks. For example, based on some error condition within your ActiveX transformation script, you can write a custom error message into a global variable. You define global variables on the Global Variables tab of the DTS Package Properties dialog box, as Screen 4 shows. In a subsequent step, you can read that variable, reformat the text, and include the message in a SQL Mail task. Another possibility for error handling is to use the query functionality of the DataDrivenQueryTask to write custom messages to the database based on logic within your transformation. If you are implementing sophisticated error and status processing, you need to be aware of possible performance tradeoffs.
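To illustrate the global-variable technique, a transformation script can record a failure message for a later step to read. The variable and column names here are illustrative assumptions:

```vbscript
'* Hypothetical transformation script that records an error message
'* in a package global variable for a subsequent task to pick up.
Function Main()
    If IsNull(DTSSource("CustomerID")) Then
        DTSGlobalVariables("ErrMsg").Value = _
            "Row skipped: CustomerID was null"
        Main = DTSTransformStat_SkipRow
    Else
        DTSDestination("CustomerID") = DTSSource("CustomerID")
        Main = DTSTransformStat_OK
    End If
End Function
```

A later ActiveX Script task can read DTSGlobalVariables("ErrMsg").Value, reformat it, and hand it to a SQL Mail task.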
DTS error handling still has a long way to go. However, if you are comfortable with how error handling works and what its limitations are, you are ready to develop creative solutions.
Execution Location vs. Storage Location
Where packages execute is a point of confusion for many users. Unlike stored procedures, which run within SQL Server, a package is a client application that runs on the machine from which it is initiated. At runtime, either the dtsrun.exe or the dtswiz.exe application executes a package: dtswiz.exe executes only those packages you create with the Import and Export Wizard, and dtsrun.exe executes all other packages. Both applications execute packages in the same manner. If a package is not already in COM-structured storage file format, the application extracts the package from SQL Server or the SQL Server Repository, converts it to a COM-structured storage file, and then executes the file.
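For example, a scheduled job might invoke dtsrun from the command line to execute a package stored in SQL Server. The server and package names here are illustrative:

```
REM Execute the NightlyLoad package stored on ProdServer,
REM using a trusted (Windows NT) connection.
dtsrun /S ProdServer /E /N NightlyLoad
```

Because dtsrun is a client application, this command runs the package on whatever machine executes it, not necessarily on ProdServer.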
Where a package executes has two implications. First, and most important, any localized information on which the package relies for execution must exist on the machine where the package executes in production. To make this concept clearer, consider the following example.
A developer wants her package to execute nightly on the production database server. During development, the developer used her local NT workstation to create the package. The package uses ODBC to connect to an Oracle server and invokes several COM components to assist with data validation. After testing completes, the developer uses SQL Agent to schedule the package for nightly execution on the production server. When the developer tests the SQL Agent task, the package fails to execute.
In this scenario, the package executes on the developer's local workstation during development and testing. However, in production, the package executes on the SQL Server production server. In order for the package to successfully execute in production, the developer must first create the appropriate ODBC Data Source Name (DSN) information and register the COM components on the production server.
The second implication is performance. A package usually executes like any other client application. It must create connections to the source and destination data stores and manage the transfer of any data to be processed. To minimize the amount of data sent across the network, you might consider executing packages on the source or the destination server.
DTS Extensibility
In previous articles, we emphasized the extensibility of the DTS object model and how it lets you create custom packages, custom tasks, and custom transformations. However, you need to know which language tools to use for which job.
The exposed interfaces of the DTS object model drive language and tool selection. To understand this relationship, you must first understand what interfaces DTS provides. Microsoft provides two DLL files that let custom applications directly access the DTS packages and data pump. First, the Microsoft DTSPackage Object Library (DTS), which dtspkg.dll implements, exposes a dual interface that lets you use any language that supports OLE automation to create and modify DTS packages. Second, the Microsoft DTSDataPump Scripting Object Library (DTSPump), which dtspump.dll implements, provides additional automation interfaces that you can use to write directly to the data pump. In addition to the automation interfaces, DTS provides the dtspkg.h and dtspump.h header files.
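Because dtspkg.dll exposes a dual interface, even a short VBScript file can load and run an existing package through OLE automation. This is a sketch under stated assumptions; the server name and package name are illustrative:

```vbscript
'* Load a package stored in SQL Server and execute it via OLE automation.
Dim oPkg
Set oPkg = CreateObject("DTS.Package")
'* 256 = DTSSQLStgFlag_UseTrustedConnection; plain VBScript cannot
'* reference the enumeration name directly, so the literal is used.
oPkg.LoadFromSQLServer "ProdServer", "", "", 256, "", "", "", "NightlyLoad"
oPkg.Execute
oPkg.UnInitialize
Set oPkg = Nothing
```

From VB proper, you would instead set a reference to the Microsoft DTSPackage Object Library and use the enumeration names with early binding.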
Although the DTS and DTSPump interfaces let you create packages and custom tasks, you must use dtspump.h via C or C++ to implement custom transformations. A previous article ("Pump Up the Power of DTS," June 1999) stated that the data pump is an OLE DB provider exposed as an in-process COM server. Because transformations occur within the data pump, custom transformation development is limited to those tools that talk directly to the native OLE DB interfaces. Given that SQL Server processes transformations one row at a time, Microsoft felt that the performance overhead of an automation interface would be overwhelming. Therefore, the company has restricted custom transformation development to the native OLE DB interface included in dtspump.h. In a future article, we will examine how to develop a custom transformation.
Data-Driven Queries
How does a DataDrivenQueryTask differ from a data pump task? Put simply, the difference is in how DTS applies data to the destination data store. DTS provides two methods of interaction with the data destination. The first and fastest method is insert-based data movement: DTS reads data from the source, transforms it, and then inserts or bulk copies the transformed data into the destination data store by using INSERT statements or OLE DB's IRowsetFastLoad interface. The second method is data-driven query based: every source row passing through the data pump causes DTS to apply a query against the destination data store. In this scenario, the query can be any valid SQL code, including UPDATE, DELETE, and INSERT statements and stored procedure calls.
When designing your application, keep in mind that these two methods are mutually exclusive. Therefore, apply the following rule when choosing between them: if you are using the insert-based method to move data into the destination data store, use the DataPumpTask to implement your transformation; if you are using the data-driven query method, use the DataDrivenQueryTask. Also remember that the method you choose has performance implications: data-driven queries can add significant processing overhead because they operate at a very granular level (the database row).
From an implementation perspective, a DataPumpTask differs from a DataDrivenQueryTask. The transformation results from a DataPumpTask map directly to the destination data store. The transformation results from a DataDrivenQueryTask map to specific parameters in a parameterized query, which DTS then executes against the destination data store.
To understand how to use a DataDrivenQueryTask, consider the following example. Suppose you need to replicate employee data between a SQL Server database and an Informix database. To accomplish this task:
- Use the source and destination table definitions in Listing 1, page 46, to create a transaction history, or shadow table, on the source system. The shadow table tracks data changes.
- Create a package that implements a DataDrivenQueryTask to move the data. Begin by creating two connections: one to the SQL Server database and the other to the Informix database. To maintain a transactional context, the package processes only the rows marked with a status of I, or "in process." Therefore, use an Execute SQL task to mark the unprocessed rows as "in process," as Screen 5 shows.
- Add the DataDrivenQueryTask. On the Source tab of the Data Driven Query Properties dialog box, which Screen 6 shows, enter the SQL query that will extract the data from the shadow table.
- Define the target data store on the Destination tab of the Data Driven Query Properties dialog box, which Screen 7 shows.
- Complete the Queries tab of the Data Driven Query Properties dialog box, which Screen 8 shows. This tab's four properties (the Insert Query, the Update Query, the Delete Query, and the User Query) differentiate a DataDrivenQueryTask from a DataPumpTask. Each property contains a SQL statement that the task can apply to the destination data store. For each row the task processes, exactly one of these queries fires; the status result that the ActiveX script returns determines which one. For example, the Queries tab in Screen 8 details the update query that applies changes to existing rows. Although this query is parameterized, any valid SQL statement works, including a stored procedure call. The question marks in the statement serve as placeholders for parameters; you need to insert them manually, even if you use the Query Designer to create the statement. Finally, use the drop-down list in the Destination column to map the transformation result set to the appropriate parameters.
- On the Transformations tab of the Data Driven Query Properties dialog box, add the source table columns, destination table columns, and ActiveX transformation script in the same manner as in a DataPumpTask. However, in a DataDrivenQueryTask you need to consider carefully the value that the function returns, because this return value triggers the appropriate parameterized query. The script in Listing 2 sets the function's return value based on the TransType field received from the source data set. If the value is not UPDATE (U), INSERT (I), or DELETE (D), the script skips the row. These predefined constants are all part of the DTSTransformStatus enumeration list in Table 1.
- Add another Execute SQL task, as you see in Screen 9, to mark all rows as successfully processed.
- Establish precedence constraints to ensure the proper task-execution order, and your completed package, which Screen 10 shows, is ready for execution.
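A transformation script following the pattern that the steps above describe has roughly the following shape. The TransType values come from the scenario; the other column names are illustrative assumptions, not a reproduction of Listing 2:

```vbscript
'* Hypothetical sketch of a data-driven query transformation script.
'* The return value selects which parameterized query fires for the row.
Function Main()
    DTSDestination("emp_id") = DTSSource("emp_id")
    DTSDestination("last_name") = DTSSource("last_name")

    Select Case DTSSource("TransType")
        Case "U"
            Main = DTSTransformStat_UpdateQuery
        Case "I"
            Main = DTSTransformStat_InsertQuery
        Case "D"
            Main = DTSTransformStat_DeleteQuery
        Case Else
            '* Unknown transaction type: skip the row.
            Main = DTSTransformStat_SkipRow
    End Select
End Function
```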
We're Watching Out for You
DTS offers many functions that make importing and exporting data less daunting. However, like other first-generation tools, DTS has its share of bugs and features that cause headaches for even the most seasoned developer. In future issues, we will continue to shed light on the development issues within DTS.