
Unleash the Power of DTS


How to use DTS import, export, and data-manipulation capabilities

Data Transformation Services (DTS) is a SQL Server 7.0 utility service that provides import, export, and data-manipulation capabilities between OLE DB, ODBC, and ASCII data stores. Like many SQL Server tools, the DTS Package Designer and wizards are nothing more than wrappers that graphically expose an object model. Although these tools offer an extensive array of options that you can use to develop elaborate transformations, they don't expose the complete functionality and power of the DTS object model. To unleash this power, you must interact with DTS via its native COM interfaces. This article examines the DTS object model and illustrates it with a Visual Basic 6.0 (VB6) application that uses the DTS COM interfaces to perform its data transformations.

The Object Model


The DTS object model offers a complete set of components that let you move data among multiple data stores. Figure 1 provides a summary of the major collections that compose the Package object model: Connections, GlobalVariables, Steps, and Tasks. All these collections play equally important roles in supporting a Package.

The Connections collection contains all the OLE DB data provider information in a Package. Each Connection object describes the data provider. With connection pooling, you can then reuse these connections across multiple Steps or Tasks, so that you can do more work with fewer resources. Connection pooling is a provider-supplied function that enables an application to reuse an existing connection rather than physically establishing a new connection for each use.

The GlobalVariables collection provides an internal storage area where you can store data and share it across different Steps within a Package. The GlobalVariable object provides a variant data type for storage. Each Package object contains a collection of GlobalVariable objects that you can dynamically add or remove during execution.
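For example, assuming oPackage holds an initialized DTS.Package reference (the variable name and value here are hypothetical), you could create and share a GlobalVariable like this:

    Dim oGlobal As DTS.GlobalVariable

    ' Create the variable, assign it a value, and add it to the Package.
    Set oGlobal = oPackage.GlobalVariables.New("LastRefreshDate")
    oGlobal.Value = Now
    oPackage.GlobalVariables.Add oGlobal

    ' Any later Step can read (or reassign) the shared value.
    Debug.Print oPackage.GlobalVariables("LastRefreshDate").Value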

The Steps collection contains all the Task workflow information in a Package. You can associate each Step object with a collection of Task objects; however, if you want to execute a Task, it must have at least one Step. Unlike the DTS Package Designer, which implicitly creates any needed Steps, applications using the DTS COM interfaces must explicitly define the Steps.

Whether a Step object executes depends on its PrecedenceConstraint objects; a Step can execute in parallel with other Steps, serially to them, or conditionally based on their execution. A PrecedenceConstraint defines the conditions that prior Steps must meet before the current Step can execute. Thus, a Step with more than one PrecedenceConstraint cannot execute until it satisfies all its PrecedenceConstraints. The three types of PrecedenceConstraints (a code sketch follows the list) are

  • Completion: This Step executes regardless of the previous Step's success or failure.
  • Success: This Step executes only after the previous Step's successful completion.
  • Failure: This Step executes only when the prior Step fails.
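For illustration, the following fragment creates a Step and gives it a Success constraint so that it waits for another Step (the Step and Task names are hypothetical; the next article in this series implements Steps and Constraints in full):

    Dim oStep As DTS.Step
    Dim oConstraint As DTS.PrecedenceConstraint

    ' Explicitly create a Step and bind it to an existing Task by name.
    Set oStep = oPackage.Steps.New
    oStep.Name = "LoadStep"
    oStep.TaskName = "LoadTask"
    oPackage.Steps.Add oStep

    ' Success constraint: LoadStep runs only if PurgeStep succeeds.
    Set oConstraint = oStep.PrecedenceConstraints.New("PurgeStep")
    oConstraint.PrecedenceBasis = DTSStepPrecedenceBasis_ExecResult
    oConstraint.Value = DTSStepExecResult_Success
    oStep.PrecedenceConstraints.Add oConstraint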

The Tasks collection, as Figure 2 shows, contains all the defined Tasks in a Package. Each Task object contains information describing a unit of work that the Package object must perform as part of the Transformation process. The CustomTask object is necessary to implement any Task. Using the CustomTask, you can extend DTS by creating Tasks to supplement the built-in Tasks that ship with SQL Server. CustomTasks can include their own customized user interface and dialogs, which you can add to the DTS Designer. The DTS interfaces also let you use CustomTasks directly in COM applications.

Table 1 lists the eight built-in Tasks that ship with DTS. This article focuses on the three Tasks most directly related to performing Transformations: the DataPumpTask, DataDrivenQueryTask, and ExecuteSQLTask.

Transformations


The DataPumpTask object defines the information necessary to create and execute an instance of the DTS data pump, which transforms data as it is moved from the source to the destination data store. The data pump, which Figure 3 shows, is a multithreaded, high-speed, in-process COM server that moves and transforms rowsets between a source and a destination data store. ActiveX scripts can perform the Transformations as the data pump moves the data. If your Transformation is too complicated for the ActiveX scripting language, you can develop CustomTransformations with any COM-compliant language. Note that DTS applies Transformations row by row as the data passes through the pump. Therefore, when designing CustomTransformations, you need to evaluate and minimize any performance hits.
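For example, an ActiveX transformation script of the kind the data pump executes once per row might look like this (VBScript; the column names are invented for illustration):

    Function Main()
        ' Build one destination column from two source columns.
        DTSDestination("FullName") = Trim(DTSSource("FirstName") & " " & DTSSource("LastName"))
        Main = DTSTransformStat_OK
    End Function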

The DataDrivenQueryTask object, which Figure 4 shows, defines the information necessary to move data between the source and destination data stores via data-driven queries. The DataDrivenQueryTask differs from the DataPumpTask in how the Task applies the data to the destination data store.

DTS provides two methods of interaction with the destination data store. The first and fastest method uses insert-based data movement: The Task reads the data from the source, transforms it, and then uses INSERT statements or OLE DB's IRowsetFastLoad interface to insert or bulk copy the data into the destination data store. The second method is data-driven-query-based: For every source row passing through the data pump, the Task applies a query against the destination data store. The query can be any valid SQL, including UPDATE statements, DELETE statements, and stored procedure calls.

When designing your application, keep in mind that these two methods are mutually exclusive; therefore, apply the following rule when choosing between the two. If moving data into the destination data store is an insert-based scenario—always a complete refresh or add—implement the Transformation using DataPumpTask. If moving data into the destination data store is a query-based scenario—requiring UPDATE, DELETE, or stored procedures—implement the Transformation with DataDrivenQueryTask. Unless you implement DTS in complete refresh scenarios, most of your development work will use DataDrivenQueryTask.

Both DataPumpTask and DataDrivenQueryTask support the Lookup object, which lets a Transformation execute one or more named, parameterized queries. With a Lookup, a Transformation can retrieve data from locations other than the immediate source or destination row being transformed. For example, you might use a Lookup if your source and destination data stores are SQL Server databases, but one of the values needed at the destination resides in an alternative location (e.g., Oracle, DB2, Excel). For each source row, the Lookup performs a query against the third-party location to retrieve the needed value.
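For example, given a DataPumpTask or DataDrivenQueryTask reference in oCustomTask and a fourth Connection pointing at the external store (both the variable and the Connection ID of 4 are assumptions for illustration), a Lookup might be defined like this:

    Dim oLookup As DTS.Lookup

    ' A named, parameterized query; the ? is replaced at execution time.
    Set oLookup = oCustomTask.Lookups.New("GetRegion")
    oLookup.ConnectionID = 4
    oLookup.Query = "SELECT Region FROM Locations WHERE LocationID = ?"
    oLookup.MaxCacheRows = 100
    oCustomTask.Lookups.Add oLookup

Inside the transformation's ActiveX script, DTSLookups("GetRegion").Execute(DTSSource("LocationID")) would then return the looked-up value for the current row.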

The Interfaces


Microsoft provides two .DLL files that custom applications use to directly access DTS Packages and the data pump. First, the Microsoft DTSPackage Object Library (DTSPkg), implemented through DTSPKG.DLL, exposes an OLE Automation interface that you can use with any language that supports OLE Automation to create and modify DTS Packages. The second interface, the Microsoft DTSDataPump Scripting Object Library (DTSPump), which DTSPUMP.DLL implements, provides additional OLE Automation interfaces that developers can use to write directly to the data pump. The ActiveXScriptTask object uses the DTSPump interface. In addition, C and C++ developers can reference the header file DTSPUMP.H to directly interface with the data pump.

Although both interfaces—DTSPkg and DTSPump—let you create Packages and CustomTasks, you can implement CustomTransformations in C or C++ only by using DTSPUMP.H. Because DTS processes Transformations one row at a time, Microsoft considered the performance overhead of an OLE Automation interface prohibitive. Therefore, Microsoft restricted CustomTransformation development to the native OLE DB interface that DTSPUMP.H includes.

Using the Object Model


Let's return to the data mart population scenario, which we discussed in the April issue and which used the DTS Package Designer. This article implements the solution with VB6 to illustrate Connections, Steps, Constraints, and the DataPumpTask, DataDrivenQueryTask, and ExecuteSQLTask objects. Here's the scenario: A large corporation needs to combine the employee data from two subsidiary companies into one data mart and refresh that data weekly. To simulate the subsidiary companies, the example uses the Employee tables from two SQL Server sample databases, Northwind and Pubs (for more information about SQL Server's sample databases, see Michael Otey's SQL Seven column). A new database called HRMart represents the parent company's data mart. Listing 1 details the HRMart schema.

Setting Up the Environment


To begin, you must create a new VB project and add a reference to the DTS Package object model (we assume that you know VB6). Create a new Standard EXE project called HRMartExample. After the project exists, add the object reference. The primary OLE Automation interface in this example is the Microsoft DTSPackage Object Library, which Screen 1 shows.

To make all the DTS enumeration constants available and enhance the readability of the VB code, also add a reference to the Microsoft DTSDataPump Scripting Object Library. The HRMartExample application doesn't have a user interface, so remove the default form from the project and replace it with a new module. Let's call the module HRMartPackage; it will contain all your code.

Listing 2 details the module's declarations and Main subroutine. Starting with the declarations, use the moPackage variable to reference the Package throughout the module. Next, establish an enumeration of the Connection IDs to use throughout the module, EConnectionIds. The ID uniquely identifies each Connection. Using the enumeration lets you enforce the uniqueness and still provide a meaningful name for referencing the Connections. The final declarations serve as placeholders for SQL Server logon information. Customize these constants to match your SQL Server environment.

In addition, Listing 2 shows the Main subroutine, the shell where you assemble and execute the Package. To begin assembling the Package, create and initialize a new Package object.
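Listing 2 itself isn't reproduced here, but based on the description above, a minimal sketch of the module's declarations and Main shell might look like the following (the enumeration member names, constant values, and placement of the helper calls are illustrative assumptions):

    Option Explicit

    ' Module-level reference to the Package.
    Private moPackage As DTS.Package

    ' Unique IDs for the three Connections added later.
    Private Enum EConnectionIds
        ePubsData = 1
        eNorthwindData = 2
        eHRMartData = 3
    End Enum

    ' Placeholders for SQL Server logon information; customize these.
    Private Const SERVER_NAME As String = "(local)"
    Private Const USER_ID As String = "sa"
    Private Const PASSWORD As String = ""

    Public Sub Main()
        CreatePackage
        ' Calls that add the Connections, Tasks, Transformations, and
        ' Steps described in the rest of this article go here.
        moPackage.Execute
        Set moPackage = Nothing
    End Sub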

Package Creation and Initialization


To start, create and initialize a Package object by calling the CreatePackage subroutine, which Listing 3 shows. Then, assign the Package a name—in this case HRMartRefresh. The Package lets you log its completion status in the NT Event Log via the WriteCompletionStatusToNTEventLog property. You can adjust and adapt the Package to a specific execution environment by tuning the PackagePriorityClass and MaxConcurrentSteps properties. The PackagePriorityClass property establishes the Package's Win32 process priority. You can use the DTSPriorityClass enumeration constants to set this priority.

The MaxConcurrentSteps property controls the number of Steps that the Package can execute concurrently. As you tune this value, remember that each Step runs in its own thread, so setting this value larger than the number of CPUs in the computer won't improve performance.

The LineageOptions and RepositoryMetadataOptions properties control Package interaction with the Microsoft Repository. The LineageOptions property tells the Package how and whether to use the Repository; the DTSLineage enumeration constants control the property's values. The RepositoryMetadataOptions property controls how a Package is saved to the Repository; the DTSReposMetadataOptions enumeration constants control the property's values.

Another function of the Package object is to handle the transaction model; the UseTransaction property controls transaction usage. You control a Package's TransactionIsolationLevel by setting that property to one of the enumeration constants contained in DTSIsoLevel; its default value is ReadCommitted. Last, you use the AutoCommitTransaction property, which tells the Package whether to implicitly or explicitly commit an active transaction upon completing execution.
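Listing 3 isn't reproduced here, but based on the properties just described, a minimal sketch of the CreatePackage subroutine might look like this (all property values are illustrative, not prescriptive):

    Private Sub CreatePackage()
        Set moPackage = New DTS.Package
        With moPackage
            .Name = "HRMartRefresh"
            .WriteCompletionStatusToNTEventLog = True
            .PackagePriorityClass = DTSPriorityClass_Normal
            .MaxConcurrentSteps = 4
            .LineageOptions = DTSLineage_None
            .RepositoryMetadataOptions = DTSReposMetadata_Default
            .UseTransaction = True
            .TransactionIsolationLevel = DTSIsoLevel_ReadCommitted
            .AutoCommitTransaction = True
        End With
    End Sub

Next, you can create the Connections to the various data stores.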

Adding Connections


This example needs three Connections, one each to represent the parent and two subsidiary companies. The Connections are named PubsData, NorthwindData, and HRMartData and have ConnectionIds of 1, 2, and 3, respectively. To create the new Connections and add them to the Package, the Main subroutine in Listing 2 calls the AddConnection subroutine in Listing 4. The Connection object gives the Package the OLE DB service provider's definition. The Connections collection contains all the Connection objects accessible to the Package. To create a new Connection object, supply the class name of the OLE DB provider. Then, assign the ID property a unique numeric value.

The Connection object also contains several other attributes. The Reusable property determines whether multiple Steps can share the Connection. The ConnectImmediate property controls the Connection initiation. When the value is true, the Connection to the data store occurs when the Package starts running; when the value is false, the Connection doesn't start until a Step referencing it executes.

Other Connection properties describe the data store's environment information. With the Microsoft OLE DB Provider for SQL Server, the DataSource property identifies the server on which SQL Server is running. For other providers, this value may differ; for example, for Oracle or Informix providers, this property may refer to the instance name. The UserID, Password, and UseTrustedConnection properties provide Connection security information.

The last property to implement is Catalog, which identifies the database in which the Connection initially occurs. The ConnectionProperties collection specifies and initializes additional attributes specific to the OLE DB provider. The example relies on the default settings of the SQL Server provider.
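Listing 4 isn't reproduced here; a sketch of an AddConnection subroutine consistent with these properties might look like this (the parameter list is an assumption, and SQLOLEDB is the class name of the Microsoft OLE DB Provider for SQL Server):

    Private Sub AddConnection(ByVal sName As String, _
        ByVal lId As EConnectionIds, ByVal sCatalog As String)

        Dim oConnection As DTS.Connection

        ' Create the Connection from the OLE DB provider's class name.
        Set oConnection = moPackage.Connections.New("SQLOLEDB")
        With oConnection
            .Name = sName
            .ID = lId
            .Reusable = True            ' let multiple Steps share it
            .ConnectImmediate = False   ' connect when a Step needs it
            .DataSource = SERVER_NAME
            .UserID = USER_ID
            .Password = PASSWORD
            .UseTrustedConnection = False
            .Catalog = sCatalog         ' the initial database
        End With
        moPackage.Connections.Add oConnection
    End Sub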

After you initialize the Connection object, add it to the Package's Connections collection, as the preceding sketch shows. After you've added all three Connections, create the Tasks and Transformations the Package will perform. The first Task to add is an ExecuteSQLTask object to delete the HRMart..Employee data.

Adding an ExecuteSQLTask


An ExecuteSQLTask object is a specialized CustomTask object that lets a Package execute a SQL statement on a given Connection. To create a new Task, execute the New method of the Tasks collection and pass it the class name of the CustomTask to create. The purpose of this Task is to clear the Employee table before performing the transformation. To add the ExecuteSQLTask to the Package, call the TaskHRMartPurge subroutine, which Listing 5 shows.

First, create a new ExecuteSQLTask by calling the Tasks.New method and passing it the class name, DTSExecuteSQLTask. Next, set the properties; the Task's name is HRMartPurgeTask. The SQLStatement property contains the SQL statement for the Task to perform. In this case, the SQL statement is DELETE FROM Employee. A Task needs a Connection in order to execute; this Task uses the HRMartData Connection you defined using the enumeration constant EConnectionIds. Next, set the timeout value to establish how long the Task will wait for the SQL statement to complete execution.
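Listing 5 isn't reproduced here; a sketch consistent with the description might look like this (the timeout value is illustrative):

    Private Sub TaskHRMartPurge()
        Dim oTask As DTS.Task
        Dim oCustomTask As DTS.ExecuteSQLTask

        ' Create the Task, then reach through CustomTask to set the
        ' ExecuteSQLTask-specific properties.
        Set oTask = moPackage.Tasks.New("DTSExecuteSQLTask")
        Set oCustomTask = oTask.CustomTask
        With oCustomTask
            .Name = "HRMartPurgeTask"
            .SQLStatement = "DELETE FROM Employee"
            .ConnectionID = eHRMartData
            .CommandTimeout = 30    ' seconds
        End With
        moPackage.Tasks.Add oTask
    End Sub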

Finally, using the Tasks collection's Add method, add the ExecuteSQLTask to the Package, as the sketch shows. The next Task to add is the DataPumpTask, which transforms the Pubs..Employee data and inserts it into the HRMart..Employee table.

In the next issue, we'll continue discussing the object model and complete the assignment by implementing the DataPumpTask, the DataDrivenQueryTask, Steps, and Constraints.
