DTS Grows Up

New SQL Server 2000 tools and tasks make transformations faster and easier

Since its introduction in SQL Server 7.0, Data Transformation Services (DTS) has helped many SQL Server professionals migrate data from OLE DB-compliant data sources to any compliant destination. But in SQL Server 2000, DTS has really grown up. If you've used DTS in SQL Server 7.0, you'll notice many enhancements in the new release. DTS is full of new features and tasks that help you transform your data and perform other tasks such as FTP and sending messages to queues. Let's look at how Microsoft has enhanced DTS in SQL Server 2000 through improved tasks and tools such as the Dynamic Properties task, the Execute Package task, and the multiphase data pump. You could cut your DTS development time in half after you begin using SQL Server 2000 because the new built-in features alleviate the need for many custom scripts.

Global Variables

Global variables do the same thing in DTS packages that local variables do in stored procedures: Global variables let you reuse a value in a DTS package multiple times. In SQL Server 7.0, you need complex ActiveX scripts to reuse values, but now every task in a package can use variables. Global variables' importance has increased greatly in SQL Server 2000 because you can use them in nearly every task to make your package dynamic. To make a package dynamic, you use a new task called the Dynamic Package task. Other tasks, such as the Execute SQL task, have been enhanced to support global variables as parameters in queries. When a task is working inside a package, that task can't easily communicate with other tasks or packages. With global variables, you can store in memory such values as a server name or a connection string, then reuse those values from various tasks inside your package.

By specifying the /A switch in SQL Server 2000, you can pass global variables into a package when you use the Dtsrun command-line utility to execute a package. You can later use these passed-in global variables to set connection strings in your package. As I show in a moment, this capability gives you a wonderful advantage when you combine it with several Dynamic Properties tasks. The following syntax shows the essential components for passing a global variable into a package:

DTSRUN /S ServerName /U LoginID /N 
PackageName /P Password /A
GlobalVariableName:TypeID=Value

The /A switch requires the global variable's name, its value, and its TypeID, which is the global variable's data type. Table 1, page 34, shows an abbreviated list of common TypeIDs that DTS uses. For a complete list, see the entry for the Dtsrun utility in SQL Server Books Online (BOL).

For example, you might use the following syntax to pass a package a string global variable with the TypeID of 8:

DTSRUN /S "(local)" /N "Dynamic Properties"
	/A  "gvCatalog":"8"="Pubs.dbo
	.DynamicPropertiestask"  /E

(You can use the /E switch to use Windows authentication instead of SQL Server authentication so that you don't have to hard-code a login name and password. Instead, the package will execute with the credentials of your current login.) By passing a global variable into a package in this way, you can change which tables a package loads the data into. You can create a package to load data into multiple tables. You can also use a scripting language such as VBScript or JScript to programmatically pass global variables into a package.

New Tasks

In SQL Server 2000, Microsoft has more than doubled the number of DTS tasks that install by default. The new tasks remove several limitations that exist in SQL Server 7.0. For example, some of these limitations require DTS developers to program complex code inside an ActiveX Script task to execute a package inside a package or to change a package's properties dynamically. In SQL Server 2000, DTS has two new tasks, Dynamic Properties and Execute Package, that can do these things for you automatically.

After you spend weeks developing and stabilizing a DTS package, the last thing you want to do is start the entire process again for another client. Reusability is an important part of DTS in SQL Server 2000. Dynamic packages let you create a package once and then use it over and over again. A sample dynamic load might consist of a DTS package, PackageA, that uses the ActiveX Script task to look in a directory every 10 minutes for a file to upload from the mainframe. After PackageA finds the file, it moves the file to a client directory. Then another ActiveX Script task launches PackageB, which performs the extraction of data to SQL Server or another OLE DB-compliant data source. However, before you can execute PackageB, you must pass to PackageB the global variables that contain the path and filename to extract. In SQL Server 7.0, you would pass these variables entirely through the ActiveX Script task. In SQL Server 2000, the Dynamic Package task takes care of the dynamic pieces of this scenario.

So, why would you use two packages? Extraction processes can grow quite large. Most packages that I develop for handling extraction processes are larger than 1MB, but SQL Server provides no way of caching such large packages when they execute, and they could be slow if executed whole. You don't want a 1MB package slowing down the production server by executing every 10 minutes. Instead, you can execute PackageA (which in most cases would be 20KB to 50KB) and have it execute PackageB only when it finds the file.

After PackageB receives the necessary global variables and executes, several things must occur before the load procedure can begin. The properties of each connection and task must change to match the data that you're loading. In SQL Server 7.0, you have to code ActiveX Script tasks before each connection or task that you want to modify. This process leaves a lot of room for errors, and your testing cycle could last weeks or months as you debug each task. Another problem with using the ActiveX Script task to modify the properties of a connection or task in DTS is that when you need to change some code, you have to modify the code in each ActiveX Script task in every package where the code exists. Again, you have a lot of room for error. Let's look more closely at the new tasks.

Dynamic Properties task. To cut down on coding and thereby minimize errors, Microsoft added the Dynamic Properties task to DTS. With the assistance of this task, you don't have to create bulky ActiveX Script tasks to dynamically set a DTS property, such as a username that you use to establish a connection. This task lets you change the value of any nonidentifying property that is accessible through the DTS object model (e.g., non-name/ID properties of a step, connection, task, package, or global variable). What once took 3 weeks to stabilize, you can now write and stabilize in less than a day. Using the Dynamic Properties task gives you faster performance than writing the same process with an ActiveX Script task because DTS doesn't resolve the ActiveX Script task until runtime.

Execute Package task. The new Execute Package task helps you with the second part of the example. Rather than creating an ActiveX Script task to execute a package from within a package, you can use the Execute Package task to serve the same purpose. When you use this task, you can remove redundant pieces of your package. For example, if you have a standard auditing procedure, you can create a separate child package for the procedure and execute that package from multiple other packages whenever you need it. Then, when you need to update your auditing code, you don't have to update the code in every package. Instead, you can update just the child package. You can also put secure parts of your package, such as payroll transformations, into separate packages that you can protect with a user password.

With the Execute Package task, you can also make a child package join a transaction if one exists. Transactions help you ensure that all your data from multiple tasks is either loaded or rolled back. You can use transactions to help ensure your data's integrity: If one step in a package fails, you can roll back the entire transformation. To enable transactions, go to Workflow Properties by right-clicking any step and selecting Workflow, Workflow Properties. Then, select Join Transaction if Present on the Options tab to tell the step to join a transaction, if one exists. If no transaction exists, DTS creates a new one. You also need to ensure that the Use Transactions option is selected on the Advanced tab in DTS Package Properties. (This option is enabled by default.)

Inside the Execute Package task, you can easily pass global variables to the child package, as Figure 1 shows. If you click the Inner Package Global Variables tab in the Execute Package task, DTS scans the child package for any global variables that it expects, and you can set them to your own values. Inner global variables let you set the global variables for the child package from the Execute Package task. Inner global variables are useful when you're calling auditing packages and you want to pass to the child package certain errors based on events in the parent package. Outer global variables send global variables from the parent package to the child package. If a global variable exists in the child package, DTS updates the child's global variable to the parent's value. If the global variable doesn't exist in the child package, DTS creates it.

Message Queue task. The Message Queue task is an elegant way of passing string messages, files, or global variables between packages or other programs. The Message Queue task works with the Microsoft Message Queue Service (MSMQ) that ships with Windows 2000 or the MSMQ that installs on Windows NT 4.0 as a part of the Microsoft Windows NT 4.0 Option Pack. This task can send messages to a queue for another package or program to pick up and process later, letting programs that don't support OLE DB interface with DTS. Through the Message Queue task, you can also have several packages that are running in parallel on multiple servers check in to a queue upon completion. Meanwhile, the parent package waits for the child packages to complete before it executes a cleanup script. This process is useful when you want to scale-out a data-load procedure in which you're loading millions of records.

File Transfer Protocol task. The File Transfer Protocol task lets you receive files from an FTP site or another Universal Naming Convention (UNC) path. With SQL Server 7.0, you have to use a line-command FTP application or develop your own file-copying or FTP component to perform the same task. A drawback of the File Transfer Protocol task, however, is that it can't send files to a remote FTP site or directory. The File Transfer Protocol task can only receive files.

Microsoft added five other minor tasks to SQL Server 2000 that work behind the scenes automatically with the Copy Database Wizard but that you can also incorporate into your packages. With the Transfer Databases, Transfer Jobs, Transfer Logins, Transfer Error Messages, and Transfer Master Stored Procedures tasks, you can transfer objects stored in the Master database from any SQL Server 2000 or 7.0 instance to another SQL Server 2000 instance.

Improved Tasks

SQL Server 7.0 DTS has only one functional phase, Row Transform, when it transforms data. In SQL Server 2000, Microsoft revamped the DTS Data Pump task, so transformations can have as many as six unique phases. By using the multiphase data pump, which is a component of the Data Pump task, you can add error checking to your package or load data around constraints. You can also monitor the number of successful transformations and restart the transformation if an error occurs. The DTS Designer's graphical interface to the multiphase data pump is disabled by default for usability purposes. After you enable the feature, your Data Pump task might look slightly more complex. To make this option available, right-click Data Transformation Services and select Properties. In the Properties screen, select Show multi-phase pump in DTS designer.

Figure 2 compares SQL Server 7.0's transformation phase with SQL Server 2000's multiple phases. The phases in the multiphase data pump are

Pre Source—DTS executes functions in this phase before it fetches the first record from the source. You can use functions in this phase to create objects that you use in later phases. For example, you can use this phase to create an ADO object that you'll use to write auditing information to a table.
Row Transform—This phase, which is the only phase in SQL Server 7.0, transforms the data.
Post Row Transform—For each row that DTS transforms, it executes the Post Row Transform phase. This phase has two subphases: Insert Success and Insert Failure.
On Batch Complete—By default, each data pump has only one batch. You can change the batch size in the Transform Data task's Options tab. This phase is the perfect place for reporting the status of a large data load.
Post Source Data—Unlike the On Pump Complete phase, the Post Source Data phase can access data. For example, you can use this phase to write footer rows to the data you created in the Pre Source phase.
On Pump Complete—After DTS has transformed all rows, the On Pump Complete functions execute. This phase can't access the data, but it's ideal for freeing up memory that was used during previous phases.

The Data Pump task's Transformation Editor is much easier to use and more functional in SQL Server 2000. SQL Server 2000's DTS has built-in COM components to perform some of the common DTS transformations. For example, one new component can transform the data to uppercase or trim any trailing spaces. Also in the Data Pump task, you can output any failed transformations to exception files. You can configure exception files on the Data Pump task's Options tab.

You can also read and write to global variables easily from several tasks in SQL Server 2000. For example, with the Execute SQL task, you can dynamically load the orders for a particular client ID for any given date by using a custom stored procedure and a question mark to represent the variables:

EXEC INS_LoadOrders @ClientID = ?, @DateLoad = ?

In addition, you can output the results of an Execute SQL task into a global variable. You can store either a specific record or the entire rowset in a global variable for later use. You can also use similar input variables in Data Pump tasks. For example, you can use the Data Pump task to select from the source all records that meet certain criteria, such as records that contain a client ID.

Tool and Engine Improvements

SQL Server 2000 adds several benefits to the DTS tools that help you develop and debug packages. These benefits include the ability to cache packages when you open them in DTS Designer, enhanced logging, and the ability to save the packages as Visual Basic (VB) files. In SQL Server 2000, DTS can cache your packages in memory at design time if you're running Win2K. DTS in SQL Server 7.0 is extremely slow at opening packages of any significant size (generally more than 1MB). This slowdown happens because when you open the package, DTS must search your registry to determine whether any new OLE DB provider, task, or scripting language has been added. After you turn on caching, your packages open much more quickly in DTS Designer because DTS skips the registry search step in Win2K.

You can turn on caching by right-clicking Data Transformation Services in Enterprise Manager, selecting Properties, then clicking Turn On Cache. After adding a new custom task, provider, or scripting language, you need to refresh your cache so that DTS can scan it.

On this same Properties screen, you can turn on just-in-time debugging, which lets you use debugging tools to step through your ActiveX Script task if you have any errors or if a forced stop occurs. This option uses the Microsoft Script Debugger, which ships with NT 4.0 Option Pack, Visual InterDev 6.0, and Win2K. You can also download the Script Debugger at http://msdn.microsoft.com/scripting. Keep in mind that the options you select on the Properties screen are in effect only for the client that is executing or designing the package.

As you select connection options, DTS usually verifies each connection and database to make sure the database exists. Disconnected Edit is a new DTS Designer mode that lets you design a package while you're working disconnected from your network. With this mode, DTS won't verify each option. (Because DTS performs the verification for your protection, I recommend that only advanced users use this mode.) You can access this mode under the Package menu in DTS Designer. As Figure 3 shows, Disconnected Edit lets you access any DTS property, including such identifying properties as task names.

In SQL Server 2000 DTS, Microsoft has greatly enhanced logging capabilities. Whereas SQL Server 7.0 can log only at the package level, with SQL Server 2000, you can log the success or failure of an individual step or package. You can turn on logging inside each package on the Package Properties screen's Logging tab. You can log packages to the local SQL Server instance or consolidate the logs to a central SQL Server instance. You can also write logs to a file. This new feature is useful when you're executing a package from within a package because it accurately logs the events that occur in the child packages.

As you debug a package, you'll often need to execute individual steps in the package. In SQL Server 2000 DTS, you can do this by right-clicking any step and selecting Execute Step. In SQL Server 7.0, you can't execute an individual step through DTS Designer. Instead, you have to disable every step except the one you want to execute, then run the entire package.

The fastest way to learn the DTS object model is by saving a package as a VB file. For SQL Server 7.0, a tool was available on Microsoft's Web site to convert a package to a VB file. Microsoft has incorporated this functionality into DTS in SQL Server 2000 as an additional way to save your packages in DTS Designer. After you save the code in a VB file, you can copy and paste it into your VB program or view it in almost any viewer (e.g., Notepad). After you save a package in this format, however, reverse-engineering it into a graphical format is difficult. If you want to save a package in a VB file, always keep a copy of the package in another format for editing.

SQL Server 7.0 users can use the command-line utility Dtsrun for executing packages. However, the commands can be lengthy and the switches complex. In SQL Server 2000, Microsoft added a new tool called dtsrunui.exe, which installs by default in the \Program Files\Microsoft SQL Server\80\Tools\Binn directory. You can execute the tool either from a command line or from Windows Explorer. Dtsrunui.exe lets you quickly generate commands, encrypt commands, or simply execute a package through the GUI.

Where Do You Go from Here?

Microsoft has revamped DTS in SQL Server 2000 to give you added performance, usability, and customization abilities. Do you have a problem that you can't solve through one of the built-in DTS tasks? Why not build your own? In SQL Server 2000, as in SQL Server 7.0, Microsoft implemented all the built-in DTS tasks as custom tasks, so you can call and remove each built-in task just as you would a task that you create from scratch.

Comments

Plain text