Making Package Magic

Downloads
43805.zip

Microsoft has completely rewritten Data Transformation Services (DTS) in SQL Server 2005 for many reasons. In addition to streamlining DTS's functionality and making it easier to use, the rewrite improves scalability and performance. In "What's New In DTS?" (May 2004, InstantDoc ID 42141), I gave an overview of DTS's highlights, including the design goals of the rewrite and the new features of the DTS Designer. With the July release of a broad private beta and with the SQL Server 2005 public beta drawing close, it's time to take a closer look at some specific DTS enhancements. Many SQL Server 2000 DTS users work primarily or only with the Import/Export Wizard, so let's start the closer inspection of DTS 2005 with this wizard. Although the Microsoft DTS development team I work on has improved and expanded the entire DTS 2005 toolset, the Import/Export Wizard is still a great way to quickly build packages for moving data. The rewritten Import/Export Wizard includes two notable changes. First, the Import/Export Wizard no longer supports copying objects between SQL Server databases. To copy objects, you use the Transfer Database Wizard, which you launch from SQL Server Management Studio, the new SQL Server 2005 management suite that replaces Enterprise Manager, Query Analyzer, and other utilities. Second, the rewritten wizard no longer supports building script transformations.

Why did the team make these changes? The simple answer is that we needed to refocus and simplify the wizard so that it's only a data-movement utility—not an extraction, transformation, and loading (ETL) tool. We learned that having the copy-objects functionality in two places—in the DTS Import/Export Wizard and the Copy Database Wizard—sometimes confused users. To eliminate this duplicate functionality, we consolidated the copy functions into the Copy Database Wizard. The change should clarify which tools to use for specific tasks. For quickly building packages to move data from one place to another, use the Import/Export Wizard. For moving database objects—from whole databases to bits and pieces of a database—use the Transfer Database Wizard. And to create packages that perform simple to complex transformations on data at high throughput rates, use the DTS Designer.

Although we simplified the Import/Export Wizard, it's still a fairly complex and flexible tool. Depending on the settings you choose and the options you select, you can take many different paths through the wizard, so I can't cover every wizard setting here. Therefore, as I walk you through the creation of a simple data-import package in the enhanced Import/Export Wizard, I highlight features that are different from DTS 2000. In a future article, I'll explain what the wizard is doing behind the UI by showing and editing the resulting package in the DTS Designer.

Getting Started

If you're an Enterprise, Universal, or Professional Microsoft Developer Network (MSDN) subscriber, you can download the DTS 2005 Import/Export Wizard at http://msdn.microsoft.com/subscriptions/downloads. To simulate a typical data-import scenario for this article, I generated a flat file containing customer-address data from the SQL Server 2005 AdventureWorks sample database. I exported customer addresses by using the T-SQL query that Web Listing 1 shows to combine some of the AdventureWorks tables into a flat file called AWCustomers.txt. (You can download the listing at InstantDoc ID 43805.) I've purposely kept the data simple so that we can focus on the tools. This example simulates a typical data load from a flat file into a SQL Server table.

You can launch the Import/Export Wizard from four different locations. At the command prompt, you can type DTSWizard.exe to launch the wizard. In the Business Intelligence Workbench—the new DTS development environment that you can launch from the SQL Server Start menu—you can launch the wizard from the DTS Packages project node. As in SQL Server 2000, SQL Server 2005 setup adds a DTS Wizard menu item to the SQL Server menu tree in the Start menu, so you can also launch from there. And in SQL Server Management Studio, you can launch the wizard from any database node. The first thing you'll see when you launch the wizard is the welcome screen. The first time you run the wizard, select the Do not show this starting page again check box and rid yourself of this extra step on subsequent runs.

Configuring the Data Source

Figure 1 shows the next window, the Data Source dialog box, where I specify the source data. The drop-down box at the top of the window is for choosing the data-source provider type. This drop-down box also hides a subtle difference in how DTS now handles flat files. In SQL Server 2000, DTS uses a text OLE DB driver to access flat files. DTS 2005 introduces a new pipeline component called an adapter. (To learn about the pipeline—aka the Data Flow Task—see "What's New in DTS?") Source adapters are the pipeline components that interact with source systems and files. So, this drop-down box actually shows available providers and adapters together. For this example, I want to import a flat file to SQL Server, so I selected the Flat File Data Source. In the File name box, I specified the AWCustomers.txt file, which you can download at InstantDoc ID 43805.

After you select the provider or adapter, the dialog box controls you see will be based on your selection. Notice also that the flat-file format I've specified is Ragged Right, which tells the flat-file adapter how it should parse the source file. Ragged-right support is new in DTS 2005 and is the same as fixed-width format, with one exception: Each row is delimited with a CR/LF combination. The other formats available are fixed-width and delimited. Fixed-width format means that the data is aligned into columns whose width is the same for all rows. Delimited data means that the columns have some delimiter such as a semicolon, tab, or comma; comma-separated value (CSV) files are an example of delimited data.

Because the example file includes two header rows, one containing the column names and the other containing dashes, I could set the Header rows to skip option to 2 and ignore the first two column-header rows. But I selected the Column names in the first data row check box instead. This selection isn't necessary, but it lets the adapter pick up the column names, which might be useful later if I want to edit the package in the DTS Designer. In addition, working with real column names rather than default names such as Column1 and Column2 is always nice. In a moment, I explain how to handle the second row of dashes in the sample data.

The Locale drop-down box lets you select the locale the source connection and adapter will use. The locale setting affects the way DTS converts the data from strings. For example, DTS converts date strings to date representations that are locale-specific. So DTS would convert the date Monday, May 17, 2004 to the short-date format 17/5/04 in the English/New Zealand locale but to 04/5/17 in the English/South Africa locale. By default, the wizard selects the locale of the machine on which the wizard is running. If you change the locale, the wizard tries to make life a little easier and selects the default code page for that locale. The code page isn't bound by the locale setting, however, and you can select any code page. As Figure 1 shows, I selected the Unicode setting, so the Code page drop-down box is disabled.

You can use the Header row delimiter setting to work with flat-file formats that differentiate between header rows and data rows. These file types use one character to delimit the header row and another to delimit data rows. The reasons to differentiate between header and data rows are varied; for example, some legacy bulk-insert systems skip a row if the header isn't delimited differently from the data rows. If your flat file is formatted this way, the new wizard supports it.

The Text qualifier box is useful when the source file is delimited. For example, if your data source is a comma-separated file and you specify a double quote (") as a text qualifier, the wizard will ignore commas in text inside double quotes. So in the string "15, rue Descartes", the wizard won't interpret the comma as a delimiter. Text qualifiers are typically double or single quotes but can be user-defined as well.

Once you set the source file and format, you can set up the column widths. Click the Columns node in the tree in the left pane of Figure 1's window to open the Columns dialog box that Figure 2 shows. Because I selected ragged-right format, I need to define the width of each column. If the source file were delimited, the wizard would automatically detect the column widths for you. Clicking the ruler control creates a marker that lets you choose where the columns start. You can remove the column marker by double-clicking the marker or right-clicking it and selecting Remove from the resulting context menu.

The next step is to specify the column properties. After you set column widths, click the Column Properties node in the tree in the left pane to open the dialog box that Figure 3 shows. Here, you can set type-specific properties to ensure each column is converted correctly. For example, if a column contains dates, you'd specify the data type DT_DATE for that column. The wizard automatically adds column definitions based on the width settings that Figure 2 shows. However, if for some reason you need to modify any column property manually, you can use the New and Delete buttons to do so.

At the bottom of the Column Properties dialog box is the Suggest Types button. Clicking it brings up the Suggest Column Type dialog box. This feature does some smart processing to determine the type of each column in the source flat file. You should use this feature to automatically assign column types because it's simple, fast, and usually correct. Assigning the appropriate column types makes it easier to modify the resulting package and optimizes the resulting data flow by assigning the smallest appropriate types to the rows.

After setting the column properties, you're ready to look at the data preview that Figure 4, page 19, shows. For our example data, everything looks accurate—except that the one row with dashes in it below the column names is gone. I told the wizard to skip that row by setting Data rows to skip to 1. Other than the rows-to-skip setting, this final Choose a Data Source window is mostly designed to let you verify that all the settings are correct. This window, new in DTS 2005, lets you scroll through all the columns and look for problems such as columns that are too narrow or invalid rows.

Configuring the Destination

When you've looked at the Data Preview window and you're satisfied that all the settings are right, you're ready to set up the destination file. For this example, I created a database called Addresses and set the wizard to load the data into that database. The wizard lets you create a new database by clicking New in the Choose a Destination dialog box that Figure 5 shows. I typed in the name Addresses in the resulting Create New Database dialog box, accepted the rest of the default settings, and closed the dialog box by clicking OK. The Addresses database is now selected in the Database field of the Choose a Destination dialog box. So in this step, I created a database and the destination connection.

After you set up the destination connection, you need to specify which table to load into and how to map the columns from the source flat file to the destination table. Clicking Next in the Choose a Destination dialog box brings up the Select Source Tables and Views dialog box that Figure 6 shows. This figure shows only one table because I'm importing from a flat file. If I were importing from a database, I'd likely have more than one table from which to choose. I selected the AWCustomers.txt file because that's my source data file. I also need to define the column mappings to tell the wizard where in the destination table to put each flat-file column. I do that by clicking Edit..., which brings up the Column Mappings dialog box that Figure 7 shows.

The Column Mappings dialog box is where you map the source columns to the destination columns. You can remove columns by clicking the destination for a column and selecting <ignore> from the drop-down box that pops up. Note that I selected the Nullable box for the EMail Address and Address Line 2 columns. Selecting Nullable tells the wizard to let these columns be NULL when it's creating the destination table.

Because I'm loading data to the destination table for the first time, I need to create the table. Clicking Edit SQL in the Column Mappings dialog box opens the Create Table SQL Statement dialog box, which shows the script DTS has automatically generated to create the destination table. You'll notice that in the generated script, there's no NOT NULL constraint for EMail Address and Address Line 2 because I selected Nullable in Figure 7. If I need to modify the script in any way, I can do so here. Clicking OK in the Create Table SQL Statement dialog box and again in the Column Mappings dialog box takes you back to the wizard's Select Source Tables and Views screen, which Figure 6 shows.

Saving and Running the Package

To save the new DTS package you've created, click Next to open the Save or Schedule Package dialog box. This is where you save the package either to SQL Server or to an XML file by using the DTSX filename extension. Saving to XML is new in DTS 2005. XML files are easier to read and modify than packages saved in structured storage. Also, more tools are available to support processing XML files. So Microsoft decided to save packages to XML and phase out support for structured storage files.

You also have the option to schedule a package to be executed. In SQL Server 2005 Beta 2, this option is disabled, and the DTS development team still hasn't decided how this feature will work, so this dialog box will likely change. Note that two save options that were available in the DTS 2000 Import/Export Wizard are missing. DTS no longer supports saving to Metadata Services because Microsoft has phased it out. And saving to script isn't currently supported in Beta 2 but might be supported by the time DTS 2005 ships.

Depending on which save option you chose—SQL Server or XML file—the next dialog box lets you enter a SQL Server or XML file name to save to. SQL Server will save the sample package, for example, in a new table in the sysdtspackages90 MSDB database. If you choose to save to XML, you get a different dialog box than you get if you choose to save to SQL Server.

After you give the wizard all the information it needs to save the package, you'll see a summary in the Complete the Wizard window, which Figure 8 shows, that reports all the operations the wizard has been configured to perform. Showing everything together in one report like this is another way the new wizard makes double-checking your work easier. If you're happy with the setup and the summary looks right, you can execute the package. If you selected the Run Immediately option in the Save or Schedule Package dialog box, clicking Finish in the Complete the Wizard window will run the resulting package. If you click Finish and you get errors, you can go back into the wizard and modify settings. This feature is great because you can test the package and correct any errors while retaining all the settings.

After you tell the wizard to execute, it first must build the package. Even if you choose not to run the package immediately, the status window that Figure 9 shows will appear when you click Finish on the summary screen. All the rows in the Action column up to the Saving entry are reporting the success or failure of package creation. Because I opted to run the package immediately, the Action column also shows additional rows such as the Executing, Post-execute, and Cleanup steps that report the results of executing the package. If errors occur when the package runs, an Error link will appear in the Message column. Clicking the link opens another window that shows you the errors.

In this example, the status window shows that the package execution was successful, so clicking the 18508 rows transferred link in the Message column will show the package output. DTS 2005 output is quite verbose compared to DTS 2000 output, and the extra information should help you understand what's happening inside successful packages as well as troubleshoot problem packages. Notice also in Figure 9 the Report button, which lets you save the package output to a report in a file, save it to the Clipboard, or send it as an email message. Clicking Filter lets you filter the entries. For example, you can filter to show only failed or only successful operations.

As you can see, the new DTS Import/Export Wizard still looks a lot like the old wizard, but it has some substantial changes. The new wizard focuses on moving data, with the transform and transfer features gone, and it lets you more easily see and correct problems as you build packages.

Comments

Plain text