A simple solution for downloading and processing Web data

Businesses increasingly need to pull data from the Web into internal databases and spreadsheets. This common task has spawned a proliferation of custom applications that all do the same thing. The applications have code for downloading files from the Web, logging successes and failures, running the application on a predetermined schedule, and notifying the administrator of failures and completions. In devising methods of handling common tasks, you need to find a method that provides quick turnaround; provides a consistent way to easily manage, monitor, and modify these applications; and uses as many of the OS's services as possible.

For example, let's consider the business problem of bringing data from the Web into internal databases. This task seems to be a good fit for SQL Server 7.0's Data Transformation Services (DTS). A good solution for downloading flat files and Microsoft Access or other data would be to use DTS for data pickup and massage and to use the SQL Server 7.0 Job Scheduler for scheduling, notifying, and logging. DTS is a welcome addition to SQL Server's arsenal of tools. But in its current form, DTS doesn't interact well with the Web because it can't access data by means of a URL and it doesn't support Web site logon authentication. Someone will likely soon produce an OLE DB driver for Web data, and one for XML won't be far behind. In the meantime, however, DTS does let you run VBScript, and you can instantiate COM-compliant objects from VBScript. You could use Visual Basic (VB) to create a simple, generic object for downloading files from the Web. When the object is complete, you'll have a framework for processing data from the Web.

In this article, I demonstrate how to use VB to make a simple ActiveX DLL with the sole function of downloading files from the Web. (If you don't like to write code, you'll like this project. It has only a couple of dozen lines of code, and after you've written and compiled it, you shouldn't have to look at it again.) After the DLL is complete, you can concentrate on writing a DTS package to process the data. Finally, you use Job Scheduler to schedule DTS package execution, set notification options, and set logging options. This design will let you reuse the same code and application services for downloading files, event logging, notifying, and scheduling. And because you set up the scheduling and failure notification with Job Scheduler, you can easily roll the application over to the support team.

Creating the ActiveX DLL

If you don't have VB 6.0, you can use regsvr32 to register the DLL by opening the Run dialog box and typing

regsvr32 "C:/WebDataFetcher/ WebDataFetcher.dll"

to replace the first occurrence of WebDataFetcher with your file path to the WebDataFetcher.dll file. Now you can start creating the project. First, open VB. If you don't see the New Project dialog box, Select File, New Project from the File menu list. Select the ActiveX DLL icon from the New Project dialog box, as Screen 1 shows. Next, open the Project Properties dialog box by selecting the Project1 Properties menu item in the Project menu list. Then, change the project's name and description in the Project Properties dialog box, which Screen 2 shows, to WebFileFetcher. Open the class Properties dialog box by pressing Ctrl+R. Select Class1 from the list, and press F4. Then, change the name of the class to CWebDataFetcher, as Screen 3 shows. Finally, press Ctrl+T to open the Components dialog box, and select the Microsoft Internet Transfer Control 6.0 check box, as Screen 4 shows. You can use several other controls, but this one is simple to use and supports FTP, HTTP, and HTTP over Secure Sockets Layer (HTTPS).

To add a form to the project, you click the Add Form menu item in the Project menu list. Press F4 to open the form's property list, then change the form's name property to frmGetFile. You're now ready to add the Internet Transfer control, which is a VB control that implements FTP and HTTP, to the frmGetFile form. To add the control, click the toolbox menu item in the View menu list, then double-click the Internet Transfer control. Finally, add the code in Listing 1 to the CWebDataFetcher class. You can open the CWebDataFetcher code window by pressing Ctrl+R and double- clicking the CWebDataFetcher item.

Here's a short explanation of the code you add to the CWebDataFetcher class. By calling the GetFile function, you pass in a URL (for example,, filename and location (C:\test.mdb), username, and password. Pass in empty strings for the username and password parameters if you don't need them. (You'll need these parameters, for example, to restrict users' access to secure Web sites.) The code in Listing 1 loads the frmGetFile form. The line of code

frm.GetFile.Inet1.URL = aURL

sets the Inet1 control's URL property to the aURL variable, which the code passed into the function. The aURL variable will contain the address and name of the file that you want to download from the Web. The code passes the filename and address to the Inet1 control.

The next line of code

If aUsername <> "" Then

is a conditional statement that checks to see whether the aUsername variable isn't an empty string. If the aUsername variable isn't an empty string, it will set the Inet1 control's Username and Password properties to values contained in the aUsername and aPassword variables, respectively. The line of code

b() = frmGetFile.Inet1.OpenURL(, icByteArray)

calls the OpenURL method of the Inet1 control, requesting it to return the file at the requested address as the ByteArray variable type and put it in the b() ByteArray variable. The line of code

Open aFile For Binary Access Write As #1

creates the file specified in the aFile variable. The code

Put #1, , b() 

then writes the b() ByteArray variable, which contains the Web file, to the file opened in the previous line of code. The code

Close #1

closes the open file. The code then unloads the form. (The code also contains some simple error handling for completeness.) The project to download and process Web data is now complete. Compile the project by clicking the compile WebDataFetcher.dll menu item in the File menu. For information about debugging the project, see the sidebar "How to Debug Visual Basic COM DLLs Used in DTS," page 55.

Creating the DTS Package

Now, let's create a simple DTS Package that downloads a text file from the local Web server. This text file contains a couple of new stores for the Pubs database. The package will insert these new stores into the Stores table in the Pubs database. First, open SQL Server Enterprise Manager and connect to your local database. Right-click the Data Transformation Services folder, and select the New Package menu item. On the Task menu, drag an ActiveX Script Task into the package. Name the ActiveX Script Task Download Webdata.txt, then put the code from Listing 2 in the Script window, as Screen 5, page 53, shows. This code creates an instance of the CWebDataFetcher class. The code then calls the GetFile function, passing in the location of the file on the Web and the place where you want to save it locally.

Copy the Webdata.txt file—which you can download from the SQL Server Magazine Web site at http://www.sqlmag .com, Article ID 8413—to the root directory. (The Microsoft Internet Information Server—IIS—installation process sets this directory's location. The default is C:\InetPub.) Save the DTS package, then run it. If an error occurs when the task runs, follow the instructions in the sidebar "How to Debug Visual Basic COM DLLs Used in DTS." The Webdata.txt file is now in the root directory of your drive, and you'll use the file to continue building the package.

Now you need to open the DTS Designer and add a text file source to the package. The text file source will serve as the connection that DTS will use to access the Webdata.txt file. To open the Connection Properties dialog box, drag a new text file data icon from the toolbox into the DTS package. Name the connection Webdata .txt, and set the filename to C:\WebData .txt, as Screen 6 shows. Click OK, and the Text File Properties wizard will open. Select the Delimited option, and set the Text qualifier to none, as Screen 7 shows. Click Next. In the Specify Column Delimiter dialog box, select the Comma option. The two rows of data in the WebData.txt file will appear in the Preview window, as Screen 8 shows. Click Finish, then click OK.

Drag a Microsoft OLE DB Provider for SQL Server data icon onto the package. This icon is the server icon in the Data toolbox at the side of the package. Name the connection pubs, set the Server text box to (local), and select the pubs database, as Screen 9 shows. Click OK. To create a transformation between the WebData.txt connection and the pubs connection, hold down the Ctrl key and click the WebData.txt connection, then click the pubs connection. Right-click to open the menu list, then click the Transform On menu item in the WorkFlow menu list. Then, open the transformation's Properties dialog box and give it a description, such as Move the store records from the text file, as Screen 10 shows. Select the Destination tab, and select the \[pubs\].\[dbo\].\[stores\] table in the Table name drop-down list, which Screen 11 shows. Select the Transformations tab, then click OK. Create an On Success workflow between the Download WebData.txt ActiveX script and the WebData.txt source control, as Screen 12 shows. To do this, hold down the Ctrl key and click Download WebData.txt ActiveX script, then click the WebData.txt source control. Right-click to open the menu list, then click the On Success menu item in the WorkFlow menu list.

You can download the completed package from the SQL Server Magazine Web site. If you want to run the package again, delete the newly added rows from the Stores table by using the following SQL statement:

delete from stores where stor_id in ( 1, 2 )

Save the package to the (local) server.

This project gives you a way to download files from the Web within DTS. You can reuse the ActiveX Script Task—Download Webdata.txt—in any DTS project to download and process Web data.

How the SQL Server Agent Fits In

Downloading and processing Web data is a task you'll probably need to perform periodically, so you might want to set up failure or completion notification and logging. The Schedule Package wizard in the SQL Server Agent is perfect for this task. To open the wizard, right-click on a DTS package in Enterprise Manager. The Edit Recurring Job Schedule dialog box, which Screen 13 shows, lets you set various scheduling options, such as frequency and duration. After you've completed the steps in the wizard, you'll have a package that the SQL Server Agent runs on a schedule. You can access the logging and failure notification options through a Job's properties on the Notifications tab, as Screen 14 shows. Set the output file that you want to send the notification to on the Advanced tab of the step's Properties dialog box. After you set the notification options, you have a robust data processing system in place.

The biggest advantage of this solution is that it lets designers apply the business rules to the data. You don't have to write code for scheduling, logging, notifying of failure or completion, or file downloading. This solution also lets you process all Web data in a consistent manner, which makes the data easier for the support team to manage.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.