Gathering File System Data

Downloads
93451.zip

I recently encountered a problem with NTBackup that required me to develop a new perspective on the common scripting task of iterating through collections of folders and files. The problem was that a user had accidentally deleted a file on one of our company's file servers, but she had only a vague idea of the file's name or its folder's name. I couldn't find the file by using the volume shadow copy replicas on the server, so I put in the previous night's backup tape, fired up NTBackup, and started looking for the file in NTBackup's restore window. At this point, I became painfully aware that NTBackup doesn't provide a way to search a backup catalog.

A Scripted Solution
My first idea to solve this problem was to create full-detail backup reports using NTBackup's /L:f option. (You can read about and download my NTBackup scripts in the Windows IT Pro article "Want a Flexible Automated Backup Solution?" http://www.windowsitpro.com, InstantDoc ID 44990.) When you run backups with this option, NTBackup logs the full path and filename for each file in the backup. This method has some shortcomings, though. The first is that the log files can be quite large, depending on the type of backup. If you email or print a large log file after a backup, you could generate a lot of network traffic or use a lot of paper.

Another limitation is that the log file doesn't list the sizes or the last modification dates of the files. Because of these concerns, I decided to script my own solution that would build a database of files based on the backup job's backup selection (.bks) file.

When you create a scheduled NTBackup job, it creates a Unicode text file with a .bks extension that contains a list of files and folders selected for the job. You can also create a .bks file manually in the NTBackup GUI if you choose the Save Selections option from the Job menu. I decided to see whether I could open a .bks file, step through each selected file and folder, and save the names of these files to a comma-separated value (CSV) file. I could use the FileSystemObject object to connect to each folder or file named in the .bks file and use the TextStream object's Write and Write-Line methods to save these files' names to an output file.

However, the Write and WriteLine methods cause some problems when working with database data. After you've written a line of text to a text file, you can't go back and access the line again without closing the text file, reopening it, and reading each line starting at the top of the file. This is a problem because .bks files contain lines that end with the string /Exclude to indicate that the folder or file is to be excluded from the backup. If my script used the FileSystemObject object to go through the .bks file, line by line, adding files and folders to a text file, what would the script do when it encountered an /Exclude line? Using only the FileSystemObject object, the script would have to close the output file, open it again, look for files and folders that matched the /Exclude line, and write a second text file that didn't contain the excluded filenames. This process was cumbersome.

A More Robust Approach
Instead of the TextStream object, I decided to use an ActiveX Data Objects (ADO) Recordset object. (See the Web-exclusive sidebar"ADO Recordset Objects," InstantDoc ID 93632; for download information, see On the Web, page 1.) For readers who haven't used one before, a Recordset object is an in-memory collection of data that's often (but not always) retrieved from a database. Like a database table, a Recordset object has fields (columns) and records (rows).

A Recordset object's data doesn't have to come from a database, however. A Recordset object without a data source is called a disconnected recordset. If you decide to use a disconnected recordset, your script will need to define the Recordset object's fields and add the records using the Recordset object's methods.

ADO can also connect to Microsoft Access .mdb files, SQL Server databases, and more. In this case, however, I decided to stick with CSV text files, because CSV is a universally understood and readable format. I also wanted to avoid the overhead of proprietary database formats.

The FileDB Object
To accomplish my objectives, I wrote a Windows Script Components (WSC) object, FileDB.wsc, which can add the path, filename, size, and last modification date and time for files to a disconnected recordset. It can also easily remove files from the recordset. When the Recordset object contains the needed records, the FileDB object can create a CSV file containing the contents of the recordset. You can download FileDB.wsc from the Windows Scripting Solutions Web site. (See page 1 for download information.) In this article, I'll explain how the FileDB component works. In a future article, I'll provide a script that uses the component to create a CSV file based on the data in a .bks file.

Before you can use the FileDB object in a script, you need to register it on the computer. The FileDB component's programmatic identifier (ProgID) is Penton.FileDB. For more information about how to register a WSC file, see my article "WSH, Part 3: Windows Script Components" at http://www.windowsitpro.com/windowsscripting, InstantDoc ID 49092. Before I go into detail about how the FileDB object works, however, I need to provide a bit of background about the Recordset object.

The Recordset Object
As I mentioned previously, a Recordset object is an in-memory database table, so it contains fields (columns) and records (rows). The Recordset object's fields are accessed through its Fields collection. To add a field, use the Fields collection's Append method. The Append method's parameters define the field's name, data type, and size.

Listing 1 shows how the FileDB component creates the Recordset object and adds the needed fields. Because the FileDB component is going to write the data to a CSV text file, it stores all field values as text (the adVarChar parameter). The Recordset object's Path and Name fields hold the name of a file's parent folder and the name of the file, respectively. In Windows OSs, folder and file names can't be longer than 256 characters, so the Recordset object's Path and Name fields need to be 256 characters to allow for the longest possible folder or file name.

The Recordset object's Size property-returns a variant that contains a number. This variant requires 16 bytes of storage; hence, the Size field must be 16 characters. When the FileDB component writes the number in the Size field to the CSV file, the component automatically converts the number to a string.

The date in the Date field is stored in the following format: yyyy/mm/dd hh:mm. JScript doesn't have a method that outputs a date in this format, so I overrode the JScript Date object's toString method. This date format is 16 characters long, so the Recordset object's Date field is set to hold 16 characters. Figure 1 shows an example of the Recordset object's layout with the Path, Name, Size, and Date fields and a pair of sample records.

If you read the FileDB component's source code carefully, you might notice that I don't declare the adVarChar constant used in Listing 1. I bypass this requirement by including the XML <reference> element near the top of the component. For those who haven't used it before, the < reference> element provides access to the type library information for a specified ProgID. This means the component can use that object's constants without having to explicitly declare them first.

The FileDB Object's Methods
You can see a list of the FileDB object's methods in Table 1. Here's how each method works:

The Include method. The Include method, which Listing 2 shows, uses the FileSystemObject object to obtain a collection of files to be added to the Recordset object. The Include method's syntax is fairly specific. To add all files in a folder, the Path parameter must end with a backslash (\). Without the trailing backslash, the Include method assumes you want to add a specific file. To include files in subfolders of the specified folder, the Recurse parameter must be true.

Callout A in Listing 2 shows how the Include method adds information about a file to the recordset. The method follows these steps:

It calls the Recordset object's AddNew method. This creates a temporary record that hasn't been saved to the recordset yet.
It sets each field's value. In this case, the Path field will get the Parent-Folder property's value, the Name field will get the Name property's value, and so forth.
It uses the Recordset's Update method to save the record to the recordset.

The Exclude method. This method, which uses the Recordset object's Filter property to select the records in the recordset that match the folder or filename specified in the Path parameter (see Web Listing 1). The Exclude method's Path parameter must end in a backslash if you want to remove every file in a particular folder. Otherwise, the Exclude method will assume you want to remove a specific file. Unlike the Include method, the Exclude method doesn't use the FileSystemObject object. This object doesn't let you use wildcards (or something similar) to filter a collection of files by file type or filename.

Fortunately, the Recordset object's Filter property provides the needed filtering capabilities. In the Exclude method, the Filter property is set to a string such as

Path Like 'foldername%'

where foldername is the name of the folder. If the Recurse parameter is true, the percent (%) character is included. Otherwise, it's excluded. (I'll provide more information about the percent character shortly.)

When the Path parameter doesn't end with a backslash, the Exclude method assumes you're searching for a specific file. In this case, the Exclude method extracts the file's parent folder name, gets the Files collection for that folder, and iterates through the collection. The Exclude method selects the record in the recordset that exactly matches the name specified in the Path parameter. The Exclude method then deletes the file associated with that record.

When the Path parameter ends with a backslash, the Exclude method removes the trailing backslash from the Path parameter and works with the entire Files collection. When the percent character isn't present in the Filter property's string (i.e., Recurse is false), the Exclude method searches for folders whose names start with the folder name specified in the Path parameter. When the percent character is present (i.e., Recurse is true), the Exclude method searches for folders and their subfolders whose names match the folder name specified in the Path parameter. The method then deletes all the files in the folders found by the search.

The Exclude method ends by setting the Filter property to an empty string. This removes the filter and gets the method ready for the next search.

The WriteCSV method. The Write CSV method writes a recordset to a CSV file. The method's first parameter is the name of the CSV file to save; its second parameter is a Boolean (true/ false) value that tells the method whether it's allowed to overwrite the file if it already exists. The FileDB object's source code contains two functions, WriteFields and WriteRecords, which use a TextStream object to write the Recordset object's fields and data to a CSV file. The first line of the CSV file will contain the field names. Figure 2 shows what the CSV file looks like based on the recordset in Figure 1.

As I was developing FileDB, I discovered that the Recordset object doesn't account for duplicate records, so a file can be added to a recordset more than once. To account for this, I needed to filter out duplicate records because the CSV file should contain only one line for each file.

To filter out duplicates, the WriteCSV method first saves the Recordset object's data to a temporary CSV file using the WriteFields and WriteRecords functions, then closes the Recordset object. Next, it creates a Connection object and configures a connection string that sets the temporary file's directory as a data source. The WriteCSV method opens the temporary CSV file as a recordset and uses a Select Distinct query to filter out duplicate records. After this step is completed, the WriteCSV method uses the WriteFields and WriteRecords functions to write the final copy of the CSV file. Last, the method deletes the temporary CSV file. You have to create a temporary CSV file first because there's no way to execute the Select Distinct query against a disconnected recordset. (You can't execute SQL statements such as the Select Distinct query without a data source.)

A Sample Script
Listing 3 shows a sample script, CurrentDirectory.vbs, which uses the FileDB component to create a CSV file containing a listing of the files in the current directory. First, the script declares the FileDB variable that will hold an instance of the FileDB object. Next, it creates the FileDB object using VBScript's CreateObject function. It then uses the FileDB object's Include method to add all the files in the current directory (the trailing backslash is required to tell the Include method that we're giving it a folder name). The second parameter (Recurse) is set to False, so files in subfolders will not be added to the recordset. Last, Current-Directory.vbs uses the FileDB object's WriteCSV method to create a file called CurrentDirectory.csv. The Write CSV method's second parameter is set to True, so the CSV file will be overwritten if it already exists.

The FileDB component has many potential uses. For example, a file server administrator might want to store the data for files in a shared network folder. The administrator can then import the CSV file into a database tool (e.g., Microsoft Access) and perform useful queries or reports on this information (e.g., a listing of the 100 largest files in the database). Next month, I'll provide the script that opens an NTBackup Bks file and creates a CSV file based on its contents.

Know Your Limits
Keep in mind that the FileDB component stores information about each file in memory. If you run the component against a folder structure with a large number of folders and files, it could potentially consume a substantial amount of memory.

Comments

Plain text