Over the last several weeks, I've been working extensively with comma-separated value (CSV) files. One simple example is the component I described in "Gathering File System Data" (October 2006, Instant-Doc 93451), FileDB.wsc, which saves information about files to a CSV file. CSV files are plain-text database files in which each line represents a record (row) that's divided into fields (columns), with commas separating data items from each other, as Figure 1 shows. The first line (sometimes called a header line) of the CSV file contains the field or column names, and subsequent lines contain the data. Double-quote characters define individual items in case a data item contains a comma. CSV files can contain an arbitrary number of lines, and they might not contain a header line that contains the field names. As I worked with the output from FileDB.wsc, I realized that an HTML application (HTA) would be an ideal way to quickly and easily view CSV files. Let's take a look at the HTA I wrote and see how the scripting behind it makes it work.
An HTA is a dynamic HTML (DHTML) document with an .hta extension that runs at a higher privilege level than Internet Explorer (IE). As such, HTAs can use any available ActiveX objects on the computer, even those that aren't marked as "safe for scripting" for IE. For example, an HTML document that attempts to use the FileSystemObject will generate a warning message if you open the document in IE because the FileSystemObject is not marked "safe for scripting." (And this is a good thing, because otherwise any Web site that you browse in IE could potentially access the local file system.) In contrast, HTAs run in mshta.exe, which allows access to all ActiveX objects on the computer. As such, HTAs need to be treated with the same care as scripts or executables, but they provide a convenient way to create GUI-based applications.
The Microsoft Tabular Data Control (TDC) simplifies the creation of an HTA for viewing CSV files. The TDC is an ActiveX object that lets you display CSV-based data in an HTML table. It's a core IE component that first appeared in IE 4.0 and is available on every Windows OS that has IE 4.0 or later. The HTA I wrote, CSVviewer.hta, is a convenient tool for anyone who needs to view a CSV file but doesn't need a data-analysis tool such as a spread-sheet or a database. The CSVviewer .hta script is too long to print in its entirety here, but you can download it from the Windows IT Pro Web site at http://www.windowsitpro.com/windowsscripting, InstantDoc ID 94260.
The CSVviewer.hta script generates a screen like the one that Figure 2 shows. The HTA is straightforward: Simply enter the name of a CSV file or click Browse to select a file, then click OK. If the CSV file's first line doesn't contain field names, deselect the First line of CSV file contains field names check box. The Display x records per page setting (where x is a number between 1 and 999) defines the number of CSV file records that the application will display at one time. To clear the form and start over with a different CSV file, click Reset. Finally, you can click Exit to close the application.
After you've selected a CSV file and clicked OK, the data from the file appears in a table in the lower half of the window along with a Filter button and a set of navigation buttons, as Figure 2 shows. The Filter button provides access to the TDC's Filter property, which I explain in a moment. The First Page, Last Page, Previous Page, and Next Page buttons let you navigate through the data. You can also sort the data by clicking the column headings. The first click of a column heading will perform an ascending sort; subsequent clicks on the same column heading will reverse the sort order with each click.
A Peek Under the Hood
CSVviewer.hta is easy to use, but there's plenty of scripting under the hood that makes it work. I wanted CSVviewer.hta to be able to display data from an arbitrary CSV file; that is, I didn't want to "hard-code" the column names and the CSV file's name into the document. The CSVviewer.hta script accomplishes these goals by reading the column headings and setting the appropriate TDC properties at runtime.
Creating the object reference. Before you can use the TDC in an HTML or HTA document, you need to insert an <object> element for it. CSVviewer.hta uses the following HTML code to insert a TDC object:
<object id= "csvdata" classid= "clsid:333C7BC4-460F- 11D0-BC04-0080C7055A83"> <param name="CaseSensitive" value= false //> </object>
The id attribute provides a named reference for the data in the table to connect, or bind, to the TDC's data source (a CSV file). The classid attribute must always be set to this particular value because it specifies the TDC's globally unique identifier, which remains constant regardless of the computer it's used on. The <object> element can include one or more <param> elements that specify the TDC's initial property settings. CSVviewer.hta sets the TDC's CaseSensitive property to false, which means that data comparisons will ignore case. The other TDC properties (such as the DataURL property that specifies the CSV file) are set at runtime. For more information about the TDC, see "About Micro-soft Tabular Data Control" (http://msdn.microsoft.com/library/default.asp?url=/workshop/database/tdc/overview.asp).
Selecting a file. I wanted the HTA to provide an interactive way of selecting the CSV file to view. HTML provides the <input type="file"> element, which includes a Browse button. This element's main purpose is to provide a way to upload files to a Web server, but CSVviewer.hta uses it to select a file. The id attribute makes the element's contents (i.e., the typed or selected filename) easily accessible to the HTA's script code.
Verifying that the file exists. An <input type="submit"> element provides the OK button. This element's onclick event handler calls the processFile function; that is, the HTA executes the processFile function when you either press Enter or click OK. The processFile function, which you can see in Listing 1, uses the File-Exists method of the FileSystemObject to determine whether the selected file exists. If the file doesn't exist, the function uses the alert method to display an error message; otherwise, the function sets the global variable filename to the name of the selected file. The function then executes the getFields and putData methods, which I describe in a moment.
Processing the CSV file's fields. After the HTA verifies that the file exists, it executes the getFields function. As Listing 2 shows, the getFields function populates the fields array (a global variable) with the field names from the CSV file. The function uses the FileSystemObject's Open-TextFile method to open the CSV file and read its first line, then closes the CSV file. Next, it empties the fields array by setting its length property to zero. The function uses the split method to return an array of substrings, using the comma (,) character as a delimiter. Be aware that the get-Fields function uses the split method to parse the first line of the CSV file, so the function will fail if a field name contains a comma (even if it's enclosed in quotes).
The getFields function then uses the for statement to iterate the array of substrings that the split method creates. If the useHeader function returns true (which will be the case if you've selected the check box), the function uses the replace method to remove any double-quote (") characters from the field name. Next, the function determines whether the field name contains the colon (:) character, indicating a field type. (I'll discuss field types in a moment.) If the field name contains a colon, the getFields function removes the colon and the remainder of the string so that only the field name is added to the fields array. For example, if the field is named Size:Int, the fields array will contain only Size (the :Int is not part of the field's name).
If the useHeader function returns false, the getFields function will populate the fields array with the field names Column1, Column2, Column3, and so forth. The TDC requires these field names if it's processing a CSV file that doesn't have a header line.
Defining the field types. When a CSV file contains a header line, a field name can include a colon (:) and a field type. Acceptable field types are String (the default), Date, Boolean, Int, or Float. For example, if the field's name is Size:Int, the TDC will use Size as the field's name and Int as its type. Without field types, the TDC will sort information as text strings, so if you want to sort a numeric field, make sure to specify its type in the CSV file before viewing it with CSVviewer.hta. See http://msdn.microsoft.com/workshop/database/tdc/reference/use header.asp for more information about field types.
Creating the output table. After the getFields function determines the list of field names, the processFile function executes the putData function to create the output in the bottom section of the HTA window. The putData function dynamically populates the data area of the document with the CSV data by setting the innerHTML property of the <div> element at the bottom of the document.
The putData function starts by creating a series of HTML elements that contains the Filter and navigation buttons. Next, the function creates the HTML table element that connects to the TDC as a data source by setting the table's datasrc attribute to #csvdata (the leading # character is required). The function also sets the table's id attribute, which binds the navigation buttons to the table; that is, the Previous Page, Next Page, First Page, and Last Page buttons use this ID to update the table on the screen with the data from the CSV file. The table's datapagesize attribute comes from the Display x records per page text box near the top of the document.
After creating the opening <table> tag, the putData function adds a <thead> tag and a <tr> tag and iterates the array of field names populated by the getFields function. For each field name, the function creates a <th> element, with each <th> element's id attribute set to the field's name. It also sets the <th> element's onclick event handler to a sorting function, which I describe shortly. After the function iterates the field names, it closes the table heading by adding the </tr> and </thead> tags.
Next, the putData function adds a <tbody> tag, an opening <tr> tag, and iterates the fields array a second time. For each field name, the function creates an opening <td> tag, a <span> element with its datafld attribute set to the TDC object's id, and a closing </td> tag. The datafld attribute is how the table rows bind to the TDC data source. After iterating the array, the function adds the closing </tr>, </tbody>, and </table> tags. Figure 3 shows the HTML table the putData function will create if you view the CSV file shown in Figure 1.
After the putData funtion has generated the HTML that contains the navigation buttons and the table, it sets the innerHTML property of the document's <div> element to the generated data. To complete its work, it sets the TDC's DataURL property to the CSV file's name, sets its UseHeader property, and calls the TDC's Reset method to display the data.
Filtering the data. When you click Filter, CSVviewer.hta calls the setFilter function, which uses JScript's prompt method to display a simple dialog box that requests a filter string, as Figure 4 shows. The string you enter in this text box is set as the TDC's Filter property. See http://msdn.microsoft.com/workshop/database/tdc/reference/filter.asp for information on filtering the TDC's data.
Sorting the data. The TDC initially displays an unsorted view of the CSV file, but CSVviewer.hta makes sorting easy. When the putData function constructs the HTML table, it creates <th> elements with id attributes set to the CSV file's field names. It also configures each <th> element's onclick attribute to call the sortBy function, with the field name as a parameter (see Figure 3 for an example). The sortBy function sets the TDC's Sort property to the column name that you clicked, then calls the TDC's Reset method to redisplay the data.
Know Your Limits
CSVviewer.hta relies on the TDC to connect to the CSV file and retrieve its data, so it's subject to the TDC's limitations. If you open a very large CSV file, be aware that the file might take a long time (and a lot of memory) to open. Also, take care that the CSV file doesn't start with a blank line, which will cause the TDC to not display any data at all. Even with these limitations, CSVviewer.hta is a useful tool for viewing CSV files.