The HTML Text Filter


Improve performance without losing maintainability

Many Web sites and applications are going through a performance crisis because they either weren't designed to meet performance standards or weren't performance-tested during development and deployment. Developers and systems administrators are trying to make these sites and applications perform better. However, the changes that make an application or site faster often make it harder to maintain.

This dilemma is particularly troublesome for Web applications and sites because they almost always produce HTML that must travel over a network connection to the requesting browser. Any increase in an HTML file's size increases network traffic, which can translate into slower response times for the application's users. The problem is especially evident when users connect to the site through a slow analog modem.

You need to weigh this need for speed against the need for maintainability. To make code easier to maintain, developers insert comments into it. In HTML, a comment block looks like this:

<!--
This is a comment.
-->

The browser recognizes the opening comment delimiter (<!--) and ignores everything after it until the closing delimiter (-->). Comments let developers add notes or other text to a page that describe what the HTML code does. (This comment uses HTML 3.2 syntax; script languages embedded in a page have their own comment formats.)

Developers and designers also use white space in HTML to make the code more legible. Indentation and blank lines break the code into visual sections, which lets a developer or designer grasp its structure quickly. Design tools such as Microsoft FrontPage also insert formatting into the HTML they produce to make the code more readable.

The Microsoft Windows 2000 Resource Kit includes the HTML Text Filter tool, which automatically removes comments and extra white space from .htm, .html, and .asp files. The tool makes it easy to shrink a file before it goes into production, and the smaller file reaches the browser faster.

To test this tool, I created the simple .htm file (commenttester.htm) that Listing 1 shows. To run the tool against this test file, use this syntax:

C:\> htmlfltr commenttester.htm

Demonstrating the Filter
Figure 1 shows the execution of this command and its result (i.e., that the tool compressed one file). When the tool finished processing, the file looked like Listing 2.

Notice that the HTML in Listing 2 starts with a new tag, <! HF>, which the filter inserts to show that it has processed the file. Also notice the structure of the HTML in Listing 2: it isn't as easy to scan as the HTML in Listing 1 because the filter removed most of the white space and squeezed the tags together. For example, the <HTML>, <HEAD>, and <META> tags are all on the same line. Removing white space shrinks the file but makes the HTML much harder to read. This small sample understates the problem: in a file with hundreds of lines of HTML, the missing white space makes the code far harder to follow.
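To make the effect concrete, here's a minimal Python sketch of that kind of white-space squeezing. It's an illustration under simple assumptions, not the HTML Text Filter's actual algorithm: the function name collapse_whitespace is hypothetical, and the sketch doesn't protect <PRE>, <TEXTAREA>, or <SCRIPT> content, where white space can matter.

```python
import re


def collapse_whitespace(html: str) -> str:
    """Squeeze the white space out of an HTML string.

    A simplified sketch, not the HTML Text Filter's real algorithm: it
    does not protect <PRE>, <TEXTAREA>, or <SCRIPT> content.
    """
    # Drop white space that sits entirely between two tags: ">\n  <" -> "><".
    html = re.sub(r">\s+<", "><", html)
    # Collapse any remaining run of white space (inside text or tags) to one space.
    html = re.sub(r"\s+", " ", html)
    return html.strip()


if __name__ == "__main__":
    sample = """<HTML>
  <HEAD>
    <META NAME="Generator" CONTENT="Notepad">
  </HEAD>
</HTML>
"""
    print(collapse_whitespace(sample))
    # prints <HTML><HEAD><META NAME="Generator" CONTENT="Notepad"></HEAD></HTML>
```

Deleting the white space between two tags isn't always harmless, though. In </B> <I>, the space is visible text, so a filter this aggressive would run two words together. That's one more reason to test filtered pages before they go live.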

One other thing changed between Listing 1 and Listing 2: the comment block after the <BODY> tag is gone, but the comment embedded in the <SCRIPT> block is still there. Why? Client script is traditionally wrapped in a comment so that older browsers that don't support client script ignore the code instead of displaying it as text. To honor this convention, the HTML Text Filter detects script blocks and leaves the comments inside them intact. You can also insert the <! NoCompress> tag into a comment block to stop the HTML Text Filter from stripping that comment.
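The rule the filter follows can be sketched in Python. This is an assumption-laden illustration, not the tool's code: it treats everything from <SCRIPT> to </SCRIPT> as off-limits, drops every other <!-- ... --> comment, and keeps a comment if it contains the text NoCompress. The names strip_comments and KEEP_MARKER are hypothetical, the exact marker the real tool recognizes may differ, and a regular expression is a simplification of what a production filter would need (a real HTML tokenizer).

```python
import re

# One pattern that matches either a complete <SCRIPT>...</SCRIPT> block or a
# single HTML comment. Scanning left to right, a script block is consumed
# whole, so comments inside it are never considered on their own.
TOKEN = re.compile(
    r"(?P<script><script\b.*?</script\s*>)|(?P<comment><!--.*?-->)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical keep-marker (compared case-insensitively); the form the
# real HTML Text Filter looks for may be different.
KEEP_MARKER = "nocompress"


def strip_comments(html: str) -> str:
    """Remove HTML comments except those inside script blocks or marked to keep."""

    def replace(match: re.Match) -> str:
        script = match.group("script")
        if script is not None:
            return script  # leave client script, and its comment wrapper, alone
        comment = match.group("comment")
        if KEEP_MARKER in comment.lower():
            return comment  # the author asked for this comment to survive
        return ""  # an ordinary comment: drop it

    return TOKEN.sub(replace, html)


if __name__ == "__main__":
    page = """<BODY>
<!-- Main content starts here -->
<SCRIPT LANGUAGE="JavaScript">
<!--
document.write("Hello");
// -->
</SCRIPT>
<!-- NoCompress: keep this note -->
</BODY>"""
    print(strip_comments(page))
```

Matching script blocks and comments in a single pattern is the design choice that makes the exception work: the scanner never looks inside a script block, so the comment wrapper there can't be mistaken for an ordinary comment.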

The tool accepts two switches. The /Q switch runs the HTML Text Filter in quiet mode, which displays no output. Quiet mode is useful when you run the tool from a script or batch file. The /S switch also processes files in all subdirectories of the specified directory, which is useful when you need to compress an entire directory tree.

You can also have the HTML Text Filter process an entire directory of files by specifying a directory name instead of a filename when you execute the command. This capability is particularly handy when you've copied the files from the development server to a production server and need to optimize them.
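Naming a directory without /S versus with /S is the difference between looking only in that directory and walking the whole tree below it. The Python sketch that follows lists which files a pass would touch under each behavior, assuming the tool targets exactly the .htm, .html, and .asp extensions mentioned earlier. find_filter_targets is a hypothetical helper for taking inventory, not part of the Resource Kit, and it doesn't run the filter itself.

```python
from pathlib import Path

# Extensions the HTML Text Filter is described as handling.
EXTENSIONS = {".htm", ".html", ".asp"}


def find_filter_targets(root: str, recurse: bool = False) -> list[Path]:
    """Return the files under root that a filter pass would process.

    recurse=False roughly mirrors naming a directory on the command line;
    recurse=True roughly mirrors adding the /S switch (subdirectories too).
    """
    base = Path(root)
    candidates = base.rglob("*") if recurse else base.glob("*")
    return sorted(
        path
        for path in candidates
        if path.is_file() and path.suffix.lower() in EXTENSIONS
    )
```

Running a helper like this against the production copy before you filter it gives you a quick list of every file that is about to change.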

Quite often, you'll need to use more than one tool on a file. For example, if you use Microsoft Office to generate HTML for a Web application, you might want to run the Office HTML Filter on those files. This filter removes Office-specific markup from .htm files. The Office HTML Filter is installed automatically as part of the export process in Microsoft Word 2000. After you use the filter, you can no longer edit the files in Word, but the files download, and possibly load, faster. You can find out more about this filter from htmlfilter.htm.

The Bottom Line
How much does the filter affect file size? The original file in Listing 1 is 699 bytes; the compressed file is 554 bytes, a reduction of 145 bytes, or more than 20 percent. This reduction in size has three benefits:
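The arithmetic is easy to reproduce, and worth scripting if you want to measure your own pages before and after filtering. Here's a minimal Python sketch; the function names reduction and file_reduction are hypothetical.

```python
import os


def reduction(before_bytes: int, after_bytes: int) -> tuple[int, float]:
    """Return (bytes saved, percent saved) for a before/after pair of sizes."""
    if before_bytes <= 0:
        raise ValueError("original size must be positive")
    saved = before_bytes - after_bytes
    return saved, 100.0 * saved / before_bytes


def file_reduction(original_path: str, filtered_path: str) -> tuple[int, float]:
    """Compare the on-disk sizes of an original file and its filtered copy."""
    return reduction(os.path.getsize(original_path), os.path.getsize(filtered_path))


if __name__ == "__main__":
    saved, percent = reduction(699, 554)  # the sizes of Listing 1 and Listing 2
    print(f"{saved} bytes saved ({percent:.1f} percent)")  # 145 bytes saved (20.7 percent)
```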

  • The file that IIS reads from disk is smaller.
  • The data that IIS must send across the wire is smaller.
  • The browser doesn't have to process as much data.

The benefits to your users will depend on the amount of compression, the speed of their connection, the type of browser they're using, and the speed of the system running the browser.

As with any tool that works on your applications, test the sites or applications that you process with the HTML Text Filter before you release them into production. Testing matters because the filter could conceivably modify a file in a way that stops it from working correctly. Also, be sure to run the tool on a copy of your source files, never on the originals: filtering radically alters a file's contents and makes it much harder to maintain, so the unfiltered files should remain your working source. With a little care, a tool such as the HTML Text Filter is a great way to optimize the files you deploy while keeping the source easy to maintain.
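One way to follow that advice is to copy the source tree to a separate staging directory and run the filter only there. The Python sketch below makes the copy and refuses a staging location that is the source tree or sits inside it, so filtered files never end up mixed in with the originals. make_staging_copy is a hypothetical helper; you'd still point the HTML Text Filter at the returned directory yourself.

```python
import shutil
from pathlib import Path


def make_staging_copy(source_dir: str, staging_dir: str) -> Path:
    """Copy source_dir to staging_dir so a filter can run on the copy only.

    Raises ValueError if staging_dir is the source tree or lies inside it,
    and FileExistsError (from shutil.copytree) if staging_dir already exists,
    so an old filtered copy is never silently reused.
    """
    src = Path(source_dir).resolve()
    dst = Path(staging_dir).resolve()
    if dst == src or src in dst.parents:
        raise ValueError("staging directory must be outside the source tree")
    shutil.copytree(src, dst)
    return dst
```

Refusing to reuse an existing staging directory is deliberate: starting from a fresh copy each time guarantees that what you test is exactly what the filter produced from the current source.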
