Simple API for XML (SAX)

Microsoft has released the latest version of its XML parser, Microsoft XML Parser (MSXML) 3.0, and it's chock full of capabilities. MSXML's Document Object Model (DOM) implementation is solid, and the release includes Extensible Stylesheet Language Transformations (XSLT) and XPath implementations that conform to the latest World Wide Web Consortium (W3C) Recommendations. You can use MSXML on the client side or the server side, in COM objects or in scripts. Microsoft has included some new features in the package, and one in particular caught my eye: Simple API for XML (SAX). (The SAX specification is currently at version 2.0, so it is often referred to as SAX2.)

SAX lets you access the information in an XML document, but it works very differently from DOM. When you use DOM, you instantiate a DOM object, load an XML document, and access elements and attributes as needed from the in-memory tree. You can transform the document, add to it, and output it in any style you choose. SAX, by contrast, is event driven. The parser doesn't load the full XML document at the start; instead, it reads the document serially, one piece at a time, and raises an event as it reaches each piece (the start of an element, a run of text, the end of an element). For example, consider the following XML document:

<?xml version="1.0" ?>
<company><name>Interknowlogy</name></company>

The parser steps through the document and raises these events (the startDocument and endDocument events that bracket every parse are omitted here):

startElement: company
startElement: name
characters: Interknowlogy
endElement: name
endElement: company
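To see these events concretely, here is a minimal sketch in Java, whose standard SAX2 API (`org.xml.sax` and `javax.xml.parsers`) is based on the same SAX2 design that MSXML implements. It is not MSXML code: the class name `SaxEvents` and the method `collectEvents` are illustrative, and a real handler should expect a single run of text to arrive in more than one `characters` call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEvents {

    /** Parses xml and returns one line per SAX event, in the order the parser raised them. */
    public static List<String> collectEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        // The content handler: each callback records the event it was given.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement: " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters: " + new String(ch, start, length));
            }
        };

        // The parser: attach the handler, then feed it the document serially.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" ?><company><name>Interknowlogy</name></company>";
        collectEvents(xml).forEach(System.out::println);
    }
}
```

Running `main` prints the five events listed above, one per line.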

The basic idea is that you create content handlers and attach them to these events. MSXML provides SAX objects for Visual Basic (VB) and Visual C++ (VC++). SAXXMLReader is the parser object. You create a content handler that implements the events you need and then attach it to SAXXMLReader so that it receives the parsing events. You can also create an error handler to receive error events. MXXMLWriter, the producer object, can generate another XML document: once it is bound as the contentHandler, it receives events from the reader, or a selected subset of them, and writes them out as a new document.
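The reader, handler, and writer arrangement can be sketched the same way in Java. In this hedged analogy, which is not the MSXML API, `XMLReader` plays the role of SAXXMLReader, an `ErrorHandler` receives error events, an `XMLFilterImpl` passes along only selected events, and an identity `TransformerHandler` from `javax.xml.transform.sax` acts as the producer, much as MXXMLWriter does. The `DropElementFilter` and `StrictErrors` classes and the `phone` element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxCopy {

    /** Passes events through unchanged, except for elements named `dropped` and their content. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String dropped;
        private int depth = 0; // greater than 0 while inside a dropped element

        DropElementFilter(XMLReader parent, String dropped) {
            super(parent);
            this.dropped = dropped;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || localName.equals(dropped)) {
                depth++;
                return;
            }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (depth > 0) {
                depth--;
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Prints warnings and treats recoverable errors as fatal. */
    static class StrictErrors implements ErrorHandler {
        @Override public void warning(SAXParseException e) { System.err.println("warning: " + e.getMessage()); }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    /** Copies xml to a string, leaving out every element named `dropped`. */
    public static String copyWithout(String xml, String dropped) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // ensures localName is filled in
        XMLReader reader = spf.newSAXParser().getXMLReader();

        DropElementFilter filter = new DropElementFilter(reader, dropped);
        filter.setErrorHandler(new StrictErrors());

        // The producer: an identity TransformerHandler serializes whatever events it receives.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler writer = tf.newTransformerHandler();
        writer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        writer.setResult(new StreamResult(out));

        filter.setContentHandler(writer);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><name>Interknowlogy</name><phone>555-0100</phone></company>";
        System.out.println(copyWithout(xml, "phone"));
    }
}
```

The output properties are set before `setResult` because the transformer handler may read them when it creates its serializer. The new document is written as events arrive, so the source tree is never built in memory.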

Why would you use SAX when you can use DOM? Resources. Processing a 10MB XML document with DOM is very resource intensive because the entire file is loaded into memory at once. SAX, however, handles very large files well, because it deals with only one piece of the document at a time. SAX is also typically faster than DOM because it has less overhead, and you can stop parsing as soon as you have what you need. It's well suited for creating a new document, because you can write the output as the input events arrive instead of first building a complete tree of the source document. You can also use SAX to extract a content summary.
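Java's SAX2 API has no dedicated "stop" call; the usual idiom is to throw a `SAXException` from a handler method and catch it around `parse`. The sketch below uses that idiom to pull out a small summary (the first few values of a given element) and then quit. `SaxSummary`, `StopParsing`, and `firstValues` are hypothetical names, and MSXML's own mechanism for ending a parse early may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSummary {

    /** Thrown by the handler to end the parse early. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("enough data collected");
        }
    }

    /** Returns the text of the first `limit` elements named `tag`, then stops reading. */
    public static List<String> firstValues(String xml, String tag, int limit) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (text != null && qName.equals(tag)) {
                    values.add(text.toString());
                    text = null;
                    if (values.size() == limit) {
                        throw new StopParsing();
                    }
                }
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (StopParsing expected) {
            // Normal early exit: the rest of the document is never read.
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Deliberately unterminated: the parser stops before reaching the missing end tag.
        String xml = "<list><name>Alpha</name><name>Beta</name><name>Gamma</name>";
        System.out.println(firstValues(xml, "name", 2));
    }
}
```

In `main`, the call returns `[Alpha, Beta]` without a parse error even though the document has no closing `</list>` tag, because the handler ends the parse before the parser reaches the end of the input.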

However, SAX is limited because it works only serially. It doesn't allow random access to the document content, which means you must save any data you need for later processing yourself. If the task is complex, use DOM. Also, SAX runs on the server side only; it doesn't currently include native client support.
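Because the parser discards each event once its callback returns, anything you want to examine afterward has to be copied out during the parse. This Java sketch saves hypothetical `<item name=".." price=".."/>` elements into a list and only then sorts them, an operation that needs the whole data set at once. The `SaxCollect` and `Item` names and the catalog format are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCollect {

    /** A copy of the data saved from one <item> event (hypothetical format). */
    static final class Item {
        final String name;
        final double price;

        Item(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    /** Saves every <item> during the parse, then returns the names sorted by price, highest first. */
    public static List<String> namesByPrice(String xml) throws Exception {
        List<Item> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item")) {
                    // Copy the attribute values now; the Attributes object is only valid during this call.
                    items.add(new Item(atts.getValue("name"), Double.parseDouble(atts.getValue("price"))));
                }
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));

        // The parse is over; from here on, work only with the saved copies.
        items.sort(Comparator.comparingDouble((Item i) -> i.price).reversed());
        List<String> names = new ArrayList<>();
        for (Item i : items) {
            names.add(i.name);
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item name=\"pen\" price=\"1.5\"/>"
                + "<item name=\"lamp\" price=\"30\"/><item name=\"book\" price=\"12\"/></catalog>";
        System.out.println(namesByPrice(xml)); // prints [lamp, book, pen]
    }
}
```

The handler keeps only what the later step needs (two attribute values per item), which preserves most of SAX's memory advantage over loading the full tree.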

The MSXML 3.0 software development kit (SDK), which you can download from the Microsoft Web site, includes VB and VC++ examples of everything I've described here. For more information about SAX, and to download the SAX 2.0 Java distribution, go to the Megginson Technologies Web site.
