XMLDOM vs. SAX Parsers

XML parsers are the tools you use to write code against XML documents. A parser exposes the document through a programming interface, so your code can read it, and in some cases modify and save it, without processing the raw text itself. As I mentioned in my last column, two types of parsers are available. The first to appear was the XML Document Object Model (XMLDOM) parser. More recently, a second type has appeared: the Simple API for XML (SAX).

XMLDOM

When you use XMLDOM to manipulate an XML document, XMLDOM reads the entire file, breaks it into individual objects such as elements and attributes, and builds an in-memory tree from them. The benefit is that you can reference and manipulate each object (called a node) individually, in any order. XMLDOM provides a complete "black-box" service: you specify the XML file name, and the parser hands back a ready-to-use object. On the downside, building the tree for a document, especially a large one, requires a significant amount of memory, because every node stays in memory for as long as you use the tree.
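The following is a minimal sketch of the load-then-navigate model. MSXML's XMLDOM is a COM component used from VB or C++, so to keep the example self-contained it uses Python's standard-library `xml.dom.minidom` instead, a different DOM implementation that works the same way: parse the whole document into a tree, then reach any node directly. The catalog document and its element names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML_TEXT = """<catalog>
  <book id="b1"><title>First Book</title><price>10.50</price></book>
  <book id="b2"><title>Second Book</title><price>7.25</price></book>
</catalog>"""

def load_and_edit(xml_text):
    # Parse the entire document into an in-memory tree of nodes.
    doc = minidom.parseString(xml_text)
    books = doc.getElementsByTagName("book")
    # Any node can be read or changed directly, in any order.
    books[-1].setAttribute("id", "b2-updated")
    titles = [b.getElementsByTagName("title")[0].firstChild.data for b in books]
    return doc, titles

doc, titles = load_and_edit(XML_TEXT)
print(titles)  # ['First Book', 'Second Book']
print(doc.getElementsByTagName("book")[1].getAttribute("id"))  # b2-updated
```

With a file on disk, `minidom.parse(filename)` plays the role of the "black box": you pass a name and get back the finished tree.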

SAX

Unlike XMLDOM, SAX is an event-driven interface: as it scans the document, it raises an event each time it recognizes an XML item, such as the start or end of an element, and notifies the calling application through a handler. Because SAX processes the document serially, section by section, it uses far less memory than XMLDOM and is much better suited to very large documents.
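To make the event model concrete, here is a sketch using Python's standard `xml.sax` module, standing in for MSXML's SAX2 interfaces (which are not shown here). The handler class `EventLogger` and the sample document are hypothetical. The parser calls `startElement` and `endElement` as it meets each tag, and nothing about the document is retained unless the handler stores it.

```python
import xml.sax

# Hypothetical sample document.
XML_BYTES = b"""<catalog>
  <book id="b1"><title>First Book</title></book>
  <book id="b2"><title>Second Book</title></book>
</catalog>"""

class EventLogger(xml.sax.ContentHandler):
    """Records each callback the parser fires, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called once per start tag; copy attrs, since the parser may reuse it.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(XML_BYTES, handler)
for event in handler.events:
    print(event)
```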

On the other hand, SAX doesn't create a persistent in-memory structure to represent the document. It visits each element once, tells the application about it, and moves on. The application is then responsible for building whatever persistent representation of the document it needs. Taken to the limit, this means an application could build a full-blown XMLDOM-style tree itself; more usefully, it can build exactly the structure it needs, adding extra information or removing unneeded items. From this point of view, SAX is more flexible than XMLDOM, but it requires more coding.
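The sketch below shows an application building its own persistent structure from SAX events, again assuming Python's `xml.sax` rather than MSXML. It keeps only the hypothetical `title` and `price` fields, drops the `notes` element entirely, and adds a `position` value that does not exist in the source. It is noticeably more code than the DOM version, which is exactly the trade-off described above.

```python
import xml.sax

# Hypothetical sample document.
XML_BYTES = b"""<catalog>
  <book id="b1"><title>First Book</title><notes>Not needed here</notes><price>10.50</price></book>
  <book id="b2"><title>Second Book</title><notes>Not needed either</notes><price>7.25</price></book>
</catalog>"""

class BookCollector(xml.sax.ContentHandler):
    """Builds a compact list of dicts instead of a full document tree."""

    KEEP = {"title", "price"}  # hypothetical: the only fields this app needs

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # dict for the <book> being read
        self._field = None    # kept child element we are currently inside
        self._text = []

    def startElement(self, name, attrs):
        if name == "book":
            # Extra information the source lacks: the book's position.
            self._current = {"id": attrs.get("id"), "position": len(self.books) + 1}
        elif self._current is not None and name in self.KEEP:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._field:
            value = "".join(self._text)
            self._current[name] = float(value) if name == "price" else value
            self._field = None
        elif name == "book":
            self.books.append(self._current)
            self._current = None

collector = BookCollector()
xml.sax.parseString(XML_BYTES, collector)
print(collector.books)
```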

Microsoft XML Parser (MSXML) 3.0 includes support for the SAX2 API. The SAX2 implementation provides both Visual Basic (VB) and C++ interfaces and offers a simple and fast alternative to XMLDOM.

When To Use Which

To help you decide when to use XMLDOM and when to use SAX, consider the following SAX advantages:

SAX requires less memory than XMLDOM.
SAX lets you abort parsing.
SAX lets you retrieve only small portions of the document.
SAX lets you create a new document structure from an existing document.
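Python's SAX interface has no dedicated "stop" method; a common way to abort is to raise an exception from a handler and catch it around the parse call. The sketch below, with a hypothetical `FoundIt` exception and document, retrieves only the first title and then quits. The element counter shows that the handler saw just three of the document's 2,001 elements.

```python
import xml.sax

class FoundIt(Exception):
    """Hypothetical exception used to stop the parser early."""

class FirstTitleFinder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "title":
            self.in_title = True

    def characters(self, content):
        if self.in_title:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "title":
            # We have what we need: abort instead of reading the rest.
            raise FoundIt("".join(self.chunks))

def first_title(xml_bytes):
    handler = FirstTitleFinder()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except FoundIt as found:
        return str(found), handler.elements_seen
    return None, handler.elements_seen

# Hypothetical document with many books; only the first title is needed.
books = b"".join(b"<book><title>Book %d</title></book>" % i for i in range(1000))
doc = b"<catalog>" + books + b"</catalog>"
title, seen = first_title(doc)
print(title, seen)  # Book 0 3
```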

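With SAX, memory consumption doesn't grow with the file size: the parser holds only the portion of the input it is currently scanning, plus whatever state your handler chooses to keep. XMLDOM, by contrast, must keep the entire tree in memory, so its memory use grows in proportion to the document and is typically considerably larger than the file itself. Both approaches take time roughly linear in the document size; the difference that matters for large files is memory. Thus, if you need to process large documents, SAX is the better alternative, especially if you don't need to change the document's content. If your application requires frequent document modifications, XMLDOM is the better approach because it offers a good compromise between ease of development and overall performance.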
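The memory difference can be observed directly. The following sketch, assuming Python's standard library rather than MSXML, writes a hypothetical 20,000-item file, then counts its elements once with `minidom` and once with a SAX handler, recording the peak Python memory allocation of each run with `tracemalloc`. The exact figures depend on the Python version and platform, so none are quoted here; the DOM peak grows with the item count, while the SAX peak stays roughly flat.

```python
import os
import tempfile
import tracemalloc
import xml.sax
from xml.dom import minidom

class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that keeps nothing but a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

def count_with_dom(path):
    doc = minidom.parse(path)  # builds the complete tree first
    count = len(doc.getElementsByTagName("*"))
    doc.unlink()  # break reference cycles so the tree can be freed
    return count

def count_with_sax(path):
    handler = ElementCounter()
    xml.sax.parse(path, handler)  # streams the file; no tree is built
    return handler.count

def peak_bytes(func, path):
    """Return func(path) and the peak traced allocation, in bytes."""
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def write_sample(path, n_items):
    # Hypothetical test file: a root element with n_items <book> children.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<catalog>")
        for i in range(n_items):
            f.write(f"<book id='b{i}'><title>Book {i}</title></book>")
        f.write("</catalog>")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    write_sample(path, 20_000)
    dom_count, dom_peak = peak_bytes(count_with_dom, path)
    sax_count, sax_peak = peak_bytes(count_with_sax, path)

print(dom_count, sax_count)  # 40001 40001
print(f"DOM peak: {dom_peak:,} bytes; SAX peak: {sax_peak:,} bytes")
```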

Finally, if you want maximum freedom working with the document but you don't like the XMLDOM API, stay tuned to .NET developments. ADO.NET, in particular, promises to offer an alternative interface that works with XML files and treats them much like tables of records.
