Introducing Simple API for XML

An XML document is simply a text file: a sequence of characters, typically encoded as UTF-8 or UTF-16. To spring to life, it needs something to interpret the content and transform it into a memory structure that applications can work with. Although this memory structure doesn't need to be binary, a binary structure helps keep the memory footprint small and under control. A binary structure built on top of the source XML text also lets client applications interact with the content more directly. But how do you obtain such a binary structure? Who is responsible for creating it? Is there only one way to create it? Answering these questions can help you understand the importance of the XML Document Object Model (DOM) and the newer Simple API for XML (SAX). In addition, the presence of an intermediate module that builds a memory representation of the XML file explains why XML is so highly interoperable and language- and platform-agnostic.

The Parser's Role

The overwhelming presence of XMLDOM parsers in the Windows world makes people forget a parser's fundamental role. A parser is responsible for building a (possibly binary) memory representation of an XML document. An XML parser is a sort of black box that reads in XML text and outputs a memory representation of it; an XML parser is NOT something that reads in XML text and creates a COM object, such as the XMLDOM object. An XMLDOM object is only one possibility. On a non-Windows platform, for example, you don't usually find COM support, but you do find XML support. In fact, XML is the key to enabling cross-platform communication because plain text, such as XML, can easily travel from platform to platform. On the target system, a platform-specific parser transforms the XML document into a memory object suitable for that platform. So it happens that some parsers (those running on IBM machines, for instance) transform an XML document into a Java object. What matters is that the same content springs to life.

Types of Parsers

Two types of parsers are available. The first type to appear was the XMLDOM parser. Regardless of the platform, such a parser works by creating a memory object that exposes the content of the XML document through methods and properties that follow the World Wide Web Consortium (W3C) Document Object Model (DOM) specification. This parser reads in the whole document and creates an internal representation of it. The particular representation depends on the platform's capabilities. It's a COM object in Windows and a Java object on many other platforms.
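
To make the DOM approach concrete, here is a minimal sketch in Python, using the standard library's xml.dom.minidom module as a stand-in for the COM and Java implementations mentioned above. The sample order document and the item_names helper are assumptions made up for this illustration. Note that the whole document is parsed into a tree before any of its content is read.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this illustration.
XML_TEXT = "<order id='42'><item qty='2'>pen</item><item qty='1'>ink</item></order>"


def item_names(xml_text):
    """Parse the whole document into a DOM tree, then query that tree."""
    document = parseString(xml_text)      # builds the full in-memory tree
    root = document.documentElement       # the <order> element
    items = root.getElementsByTagName("item")
    # Each <item> holds a single text node; read its character data.
    return root.getAttribute("id"), [item.firstChild.data for item in items]


order_id, names = item_names(XML_TEXT)
print(order_id, names)
```

Once the tree exists, the application can navigate it in any order and as many times as it likes, which is the main convenience of a DOM-style parser.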

Recently, a second type of parser has appeared: SAX. It owes its origin to people in the Java and open-source communities, but it's gaining acceptance on Windows platforms. Version 3 of the Microsoft XML Parser (MSXML), in fact, includes SAX parser support. SAX parsers follow a different logic. They don't create a memory representation of the XML content. Instead, they fire events as they read the XML source, for example when they reach the start or the end of an element or a run of character data. As a result, the client application becomes aware of every element in the XML stream and can decide what to do with it. In my next several columns, I'll cover the pros and cons of each parser type and delve further into SAX.
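
The event-driven logic can be sketched with Python's standard xml.sax module, again only as an illustration of the idea rather than of MSXML's own SAX interfaces. The EventLogger class, the collect_events helper, and the sample document are hypothetical names introduced here. The parser builds no tree; it simply calls back into the handler as it reads.

```python
import xml.sax

# Hypothetical sample document, invented for this illustration.
XML_TEXT = "<order id='42'><item qty='2'>pen</item><item qty='1'>ink</item></order>"


class EventLogger(xml.sax.ContentHandler):
    """Records each event the parser fires; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))


def collect_events(xml_text):
    """Run the SAX parser over xml_text and return the recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = collect_events(XML_TEXT)
for event in events:
    print(event)
```

Whatever the application wants to keep, it must store itself while handling these callbacks, because the parser retains nothing once an event has been delivered.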
