Writing SAX Applications

In our ongoing exploration of the Simple API for XML (SAX), let's look this time at how to write a Visual Basic (VB) application that works with SAX. We won't consider .NET and VB.NET here because the .NET Framework has its own XML classes. I'll cover .NET XML features in future columns.

To use SAX from VB, first add to your VB project a reference to the Microsoft XML Parser (MSXML) 3.0 type library. From the Project menu, select the References menu item, and then select Microsoft XML v3.0 as the library. To start SAX parsing of a given XML file, add an interactive control (e.g., a button) and associate the following code with its click event:

Dim parser As New SAXXMLReader
Dim contentHandler As New ContentHandlerImpl
Set parser.contentHandler = contentHandler
parser.parseURL App.Path & "\foo.xml"
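
If the file is missing or not well formed, the parseURL call fails with a run-time error, so in practice it is worth wrapping the call in an error handler. The following is only a sketch: the button name Command1 and the message text are assumptions, and the variable is named handler simply to keep it distinct from the contentHandler property.

```vb
Private Sub Command1_Click()
    Dim parser As New SAXXMLReader
    Dim handler As New ContentHandlerImpl

    ' Route any parser failure to the label below
    On Error GoTo ParseFailed

    Set parser.contentHandler = handler
    parser.parseURL App.Path & "\foo.xml"
    Exit Sub

ParseFailed:
    ' Err.Description carries the message reported for the failure
    MsgBox "Parsing failed: " & Err.Description, vbExclamation
End Sub
```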

The SAXXMLReader object, which the Microsoft XML v3.0 library provides, represents the SAX parser. The ContentHandlerImpl object represents the content handler that you must write. In VB, a SAX content handler is a class that implements the IVBSAXContentHandler interface. Thus, you need to add a new class to the project and include the following line at the beginning of the file:

Implements IVBSAXContentHandler

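VB requires a class to implement every member of an interface that it names in an Implements statement, so you then need to define handlers for all the events that the interface makes available, even if some bodies stay empty. Two of these events deserve particular attention, startElement and characters: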

Private Sub IVBSAXContentHandler_startElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String, _
    ByVal oAttributes As MSXML2.IVBSAXAttributes)
    ' code here
End Sub

Private Sub IVBSAXContentHandler_characters( _
    strChars As String)
    ' code here
End Sub
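
The other members of the interface still need bodies, even empty ones, before the class will compile. The stubs below are a sketch based on the member names in the MSXML 3.0 type library. The parameter names are illustrative, and the safest way to get the exact signatures is to let the VB code window generate each procedure from its procedure drop-down list.

```vb
' Remaining IVBSAXContentHandler members. Empty bodies are enough
' when the application has no interest in the event.
Private Property Set IVBSAXContentHandler_documentLocator( _
    ByVal RHS As MSXML2.IVBSAXLocator)
End Property

Private Sub IVBSAXContentHandler_startDocument()
End Sub

Private Sub IVBSAXContentHandler_endDocument()
End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping( _
    strPrefix As String, _
    strURI As String)
End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping( _
    strPrefix As String)
End Sub

Private Sub IVBSAXContentHandler_endElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String)
End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace( _
    strChars As String)
End Sub

Private Sub IVBSAXContentHandler_processingInstruction( _
    strTarget As String, _
    strData As String)
End Sub

Private Sub IVBSAXContentHandler_skippedEntity( _
    strName As String)
End Sub
```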

The startElement event fires when the parser encounters an element's opening tag. The handler receives the namespace URI, the local name (the tag name without any prefix), and the qualified name (the tag name including the namespace prefix, if any). The fourth argument is an object that exposes all the attributes of the element. The startElement event doesn't tell you anything about the text between the opening and closing tags. To access that text, you use the characters event.
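
To illustrate the attributes argument, here is a minimal sketch of a startElement handler that walks the collection by index and writes each name and value to the Immediate window. Printing with Debug.Print is an assumption made for the example; a real handler would store or act on the values instead.

```vb
Private Sub IVBSAXContentHandler_startElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String, _
    ByVal oAttributes As MSXML2.IVBSAXAttributes)

    Dim i As Long

    ' Attribute indexes are zero-based
    For i = 0 To oAttributes.length - 1
        Debug.Print strLocalName & "/@" & oAttributes.getLocalName(i) & _
            " = " & oAttributes.getValue(i)
    Next i
End Sub
```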

A common problem that you face with SAX parsers is state management. In a typical scenario, you want to process the text of certain elements. Unfortunately, the characters event doesn't tell you anything about the element to which that text belongs. Inevitably, you have to resort to globals to keep track of the last element processed and decide whether you're interested in its characters. Consider the following XML file:

<clients>
    <client>
        <firstname>Joe</firstname>
        <lastname>Users</lastname>
    </client>
    <client>
        <firstname>Jack</firstname>
        <lastname>Whosthisguy</lastname>
    </client>
</clients>

Suppose that you want to extract only the clients' names. You must store in a global (g_strTagName) the most recently started element's name and retrieve that name in the characters event handler.

Private Sub IVBSAXContentHandler_startElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String, _
    ByVal oAttributes As MSXML2.IVBSAXAttributes)

    ' Remember which element the upcoming text belongs to
    g_strTagName = strLocalName
End Sub

In addition, you have to ensure that you reset the g_strTagName global when the parser finishes with an element. You need a second global (g_strBuf) if you want to concatenate the first and last name into a single string.

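Private Sub IVBSAXContentHandler_characters(strChars As String)
    ' Start a new name with the first name and a separating space
    If g_strTagName = "firstname" Then
        g_strBuf = strChars & " "
    End If

    ' Append the last name, show the full name, and clear the buffer
    If g_strTagName = "lastname" Then
        g_strBuf = g_strBuf & strChars
        Form1.List1.AddItem g_strBuf
        g_strBuf = ""
    End If
End Sub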

In this example, as soon as the last name arrives, the application adds the completed string to a list box.
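
The handlers above don't show the reset of g_strTagName or the declaration of the two globals. The following sketch fills both gaps. It assumes that the globals live in a standard module so that the handler class can see them. Without the reset, the whitespace between </lastname> and </client> would likely reach the characters handler while g_strTagName still held "lastname".

```vb
' In a standard module (.bas), visible to the handler class
Public g_strTagName As String   ' local name of the element currently open
Public g_strBuf As String       ' accumulates "firstname lastname"

' In the ContentHandlerImpl class
Private Sub IVBSAXContentHandler_endElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String)

    ' Forget the element name so that text outside firstname and
    ' lastname is ignored by the characters handler
    g_strTagName = ""
End Sub
```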

A SAX parser is extremely fast because it makes only one pass through the XML document from top to bottom. For efficiency reasons, the parser doesn't track what it did in the previous step. Thus, tracking state is completely up to you.
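
One way to keep that state manageable is to hold it in private members of the handler class rather than in globals, and to act on the text only when the element closes. The sketch below is a variation on the example above, not the code used earlier: the m_ variables are hypothetical names, and the other interface members are omitted for brevity. Appending in the characters handler also covers the case in which a parser delivers an element's text in more than one chunk.

```vb
Implements IVBSAXContentHandler

Private m_strTagName As String   ' element currently open
Private m_strText As String      ' text collected for that element
Private m_strFirst As String     ' first name of the current client

Private Sub IVBSAXContentHandler_startElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String, _
    ByVal oAttributes As MSXML2.IVBSAXAttributes)

    m_strTagName = strLocalName
    m_strText = ""
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    ' Text may arrive in several chunks, so append rather than assign
    If m_strTagName = "firstname" Or m_strTagName = "lastname" Then
        m_strText = m_strText & strChars
    End If
End Sub

Private Sub IVBSAXContentHandler_endElement( _
    strNamespaceURI As String, _
    strLocalName As String, _
    strQName As String)

    Select Case strLocalName
        Case "firstname"
            m_strFirst = m_strText
        Case "lastname"
            ' Both parts are now complete, so show the full name
            Form1.List1.AddItem m_strFirst & " " & m_strText
    End Select

    ' No element of interest is open any longer
    m_strTagName = ""
End Sub
```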
