More About .NET XML Readers

In my May 29 column, I introduced an XML reader, the XmlTextReader class. In .NET, you can use the XmlTextReader class as a lightweight but no less effective alternative to the XML Document Object Model (XMLDOM) classes. XML readers let you move along a source file with a cursor-like approach. The atomic elements that you're reading aren't records, as with a database, or single bytes, as with streams; they're nodes, and XML readers let you jump from node to node. Let's examine the methods the XmlTextReader provides that let you make those jumps.

The Read Method

The Read method blindly moves the internal pointer from one node to the next, regardless of the node type. Thus, a Read can move you from a comment node to the root document-element node or from a given element's last attribute node to the next element.
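As a rough illustration of that node-by-node movement, the sketch below records every node that Read stops on. The class name ReadDemo, the method ListNodes, and the sample document are hypothetical and exist only for this example. Note that Read by itself doesn't stop on attribute nodes; you reach those with the attribute-navigation methods.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReadDemo
{
    // Returns "NodeType:Name" for every node that Read visits.
    public static List<string> ListNodes(string xml)
    {
        List<string> nodes = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Read moves to the next node, whatever its type.
            while (reader.Read())
                nodes.Add(reader.NodeType + ":" + reader.Name);
        }
        finally
        {
            reader.Close();
        }
        return nodes;
    }

    public static void Main()
    {
        // Hypothetical sample document, not taken from the column.
        string xml = "<!--note--><root><item id=\"1\"/></root>";
        foreach (string node in ListNodes(xml))
            Console.WriteLine(node);
    }
}
```

For this sample, the reader stops on the comment, the root element, the empty item element, and the root's end tag, in that order.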

The MoveToContent Method

The MoveToContent method lets you skip over noncontent nodes, such as the XML declaration, comments, processing instructions, the doctype, and white space, and reach the document element node directly from the beginning of the XML file. For example, the following code first opens the specified XML document:

XmlTextReader reader = new XmlTextReader(fileName);

When the XML reader loads, the system automatically positions the reader before the physical beginning of the file. Moving the reader to the first node requires a call to the Read method. In an XML file, the first node can be of various types. In an XML 1.0 document, the first node is typically the XML declaration:

<?xml version="1.0"?>

Under other circumstances, the first node can be a processing instruction, a comment, a doctype, or a document element.

If you plan to work only on content nodes and attributes, you use the MoveToContent method to skip over the first block of nodes and automatically position the pointer at the first content node—the root document element.
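A minimal sketch of that jump follows, assuming the XML arrives as an in-memory string rather than a file. MoveToContentDemo, RootName, and the catalog document are hypothetical names chosen for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MoveToContentDemo
{
    // Returns the name of the document element, or an empty string
    // if the first content node is not an element.
    public static string RootName(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Skips the declaration and the comment in the prolog and
            // stops on the first content node.
            XmlNodeType type = reader.MoveToContent();
            return type == XmlNodeType.Element ? reader.Name : string.Empty;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Hypothetical sample document, not taken from the column.
        string xml = "<?xml version=\"1.0\"?><!-- inventory --><catalog><book/></catalog>";
        Console.WriteLine(RootName(xml));   // prints "catalog"
    }
}
```

Because MoveToContent returns the type of the node it lands on, you can check for an element without a separate call to Read.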

The MoveToElement Method

XML readers also have other interesting features that qualify the whole API as a noncached, read-only (but not necessarily forward-only) way of working with nodes. For example, suppose that you access the attribute list of a given node. You then move from one attribute to the next in a clearly forward-only manner. When you finish the attribute list, you might want to continue to the next content node in line or return to the parent node—the one that the attributes belong to.

The former case is obviously a move forward. The latter case, however, could qualify as a backward move because you're jumping back over already-read attributes. This instance is the only one in which the XML reader class provides any backward movement. The MoveToElement method provides this capability: it moves the pointer back to the element node that contains the current attribute node.
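The following sketch walks an element's attributes with MoveToNextAttribute and then returns to the owning element with MoveToElement. The names MoveToElementDemo and Describe, and the book element with its attributes, are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class MoveToElementDemo
{
    // Lists the root element's attributes, then reports the node type
    // before and after the call to MoveToElement.
    public static string Describe(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();                  // now on the root element
            StringBuilder sb = new StringBuilder();

            // Forward-only walk through the attribute list.
            while (reader.MoveToNextAttribute())
                sb.Append(reader.Name + "=" + reader.Value + ";");

            // The reader is still on the last attribute it visited.
            XmlNodeType before = reader.NodeType;

            // The one backward move: return to the owning element.
            reader.MoveToElement();
            sb.Append(before + "->" + reader.NodeType + ":" + reader.Name);
            return sb.ToString();
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Hypothetical sample element, not taken from the column.
        Console.WriteLine(Describe("<book id=\"7\" lang=\"en\"/>"));
        // prints "id=7;lang=en;Attribute->Element:book"
    }
}
```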

The Skip Method

Below is a typical loop to scan the content of an XML document:

while (reader.Read())
{
   Console.Write("Node Type: " + reader.NodeType);
   Console.WriteLine(", Node Name: " + reader.Name);
}

This loop visits every node it finds on its way. You can use the Skip method, however, to skip the current node, along with all of its children, and jump to the node that follows it. For example, in the following code, the reader skips all nodes with names different from MyNode:

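while (!reader.EOF)
  if (reader.NodeType == XmlNodeType.Element && reader.Name != "MyNode")
    reader.Skip();             // already advances past the node and its children
  else reader.Read();          // otherwise step to the next node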
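Because Skip itself advances the reader, a loop that combines Skip and Read needs some care: calling both in the same pass moves the pointer twice. The sketch below is one way to handle that, under the assumption that you want the text of each MyNode child of the root and nothing else. SkipDemo, ReadMyNodes, and the sample document are hypothetical; ReadElementString is used here only because the sample MyNode elements contain plain text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // Collects the text of each <MyNode> child of the root element and
    // skips every other child element together with its subtree.
    public static List<string> ReadMyNodes(string xml)
    {
        List<string> values = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();   // position on the root element
            reader.Read();            // step to the first node inside it

            // Stop at the root's end tag or at the end of the input.
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType != XmlNodeType.Element)
                    reader.Read();                           // text, comments, white space
                else if (reader.Name == "MyNode")
                    values.Add(reader.ReadElementString());  // reads the text and moves past the end tag
                else
                    reader.Skip();                           // moves past the element and its children
            }
        }
        finally
        {
            reader.Close();
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical sample document, not taken from the column.
        string xml = "<root><Other><MyNode>hidden</MyNode></Other>"
                   + "<MyNode>first</MyNode><Other/><MyNode>second</MyNode></root>";
        foreach (string value in ReadMyNodes(xml))
            Console.WriteLine(value);
    }
}
```

In this sample, the MyNode nested inside the first Other element is never visited, because Skip jumps over the whole Other subtree; only "first" and "second" are collected.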

The XmlTextReader class, which inherits from the abstract base class XmlReader, enforces the rules for well-formed XML but doesn't provide XML data validation. It also checks doctype nodes to ensure that they're well-formed and that the syntax of the specified Document Type Definition (DTD) is correct. The XmlTextReader expands entities and checks to ensure that they're well-formed. In no case, however, does it use the DTD to perform validation.
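One practical consequence of that well-formedness checking is that the reader throws an XmlException as soon as it reaches markup that breaks the rules. A minimal sketch, with the hypothetical names WellFormedDemo and IsWellFormed, that uses this behavior as a simple check:

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedDemo
{
    // Returns true if the reader can walk the whole document without
    // hitting a well-formedness error. No DTD or schema validation occurs.
    public static bool IsWellFormed(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Nothing to do; reading is enough to trigger the checks.
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b/></a>"));   // prints "True"
        Console.WriteLine(IsWellFormed("<a><b></a>"));    // prints "False"
    }
}
```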

The XmlTextReader class is a very fast parser because it doesn't perform the extra steps necessary for data validation. To perform data validation, you must use a new derived class—XmlValidatingReader—which I'll review in my next column.
