Get There with XPathNavigator

Exploiting the .NET XPath Query Engine to Navigate Hierarchical Data

In "XPath Basics" I gave a quick introduction to the syntax of XPath expressions to help the uninitiated get comfortable with XPath, which is a very important technology to understand for working with XML data of many forms.

The thing about XPath is that it can't do anything on its own; it needs a processing engine to perform work based on the expressions. That processing engine could come in many forms. In .NET 1.x, XPath comes into play both for querying and navigating XML data in documents and for transforming XML documents using XSLT. In this article I'm going to give a quick introduction to the XPath processing engine that you use to query and navigate XML data in .NET - specifically the XPathNavigator class and how to use it.

 

Getting to the Root of Things

One reader correctly pointed out that in "XPath Basics" I did not cover an important concept of XPath related to absolute and relative paths. That was partially intentional, so now let me set things straight on that account. In "XPath Basics" I emphasized that the evaluation of an XPath statement is always relative to the current context node. This is true whether you are talking about an individual location step within an XPath statement, or about an entire expression. The context node can be set by a previous location step, or it can be set based on the context of the processing engine that's evaluating the expression.

So if XPath expressions are always relative to the current context node, how can you have an absolute path? The answer is that you can still think of an absolute path in XPath as being relative to the current context node. The way to specify an absolute path in XPath is to use the "/" character at the beginning of the expression. This basically says "start at the root of the document". So from that perspective, an expression like /Music/Album is an absolute path that is evaluated starting at the root of the document, looking for a root element named Music, containing a child element named Album. The way you can view this as still being relative to the current context node is that you can legally evaluate this expression against a reference to a node anywhere within the document, so the query is executed relative to the current node.
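To make the distinction concrete, here is a minimal sketch that evaluates an absolute and a relative expression from the same context node. The inline XML, the PathDemo class, and the CountFromSecondArtist helper are all assumptions made up for illustration; the sample uses /Music/Artist/Album because its data puts Artist between Music and Album.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class PathDemo
{
    // Hypothetical sample data; element and attribute names are assumptions.
    private const string Xml =
        "<Music>" +
        "<Artist name='A'><Album name='A1'/><Album name='A2'/></Artist>" +
        "<Artist name='B'><Album name='B1'/></Artist>" +
        "</Music>";

    // Counts the matches for an expression evaluated from the second Artist element.
    public static int CountFromSecondArtist(string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        nav.MoveToFirstChild();   // root node -> Music element
        nav.MoveToFirstChild();   // Music -> first Artist
        nav.MoveToNext();         // first Artist -> second Artist
        return nav.Select(xpath).Count;
    }

    public static void Main()
    {
        // Absolute: starts at the document root, so every Album is found.
        Console.WriteLine(CountFromSecondArtist("/Music/Artist/Album")); // prints 3
        // Relative: only Album children of the context node (the second Artist).
        Console.WriteLine(CountFromSecondArtist("Album"));               // prints 1
    }
}
```

Both calls start from the same navigator position; only the leading "/" changes which nodes are considered.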

The reason I waited to mention this is that a statement like the previous one really means talking about the use of XPath with a particular processing engine. Because I was going to wait until this article to talk about the XPath processing engine in .NET, I thought I'd wait to clarify the relative vs. absolute path issue. That being done, let's get on with some processing!

 

XPathNavigator Knows the Way

The primary object for querying and navigating XML in .NET is XPathNavigator. If you've been using the Document Object Model (DOM) for dealing with XML for a long time, you may feel more comfortable dealing with the XmlDocument class and using the SelectNodes method to perform queries. The truth is that, under the covers, SelectNodes is using XPathNavigator for you. And if you start using XPathNavigator directly, you can adopt a consistent programming approach that will work with XmlDocument, XmlDataDocument, or XPathDocument objects. This will become even more important in .NET 2.0 when XPathDocument gets a serious overhaul to its implementation, allowing it to track changes made to the document much as DataSets do today.

The XPathNavigator class basically encapsulates a cursor into an XML node set, and allows you to navigate or perform queries relative to the node the cursor points to. The class exposes a set of methods to move to sibling, parent, or child nodes, as well as a set of methods focused on executing a query using an XPath expression. Using an XPathNavigator object you can pre-compile an expression and use that compiled version to perform repeated queries with the same expression much more efficiently.

You can get an XPathNavigator from any of the .NET XML document types by calling the CreateNavigator method. What you get is an instance of an XPathNavigator with its underlying cursor initialized to the root of the document. From there you can perform queries to obtain sets of other XPathNavigator objects that point to the results of the query, or you can move the current cursor through the document using the navigation methods of the class. You can also access a number of properties on a navigator to extract the data contained in the node to which it is currently pointing so you can perform processing on that data.
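As a rough sketch of that consistency, the same query code can run against a DOM document and a read-only XPathDocument, because both implement IXPathNavigable. The NavigatorSources class, its helper names, and the inline XML are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSources
{
    // Hypothetical sample data; element and attribute names are assumptions.
    private const string Xml =
        "<Music><Artist name='A'><Album name='A1'/></Artist></Music>";

    // Works against any document type that can hand out a navigator.
    public static int CountAlbums(IXPathNavigable source)
    {
        XPathNavigator nav = source.CreateNavigator();
        return nav.Select("//Album").Count;
    }

    // True when a freshly created navigator sits on the document root node.
    public static bool StartsAtRoot(IXPathNavigable source)
    {
        return source.CreateNavigator().NodeType == XPathNodeType.Root;
    }

    public static XmlDocument LoadDom()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);
        return dom;
    }

    public static XPathDocument LoadReadOnly()
    {
        return new XPathDocument(new StringReader(Xml));
    }

    public static void Main()
    {
        Console.WriteLine(CountAlbums(LoadDom()));        // prints 1
        Console.WriteLine(CountAlbums(LoadReadOnly()));   // prints 1
        Console.WriteLine(StartsAtRoot(LoadReadOnly()));  // prints True
    }
}
```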

To perform a query with an XPathNavigator instance, you can call its Select method, passing in an XPath expression. What you get back is an XPathNodeIterator that allows you to step through the results. This is another lightweight object that allows you to obtain an XPathNavigator reference to each of the nodes that matched the query. Using these references, you can then either extract data from the nodes, or you can use the navigator to perform subsequent queries or navigation that will be done relative to the matching nodes.

 

Query for Music

Let's look at an example. First we need some XML to work against. Say you have some XML that contains information about music. If you had a schema as shown in Figure 1, you would have a Music root element, Artist elements under that, Album elements under Artist, and Track elements under Album. Each of those elements has certain attributes, as shown in Figure 1, that you might be interested in extracting for processing. The resulting XML looks like Figure 2.


Figure 1: The Music XML data schema.



  

    

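<Music>
  <Artist name="...">
    <Album name="...">
      <Track>Going Under</Track>
      <Track>Bring Me To Life</Track>
      <Track>Everybody's Fool</Track>
    </Album>
  </Artist>
</Music>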

    

  

 

Figure 2: A Music XML file.

Given that schema, let's say we first wanted to use an XPathNavigator to query for all the Album elements within a document. The code for doing so would look like that shown in Figure 3.

// Requires the System.Xml.XPath namespace.
public void ProcessAlbums()
{
  // Load a document.
  XPathDocument doc = new XPathDocument("Music.xml");
  // Get a navigator initialized to the root.
  XPathNavigator nav = doc.CreateNavigator();
  // Perform a query.
  XPathNodeIterator iter = nav.Select("//Album");
  // Iterate through the results.
  while (iter.MoveNext())
  {
    // Clone so the iterator's own navigator is never moved.
    XPathNavigator navCurrent = iter.Current.Clone();
    ProcessAlbum(navCurrent);
  }
}

Figure 3: Querying the XML document for Album nodes.

In the code in Figure 3, I first load the XML into an instance of XPathDocument. The XPathDocument class is the best to use in .NET if you don't need to modify the contents of the document while processing it. I obtain an XPathNavigator from the document by calling CreateNavigator. Using that navigator, I execute a simple XPath query for all descendant elements named Album (using the XPath abbreviation //, which is short for /descendant-or-self::node()/). That query returns an XPathNodeIterator that can be used to iterate through the results.

To use the iterator, you call MoveNext, which returns true as long as there is another node to process. The Current property on the iterator then returns a reference to an XPathNavigator positioned on the current node. That navigator belongs to the iterator and should not be moved away from the selected nodes, so I clone it and pass the clone off to another method to process the results (which you can see in Figure 4).

public void ProcessAlbum(XPathNavigator navAlbum)
{
  // Clone navigator to move off axis.
  XPathNavigator navArtist = navAlbum.Clone();
  // Move to the parent (Artist) node.
  navArtist.MoveToParent();
  // Move to its name attribute.
  navArtist.MoveToFirstAttribute();
  // Output the artist name.
  Console.WriteLine(navArtist.Value);
  // Move to the album name attribute.
  navAlbum.MoveToFirstAttribute();
  Console.WriteLine("\t" + navAlbum.Value);
  // Move back up to the parent element.
  navAlbum.MoveToParent();
  // Move down to first track element and output its text.
  navAlbum.MoveToFirstChild();
  Console.WriteLine("\t\t" + navAlbum.Value);
  // Loop through the rest of the track elements.
  while (navAlbum.MoveToNext())
  {
    Console.WriteLine("\t\t" + navAlbum.Value);
  }
}

Figure 4: Navigating results with the XPathNavigator.

In the ProcessAlbum method, I switch from using a navigator as a query tool to using it to navigate a known schema of nodes. The code embeds the knowledge of the schema in the form of some explicit navigation steps from node to node using the navigator that was passed into the method representing an Album.

The first thing the code in Figure 4 does is to clone the navigator. If you are going to move "off axis" to move up to a parent or down into a collection of child nodes, and you want to resume processing where you started, you'll need to clone the navigator before you start calling navigation methods. Remember that the navigator maintains a single reference (or cursor) into the nodes saying what the current context node is as far as it's concerned.

As soon as you call a MoveXXX method, that cursor has changed, and you'll have no easy way to get the context back to where you started - short of reversing all the navigation steps you have taken. If you clone a navigator first, you can leave one copy in place and use the other to move away from the current node. When you're done with that processing path, you simply resume using the copy that stayed where it was and throw away the one that moved.
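Here is a small sketch of that behavior, assuming a made-up inline document and a hypothetical CloneDemo class. It shows that a clone starts at the same position as the original and that moving one cursor leaves the other untouched.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CloneDemo
{
    // Hypothetical sample data; element and attribute names are assumptions.
    private const string Xml =
        "<Music><Artist name='A'><Album name='A1'><Track>T1</Track></Album></Artist></Music>";

    // Returns a navigator positioned on the first Album element.
    public static XPathNavigator FirstAlbum()
    {
        XPathNavigator root = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator iter = root.Select("//Album");
        iter.MoveNext();
        return iter.Current.Clone();
    }

    // Moves a clone to the parent and reports where each cursor ended up.
    public static string PositionsAfterMove()
    {
        XPathNavigator album = FirstAlbum();
        XPathNavigator scout = album.Clone();
        bool sameBefore = album.IsSamePosition(scout);   // true right after cloning
        scout.MoveToParent();                            // only the clone moves
        bool sameAfter = album.IsSamePosition(scout);    // false once it has moved
        return album.Name + "/" + scout.Name + "/" + sameBefore + "/" + sameAfter;
    }

    public static void Main()
    {
        Console.WriteLine(PositionsAfterMove()); // prints Album/Artist/True/False
    }
}
```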

Once the code in Figure 4 has a cloned copy of the Album node navigator, it uses the cloned copy to move up to the parent node, which, based on the schema, should be an Artist node with a name attribute. So it uses a couple of MoveXXX methods to move to that attribute, and then simply spits out to the console the name of the Artist for the album.

After that, it resumes using the original Album navigator and moves to its first attribute, which should be the Album name. After spitting that out to the console, the code backs the navigator up to the parent, which for an attribute node is the element that owns it - here, the original Album element. That's one thing to get used to when moving to attributes: they are not treated as child nodes of an element, yet the element itself is treated as the parent of its attribute nodes. Once the cursor is back on the Album element, the code moves it down to the first child element, which should be a Track element based on the schema.
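The attribute relationship can be sketched as follows. This example also shows MoveToAttribute and GetAttribute, which look an attribute up by name rather than relying on it being first; the AttributeDemo class, the year attribute, and the inline XML are assumptions for illustration and are not part of the article's sample files.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class AttributeDemo
{
    // Hypothetical sample data; the year attribute is invented so that
    // name is deliberately not the first attribute on Album.
    private const string Xml =
        "<Music><Artist name='A'>" +
        "<Album year='2003' name='A1'><Track>T1</Track></Album>" +
        "</Artist></Music>";

    // Returns a navigator positioned on the first Album element.
    public static XPathNavigator FirstAlbum()
    {
        XPathNavigator root = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator iter = root.Select("//Album");
        iter.MoveNext();
        return iter.Current.Clone();
    }

    // Reads an attribute by name without moving the cursor at all.
    public static string AlbumName()
    {
        return FirstAlbum().GetAttribute("name", "");
    }

    // Moves onto the attribute, back to its owning element, then to a child.
    public static string Trail()
    {
        XPathNavigator nav = FirstAlbum();
        nav.MoveToAttribute("name", "");
        string onAttribute = nav.NodeType + ":" + nav.Value;
        nav.MoveToParent();       // back on the Album element
        nav.MoveToFirstChild();   // children never include attributes
        return onAttribute + " " + nav.NodeType + ":" + nav.Name;
    }

    public static void Main()
    {
        Console.WriteLine(AlbumName()); // prints A1
        Console.WriteLine(Trail());     // prints Attribute:A1 Element:Track
    }
}
```

Looking attributes up by name costs a little more typing, but it keeps working if the attribute order in the document ever changes.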

From there it extracts the Value property of the current node, which for an element that contains only text, like Track, is simply that text. After processing the first child, it processes the remaining Tracks by calling MoveToNext on the navigator, which keeps moving the cursor to the next sibling node until there are no more, at which point it returns false and the loop exits.

The code is very fast when you use the MoveXXX methods to step through the nodes in the schema. I could instead have used the Select method repeatedly to get to each node of interest, issuing a different XPath expression each time to ensure I got back the desired results. Performing a query, however, is much less efficient than simply bumping the node reference using a Move method.
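For comparison, here is a sketch of the two approaches side by side, collecting the same track titles once by moving the cursor and once by issuing a relative query. The TrackWalk class and its sample XML are assumptions for illustration; the sketch shows only that the results match and does not measure the speed difference.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class TrackWalk
{
    // Hypothetical sample data; element and attribute names are assumptions.
    private const string Xml =
        "<Music><Artist name='A'><Album name='A1'>" +
        "<Track>T1</Track><Track>T2</Track>" +
        "</Album></Artist></Music>";

    // Returns a navigator positioned on the first Album element.
    public static XPathNavigator FirstAlbum()
    {
        XPathNavigator root = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator iter = root.Select("//Album");
        iter.MoveNext();
        return iter.Current.Clone();
    }

    // Steps through the child nodes with the Move methods.
    public static string ByMoving(XPathNavigator album)
    {
        string result = "";
        XPathNavigator nav = album.Clone();
        if (nav.MoveToFirstChild())
        {
            do
            {
                if (nav.NodeType == XPathNodeType.Element && nav.LocalName == "Track")
                {
                    result += (result.Length > 0 ? "," : "") + nav.Value;
                }
            } while (nav.MoveToNext());
        }
        return result;
    }

    // Gets the same nodes with a query relative to the Album element.
    public static string BySelect(XPathNavigator album)
    {
        string result = "";
        XPathNodeIterator iter = album.Select("Track");
        while (iter.MoveNext())
        {
            result += (result.Length > 0 ? "," : "") + iter.Current.Value;
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(ByMoving(FirstAlbum())); // prints T1,T2
        Console.WriteLine(BySelect(FirstAlbum())); // prints T1,T2
    }
}
```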

 

Pre-compile for Speed

There are many other things you can do with XPathNavigator to process the contents of an XML document. The first to be aware of is that if you're going to perform the same query a number of times, perhaps on a collection of documents, then the query will execute significantly faster if you pre-compile the expression.

You do this by calling the Compile method on the navigator, passing in an XPath expression as a string and getting back an instance of an XPathExpression object. You can pass that XPathExpression object to the Select method, and the execution of the Select method will be much quicker than if you passed in the XPath as a string every time. Figure 5 shows a variation on the ProcessAlbums method that uses this approach.

public void ProcessAlbumsCompiled()
{
  // Load a document.
  XPathDocument doc = new XPathDocument("MusicBase.xml");
  // Get a navigator initialized to the root.
  XPathNavigator nav = doc.CreateNavigator();
  // Compile the query first.
  XPathExpression exp = nav.Compile("//Album");
  // Perform a query using the compiled expression.
  XPathNodeIterator iter = nav.Select(exp);
  // Iterate through the results.
  while (iter.MoveNext())
  {
    // Clone so the iterator's own navigator is never moved.
    XPathNavigator navCurrent = iter.Current.Clone();
    ProcessAlbum(navCurrent);
  }
}

Figure 5: Executing a compiled expression.
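Compiling pays off when the expression is reused, so here is a sketch that compiles once and runs the same XPathExpression against several documents. The CompiledReuse class and the two inline documents are assumptions for illustration; in current .NET releases a compiled expression is not tied to the navigator that produced it, which is what this sketch relies on.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CompiledReuse
{
    // Hypothetical sample documents; element and attribute names are assumptions.
    private static readonly string[] Docs =
    {
        "<Music><Artist name='A'><Album name='A1'/></Artist></Music>",
        "<Music><Artist name='B'><Album name='B1'/><Album name='B2'/></Artist></Music>"
    };

    // Compiles the query on first use, then reuses it for every document.
    public static int TotalAlbums()
    {
        XPathExpression exp = null;
        int total = 0;
        foreach (string xml in Docs)
        {
            XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
            if (exp == null)
            {
                exp = nav.Compile("//Album");
            }
            total += nav.Select(exp).Count;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalAlbums()); // prints 3
    }
}
```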

The last thing to mention about XPathNavigator is that if you're evaluating an XPath expression that results in a value instead of a set of nodes, you can use the Evaluate method instead of Select. Evaluate returns the value that results from evaluating the expression. Remember from last time that XPath expressions can result in a numeric, string, or Boolean value. The Evaluate method simply returns an object reference, so you'll have to cast the result to the appropriate type. Numeric values come into the .NET code as a double, so you'll have to cast appropriately there (see Figure 6).

int GetAlbumCount()
{
  // Load a document.
  XPathDocument doc = new XPathDocument("MusicBase.xml");
  // Get a navigator initialized to the root.
  XPathNavigator nav = doc.CreateNavigator();
  // Compute the count of Album elements.
  double d = (double)nav.Evaluate("count(//Album)");
  return (int)d;
}

Figure 6: Returning a value from an XPath expression with Evaluate.
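The string and Boolean cases work the same way. The following sketch, built on a made-up inline document and a hypothetical EvaluateDemo class, casts the object returned by Evaluate to each of the three value types.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class EvaluateDemo
{
    // Hypothetical sample data; element and attribute names are assumptions.
    private const string Xml =
        "<Music><Artist name='A'><Album name='A1'/><Album name='A2'/></Artist></Music>";

    private static XPathNavigator Nav()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // A string-valued expression comes back as a System.String.
    public static string FirstArtistName()
    {
        return (string)Nav().Evaluate("string(/Music/Artist/@name)");
    }

    // A Boolean-valued expression comes back as a System.Boolean.
    public static bool HasAlbums()
    {
        return (bool)Nav().Evaluate("count(//Album) > 0");
    }

    // A numeric expression comes back as a System.Double.
    public static int AlbumCount()
    {
        return Convert.ToInt32((double)Nav().Evaluate("count(//Album)"));
    }

    public static void Main()
    {
        Console.WriteLine(FirstArtistName()); // prints A
        Console.WriteLine(HasAlbums());       // prints True
        Console.WriteLine(AlbumCount());      // prints 2
    }
}
```

Casting to the wrong type throws an InvalidCastException at run time, so it helps to know which kind of value the expression produces before you pick the cast.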

That's a quick tour of using the XPath processing engine with the XPathNavigator class to query and navigate a document. This should be your preferred mode of dealing with XML (over using SelectNodes in the XmlNode class) because it's portable across all the XML document types in .NET and will be the way of the future when XPathDocument in .NET 2.0 introduces change tracking. I'll write more on that topic when we get a little closer to the .NET 2.0 release.

The files referenced in this article are available for download.

 

Brian Noyes is a software architect with IDesign, Inc. (http://www.idesign.net), a .NET-focused architecture and design consulting firm. Brian is a Microsoft MVP in ASP.NET who specializes in designing and building data-driven distributed Windows and Web applications. Brian writes for a variety of publications and is working on a book for Addison-Wesley on building Windows Forms Data Applications with .NET 2.0.
