Traveling the XPath

XtremeData

LANGUAGES: C#

ASP.NET VERSIONS: ALL

Finding and Filtering XML Data with XPath

By Dan Wahlin

XML (eXtensible Markup Language) has grown from a limited-use data storage format to one that's increasingly used in a variety of applications on a variety of development platforms. XML provides a great deal of flexibility and can be used in many different ways, such as data exchange, Web services, configuration, content management, and Web integration. Whether XML is used to tie distributed systems together or to generate graphics based on Scalable Vector Graphics (SVG) technology, the data must often be queried, filtered, or sorted.

In "Find and Filter Relational Data," I presented different ways that relational data can be searched, filtered, and sorted using ADO.NET classes in the .NET Framework. This article focuses on performing the same types of operations on XML data using XML-specific .NET classes. These operations typically involve a language called XPath, so the next section provides a quick introduction to XPath language fundamentals.

XPath Fundamentals

XPath is a language that can be used to search XML document hierarchies. Several .NET classes provide support for XPath, including XmlDocument, XPathExpression, and XPathNavigator (to name a few). To use the XPath language with one of these classes, you must create one or more XPath statements. Fortunately, XPath statements look somewhat similar to DOS path statements, and are fairly easy to learn with a little study and practice.

An XPath statement consists of one or more location steps that identify how to locate a node or set of nodes in an XML document. Steps are separated by a forward slash character, "/", and each step can have three main parts, referred to as the axis, node-test, and predicate:

axis::node-test[predicate]

The axis determines the direction of the search in the XML document. For example, will the search look through all the children (the child axis) of a given node, look for an ancestor (the ancestor axis), look for a previous sibling (the preceding-sibling axis), or look along another axis such as the attribute or namespace axes? The node-test identifies the name of the node to look for on a given axis. If a node-test succeeds because a node is found, the next step in the XPath statement (if another step exists) will be executed. Finally, the predicate is surrounded by brackets, "[" and "]", and allows filter expressions (similar to SQL language WHERE clauses) to be included to filter out undesirable nodes. The predicate is optional.

The following XPath statement shows how to search along the child and attribute axes of the XML document shown in Figure 1:

/child::Customers/child::Customer[attribute::id='ALFKI']

It also shows how to filter out unwanted Customer nodes using a predicate.

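<!-- CompanyName is an assumed element name; the other names come from the article text. -->
<Customers>
  <Customer id="ALFKI">
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <ContactName>Maria Anders</ContactName>
  </Customer>
  <Customer id="DUMON">
    <CompanyName>Du monde entier</CompanyName>
    <ContactName>Janine Labrune</ContactName>
  </Customer>
</Customers>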

Figure 1: An XML document containing customer data.

This statement may be intimidating if you're new to XPath. Fortunately, because the child axis is the default axis and the attribute axis can be abbreviated using the @ character, the previous XPath statement can be simplified to the following:

/Customers/Customer[@id='ALFKI']

This statement contains two location steps. It starts from the beginning of the XML document and moves to the child axis, looking along the way for a node named Customers. It then moves to the Customers child axis and selects a node named Customer that has an id attribute with a value of ALFKI.
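To see that the full and abbreviated forms really are equivalent, the two statements can be run side by side. The following is a minimal sketch, not part of the article's download: the class name XPathForms, the helper FirstMatchId, and the inline XML (which includes only the nodes named in the text) are all assumptions made for illustration.

```csharp
using System;
using System.Xml;

public class XPathForms {
    // Inline stand-in for Figure 1 (assumed); only nodes named in the text appear.
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Returns the id attribute of the first node matched by xpath, or null.
    public static string FirstMatchId(string xpath) {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null || node.Attributes == null ||
            node.Attributes["id"] == null) {
            return null;
        }
        return node.Attributes["id"].Value;
    }

    public static void Main() {
        string full =
            "/child::Customers/child::Customer[attribute::id='ALFKI']";
        string abbreviated = "/Customers/Customer[@id='ALFKI']";
        Console.WriteLine(FirstMatchId(full));        // ALFKI
        Console.WriteLine(FirstMatchId(abbreviated)); // ALFKI
    }
}
```

Both statements select the same Customer node, so in practice the abbreviated form is usually the one worth typing.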

The following XPath statement would return all ContactName nodes found in the XML document shown in Figure 1. Notice that no predicate is included in the statement, which results in two nodes being returned:

/Customers/Customer/ContactName

To grab the ContactName for the Customer node with an id attribute equal to DUMON, the following XPath statement can be used:

/Customers/Customer[@id='DUMON']/ContactName

This statement moves to the child axis and finds the Customers node. It then moves to the child Customer node where the id attribute equals DUMON. If the Customer node exists, it moves to the child node named ContactName.

Although there is much more to the XPath language than can be covered in this abbreviated introduction, you've now seen the different parts of an XPath statement. The following sections will demonstrate how to execute XPath statements using .NET classes.

Finding and Filtering XML Data with XPath

There are several classes that can be used to locate data in an XML document using the XPath language. Two you'll use most frequently are XmlDocument and XPathNavigator. The XmlDocument class (located in the System.Xml namespace) can be used to read and edit data. XmlDocument works by loading an XML document into a memory-based structure referred to as the Document Object Model (DOM). Another class, named XPathNavigator (located in the System.Xml.XPath namespace) can also be used to execute XPath queries, although it cannot be used to edit data in version 1.1 of the .NET platform. Although XPathNavigator also works with an in-memory structure, the structure is optimized for executing XPath statements.

XmlDocument contains two XPath-aware methods named SelectSingleNode and SelectNodes. Figure 2 shows an example of using these methods to query the document shown in Figure 1.

XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath("../Xml/Customers.xml"));

// Locate specific ContactName node using XPath predicate.
XmlNode node = doc.SelectSingleNode(
    "Customers/Customer[@id='ALFKI']/ContactName");
if (node != null) {
    this.txtOutput.Text = "Found Customer ALFKI: " +
        node.InnerText + "\r\n";
}

// Locate all ContactName nodes.
XmlNodeList nodes = doc.SelectNodes(
    "Customers/Customer/ContactName");
// The loop variable needs its own name; reusing "node" would not compile.
foreach (XmlNode contact in nodes) {
    this.txtOutput.Text += "Found Customer " +
        contact.ParentNode.Attributes["id"].Value +
        ": " + contact.InnerText + "\r\n";
}

Figure 2: The SelectNodes and SelectSingleNode methods can be used to execute XPath queries against a DOM structure using the XmlDocument class.

This example first loads the XML data into the DOM by calling XmlDocument's Load method. It then uses the SelectSingleNode method along with an XPath expression to return a specific ContactName node. Once the ContactName is found, its child text node can be accessed (or edited) using the InnerText property.

The second part of the code uses the SelectNodes method to select all ContactName nodes within the XmlDocument. SelectNodes returns a collection of XmlNode objects (referred to as an XmlNodeList) that can easily be iterated through using a standard foreach loop.
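Because XmlDocument is editable, a node located with SelectSingleNode can also be changed through its InnerText property. Here is a minimal sketch of that idea. The class name EditContact, the Rename helper, and the inline XML are hypothetical, and the id value is assumed not to contain a quote character because it is concatenated directly into the XPath statement.

```csharp
using System;
using System.Xml;

public class EditContact {
    // Replaces the ContactName text for the customer with the given id
    // and returns the serialized document. (Hypothetical helper.)
    public static string Rename(string xml, string id, string newName) {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(
            "/Customers/Customer[@id='" + id + "']/ContactName");
        if (node != null) {
            node.InnerText = newName; // Edits the child text node.
        }
        return doc.OuterXml;
    }

    public static void Main() {
        // Inline stand-in for Figure 1 (assumed).
        string xml =
            "<Customers><Customer id='ALFKI'>" +
            "<ContactName>Maria Anders</ContactName>" +
            "</Customer></Customers>";
        Console.WriteLine(Rename(xml, "ALFKI", "Pat Doe"));
    }
}
```

If no node matches, the document is returned unchanged, which mirrors the null check in Figure 2.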

In cases where XML data needs to be filtered with XPath, but not edited, an XPathNavigator created from an XPathDocument is more efficient than the XmlDocument class. XPathDocument provides a read-only in-memory store that is optimized for XPath statements, and XPathNavigator acts as a cursor over that store. XPathNavigator is an abstract class, so it cannot be created directly using the new keyword. However, classes such as XmlDocument, XmlDataDocument, XmlNode, and XPathDocument have a CreateNavigator method that returns an XPathNavigator instance.
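The classes that expose CreateNavigator do so through the IXPathNavigable interface, so the same query code can run against either an editable DOM or a read-only XPathDocument. The sketch below illustrates this under stated assumptions: the class name NavigatorSources, the CountContacts helper, and the inline XML are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public class NavigatorSources {
    // Inline stand-in for Figure 1 (assumed).
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Works with any source that can hand out an XPathNavigator.
    public static int CountContacts(IXPathNavigable source) {
        XPathNavigator nav = source.CreateNavigator();
        return nav.Select("/Customers/Customer/ContactName").Count;
    }

    public static void Main() {
        XmlDocument dom = new XmlDocument(); // Editable DOM.
        dom.LoadXml(Xml);
        XPathDocument readOnly =             // Read-only store.
            new XPathDocument(new StringReader(Xml));
        Console.WriteLine(CountContacts(dom));      // 2
        Console.WriteLine(CountContacts(readOnly)); // 2
    }
}
```

Writing query code against IXPathNavigable keeps the choice of store open: start with XPathDocument for read-only work and switch to XmlDocument only if editing becomes necessary.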

Figure 3 shows an example of using XPathNavigator's Select method with XPath to select a single ContactName node from the XML document shown in Figure 1.

XPathDocument doc = new XPathDocument(
    Server.MapPath("../Xml/Customers.xml"));
XPathNavigator nav = doc.CreateNavigator();
nav.MoveToRoot(); // Move to document.
XPathNodeIterator it = nav.Select(
    "Customers/Customer[@id='ALFKI']/ContactName");
if (it.Count > 0) {
    it.MoveNext();
    this.txtOutput.Text = "Found Customer ALFKI: " +
        it.Current.Value;
}

Figure 3: The XPathNavigator class is designed to work with XPath. This example shows how its Select method can be used to locate a specific node in an XML document. Once the node is found, it can be moved to by calling the XPathNodeIterator's MoveNext method.

Figure 3 starts by creating a new instance of the XPathDocument class. XPathDocument provides an efficient way to perform XSLT transformations or create XPathNavigator objects. Once the XPathNavigator instance is created, its MoveToRoot method is called to move to the root of the XML document. Next, the Select method is called to locate a single ContactName node. Select returns an XPathNodeIterator that can be iterated through using its MoveNext method. Because the code in Figure 3 only tries to retrieve one node from the XML document, the XPathNodeIterator's Count property is checked to see how many nodes were selected. If the count is greater than 0, the MoveNext method is called to move to the first node and access its value. Notice that XPathNodeIterator's Current property is used to access the node and its associated child text node value.

Figure 4 shows how multiple nodes can be selected using XPathNavigator's Select method. Once the nodes are retrieved, they're iterated through using XPathNodeIterator's MoveNext method.

XPathDocument doc = new XPathDocument(
    Server.MapPath("../Xml/Customers.xml"));
XPathNavigator nav = doc.CreateNavigator();
nav.MoveToRoot(); // Move to document.
XPathNodeIterator it =
    nav.Select("Customers/Customer/ContactName");
while (it.MoveNext()) {
    // Clone the navigator so the iterator's own position
    // stays on the selected ContactName node.
    XPathNavigator parent = it.Current.Clone();
    parent.MoveToParent(); // Move the clone up to the Customer node.
    this.txtOutput.Text += "Found Customer " +
        parent.GetAttribute("id", String.Empty) +
        ": " + it.Current.Value + "\r\n";
}

Figure 4: This sample demonstrates how to use XPathNavigator's Select method to locate multiple nodes within an XML document. Once the nodes are located, they can be accessed using XPathNodeIterator's MoveNext method.
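The introduction mentions sorting as well as filtering, and the XPathExpression class named earlier is one way to get sorted results from a navigator. The following sketch is not from the article's sample code. It assumes an inline document with the same Customers, Customer, and ContactName nodes as Figure 1, and the class name SortedContacts and the method SortedNames are hypothetical. It compiles an XPath statement, adds a sort key with AddSort, and passes the compiled expression to Select.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class SortedContacts {
    // Inline stand-in for Figure 1 (assumed).
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Returns contact names sorted alphabetically, separated by ", ".
    public static string SortedNames() {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        XPathNavigator nav = doc.CreateNavigator();

        // Compile the statement so a sort key can be attached to it.
        XPathExpression expr = nav.Compile("/Customers/Customer");
        expr.AddSort("ContactName", XmlSortOrder.Ascending,
            XmlCaseOrder.None, "", XmlDataType.Text);

        XPathNodeIterator it = nav.Select(expr);
        string result = "";
        while (it.MoveNext()) {
            XPathNodeIterator name =
                it.Current.SelectChildren("ContactName", String.Empty);
            if (name.MoveNext()) {
                if (result.Length > 0) result += ", ";
                result += name.Current.Value;
            }
        }
        return result;
    }

    public static void Main() {
        Console.WriteLine(SortedNames()); // Janine Labrune, Maria Anders
    }
}
```

The sort key is itself an XPath statement evaluated relative to each selected Customer node, which is why it is written as ContactName rather than as a full path.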

Finding and Filtering XML News Nodes

Now that you've been introduced to the different ways XML data can be found and filtered using XPath and different .NET classes, let's put this knowledge to work to perform a more useful (and fun) task. MoreOver.com provides XML news feeds on a variety of news topics, including world news, sports, technology, and even XML. The XML for ASP.NET Developers Web site (http://www.xmlforasp.net) serves XML and Web service articles found at the MoreOver.com Web site by tying into the following XML feed:

http://p.moreover.com/cgi-local/page?c=XML%20and%20metadata%20news&o=xml

There are many ways to extract the XML data from the MoreOver.com feed and display it, including using the XmlTextReader, DataSet, and other classes. Figure 5 demonstrates how classes within the System.Net and System.Xml.XPath namespaces can be used to access the remote XML data and filter nodes based on specific keywords. These classes are encapsulated within an ASP.NET user control named NewsItems.ascx to facilitate code re-use and allow for caching of the news items.

Figure 5 shows a method named GetNewsXml in the user control. This method uses the WebRequest object (located in the System.Net namespace) to grab the XML news document. It uses the XPathNavigator class to filter the data and output news headlines to the browser.

// Requires: using System.Net; using System.Text;
// using System.Xml; using System.Xml.XPath;
private string GetNewsXml(string url, string filter) {
    StringBuilder newsHTML = new StringBuilder();
    WebRequest req = null;
    WebResponse resp = null;
    XmlTextReader reader = null;
    string xpath = "//article[contains(headline_text,'" +
        filter + "')]";
    try {
        req = WebRequest.Create(url);
        // If you're behind a proxy server, uncomment the
        // following code, and update the domain, user, and
        // password.
        // ----------------------------------------
        // WebProxy proxyServer =
        //     new WebProxy("proxyServer.com",true);
        // NetworkCredential cred =
        //     new NetworkCredential("user","pwd","domain");
        // proxyServer.Credentials = cred;
        // req.Proxy = proxyServer;
        resp = req.GetResponse();
        reader = new XmlTextReader(resp.GetResponseStream());
        XPathDocument doc = new XPathDocument(reader);
        XPathNavigator nav = doc.CreateNavigator();
        // Select all article nodes that meet filter condition.
        XPathNodeIterator it = nav.Select(xpath);
        int count = it.Count;
        int i = 0;
        while (it.MoveNext()) {
            // Access article url and headline_text child nodes.
            XPathNodeIterator itURL =
                it.Current.SelectChildren("url",String.Empty);
            itURL.MoveNext(); // Move to selected node.
            XPathNodeIterator itHeadline =
                it.Current.SelectChildren(
                    "headline_text",String.Empty);
            itHeadline.MoveNext(); // Move to selected node.
            newsHTML.Append("\"");
            newsHTML.Append(itURL.Current.Value);
            newsHTML.Append("\",\"");
            newsHTML.Append(itHeadline.Current.Value);
            newsHTML.Append("\"");
            // Separate entries with a comma, except after the last one.
            if (i != count-1)
                newsHTML.Append(",");
            i++;
        }
    }
    catch {
        // On any failure, discard partial output.
        newsHTML.Length = 0;
    }
    finally {
        // Release the reader and the network response.
        if (reader != null) reader.Close();
        if (resp != null) resp.Close();
    }
    return newsHTML.ToString();
}

Figure 5: The NewsItems.ascx user control accesses XML data from a remote URL and filters out unwanted nodes using XPath. The resulting nodes are converted into a comma-delimited list of quoted URL and headline strings that is sent down to the browser and manipulated as an array using JavaScript.
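The filtering step in Figure 5 can be tried without a network connection by running the same contains() predicate against a small in-memory document. The sketch below makes several assumptions: the class name HeadlineFilter, the root element name news, and the sample headlines and example.com URLs are invented, while the article, url, and headline_text names are taken from Figure 5. As in Figure 5, the filter text is concatenated into the XPath statement, so it is assumed not to contain a single quote.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

public class HeadlineFilter {
    // Hypothetical feed; only the element names come from Figure 5.
    public const string Feed =
        "<news>" +
        "<article><url>http://example.com/1</url>" +
        "<headline_text>XML tools update</headline_text></article>" +
        "<article><url>http://example.com/2</url>" +
        "<headline_text>Sports roundup</headline_text></article>" +
        "<article><url>http://example.com/3</url>" +
        "<headline_text>New XML parser released</headline_text></article>" +
        "</news>";

    // Returns matching headlines as a comma-delimited list of quoted strings.
    public static string Filter(string xml, string filter) {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select(
            "//article[contains(headline_text,'" + filter + "')]");
        StringBuilder sb = new StringBuilder();
        while (it.MoveNext()) {
            XPathNodeIterator headline =
                it.Current.SelectChildren("headline_text", String.Empty);
            if (headline.MoveNext()) {
                if (sb.Length > 0) sb.Append(",");
                sb.Append("\"").Append(headline.Current.Value).Append("\"");
            }
        }
        return sb.ToString();
    }

    public static void Main() {
        // Prints: "XML tools update","New XML parser released"
        Console.WriteLine(Filter(Feed, "XML"));
    }
}
```

Note that the XPath contains() function is case-sensitive, so a filter of "xml" would match nothing in this sample feed.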

Displaying XML news headlines that can be filtered and cached within an ASP.NET Web Form is as simple as adding the following user control syntax:

URL="http://p.moreover.com/cgi-local/page?c=XML%20and%20metadata%20news&o=xml"
NewsFilter="XML" CacheName="XMLNewsCache"
CacheDuration="60" HeadlineDelay="8000" runat="server"

Figure 6 shows the output generated when the ASP.NET Web Form is run (Note: The news headlines are displayed dynamically using DHTML and JavaScript).

Figure 6: The XML news user control sends the appropriate headlines to the browser based on the filter text specified in the control syntax. The user control relies on XPath and the XPathNavigator class to do the majority of the work.

Conclusion

XML continues to become more and more prevalent because of its ability to mark up data in a flexible and platform-neutral manner. In this article you've been introduced to the XPath language, as well as several ways that XPath can be used to access nodes within an XML document using native .NET classes, such as XmlDocument and XPathNavigator. Learning the different techniques to find and filter XML data will allow you to create more flexible ASP.NET Web applications that can leverage data retrieved from a variety of sources.

The sample code in this article is available for download.

Dan Wahlin (Microsoft MVP for ASP.NET and XML Web services) is the president of Wahlin Consulting and founded the XML for ASP.NET Developers Web site (http://www.XMLforASP.NET), which focuses on using ADO.NET, XML, and Web services in Microsoft's .NET platform. He's also a corporate trainer and speaker, and teaches XML and .NET training courses around the US. Dan coauthored ASP.NET Insider Solutions (SAMS, 2004), Professional Windows DNA (Wrox, 2000), ASP.NET: Tips, Tutorials and Code (SAMS, 2001), and authored XML for ASP.NET Developers (SAMS, 2001).
