XPath—Retrieving Nodes from an XML Document

XPath is a Recommendation from the World Wide Web Consortium (W3C) that defines a path language and expression syntax for Extensible Stylesheet Language Transformations (XSLT), XPointer, and XLink. XPath syntax operates on the abstract logical structure of an XML document rather than on its surface syntax. XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document. In addition, XPath is designed so that it has a natural subset that you can use to test whether a node matches a pattern; XSLT describes this use of XPath.

XPath models an XML document as a tree of nodes. An XPath processor builds this node tree to represent the document hierarchy as an inverted tree, with the root node at the top and the branches below it.

One important kind of XPath expression is the location path, which selects a set of nodes relative to the context node. The result of evaluating a location path is the node-set containing the nodes that the path selects. You can express every location path using a straightforward, but rather verbose, syntax:

  • child::para—selects the para element children of the context node
  • child::*—selects all element children of the context node
  • child::text()—selects all text node children of the context node
  • child::node()—selects all the children of the context node, whatever their node type
  • attribute::name—selects the name attribute of the context node
  • attribute::*—selects all attributes of the context node
  • descendant::para—selects the para element descendants of the context node
  • ancestor::div—selects all div ancestors of the context node

You also have a number of syntactic abbreviations that let you express common cases concisely:

  • para—selects the para element children of the context node
  • *—selects all element children of the context node
  • text()—selects all text node children of the context node
  • @name—selects the name attribute of the context node
  • @*—selects all attributes of the context node
  • para[1]—selects the first para child of the context node
  • para[last()]—selects the last para child of the context node
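
As a rough illustration of the abbreviated forms, the sketch below evaluates a few of them with Python's standard xml.etree.ElementTree module. ElementTree implements only a subset of XPath (it has no text() or @name steps and no named axes), and the sample document and variable names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
doc = ET.fromstring(
    "<chapter name='intro'>"
    "<para>first</para>"
    "<note>aside</note>"
    "<para>second</para>"
    "<para>third</para>"
    "</chapter>"
)

# The context node for every path below is the <chapter> element.
paras = doc.findall("para")          # para: the para element children
children = doc.findall("*")          # *: all element children
first = doc.find("para[1]")          # para[1]: the first para child
last = doc.find("para[last()]")      # para[last()]: the last para child

# ElementTree has no @name step, so the attribute is read from the element.
name = doc.get("name")

print([p.text for p in paras])
print([c.tag for c in children])
print(first.text, last.text, name)
```

A full XPath 1.0 processor would accept the remaining forms in the list (text(), @name, @*) directly as path steps.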

There are relative location paths and absolute location paths. A relative location path consists of a sequence of one or more location steps separated by a forward slash (/). An absolute location path consists of / followed, optionally, by a relative location path. A / by itself selects the root node of the document containing the context node.
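
Because a relative location path is evaluated from the context node, the same kind of step can select different nodes depending on where evaluation starts. The sketch below shows this with Python's xml.etree.ElementTree, which supports only a subset of XPath and accepts only relative paths on elements, so the absolute form is not shown. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
root = ET.fromstring(
    "<doc>"
    "<section id='s1'><para>a</para><para>b</para></section>"
    "<section id='s2'><para>c</para></section>"
    "</doc>"
)

# Two location steps separated by "/", with <doc> as the context node.
from_doc = [p.text for p in root.findall("section/para")]

# The single step "para", with the second <section> as the context node.
second_section = root.find("section[@id='s2']")
from_section = [p.text for p in second_section.findall("para")]

print(from_doc)
print(from_section)
```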

XPath includes the following operators:

  • Boolean (and, or)
  • relational (<, <=, >, >=)
  • equality (=, !=)
  • arithmetic (+, -, *, div, mod)

XPath groups its core functions into the following categories:

  • node set
  • string
  • Boolean
  • number

To see how to execute XPath queries using HTTP, let's look at an example from SQL Server 2000 Books Online. Consider the following annotated XML Data Reduced (XDR) schema (stored as MySchema.xml in the directory associated with the virtual name of schema type):

<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">

  <ElementType name="Customer" sql:relation="Customers" >
    <AttributeType name="CustomerID" />
    <AttributeType name="ContactName" />
    <AttributeType name="Phone" />

    <attribute type="CustomerID" />
    <attribute type="ContactName" />
    <attribute type="Phone" />
  </ElementType>
</Schema>
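
To make the mapping in this schema easier to see, the following sketch reads it with Python's xml.etree.ElementTree and prints the element name, the table named by sql:relation, and the attributes the element exposes. It only inspects the schema text locally; it is not how SQL Server processes the schema, and the variable names are invented for this example.

```python
import xml.etree.ElementTree as ET

XDR_NS = "urn:schemas-microsoft-com:xml-data"
SQL_NS = "urn:schemas-microsoft-com:xml-sql"

# The annotated XDR schema from the article, embedded as a string.
schema_text = """<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <ElementType name="Customer" sql:relation="Customers" >
    <AttributeType name="CustomerID" />
    <AttributeType name="ContactName" />
    <AttributeType name="Phone" />
    <attribute type="CustomerID" />
    <attribute type="ContactName" />
    <attribute type="Phone" />
  </ElementType>
</Schema>"""

schema = ET.fromstring(schema_text)
ns = {"xdr": XDR_NS}

# Schema elements live in the default (xml-data) namespace.
element_type = schema.find("xdr:ElementType", ns)

# sql:relation is a namespaced attribute; name and type are not.
element_name = element_type.get("name")
table = element_type.get("{" + SQL_NS + "}relation")
attributes = [a.get("type") for a in element_type.findall("xdr:attribute", ns)]

print(element_name, "maps to table", table)
print(attributes)
```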

The URL http://IISServer/nwind/schema/Schema2.xml/Customer[@CustomerID="ALFKI"] executes an XPath query against the XDR schema (MySchema.xml) the URL specifies. (In the URL, "nwind" is a virtual directory created using the Microsoft IIS Virtual Directory Management for SQL Server utility and "schema" is the virtual name of the schema type you define when you create the virtual directory.) The XPath query requests all the customers with CustomerID of ALFKI. Here's the result:

<Customer CustomerID="ALFKI" ContactName="Maria Anders" Phone="030-0074321" />

If the query might return more than one customer, you must specify the root keyword to return a well-formed XML document. The URL http://IISServer/nwind/schema/Schema2.xml/Customer?root=root specifies the root keyword, so the query returns all the customers. Below is the partial result:

<?xml version="1.0" encoding="utf-8" ?> 
<root> 
   <Customer CustomerID="ALFKI" ContactName="Maria Anders" 
             Phone="030-0074321" /> 
   <Customer CustomerID="ANATR" ContactName="Ana Trujillo" 
             Phone="(5) 555-4729" /> 
    ...
</root>
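
To see what the predicate in the first query selects, the sketch below applies the same kind of XPath expression locally to the two customers shown in the partial result, using Python's xml.etree.ElementTree. This only mimics the selection on a small in-memory document; it does not contact SQL Server, and the rows elided from the result are left out.

```python
import xml.etree.ElementTree as ET

# The two customers visible in the partial result; elided rows are omitted.
result = ET.fromstring(
    "<root>"
    '<Customer CustomerID="ALFKI" ContactName="Maria Anders" '
    'Phone="030-0074321" />'
    '<Customer CustomerID="ANATR" ContactName="Ana Trujillo" '
    'Phone="(5) 555-4729" />'
    "</root>"
)

# Customer[@CustomerID='ALFKI']: Customer children whose CustomerID is ALFKI.
matches = result.findall("Customer[@CustomerID='ALFKI']")
match_names = [c.get("ContactName") for c in matches]

# Customer with no predicate: every Customer child of <root>.
all_customers = result.findall("Customer")

print(match_names)
print(len(all_customers))
```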

XPath is a powerful yet simple way to navigate and retrieve data from an XML document. XSLT uses XPath to retrieve the nodes, and SQL Server 2000 supports its use as part of its XML integration. To review the working draft of the XPath Recommendation, go to the W3C Web site.
