Exploring XML - 22 Aug 2001


Creating a Document That Has Columns from Different Hierarchy Levels

Editor's Note: Send your XML questions to Rich Rollman at [email protected]

Using a SQL query, can I create an XML document that has columns from different XML hierarchy levels in the same table? Then, when I have that XML document, can I place data from one column into an element's content even though the other columns are attributes of that element?

When querying a relational database, the underlying table structure rarely matches the query's exact shape. To obtain the desired result, you write SQL queries that perform JOINs and restructure the result the way you desire. This statement is even truer for XML. Because you can structure XML documents with arbitrary hierarchies, you might need to represent data stored in one database table at different hierarchy levels in an XML result. Many ways of obtaining the desired result are possible. Let's focus on using FOR XML queries. You can create this document in two ways—one simple and one more complex (albeit more flexible)—by using SQL Server 2000's FOR XML clause. The simple solution uses a FOR XML AUTO query with one or more SELF-JOINs to produce an XML result that contains columns from different nested elements of the same table. The more complex solution uses a FOR XML EXPLICIT query, which is harder to write but provides more flexibility in the structure of the XML result that the query produces. How you want the XML result structured determines which solution you choose.

First, let's explore the simple solution. When processing a FOR XML AUTO query, the XML post-processor built into SQL Server 2000's query engine uses either the names of the tables and columns in the query or arbitrary names that use table or column aliases to name the XML elements and attributes. The XML query's structure is based on the order in which the columns occur in the SELECT clause. If the SELECT clause contains a column from a different table, the output includes a new, separate subelement. Listing 1 shows the approach with a query that uses the Pubs database. For the top five publications, the query outputs the author's name, the publication's title, and its price, as Figure 1 shows.

You can also use a FOR XML AUTO query to produce an Author element with a nested Address element that contains the author's address information, as the following sample result shows:

<Author FirstName="Rich" LastName="Rollman">
    <Address Street="One Main Street" City="Sammamish"
State="WA" Zip="98075"/>

The query in Listing 2, page 62, a modification of the query from Listing 1, uses a SELF-JOIN to retrieve the author's address fields. You use table aliases to name two instances of the Authors table: one for the Author and one for the Address. The XML post-processor uses the alias names and treats the two aliases as separate tables. The result is two separate elements: Author and Address. The column order in the SELECT clause determines the element nesting order in the result. Because a column from the Author alias appears first in the SELECT clause, Author becomes the top-level element with Address elements nested within the Author elements, as Figure 2 shows.

Because a FOR XML AUTO query is easy to write and maintain, it should always be your first choice for generating an XML result from SQL Server. Unfortunately, a FOR XML AUTO query can't produce a result in which the values of some columns are attributes and the values of other columns are subelements. For example, consider the following XML fragment, which modifies the result in Figure 1:

<Author FirstName="Rich" LastName="Rollman">
  <Title Price="0.0000">
      XML Q&A

In this fragment, the publication title, "XML Q&A," is the Title element's content, but Price is an attribute. To produce this kind of XML result, you need a FOR XML EXPLICIT query.

Now, let's examine the more complex solution to creating an XML document that contains columns from different XML hierarchy levels in the same table: the FOR XML EXPLICIT query. Listing 3 contains the FOR XML EXPLICIT query that returns the XML result that Figure 3 shows. The result in Figure 3 is almost identical to the result in Figure 1, with two exceptions. First, Listing 3 returns all the rows, instead of just the first five, because you can't use the TOP clause in a FOR XML EXPLICIT query that has a UNION operator. Second, the FOR XML EXPLICIT query selects all authors, not just the authors who have titles.

The query in Listing 3 consists of two SELECT clauses: The first retrieves the author data, and the second retrieves the titles for each author. These two clauses gather all the data you need to generate the result. The ORDER BY clause sorts the result to position all rows for one author together and ensures that the rows containing titles for an author follow that author's name row. This structure enables the XML post-processor to nest Title tags within the correct Author tags.

Special structured column names specify the names of the XML elements and attributes in the XML result. Each column name consists of four fields and uses an exclamation point as the field separator. For example, \[Author!1!FirstName\] from the query in Listing 3 shows the first three fields and an implied fourth field. The first field specifies the containing element's name (e.g., Author); this XML element contains the selected column's values. The second field specifies the Tag number for the element (e.g., 1); the Tag number refers to the parent element in the hierarchy and corresponds to the value in the Parent column. The third field is the name of the attribute or contained element that holds this column's value in the SELECT clause (e.g., FirstName). The fourth field is a directive to the XML post-processor about how to represent the XML output value; directives include representing a value as an attribute, a subelement, or a CDATA section. (For more information about directives, see SQL Server 2000 Books Online—BOL.) By combining the values of the third and fourth fields, you can produce the result in Figure 4.

The code in Listing 4 modifies the FOR XML EXPLICIT query from Listing 3 to output the book's title as the Title element's content. In the query in Listing 4, I changed the title column name from \[Title!2!Name\] to \[Title!2!!element\]. I set the third field to NULL to show that the output contains no attribute or element name. The directive in the fourth field tells the XML post-processor to represent the value as a Title subelement. However, the value appears as the content of the Title element because the third field doesn't supply a name. This structure produces the desired result, as Figure 3 shows.

You can use the techniques in this answer to build complex data sets. For example, the query in Listing 5 combines the functionality of the queries in Listings 3 and 4 to produce an XML result in which the Author element contains an Address subelement and zero or more Title subelements, one for each title by the author, as Figure 4 shows. Note that unlike a FOR XML AUTO query, a FOR XML EXPLICIT query doesn't require a SELF-JOIN to produce subelements from the same table because you can use multiple SELECT clauses with the UNION operation to obtain a similar result.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.