Have you ever built an XML View in SQL Server 2000, then executed a query and received an XML result that seemed to contain all the right data, but some elements had no children and others had too many? I run into this problem every now and then when I forget to specify primary keys for the tables I use in the XML View. Fortunately, this problem is easy to fix once you recognize what's happening. This month, I describe a common scenario in which this ordering problem occurs, explain why it happens, and show you how to fix the problem. But first, let's review how XML Views work.
Inside XML Views
An XML View provides a way to obtain XML query results from SQL Server. You define an XML View by annotating an XML Schema Definition (XSD) schema. An XSD schema defines the structure and content of an XML document, much like a database schema defines the tables, columns, and data types in your database. The annotations you add to the XSD schema describe the mapping between the schema-defined elements and attributes and the tables and columns in your database. An XML View lets you treat your database as a virtual XML document and query this virtual XML document through the XPath query language.
Although you obtain XML query results from SQL Server, SQL Server doesn't directly process the annotated schema or XPath query. This processing is the job of the SQLXML OLE DB provider. When you execute an XPath query against an XML View, the SQLXML OLE DB provider processes the annotated schema and the XPath query into a FOR XML EXPLICIT query (for a full description of what happens when you execute an XPath query against an XML View, see SQL Server 2000 Books Online—BOL). Then, the SQLXML OLE DB provider sends the FOR XML EXPLICIT query to SQL Server, which runs the query and returns an XML result that matches the annotated schema. SQL Server executes a FOR XML EXPLICIT query just as it does any other SQL query, except for one additional step. After the query processor returns the relational result, or rowset, from the query, SQL Server's XML post-processor turns that rowset into XML. The post-processor uses the special column names from the FOR XML EXPLICIT query and the order of the rowset to correctly structure the XML result.
I've gone into detail about how the SQLXML OLE DB provider processes an XPath query on an XML View because the FOR XML EXPLICIT query is the root of the ordering problem. The XML post-processor depends on the rows in the rowset occurring in the same order as the data contained in the rows will appear in the XML document. Let's look at an example that illustrates this point.
XML Views and EXPLICIT Queries
FOR XML EXPLICIT queries require that the rows of the rowset that the query processor returns be ordered from the top down, from parent to child, as they will appear in the XML result. Ordering the rows this way lets the XML post-processor stream the query result to the client. Figuring out the correct ORDER BY clause to match this rule can be tricky, and sometimes the SQLXML OLE DB provider needs some additional information to formulate the correct one. The following example shows you how to specify the additional information.
Figure 1, page 19, shows a simple annotated schema that defines an XML View on the Pubs sample database. The annotated schema describes the XML document that Figure 2 shows. This document contains a list of employees and a publisher subelement for each employee. When you use Figure 1's schema to execute the XPath query /Employee, the SQLXML OLE DB provider produces the FOR XML EXPLICIT query that Listing 1 shows. Executing this FOR XML EXPLICIT query produces the incorrect XML result that Figure 3 shows. (I truncated the XML results in Figures 2 and 3 to save space.) You can clearly see the problem by looking at the last Employee element in Figure 3, where id= "GHT50241M". This Employee element incorrectly contains multiple Publisher elements, while the preceding Employee elements are all missing their Publisher elements. This malformed XML result happens because the ORDER BY clause in Listing 1's query specifies only the pub_id column (which corresponds to the ordinal 4) and the parent and TAG columns (ordinals 2 and 1, respectively). If you remove the FOR XML EXPLICIT clause from the query and execute the query, you get the rowset that Figure 4 shows. In the SQL Server documentation, Microsoft refers to this rowset as the universal table.
As you can see, the row order in Figure 1 breaks the top-down ordering rule I mentioned: The rows corresponding to the Publisher element (TAG column value 2) don't immediately follow the rows corresponding to the Employee element (TAG column value 1). The rowset is ordered incorrectly because, to relate the Employee and Publisher tables, the XML View uses the pub_id column (as in the sql:relationship annotation from Figure 1), which isn't a primary key for the Employee table. To fix the problem, you need to find a way to add to the query the Employee table's primary key, emp_id, and sort by it.
Using sql:key-fields for Proper Sorting
You can easily correct the ordering problem by adding a sql:key-fields annotation to your mapping schema. As the name implies, sql:key-fields specifies a space-separated list of columns that comprise the primary key for the mapped table. The SQLXML OLE DB provider uses the information from this annotation to add columns to the query result and ORDER BY clause to properly sort the universal table and produce the correct XML result.
To correct Figure 1's schema, add the sql:key-fields annotation with the value emp_id to the xsd:element for Employee (shown in red). With the addition of sql:key-fields, the SQLXML OLE DB provider generates for the XPath query /Employee the FOR XML EXPLICIT query that Listing 2 shows. Listing 2's query now includes the emp_id field from the Employee table, and the SQLXML OLE DB provider has expanded the ORDER BY clause to sort by emp_id and pub_id as well as the parent and TAG columns. The emp_id and pub_id columns appear in the ORDER BY clause twice. Emp_id appears the first time because you specified sql:key-fields and again because its value is contained in the id attribute in the Employee element. Pub_id appears twice because it contains the key value that relates the Employee and Publisher tables.
If you remove the FOR XML EXPLICIT clause from Listing 2's query, you get the universal table that Figure 5 shows. Note that the corresponding row from the Publisher table (TAG value 2) correctly follows each row from the Employee table (TAG value 1). With the rows ordered properly, the XML post-processor produces the correct XML result that Figure 2 shows.
Make sql:key-fields Your Best Practice
Now you've seen how adding sql:key-fields to your annotated schema affects the sort order and which columns you include in the query. To avoid similar sorting problems in the future, get in the habit of including sql:key-fields in your annotated schemas whenever you specify a sql:relation. If you adopt this best practice, you'll ensure that you always get the XML result you want.