Overcoming OpenXML Hangups


How to prepare for OpenXML in a real-world application

If you're writing an application that uses XML as a data-interchange format, you need a way to extract the data from the XML document and store that data in your database. T-SQL's OpenXML keyword is a convenient way to insert, delete, and update data in SQL Server 2000. OpenXML creates one or more relational views (or rowsets) of the XML document within a T-SQL stored procedure. You can access the data in those views to perform relational operations on your database. Although OpenXML is fairly easy to work with, you need to overcome a few hurdles before you can use it in real-world applications. I've discussed the largest stumbling block—the difficulty in passing an XML document to a stored procedure—in my Exploring XML columns in July 2001, InstantDoc ID 21077, and August 2001, InstantDoc ID 21259. Now let's look at two other common OpenXML obstacles you might encounter.

Declaring Namespace Prefixes

Real-world XML documents use namespace-qualified elements to ensure that the application that processes the data interprets the document correctly. If your application uses OpenXML to extract data from an XML document, at some point you'll likely need to specify namespace-qualified elements as XPath queries for the row and column patterns that OpenXML uses to construct the relational view. Row patterns use an XPath query to select elements from the XML document that correspond to rows in the relational view. Column patterns use XPath queries to select columns for each of the rows. Specifying namespace-qualified elements or attributes might be a challenge, especially if you don't know which prefixes will be used in the XML. A namespace prefix provides an abbreviated way to specify that an element or attribute belongs to a particular namespace defined by a Uniform Resource Identifier (URI). You can't depend on a prefix being associated with a particular namespace because the namespace specification lets you choose prefixes arbitrarily, even letting different prefixes correspond to the same namespace. (For more information about namespace specification, see the Namespaces in XML specification at http://www.w3.org/TR/1999/REC-xml-names-19990114/.) However, you can overcome the problem of undeclared namespace prefixes by using a system stored procedure.

Sp_xml_preparedocument is a system-supplied stored procedure that T-SQL uses to parse an XML document into an in-memory representation and return a numeric handle to the in-memory document. OpenXML uses this handle to obtain the in-memory representation of the XML document and build a rowset incorporating the data the document contains. If you've used sp_xml_preparedocument before, you're familiar with the first parameter (the handle to the in-memory representation of the XML document) and the second parameter (the XML document that is parsed into memory). But you might be surprised to know that you can pass a third, optional parameter to sp_xml_preparedocument. This parameter contains an XML document that declares prefixes for namespaces that you can use within the OpenXML syntax. You simply write a root element for the document, then include namespace declarations on it. For example, the <root xmlns:p="urn:myPerson" /> XML document declares the prefix p for the namespace associated with the "urn:myPerson" URI. Then, you can use the prefixes from these namespace declarations in your XPath queries to build a rowset that contains data from the namespace-qualified attributes or elements. Let's look at a practical example.

Listing 1, page 42, shows a simple XML document that contains information about a book, including the title of the book and the title of the book's author. These specifications create ambiguity because you're using the <title> tag to specify both the book and author titles. The two <title> tags would be indistinguishable except that each belongs to a different namespace. Suppose I want to use OpenXML to extract the book's title and the author's title, first name, and last name. Listing 2 shows the T-SQL code for the stored procedure containing an OpenXML statement that extracts the book and author data. Callout A shows the call that executes sp_xml_preparedocument. Notice the last parameter. This XML document declares the namespaces that will be used in the OpenXML statement. Callout B shows the OpenXML statement. Notice the use of the prefixes declared at callout A. OpenXML properly matches the namespaces associated with the prefixes.

Listing 3 shows the T-SQL code that executes Listing 2's stored procedure. Notice that the document passed to the stored procedure uses different prefixes (bk and pers) than those in the third parameter to sp_xml_preparedocument. Namespace prefixes are just placeholders for the URIs that define the namespaces (in this case, myBooks and myPerson). When the OpenXML statement executes, it matches the namespace URI associated with the prefix rather than the prefix itself. Therefore, the stored procedure correctly selects the book and author titles. Figure 1 shows the results of executing the stored procedure.

Storing Unmapped Data

Passing the third parameter to the sp_xml_preparedocument stored procedure is a painless fix for using namespaces with OpenXML. You'll find that the solution for storing unmapped data is also a breeze. XML by nature and by name is extensible. This extensibility makes applications resilient to certain changes in the data format by letting additional elements and attributes appear within the XML without breaking the applications that consume the data. As you saw earlier, OpenXML lets you extract—or map—data from the elements and attributes in the XML document to the rows and columns of the rowset that OpenXML generates. However, some elements and attributes might not map to your database. What if you want to store this data in case you need to return it in a query later? OpenXML can capture, return, and store this unmapped data—or the complete XML representation of the data (which in this column's example is one book)—in a column of the resulting rowset. To retrieve the unmapped data, you specify the @mp:xmltext metaproperty as the XPath query for the column pattern of a column in the rowset that OpenXML produces. OpenXML automatically inserts the proper XML data into the column. The default mode is to store all the data for the XML fragment that OpenXML's row pattern specifies. Alternatively, you can specify that OpenXML will store only the unmapped data. By setting the third bit of the flag parameter to OpenXML (flag is a bitmap—a set of bits that controls OpenXML's execution behavior), you can specify that OpenXML will include only unmapped data in the column that uses the @mp:xmltext metaproperty. To see how OpenXML stores the unmapped data, let's return to the book example.

You used OpenXML in Listing 2's stored procedure to build a relational view containing the book's title and the author's title and name. Now suppose you also want to store other data associated with the book so that when someone queries your database for information about the book, you can return not only the title and author information but also the genre, description, or any other attributes or elements included in the XML. To accomplish this task, you simply include another column in the rowset that OpenXML generates, designating @mp:xmltext as the column pattern. Listing 4 shows the OpenXML statement, which returns a rowset that contains the XML representation of each book in the Alldata column. This column contains the book element—including all of its attributes and descendants. Figure 2 shows the statement's results, which you obtain by executing the T-SQL stored procedure in Listing 5. (This code is identical to the T-SQL code in Listing 3 except that it calls ViewUsingNamespaceXMLText instead of ViewUsingNamespace.) By using the @mp:xmltext metaproperty, you can obtain all the data for each book without specifically knowing the name—and namespace—of each individual attribute or element that the XML document contains.

I encourage you to experiment on your own with OpenXML's features, including the two features described here. When you receive an XML document that contains data you need to store in one of your databases, OpenXML provides the most flexibility for extracting that data. Add to this flexibility OpenXML's ability to use T-SQL to write business logic that validates the data and executes INSERT and UPDATE statements, and you've acquired a valuable tool in your quest to write Internet-enabled applications.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.