"Open" XML Content

Store XML data even if you don't know the format


As XML applications mature and enter their second generation, application developers will leverage the built-in extensibility of XML more frequently, adding new data elements to existing XML formats that first-generation applications use. How well first-generation applications work with this additional data depends on whether the developer considered extensibility in the application's architecture. Luckily, SQL Server 2000 has built-in functionality to account for additional data, even when you don't know what or how much additional data you might need to receive or send from your application. Let's explore this functionality by way of an example.

Say you've implemented an application that receives product data—name and unit price—from one of your suppliers and stores it in your SQL Server database. When a customer places an order, your order-entry application uses the product data to send a message to your fulfillment vendor specifying the individual items in the order. You've implemented the application, and it's running flawlessly. But now your supplier wants to send you data that specifies the shipping weight and package sizes for each product. Your application won't use this data but must forward it to your fulfillment vendor for order processing. Will you need to rewrite your product data import application and your order-processing application to account for this additional data? Fortunately, the answer is no—if you've designed and implemented your application to account for the presence of additional data in the XML you receive and send.

Additional data that's included within an XML document is called open content. Open content encompasses any attribute or element in the XML document that either isn't specified in an XML Schema Definition (XSD) describing the document or, when no schema is used, is unexpected in its context. For example, in the scenario I described above, you and your supplier might have agreed on a schema for products that includes name and unit price, but the supplier sends shipping weight and package size as well. Although your application can't act directly on open content, you can store the open content for later use alongside the data your application uses directly. In this case, you could pass the shipping weight and package sizes to your fulfillment vendor without ever knowing the supplier had included new data.

SQL Server 2000's XML technology provides functionality that stores open content in your database and includes open content in XML query results. You can use OpenXML, Updategrams, and XML Bulk Load to store open content in the column of a table in your database. You can then include this open content in a result produced by executing a FOR XML EXPLICIT query or an XPath query against an XML View.

Open Content and OpenXML

OpenXML lets you use T-SQL code to extract data from an XML document. You can think of OpenXML as providing a relational view or rowset (similar to the OPENROWSET T-SQL function) of the XML document. Using OpenXML, you can query the data in the XML document as if it were in a table in your database.

Using OpenXML's @mp:xmltext meta-property, you can store open content in a column in your database. Meta-properties are special column selectors you use with OpenXML's WITH clause that provide access to nonphysical parts of an XML document. These parts include the open content; the node ID, which uniquely identifies each element and attribute within the document (see my February column, "OpenXML's @mp:id Meta-Property," InstantDoc ID 27473); and the previous sibling node ID. The @mp:xmltext meta-property automatically extracts open content from an XML document so you can store it in your database. Let's look at how to use OpenXML's @mp:xmltext meta-property.

Suppose that when you build your product data import application, you want to use OpenXML to store the product data you receive in the Products table that Listing 1 shows. Let's say you also want to store any open content you receive with the product data in a column named Overflow in the same table. First, you construct the OpenXML statement to extract the product data from the XML. Callout A in Listing 2 shows the product data. (I included the XML in the T-SQL code to simplify the example. In a real application, the XML would be passed as a parameter.) Callout B shows the INSERT statement that uses OpenXML to extract the product name and unit price from the XML document.

In addition to these two columns, the WITH clause also includes a column named Overflow that uses the @mp:xmltext meta-property to store the open content in the Overflow column. Exactly what is stored in the Overflow column depends on the third OpenXML parameter—a bit flag to modify how OpenXML works. The fourth bit in the bit flag alters what data the @mp:xmltext parameter returns. When you set the flag to 0 or omit the parameter, OpenXML returns the element the row selector specified, including all its attributes and descendants. If the flag is set to 1, as in Listing 2, then OpenXML returns only the attributes and descendants from the element the row selector specified—the ones that weren't otherwise selected in the WITH clause (the unconsumed data).

Figure 1 shows a sample row from the Products table containing data that Listing 2's INSERT statement produced. Notice how OpenXML formats the open content in the Overflow column (obtained from the @mp:xmltext meta-property). The content consists of a Product element that ensures that the column content is well-formed XML. Well-formed XML documents obey a set of rules spelled out in the XML specification. You can think of them as documents in which all elements are closed and attribute names are unique. The name of this element isn't significant; SQL Server chose the name Product because that's the name of the element the row selector specified. The Product element contains any attributes or descendants that weren't referenced in the WITH clause—here, the weight attribute and the QuantityPerUnit child element. The formatting is straightforward, so you could generate similar data without OpenXML. But OpenXML provides a convenient and powerful way to process XML by using T-SQL code.

Open Content Other Ways

Updategrams and XML Bulk Load offer functionality similar to OpenXML's but use an XML View to define the mapping between the XML and database tables. XML Views define how the elements and attributes of an XML document correspond to the rows and columns in your database. To construct XML Views, you use XSD schemas to describe an XML document's structure and content, then add annotations to define the mapping to your database. An annotated XSD schema is called a mapping schema. One annotation, overflow-field, lets you specify the name of the column to store open content in. You include the overflow-field annotation on an element in the mapping schema to indicate that the element contains open content to be stored in your database. Using overflow-field is similar in functionality to specifying the row selector of OpenXML. Both Updategrams and XML Bulk Load store the open content in a format identical to that of OpenXML, but neither supports the option to store all the attributes and descendants. They can store only the unconsumed data—that is, data that's not specifically mapped to the database by other annotations in the mapping schema.

As you revise existing XML applications or develop new ones, build those applications with extensibility in mind. By accounting for open content, you can ensure that your application continues to work properly even when it receives unexpected data.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.