Editor's Note: Send your XML questions to Rich Rollman at [email protected]
I understand that you can use XML technologies in several ways to query and update SQL Server 2000 databases. Can you give me some guidelines about using one technology versus another?
You can obtain XML query results from SQL Server in five ways and update SQL Server in four ways. Let's look at each method in turn.
The five ways of returning XML query results fall into two broad categories of query technology: the FOR XML extension to the T-SQL syntax and XPath queries using annotated schemas. Microsoft added the FOR XML clause to T-SQL so that you could generate XML output rather than a standard recordset from SQL Server. Until October 2001, three FOR XML modes existed: RAW, AUTO, and EXPLICIT. But with XML for SQL Server 2000 Web Release 2 (SQLXML 2.0), Microsoft introduced a fourth mode called NESTED. Each mode generates a different XML result. Let's look at each FOR XML mode and guidelines for its use followed by guidelines for using XPath and annotated schemas.
RAW mode. RAW mode generates one row element for each row of the result and includes the column data as the row element's attributes. RAW mode is appropriate for the simplest queries, which carry one requirement: The data has to be represented in string format within an XML document. And although RAW mode looks deceptively like the ADO persistence format, don't be fooled. Query results that you format by using RAW mode won't load into ADO. For information about the ADO persistence format, see XML Q&A, December 2000, InstantDoc 15867, and XML Q&A, January 2001, InstantDoc 16028.
AUTO mode. AUTO mode bases the XML it generates on the names of the tables, views, and columns and on the joins that the query uses. This mode is most appropriate when you can control the result's XML schema. The XML schema is the way you've structured the XML, including the hierarchical structure of the XML and the elements' and attributes' names. You can also use AUTO mode when you're trying to generate a particular XML schema from a query. By using table and column aliases, you can leverage AUTO mode's ability to rename elements and attributes, and you can control the hierarchy by carefully specifying the column order in your query. (For more information about using AUTO mode, see "Retrieving and Writing XML Data" in SQL Server 2000 Books OnlineBOL.) But if the XML schema requires sophisticated nesting or represents column data as a mix of elements and attributes, an EXPLICIT query or an XPath query is more appropriate. Finally, the result you obtain from AUTO mode queries depends on the database structure. If you change the structure later, you'll need to either modify the query in your application to account for the new table structure or add views to your database to mimic the previous table structure.
EXPLICIT mode. EXPLICIT mode is the most flexible method for generating XML query results, but it's also the most complicated to use. Consider using EXPLICIT mode only when you have an existing database, when the XML schema has been specified, and when AUTO mode won't produce the desired result. (For example, when you need to create XML that represents columns from two different tables in your database as attributes on the same element, AUTO mode might not return the result you expect.) However, if the XML result you're trying to construct is complex, building an EXPLICIT-mode query can be a daunting task, even for T-SQL gurus. Alternatively, you might consider using the XPath query and annotated schema support. But if you enjoy a challenge, you can use EXPLICIT mode to produce almost any XML document.
NESTED mode. You can use NESTED mode any time you can use AUTO mode. NESTED mode moves the process of expanding a query result into XML format from SQL Server to the middle tier. When you use NESTED mode, you can offload processing cycles from the machine running SQL Server, thereby increasing the scalability of your platform. However, NESTED mode has a few minor limitations over AUTO mode because not all metadata in the database is available on the middle tier.
XPath and annotated schemas. Annotated schemas let you define an XML view on your database, which you query by using XPath. When a schema that describes the desired XML result is available, XPath query with annotated schemas is a good choice for generating XML results from SQL Server. The advantage to using XPath and annotated schemas is that this option hides the complexity of building EXPLICIT queries. The disadvantage is that XPath isn't nearly as expressive a query language as T-SQL, and XPath—as well as the SQL Server implementation—contains many limitations and peculiarities. These differences include lack of support for the wildcard character (*), disallowing position-oriented XPath node references, limited support for data types, and the absence of the UNION operator. If your scenario requires a query that you can't express in the subset of XPath implemented in SQLXML, you'll need to write EXPLICIT mode queries to obtain the desired output. If you're a T-SQL programmer and you want to use this technology in your applications, you'll also need to learn XPath and understand XML schemas. Generally, learning XPath is a welcome challenge for us geeks, but you need to factor in time to climb the learning curve if you're new to XML technologies.
The four options for updating SQL Server through XML are OpenXML, updategrams, DiffGrams, and XML Bulk Load. The following guidelines will help you choose the appropriate XML update technology for your situation.
OpenXML. OpenXML is a T-SQL feature that provides a relational view of an XML document. Although OpenXML doesn't directly update the database, you can use it within a T-SQL script or stored procedure to extract data from an XML document and store that data in database tables. In addition, you can write T-SQL business logic that uses OpenXML to validate the XML data before you update your database. OpenXML works best for relatively small documents. Because OpenXML runs directly within SQL Server, processing the XML—which includes parsing the XML document and allocating memory for the XML Document Object Model (DOM) that XML uses—can introduce a significant processing load to SQL Server. I recommend that you limit the size of the document to less than 1MB. Passing an XML document to the stored procedure that uses OpenXML can also be inefficient. Because you can't pass a document stream or a file reference to a stored procedure, you have to load the entire XML document into a string buffer, which is passed to the stored procedure. Allocating an in-memory buffer for the entire XML document could cause a large memory overhead on the client machine. Still, OpenXML is the best choice when you don't have an XML schema that describes the structure of the data or when the XML is semi-structured. Semi-structured XML documents contain known data elements, but the location of the elements within the document's XML hierarchy might vary. Overall, OpenXML provides the most programmatic flexibility for XML updates to SQL Server.
Updategrams. Updategrams use optimistic updates (updates that don't lock the tables in the database) to modify data in SQL Server and use an XML grammar to specify before and after images for fragments of the modified XML data. (You don't have to include in the updategram the whole data set obtained from the original query.) The primary advantage to using updategrams is the data abstraction they provide. Updategrams implement an XML-to-SQL mapping that eliminates the need to write T-SQL update queries and to understand the database structure. Although this advantage might seem cool at first glance, two obstacles limit the usefulness of updategrams. First, Microsoft hasn't built updategram generation into either the XML DOM or ADO. So if you want to use updategrams, you need to write code that tracks changes to your data and builds the updategrams. After you've built that code, you'll likely encounter the next obstacle: the lack of an object model to write business logic. And the need for business logic that validates the correctness of data before you execute a database update is commonplace. The only object model available for processing updategrams is DOM. Although you can build business logic by using the DOM, the code must tediously navigate the updategram's structure, making the updategram difficult to maintain over time.
One other somewhat useful updategram feature is the parameterized updategram. When you combine parameterized updategrams with SQLXML templates, you can use the updategrams to update the database by simply posting parameters to a URL. All things considered, updategrams have limited applicability in the real world.
DiffGrams. Microsoft introduced DiffGrams in ADO.NET and now supports them in SQLXML 2.0. DiffGrams are essentially updategrams, although you use a different XML grammar to express the updates. Another difference between the two is that you can generate a DiffGram automatically from the ADO.NET DataSet object, but you can't generate an updategram the same way. If you're using the Microsoft .NET Framework, DiffGrams are your best choice for XML updates.
XML Bulk Load. As its name implies, XML Bulk Load is best for loading extremely large XML data sets into SQL Server. Because XML Bulk Load doesn't load an entire XML document into memory, you can use it to load XML documents hundreds of megabytes in size. But XML Bulk Load isn't limited to just large documents. XML Bulk Load supports annotated schemas and transactions, so you can use it as a generic update mechanism. Like updategrams, this option's lack of an object model to write business rules hampers more frequent use for small documents. However, a hybrid approach that compensates for the lack of an object model can produce good results. You can use XML Bulk Load for loading XML data into temporary tables and use stored procedures for executing business rules and transferring data into your database tables.