This week I'm going to finish up my discussion about XML support in Microsoft SQL Server 2005 by showing you how XML columns integrate with other data in an application database. As I noted in my last article "SQL Server 2005 Provides True Integration of XML Data," SQL Server 2005 supports both untyped and typed XML columns. At its core, an XML column is still a string, and when an XML column is untyped, the database engine will accept whatever well-formed XML string you pass in as valid.
However, typing a column has two benefits. First, a typed column allows SQL Server to validate the individual fields within your XML column. This validation goes beyond just ensuring that your XML elements are ordered and structured correctly. It includes details like ensuring that a column labeled as a decimal value does indeed contain numbers.
The second benefit relates to the ease in which you can retrieve data. As you probably know, getting data into a table is only half the battle. The second half of the battle is finding and retrieving the data you need. When you use a typed XML column, SQL Server 2005 doesn't need to do as much on-the-fly type conversion to determine whether the decimal value you're querying for in your XML column is in fact a decimal value. Because the typed XML column's fields have already been verified, your query operates more efficiently.
By the way, you might have noticed that I've taken to consistently referring to the value elements inside of XML as "fields." I do this because, in theory, each column in a relational table has a value, as does each field inside of an XML structure. Similarly, the XML term "element" can refer to an entire subtree as opposed to a single value. Thus, I use the term "field" as a way to identify elements that contain a value while still maintaining a distinction from the term "column" in a relational table.
Besides typing, you can use primary keys and indexes to enhance the efficiency of querying a table. When you create a new table, you typically create a primary key. The purpose of the primary key is to identify unique entries within the table.
In addition to the primary key, developers often use an index. An index provides a quick way for the application logic to find a group of entries in the database. Although not all indexes are unique, the ability to reduce the number of records involved in a database query can provide significant performance gains.
SQL Server 2005 provides tools that let you index the fields that are part of an XML column. This process involves only two steps: creating a primary index and creating a secondary index. However, an initial requirement is that your table must have a primary key with a clustered index.
What is a clustered index? When you create a primary key, SQL Server automatically creates a clustered index for your table. You can alter this default behavior and choose another column to have your clustered index, but clustered indexes are unique in that they keep the index entries in order as the index is built. This feature is important because the clustered index is the most direct index into your table. As long as your table has a primary key and the associated clustered index, you can add a primary index for the data in your XML column. The primary index for an XML column differs from most other indexes in two respects. First, the primary index for an XML column takes all the elements contained in your XML data and creates an index based on all that data. Thus, although the primary index speeds up queries, it also takes up a fair amount of space.
Second, the primary index doesn't focus on a specific field or path within your XML data. To create an index that focuses on a more specific value, you need to create a secondary index. Let me be crystal clear here: You can't create a secondary XML index until after you create a primary XML index.
After you create the primary XML index, you can create one of three different types of secondary XML indexes: PATH, PROPERTY, and VALUE. Each one focuses on a different portion of the XML structure. You use the PATH index for queries that include a specific path to elements in your XML data. You use the PROPERTY index to retrieve multiple fields from within an XML file, where the primary key for the records you desire is already known. You use the VALUE index to retrieve elements in your XML data. The VALUE index is particularly useful for fields that might be repeated on different paths within your XML data.
Of course, creating primary and secondary indexes will greatly impact storage space. If you're concerned about space and you want to run queries against a single field in your XML column, an alternative is to promote an XML field to a column in SQL Server 2005. The advantage of this promotion is that you can reference the data the same way that you would reference any other column, including setting up standard SQL indexes on the column. However, as part of this process, SQL Server 2005 is essentially extracting the field value from your XML structure to create a column. Thus, you're querying against that column rather than directly against your XML structure. If you want to retrieve other fields from your XML data based on the value found in a promoted field, you'll pay a price in the form of an extra step: opening the XML data and retrieving the values.
After you've completely finished creating your XML columns, the next step is to put in place the business logic needed to manipulate the XML data in those columns. There are several ways to implement the business logic, which brings us to another feature in SQL Server 2005: the ability to create .NET functions.
XML columns and the integration of the Windows .NET Framework in SQL Server are the two features that will have the greatest impact on developers. In my next column, I'll start discussing how the .NET Framework's inclusion in SQL Server 2005 will affect developers. In the meantime, if you want more information about XML in SQL Server 2005, go to http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql25xmlbp.asp.