SQL Server 2005 Provides True Integration of XML Data

In my last article, I started discussing Microsoft SQL Server 2005's native XML data type. This new data type brings with it some of the most important core changes to how developers will interact with SQL Server. As I noted in my previous article, XML structures are excellent for managing hierarchical data relationships. Being able to incorporate XML data into a relational database provides tremendous flexibility.

However, just being able to drop XML data into the database wouldn't be sufficient if you couldn't integrate this data with other data in the database. For example, SQL Server 2000 provides limited support for XML as part of its query and insert capabilities, but this support stops outside the database. The XML data is either treated as a blob in the database or transformed into a set of relational structures. Fortunately, SQL Server 2005 lets you integrate XML data with other data in the database through several new features. More important, the introduction of a native type means that SQL Server 2005 truly supports XML.

Before leaving SQL Server 2000 behind in this discussion, I want to quickly look at the future for those of you who have embedded XML in your current data structures under SQL Server 2000. To store this data, you have undoubtedly used one of the current built-in types, such as Varchar. This workaround lets you save your XML data, but the individual fields within that XML structure are nothing more than blobs as far as the database is concerned. As part of your transition to SQL Server 2005, you can alter your table and designate these columns as being of the XML type. As long as you have well-formed XML structures, you'll be able to leverage all the features that I'm about to discuss.

At the individual column level, SQL Server 2005 supports the XML data type by exposing several methods that have been added to the T-SQL language. These new methods let you search, retrieve, and update data that's embedded within a larger XML structure. Each method is combined with a XML field to limit or update data associated with an XML column. The five new methods are query(), value(), exist(), nodes(), and modify().

The query() method lets you search through a larger XML structure to find a set of data based on an XML Query (XQuery) definition. The XQuery language is a World Wide Web Consortium (W3C) standard for searching or defining a set of XML nodes that meet a set of criteria. Most of the other methods also leverage XQuery conditions. For more information about XQuery, go to http://www.w3.org/tr/xquery .

Exist() is an optimized method that lets you screen XML data the same way you screen data with a relational WHERE clause. However, instead of retrieving a value from your XML data, then checking to see whether this value matches a condition, you pass the condition into the XML processor, then retrieve only those records that match the condition.

The value() method returns a specific value from within your XML structure. The limitation is that this value must be a single instance, such as a string or number. It can't be a subset (i.e., node) of your XML structure.

The nodes() method returns a subset of your XML structure in the form of a node. This result can then be used by other methods, such as exist() and value() to pull out repeating values that might be embedded within a single XML column. The key is that by using the XQuery syntax, which is native to XML, you can embed your queries against the data in a single relational column.

The last method I'd like to discuss is modify(). This method lets you insert and update values and nodes that are contained within an XML column. The modify() method accepts both INSERT and UPDATE statements not only for scalar values but also for entire subtrees. Thus, you can add specific child elements to a collection of items. By leveraging XQuery statements within each XML column command, you can manipulate the individual components contained within your custom XML structure.

SQL Server 2005 supports both untyped and typed XML columns. In an untyped XML column, there aren't any definitions for the structures contained in that column. Consequently, SQL Server 2005 needs more time to search for items located within it. Besides slow performance, you're limited to how much integration you can achieve with the data in an untyped XML column. I'll discuss this limitation in more depth in a future article.

When can you leave an XML column untyped? You should do so only under two circumstances. First, you can leave an XML column untyped when you're developing a table and query performance isn't an issue. Second, you can leave an XML column untyped when you have a complex XML structure and the speed of insertion is the most important consideration.

Typing an XML column involves associating an XML Schema Definition (XSD) with the XML column. SQL Server then uses the XSD to identify what specific XML elements should be part of your XML data. SQL Server 2005 goes beyond just ensuring that you have well-formed XML. It actually validates insertions against the schema. In addition, if you have typed XML columns, you can leverage SQL Server 2005's features for XML data. I'll review these features in my next article.

Comments

Plain text