Subtle Changes You Might Have Missed

Downloads
94591.zip

In the year since SQL Server 2005's release, the number of technical sites containing SQL Server information has burgeoned. Microsoft's SQL Server development team members host blogs describing little-known details of SQL Server operations and best practices. But even with all this information available, some aspects of SQL Server have changed subtly—changes that you might have missed. You might not realize that what you think you know about SQL Server behavior is actually no longer true. I'll give you the straight story about three such misconceptions.

Choosing to Be a Victim—or Not

One SQL Server 2005 change that surprised me was an enhancement to the deadlock-resolution algorithm. In SQL Server 2000 and earlier, the SET option SET DEADLOCK_PRIORITY had two values, LOW and NORMAL. LOW didn't mean that you had lower priority for being selected the victim; it only controlled the connection in which the value was set. Setting this option to LOW indicated that if this connection were involved in a deadlock, it should be chosen the victim. There was no way to set your priority to HIGH, to indicate that this connection shouldn't be chosen as the victim.

I'd already written and delivered my updated SQL Server Internals course, which included a description of SET DEADLOCK_PRIORITY and stated that only two possible values could be used. By chance, I came across a reference to setting DEADLOCK_PRIORITY to HIGH and at first assumed it was a mistake. But when I checked SQL Server 2005 Books Online (BOL), I discovered a change I hadn't known about.

In SQL Server 2005, SET DEADLOCK_PRIORITY lets a process determine its priority for being chosen as the victim by using one of 21 different priority levels, from -10 to 10. You can still use the value LOW (equivalent to -5) or NORMAL (which is 0 and the default); you can also specify HIGH (which is 5).

Which session is chosen as the deadlock victim depends on each session's deadlock priority. If the sessions have different priorities, SQL Server chooses the lowest-priority session as the deadlock victim. If both sessions have the same deadlock priority, SQL Server selects as the victim the session that's less expensive to roll back.

Unnecessary Database Backups

Another old "fact" that's no longer true, though still commonly believed, concerns database backups. A longtime SQL Server user posted a question on a public newsgroup about a behavior change he'd noticed in SQL Server 2005. In SQL Server 2000, if he truncated a table, he then had to take a full database backup for his log-backup chain to be unbroken. He claimed that if he tried to take a log backup after the TRUNCATE, SQL Server complained. I don't know exactly why he generated an error, but there's no reason for SQL Server 2000 TRUNCATE operations to invalidate log backups. This was true in a much older version of SQL Server, which is probably where he first encountered this behavior; he just kept believing it until he tried it in SQL Server 2005.

In SQL Server 2005, 2000, and 7.0, a TRUNCATE is a logged operation. Log backups containing transactions involving a TRUNCATE operation are fully functional and can be used to restore the database as easily as log backups not containing transactions that perform truncations. Because of his outdated belief, the user probably spent more time than necessary performing full database backups. I imagine the person who posted this question wasn't the only one using this outdated information.

Maintaining Indexes During Bulk Modifications

Here's another behavior change you might not be aware of. Before SQL Server 7.0, it was usually considered a best practice to drop all your indexes before doing any kind of bulk load operation. Bulk loads include the BCP command, the T-SQL BULK INSERT command, and simple load operators in SQL Server 2000's DTS or SQL Server 2005's SQL Server Integration Services (SSIS, DTS's replacement). Before SQL Server 2000, it was usually considered faster to drop all indexes, load the new data into an unindexed table, then rebuild the indexes. Although rebuilding indexes is expensive, this process was less time-consuming—if a large percentage of new rows were being added—than trying to add those rows while maintaining indexes.

SQL Server 2000 introduced an internal technique for processing bulk inserts—index-at-a-time modifications—that makes inserting data much more efficient, even while maintaining multiple indexes. This technique is applicable to large update operations as well as bulk inserts.

The alternative to index-at-a-time modifications is row-at-a-time modifications, which was the only possibility before SQL Server 2000. For every new or updated row, SQL Server maintains each index individually, inserting or updating a pointer at the appropriate location in each nonclustered index. As an example, let's use the Sales.Sales OrderHeader table from the AdventureWorks database. I'll make a copy of that table and build the clustered index on SalesOrderID and nonclustered indexes on CustomerID, SalesPersonID, and ContactID. If I insert 100,000 new rows, the values of CustomerID, SalesPersonID, and ContactID will be relatively randomly spread across the index leaf levels. For each new row, SQL Server would have to find where in the CustomerID index to insert or update the new CustomerID value, find where in the SalesPersonID index to insert or update the new SalesPersonID value, and in the ContactID index, insert or update the new ContactID value. This process could result in 300,000 random-access data modifications, excluding any modifications at the indexes' upper levels as more rows are added to leaf.

Index-at-a-time modifications are possible because SQL Server sorts data efficiently. If sufficient rows are being inserted or updated, SQL Server can perform repeated sort operations on the data. The data can be sorted using the appropriate order for each index; then only a single pass through that index's leaf level is needed to merge all the new index values into the existing index records.

Run the code in Listings 1 and 2 to create a copy of the Sales.SalesOrderHeader table and build four indexes on the table. Listing 2 recopies the original table into the new table to double the number of rows to 62,930. When I look at the plan for the UPDATE query in Listing 3, I see a single clustered-index–scan operation (which is the same as a table scan) at the far right of the graphical plan and a clustered-index update as the final step on the left, which represents 91 percent of the query's cost. This operator performs all the modifications to all the indexes by using the row-at-a-time technique.

If I run the code in Listing 2 again, I'll increase the table's size to almost 100,000 rows. When I now look at the graphical plan for the UPDATE in Listing 3, it looks different. I see several icons indicating index-update operations and several spool operators that indicate SQL Server is collecting the index key values in a work table before sorting them. Web Figure 1 shows a portion of the plan containing the spooling, sorting, and index-at-a-time updates. I can't fully explain each operator in your plans here, but I want to show plans using each of the types of bulk operators (i.e., row-at-a-time and index-at-a-time). Depending on the resources available on your system, you might find that you need to run Listing 2 a different number of times to get the index-at-a-time plan. My technical editor got the index-at-a-time plan the first time he ran Listing 2. But when I retested this code on another SQL Server machine, I had to run Listing 2 four times.

Whether SQL Server chooses to use row-at-a-time or index-at-a-time operations for insert and update operations depends on various factors, including the total number of rows modified, percentage of existing rows affected by the change, and available system resources. I don't mean to suggest that you should never drop your indexes prior to bulk loads or bulk updates, just that you shouldn't assume it's always better to do so. You need to test your operations on your SQL Server systems with your data. (More information and examples of testing to determine at what point dropping the indexes before modifying data might be cheaper.)

Little Surprises

I've talked about three changes in SQL Server behavior that haven't been well advertised. As you can imagine, these examples aren't the only areas in which SQL Server has changed the way it behaves from version to version. The moral: Be prepared to be "pleasantly" surprised whenever you investigate behavior after any SQL Server upgrade, whether to an entirely new version or even to a service pack.

Comments

Plain text