An Upgraded UPDATE

Downloads
93729.zip

In SQL Server 2005, Microsoft enhanced the UPDATE command in ways that improve performance, reduce the occurrences of blocking and deadlocks, and will probably influence the way you write your code. In SQL Server 2000, when you issue an UPDATE statement against a table, SQL Server physically updates the data rows in the table, then updates the corresponding rows in all nonclustered indexes that contained at least one of the columns mentioned in the UPDATE statement’s SET clause. SQL Server 2000 performs the update to the nonclustered index rows regardless of whether a value actually changed in those columns. Unlike SQL Server 2000, SQL Server 2005 performs the update to the nonclustered index rows only if there was an actual value change in those columns. I’ll explain and demonstrate the advantages of this change to UPDATE by using a scenario called Update All Columns.

Update All Columns

First, create and populate a sample table called Employees by running the code in Listing 1. This code also creates a nonclustered index on each employee attribute (i.e., firstname, lastname, and salary).

I've written the usp_upd_Employees stored procedure to manage update requests of employee attributes. Using stored procedures rather than ad hoc code provides benefits such as better security, better reuse of execution plans, and easier program maintenance. It’s a common practice among T-SQL programmers to create stored procedures that encapsulate single-row INSERTs, UPDATEs, and DELETEs against a table. Writing stored procedures that encapsulate INSERTs and DELETEs is straightforward. However, writing stored procedures that encapsulate UPDATEs is less straightforward because you might want to update different subsets of the columns with each invocation of the procedure.

Run the code in Listing 2 to create the stored procedure usp_upd_Employees. The stored procedure accepts the key (@empid) of the target row as required input, and all other employee attributes (@firstname, @lastname, @salary) as optional input. If you don’t specify a parameter when you invoke the usp_upd_Employees procedure, SQL Server sets the parameter to the default value NULL.

The usp_upd_Employees stored procedure uses the COALESCE function to set each employee attribute to the corresponding input parameter (if it was provided, the parameter value is set to NOT NULL) and to the column value itself (if the parameter wasn't provided, the parameter value is set to NULL). If a parameter wasn't specified when the stored procedure was invoked, logically there’s no value change. Using the COALESCE function in a stored procedure, instead of writing a stored procedure for each subset of attributes, makes program maintenance easier. However, even if you use the COALESCE function with a stored procedure, when you run the procedure in SQL Server 2000, all nonclustered indexes are affected even if you don't need to update all attributes of a given employee. For example, run the following code in SQL Server 2000:

EXEC dbo.usp_upd_Employees
  @empid     = 1,
  @salary    = 1001;

The corresponding rows in all three nonclustered indexes created in the table are modified, instead of only the employee row in the table (clustered index) and the corresponding row in the nonclustered index on salary (idx_salary). Of course, making superfluous modifications to some of the nonclustered indexes in SQL Server 2000 incurs a performance penalty, but doing so can also potentially cause blocking or deadlocks.

Avoid Deadlocks

Now I'll show you how a deadlock can occur in SQL Server 2000 and how it's avoided in SQL Server 2005. In SQL Server 2000, open two new connections—call them Connection 1 and Connection 2. Run the code in Listing 3 in Connection 1 and the code in Listing 4 in Connection 2, and use text-output mode in Connection 2. When you get a deadlock error in one of the connections, stop the activity in the other connection.

The code in Listing 3 runs an endless loop that invokes the usp_upd_Employees stored procedure. In each iteration, the salary of employee 2 is set to a random value in the range of 1 to 1000. Behind the scenes, Connection 1 requests an exclusive lock on the data row for employee 2 in the clustered index; if the previous lock was granted, Connection 1 also requests exclusive locks on the corresponding rows in all nonclustered indexes as well as on the salary index—idx_salary.

The code in Listing 4 runs an endless loop that queries the Employees table, then returns the lastname value associated with the employee whose first name is fname 2. The employee ID of the requested employee is 2. (I used a hint: index = idx_firstname to force SQL Server to use the index idx_firstname in this tiny table, but if you run the code in Listing 4 on a larger table, SQL Server will use the index without the hint if the selectivity of the filter is high enough.) Behind the scenes, Connection 2 requests a shared lock on the index row where firstname is fname 2. If SQL Server grants the request, it requests a shared lock on the data row for employee 2 in the clustered index and returns the lastname value.

It's easy to see the potential for a deadlock in SQL Server 2000 if the timing is such that both requests begin at the same time. And indeed, if you run the code in Connection 1 and Connection 2 simultaneously, after a few seconds, you'll get a deadlock. If you run the same test in SQL Server 2005, you won't get a deadlock. No value change occurs in the firstname column, so that when you run the Connection 1 listing in SQL Server 2005, the usp_upd_Employees stored procedure doesn’t attempt to update the corresponding row in the nonclustered index idx_firstname.

Preventing Deadlocks in SQL Server 2000

So what can you do in SQL Server 2000 to avoid the need to update all nonclustered indexes and reduce occurrences of deadlocks? One option is to use dynamic SQL to construct the UPDATE statement. Listing 5 shows you how the UPDATE statement will include, in the SET clause, only those columns whose corresponding parameters were provided. However, using dynamic SQL can present other problems. If you use the code in Listing 5, you'll get a different plan for each subset of modified columns and the users who need to execute the stored procedure will require direct UPDATE permissions against the table.

If you’re working with SQL Server 2000 and you'd rather not use dynamic SQL, another available option updates only the relevant indexes and also reduces occurrences of deadlocks. Within your stored procedure, you can check which parameters were specified or not and invoke a different static UPDATE statement for each unique combination of specified parameters. The downside of this solution is that it will obviously require more maintenance than the other solutions I've presented.

UPDATE Options

I couldn't find any official Microsoft documentation about the change in physical processing of the UPDATE statement in SQL Server 2005, so this article should shed some light on the subject. This enhancement to the UPDATE command lets you use static SQL code in SQL Server 2005 and still enjoy better performance and a reduction of deadlock occurrences. In SQL Server 2000, you'll need to weigh the advantages and disadvantages of using static SQL Server versus dynamic SQL to determine how best to use the UPDATE command in your environment.

Comments

Plain text