Inside SQL Server: Indexing and Locking

Downloads
5665.zip

One of the most touted new features of SQL Server 7.0 is its ability to do true row-level locking. By locking only the data that is being processed, instead of an entire page containing numerous rows, SQL Server 7.0 drastically reduces blocking problems. (Blocking occurs when one user or application needs a lock on an object, but another user or application already has an incompatible lock on the object.) However, row-level locking alone can't solve all blocking problems. You still need to add appropriate indexes to your tables.

The Problem

Let's create a table to demonstrate why row-level locks by themselves won't solve all your blocking problems. The code in Listing 1 creates and populates the table. Use SQL Server 7.0's Query Analyzer or another querying tool to run the code in one connection.

Start a new connection and execute this partial transaction. (It's only a partial transaction because no COMMIT or ROLLBACK occurs inside the batch. SQL Server holds the locks until you execute a COMMIT or ROLLBACK command.)

BEGIN TRAN 
UPDATE testlocks
SET col1 = 7 
WHERE col1 = 1

EXEC sp_lock @@spid
   /* The output shows you one exclusive (X)
   	lock on a RID; this is the row that
	has been updated. */

In a third query window, run the following batch before you issue the ROLLBACK in the second query window.

USE pubs
UPDATE testlocks
SET col1 = 10
WHERE col1 = 3

Execute the ROLLBACK in the second window:

ROLLBACK TRAN

The Explanation

The second update command blocks, and can't update the row until the previous update transaction issues the ROLLBACK and releases its locks. Why does this happen when the two commands are updating different rows? If the first UPDATE is trying to update the row where the value of col1 is 1, why can't the second UPDATE proceed, when it is trying to update the row where col1 has a value of 3?

The answer lies in the fact that SQL Server must scan the entire table to do either update, because the table has no indexes. The first UPDATE scans the table looking for a value of 1, and finds this value in the first row. SQL Server takes an exclusive lock on that row and performs the update. Then it scans the rest of the table, but because it finds no more rows with the desired value, it acquires no more exclusive locks. In the sp_lock output, in addition to the X lock on the RID, IX locks are held on the page and the table. These intent locks, which I discussed last month (Inside SQL Server, "SQL Server 7.0 Locking," July 1999), are acquired at the coarser levels of granularity (pages and tables) to reflect locks held at a finer level (rows or keys).

The second UPDATE must also scan the whole table to look for rows with a value of 3. However, during the scan, when the UPDATE comes to a row with an exclusive lock, it can't know whether it needs to update that row. Even if it could read the row and see its current value, if the row is locked inside a transaction, that transaction might get rolled back to a value that would need updating. Because SQL Server has no way to know this, it must wait for the first transaction to release the lock to determine the locked row's real value.

The Solution

To take full advantage of row-level locks, SQL Server needs to use an index rather than a table scan. The script in Listing 2 builds the same table as in Listing 1 and then builds a nonclustered index on the table.

After you create the table, execute the first UPDATE again. The locks held are a bit different. You still have an X lock on the RID and IX locks on the page and the table, but additionally, you have two X KEY locks. Only indexes acquire KEY locks, so the X KEY locks are on the nonclustered index. When an indexed column changes, SQL Server also needs to change any nonclustered indexes on that column. The nonclustered index leaf level contains every KEY value, in sorted order. Originally, the nonclustered index contained 1, 3, and 5, but you updated the 1 to a 7. The index row where the 1 was before is locked, just like the data row, until the end of the transaction. The new version of the row has a value of 7, which goes to the end of the leaf level for this index because it's now the largest value. Also, that new index row must be locked for the duration of the transaction. So, although the data row doesn't move during the update, the leaf-level row must move, because the keys in the nonclustered index must always be sorted.

Without issuing a ROLLBACK for the first UPDATE, run the second UPDATE in a new query window. This time, the UPDATE succeeds. Using the index, SQL Server can determine that it doesn't need the locked row. It needs only the row with a value of 3 for col1. The index lets SQL Server access the desired rows directly, and no blocking occurs. Finally, make sure you roll back the original update or otherwise terminate the transaction so that all your locks are released.

The most important point here is that indexes not only are necessary for efficiency in executing individual queries, they also can be invaluable in providing maximum concurrency. SQL Server 7.0 can lock rows, but when it does a table scan, SQL Server needs to read the row to know whether that row needs locking.

One Last Question

If SQL Server needs to read the data row before knowing whether that data row needs locking, why doesn't the same condition apply in the leaf level of the nonclustered index? In other words, when the second UPDATE tries to find the index key for the row with a col1 value of 3, does it have to read the previous index keys, which are exclusively locked?

SQL Server doesn't know which data value an index row references until it reads that index row. During index traversal, SQL Server can use latches, which are special lightweight locks, to read a row internally without acquiring a lock on it. Even if the index key is exclusively locked, SQL Server can take a latch.

Latches are an interesting topic in their own right, and I'll talk about them more in another column. For now, think of a latch as a tool that's acquired for the system's internal purposes, and a lock as a tool that's acquired for the user's (or the application's) purposes. If the key value is not the one SQL Server is trying to find, SQL Server can ignore the row. Because all the keys in an index are sorted, SQL Server has to check only that the current value of the row is not the one it needs. If the row is locked and is going to be rolled back to a value SQL Server is searching for, that old value of the key occurs elsewhere in the index, and SQL Server blocks when it gets there. After SQL Server finds the index key it is looking for, it tries to acquire a lock to allow the user's UPDATE statement to succeed. If the key is locked, the process has to wait. The latch is released in either case (whether the lock is granted or denied), because it is a short-term mechanism.

You can run the same tests again after creating a clustered index on the table. In that case, the leaf-level index rows are the data. The sp_lock output will show you that the table has no row locks at all. The lowest-level locks acquired are key locks on the old position of the row and the new position of the row. (The row has to move because clustered indexes always keep the data in sorted order.) The second UPDATE won't experience any blocking problems, because SQL Server can use latches to inspect the value of the keys before requesting the lock.

Indexing Strategy

Like locking, effective indexing in SQL Server is a huge topic, and I'll cover more indexing details in a future column. SQL Server 7.0 includes tools to help you with your indexing decisions, but Microsoft designed these tools to optimize the speed of queries, as if they were running in isolation. None of the currently available indexing wizards take into account that queries might be running simultaneously or that a lack of indexes can cause blocking problems. You're on your own in making indexing decisions, but understanding how your choice of indexes affects locking can help you make the best choice.

Comments

Plain text