They're your database transactions. You worry about them. You try to structure them correctly from the start so that you can depend on them to execute properly and in the order you intend. But sooner or later, you see the alert error 1205: Your transaction was deadlocked with another process and has been chosen as the deadlock victim. Rerun your transaction.
"How can this be?" you ask. "They always behaved so well by themselves. They must have been running with a contentious crowd again."
Compared with earlier versions, SQL Server 7.0 offers a dramatically enhanced approach to transaction contention management. Most significant, row-level locking is now the default for all types of transactions. SQL Server 6.5 had an Insert Row Locking (IRL) feature, which was an optional attribute on a table-by-table basis that governed the behavior of concurrent inserts. But in reality, IRL was still a page-locking system; it simply allowed multiple inserts into a page at the same time. True row-level locking in SQL Server 7.0 is a necessary advance that positions SQL Server as a viable relational database management system (RDBMS) for enterprise applications with high volumes of data and high degrees of concurrent use.
SQL Server aficionados have long anticipated these locking mechanism improvements. Now you can control locking of your database resources at the granularity you want—the row level. By using SQL Server 7.0 and good application development techniques, you might feel adequately armed to avoid deadlocks. But despite your best efforts and the locking mechanism enhancements in SQL Server 7.0, deadlocks can occur. Here's an example you're probably familiar with. Process A updates table Orders, then attempts an update on table Customers. Process B, seemingly in defiance, updates table Customers then circles back to Orders. SQL Server takes the locks, the processes hang, and a deadlock occurs. To avoid this situation, you try not to let Process B happen. You go through great pains to ensure that any set of updates to Orders and Customers execute just as Process A would. Where possible, you write code as a sharable component, either a T-SQL stored procedure, a 3GL function packaged as a COM component in a DLL, or any other time-honored method for sharing code. "This is easy," you think. "I could avoid deadlocks in my sleep!" Then one night, as you sleep, your pager calls again. When you check your system, you see error 1205 again. Why? Try as you might, the traditional approach of ensuring correctly ordered transactions, though necessary, simply won't prevent SQL Server 7.0 from deadlocking on resources, sometimes within locking of a single row.
To illustrate this point, you can follow these steps to create an example. First, use the script in Listing 1 to create Indexed_Table. Note that the table has a nonclustered index and a data field named Non_Indexed_Field, which isn't contained in any index.
Next, populate the new table with the data in Listing 2. You can use this simple table to create a surprising deadlock scenario. Let's say that two processes, Process A and Process B, access this table at the same time. Having reformed its contentious behavior since the previous example, each process carefully avoids updating multiple rows in reverse order.
Process A, which needs to update only one row, submits the following statement:
UPDATE Indexed_Table SET Non_Clustered_Index_Field = 'NewData2' WHERE Id = 2
Process B, at roughly the same time, submits a query against the table:
SELECT Non_Indexed_Field FROM Indexed_Table WHERE Non_Clustered_Index_Field = 'Data2'
Although each transaction in this example will contend for access to the same row, the transactions don't fit into the classic deadlock scenario I described above. Most significant, each transaction affects only one row. Thus, a scenario in which each process holds a lock that the other needs doesn't seem possible. This example seems to be a simple blocking situation with one of two possible results: Either the select statement from Process B will find the appropriate row before the update statement from Process A gets to the row, or the select statement from Process B will come out empty-handed.
However, these two processes can deadlock because they both access the nonclustered index, albeit for different reasons. A nonclustered index requires storage space separate from the table (unlike a clustered index, which orders the storage of the table itself). Thus, the index is a different set of resources, governed by separate locks from the ones the table uses.
In the game of deadlocks, timing is everything. In this example, you have a slight window of opportunity if all the following conditions exist:
- The select statement uses the nonclustered index to find matching rows, but the statement needs to access the table data to complete its result set for any given row (the index doesn't cover the query).
- The update statement doesn't use the nonclustered index to find matching rows, but it needs to update the field in the nonclustered index.
- At least one row exists that, before any update, matches both statements' selection criteria.
Consider the execution plans and subsequent locking activity for the statements above. You can observe locking activity in SQL Server from several approaches. The best way to see the locks taken at each part of the execution is to use Trace Flag 1200. Microsoft documented this trace flag in SQL Server 6.5. Although the company doesn't document this flag for SQL Server 7.0, Trace Flag 1200 produces a thorough trace of the lock requests for any statement. Microsoft doesn't officially support the trace flags, but they're quite useful. Be warned, however, that the locking activity data is extensive; you should capture it only in controlled environments for small, isolated examples. For the purposes of this discussion, I distill the locking activity to the essential aspects involved in the deadlock. I won't cover the robust system of lock escalation.
As SQL Server executes the update statement and finds matching rows, it takes out exclusive locks on each row it updates. In this example, SQL Server uses a table scan to find the matching rows, and this method is the best execution plan given the schema and selection criteria. However, the update in this example also changes the value of a field in the nonclustered index (from Data2 to NewData2). Thus, for the row in question, SQL Server also needs to update the index with the new value to complete the transaction. SQL Server needs another exclusive lock, this time on the corresponding entry in the nonclustered index.
However, the nonclustered index also drives the select statement. Given the selection criteria, the SQL Server optimizer's only logical choice is to use the nonclustered index to find matching rows. When SQL Server is using the default locking mode, read committed, Process B takes shared locks on each entry in the index as it traverses that index. SQL Server releases these locks as soon as it determines no match for the row. When SQL Server finds a match, however, it holds the shared lock. Because the index (the field Non_Indexed_Field is represented only with the actual table data) doesn't completely cover this particular query, SQL Server needs to read the table to satisfy the query. SQL Server requests a shared lock for the corresponding row in the table.
By now you've probably noticed the window of opportunity for the deadly embrace. Process A requests two exclusive locks in sequential order—first on the row, then on the nonclustered index. Process B requests two shared locks in sequential order—first on the index, then on the row, which is the reverse order of Process A. This scenario is beginning to look like a classic deadlocking example, after all. If the timing is such that each process obtains its first lock successfully (the exclusive lock on the row for Process A and the shared lock on the index entry for Process B), the deadlock will occur.
The key to understanding this scenario is realizing that the first locks that Process A and Process B obtain can peacefully coexist. No logical or hierarchical connection exists between the row in the table and the entries in the nonclustered index. Thus, a process can obtain a lock on an index entry while the row in the table is locked.
You might want to reproduce this problem in a local environment by executing each statement in its own Query Analyzer window. But note that because the timing window opens and closes within the execution of each statement, you have no way to pause the processing in midstatement to force the timing. Even the quickest Minesweeper players will have little chance of successfully hitting the execute buttons fast enough. I needed to code similar examples into tight, infinite loops to produce the deadlock in a controlled environment. But this example is based on a real-world application, in which the window of timing opens about once every 10,000 update transactions.
Avoidance or Acceptance?
Despite your best efforts to control transactional sequencing, SQL Server manipulates resources at a level beyond your control, and in such a way that deadlocks might occur. In addition, deadlocks are more likely to occur when you have multiple indexes on a table to support diverse queries. What now?
You can consider one of the following avoidance or acceptance techniques, each of which has merits and problems. I include tips about identifying the proper technique for your situation.
Use the uncommitted read (dirty read) transaction isolation level. Using dirty reads is an avoidance technique. Submitting queries with the dirty read directive (by using the read uncommitted hint within the select statement or the set transaction isolation level statement) is a surefire way to prevent the deadlock in the previous example. Using a dirty read tells SQL Server to not take any shared locks during the reading process, letting the query return uncommitted data. In general, a dirty read transaction runs quicker than other types of transactions because it doesn't need to take out any locks, and because it lets other concurrent processes run quicker because they never need to wait for those locks. The obvious drawback, of course, is that dirty data might be incomplete or invalid. If the application has complex transactions that require several steps to complete and maintain data integrity, this approach won't work for you. If you choose this technique, and you decide to accept the risk that the SQL Server system will occasionally bring back incomplete data from the queries, you need to consider how your application will react to that data. Will other statements or transactions fail? Will SQL Server incorrectly alert users? If the system lets users drill down on the data in the screen for further detail, what will they see when they drill down? In general, use the dirty read approach only when the consequences of bad data to the downstream processes are minimal.
Remove the nonclustered index. In the absence of the nonclustered index, two things happen. First, the update doesn't need to keep that index current, so the update can complete its transaction with only one lock on the row. Second, the execution plan for the select statement changes dramatically. In the absence of an appropriate index, the SQL Server system needs to scan the table. Can this process execute as quickly as an index scan? The transaction process speed depends on several factors, including the size of the table, the selectivity of the index, and the distribution of data. When you evaluate the change in execution plans for the select statement, don't forget that the overall concurrency profile of the application might also improve with this approach. Thus, a small impact on one statement's performance might be negligible compared with the improved overall application performance that the change offers.
Cover the query. The previous example was one field away from having a covered query. In other words, SQL Server could have built the result set entirely by reading the index rather than scanning the table data, if I had built Non_Indexed_Field into the nonclustered index. The query could have completed its work with only one lock per matching row, and the deadlock never would have happened. Using covered queries is a good avoidance technique. In this situation, you completely eradicate the deadlock. However, you can't cover every query with indexes, especially in systems in which users have ad hoc querying capability. Also, every indexed field has a cost in storage and in the performance of inserts, updates, and deletes. In the real-world scenario that spawned this article, I would have had to add 15 columns to the index to follow this approach. Using covered queries is best suited for specific examples in tightly controlled systems.
Retry the transaction. Retrying the transaction is an acceptance approach. In most organizations, developers and users are constantly enhancing applications. Furthermore, many concurrency problems aren't identified in a requirements specification, in a test plan, or in a development lab. The more successful an application is, the more people will use it—often in ways that the developers didn't anticipate. People will use features at different time intervals, which will cause the application's concurrency profile to change. Retrying the transaction employs error handling and retry logic throughout the application. When you see error 1205, you can retry the offending statement an arbitrary number of times or until the transaction succeeds. Retrying the transaction is a good way to give your users uninterrupted availability, but it has three drawbacks. First, the method obscures easily preventable deadlocks; finding these deadlocks is tougher for you. (I'm assuming that you're already using the Profiler trace flags 1204 and 1205.) Second, the retry method is an after-the-fact remedy. By the time you decide to retry the transaction, the deadlock has occurred and the applications have incurred a delay. SQL Server 7.0 checks the lock queues every 5 seconds to detect and handle deadlock situations. Whenever a process waits on the same lock for more than one iteration of this check, SQL Server investigates the process for a deadlock. Thus, if you receive a deadlock error, your process might have been hung up for 5 to 10 seconds. Relying solely on this approach might degrade overall system performance. The third possible downside of the retry method is that it might cause code maintenance headaches, depending on how many different methods you employ to submit SQL statements. If you have a central component in which you can implement the retry logic once and apply it to the entire application, the retry technique will work fine.
The deadlock workaround that's right for you is probably some combination of the methods I've discussed. Having the retry logic in place is a good failsafe mechanism so that deadlocks remain your problem and not your users' problem. You can't cover every query. You'll need to index some fields that users regularly update. And, you'll likely have ramifications for reading dirty data.
The right workaround for you depends on your system's priorities. Is accurate, quick completion of the update the most important goal? Is user query response time the most important goal? What is the stimulus for each event? And how does the system's response affect downstream events? Answers to these questions will help guide your decision.