Most discussions about indexes and query tuning focus on getting the best performance from one instance of a running query. If the query performs data retrieval only, optimizing for one instance is enough. Even if many users executed the same query simultaneously, no further index tuning would be necessary because the best indexes for one execution are almost always the same as for multiple concurrent executions. However, if you have frequent data-modification operations, the index recommendations might change for multiple users. An update that runs in a reasonable amount of time for a single user might slow down drastically when multiple users execute it simultaneously. In fact, some data-modification queries might be impossible for multiple users to run simultaneously because the first user might acquire exclusive locks, forcing all other users to wait their turns. The query might take an acceptable amount of time once it starts running, but if the user has to wait for other user queries to complete before his or her query can begin, the query will seem very slow indeed.
This difference between data-retrieval and data-modification operations results from the different kinds of locks that SQL Server needs to acquire. When performing data-retrieval (SELECT) operations, SQL Server typically acquires shared locks, which any number of users can acquire on the same data simultaneously. On the other hand, data-modification (INSERT, UPDATE, and DELETE) operations cause SQL Server to acquire exclusive locks. Only one process at a time can hold an exclusive lock on any data element, which might be a row, an index key, a page, or the whole table. If another process requests an exclusive lock on data that's already locked, the process must wait until that data is available before it can do any of its operations.
Note that when formally discussing locking behavior, I talk about processes and not users. Keep in mind that one user can run multiple applications that connect simultaneously to SQL Server and that each of those connections is a separate process. In addition, some applications can open multiple connections to SQL Server, and each of those connections is also a separate process. SQL Server's lock manager doesn't distinguish whether separate processes were initiated by the same user, so a user can start one process that can block other processes from that same user. You can observe this behavior by using Query Analyzer to start multiple sessions. When you use Query Analyzer, SQL Server treats each query window that you open as a separate connection. You can run the code I show you in this article by opening multiple sessions in Query Analyzer.
Of course, I can't dive into all the details about locking here in this column. But for more information about the internal details of locking, see my Inside SQL Server columns "Transaction Isolation Levels" (June 1999, InstantDoc ID 5336), "SQL Server 7.0 Locking" (July 1999, InstantDoc ID 5445), and "Indexing and Locking" (August 1999, InstantDoc ID 5665). As part of my series about indexing for query tuning, this article points out why you must be aware of multiuser locking issues when designing your optimum indexes. First, let's look at why you don't need to consider alternative indexing strategies if you're designing the best indexes for retrievals only. Then, I'll show how your choice of indexes can affect modification queries in a multiuser system.
Although the optimum indexes for a query that one process runs in isolation are usually the same as those for a query that multiple processes run concurrently, each of those query executions isn't as fast as if only one user were running that query. Multiple simultaneous processes require more system resources. However, the decrease in performance might not be proportional to the number of users; it might be much less. One of the most time-consuming aspects of data retrieval is reading the data in from disk, and if multiple processes need the same data, only one of those processes needs to physically read the data in from disk. SQL Server then automatically caches the data, so all the other processes can read the cached data, which is much faster than reading data from disk. However, if your data cache is small compared to the total amount of data that the processes need to access, additional reading from disk might be required.
SQL Server 2000 Enterprise Edition includes an optimization to improve the performance of nonordered scans (scans that aren't requested to be in any particular order) when different processes simultaneously request multiple nonordered scans of the same table or index. Without this optimization, one process might start scanning and get perhaps 20 percent of the way through the table before another process requests the same data. If SQL Server's data cache is small or is used by other processes for unrelated data, the pages scanned by the original process might be swapped out, so SQL Server has to go back to disk to get the first pages of the table or index again. When the original scanning process resumes, any pages that were read in ahead might be gone, requiring more disk reads. This can cause some serious disk thrashing—when the I/O controller must jump back and forth between areas of the physical disk multiple times. The so-called merry-go-round scans optimization in SQL Server 2000 Enterprise Edition lets you avoid this thrashing by letting a second process start at whatever point the original process has already reached. Both processes can then read the same data, but SQL Server has to read each page from disk only once. When the first process finishes, the second process can read the initial 20 percent or so of the table. Another process that needs to scan the same data in no particular order can start reading from wherever SQL Server is reading at the time.
For data-modification operations, multiple processes executing the same query might cause a substantial decrease in overall performance; as I mentioned, this happens because of the locking that SQL Server must manage. By default, SQL Server makes all the locking decisions internally, and the user or developer has no control over these decisions. However, if you know how SQL Server makes some of its locking decisions, you can troubleshoot blocking problems. Here are some key points to be aware of to help you determine where problems might occur:
- SQL Server must protect all data modification with some form of exclusive lock while the modification is taking place. Usually, SQL Server attempts to lock just the row containing the data that's being modified. So if the data is part of an index, SQL Server locks the index key. Sometimes SQL Server decides to lock a whole page or a whole table if it determines that it has the resources or if doing so would be more efficient.
- Although most discussions of locking describe only shared and exclusive locks, SQL Server also uses a hybrid type of lock called an update lock. Besides UPDATE operations, these locks are used for other data-modification operations, which require SQL Server to first search for the data to be modified, then perform the modification. During the searching phase, SQL Server acquires an update lock, then when the desired data is found, escalates that lock to an exclusive lock.
- SQL Server must hold exclusive locks until the end of a transaction, in case it needs to undo the transaction. (Contrast this to shared locks, which SQL Server can release as soon as the scan moves off the page, assuming that READ COMMITTED isolation is in effect.)
Indexes and Data Modification
For a couple of reasons, creating good indexes can help reduce blocking problems that might result from concurrent access to the same data. The first reason is simple. If you have good indexes in place, SQL Server can quickly find the data it needs to modify, then quickly release the locks on that data. The faster SQL Server can perform the operations, the less the likelihood of lock conflicts with other users; if other users are blocked, they'll be blocked for a much shorter amount of time.
The second reason indexes can reduce blocking problems is because they help SQL Server avoid table scans. If a table has no useful index, SQL Server has to use a full table scan to find qualifying rows for an UPDATE or DELETE operation. In that case, SQL Server must inspect every row to find the row to modify. Meanwhile, other processes that need to find individual rows remain blocked even if they'll eventually modify different rows. Without inspecting a row or having help from an index, SQL Server has no way of knowing whether the row qualifies for the modification. If you're modifying only a subset of rows in the table, as determined by a WHERE clause, an index on the column or columns in the WHERE clause means that SQL Server doesn't have to perform a full scan. By using the index key values, SQL Server can move directly to the relevant rows and doesn't have to examine the unnecessary rows.
Let's look now at a couple of examples of how indexes can help avoid blocking problems. The first example is one that I used in my August 1999 column, "Indexing and Locking," but running it again can show you that the problem still exists in SQL Server 2000 Service Pack 2 (SP2). To demonstrate locking between multiple processes, you can open multiple connections in Query Analyzer, either by choosing File, New; by pressing Ctrl+N; or by using the corresponding toolbar button to open a new connection. In Listing 1, the first connection builds a simple table of three rows and no indexes. Then it begins a transaction and updates one of those rows. A second connection then attempts to access a completely different row but can't proceed because the first connection blocks it. To remove the block, you can either commit or roll back the first connection's transaction, cancel the query in the second connection, or just close all your connections by using the File, Disconnect all menu option. I recommend this last approach because it ensures that no open transactions hang around to cause further blocking problems.
Listing 2 revises Listing 1's code by building a nonclustered index on the column that you want to update, which is also the column used to determine which row to update. No blocking occurs when you run Listing 2's code because the index on the column being updated lets the second connection avoid the locked keys.
The next example demonstrates that sometimes the type of index on a table can affect the locking that SQL Server uses. Listing 3 creates a table that has a nonclustered index on a column called TestID, populates that table with a few rows, then begins a transaction and updates one row. In a second connection window, you can then execute the following update:
UPDATE Test SET TestValue=TestValue + 1 WHERE TestID = 1
The second connection will be blocked. Now return to the first connection and run another update:
UPDATE Test SET TestValue=TestValue + 1 WHERE TestID= 1
SQL Server has encountered a deadlock, which means that these processes mutually blocked each other so that neither one can proceed without intervention. SQL Server must intercede and kill one process, roll back its open transaction, and release the locks. The other process can then proceed. Because SQL Server automatically detects and deals with any deadlock, to the user running the process that wasn't killed, deadlocks might not look like a big deal. However, if deadlocks occur frequently in an application, the whole system's throughput can suffer. For more details about tracking down and dealing with deadlocks, see David Erickson's May 2000 article, "Outsmart Deadlocks in SQL Server 7.0," InstantDoc ID 8412, and Itzik Ben-Gan's April 2000 article, "Problem-Solving with SQL Profiler," InstantDoc ID 8232.
In this situation, the solution is to recreate the table in Listing 3 but change the index to be clustered instead of nonclustered. Explaining exactly why this solution works would require a much more detailed description of locking behavior, but consider altering the index type in your testing if you're troubleshooting blocking or deadlocking problems.
Keep in mind that locking isn't necessarily a bad thing; in fact, it's usually good. You want to prevent your data from being modified by multiple processes simultaneously, which can leave inconsistencies in the data. Your goal should be not to eliminate locking but to let SQL Server hold the locks for the shortest possible time on the least amount of data. Building good indexes can go a long way in helping you minimize blocking problems, but make sure to test your indexes in a multiple-user scenario before deciding which indexes are best.