SQL Server Read-Consistency Problems, Part 2

Dive into tangible examples and workarounds

Downloads
97456.zip

Last month, in SQL Server Read-Consistency Problems, Part 1, I began a discussion about index scans and read-consistency problems that can develop when one process is reading data under the Read Uncommitted isolation while another process is inserting data. This month, I’ll provide tangible examples that demonstrate those problems and also suggest workarounds. Be sure to read last month’s article as a prerequisite to this one.

Related: SQL Server Read-Consistency Problems, Part 3

Seeing Is Believing

I’ll first demonstrate how an allocation-order scan can read the same row multiple times. Run the code in Listing 1 to create a database called testdb and within it a table called T1. The T1 table has two columns: cl_col is a UNIQUEIDENTIFIER column with a default value of NEWID(), and filler is a CHAR(2000) column with a default value of 'a'. The table has a unique clustered index on cl_col. With this row size, a page can hold four rows at most. Each insert enters a new GUID in cl_col, and because of the random nature of GUIDs, frequent insertions will cause frequent splits in the clustered index.
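The actual Listing 1 is in the downloadable 97456.zip; the following is a minimal sketch reconstructed from the description above, with the index name being an assumption:

```sql
-- Hypothetical reconstruction of Listing 1 (the real script is in 97456.zip).
-- Creates testdb and T1 as described: a UNIQUEIDENTIFIER clustering key with
-- a NEWID() default, plus a wide filler column so a page holds at most 4 rows.
SET NOCOUNT ON;
CREATE DATABASE testdb;
GO
USE testdb;
GO
CREATE TABLE dbo.T1
(
  cl_col UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),
  filler CHAR(2000)       NOT NULL DEFAULT ('a')
);
-- The unique clustered index on cl_col; random GUID keys cause page splits.
CREATE UNIQUE CLUSTERED INDEX idx_cl_col ON dbo.T1(cl_col);
```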

Open two new connections (call them connection 1 and connection 2). Run the code in Listing 2 in connection 1, then the code in Listing 3 in connection 2. The code running in connection 1 simply inserts new rows into T1, frequently causing splits. The code running in connection 2 has a loop that selects all of T1’s data into a temporary table and checks whether any rows were read more than once. The SELECT statement against T1 uses the NOLOCK hint, and if you examine the plan for the SELECT INTO statement, you’ll see a clustered index scan with the Ordered: False property. Remember, this is the scenario in which the storage engine might decide to use an allocation-order scan. As soon as the IF EXISTS statement finds multiple occurrences of rows, the code breaks out of the loop and returns the GUIDs that SQL Server read multiple times.
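The real Listings 2 and 3 ship in 97456.zip; the sketches below are hedged reconstructions of what the two connections run, based solely on the description above:

```sql
-- Hypothetical sketch of Listing 2 (connection 1): insert rows indefinitely;
-- the random GUID defaults cause frequent clustered-index page splits.
SET NOCOUNT ON;
USE testdb;
WHILE 1 = 1
  INSERT INTO dbo.T1 DEFAULT VALUES;
```

```sql
-- Hypothetical sketch of Listing 3 (connection 2): repeatedly copy T1 into a
-- temp table under NOLOCK and stop when any row was read more than once.
SET NOCOUNT ON;
USE testdb;
WHILE 1 = 1
BEGIN
  IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
  SELECT * INTO #T1 FROM dbo.T1 WITH (NOLOCK);
  IF EXISTS (SELECT cl_col FROM #T1 GROUP BY cl_col HAVING COUNT(*) > 1)
    BREAK;
END
-- The clustered index on cl_col is unique, so duplicates here prove
-- the scan read the same row more than once.
SELECT cl_col, COUNT(*) AS cnt
FROM #T1
GROUP BY cl_col
HAVING COUNT(*) > 1;
```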

Table 1 shows sample output I got in connection 2 after a few seconds. Remember that the clustered index on cl_col is unique, so there’s no doubt that SQL Server read the same rows multiple times. Once you get output from connection 2, you can stop the code running in connection 1 or run the code in connection 2 again if you want to make sure that the first test wasn’t a fluke.

The second test demonstrates an allocation-order scan that misses existing rows. Run the code in Listing 4 to re-create T1 with an additional column that will hold an incrementing sequence value, along with a Sequence table that will be used to produce the sequence values. Run the code in Listing 5 in connection 1, then the code in Listing 6 in connection 2.
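Again, the actual Listing 4 is in 97456.zip; this sketch is a reconstruction from the prose, with column and index names assumed:

```sql
-- Hypothetical reconstruction of Listing 4: rebuild T1 with a seq_val column
-- and create a one-row Sequence table that will dispense sequence values.
USE testdb;
IF OBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE dbo.T1;
IF OBJECT_ID('dbo.Sequence') IS NOT NULL DROP TABLE dbo.Sequence;

CREATE TABLE dbo.T1
(
  cl_col  UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),
  seq_val INT              NOT NULL,
  filler  CHAR(2000)       NOT NULL DEFAULT ('a')
);
CREATE UNIQUE CLUSTERED INDEX idx_cl_col ON dbo.T1(cl_col);

CREATE TABLE dbo.Sequence (val INT NOT NULL);
INSERT INTO dbo.Sequence(val) VALUES(0);
```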

The code running in connection 1 has a loop that, in each iteration, produces a new sequence value (i.e., greater than the previous by 1) and inserts a new row into T1. As in the previous test, the inserts will cause frequent splits. The code running in connection 2 has a loop that selects all of T1’s data into a temporary table and checks whether any sequence values are missing. Again, the SELECT INTO statement runs with the NOLOCK hint and shows up in the plan as an index scan with the Ordered: False property; that is, the storage engine might use an allocation-order scan in this case as well. Once the IF EXISTS statement finds missing sequence values, the code returns the pairs of adjacent sequence values with a gap between them, proving that the scan skipped rows that existed in the table. Table 2 shows sample output I got in connection 2 after a few seconds. Once you get output from connection 2, you can stop the code running in connection 1.
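As with the earlier listings, the real Listings 5 and 6 are in 97456.zip; the following are hedged sketches of the two loops as described above:

```sql
-- Hypothetical sketch of Listing 5 (connection 1): bump the sequence and
-- insert a row carrying the new value, causing frequent splits as before.
SET NOCOUNT ON;
USE testdb;
DECLARE @nextval AS INT;
WHILE 1 = 1
BEGIN
  -- Compound assignment: increment the stored value and capture it atomically.
  UPDATE dbo.Sequence SET @nextval = val = val + 1;
  INSERT INTO dbo.T1(seq_val) VALUES(@nextval);
END
```

```sql
-- Hypothetical sketch of Listing 6 (connection 2): scan under NOLOCK and
-- stop once a gap appears in the sequence values that were read.
SET NOCOUNT ON;
USE testdb;
WHILE 1 = 1
BEGIN
  IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
  SELECT * INTO #T1 FROM dbo.T1 WITH (NOLOCK);
  IF EXISTS
    (SELECT * FROM #T1 AS A
     WHERE seq_val < (SELECT MAX(seq_val) FROM #T1)
       AND NOT EXISTS
         (SELECT * FROM #T1 AS B WHERE B.seq_val = A.seq_val + 1))
    BREAK;
END
-- Return each adjacent pair of read values with a gap between them.
SELECT seq_val AS [before gap],
       (SELECT MIN(seq_val) FROM #T1 AS B
        WHERE B.seq_val > A.seq_val) AS [after gap]
FROM #T1 AS A
WHERE seq_val < (SELECT MAX(seq_val) FROM #T1)
  AND NOT EXISTS
    (SELECT * FROM #T1 AS B WHERE B.seq_val = A.seq_val + 1);
```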

On Performance of Index Scans

Allocation-order scans are generally preferable to index-order scans in terms of performance, although read consistency might well be a higher priority. With zero logical fragmentation of an index, both should perform similarly; with high levels of fragmentation, however, index-order scans can get much slower. I’ve discussed the cases in which the storage engine uses unsafe allocation-order scans. There are also cases in which the storage engine can use the faster allocation-order scans, even when the plan shows a Clustered Index Scan or Index Scan with the Ordered: False property, without sacrificing read consistency:

  • TABLOCK—When you query data with the TABLOCK hint, no one can insert data while you read. The storage engine can safely use an allocation-order scan in this case. Of course, you have to consider the implications on processes trying to modify the table data while you read.
  • READ_ONLY—When the index being scanned resides in a READ_ONLY filegroup or database, again, because the storage engine realizes that no one can modify data during the read, it can safely use an allocation-order scan. This is something that you have control over. If you have tables that are modified only periodically, you can place them in a READ_ONLY filegroup and temporarily set the filegroup to READ_WRITE to allow the periodic modifications. This way, you’ll get the faster allocation-order scans. Similarly, a database holding a data warehouse or data mart can be set to READ_ONLY during the day and changed to READ_WRITE temporarily during the Extract, Transform, and Load (ETL) processes loading the daily data.
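The two safe scenarios above can be sketched as follows; the filegroup name FG1 is an assumption, and the statements are illustrative rather than taken from the article’s listings:

```sql
-- TABLOCK: the table lock blocks inserts for the duration of the read,
-- so the storage engine can safely use an allocation-order scan.
SELECT * FROM dbo.T1 WITH (TABLOCK);

-- READ_ONLY filegroup toggle (hypothetical filegroup FG1): open the
-- filegroup for the periodic load, then return it to read-only so
-- subsequent scans can safely run in allocation order.
ALTER DATABASE testdb MODIFY FILEGROUP FG1 READ_WRITE;
-- ... run the periodic modifications here ...
ALTER DATABASE testdb MODIFY FILEGROUP FG1 READ_ONLY;
```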

A Read Uncommitted/NOLOCK Alternative

A common Read Uncommitted scenario involves systems that need to support very frequent inserts, in which readers keep getting blocked when trying to acquire shared locks; and when readers do manage to obtain shared locks, they interfere with writers. You might have thought that individual inserts, each run as an independent transaction, wouldn’t cause consistency problems under Read Uncommitted, but now you know the truth. So, is your environment doomed either to suffer from the read-consistency problems described in this article or to suffer from frequent blocking problems and slow-running index scans?

There might be a reasonable alternative in SQL Server 2005: the Read Committed Snapshot isolation. This isolation is similar in concept to Read Committed, except that it’s based on row versioning rather than shared locks for readers. If your environment mainly has frequent reads and inserts with infrequent updates and deletes, it might be a good candidate for Read Committed Snapshot. Deletes and updates get versioned, but inserts don’t, so in such a system there will be little overhead on the version store in the tempdb database. The storage engine will use index-order scans for clustered index scans and index scans with the Ordered: False property, so you won’t face the consistency problems that this series of articles describes. You can mitigate the performance problems with such scans by keeping your indexes defragmented, and when you rebuild indexes as part of your maintenance process, you can specify a fill factor to reduce the occurrence of splits. In this isolation, readers won’t get blocked by writers, and writers won’t get blocked by readers. This is an option that’s definitely worthwhile to test in your system as an alternative to Read Uncommitted.
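Enabling the option is a one-line database setting; this sketch assumes the testdb database from the earlier tests, and note that the ALTER DATABASE statement needs to be the only active connection to the database to succeed:

```sql
-- Switch testdb to Read Committed Snapshot: readers see the last
-- committed version of each row from the tempdb version store instead
-- of acquiring shared locks, so readers and writers don't block each other.
ALTER DATABASE testdb SET READ_COMMITTED_SNAPSHOT ON;

-- Verify the setting.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'testdb';
```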

On to the Next Step

You now know about the read-consistency problems that can occur when reading data under the Read Uncommitted isolation while processes insert data during an allocation-order scan of an index. Next month, I’ll describe other read-consistency problems that can happen even with index-order scans, in both the Read Uncommitted and Read Committed isolations.
