Question: I know that SQL Server is prone to inconsistent analysis using the read committed isolation level, but I thought I could prevent that by using a clustered index. Is this true?
Answer: No, a clustered index does not prevent inconsistent analysis.
As background, the inconsistent analysis problem is where a read query may return different results on subsequent executions because of intermediate writes/updates. In some cases a query may return essentially a wrong result because of an update that occurs while a read query is executing.
One classic example of inconsistent analysis is when a SELECT COUNT(*) query returns an incorrect result because an update that occurred while the SELECT was running caused one or more rows to be counted twice.
Here is an example of inconsistent analysis where a clustered index is involved. First of all I’ll create a simple table with a clustered index and three rows:
CREATE TABLE t1 (c1 INT, c2 VARCHAR (100));
CREATE CLUSTERED INDEX t1s1 ON t1 (c1);
INSERT INTO t1 VALUES (1, 'Anderson');
INSERT INTO t1 VALUES (2, 'Randal');
INSERT INTO t1 VALUES (3, 'Tripp');
Now in connection 1 I’ll start an explicit transaction that updates row 3 in the table. This will hold an exclusive lock on row 3 until I commit the transaction.
BEGIN TRAN;
UPDATE t1 SET c1 = 4 WHERE c1 = 3;
Now in connection 2 I’ll start a scan of the table. It will block waiting to acquire a share lock on row 3.
SELECT COUNT (*) FROM t1;
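If you want to see the blocking for yourself, you can query the waiting-tasks DMV from yet another connection while the scan is stalled (a diagnostic sketch, assuming SQL Server 2005 or later; the LCK% filter simply narrows the output to lock waits):

SELECT session_id, wait_type, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'LCK%';

You should see connection 2’s session waiting with a LCK_M_S wait type for the row that connection 1’s transaction has exclusively locked.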
And finally I’ll update row 1 in a third connection. This succeeds because the scan in connection 2 has already acquired and released its share lock on row 1; only row 3 is exclusively locked, by the update in connection 1.
UPDATE t1 SET c1 = 5 WHERE c1 = 1;
The scan is still blocked, so now I’ll commit the transaction in connection 1, which releases the exclusive lock and lets the scan complete:
COMMIT TRAN;
You’ll see that when the scan completes, it returns 4!! If you run it again it returns the correct value: 3.
The reason is inconsistent analysis, even with a clustered index.
In the scenario I created, the scan processed row 1 and then became blocked. While the scan was blocked, I updated row 1 so that its cluster key had a value larger than any value already processed by the scan. When the scan was able to continue, it processed the updated row *again*, making the row count incorrect. The row was encountered twice because an update to a cluster key forces a delete+insert operation (see my blog post here for proof and a deeper explanation).
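If you want a quick way to observe the delete+insert behavior yourself, one option is to look at the transaction log records for a cluster-key update (a sketch only, not production code: fn_dblog is undocumented and unsupported, and its output columns can change between versions):

BEGIN TRAN;
UPDATE t1 SET c1 = 6 WHERE c1 = 2;
SELECT Operation, AllocUnitName
FROM fn_dblog (NULL, NULL)
WHERE Operation IN (N'LOP_DELETE_ROWS', N'LOP_INSERT_ROWS');
ROLLBACK TRAN;

You should see an LOP_DELETE_ROWS record followed by an LOP_INSERT_ROWS record for the clustered index, rather than a single in-place modification of the row.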