Skip navigation

Quaere Verum - Clustered Index Scans - Part I

Did it ever happen to you that you believed in something so strongly without ever bothering to verify the truth about it because it seemed so obvious, and one day realized that you were wrong? Well, it happened to me recently…

Before I unveil the details of the case, I’d like to express my deepest gratitude to Lubor Kollar and Srikumar Rangarajan who helped me get to the bottom of things and for explaining the behavior I couldn’t explain myself.
Lubor also gave me great guidance and advice!

Tested on SQL Server versions: 2000 Dev/SP4, 2005 Dev/SP1

I’ll start by asking you a seemingly simple question. Given a table T1 with a clustered index on col1, is the following query guaranteed to return the data in clustered index order?

SELECT * FROM T1;

Try to answer the question to yourself, and also to explain the reasoning behind your answer. Apparently, the answer is far from being simple, and the implications of what really happens are profound.

Bear with me as I go through some crucial fundamentals that you may already be familiar with before I even get to the main points I want to make and before describing the recent findings.

I’d like to stress the ANSI SQL standard’s stand on the matter. As far as the standard is concerned, such a query returns a set, and since a set has no order to its rows, the result is considered valid regardless of the order in which the rows are returned. If you want to guarantee that the result would be returned in a particular order you MUST specify an ORDER BY clause.

The implications in regards to SQL Server are that it can utilize unordered or ordered access methods as it deems fit.

I’d like to make some observations in regards to the common beliefs out there, and disprove some myths.

Clustered index physically organizes the data in order (myth/false assumption)

Some people believe that creating a clustered index on a table causes the data to be physically organized on disk by index key order.

The term “physical” itself is problematic since table/index data resides in pages, which reside in extents, which reside in files. Files are not necessarily contiguous on disk due to file system fragmentation, not to speak of a file being placed on a RAID array, and/or having multiple files in a database. But for the sake of the discussion, I’ll make things simple; I’ll work with a database that has only a single data file placed on a single drive, and ignore file system fragmentation; that is, I’ll assume that the file is a single contiguous unit.

Note that even the records within a page are not necessarily sorted physically. SQL Server organizes the records within the page in a sorted fashion by means of a row-offset array that it maintains at the end of each page.

I’ll demonstrate that the pages at the leaf level of the clustered index are not necessarily organized within the file in the same order as the logical index key order (maintained by the index leaf level’s linked list). To do so, I’ll create a table called T1 with a clustered index on col1; I’ll populate the table with a loop that inserts a row in each iteration, with a random value in col1:

SET NOCOUNT ON;

USE master;

GO

IF DB_ID('testdb') IS NULL

  CREATE DATABASE testdb;

GO

USE testdb;

GO

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL

  DROP TABLE dbo.T1;

GO

CREATE TABLE dbo.T1

(

  col1   INT NOT NULL,

  filler CHAR(2000) NOT NULL DEFAULT('a')

);

CREATE CLUSTERED INDEX idx_cl_col1 ON dbo.T1(col1);

GO


DECLARE @i AS INT;

SET @i = 1;

WHILE @i 
BEGIN

  INSERT INTO dbo.T1(col1)

    VALUES(1 + ABS(CHECKSUM(NEWID()) % 1000));

  SET @i = @i + 1;

END

I’m using random values so that the rows will be entered into the clustered index in random order, causing page splits which in turn cause logical fragmentation. Logical fragmentation is expressed as the percentage of out-of-order pages; that is, the discrepancybetween the logical order dictated by the index linked list vs. the
order in which the pages appear in the file.

The premise of the myth that a clustered index physically organizes the data in order is disproved by the fact that logical fragmentation exists; if you think about it, another way to express the myth is that there’s no such thing as logical fragmentation, which is of course wrong.

The following code run in SQL Server 2005 returns the level of logical fragmentation in the clustered index (in SQL Server 2000 use DBCC SHOWCONTIG):

SELECT avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats

( 

  DB_ID('testdb'),

  OBJECT_ID('dbo.T1'),

  1,

  NULL,

  NULL

);

I got the result 96.5217391304348, meaning that there are over 96 percents of out-of-order pages. If you need further proof that the pages do not necessarily reside in the file in the same order as in the linked list, use DBCC IND to show the actual pointers between the pages in the linked list:

DBCC IND('testdb', 'dbo.T1', 0);

To save you the trouble in deciphering the output, the following code generates a string with the addresses of the pages in the order in which they appear in the linked list:

CREATE TABLE #DBCCIND

(

  PageFID INT,

  PagePID INT,

  IAMFID INT,

  IAMPID INT,

  ObjectID INT,

  IndexID INT,

  PartitionNumber INT,

  PartitionID BIGINT,

  iam_chain_type VARCHAR(100),

  PageType INT,

  IndexLevel INT,

  NextPageFID INT,

  NextPagePID INT,

  PrevPageFID INT,

  PrevPagePID INT

);


INSERT INTO #DBCCIND

  EXEC ('DBCC IND(''testdb'', ''dbo.T1'', 0)');


WITH LinkedList

AS

(

  SELECT 1 AS RowNum, PageFID, PagePID

  FROM #DBCCIND

  WHERE IndexID = 1

    AND IndexLevel = 0

    AND PrevPageFID = 0

    AND PrevPagePID = 0


  UNION ALL


  SELECT PrevLevel.RowNum + 1,

    CurLevel.PageFID, CurLevel.PagePID

  FROM LinkedList AS PrevLevel

    JOIN #DBCCIND AS CurLevel

      ON CurLevel.PrevPageFID = PrevLevel.PageFID

      AND CurLevel.PrevPagePID = PrevLevel.PagePID

)

SELECT

  CAST(PageFID AS VARCHAR(MAX)) + ':'

  + CAST(PagePID AS VARCHAR(MAX)) + ' ' AS \[text()\]

FROM LinkedList

ORDER BY RowNum

FOR XML PATH('')

OPTION (MAXRECURSION 0);


DROP TABLE #DBCCIND;

Here’s the output that I got:

1:109 1:1567 1:1531 1:1488 1:1541 1:1508 1:1523 1:1558 
1:1529 1:1501 1:1570 1:1536 1:1569 1:1491 1:1525 1:1539 
1:1555 1:1486 1:1480 1:1571 1:1526 1:80 1:1527 1:1528 
1:1557 1:1502 1:1548 1:1587 1:1510 1:1498 1:1483 1:1568 
1:1554 1:1500 1:1519 1:1564 1:1530 1:1489 1:1575 1:1524 
1:1562 1:1550 1:41 1:1583 1:1534 1:1549 1:1492 1:1566 
1:1540 1:1515 1:1538 1:1551 1:1503 1:1490 1:1504 1:1559 
1:174 1:1560 1:1514 1:89 1:1537 1:1563 1:1495 1:1509 
1:1580 1:1505 1:1496 1:1542 1:1556 1:1573 1:1512 1:1577 
1:1533 1:1485 1:1494 1:1520 1:1521 1:1547 1:1543 1:1516 
1:1565 1:1497 1:1552 1:1578 1:1579 1:1581 1:1522 1:110 
1:1586 1:1532 1:1507 1:1576 1:1481 1:1518 1:1487 1:1582 
1:1506 1:1544 1:1511 1:1572 1:1553 1:77 1:1574 1:1546 
1:1585 1:1517 1:1482 1:1493 1:1513 1:1499 1:1561 1:1535 
1:1484 1:1584 1:1545

You can clearly see that there are many out of order pages (next page in linked list appears earlier in the file).

Note that when you create or rebuild an index on an existing table, SQL Server will make effort to create it in a contiguous manner (least amount of fragmentation), but there are no guarantees—especially when the SORT_IN_TEMPDB option is not specified or the operation is processed in parallel. When you do specify the SORT_IN_TEMPDB option and the index creation/rebuild is processed with a single thread, this may improve
the continuity of index extents; but again, no guarantees that the operation will end up with an index that has no fragmentation at all.

Here’s a snippet from Books Online that explains this:

“The SORT_IN_TEMPDB option may improve the contiguity of index extents, especially if the CREATE INDEX operation is not being processed in parallel. The sort work area extents are freed on a somewhat random basis with regard to their location in the database. If the sort work areas are contained in the destination filegroup, as the sort work extents are freed, they can be acquired by the requests for extents to hold the index structure as it is built. This can randomize the locations of the index extents to a degree. If the sort extents are held separately in tempdb, the sequence in which they are freed has no affect on the location of the index extents. Also, when the intermediate
sort runs are stored in tempdb instead of the destination filegroup, there is more space available in the destination
filegroup. This increases the chances that index extents will becontiguous.”

Conclusion: having a clustered index on a table does not guarantee that the data is organized within the file by index key order.

Continues in Part II...

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish