Clustered Indexes and Swiss Cheese

4 Min Read
ITPro Today logo

SQL Server 6.5 manages table space using 16KB extents. Becauseextents are the smallest unit of table space management, SQL Server can'tdeallocate an extent from a table until every row has been deleted from everypage. If the average row size is 50 bytes, you can fit about 40 rows per 2KBpage, or 320 rows per extent. If you delete 319 of those rows, SQL Server stillconsiders the entire extent allocated to the table, so one 50-byte row consumes16KB. One almost empty extent isn't that bad, but if you have 100 or 1000 emptyextents, wasted space adds up quickly.

Clustered indexes help SQL Server manage space more effectively, minimizingthe empty extent scenario. The following example shows how and why the clusteredindex is such an important space management tool.

To create our test environment, we cloned the pubs ..authors tableuntil we ended up with 23,522 rows. We cloned the table by creating a new tablecalled Authors2, initially an exact match of the authors table, and used thefollowing command several times to populate the new table.

INSERT INTO Authors2 SELECT * FROM Authors2

Authors2 now contains 23,522 rows spread across 1071 pages and is tightlycompacted within the database. To obtain that information, we ran DatabaseConsistency Checker (DBCC) SHOWCONTIG on the tables. (DBCC SHOWCONTIG lets uscheck how fragmented a table has become. For a full explanation of DBCCSHOWCONTIG's output, refer to the documentation in SQL Server Books Online--BOL.)

DBCC SHOWCONTIG scanning 'Authors2' table...

[SHOW_CONTIG - SCAN ANALYSIS--condensed]

-------------------------------------------------

Table: 'Authors2' (572529073) Indid: 0 dbid:8

TABLE level scan performed.

- Pages Scanned...........................: 1071

- Avg. Page Density (full)..............: 97.94%

We know the data is tightly compacted on the page because the Avg. PageDensity (full) is 97.94 percent. But watch what happens when we delete half(11,761) the rows (the even-numbered rows):

DELETE AuthorsTest2

WHERE PKCol % 2= 0

Running DBCC SHOWCONTIG after our DELETE shows that the 1071 pages are now49.96 percent full. Why hasn't the size of the table decreased, even though wedeleted half the data? SQL Server can't reclaim the space because the extentsstill have some data on them. Here's a condensed version of the DBCC SHOWCONTIGoutput after the deletion:

DBCC SHOWCONTIG scanning 'Authors2' table...

-------------------------------------------------

Table: 'Authors2' (572529073) Indid: 0 dbid:8

- Pages Scanned...........................: 1071

- Avg. Page Density (full)..............: 49.96%

Now watch what happens when we reinsert the 11,761 rows we just deleted:

INSERT INTO Authors2

SELECT * FROM Authors3

WHERE PKCol % 2= 0

(Authors3 is an identical copy of the original Authors2 table. We're usingthis data to easily repopulate the 11,761 rows we just deleted.)

DBCC SHOWCONTIG scanning 'Authors2' table...

-------------------------------------------------

- Pages Scanned...........................: 1606

- Avg. Page density (full)..............: 65.97%

Authors2 still has 23,522 rows, but we've added 535 pages to ouroriginal 1071 pages. We've increased the space required to store our original23,522 rows by approximately 50 percent, without adding any new data. Somethingweird seems to be going on.

TABLE A: Space Utilization in Clustered Indexes

Step

Description

Pages Scanned

Avg. Page Density(full)

1

Start the test. Authors2 has 23522 rows in it.

1071

97.94 percent

2

Delete all of the even-numbered rows.

1071

49.96 percent

3

Reinsert all the even-numbered rows.

1071

97.94 percent

This growth occurred because the Authors2 table didn't have a clusteredindex on it. Without a clustered index, which forces SQL Server to insert thedata in sorted order, SQL Server adds all the new rows to the end of the table.Now the growth makes sense. Deleting 11,761 rows without a clustered index freedup space in the middle of the table, but SQL Server didn't reuse this space whenwe reinserted the same 11,761 rows. SQL Server added all the new rows to the endof the table rather than restoring them in sorted order.

Rerunning the same tests with a clustered index on the Authors2 table has adramatically different effect on space utilization within the table. We'restarting this test with two identical tables. Authors2 and Authors3 haveidentical structures and data, except for the value of PKCol. Table A summarizesspace utilization information as we delete and reinsert all the even-numberedPKCol rows:

In Step 2, Avg. Page Density (full) dropped to 49.96 percent, as weexpected when we deleted half the rows. However, when we reinsert those rows,SQL Server still can't reclaim the empty space when the pages are half full. Butthe clustered index lets SQL Server reclaim the unused space when the new rowsare reinserted in their sorted order. The clustered index lets us save 535 2KBpages, or 50 percent of the table's original size. The moral is to always placea clustered index on tables that are frequently subjected to INSERT, UPDATE, andDELETE commands. Otherwise, your tables will end up looking like Swiss cheese.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like