How Does Index Fragmentation Affect SELECT Performance?

Q: To preserve application architecture, we want to retain the GUID as a primary key when clustering on a new INT column that’s not used by the application. Of course, a nonclustered index will now be in place on the GUID column, both to enforce uniqueness of the primary key and to optimize SELECT * FROM TABLE WHERE LEGACYID = . I practiced doing this on a nonproduction instance. I also ran 50,000 INSERT statements after the fact to measure the improvement in contiguous pages. As expected, fragmentation of the base table remains under 1 percent, even after all these INSERTS. However, the fragmentation of our new nonclustered index for the GUID column is 99 percent. So, are we really better off by changing our clustered index if the new nonclustered index is so fragmented? Can you clarify how fragmentation of the base table affects SELECT performance before we commit?

A: This is a great question and, at first glance, you’re right that the fragmentation of the nonclustered index seems like a problem. However, fragmentation in nonclustered indexes isn’t as much of a concern as fragmentation in the clustered index—especially when the nonclustered index is used for highly selective “bookmark” lookups. There are a few points to make to prove why this is true. However, it’s true that fragmentation (in any index) is still a concern.

The clustered index is the largest structure associated with a table. When it’s fragmented, SQL Server’s going to end up requiring the entire table to be in cache (for INSERTs) when queries might not really warrant it. A good example is when something such as a sales table is clustered by a GUID. For most of the day-to-day operations, only recent sales are needed for queries, and inserts are of the utmost importance (both for speed and durability). If a sales table is clustered by a non-sequential GUID, within a short period of time every page will be in cache and will need to stay within cache to support inserts. Also, these inserts will be slower because of the splits.

Conversely, if the table is clustered by an ever-increasing key (an identity-based INT or even a DATETIME value), then only the portion of the table in which new rows are being inserted will need to be in cache. This helps query performance by placing less pressure on memory and wasting less memory. It’s true that nonclustered indexes that are fragmented will need to be in cache, but the size of a typical nonclustered index is far smaller than the clustered index, so the footprint on the cache is smaller. In addition, because the rows inserted into the index contain only the column(s) defined by the index, the index rows are smaller and settings for FILLFACTOR effectively last longer.

Finally, there are very few restrictions on nonclustered index rebuilds. Even if you aren’t running SQL Server Enterprise Edition, the downtime associated with a nonclustered index rebuild is less because it’s smaller and therefore takes less time to rebuild as opposed to a rebuild of the clustered index for that table. If you’re lucky enough to be running SQL Server Enterprise Edition, the good news is that you’ll very likely be able to rebuild nonclustered indexes online—with minimal impact on production. To leverage online rebuilds, your index can’t include any large object (LOB) columns within it. Naturally, the clustered index has all of the columns in the table, so this is one of the reasons online rebuilds for the clustered index might not be possible.

Now back to your question about SELECT performance. The type of SELECT you’re running is what I would categorize as a highly selective bookmark lookup. Because this requires both the clustered and the nonclustered index for seeking, fragmentation in either index is less of a concern. However, it’s definitely easier to handle and minimize in the nonclustered index (with an appropriate FILLFACTOR) than in the clustered index.

So, there are really a myriad of reasons why these changes help query performance, but it’s also about having a healthier table overall. No matter how you define your table (and indexes), you’ll still have fragmentation, but within the nonclustered indexes it’s less of a concern. No matter what, your tables still require regular maintenance to minimize the negative effects of splits and fragmentation.

Comments

Plain text