I have a table that holds about 60 million rows. The primary key is an integer column that I maintain by using a NextID table. Because no columns in the table are subject to a range search, I chose a clustered index to maximize the performance of joins to other tables and to avoid the development of disk hotspots. The clustered index also helps me avoid insert problems by distributing the data evenly among the company's RAID 10 set. I have several questions.
- If I place a clustered index on a mono-increasing column, will using a stripe set eliminate the disk hotspot problems? Or will I need to use another method to evenly distribute the data?
- How will SQL Server manage the index B-tree when the primary key's column value is always 1 greater than the last value? Will the B-tree grow unevenly and, therefore, require more maintenance and suffer reduced performance?
- If I create a nonclustered index instead of a clustered index on the primary key and let SQL Server manage the space allocation, what effect will the nonclustered index have on data distribution and join performance?
- I've heard that reversing the bit representation of the NextID value creates a more even distribution of key values for a clustered index but still gives you the benefit of using a simple NextID methodology to manage the disk-space allocation. For example, 1 (0000 0001) would become 128 (1000 0000), which SQL Server would insert as the key. The next value would be 64—because 2 (0000 0010) would transform to 64 (0100 0000), and so on. Or does reversing the bit representation constitute unnecessary overengineering?
Let's look at your questions in order.
- Even if you use a stripe set, disk hotspots will develop. (Disk hotspots are points of very busy activity that tend to cause system throughput to diminish.) You'll still have contention for the "last" set of pages because striping typically occurs in 64KB chunks. ("Last" means the set changes over time but will always be the most recently referenced set.)
- In your scenario, the B-tree will always remain balanced. However, inserts in a clustered index that are based on an increasing key always go to the "end" of the pages. ("End" refers to the logical end, not necessarily the physical end.)
- A nonclustered index on the increasing-value key will have the same effect on the data distribution and join performance as a clustered index.
- As for reversing the bit representation of the key column value, I believe you'll have greater success simply by creating a clustered index on another column that's reasonably unique, then using a nonclustered index for your primary key.
Of course, you should always test which indexes and primary keys work best for your specific application.