Q: I’ve learned that my clustering key (i.e., the columns on which I defined my clustered index) should be unique, narrow, static, and ever-increasing. However, my clustering key is on a GUID. Although a GUID is unique, static, and relatively narrow, I’d like to change my clustering key, and therefore change my clustered index definition. How can I change the definition of a clustered index?
A: This question is much more complex than it seems, and the process you follow is going to depend on whether the clustered index is enforcing a primary key constraint. In SQL Server 2000, the DROP_EXISTING clause was added to let you change the definition of the clustered index without causing all of the nonclustered indexes to be rebuilt twice. The first rebuild is because when you drop a clustered index, the table reverts to being a heap, so all of the lookup references in the nonclustered indexes must be changed from the clustering key to the row identifier (RID), as I described in the answer to the previous question. The second nonclustered index rebuild is because when you build the clustered index again, all nonclustered indexes must use the new clustering key.
To reduce this obvious churn on the nonclustered indexes (along with the associated table locking and transaction log generation), SQL Server 2000 included the DROP_EXISTING clause so that the clustering key could be changed and the nonclustered indexes would need to be rebuilt only once (to use the new clustering key).
However, the bad news is that the DROP_EXISTING clause can be used to change only indexes that aren’t enforcing a primary key or unique key constraint (i.e., only indexes created using a CREATE INDEX statement). And, in many cases, when GUIDs are used as the primary key, the primary key constraint definition might have been created without specifying the index type. When the index type isn’t specified, SQL Server defaults to creating a clustered index to enforce the primary key. You can choose to enforce the primary key with a nonclustered index by explicitly stating the index type at definition, but the default index type is a clustered index if one doesn’t already exist. (Note that if a clustered index already exists and the index type isn’t specified, SQL Server will still allow the primary key to be created; it will be enforced using a nonclustered index.)
Clustering on a key such as a GUID can result in a lot of fragmentation. However, the level of fragmentation also depends on how the GUIDs are being generated. Often, GUIDs are generated at the client or using a function (either the newid() function or the newsequentialid() function) at the server. Using the client or the newid() function to generate GUIDs creates random inserts in the structure that’s now ordered by these GUIDs—because it’s the clustering key. As a result of the performance problems caused by the fragmentation, you might want to change your clustering key or even just change the function (if it’s server side). If the GUID is being generated using a DEFAULT constraint, then you might have the option to change the function behind the constraint from the newid() function to the newsequentialid() function. Although the newsequentialid() function doesn’t guarantee perfect contiguity or a gap-free sequence, it generally creates values greater than any prior generated. (Note that there are cases when the base value that’s used is regenerated. For example, if the server is restarted, a new starting value, which might be lower than the current value, will be generated.) Even with these exceptions, the fragmentation within this clustered index will be drastically reduced.
So, if you still want to change the definition of the clustered index and the clustered index is being used to enforce your table’s primary key, it’s not going to be a simple process. And, this process should be done when users aren’t allowed to connect the database, otherwise data integrity problems can occur. Additionally, if you’re changing the clustering key to use a different column(s), then you’ll also need to remember to recreate your primary key to be enforced by a nonclustered index instead. Here’s the process to follow to change the definition of a clustered index:
Disable all the table’s nonclustered indexes so that they aren’t automatically rebuilt when the clustered index is dropped in step 3. Because this is likely to be a one-time operation, use the query in Listing 1 (with the desired table name) to generate the ALTER INDEX statements.
Listing 1: Code to Generate the ALTER INDEX Statements
SELECT DISABLE_STATEMENT = N'ALTER INDEX ' + QUOTENAME(si.[name], N']') + N' ON ' + QUOTENAME(sch.[name], N']') + N'.' + QUOTENAME(OBJECT_NAME(so.[object_id]), N']') + N' DISABLE' , ENABLE_STATEMENT = N'ALTER INDEX ' + QUOTENAME(si.[name], N']') + N' ON ' + QUOTENAME(sch.[name], N']') + N'.' + QUOTENAME(OBJECT_NAME(so.[object_id]), N']') + N' REBUILD' FROM sys.indexes AS si JOIN sys.objects AS so ON si.[object_id] = so.[object_id] JOIN sys.schemas AS sch ON so.[schema_id] = sch.[schema_id] WHERE si.[object_id] = object_id('tablename') AND si.[index_id] > 1Note that you should use the column for DISABLE_STATEMENTS to disable the nonclustered indexes, and be sure to keep the enable information handy because you’ll need it to rebuild the nonclustered indexes after you’ve created the new clustered index.
- Disable any foreign key constraints. This is where you want to be careful if there are users using the database. In addition, this is also where you might want to use the following query to change the database to be restricted to only DBO use:
ALTER DATABASE DatabaseName SET RESTRICTED_USER WITH ROLLBACK AFTER 5
The ROLLBACK AFTER n clause at the end of the ALTER DATABASE statement lets you terminate user connections and put the database into a restricted state for modifications. As for automating the disabling of foreign key constraints, I leveraged some of the code from sp_fkeys and significantly altered it to generate the DISABLE command (similarly to how we did this in step 1 for disabling nonclustered indexes), which Listing 2 shows.
Listing 2: Code to Generate the DISABLE Command
SELECT DISABLE_STATEMENT = N'ALTER TABLE ' + QUOTENAME(convert(sysname, schema_name(o2.schema_id)), N']') + N'.' + QUOTENAME(convert(sysname, o2.name), N']') + N' NOCHECK CONSTRAINT ' + QUOTENAME(convert(sysname, object_name(f.object_id)), N']') , ENABLE_STATEMENT = N'ALTER TABLE ' + QUOTENAME(convert(sysname, schema_name(o2.schema_id)), N']') + N'.' + QUOTENAME(convert(sysname, o2.name), N']') + N' WITH CHECK CHECK CONSTRAINT ' + QUOTENAME(convert(sysname, object_name(f.object_id)), N']') , RECHECK_CONSTRAINT = N'SELECT OBJECTPROPERTY(OBJECT_ID(' + QUOTENAME(convert(sysname, object_name(f.object_id)), N'''') + N'), ''CnstIsNotTrusted'')' FROM sys.objects AS o1, sys.objects AS o2, sys.columns AS c1, sys.columns AS c2, sys.foreign_keys AS f INNER JOIN sys.foreign_key_columns AS k ON (k.constraint_object_id = f.object_id) INNER JOIN sys.indexes AS i ON (f.referenced_object_id = i.object_id AND f.key_index_id = i.index_id) WHERE o1.[object_id] = object_id('tablename') AND i.name = 'Primary key Name' AND o1.[object_id] = f.referenced_object_id AND o2.[object_id] = f.parent_object_id AND c1.[object_id] = f.referenced_object_id AND c2.[object_id] = f.parent_object_id AND c1.column_id = k.referenced_column_id AND c2.column_id = k.parent_column_id ORDER BY 1, 2, 3
Use the column for DISABLE_STATEMENTS to disable the foreign key constraints, and keep the remaining information handy because you’ll need it to reenable and recheck the data, as well as verify the foreign key constraints after you’ve recreated the primary key as a unique nonclustered index.
Drop the constraint-based clustered index using the following query:
ALTER TABLE schema.tablename DROP CONSTRAINT ConstraintName
- Create the new clustered index. The new clustered index can be constraint-based or a regular CREATE INDEX statement. However, the clustering key (the key definition that defines the clustered index) should be unique, narrow, static, and ever-increasing. And although we’ve started to discuss some aspects of how to choose a good clustering key, this is an incredibly difficult discussion to have in one article. To learn more, check out my posts about the clustering key.
Create the primary key as a constraint-based nonclustered index. Because nonclustered indexes use the clustering key, you should always create nonclustered indexes after creating the clustered index, as the following statement shows:
ALTER TABLE schema.tablename ADD CONSTRAINT ConstraintName PRIMARY KEY NONCLUSTERED (key definition)
- Recreate the foreign key constraints. First, use the ENABLE_STATEMENT generated in step 2 to re-enable and recheck all of the foreign keys. In this case, you’ll want to make sure to recheck the data as well using the WITH CHECK clause. However, this is likely to be a one-time thing, so as long as you have kept the information from step 2, you should be able to recreate the foreign key constraints relatively easily.
- Once completed, make sure that all of the constraints are considered “trusted” by using the RECHECK_CONSTRAINT statements that were generated in step 2.
- Rebuild all of the nonclustered indexes (this is how you enable them again). Use the ENABLE_STATEMENT created in step 1. Rebuilding a nonclustered index is the only way to enable them.
Although this sounds like a complicated process, you can analyze it, review it, and script much of the code to minimize errors. The end result is that no matter what your clustering key is, it can be changed. Why you might want to change the clustering key is a whole other can of worms that I don’t have space to go into in this answer, but keep following the Kimberly & Paul: SQL Server Questions Answered blog and I’ll open that can, and many more, in the future!