In Search of Duplicate Indexes on Your Tables

I often see too many indexes on a given table that turn out to be redundant or less useful than they might seem to be at first. If you have more indexes than you really need, those indexes can be potentially harmful in terms of performance during Inserts, Updates, and Deletes, and they can increase the possibility of deadlocks.

I want to explore some of those reasons and show you how to identify the candidates for removal. To do so, I’ll use the DupeIndex script, which will generate a small report when I run it against my copy of the Adventureworks database. To highlight areas relevant to this discussion, I’ve included a few additional indexes in the database. For a little background about how to look for missing indexes, how to use a SQL Server DMV to examine the statistics of your existing indexes, and how to set a handle on which ones are in use, see the Learning Path.

Is It That Simple?

Take a look at Table 1, which shows the second portion of a report that outlines potential duplicate indexes. The report simply observes all indexes for each table and determines whether any have leading columns that are the same. Of course, matching leading columns don’t always mean that a certain index is duplicate or useless. Often, a compound index can have the same leading column as another index but be more selective. Or, the index might be a covering index—that is, an index that includes all the columns necessary to satisfy both the WHERE clause and the SELECT list, and negates the need to access the table itself.

Now, let’s examine a couple situations in which the indexes might be redundant. If you look at the indexes for the Vendor table, you can see that the VendorID column is the Primary Key (PK) and has a Clustered Index. That tells you that the index will never return more than one row if you include the VendorID in the WHERE clause with an equality expression; the fact that it’s clustered means that there’ll never be an additional lookup to satisfy the columns in the SELECT list. Therefore, it’s highly unlikely that you’d ever need the index on VendorID and AccountNumber because the engine can filter via the AccountNumber after it finds the row via the VendorID.

An exception is possible if the combination of VendorID and AccountNumber has been declared as a Unique Constraint. In that case, the combination is serving to enforce uniqueness even though it might never be used in a lookup. Even if this index was useful in our case, we certainly don’t need two of them. As you can see in Table 1, there are two identical non-clustered indexes on VendorID and AccountNumber—clearly unnecessary. Truly duplicate indexes provide no advantages whatsoever and will only add to the overhead associated with changes to the table and index maintenance. Another redundancy—this time with only a single column in the index—is on the BillOfMaterials table and the Unit- MeasureCode column. Again, these two indexes are true duplicates and only one of them might be necessary.

What about the indexes on the CultureID column of the Culture table? One is a clustered index and the other is non-clustered. The clustered index also happens to be the Primary Key constraint. Many people believe that a situation such as this is warranted, that you need an actual index that’s searchable in addition to the constraint. This myth is long-propagated. Constraints such as Primary Key and Unique create an index behind the scenes to enforce the constraints and are usable in searches, just like any other index.

Creating an index on the same expression as the Primary Key constraint is the most common concern I see related to duplicate indexes. The second most common is creating a nonclustered index on the same expression as the clustered index. If you have a clustered index already defined on a particular column, there’s no need for a non-clustered index on the same column.

Put It in Reverse

Table 2 is a sample of what you might see in the Reverse Indexes section of the report. In a reverse index, the columns of two indexes on a table are the same but in reverse order in the index expression. In this case, there are two columns—VendorID and AccountNumber—on the Vendor table. If all the queries that referenced either of these columns in the WHERE clause also included the other, these would be true reverse duplicate indexes. If both columns are specified, the order in the WHERE clause or index expression isn’t a factor because the engine is smart enough to arrange the lookup to match the index expression.

However, if you specified only one of the columns, it would need to be the first column in the index expression in order to do a seek. In this particular example, it might be best to create two single-column indexes on VendorID and AccountNumber instead of compound indexes. Or, if you frequently query on two columns together and sometimes on just VendorID, you can have a single index on VendorID, AccountNumber to satisfy both queries.

You’re the Judge

Remember that this simple report can’t replace a proper understanding of your data model and table usage. It can merely outline potential candidates for removal; you must be the final judge and apply good reasoning before taking action.

Comments

Plain text