How Many Indexes Should I Create?

Q: You’ve frequently discussed what a good clustering key should be, as well as how indexes work. However, you haven’t told us how many indexes we should have on a table. What’s a good number of indexes?

A: This question is part of a prank that Paul and I pull on each other occasionally. When Paul’s not around, I’ll tell students to ask him “If I have a database that’s X terabytes, how long will it take for CHECKDB to run?” and when I’m not around, he tells students to ask me “How many indexes should I have on a table?” However, I do have to admit that both are actually great questions! The problem is that neither question is easy to answer. There are a tremendous number of issues to consider, and even those don’t give you an exact answer. (To find out the answer to the CHECKDB question, see "Determining How Long It Takes for DBCC CHECKDB to Run.") In this article, I’ll respond to the indexing question; however, there’s no easy answer to this question.

Generally, I would recommend you create as few indexes as possible on any table. Over-indexing can be far worse than under-indexing because it leads to many problems, including data modification overhead, wasted disk space, and wasted cache. The worst type of over-indexing that I’ve seen is where an index is created on each column, without any regard for performance tuning. This is a horrible idea. In fact, very narrow indexes (i.e., indexes with only one or two columns) have very few uses in SQL Server. What I recommend are fewer overall indexes, but wider indexes because they have far more uses.

However, under-indexing can be very problematic because it limits the options the optimizer has to process your query requests, and it can lead to inefficient table scans. If a table is small (only a few megabytes), a table scan probably isn’t horrible, but for larger tables (tables in gigabytes) it can be a huge problem. I’ve seen a few cases in which the database had no indexing at all—no data integrity constraints and no manually created indexes. This scenario isn’t good, either.

It sounds like a fine line between too many indexes and too few, but it’s not quite as fine of a line as you might think. There’s definitely a wide range of what can be useful indexes. Additionally, how many indexes are really necessary will depend on many factors, the most important of which is the table’s usage pattern. The overall point, however, is how to create the right indexes. To start, I’d say that most tables should have fewer than 15 indexes. In many cases, tables that focus on transaction processing (OLTP) might be in the single digits, whereas tables that are used more for decision support might be well into double digits. However, I’d also add that most cases in which I see 40, 50, or more than 60 indexes are because of the following reasons:

Indexes are added without any single person watching over what’s being done (there are too many cooks in the kitchen).
Index recommendations are made by a variety of tools, but no one is analyzing what’s there before they approve (and create) the recommended indexes

I recommend starting with a small number of indexes and slowly (and iteratively) adding indexes that are recommended. However, never add an index without checking your existing indexes to see if you can consolidate or remove an existing index first. This is a bit more challenging (and I’ll write a few posts on this topic soon). Here’s the step-by-step process I like to follow for creating indexes (this process is especially helpful for developers):

Create an effective clustering key. To better understand the factors that make up a good clustering key, see "Changing the Definition of a Clustered Index," "Isn’t the Clustering Key Redundant?," "Where Does that Clustering Key Go?," and "What Happens If I Drop A Clustered Index?."
Create your primary key (if you decided that the primary key you want isn’t suited to being the clustering key—for example, if it’s a GUID).
Create your unique keys.
Add foreign keys when appropriate and manually create nonclustered indexes on your foreign keys. SQL Server has never automatically created indexes on foreign keys and these can help in several ways. They can help improve the performance of the relationship as well as some joins. There are often better indexes for many joins, but this is always a good way to start.
Add a few nonclustered indexes for the columns that are very selective and used frequently in queries (e.g., LastName, FirstName, MiddleInitial). If you’re doing a lot of customer lookups by name, then a single index that combines the columns in the order of what’s most requested can be added. If the combination is highly selective, SQL Server can use this index to help navigate to the data faster. If the column or combination of columns isn’t highly selective, the index isn’t as likely to be used (when the query is requesting other columns of data). This is where things start to get more challenging, and I’ll have to tackle these issues in another post. Do not add too many indexes.

Slowly and iteratively add only the most important and critical indexes based on workload analysis.

Comments

Plain text