As you're probably aware, one of the most important tasks that the SQL Server query optimizer performs is to determine which index (or set of indexes) to use for each table in a query. As each query is being optimized, SQL Server keeps track of indexes that the optimizer determines might have been useful by establishing the best possible index for each search argument in your WHERE clause and for each lookup done through a join. SQL Server 2005 then makes information about missing indexes available to you through two components. First, SQL Server provides a set of metadata objects to supply information about missing indexes. Second, for each query, information about missing indexes is available in the XML query plan. I'll explain the missing-index metadata, which can help you improve query performance, and next month I'll provide more details about working with the missing-indexes feature, including accessing missing-index information in your XML query plans.
Missing-Index Metadata
SQL Server has four dynamic management objects that provide information about missing indexes.
dm_db_missing_index_details. This view returns one row for each index that the optimizer determined was missing and includes information about columns that should be created as key columns and columns that can be added to the index as included columns. (For more information about SQL Server 2005's "included columns" feature, see "Seek and You Shall Find," September 2005, "10 Things to Love About SQL Server 2005," May 2005.) Each missing index is identified by a unique value called an index_handle.
dm_db_missing_index_group_stats. Missing indexes each belong to an index group, even when a group has only one index. In fact, in SQL Server 2005, a group will never have multiple indexes. Each index group appears in this view along with information about how many times this missing-index group could have been used and what kind of performance improvement might be expected if the indexes in the group were created. Each missing-index group is identified by a unique value called a group_handle.
dm_db_missing_index_groups. This view maps index_handle values to group_handle values; in this view, the missing-index–group values are in a column called index_group_ handle. Because there are no index groups with multiple indexes, you should see a one-to-one relationship between missing indexes and missing index groups. dm_db_missing_index_columns. This is the only function in the list and requires an index_handle as a parameter. The function returns a table containing a list of suggested columns needed to build the specified missing index.
Dissecting the Metadata
To delve into the metadata, first run the code in Listing 1, which makes copies of three tables from the AdventureWorks database and builds no indexes on them. Now run the query in Listing 2 and note the performance information. I'm running SQL Server 2005 with the November 2006 Community Technology Preview (CTP) of Service Pack 2 (SP2), and I get the results that Figure 1 shows for the logical reads component of STATISTICS IO. I get the results that Figure 2 shows for the execution-time component of STATISTICS TIME. The query plan indicates that all three tables are being scanned and that there's one loop join with the very small dbo.Territory table as the outer table and a second, hash join.
We can now inspect the dm_db_missing_ index_details view. As of SP2, SQL Server allows an optional second parameter for the object_name function. If you're still using SP1, you can invoke object_name with only the first parameter, as long as you're in the AdventureWorks database.
The query results in Table 1 show two recommended indexes, each of which SQL Server could use in a query that compares a single column to a constant. In the dbo.Customers table, we're looking for a particular TerritoryID value—2 in this example. The query also uses TerritoryID as the join key with the Territory table, so when Customers is the inner table in that join, the lookup to find the matching rows uses the TerritoryID column. The query uses the dbo.OrderHeader table as the inner table in a join, searching for matches on the CustomerID column, so the optimizer also recommends building an index on the CustomerID column of the dbo.OrderHeader table.
If the sys.dm_db_missing_index_ details view showed recommendations for inequality_columns only, indexes built on those columns would probably be less useful and the search condition would be less selective. If the sys.dm_db_missing_index_details view shows recommendations for both equality_columns and inequality_columns, we should create the index using the equality_columns first. The included_columns that sys.dm_db_missing_index_details shows are columns that you could add to the index definition to give you a covering index. In my experience, these columns don't have be actual "included columns." Rather, these columns could be additional key columns of the index.
The dm_db_missing_index_columnsfunction returns the list of all the columns to include in a specific suggested index, when the function is passed an index_handle as an argument. My query in Listing 3, which selects from thesys.dm_db_missing_ index_details view, doesn't include the index_handle, so as to limit the output width, but you could easily rewrite the query to include index_handle. My query returned an index_handle value of 13 for the index on the dbo.Customers table. I can then call the dm_db_missing_ index_columns function using that value, as the following query shows (code lines wrap because of space constraints), and see the results that Table 2 shows. This is basically the same information that we saw in the sys.dm_db_missing_index_details view, but we see the columns for only a single index, and each suggested column in the index is in its own row.
SELECT * FROM sys.dm_db_ missing_index_columns(13);
In a first attempt to improve this query's performance, run the script in Listing 4 to build two indexes using the equality_columns as keys. Then rerun the query in Listing 2; I observed the STATISTICS IO and STATISTICS TIME information that Figure 3 shows when I reran the query. We can see that not only is no Worktable required, but the number of reads in the dbo.OrderHeader table has been reduced. The plan still shows two table scans—but no hash join, and the new index on dbo. OrderHeader is being used. However, the new index on dbo.Customers isn't used. Apparently, merely building the index on dbo.Customers on a single column was insufficient. Let's run the code in Listing 5 to rebuild the index on dbo.Customers using all the columns, as already seen in Table 2.
Run the query in Listing 2 one more time. Note that this time the index on dbo.Customers is being used. You'll see the the STATISTICS IO and STATISTICS TIME values that Figure 4 shows. This latest plan indicates that the only table scan is on the outermost table in the query, and it's by far smallest table, with only two pages total.
There are fewer reads and the query takes less time to run, but the improvement isn't as great as the improvement after building the two indexes initially.
How Long Is the Index Missing?
The data is available through the missing-index dynamic management objects until SQL Server is restarted or until there's a change to the table schema, such as adding or removing columns or changing a column's data type. Also actually building an index suggested in the missing-index metadata removes the corresponding row from the view. In addition, SQL Server can keep track of a maximum of 500 individual indexes in the sys.dm_db_missing_index_ details view. Through SQL Server 2005 SP1, the optimizer stops saving missing-index recommendations once the 500-index limit is reached. SP2 includes logic to remove 20 percent of the less-relevant index suggestions when the 500-index limit is reached, so that over time the metadata should contain the most relevant information for the workload.
Keep in mind that the missing-index metadata isn't intended to replace the SQL Server 2005 Database Engine Tuning Advisor or your own index analysis. However, the metadata provides some quick suggestions to help you improve the performance of queries that lack only a single index on a search or join column.