Digging Up the Dirt on Indexes

In SQL Server, indexes are one of the most important data structures—they are the only means by which SQL Server can process huge tables in a reasonable amount of time. In my November column, "Get Into Index Structures," I described the structure and organization of SQL Server indexes and showed you an undocumented stored procedure that you can use to find the size of an index. SQL Server offers very few documented tools for examining internal information (such as metadata) about indexes, but it does have some. I show you some of the lesser known tools and techniques this month.

Help Yourself to Indexes

You can use the sp_helpindex stored procedure to access the most generic information about indexes. The information this procedure returns is a subset of the information you get back if you run sp_help. Both of these procedures list the names of all the indexes on the table, report which columns are the key columns, and list the index properties (e.g., uniqueness, clustered or nonclustered, ascending or descending order). However, sp_helpindex requires a table name as a parameter and limits its results to the indexes on one table, whereas the table name is optional for sp_help.

To obtain additional structural information about an index, you can use the documented function INDEXPROPERTY(). You can call this function to return several pieces of information about an index, such as number of nodes and fill factor, but not all of its options are well documented. And, you can request only one piece of information each time you call the function. Only the IndexDepth property helps you find index structure information, but I'll also mention several other properties that I think need a little more explanation because they can help you understand the usefulness of particular indexes for particular queries.

For any index, the IndexDepth property returns the number of levels in the index. Although the data pages are the leaf level of a clustered index, when INDEXPROPERTY() returns the number of index levels, it doesn't include the leaf level for clustered indexes. This function's result includes only the levels above the leaf. If you call the function as follows, you'll see that the index on the title_id column in the Pubs database's titles table has only one level aside from the leaf—that is, the root.

SELECT INDEXPROPERTY
	(object_id('titles'),
	'UPKCL_titleidind', 
	'IndexDepth')

In this query's results, note that the index name begins with UPKCL. When it creates an index automatically to support a declared primary key on the table, SQL Server uses this prefix to denote a unique clustered index on a primary key. Of course, you can manually create indexes that start with UPKCL, but this prefix is usually an indication that SQL Server created an index automatically.

Filling You In

IsPadIndex is another parameter you can pass to the INDEXPROPERTY() function. When you use this parameter, the output tells you whether the index creator used the PAD_INDEX option. But you might not know that this parameter is related to the index's FILLFACTOR value. When creating an index, you must specify a fill factor before the PAD_INDEX option can have any meaning. Alternatively, if you set the sp_configure option fillfactor to a non-default value, every index you create will have that FILLFACTOR value, unless you specifically supply another value when you create the index. Without the PAD_INDEX option, any fill factor you've specified affects only the index's leaf level, and SQL Server builds the node levels with space for one or two more index rows. The PAD_INDEX option tells SQL Server to apply the FILLFACTOR value to all levels of the index.

Microsoft recommends that you use information schema views to get all your metadata instead of directly accessing system tables. However, because indexes deal with the physical storage of your data, ANSI doesn't dictate how a relational database management system (RDBMS) such as SQL Server should organize the indexes, and thus no information schema view contains storage details about indexes.

Because information schema views contain no index storage information, you have to go directly to the SQL Server system tables to get it. SQL Server stores metadata information about your indexes in the sysindexes system table. Sysindexes contains one row for each table and one row for each nonclustered index on each table. If the table has a clustered index, the sysindexes table contains a row with an indid (for index ID) value of 1, and if the table has no clustered index, sysindexes has a row with an indid value of 0. One table will never have a row containing a 0 and a row containing a 1. Indid values for nonclustered indexes can range between 2 and 250. If the table has any text, ntext, or image columns, it will have a sysindexes row with an indid value of 255 to keep track of the storage for these large-object data columns.

Statistically Speaking

SQL Server 2000 and 7.0 let you build statistics on a column even if the column has no index. You can have SQL Server build these statistics automatically if you enable this feature through the database property AUTO_CREATE_STATISTICS. Alternatively, you can use the CREATE STATISTICS command or the sp_createstats procedure to force SQL Server to build specific column statistics. Each set of column statistics has a row in sysindexes, and like nonclustered indexes, each set of non-index statistics has a value between 2 and 250. Because indid values have to be unique for each object, if you have lots of statistics on a table, the number of nonclustered indexes you can build is limited—you can have no more than 249 nonclustered indexes and non-index statistics altogether on any table. In fact, because the sp_createstats procedure tries to create statistics in every table for all columns that don't have an index or statistics already built, if you have any tables with more than 250 columns, sp_createstats will generate an error.

Running the following query will show you all the indexes and all the statistics on a specific table:

SELECT indid, name  
	FROM sysindexes
WHERE id = object_id('MyTable')

Typically, the statistics that SQL Server has automatically created have the prefix _WA_Sys_, so they're easy to recognize. However, manually created statistics can have any legal identifier for a name, so you might not be able to tell which objects in sysindexes are indexes and which are statistics.

To tell the difference between statistics and indexes, try this technique. Using the IsStatistics property to call the INDEXPROPERTY() function will return a 1 if the name refers to non-index statistics or a 0 if the name refers to an index. About the IsStatistics property, SQL Server Books Online (BOL) says: "Statistics indexes are used as a placeholder for column-level statistics." However, there's no such thing as a "statistics index;" there are indexes, and there are column-level statistics. IsStatistics refers to column statistics, whether automatically or manually created. Alternatively, using the IsAutoStatistics parameter when calling the INDEXPROPERTY() function returns a 1 only for statistics that SQL Server automatically created and returns 0 for all manually created statistics and indexes.

Another special type of row that can appear in sysindexes is a hypothetical index. The Index Tuning Wizard creates hypothetical indexes as it tests various possibilities; the names look something like hind_c_2133582639_1A, where hind stands for Hypothetical INDex. Calling INDEXPROPERTY with the IsHypothetical property tells you whether a name in sysindexes refers to one of these indexes. SQL Server automatically removes these indexes after a while, so don't worry if you see them occasionally if you examine the sysindexes table before SQL Server has cleaned them up.

The more you know about indexes and the statistics on tables, the more easily you'll be able to understand the query performance you see, and you'll know why the query optimizer chooses a particular plan for a query. Next time, I'll discuss more of the columns in the sysindexes table and look at other tools you can use for examining index structures.

Read All About It

For more information about automatically created statistics in SQL Server, see these articles.

BRIAN MORAN:
SQL Server Savvy, "Real vs. Auto-Created Indexes," May 2003

KALEN DELANEY:
Inside SQL Server, "Statistically Speaking," October 2001
"SQL Server Statistics: A Useful Query Optimizer Tool," August 1999

For more information about SQL Server fill factor, see the following articles.

KALEN DELANEY:
Inside SQL Server, "More Fill-Factor Mysteries," May 2001
Inside SQL Server, "The Fill-Factor Truth," April 2001

Comments

Plain text