Index Organization in 2005

Downloads
47667.zip

SQL Server 2005 introduces a new paradigm for data-definition language (DDL) commands. You'll create all objects by using a CREATE command, remove them with a DROP command, and modify them by using ALTER. SQL Server 2005 won't have separate stored procedures for changing one aspect of an object—like sp_defaultdb in SQL Server 2000 and 7.0, which changes a user's default database—or special single-purpose creation commands such as sp_addtype. SQL Server 2000 started moving in this direction by enhancing the ALTER DATABASE command's functionality to include all changes to database properties, and discouraging the use of the sp_dboption command.

Some actions that in earlier releases require DBCC options will be performed by ALTER commands in SQL Server 2005. Before SQL Server 2000, DBCC was an acronym for DataBase Consistency Checker, and the DBCC options available in the first several versions of SQL Server—such as DBCC CHECKDB and DBCC CHECKTABLE—did indeed run a series of consistency checks. But as the product grew, so did DBCC, and the developers at Microsoft starting giving other jobs to poor, overloaded DBCC so that eventually, checking database consistency was the least it did. SQL Server 2000 Books Online (BOL) finally listed a new meaning for DBCC: DataBase Console Command. (See BOL for a full list of DBCC options.)

Some of the DBCC options in SQL Server 2000 request information, and some make changes. Two of the pre-2005 DBCC commands deal with indexes: DBCC SHOWCONTIG reports the fragmentation in an index, and DBCC INDEXDEFRAG reduces fragmentation by changing the pages that an index uses. In SQL Server 2005, an ALTER INDEX option replaces DBCC INDEXDEFRAG. Replacing DBCC SHOWCONTIG is a new dynamic management object called sys.dm_db_index_physical_stats().

The development team at Microsoft had long felt the need to replace DBCC SHOWCONTIG. Among their reasons was the fact that, as a DBCC command, it offered no easy way to filter the information returned or just to get the information you're interested in. You could use INSERT EXEC to save the output from DBCC SHOWCONTIG into a table, but you had to create the table first in a separate operation, and only after saving the data in a new structure could you filter it. In SQL Server 2005, you return fragmentation information by using the function sys.dm_db _index_physical_stats(). If you request all the columns this function can return, you'll get a lot more information than DBCC SHOWCONTIG gives. But because you return the data through a table-valued function (TVF), you can restrict which columns and rows you want.

The sys.dm_db_index_physical_stats() function takes five parameters, all of which have defaults. If you set all parameters to the defaults and don't filter rows or columns, as follows

SELECT * FROM sys.dm_db_index_
physical_stats
(NULL, NULL, NULL, NULL, NULL)

the function returns 20 columns of data for (almost) every level of every index on every table on every partition in every database of the current SQL Server instance. When I ran this command on my small instance, which includes only AdventureWorks, Pubs, and Northwind (plus the system databases), it returned more than 390 rows. I can't show 20 columns and 390 rows of output here, so I'll let you run the command yourself.

Now, let's look at the five parameters you can pass to this function. Along the way, I'll point out potential gotchas about using the parameters. Next month, I'll explain some of the output columns whose meaning might not be obvious.

Controlling the Returned Data

The first parameter is the database ID, which you must specify as a number. If you specify NULL, which is the default, the function returns information about all databases. If the database ID is null, SQL Server ignores the next three parameters and assumes they're all NULL.

The second parameter is the object_id, which also must be a number, not a name. Again, the NULL default gives you information about all objects—if you use it, SQL Server assumes the next two parameters, index _id, and partition_id, are also the default, which means all indexes and all partitions. (Partitions are a new SQL Server 2005 feature, which Itzik Ben-Gan discusses in his February through May 2005 T-SQL 2005 Web-exclusive columns.) To simplify obtaining the database and object ID values, you can use the db_id() and object_id() functions and pass the name of the database or object as an argument.

Be careful when using these built-in functions. If you specify an invalid name or simply misspell the name, you get no error message and the value returned is NULL. Since NULL is a valid parameter, SQL Server just assumes that's what you meant to use. For example, suppose I want to see all the information I described above, this time for the AdventureWorks database—but I mistype the command:

SELECT * FROM sys.dm_db_index_physical_stats (db_id
  ('AdventureWords', NULL, NULL, NULL,  NULL)

There's no such database as AdventureWords, so the db_id() function returns NULL, and the result is the same as if I call the function with all NULL parameters. Again, I get no error message or warning.

From the number of rows returned, you might be able to guess that you made an error, but if you have no idea how much output to expect, it might not be immediately obvious. BOL suggests that you can avoid this problem by capturing the IDs into variables and error checking the values in the variables as Listing 1's code shows, before calling the sys.dm_db_index_physical_stats() function.

A more insidious problem that I discovered while doing testing for this column is that SQL Server calls the object_id() function from the context of your current database, before any call to the dynamic management function (DMF) is made. I was in the AdventureWorks database but wanted information from a table in the Pubs database. So I ran the following code:

SELECT *
FROM sys.dm_db_index_physical_stats (DB_ID(N'pubs'),
   OBJECT_ID(N'dbo.authors'),  null, null,  null);

Since no dbo.authors table exists in my current database, SQL Server treats the object_id as NULL, and I get all the information about all the objects in Pubs. But if AdventureWorks had a dbo.authors table, SQL Server would use the ID for that table to try to retrieve data from the Pubs database. So, I would get either an error saying there was no such object ID in the database or data from a different table than the one I expected. This problem could be difficult to troubleshoot, assuming you figure out that a problem exists.

The only solution is to fully qualify the table name in the call to the TVF or, as in the preceding code, to use variables to get the ID of the fully qualified table name. It seems strange to me to have to fully qualify the object with the database name if a parameter already specifies the database name. But, we're going to have to live with this. If you write wrapper procedures to call the sys.dm_db_index_physical_stats() function, you can concatenate the database name onto the object name before retrieving the object ID and avoid the problem. This function's output is cryptic, so you might want to write a procedure that accesses this function and returns the information in a slightly friendlier fashion.

The third parameter lets you specify the index ID for the specified table, and again the default of NULL means that you want all the indexes. The fourth parameter indicates the partition number, and NULL again means you want information for all the partitions. The fifth and last parameter is the only one for which the default NULL doesn't return the most information. The last parameter indicates the mode of sampling that you want SQL Server to perform when retrieving the information. Valid inputs are DEFAULT, NULL, LIMITED, SAMPLED, or DETAILED. The default NULL corresponds to LIMITED.

The specified mode determines the level of scanning SQL Server performs to obtain the information that the function uses. When the function is called, SQL Server traverses the page chains for the allocated pages for the specified partitions of the table or index. Unlike DBCC SHOWCONTIG, which usually requires a shared (S) table lock, sys.dm _db_index_physical_stats() requests only an intent-shared (IS) table lock, which is compatible with most other locks.

The LIMITED mode is the fastest and scans the fewest pages. It scans all pages for a heap but only the parent-level pages of an index, which are the pages above the leaf level. The SAMPLED mode returns statistics based on a 1 percent sample of all the pages in the index or heap. However, if the table is relatively small, SQL Server will convert SAMPLED into DETAILED, and for the purpose of this function, small means fewer than 10,000 pages. The DETAILED mode scans all pages and returns all statistics. So you can see that the modes are progressively slower as SQL Server does more work.

Setting Your Own Limits

As I mentioned, sys.dm_db_index_physical_stats() can return a lot of rows if you use all default parameters. But even with careful use of the available parameters and for a subset of tables or indexes, you might get back more data than you want. Because sys.dm_db_index_physical_stats() is a TVF, you can add your own filters to the results—for example, you could look at the results for just the nonclustered indexes. With the available parameters, your only choices are to see all the indexes or one particular index. If you use a third parameter of NULL to specify all indexes, you can add a filter in a WHERE clause to indicate you want only rows where index_id is between 2 and 250. Or you might look only at rows that indicate a high degree of fragmentation. But a discussion of how to determine high or low degree of fragmentation will have to wait for next month.

Comments

Plain text