If you've ever researched performance tuning, you're well aware that correct indexing is probably the most important area on which to spend your tuning efforts. Good performance requires good indexes - but for SQL Server's optimizer to recognize the usefulness of your indexes, you need high-quality statistics. Starting in SQL Server 7.0, SQL Server updated index statistics automatically, which improved the chances of your statistics being up to date. Prior to version 7.0, statistics could only be updated manually; thus, the first task on most performance-troubleshooting checklists was to update the statistics on all tables needed by the troublesome queries.
Automatic updating of your index statistics is a great feature, but it isn't perfect. We'll look at how SQL Server's optimizer determines when statistics should be updated, then learn about some new trace flags in SQL Server 2005 SP1 that give you more control over the automatic - statistics-updating functionality.
Keeping Statistics Updating On
By default, all SQL Server databases are created with the AUTO_UPDATE_STATISTICS database option enabled. For most tables, the work required to update statistics won't affect your system's performance enough for you to consider turning off this option. Of course, there are exceptions, but generally you should assume that this option should stay on unless you're certain that your applications run better with it off. In most cases, the performance degradation you'll experience if statistics aren't regularly updated will be much worse than performance problems you might encounter by leaving this option on.
If you believe automatic statistics updating impedes your system performance, you have a couple of alternatives for capturing statistics. One is to turn off automatic updates for a single table or index, then use ALTER INDEX and set the STATISTICS_NORECOMPUTE option to ON. Another choice is to use a SQL Server 2005 feature called AUTO_UPDATE_STATISTICS_ASYNC, which can be set with ALTER DATABASE. If this option is on, SQL Server will automatically update statistics in the background, and the query that detected that statistics needed updating won't wait for the statistics updating to occur. Individual queries won't be affected by the updating of statistics, and the system as a whole will benefit by having the statistics regularly updated. You can read more about ALTER INDEX and ALTER DATABASE in SQL Server Books Online (BOL).
Although having automatic statistics updating is rarely detrimental, one problem with this feature is that it's activated only after a substantial number of changes have occurred in the index key values. Although very small tables might have their index statistics updated more aggressively, any table with more than 500 rows needs to have at least 20 percent of its rows modified, or the number of rows needs to increase by at least 20 percent in order for SQL Server to detect that statistics are out of date. This means that for a 100,000-row table, you have to update or insert at least 20,000 rows.
If the updates or inserts are evenly distributed throughout the table, statistics updating might not be necessary. For example, if you modify 10 percent of the rows in the table, and the updates are evenly spread throughout the table, the original statistics could still continue to provide satisfactory estimates of the number of rows that would satisfy a query.
Updating Statistics After a Range of Values Changes
However, if the data changes so that there are many new values greater than the previous maximum data value, it might be impossible to obtain good plans until the statistics are updated. Let's look at an example.
Run the code in Listing 1 to build a table called dbo.details in the AdventureWorks database, and copy the rows from Sales.SalesOrderDetail into the table. Listing 1 then builds a clustered and nonclustered index on the table.
The dbo.details table has 121,317 rows and 1,258 pages. You should be aware that the nonclustered index on SalesOrderID will be used when only a few rows from the dbo.details table are accessed. For this table, the cutoff is about 350 rows. A clustered index scan, which is the same as a table scan, is used for more than about 350 rows. Look at the execution plans for the following two queries to verify this.
SELECT * FROM details WHERE SalesOrderID > 75000 SELECT * FROM details WHERE SalesOrderID < 56000
The first query returns 301 rows and uses a nonclustered index seek and a lookup into the clustered index. The second query returns 57,877 rows and uses a clustered index scan.
The maximum value in the SalesOrderID column is 75,123. Let's update several thousand rows to be greater than that maximum. The following query updates 14,148 rows to be greater than the previous maximum.
UPDATE dbo.details SET SalesOrderID = 76000 WHERE SalesOrderID < 47000 GO
When you look at the plan and number of logical reads for the following SELECT statement, you'll see that SQL Server continues to perform a nonclustered index seek because the statistics don't reflect the large number of rows greater than the previous maximum.
SET STATISTICS IO ON SELECT * FROM dbo.details WHERE SalesOrderID > 75000 GO
You should see 44,282 reads, which is far more than the number of pages in the table. But if you update the statistics, as the statement below does, the optimizer will choose a better plan that uses a clustered index scan.
UPDATE STATISTICS dbo.details GO SELECT * FROM dbo.details WHERE SalesOrderID > 75000 GO
SQL Server 2005 SP1 introduced the T-SQL trace flag 2389, which detects when the statistics on a particular index are repeatedly ascending. After three consecutive statistics-updating operations have noted ascending values exceeding the previous upper boundary of key values, SQL Server internally marks the index as an ascending index. You can use the undocumented trace flag 2388 to show you whether an index's leading column has been determined to be ascending. This trace flag changes the output of the DBCC SHOW_STATISTICS command to give you a historical look at the most recent statistics-updating operations. In this case, the only column in the DBCC SHOW_STASTISTICS output you'll be interested in is the last one, called Leading column Type. As with all undocumented trace flags, keep in mind that the behavior of trace flag 2388 isn't guaranteed to continue to work as described in any future version or service pack, and no further information is available about what the values mean or how they're generated.
Run the code in Listing 1 again to recreate the dbo.details table and its indexes. Then run Listing 2 to turn on both trace flags, and perform three UPDATE operations, three SELECT operations, and three statistics updates. (If you look at the plans for the SELECT statements, you should see that they're all using an index seek inappropriately because of bad statistics information.) Then run DBCC SHOW_STATISTICS to examine the statistics for the ASCENDING state. You should see the values that Table 1 shows in the last column of the DBCC SHOW_STATISTICS output. Now that the index is branded as "ascending," when you run one more data update, as follows, the statistics should be updated automatically:
UPDATE dbo.details SET SalesOrderID = 82000 WHERE SalesOrderID < 56000 GO
If you run the following SELECT, you'll see that the optimizer chooses the better plan even though statistics aren't manually updated. The number of reads should be 1,258, which indicates a clustered index scan.
SELECT * FROM dbo.details WHERE SalesOrderID > 81000
Another trace flag, 2390, introduced in SQL Server 2005 SP1, enables the same behavior as trace flag 2389 - but only in cases where the optimizer doesn't know whether the leading index column is ascending. So if you use both 2389 and 2390, your statistics should be automatically updated much more often. You should never use trace flag 2390 alone, since doing so would mean that statistics would be updated only when the ascending nature was unknown and not when the column was known to be ascending. If you're interested in exploring how to use trace flags 2389 and 2390, see the Microsoft articles "FIX: You may notice a large increase in compile time when you enable trace flags 2389 and 2390 in SQL Server 2005 Service Pack 1" (http://support.microsoft.com/?kbid=922063) and "FIX: SQL Server 2005 may not perform histogram amendments when you use trace flags 2389 and 2390" (http://support.microsoft.com/?kbid=929278), which discuss problems with the flags and fixes for them. As with any Microsoft hotfix, you should apply the fix only if you can definitely establish that the bug is negatively affecting your applications.
Manual Intervention Still Required
SQL Server's ability to automatically update statistics is useful, but you can't assume that just because the feature is enabled, you'll never need to run the UPDATE STATISTICS command. Two new trace flags in SQL Server 2005 SP1 can help the automatic statistics-updating functionality gather more information about your data distribution. Nonetheless, even with the trace flags, a DBA still needs to monitor query performance, spot when queries are performing less than optimally due to out-of-date statistics, and either manually update the statistics or set up a job schedule to update them more often than the auto update feature allows.