
Estimated Row Counts Affect Performance

When looking at Query Analyzer's graphical showplan output, I see many cases in which the estimated row count differs significantly from the actual row count at a query level and for individual steps within the query. What does this mean and should I be worried?

Yes, go ahead and worry. The estimated number of rows is incredibly important when SQL Server's query optimizer is choosing an execution plan (Query Analyzer merely displays the plan the optimizer picked), so what you're seeing could be causing a serious performance problem. I can't address all the nuances of this particular issue here, but I'll get you started with some basics that will help you understand what's going on and troubleshoot your problem.

We'll start with optimizer basics. SQL Server uses a cost-based optimizer: it evaluates the expected cost of each candidate plan, then chooses the plan that has the lowest expected cost. SQL Server never knows what a query actually costs until it runs the query, so it's always possible that SQL Server will choose the more expensive plan. Here's a simple example: SQL Server evaluates Query A and needs to choose between Plan 1 and Plan 2. Plan 1's expected cost is 1.0, but its actual cost turns out to be 2.0. Plan 2's expected cost is 2.0, but its actual cost turns out to be 1.0. Because SQL Server chooses plans based on expected cost, it will choose Plan 1 even though the actual cost is higher.
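If you want to see those expected costs and row estimates for yourself before running anything, SET STATISTICS is one route, and SET SHOWPLAN_ALL ON returns the estimated plan as a rowset (including an EstimateRows and a TotalSubtreeCost column for each step) without executing the query. Here's a minimal sketch; the query is just a placeholder (it uses the Orders example discussed next) for whatever statement you're tuning:

-- Return the estimated plan instead of executing the query.
-- EstimateRows and TotalSubtreeCost are the optimizer's guesses that
-- drive the plan choice; nothing here reflects an actual run.
SET SHOWPLAN_ALL ON
GO
SELECT * FROM Orders WHERE OrderId > 999999
GO
SET SHOWPLAN_ALL OFF
GO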

Now, let's tie that in with how the estimated number of rows affects how SQL Server chooses a plan. More specifically, you'll see how a wide difference between actual and estimated number of rows leads to a wide difference between the actual and expected query cost. Imagine you have an Orders table that has a unique, nonclustered OrderId primary key. The table has 1,000,000 rows and there are no other indexes. You issue the following query:

SELECT * FROM Orders WHERE OrderId > 999999

Listing #1 — SQL Server should use the index

The index supports that query well; however, a table scan processes the next query much more efficiently:

SELECT * FROM Orders WHERE OrderId > 1

Listing #2 — SQL Server should not use the index
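If you want to reproduce the scenario, here's a rough sketch of the setup. The OrderId column and the unique, nonclustered primary key match the example; the other columns, the PK_Orders constraint name, and the row-by-row fill are my own assumptions, there only to give SELECT * something to return:

-- Example table: OrderId is a unique, nonclustered primary key and
-- there are no other indexes.
CREATE TABLE Orders (
    OrderId     int      NOT NULL,
    CustomerId  int      NOT NULL,   -- made-up column for illustration
    OrderDate   datetime NOT NULL,   -- made-up column for illustration
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId)
)

-- Load 1,000,000 rows (slow but simple; a set-based insert from a
-- numbers table would be much faster).
DECLARE @i int
SET @i = 1
WHILE @i <= 1000000
BEGIN
    INSERT INTO Orders (OrderId, CustomerId, OrderDate)
    VALUES (@i, @i % 5000, GETDATE())
    SET @i = @i + 1
END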

We know that the first query will return one row while the second query will return 999,999 rows. A nonclustered index is effective for selective queries, but a table scan is generally more efficient for queries that aren't selective. SQL Server has to guess in advance how many rows the query's step (which might or might not use the index) will return. If SQL Server thinks that the query in Listing 2 will return only one row, it will choose to use the index. The expected cost is low; however, the actual cost (and total run time) is high because SQL Server has to perform 999,999 bookmark lookups. I've dramatically oversimplified what happens to make my point, but now you can see that SQL Server might choose execution plans poorly when actual and estimated row counts differ greatly.
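You can watch this play out by running both listings with SET STATISTICS PROFILE ON, which executes each query and returns the actual Rows next to EstimateRows for every step, and SET STATISTICS IO ON, which reports the page reads each plan incurs. The index hint in the last query below assumes the PK_Orders name from the setup sketch above and simply forces the bookmark-lookup plan so you can see what it costs:

SET STATISTICS PROFILE ON
SET STATISTICS IO ON

-- Selective query: the nonclustered index plus one bookmark lookup is cheap.
SELECT * FROM Orders WHERE OrderId > 999999

-- Nonselective query: a table scan is the better plan.
SELECT * FROM Orders WHERE OrderId > 1

-- Force the index on the nonselective query to see what 999,999
-- bookmark lookups cost in logical reads.
SELECT * FROM Orders WITH (INDEX(PK_Orders)) WHERE OrderId > 1

SET STATISTICS IO OFF
SET STATISTICS PROFILE OFF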

Unfortunately, I can't cover everything that causes this behavior or every way to troubleshoot it; the topic is too complex. But here's a war story from a recent job that highlights the benefits of investigating cases in which the actual number of rows differs significantly from the expected number of rows. I recently helped a customer tune a query that returned 99 rows. SQL Server estimated that the query would return more than 200,000 rows, and the query took more than 5 minutes to run. On a hunch, I updated statistics for the tables involved in the query, using the WITH FULLSCAN syntax, which reads every row in the table when updating statistics. The estimated number of rows dropped to a few hundred, and the query time dropped to 10 seconds. In some cases, you'll find that you can solve the problem of estimated row counts varying significantly from actual row counts by running UPDATE STATISTICS with a sampling ratio other than the default. The data distribution in the table can interact badly with the default sampling interval and produce skewed row-count estimates. Sometimes I use the FULLSCAN option just to see whether it dramatically changes the plans I get.
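The statement itself is simple; shown here against the example Orders table rather than the customer's real tables, it looks like this:

-- Rebuild the table's statistics by reading every row instead of
-- sampling. Slower than the default, but the row estimates it produces
-- are as accurate as the current data allows.
UPDATE STATISTICS Orders WITH FULLSCAN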

FULLSCAN might be overkill in most cases, but if I run UPDATE STATISTICS using the default settings and don't see a change, I might jump right to FULLSCAN to quickly determine whether a higher sampling ratio will solve the problem. Over time, I can experiment by dropping the SAMPLE option to some number less than FULLSCAN until I find the lowest setting that reliably produces accurate row counts for the queries I'm tuning and gives me a good plan.
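For example, you might step the sample down and recheck the estimated versus actual row counts after each run; the percentages here are arbitrary starting points, not recommendations:

-- Try progressively smaller samples until the estimates start to drift.
UPDATE STATISTICS Orders WITH SAMPLE 50 PERCENT
UPDATE STATISTICS Orders WITH SAMPLE 25 PERCENT
UPDATE STATISTICS Orders WITH SAMPLE 10 PERCENT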
