Why Do Similar Queries Have Different Execution Plans?

Downloads
39444.zip

I'm having a performance problem when I try to use datetime values in a query's WHERE clause. Performance is fine when I use a literal string that represents the datetime value:

SELECT * FROM BigTable WHERE 
TargetDate > '2003-03-01'

But the query slows down when I reference a variable:

SELECT * FROM BigTable WHERE 
TargetDate > @TestDate

Why is the second query much slower than the first?

When seemingly similar or identical queries perform differently, you need to compare the queries' execution plans to solve the mystery. For example, you might start by investigating the use of a local variable in the date range that the search argument (SARG) specifies. Consider the three queries in Listing 1. Queries 1 and 2 use a local variable in the SARG. Query 3 uses a hard-coded literal for the same value that Query 1's variable contains. Queries 1 and 3 return the same result set, but as you'll see, SQL Server uses a different execution plan for each of them.
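Listing 1 ships in the download rather than in the text. As a rough sketch of the shape the three queries might take, the following assumes they run against the Northwind sample database's Orders table (the table the stored procedure later in this answer queries). The variable names, date values, and use of >= are illustrative assumptions, not taken from the listing:

```sql
-- Hypothetical reconstruction; the real Listing 1 is in the download.
USE Northwind
GO

-- Query 1: local variable holding a recent date (few matching rows)
DECLARE @odate1 DATETIME
SET @odate1 = '19980506'
SELECT * FROM Orders WHERE OrderDate >= @odate1

-- Query 2: local variable holding an early date (many matching rows)
DECLARE @odate2 DATETIME
SET @odate2 = '19960101'
SELECT * FROM Orders WHERE OrderDate >= @odate2

-- Query 3: the same value as Query 1, written as a literal
SELECT * FROM Orders WHERE OrderDate >= '19980506'
GO
```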

Execute the three queries in Listing 1 once to ensure you know what results these queries return and how each query is different. Now, execute the queries again, but turn on STATISTICS IO at the beginning of the batch:

SET STATISTICS IO ON

Although Query 1 and Query 3 return the same result set, Query 1 (which uses a local variable) requires 21 logical reads, while Query 3 (which uses a hard-coded literal value) requires only 10 logical reads. Query 1 requires the same number of reads as Query 2, even though Query 2 returns more rows.
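A minimal sketch of such a batch, assuming the Orders table that the stored procedure later in this answer queries (the variable name and date are illustrative). After each SELECT, the Messages output includes a line for the Orders table that reports its scan count and logical reads; the numbers you see depend on your copy of the data:

```sql
SET STATISTICS IO ON
GO

DECLARE @odate DATETIME
SET @odate = '19980506'

-- Variable in the SARG: value unknown when the statement is optimized
SELECT * FROM Orders WHERE OrderDate >= @odate

-- Literal in the SARG: value visible to the optimizer
SELECT * FROM Orders WHERE OrderDate >= '19980506'
GO

SET STATISTICS IO OFF
GO
```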

Run the queries one more time and look at SHOWPLAN to see how SQL Server executes each query. You can view the query plan from Query Analyzer or by issuing the SET SHOWPLAN_TEXT ON command in its own batch before the queries and the SET SHOWPLAN_TEXT OFF command in its own batch afterward. (SET SHOWPLAN_TEXT must be the only statement in its batch, and while the option is on, SQL Server returns the plan without executing the query.) You'll see that Query 1 and Query 2 have identical plans: To execute the queries, SQL Server scans the clustered index on the primary key, which is on the OrderId column that the WHERE clause doesn't reference. To execute Query 3, SQL Server uses the OrderDate index (defined on the OrderDate column) to do an index seek, which explains the difference in the number of logical reads between Query 1 and Query 3.
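The SHOWPLAN_TEXT approach can be sketched as follows, again assuming the Orders table and an illustrative variable name and date. The GO separators matter because the SET SHOWPLAN_TEXT statement has to sit alone in its batch:

```sql
-- Must be the only statement in its batch
SET SHOWPLAN_TEXT ON
GO

-- While SHOWPLAN_TEXT is on, these statements are compiled and their
-- plans are returned as text, but the queries themselves don't run.
DECLARE @odate DATETIME
SET @odate = '19980506'
SELECT * FROM Orders WHERE OrderDate >= @odate
SELECT * FROM Orders WHERE OrderDate >= '19980506'
GO

-- Turn the option off again, also in its own batch
SET SHOWPLAN_TEXT OFF
GO
```

In the plan text, look for whether each SELECT shows a clustered index scan or an index seek.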

Why does SQL Server choose such different plans for queries that seem identical? SQL Server doesn't know the value of Query 1's local variable when it optimizes the query, so it has to guess what the value might be. Nonclustered indexes typically aren't useful if you need to return a large percentage of a table's rows. And although Query 1 returns only five rows, SQL Server doesn't know that, so it assumes that the query will return roughly 30 percent of the table, which is its fixed guess for a greater than (>) comparison against an unknown value. The OrderDate index wouldn't be effective if the query returned that many rows, so SQL Server doesn't use it. In Query 3, however, the SARG contains a literal value, so SQL Server can check that value against the index's distribution statistics and accurately estimate how many rows the query will return. And knowing that the query will return only about five rows, SQL Server uses the nonclustered index.
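If you want to look at the distribution information the optimizer can consult when the SARG is a literal, you can display the index's statistics. A small sketch, assuming the table is named Orders and the index is named OrderDate as described above:

```sql
-- Displays the statistics header, density information, and histogram
-- steps that SQL Server keeps for the OrderDate index on Orders.
DBCC SHOW_STATISTICS (Orders, OrderDate)
```

The histogram steps show how the OrderDate values are distributed, which is the information SQL Server can't apply when the comparison value is hidden in a local variable.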

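One way to make sure SQL Server's optimizer knows the value at compile time, and can then use the appropriate index, is to encapsulate the SQL query in a stored procedure and pass the value as a parameter. Unlike a local variable, a parameter's value is available to the optimizer when it compiles the procedure's plan. Create the stored procedure with the RECOMPILE option if the optimal query plan varies based on input values, so that SQL Server builds a new plan for each call instead of reusing a cached plan that was optimized for a different value: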

CREATE PROC DateRangeTest
  @odate AS DATETIME
WITH RECOMPILE
AS
SELECT * FROM Orders
WHERE OrderDate >= @odate
GO

Now run the following commands and compare the number of reads and the execution plan for each:

EXEC DateRangeTest '19980506'  -- returns 5 rows
EXEC DateRangeTest '19960101'  -- returns 830 rows

You'll see that the first invocation of the procedure, which returns five rows, can effectively use the index on OrderDate, while the second invocation continues to scan the clustered index because the query is returning so many rows.
