The query optimizer is one of the most complex parts of the SQL Server engine, and it gets more complex with every release—and even with every service pack. The optimizer's job is to determine the optimum internal plan for carrying out a T-SQL query that has been submitted for execution, either ad hoc through an application or as part of a stored procedure. The optimizer has to make decisions such as
- which indexes (if any) SQL Server should use to access each table
- which index (if any) SQL Server should use to perform any ORDER BY operations
- in what order SQL Server should access the tables in a multitable query
- which JOIN algorithm SQL Server should use for each pair of tables
- how SQL Server should perform the GROUP BY and DISTINCT operations
Because Microsoft's SQL Server developers want the query optimizer to find a way to execute every query quickly, they're continually adding more methods for accessing your data and more algorithms to use in determining which method is best. Although evaluating all possible methods and finding the best one is time-intensive, the time is usually well spent. However, you don't want the optimizer to look for a new plan if a useful one already exists in SQL Server's plan cache. Let's explore how you can determine when and why the optimizer looks for a new query plan for a stored procedure, then discuss how you can avoid having SQL Server perform unnecessary work.
Recompilation and Optimization
Most SQL Server documentation uses the term "compiling" to refer to the optimizer's process of determining a suitable execution plan. But when SQL Server compiles a stored procedure, it does more than optimize all the procedure's data-manipulation language (DML) statements (SELECT, INSERT, UPDATE, and DELETE). It also translates into executable form the non-DML statements: declarations, conditional statements, calls to procedures or functions, and so on. SQL Server always compiles stored procedures as a unit; during compilation, the query optimizer finds a plan for every DML statement in the procedure. Whenever a recompilation occurs, SQL Server reoptimizes every statement, whether it needs to or not. It's the cost of the optimization phase that I'm concerned with in this article.
Reducing unnecessary recompiles is one of the best ways to improve a procedure or application's performance. However, you don't want to avoid all recompilations. In some situations, recompilation and reoptimization are desirable. In a moment, I'll look at the most common reasons SQL Server automatically recompiles a procedure, and you'll see that these recompilations can be useful at times. You can use several methods to explicitly force SQL Server to recompile a procedure, but determining when and why to force recompilation is a topic for another article. In this column, I deal only with automatic recompilation.
How can you tell when the optimizer has recompiled a procedure? The easiest, most straightforward method is to use SQL Server Profiler to capture the SP:Recompile event, which occurs whenever a stored procedure recompiles during execution. If you have SQL Server 2000 Service Pack 2 (SP2), Profiler can give you information about why the procedure was recompiled. SP2 makes information available in the EventSubClass column when Profiler reports the SP:Recompile event. This column contains a value between 1 and 6 that indicates why SQL Server recompiled the procedure. Table 1, page 22, lists the meanings of the different values as documented in the Microsoft article "HOW TO: Identify the Cause of Recompilation in an SP:Recompile Event" (Q308737, http://support.microsoft.com).
Besides SP:Recompile, you need only three events to set up a trace to capture all procedure recompiles and their causes: SP:Starting, SP:StmtStarting, and SP:Completed. The SP:StmtStarting event tells you which statement inside the procedure caused the recompile. The SP:Recompile event occurs after the procedure starts and right before the statement that caused the recompile.
For data columns, all you need besides the EventSubClass column is the EventClass column (to tell you which event occurred) and the event's Text column. For an SP:Starting, SP:Completed, or SP:Recompile event, the Text column specifies the name of the stored procedure that started, completed, or recompiled. In addition, I like to add the StartTime data so that I always know the chronological sequence of events.
You can also add filters to the trace. The most useful filter is probably one that limits which databases are traced. If you use a filter to limit DatabaseID to values greater than or equal to 5, for example, you'll avoid receiving information about procedure recompiles in all the system databases, which you can't do anything about anyway.
Let's look at an example of when stored procedures might be recompiled. Set up and run a trace to capture the events and data columns described above, then run the script that Listing 1 shows. The captured trace data in the EventSubClass column will show examples of the first four reasons that SQL Server automatically recompiles your procedure while it's executing. Let's explore those reasons in detail.
Reasons for Recompile
An EventSubClass value of 1 means that SQL Server recompiled the procedure because of a change to one of the objects that the stored procedure references. That's a good reason for recompiling, especially if the change is a structural one involving data-type changes or the addition or removal of columns. In those cases, recompiling is mandatory for SQL Server because a plan based on outdated schema information could be useless. You wouldn't want to avoid recompiling, but you might want to investigate whether the schema change is necessary. If you need to make multiple schema changes in one stored procedure, make sure they all occur close to the start of the procedure. Otherwise, instead of recompiling the procedure once, SQL Server might end up recompiling the procedure several times during one execution. In Listing 1, the recompile occurs the second time the procedure runs because the table is recreated after the procedure starts, so the plan that SQL Server used during the first execution of the stored procedure is no longer valid. Note that the recompilation happens only when the optimizable SELECT statement executes.
An EventSubClass value of 2 means that a large number of rows in one of the tables in a query have changed since the plan was first compiled. Because this change might have changed the data distribution and thus the ideal query plan, SQL Server recompiles the procedure. How many rows is considered large depends on the size of the table and how empty it was before the rows changed. For permanent tables, the threshold is 500 rows plus 20 percent of the number of rows in the table. So if the table had 100,000 rows, you'd need to make 20,500 changes to trigger recompilation. In Listing 1, the table initially has only 140 rows, but the code adds 2155 more rows. The recompile with the EventSubClass value of 2 is triggered right before the second SELECT statement, which is now based on a table with many more rows than the first SELECT in the procedure.
If you think that the changes won't affect the query plan, you can avoid this recompile by using the query hint OPTION (KEEPFIXED PLAN) at the end of the second SELECT statement. (This hint was added in SQL Server 7.0 SP3 and is documented in the Microsoft article "INF: How to Use KEEPFIXED PLAN to Disable Stored Procedure Recompilations," Q276220, at http://support.microsoft.com.) If the table is a temporary table, SQL Server is even more conservative and the threshold for recompilation is even smaller. In fact, a change of only six rows in a temporary table can trigger recompilation of the entire procedure. You can avoid this cause of recompilation by using the query hint OPTION (KEEP PLAN). This hint relaxes the recompile threshold on temporary tables to the same standard as for permanent tables, and the KEEPFIXED PLAN option avoids all recompilation caused by row modifications.
An EventSubClass value of 3 means that SQL Server didn't find the object during the initial procedure compilation. SQL Server 2000 uses a feature called deferred name resolution to let processing continue in that case, in hopes that the object will exist when the procedure is executed. This feature is designed for temporary tables, which coders frequently don't create until right before they're used. In Listing 1, because I drop the newDetails table before the procedure runs the first time, SQL Server recompiles the procedure before it begins execution. However, the Profiler trace won't show this because the SP:Recompile event indicates recompiles only after execution has begun. When SQL Server recompiles immediately before execution, it determines that no newDetails table exists and records that fact. When the test_recompiles procedure executes later, the table will exist and the optimizer will recompile the procedure to incorporate information about this table.
An EventSubClass value of 4 means that a SET option changed during the execution of a procedure, and again SQL Server is very conservative. Many SET options can change the meaning of queries and cause SQL Server to return completely different results. In Listing 1, I change the value of the option CONCAT_NULL_YIELDS_NULL to OFF right before the third SELECT statement. This change won't affect the query results, but SQL Server can't determine that before execution. The only way to avoid this recompilation is to not include the SET option in the procedure. If you absolutely must have a SET option, put it at the beginning of the procedure before any optimizable statements. Note that not all SET options will cause recompilation; for more details, see my article "SET Options and Recompilation," February 2001, InstantDoc ID 16308.
An EventSubClass value of 5 is like a 1 value but represents a schema change in a temporary table. Similarly, an EventSubClass value of 6 indicates a schema change in a remote table. You can avoid these recompilations in the same way that you can avoid recompilations based on schema changes for permanent tables.
The SQL Server query optimizer is an extremely sophisticated piece of code, and I recommend that you take advantage of it whenever it can help you. However, putting the optimizer through its paces unnecessarily can be disadvantageous, especially with stored procedures because the entire procedure recompiles if even one statement needs recompilation. Knowing when and why SQL Server is recompiling a procedure can help you devise tests to determine whether the recompilation is necessary. In some cases, you'll have no performance degradation by avoiding the recompilation. However, if you find that avoiding a recompilation results in worse performance when you're executing a procedure, you'll know that the optimizer was effectively doing what it was designed to do—recompiling your procedure to give you a better execution plan.