Debunking the Myths of Temporary Objects, Part 2

Concrete examples help clarify the distinction between object types

Downloads: 96479.zip

Last month, in "Debunking the Myths of Temporary Objects, Part 1" (InstantDoc ID 96288), I began a discussion about temporary objects, which include temporary tables, table variables, and table expressions. I gave you an overview of each object type, described its characteristics, and provided recommendations for when to use each. This month, I dig deeper, with concrete examples that clarify last month's discussion. Be sure to read last month's article before this one: The information in it is an important prerequisite for understanding this article's examples.

Relying on Distribution Statistics

In this article's examples, I'll use a table called Orders with 1,000,000 rows. To create the Orders table in the tempdb database (you can also create it in a user database of your choice) and populate it with sample data, run the code in Web Listing 1. The fact that I'm creating the table in tempdb has nothing to do with temporary objects; I'm using tempdb simply because I want the table to be cleared upon SQL Server restart, and I often use tempdb for test purposes.
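If you don't have the download handy, the following sketch gives an idea of what Web Listing 1 might contain. It's an approximation only: The fn_nums helper function (which I describe later in the article), the column names (orderid, orderdate, qty), the base date, and the data distribution are my assumptions, not the actual listing.

SET NOCOUNT ON;
USE tempdb;
GO
-- Auxiliary table function returning the numbers 1 through @n
-- (a common technique; the real fn_nums may be written differently)
IF OBJECT_ID('dbo.fn_nums') IS NOT NULL
  DROP FUNCTION dbo.fn_nums;
GO
CREATE FUNCTION dbo.fn_nums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
  L0   AS (SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP(@n) n FROM Nums ORDER BY n;
GO
-- Orders table with 1,000,000 rows spread over roughly four years
IF OBJECT_ID('dbo.Orders') IS NOT NULL
  DROP TABLE dbo.Orders;
GO
CREATE TABLE dbo.Orders
(
  orderid   INT      NOT NULL PRIMARY KEY,
  orderdate DATETIME NOT NULL,
  qty       INT      NOT NULL
);
GO
INSERT INTO dbo.Orders(orderid, orderdate, qty)
  SELECT n,
         DATEADD(minute, ABS(CHECKSUM(NEWID())) % (4 * 365 * 24 * 60), '20030101'),
         ABS(CHECKSUM(NEWID())) % 200 + 1
  FROM dbo.fn_nums(1000000) AS Nums;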

My first example involves the need to aggregate order data by the hour, calculating the total quantity and number of orders per hour, and to submit multiple queries against the aggregated data. This scenario precludes table expressions, leaving you with the choice of whether to use a temporary table or a table variable.

Temporary table approach. Run the code in Web Listing 2 to create and populate a temporary table called #OrderTotalsByHour with the hourly aggregates. With four years' worth of data, you should have more than 35,000 rows in the temporary table. Web Listing 3 shows the set of queries that you need to submit against #OrderTotalsByHour. Note that in queries 3 and 4, I specify a constant in the filter that represents the highest qty column value from the temporary table (in my case, 2880). I use that value because there should be a very small number of rows with an equal or greater value (i.e., a highly selective filter) and a very large number of rows with smaller qty values (i.e., a low-selectivity filter). Because the Orders table is populated with random values, before you run these queries and analyze their plans, first obtain the maximum hourly quantity value from the temporary table with the following query, and replace the constant 2880 in queries 3 and 4 with the value you get:

SELECT MAX(qty) FROM #OrderTotalsByHour;
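In case you don't have the downloadable listings handy, here's a rough sketch of the kind of code Web Listings 2 and 3 contain. It follows the structure described above (a clustered index on the hour column and a nonclustered index on qty), but the table and column names, and the exact queries, are my assumptions rather than the actual listings:

-- Create and populate the temporary table (sketch only)
CREATE TABLE #OrderTotalsByHour
(
  orderhour DATETIME NOT NULL PRIMARY KEY, -- clustered index used by queries 1 and 2
  qty       INT      NOT NULL,
  numorders INT      NOT NULL
);
CREATE NONCLUSTERED INDEX idx_qty ON #OrderTotalsByHour(qty);

INSERT INTO #OrderTotalsByHour(orderhour, qty, numorders)
  SELECT DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0),
         SUM(qty), COUNT(*)
  FROM dbo.Orders
  GROUP BY DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0);

-- Query 3: highly selective range filter
-- (replace 2880 with the result of the MAX(qty) query above)
SELECT orderhour, qty, numorders
FROM #OrderTotalsByHour
WHERE qty >= 2880;

-- Query 4: low-selectivity range filter
SELECT orderhour, qty, numorders
FROM #OrderTotalsByHour
WHERE qty <= 2880;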

Web Figure 1 shows the execution plans I got for the four queries in Web Listing 3. If you want to analyze the execution plans, enable the Include Actual Execution Plan option in SQL Server Management Studio, and run the code in Web Listing 3.

Notice that all the queries get ideal plans. Observe the estimated and actual numbers of rows reported in the plans: The selectivity estimations that the optimizer makes are very close to reality. Queries 1 and 2 have trivial plans because in both cases there's a clustered index on the filtered column; here, distribution statistics aren't very important, because the optimizer would arrive at the ideal plan without needing to estimate selectivity anyway. With queries 3 and 4, however, selectivity estimations are very important. You can see that when the filter was highly selective (query 3), the optimizer chose to use the nonclustered index created on the qty column. When the filter had low selectivity (query 4), the optimizer instead opted for a full clustered index scan. The optimizer made good choices because it could make reasonable selectivity estimations based on the available distribution statistics.

In my case, query 3 ended up performing only five logical reads (a seek in the nonclustered index plus a lookup). Query 4 involved 1,006 logical reads, which is the number of pages at the leaf of the clustered index (all the data pages). Had the optimizer chosen a full clustered index scan for query 3, it would have cost 1,006 reads. Had it chosen the nonclustered index for query 4, it would have performed more than 35,000 lookup operations (each of which is a seek in the clustered index), amounting to substantially more I/O than a full clustered index scan. In short, the optimizer made good choices.
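If you want to reproduce these I/O numbers on your own system, you can turn on SQL Server's standard I/O reporting before running the queries; the counts appear in the Messages pane:

SET STATISTICS IO ON;
-- run the queries in Web Listing 3 here
SET STATISTICS IO OFF;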

Table variable approach. Next, run the code in Web Listing 4 to test the table variable approach, and observe the execution plans that you see in Web Figure 2. (Be sure to replace the constant in queries 3 and 4.) Notice that with all four queries, the estimated number of rows returned from querying the table variable is always 1. The reason should be clear by now: The optimizer doesn't have distribution statistics, so it can't make reasonable selectivity estimations. Queries 1 and 2, as I mentioned earlier, have trivial plans, so you get good plans regardless of whether distribution statistics are available. Queries 3 and 4, though, have different ideal plans for different selectivity scenarios. Because the optimizer can't make good estimations, it uses a conservative approach for a range filter, namely a full scan of the clustered index. With query 3, that choice is unfortunate because you end up paying a lot more I/O than you potentially need to. With query 4, you can consider yourself lucky that the conservative approach happens to fit this case. Sometimes, guesses end up being right.
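For comparison, here's a minimal sketch of the table-variable variant (again, the names and the population query are my assumptions; the actual Web Listing 4 may differ). Keep in mind that the only indexes a table variable can have are those created through inline PRIMARY KEY or UNIQUE constraints, and no distribution statistics are maintained on them:

DECLARE @OrderTotalsByHour TABLE
(
  orderhour DATETIME NOT NULL PRIMARY KEY, -- clustered index, but no statistics
  qty       INT      NOT NULL,
  numorders INT      NOT NULL
);

INSERT INTO @OrderTotalsByHour(orderhour, qty, numorders)
  SELECT DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0),
         SUM(qty), COUNT(*)
  FROM dbo.Orders
  GROUP BY DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0);

-- Same range filters as queries 3 and 4; the estimated row count is 1 in both cases
SELECT orderhour, qty, numorders FROM @OrderTotalsByHour WHERE qty >= 2880;
SELECT orderhour, qty, numorders FROM @OrderTotalsByHour WHERE qty <= 2880;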

Using SELECT INTO

Last month, I mentioned that you can use a SELECT INTO statement to create and populate a temporary table, but not a table variable. SELECT INTO is a bulk operation that is minimally logged when the database recovery model isn't set to FULL. Because the tempdb database's recovery model is SIMPLE, you get minimal logging when you use SELECT INTO. "Minimal logging" means that only the addresses of the extents allocated during the operation are recorded in the log to support a rollback, if necessary. The address of an extent is six bytes in size, as opposed to the actual extent size, which is 64KB. When populating a temporary table with SELECT INTO, you get much less logging compared with using INSERT INTO. With table variables, you don't have the option to use SELECT INTO, so you'll end up with more logging, resulting in slower population of the table variable, compared with populating a temporary table with SELECT INTO. Ready for a tangible example?

The code in Web Listing 1 creates a function called fn_nums, which returns a table result with numbers in the range of 1 through the requested number of rows. I'll use it to populate temporary objects with 1,000,000 rows. To check the amount of logging involved, you can query the fn_dblog() function, as follows. (Make sure you're in tempdb while doing so.)

SELECT COUNT(*) FROM fn_dblog(NULL, NULL);

Let's start with a table variable. Because there's no option to use SELECT INTO, the code in Web Listing 5 uses an INSERT statement to populate a table variable with 1,000,000 rows. On my test machine, this code ran for 17 seconds and produced about 1,000,000 log records. Now, let's try a temporary table, with the code in Web Listing 6. On my test machine, this code ran in under 1 second and produced a little more than 1,500 log records, which explains the performance difference.
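The following sketch shows the shape of the comparison. It isn't the actual code from Web Listings 5 and 6; it assumes the fn_nums sketch shown earlier (with a result column called n) and simply counts the log records before and after each load so that you can compute the difference:

USE tempdb;
GO
-- Table variable: the INSERT is fully logged
SELECT COUNT(*) AS logrecs_before FROM fn_dblog(NULL, NULL);

DECLARE @T TABLE(n INT NOT NULL PRIMARY KEY);
INSERT INTO @T(n)
  SELECT n FROM dbo.fn_nums(1000000) AS Nums;

SELECT COUNT(*) AS logrecs_after FROM fn_dblog(NULL, NULL);
GO
-- Temporary table: SELECT INTO is minimally logged in tempdb
SELECT COUNT(*) AS logrecs_before FROM fn_dblog(NULL, NULL);

SELECT n
INTO #T
FROM dbo.fn_nums(1000000) AS Nums;

SELECT COUNT(*) AS logrecs_after FROM fn_dblog(NULL, NULL);
DROP TABLE #T;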

Examples with Table Expressions

Now, I'll show a couple of examples with table expressions. I'll use common table expressions (CTEs), but you'll get the same plans if you instead use derived tables or views. You'll work with the Orders table from the earlier examples. The request is to calculate the total quantity per year and to compare each year's total with the previous year's.

Web Listing 7 shows a solution using a CTE, and Web Figure 3 shows its execution plan. (Note that the plan will look a bit different if you run the code on a multi-CPU machine with parallelism enabled.) Observe in the plan that the Orders table is scanned twice: once for the instance of the CTE called Cur and once for the instance called Prv. As I mentioned last month, the definition of a table expression is expanded for each reference. After expansion, SQL Server ends up with a self-join between two instances of the Orders table. The Orders table in our example has 1,000,000 rows; obviously, the effect is more dramatic with bigger tables.
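To make this concrete, a query in the spirit of Web Listing 7 might look like the following (the actual listing may differ; orderdate and qty are the column names assumed throughout this article):

WITH TotalsByYear AS
(
  SELECT YEAR(orderdate) AS orderyear, SUM(qty) AS totalqty
  FROM dbo.Orders
  GROUP BY YEAR(orderdate)
)
SELECT Cur.orderyear,
       Cur.totalqty AS curtotalqty,
       Prv.totalqty AS prvtotalqty,
       Cur.totalqty - Prv.totalqty AS diff
FROM TotalsByYear AS Cur
  LEFT OUTER JOIN TotalsByYear AS Prv
    ON Cur.orderyear = Prv.orderyear + 1;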

As for the performance of this query: When the cache is cold (i.e., no pages in cache after running DBCC DROPCLEANBUFFERS), all the data from the Orders table is physically read once, but the logical reads are doubled. In my testing, the I/O cost amounted to 57,270 logical reads and 21,528 physical reads. When the cache was hot (i.e., all pages in cache; I ran the query twice and measured the second run), I got 57,270 logical reads and 0 physical reads. Of course, in production you might face situations in which some of the pages are cached and some aren't, so physical reads will vary. There are an additional 15 reads against an internal work table that the plan uses (represented by the spool operator in the plan). The subtree cost of the query is about 58. On my test system, the query ran for 10 seconds against cold cache and 2 seconds against hot cache.

At this point, there's potential for performance improvement by using a temporary table or a table variable: You can eliminate one of the full scans of the Orders table (the clustered index scan). In this case, a table variable is more appropriate because the result set with the aggregates that needs to be materialized is tiny, only four rows, so distribution statistics don't play an important role. By using a table variable, you also minimize the potential for recompilations.

Web Listing 8 has a solution that uses a table variable. Web Figure 4 shows the execution plans I got for querying the Orders table to populate the table variable, as well as querying the table variable to match current/previous years.
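Again, here's a sketch of what such a solution might look like; it isn't the actual Web Listing 8:

DECLARE @TotalsByYear TABLE
(
  orderyear INT NOT NULL PRIMARY KEY,
  totalqty  INT NOT NULL
);

-- The Orders table is scanned only once, here
INSERT INTO @TotalsByYear(orderyear, totalqty)
  SELECT YEAR(orderdate), SUM(qty)
  FROM dbo.Orders
  GROUP BY YEAR(orderdate);

-- The join then touches only the tiny table variable, twice
SELECT Cur.orderyear,
       Cur.totalqty AS curtotalqty,
       Prv.totalqty AS prvtotalqty,
       Cur.totalqty - Prv.totalqty AS diff
FROM @TotalsByYear AS Cur
  LEFT OUTER JOIN @TotalsByYear AS Prv
    ON Cur.orderyear = Prv.orderyear + 1;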

The advantage is that SQL Server scans the Orders table only once; it then scans the tiny table variable twice for the join. When I ran the solution on my test machine, I got the following performance statistics: 28,635 logical reads; 21,528 physical reads against cold cache and 0 physical reads against hot cache; a total subtree cost of 29; and a run time of 9 seconds against cold cache and 1 second against hot cache. The performance advantage isn't dramatic when running the code against cold cache, because most of the run time is spent on the physical scan of the data, which happens once in both cases. Still, there is some advantage to the solution based on the table variable. With hot cache, the advantage becomes much more apparent because the number of logical reads (as well as the run time) is cut by 50 percent.

In my final example, you need to access aggregated data only once. Suppose you need to return orders with the highest quantity value per hour. Web Listing 9 has the solution that relies on a CTE, and Web Listing 10 has the solution that relies on a temporary table. Because you need to access the aggregated data only once, there's no special benefit to using a temporary table or a table variable. I compared the performance of both solutions and found them to be similar. With the CTE solution, SQL Server stores the aggregated data in a hash table. With the temporary table solution, I stored the aggregated data in a temporary table explicitly. In short, very similar activity takes place in both cases, hence the similar performance. The solution based on a temporary table does have several disadvantages: It's a bit more complicated, and because it involves data definition language (DDL) and data changes associated with the temporary table, it might trigger recompilations (of the hosting procedure). So, in this case, I'd probably use the CTE-based solution.
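For reference, sketches of both approaches follow (the actual Web Listings 9 and 10 may be written differently; the column names are the same assumptions as before):

-- CTE-based solution (in the spirit of Web Listing 9)
WITH HourlyMax AS
(
  SELECT DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0) AS orderhour,
         MAX(qty) AS maxqty
  FROM dbo.Orders
  GROUP BY DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0)
)
SELECT O.orderid, O.orderdate, O.qty
FROM dbo.Orders AS O
  JOIN HourlyMax AS H
    ON DATEADD(hour, DATEDIFF(hour, 0, O.orderdate), 0) = H.orderhour
   AND O.qty = H.maxqty;

-- Temporary-table-based solution (in the spirit of Web Listing 10)
SELECT DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0) AS orderhour,
       MAX(qty) AS maxqty
INTO #HourlyMax
FROM dbo.Orders
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, orderdate), 0);

SELECT O.orderid, O.orderdate, O.qty
FROM dbo.Orders AS O
  JOIN #HourlyMax AS H
    ON DATEADD(hour, DATEDIFF(hour, 0, O.orderdate), 0) = H.orderhour
   AND O.qty = H.maxqty;

DROP TABLE #HourlyMax;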

The Value of Examples

This article's tangible examples demonstrate a decision-making process for choosing among the temporary object types. There's much to consider, and there are important, sometimes subtle, differences between the object types. By understanding the characteristics of the different object types, and by running performance tests and analyzing the results, you can make educated and informed decisions.
