Many DBAs and T-SQL developers use nonclustered indexes to avoid table scans, thereby improving query performance. Frequently, this type of performance tuning works well. However, with large tables, costly bookmark lookups often occur as a result. I devised a technique to improve the performance of queries that run slowly because of bookmark lookups. This technique also demonstrates that DBAs and T-SQL developers often know more than the query optimizer. Unlike the query optimizer, they can make assumptions based on their knowledge of the data and devise creative solutions.
Before I tell you about the technique, let me tell you a bit about the problem. My client's data warehouse environment contained several very large and very wide tables. The clustered indexes for the tables were based on an identity column. The client chose to use the identity column to improve batch performance and reduce fragmentation during INSERT operations. In addition, because the tables contained multiple types of date-time columns (e.g., Julian date, local date-time, Coordinated Universal Time (UTC)) to meet specific needs, there wasn't an obviously suitable candidate to replace the identity column as a range-based clustered key.
As you would expect, the client created nonclustered indexes on the columns that typically appeared in WHERE clauses, including join conditions. Almost every query used the following type of filter to return data from the large tables:
WHERE location = @location AND some_date_time BETWEEN @start and @end
Nonclustered indexes were created on the (some_date_time, location) columns. This composite key was a sound choice because the location column's selectivity was low relative to the size of the table, so leading with the date column made the index more useful for range filters.
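The article doesn't reproduce the client's DDL, but the layout it describes might look roughly like the following sketch. The table name dbo.FactEvents, the key name event_id, and the column types are illustrative assumptions, not the client's actual schema:

```sql
-- Hypothetical wide warehouse table: clustered on an identity column,
-- with a composite nonclustered index matching the common filter.
CREATE TABLE dbo.FactEvents (
    event_id       INT IDENTITY(1,1) NOT NULL,
    some_date_time DATETIME NOT NULL,
    location       INT NOT NULL,
    -- ...many additional wide columns...
    CONSTRAINT PK_FactEvents PRIMARY KEY CLUSTERED (event_id)
);

CREATE NONCLUSTERED INDEX IX_FactEvents_Date_Location
    ON dbo.FactEvents (some_date_time, location);
```

Because the table is clustered on event_id, that key is implicitly carried in every row of the nonclustered index, which becomes important later.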
In this case, the client's use of indexes, and of T-SQL code to leverage the indexes, avoid table scans, and improve query performance, was by the book. However, the client was still experiencing performance problems with some crucial stored procedures. Specifically, several of the data-load stored procedures took 90 minutes to run, which, in turn, was affecting the timeliness of report generation.
After looking at the execution plans for these queries, I was able to determine the problem. As Table 1 shows, no table scans were occurring, and index seeks were occurring on the nonclustered indexes as planned. However, each nonclustered index seek was followed by a bookmark lookup, which drastically increased the I/O and subtree costs and slowed down the queries.
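If you want to reproduce this kind of diagnosis yourself, you can examine a query's estimated plan without running it. The sketch below uses the hypothetical dbo.FactEvents table and literal filter values; in the plan output, look for a Bookmark Lookup operator following the nonclustered index seek:

```sql
-- Show the estimated plan as a rowset instead of executing the query.
SET SHOWPLAN_ALL ON;
GO
SELECT col_a, col_b              -- illustrative column names
FROM dbo.FactEvents
WHERE location = 42
  AND some_date_time BETWEEN '20040101' AND '20040131';
GO
SET SHOWPLAN_ALL OFF;
GO
```

The same information is visible graphically by displaying the estimated execution plan in Query Analyzer.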
I looked at several books and online articles and discovered three ways to avoid bookmark lookups:
- Create fully covering nonclustered indexes.
- Create many separate, small (i.e., nonwide) nonclustered indexes so that the query optimizer can use index joins.
- Change the clustered index key.
In this case, creating covering nonclustered indexes and creating many small indexes weren't viable solutions because numerous queries were returning many different columns. For example, one query might return 12 columns and another query might return 15 columns, nine of which differed from the columns returned by the first query.
Because the client's stored procedures used many different date-time columns to meet specific needs and because of the concern with fragmentation, changing the clustered key wasn't a palatable option to the client. Had the stored procedures used the same date-time column for every query, this would have been a possible solution.
So, I decided to creatively leverage the clustered index. In the client's environment, the data was inserted into the large source tables per date (some_date_time) and location (location). Because of this setup, the data being extracted from these large tables was physically stored on disk in relatively contiguous chunks per date range. However, the query optimizer didn't know this, so it used bookmark lookups to retrieve the actual data for the queries. To circumvent the bookmark lookups, I took the following steps:
- I rewrote part of the stored procedures' code so that they retrieved the clustered identity key's MAX and MIN values before they ran queries against the large tables. As Listing 1 shows, these values are stored in the @max_value and @min_value variables. Because the identity column is the clustered key, a query that retrieves its MAX and MIN values filtered by the (some_date_time, location) columns is fully covered by the nonclustered index. This is possible because the clustered key values are implicitly stored within every nonclustered index.
- I added the MAX and MIN values in the @max_value and @min_value variables to the WHERE clauses. Listing 2 shows an example of a rewritten WHERE clause.
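Listings 1 and 2 aren't reproduced here, but the two steps follow the pattern in this sketch. The table, column, and parameter names are illustrative; only @min_value and @max_value come from the article itself:

```sql
DECLARE @min_value INT, @max_value INT;
DECLARE @location INT, @start DATETIME, @end DATETIME;
SELECT @location = 42, @start = '20040101', @end = '20040131';

-- Step 1: retrieve the clustered identity key's MIN and MAX for the
-- requested slice. This query is fully covered by the nonclustered
-- index on (some_date_time, location), because every nonclustered
-- index implicitly carries the clustered key.
SELECT @min_value = MIN(event_id),
       @max_value = MAX(event_id)
FROM dbo.FactEvents
WHERE location = @location
  AND some_date_time BETWEEN @start AND @end;

-- Step 2: bound the main query by the identity range, letting the
-- optimizer seek directly on the clustered index rather than doing
-- nonclustered seeks followed by bookmark lookups.
SELECT col_a, col_b, col_c       -- the wide column list the report needs
FROM dbo.FactEvents
WHERE event_id BETWEEN @min_value AND @max_value
  AND location = @location
  AND some_date_time BETWEEN @start AND @end;
```

Note that the original date and location predicates are kept in the second query. They guard correctness: if rows for other dates or locations were interleaved within the identity range, the range filter alone would return them.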
These changes let the query optimizer perform a clustered index seek directly against the clustered index key that's based on the identity column. As Table 2 shows, the costly bookmark lookups have been eliminated. Another positive consequence of the changes was that the optimizer introduced parallelism into the plans.
This simple solution addressed the major cause of the performance problems. The data-load stored procedures' run time decreased from 90 minutes to 3 minutes or less, a 97 percent improvement in performance. It also proves that DBAs and T-SQL developers can often come up with clever solutions because they know their data well.