paper with read graph and black ball point pen

The Tipping Point

Understanding how SQL Server's query optimizer makes decisions

DBAs and SQL Server developers who are new to SQL Server tuning are often puzzled by why the query optimizer in SQL Server would prefer to scan an entire table instead of using a nonclustered index that seems to fit the query perfectly. To explain this mystery, I'll use the following query, which retrieves data from the SalesOrderDetail table in the AdventureWorks database:


SELECT LineTotal

FROM sales.SalesOrderDetail

WHERE ProductID = 870 

If you have any version of the AdventureWorks sample database, you can follow along. If you don't, you can download it from CodePlex.

Related: Find Your Most Expensive Queries

The Demonstration

To begin, click the Include Actual Execution Plan button on the toolbar in SQL Server Management Studio (SSMS). This will allow you to see the execution plan chosen by the optimizer.

Next, type the following command in the query window:

USE AdventureWorks2008R2;



SELECT LineTotal

FROM sales.SalesOrderDetail

WHERE ProductID = 870; 

Note that you might need to change the database name (AdventureWorks2008R2) to the version of the database you're using.

Be sure to include the SET STATISTICS IO ON statement. It'll cause SSMS to display the query cost (in logical page reads) on the Messages tab in the results pane. If you're unfamiliar with the SET STATISTICS IO ON statement or the Include Actual Execution Plan option, see the article "SQL Server Query Tuning." This article also explains why logical page reads are the best metric to use to measure query cost.

In its entirety, the SalesOrderDetail table has 121,317 rows of data contained in 1,240 8KB data pages. It has a nonclustered index on ProductID and a multi-column clustered index on SalesOrderID and SalesOrderDetailID. With the value of 870 chosen for the ProductID filter, this query will return 4,688 rows or about 4 percent of the total number of rows in the table.

In light of this information, you'd expect the query engine to execute a seek on the ProductID index. However, if you execute this query and examine the Execution Plan tab in the results pane, you'll find that instead of doing a seek on the ProductID index as you expected, the optimizer preferred a full scan of the clustered index. It preferred to read all the data contained in the table rather than just the 4 percent of the rows where the ProductID equals 870. Looking at the Messages tab in the results pane will confirm that the query is reading all 1,240 data pages in the table.

Is it a mistake? No, it's not a mistake. That's the optimum query plan. If you don't believe it, change the query slightly to force the use of the ProductID index:

SELECT LineTotal

FROM sales.SalesOrderDetail

WITH (index=IX_SalesOrderDetail_ProductID)

WHERE ProductID = 870 

After you execute this query, look at the Messages tab. You'll see that SQL Server had to read 14,377 data pages to return the same result. Forcing the "right" index was roughly 10 times more expensive than the clustered index scan chosen by the optimizer.

The Tipping Point

The key to this mystery is understanding a crucial difference between clustered and nonclustered indexes. Most DBAs and SQL Server developers know that the clustered index contains the data itself in the leaf level of the index, whereas nonclustered indexes contain pointers to the pages containing the data. However, based on my experiences in the field, it seems to me that the implications of that difference aren't widely understood.

At the end of a seek or scan on a clustered index, the data has been retrieved and nothing further needs to happen. At the end of a seek or scan on a nonclustered index, the query engine has only pointers to the actual data. The data itself has yet to be returned.

When a nonclustered index is used, the data is retrieved one row at a time with a series of single-page I/O operations called Row ID (RID) lookups. A page returned by a RID lookup might contain several qualifying rows. If so, it won't be read just once. It'll be read several times, once for each qualifying row.

The bottom line is that a nonclustered index seek will typically use at least as many page reads as the number of rows returned and, depending on various other conditions, perhaps many more. The point at which the number of page reads required by the RID lookups exceeds the total number of data pages in the table is the point at which a clustered index scan becomes less expensive than the nonclustered index seek. This point is called the tipping point. When the tipping point is exceeded, the optimizer will usually prefer a scan of the clustered index instead of a nonclustered index seek.

It's important to know that nonclustered covering indexes don't use RID lookups, so they don't have this problem. For this reason, they can be lifesavers in some situations. Covering indexes contain all the data required by a specific query or set of queries. The data doesn't need to be retrieved from the base table because all the necessary data is contained in the index itself. If you aren't familiar with covering indexes, you might want to check out the article "SQL Server Tuning - The Covering Index."

Where Is the Tipping Point Exactly?

The tipping point is different for every table. It's generally expressed as a percentage of the total number of rows in the table. The key elements in calculating the tipping point are the number of table rows that will fit on an 8KB data page and the number of rows returned by the query. The more rows that will fit on a page or the larger the number of rows returned, the sooner the tipping point is reached and a full scan is preferred by the optimizer.

You can calculate the average number of rows per page in the SalesOrderDetail table by dividing the total number of rows by the total number of pages. In doing so, you'll find that 98 rows fit on each data page. This means that a clustered index scan on SalesOrderDetail can retrieve 98 rows with each page read while scanning the table, whereas a RID lookup retrieves only one row per page read, no matter what the size of the row. This is why reading the entire table with a clustered index scan can be less expensive than reading only 4 percent of the data in the table using RID lookups.

However, if the rows in the table were wider, fewer of them would fit on a page. That would change the ratio between rows returned by the query and total pages in the table. For example, imagine that the rows in the SalesOrderDetail table were so large that only two of them fit on a page. That would mean it would take more than 60,000 data pages to hold the 121,317 rows in this table. This change would be heavily in favor of the nonclustered index seek over the clustered index scan. With the nonclustered index, the query would read fewer than 5,000 pages to return the 4,688 rows. With the clustered index scan, the query would have to read all 60,000+ data pages to return the same rows. In this case, the query optimizer would certainly prefer the nonclustered index.

As you can see, the selectivity of the nonclustered index also helps determine the tipping point. A nonclustered index returning a smaller percentage of the total number of rows in the table reduces the number of RID lookups required and makes it more likely that a nonclustered index seek will be used. As a general rule, very selective nonclustered indexes are used frequently. Nonclustered indexes with poor selectivity are seldom or never used because the clustered index scan is less expensive.

Other Factors Affecting the Tipping Point

Although the cost of RID lookups is the most important factor that affects the tipping point, there are a number of other factors:

  • Physical I/O is much more efficient when scanning a clustered index. Clustered index data is placed sequentially on the disk in index order. Consequently, there's very little lateral head travel on the disk, which improves I/O performance.
  • When the database engine is scanning a clustered index, it knows that there's a high probability that the next few pages on the disk track will still contain data it needs. So, it starts reading ahead in 64KB chunks instead of the normal 8KB pages. This also results in faster I/O.

For these reasons, the optimizer might prefer the clustered index scan, even when the page counts are similar between the clustered and nonclustered indexes.

Other factors such as index fragmentation and stale statistics can also affect the optimizer's decision about the tipping point. Because you don't have the same information that the optimizer does, it's difficult to predict exactly where the tipping point will be for a given table at a given time.

The Optimizer Usually Knows Best

When you work with SQL Server, you'll often see decisions made by the optimizer that make no sense to you. After working with SQL Server for nearly 20 years, this still happens to me. It's tempting to think of these decisions as bugs or mistakes by the database engine. Sometimes that's true, because the optimizer isn't perfect. But in most cases, I've found that the problem was my ignorance of the details of an extremely complex process. I've learned enough to give the query optimizer the benefit of the doubt when I don't understand why it's working the way it is.

The sample query provided a good demonstration of this. At first, it seemed that using the nonclustered index was the best way to execute the query. However, with a little more knowledge, you gained a better understanding of the complexity of the process behind the optimizer's decision.

After working with SQL Server for many years, I've learned that correct logic applied to incomplete knowledge often leads to incorrect conclusions. Beware of applying a simplistic mental model to a complex phenomenon that you don't completely understand. I have to tell myself this almost daily.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.