Do query plans take account of what data is in memory?

Question: When SQL Server is figuring out the best query plan to use for my query, does it take into account how much of my table is already in memory?

Answer: The simple answer is: No.

When the query optimizer is evaluating various plans as it narrows down the possible choices for the best plan it can come up with in a reasonable time (which is not always the absolute best plan, as the query optimizer cannot spend an inordinate amount of time doing plan compilation), it makes use of the relational metadata.

The relational metadata describes the table columns, indexes, constraints and the statistics about the value distributions of the various columns (if these statistics have been created). The storage metadata describes how the table and indexes are actually stored in the data files, and this is not used by the query optimizer.

SQL Server does not keep track of which portions of a table and its indexes are in memory at any time. The buffer pool tracks which data file pages from a database are in memory, but nothing in SQL Server does any kind of automated aggregation to be able to determine that, say, 50% of index 2 of table X is in memory whereas only 5% of index 3 of table X is in memory.

The query optimizer assumes nothing is in memory and is likely to choose a query plan that involves the least number of I/Os, as I/Os are time consuming and expensive. For instance, given the choice of two nonclustered indexes that could satisfy a SELECT query, where the first index has all the columns required, and the second index has all the columns required and several additional columns, the query optimizer will choose to use the first index because it is narrower (i.e. the index rows are smaller) and so there should be fewer I/Os required to read the necessary data.

However, what if a large proportion of the second, wider index is already in memory, and none of the chosen index is in memory? The query will require I/Os to read the chosen index into memory, which will be a lot slower than using the wider index that is already in memory. In that case, the query plan was actually sub-optimal – but the query optimizer has no way of knowing what’s in memory.

As a former software engineer on the SQL Server team, I can tell you that the engineering complexity of keeping an aggregate view of which tables and indexes are in memory to aid in query plan selection is extremely challenging, and would add an undesirable performance overhead for only an occasional benefit. I don’t think it will ever happen.

To finish off, if you’re interested in seeing what’s in the buffer pool, check out the DMV sys.dm_os_buffer_descriptors and the various queries I’ve put together on the Buffer Pool section of my other blog.

Comments

Plain text