Missing Indexes Correlated Against Execution Costs and Counts

Here’s a query that every DBA should have in their toolbox.

The short (or seriously over-simplified) version of how this query works is pretty simple. Each time SQL Server goes to create a new execution plan, it takes note of which indexes are available (and suitable) to help it most efficiently execute the query in question. As part of this process of evaluating which indexes exist and are suitable for use in the generation of an execution plan, SQL Server also keeps tabs of columns (or combinations of columns) where no index exists but where, if one did, SQL Server could create a better plan. Or, in other words, as SQL Server is busy creating execution plans, it’s almost like it says: "Hmm. Sure would be NICE if there were an index across this column—or across these 3 columns ordered like so and with this other column included."

But, while SQL Server IS able to take note of these recommendations, it’s not allowed to CREATE indexes by itself (and that’s a VERY good thing). Instead, details about these index recommendations end up getting dropped into the sys.dm_db_missing_index* DMVs. These recommendations are also incorporated into the execution plans themselves. As you might suspect, that means that there are potentially gobs of scripts or options out there for querying and analyzing this information.

A Cost-Based Approach to Missing Indexes

In my case, all I’ve done is correlate information about missing indexes against execution costs and the number of times an execution plan has been run (i.e., execution counts). Grabbing costs can be a tiny bit of a pain to extrapolate because it requires the use of XPATH (meaning that the query below is going to be slow and clunky on production systems—but it’s designed not to block or cause problems). In many ways, this is really just an extension of the technique I outlined previously on how to find your most expensive queries. The difference, though, of course is that this query limits output to potential problem areas that SQL Server has already identified.

-- based off of a combination of:
--	Jason Strate's Excellent Missing Indexes queries to get the Execution Plans:
--		http://www.jasonstrate.com/2010/12/can-you-dig-it-missing-indexes/
--	  And my own technique of grabbing the costs and multiplying them 
--			by the Execution Count to get Aggregate costs:
--		http://sqlmag.com/blog/performance-tip-find-your-most-expensive-queries
--	TODO: need to figure out a way to 
--		a) get a list of indexes by impact, then 
--		b) cross-apply their plans and get costs. 

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO

WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
PlanMissingIndexes AS (
	SELECT 
		query_plan, 
		usecounts 
	FROM 
		sys.dm_exec_cached_plans cp
		CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
	WHERE 
		qp.query_plan.exist('//MissingIndexes') = 1
), 
MissingIndexes AS (
	SELECT
		stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Database)[1]'
			, 'sysname') AS DatabaseName,
		stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Schema)[1]'
			, 'sysname') AS SchemaName,
		stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Table)[1]'
			, 'sysname') AS TableName,
		stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]'
			, 'float') AS Impact,
		ISNULL(CAST(stmt_xml.value('(@StatementSubTreeCost)[1]'
			, 'VARCHAR(128)') as float),0) AS Cost,
		pmi.usecounts UseCounts,
		STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
	FROM 
		stmt_xml.nodes('//ColumnGroup') AS t(cg)
		CROSS APPLY cg.nodes('Column') AS r(c)
	WHERE 
		cg.value('(@Usage)[1]', 'sysname') = 'EQUALITY' 
		FOR  XML PATH('')), 1, 2, '') AS equality_columns
			,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
		FROM stmt_xml.nodes('//ColumnGroup') AS t(cg)
		CROSS APPLY cg.nodes('Column') AS r(c)
		WHERE cg.value('(@Usage)[1]', 'sysname') = 'INEQUALITY'
		FOR  XML PATH('')), 1, 2, '') AS inequality_columns
			,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
		FROM stmt_xml.nodes('//ColumnGroup') AS t(cg)
		CROSS APPLY cg.nodes('Column') AS r(c)
		WHERE cg.value('(@Usage)[1]', 'sysname') = 'INCLUDE'
		FOR  XML PATH('')), 1, 2, '') AS include_columns
		,query_plan
		,stmt_xml.value('(@StatementText)[1]', 'varchar(4000)') AS sql_text
		FROM PlanMissingIndexes pmi
		CROSS APPLY query_plan.nodes('//StmtSimple') AS stmt(stmt_xml)
		WHERE stmt_xml.exist('QueryPlan/MissingIndexes') = 1
)

SELECT TOP 200
	DatabaseName,
	SchemaName,
	TableName,
	equality_columns,
	inequality_columns,
	include_columns,
	usecounts,
	Cost,
	Cost * UseCounts [AggregateCost],
	Impact,
	query_plan
	FROM MissingIndexes
	ORDER BY 
		Cost * usecounts DESC;

A Few Caveats

Of course, with everything in SQL Server, there are a few caveats.

The first and most important caveat is that you should never EVER, EVER, EVER just assume that you can take the output of this query and build a routine that’ll dynamically process the results and start giving SQL Server all of the indexes it thinks it needs. Such a thought is ingenious—but, as my favorite bumper sticker says: "Don’t believe everything you think." Or in other words:

Within SQL Server, there’s absolutely nothing that prevents you from creating multiple, duplicate (as in 100 percent the same—or even SLIGHTLY different but FUNCTIONALLY EQUIVALENT—indexes). As such, your script would have to watch out for this because SOMETIMES SQL Server will recommend an index that already exists (like when you have (Bigger) problems with data coercion).
SQL Server’s also a bit too, well, dumb, to figure out that some of the indexes it recommends could EASILY be made or created if you simply made a MINOR modification to an existing index, i.e., sometimes all you need to do to make an existing index do double duty and work for more than one important query or operation is simply plunk in a new column on the right-most side of the index or just drop in a column (or two) in the INCLUDE clause.
Dynamically adding every single index that SQL Server THINKS it needs is like taking crack. At first, it’s awesome (or so I gather) but, before long, you’re mugging grandmas and stealing from friends to pay for your habit—and that’s exactly what’ll happen if you start giving SQL Server every index it THINKS it needs. You’ll end up increasing the cost or overhead associated with updating your data (i.e., INSERTs, UPDATEs, and DELETEs), to the point where you’ll start seeing slow-downs and likely start running into issues with locking and blocking problems that will trash concurrency on your systems. That, and you can actually end up giving the optimizer so MANY choices to pick from when creating execution plans that recompiles and adhoc queries can potentially start incurring serious overhead (i.e., to the point where compilation can take upwards of 5 seconds or so for what would otherwise be fairly simple queries).

Finally, an obvious limitation of the query above—or the approach it uses—is that it only works against queries that are in the cache. If you’ve got queries that are somehow called at high rates and are missing indexes, but are also being regularly evicted from the cache (i.e., recompiles, stats changes, etc.), then you can MISS those missing indexes with the query above. So, use the query above as a tool (frankly, a powerful one), but don’t assume that it’s the ONLY tool you can or should use.

Comments

Plain text