Performance Tip: Find Your Most Expensive Queries

performance checklist One thing I’m asked on a regular basis when working with SQL Server Consulting clients that don’t have much experience with performance tuning, is how to find what their most expensive queries are. Only, the problem is that the answer to that question can be a bit more complex than you might think. Happily though, I’ve got a pretty decent answer to that question. But first, some background.

SQL Server Uses a Cost-Based Optimizer

While I don’t have time to cover it in any detail here, one of the things that makes SQL Server special is that it boasts a cost-based optimizer. Meaning, that if you feed it a complex query, it’ll iterate over a number of different potential ways to execute your query against the storage engine in order to find a plan with the least expensive cost. And, to figure out cost, that means that SQL Server ends up having to know a decent amount about your data – such as how many rows are in each table queried along with the distribution of unique values within those rows (or columns being filtered against) as well as an understanding of how likely it is that ‘hits’ will be found for the joins or filters being specified. (Or in other words, if you fire off a SELECT * FROM dbo.SomeTable WHERE blah = ‘yadayada’; statement, SQL Server has (or will have) a pretty good idea of not only how many rows are in dbo.SomeTable, but it’ll also have a rough idea of how many of them have a blah column with a value equal to ‘yadayada’. For more information, I highly recommend taking a peek at this fantastic MCM presentation on Statistics from Kimberly L. Tripp of SQLSkills.)

Long story short though, as powerful and amazing as SQL Server’s cost-based optimizer is (and, make no mistake, it’s part of SQL Server’s ‘secret recipe’), one of the great things about SQL Server is that we can actually view the costs associated with particular operations – simply by either highlighting the query in question within SQL Server Management Studio and pressing CTRL+L to have SQL Server either go and generate or fetch (from the cache) and then “Display [the] Estimated Execution Plan” for any query or operation, or by executing the query with the “Include Actual Execution Plan” option toggled – as shown in the following screen capture:

A Very Rough Overview of Costs

Then, once you’re able to view an execution plan, one of the great things about it is that you’re able to see the cost of not only the entire execution plan, but each individual operation that makes up the plan – simply by mousing-over it – as shown below:

And, again, a key thing to call out here is that these costs (estimated or otherwise) are based on SQL Server’s knowledge of the size of your tables as well as the cardinality and distribution of your data. Or, in other words, these costs are based upon statistics about your data. They’re not, therefore, something ‘tangible’ like the number of milliseconds associated with an operation. As such, the best way to think of them is that lower numbers are better – unless you want to try and get into some of the nitty-gritty details about how these costs are calculated (which, again, is proprietary information or part of SQL Server’s ‘secret sauce’).

With that said, there’s still a way to ‘frame’ these costs – to provide an idea of what costs roughly mean in the ‘real’ world.

.003. Costs of .003 are about as optimized as you’re going to get when interacting with the storage engine (executing some functions or operations can/will come in at cheaper costs, but I’m talking here about full-blown data-retrieval operations).
.03. Obviously, costs of .03 are a full order of magnitude greater than something with a cost of .003 – but even these queries are typically going to be VERY efficient and quick – executing in less than a second in the vast majority of cases.
1. Queries with a cost of 1 aren’t exactly ugly or painfull (necessarily) and will typically take a second or less to execute. They’re not burning up lots of resources, but they’re also typically not as optimized as they could be (or they are optimized – but they’re pulling back huge amounts of data or filtering against very large tables).
5. Queries with a cost greater than 5, by default, will be executed with a parallel plan – meaning that SQL Server sees these queries as being large enough to throw multiple processors/cores/theads-of-execution at – in order to speed up execution. And, if you’ve got a web site that’s firing off a query with a cost of 5 or more per every page load, for example, you’ll probably notice that the page ‘feels’ a bit sluggish loading – maybe by a second or two – as compared to a page that would ‘spring up’ if it was running a query with a cost of, say, .2 or lower. So, in other words, queries up in this range start having a noticeable or appreciable ‘cost’.
20. Queries in this range are TYPICALLY going to be something you can notice taking a second or so. (Though, on decent hardware, they can still end up being instantaneous as well – so even at this point, things still depend on a lot of factors).
200. Queries with this kind of cost should really only be for larger reports and infrequently executed operations. Or, they might be serious candidates for the use of additional tuning and tweaking (in terms of code and/or indexes).
1000. Queries up in this range are what DBAs start to lovingly call ‘queries from hell’ – though it’s possible to bump into queries with costs in the 10s of thousands or even more – depending upon the operations being executed and the amount of data being poured over.

And, in case it’s not obvious from some of the descriptions above, the ‘thresholds’ I’ve outlined above REALLY need to be taken with a grain of salt – meaning that they’re just rough approximations to try and give these numbers a bit of context (for those that aren’t very experienced with performance tuning in general).

The True Cost of Operations

However, while taking a single query and comparing its cost in isolation is a great way to tune that operation to get better performance out of it (i.e., by adding/tuning indexes and/or tuning the code to make it better your GOAL is to decrease costs – since LOWER costs are BETTER costs), it isn’t a viable way to know what your most expensive query on a given server or within a given database is.

For example, which query is truly an ‘uglier’ query from a performance standpoint? That big/horrible/ugly report that management likes to run once a day at 7PM with a total cost of 1820.5? Or a single operation with a cost of .23 that gets called over 800,000 times in the same day? Typically a query with a cost of .23 won’t really be scrutinized that much – because it’s ‘optimized enough’. But if it’s called at highly repetitive rates, then that cost is incurred over and over and over again – typically during periods of peak load on your server. And, in fact, if you multiply .23 * 800K, you end up with a total, aggregate, cost of 184,000 – something that makes that ‘nightly’ query look like child’s play.

As such, finding your most expensive queries is really a question of juggling execution costs against execution counts – because it’s only when you consider both concerns that you start getting a sense for the TRUE costs of your most expensive operations.

Querying SQL Server For Your Most Expensive Queries

Accordingly, a while back I wrote a query that takes advantage of a couple things to be able to go in and actually ‘ask’ SQL Server for a list of Top Worst Performing queries. To do this, I took advantage of the fact that SQL Server’s query optimizer KEEPS execution plans in the cache once it generates a good plan for a query or operation that’s been sent in to the server. I also took advantage of the fact that SQL Server keeps track of how many TIMES that execution plan gets used (or re-used) by subsequent executions – as a means of determining execution counts. Then, I also took advantage of the fact that SQL Server exposes these execution plans to DBAs as XML documents that can then be ‘parsed’ and reviewed – to the point where you can actually use XPATH to ‘interrogate’ an execution plan an 'extract’ the actual cost of a given operation.

And with that, I was able to come up with a query that will scour SQL Server’s plan cache, grab execution costs from each plan in the cache, and then multiply that number against the total number of times that the plan has been executed – to generate a value that I call a ‘Gross Cost’ or the ‘total’ cost of each operation being fired over and over again on the server. And with that information, it’s then possible to easily rank operations by their ‘true’ cost (as in execution cost * number of executions) to find some of your worst queries on your server.

The code itself isn’t too hard to follow – and it’s patterned in many ways on some similar-ish queries that Jonathan Kehayias has made available where he too uses XPath/XQUERY to zip in and aggressively query full-blown execution plans within the plan cache:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
core AS (
	SELECT
		eqp.query_plan AS [QueryPlan],
		ecp.plan_handle [PlanHandle],
		q.[Text] AS [Statement],
		n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS OptimizationLevel ,
		ISNULL(CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float),0) AS SubTreeCost ,
		ecp.usecounts [UseCounts],
		ecp.size_in_bytes [SizeInBytes]
	FROM
		sys.dm_exec_cached_plans AS ecp
		CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS eqp
		CROSS APPLY sys.dm_exec_sql_text(ecp.plan_handle) AS q
		CROSS APPLY query_plan.nodes ('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn ( n )
)

SELECT TOP 100
	QueryPlan,
	PlanHandle,
	[Statement],
	OptimizationLevel,
	SubTreeCost,
	UseCounts,
	SubTreeCost * UseCounts [GrossCost],
	SizeInBytes
FROM
	core
ORDER BY
	GrossCost DESC
	--SubTreeCost DESC

Limitations of this Approach

Of course, there ARE some limitations with the query I’ve pasted above.

First, it’s an AGGRESSIVE query – typically weighing in with one of the WORST costs on many servers (i.e., it commonly shows up as a top 10 offender on servers that haven’t been running for very long or which don’t have lots and lots of really huge performance problems.) And that, in turn, is because while SQL Server can do XML operations, they typically end up being VERY expensive. And, in this case, this query performs expensive XQUERY iterations over each and every plan in the cache – meaning that it typically takes a LONG time to run.

However, even despite how LONG this query will typically take to run (remember that it has to grab the ‘raw’ cost of every plan in the cache before it can calculate a gross-cost based on total number of executions – meaning that there’s no way to 'filter out’ any particular plans), it won’t block or cause problems while it runs.

Another limitation of this approach is that it can only calculate gross-costs against ‘accurate’ execution counts – meaning that if you have expensive queries (or lots of tiny little queries called over and over and over again) that get kicked out of the cache, then the execution counts aren’t going to be as high as they really/truly are, and you’ll therefore suffer from lower ‘gross costs’ and may, therefore, miss some of your worst performing queries.

But otherwise, the query listed above provides a fantastic way to quickly and easily be able to go out and query a SQL Server to get a list of your worst performing queries. Then, once you have them, you’re free to analyze the execution plans in question (by simply clicking on the QueryPlan column in the results pane – which should kick you out to a Graphical ShowPlan; if it doesn’t I’d recommend this post by Aaron Bertrand for a fix for a stupid bug that Microsoft refuses to address), and then, armed with information about how FREQUENTLY a particular operation is being called, you can then spend whatever energy and effort is necessary to try and tune that operation as needed in order to try and drive its total, overall (or gross) cost down.

Comments

Plain text