I regularly see code that calculates a count of a set of rows then makes a decision that’s based on the count, but only on whether or not the count is zero. Consider this (slightly contrived) example based on AdventureWorks:
Note the query plan that’s returned for the SELECT:
There is no great surprise here that SQL Server has used the clustered index to locate any rows that match, and a stream aggregate to count them. If you stop and look at the code however, it’s apparent that there was never any need to count the rows. This really is a test for existence. One thing that grates on me when I see it is the counting of rows when the aim is just to find out if any rows exist.
Existence and the Optimizer
SQL Server’s optimizer is a pretty clever piece of code that does a good job of protecting us from ourselves.
For example, if I need to know if there are any Sales.SalesOrderHeader rows in the AdventureWorks database, I don’t need to count the rows in that table to find out.
Let me show you. Let’s count the number of rows in the Sales.SalesorderDetail table and check if there are more than 10. The query looks like this:
And the query plan looks like this:
No big surprises here. The plan used entails counting the number of rows. Note that the object that SQL Server has chosen to count isn’t the table, but an index on the table:
That makes perfect sense as the index has the same number of rows as the table, but reading all of it involves less data than reading the whole table. So SQL Server just counts the number of rows in the index.
By comparison, consider an EXISTS query:
And note the plan:
Obviously, there was no need to count the rows. SQL Server has chosen to just look for any rows in the smallest index on the table that it could find. Again this makes perfect sense.
Now though, look what happens if we change the query that counted the rows, so that it compares the total to zero instead of 10:
Here’s the plan:
SQL Server has now used the same query plan for the COUNT version of the query as it did for the EXISTS version. SQL Server is clever enough to realize that if you are counting the values, just to see if the count is greater than zero, that you are really just doing an EXISTS query.
So does it matter if you write the COUNT version of the query? I believe it does. It’s not from a performance point of view, but from a readability point of view. Compare the two queries:
I think it’s far clearer in the second query that you are just testing for existence. And readability matters.
Early versions of SQL Server did very little query re-writing. I don’t blame the team for that as every time you re-write someone’s query, you have a chance of changing their intent. Early versions of SQL Server would carry around multiple predicates that could have been totally removed.
Let me show you an example of how it’s used in current versions of SQL Server. Consider this version of the query:
Here’s the query plan:
Notice that the Sales.SalesOrderHeader table isn’t mentioned at all in the query plan. Once again, the optimizer has come to the rescue. It has worked out that no rows could ever match both of these predicates at the same time:
- SalesOrderID < 100
- SalesOrderID > 100
SQL Server has decided that it doesn’t need to even consider that table. This is an important concept because many queries really need re-writing. Consider the following astonishing predicate:
I’m sure your immediate reaction is “who would write such nonsense in the first place?” but these are the types of queries that SQL Server has to deal with every day. You might assume that no-one in their right mind would write that logic but this is precisely the type of messy code that is output by query generators and code that is created on the fly by applications.
Having SQL Server try to rewrite queries before executing them is a very, very good thing.
Limits to Cleverness
There’s not always a free lunch though. SQL Server can’t spend too long thinking about how to execute a query. Sometimes it just needs to get on with executing it.
We have the luxury of spending the time to think it through but SQL Server can’t spend all day solving logic puzzles before getting on with executing the query.
The optimizer is aware of lots ways to protect you from yourself but it’s far from perfect. Write code that makes it easy for it whenever you can. More importantly, write code that is easier for others to read and maintain, and that means writing code that clearly shows your intent.
If you intend to just check if something exists, use an EXISTS query.