DATA TALES #10: It’s a Matter of Existence
I regularly see code that calculates a count of a set of rows then makes a decision that’s based on the count, but only on whether or not the count is zero. Consider this (slightly contrived) example based on AdventureWorks: Note the query plan that’s returned for the SELECT:
August 24, 2016
I regularly see code that calculates a count of a set of rows then makes a decision that’s based on the count, but only on whether or not the count is zero. Consider this (slightly contrived) example based on AdventureWorks:
Note the query plan that’s returned for the SELECT:
There is no great surprise here that SQL Server has used the clustered index to locate any rows that match, and a stream aggregate to count them. If you stop and look at the code however, it’s apparent that there was never any need to count the rows. This really is a test for existence. One thing that grates on me when I see it is the counting of rows when the aim is just to find out if any rows exist.
Existence and the Optimizer
SQL Server’s optimizer is a pretty clever piece of code that does a good job of protecting us from ourselves.
For example, if I need to know if there are any Sales.SalesOrderHeader rows in the AdventureWorks database, I don’t need to count the rows in that table to find out.
Let me show you. Let’s count the number of rows in the Sales.SalesorderDetail table and check if there are more than 10. The query looks like this:
And the query plan looks like this:
No big surprises here. The plan used entails counting the number of rows. Note that the object that SQL Server has chosen to count isn’t the table, but an index on the table:
That makes perfect sense as the index has the same number of rows as the table, but reading all of it involves less data than reading the whole table. So SQL Server just counts the number of rows in the index.
By comparison, consider an EXISTS query:
And note the plan:
Obviously, there was no need to count the rows. SQL Server has chosen to just look for any rows in the smallest index on the table that it could find. Again this makes perfect sense.
Now though, look what happens if we change the query that counted the rows, so that it compares the total to zero instead of 10:
Here’s the plan:
SQL Server has now used the same query plan for the COUNT version of the query as it did for the EXISTS version. SQL Server is clever enough to realize that if you are counting the values, just to see if the count is greater than zero, that you are really just doing an EXISTS query.
So does it matter if you write the COUNT version of the query? I believe it does. It’s not from a performance point of view, but from a readability point of view. Compare the two queries:
I think it’s far clearer in the second query that you are just testing for existence. And readability matters.