Q: When writing a join, I always wonder if I can improve performance by moving the search argument into the JOIN clause instead of placing it in the WHERE clause. Listing 1 shows the search argument in the WHERE clause, and Listing 2 shows it in a JOIN clause. What effect would this move have, and is it beneficial?
A: With a WHERE clause, search arguments are applied to all the rows in the result set. In an INNER JOIN, the position of the search argument has no effect on the result set because the INNER JOIN requires all rows to have a match in all joined tables, and all rows have to meet the conditions of the WHERE clause argument. That’s an important point to make because if this query were an OUTER JOIN, the result set returned would be different based on where the search argument is placed. I’ll explain more about this later.
The position of the argument shouldn’t change the performance of the query plan. In an INNER JOIN, SQL Server will recognize the search argument and apply it efficiently based on the plan it chooses. It’s likely that SQL Server will filter the data using the argument before the join, because doing so usually reduces the overall cost of the join. However, there are other factors, such as the indexes that exist, that could influence the query plan and result in different behavior, but again, these aren’t affected by the position of the argument.
As I mentioned above, OUTER JOINs are different—the position of the search argument really matters. When executing an OUTER JOIN, a search argument in the FROM clause will define the result set of the INNER JOIN before the OUTER JOIN results are added. If we wanted to see sales order information, including StateProvinceID, for all US customers but list all customers regardless of StateProvinceID, we could ask for an OUTER JOIN to describe customers and sales (which also requires joins to Person and PersonAddress to expose the StateProvinceID). If we use the base queries you mentioned in your question to do this, we’ll end up with the queries shown in Listing 3 and Listing 4. When executed, you’ll find that they have different query plans and drastically different results. Listing 3 finds all matching rows for customers and only outputs a row if the customer is a US customer. Using the AdventureWorks2008 database, there are a total of 12,041 customers where the CountryRegionCode is US.
Listing 4 uses the CountryRegionCode as the predicate for determining which rows should return the data for the columns in the select list, including sp.StateProvinceID. However, it still produces rows for customers that aren’t in the United States. This query returns 32,166 rows.
Your question really isn’t performance related. SQL Server can optimize your query regardless of the position of a search argument. However, I strongly suggest that all search arguments stay in the WHERE clause so that you never get caught with incorrect data if you decide to change a complex query from an INNER JOIN to an OUTER JOIN.