"You can't always get what you want" aren't just famous lyrics by the Rolling Stones. They're sometimes a mantra when you're writing n-table joins to return specific data—especially when you add search arguments. Furthermore, if your queries still contain old-style join syntax (*=), you might think you're receiving the correct results, when in fact your queries are ambiguous and return inaccurate data. For these reasons, let's look at writing flexible n-table joins with search conditions, using only the ANSI SQL-92 standard.
This T-SQL Tutor column addresses one of the most common questions about joins in general that students ask me: What's the difference between search conditions in the WHERE clause and search conditions in the JOIN clause? Interestingly, the search conditions always mean the same, but the data they return might vary according to their placement within a query. In "n-Table Joins," February 2002 and "n-Table Outer Joins," April 2002, I discussed the benefits of adding the related data of a third, fourth, or nth table to a query so that you can retrieve more detailed information. I also pointed out that when you add additional tables to an outer join, you need to be aware of the ways those extra tables can affect the outer joins. Now, let's look at the effects of adding search arguments to these sometimes complex n-table joins.
FROM and WHERE
When you write a join, you can add search conditions in either of two locations to restrict and limit the rows returned. Typically, when you want only rows that meet a certain condition, you add a WHERE clause, which lets you restrict the entire result set that the query will return. However, when the query also includes a join, you can choose to apply the search condition in the FROM clause as one of the elements that define the join condition, or you can apply it to the result set in the WHERE clause. If you place the search argument in the FROM clause, SQL Server includes the search argument's criteria in the definition of the intersection between the joined tables. If you place the search argument in the WHERE clause, the criteria applies to the entire set of joined data (almost as if the search condition were filtered after the join was processed—regardless of join type). When all joins in one query are inner joins, the search argument's location doesn't affect the query's result set, processing, or performance. In both the FROM and WHERE clauses, the search condition has the same effect because SQL Server returns only data that meets the join condition. However, for outer joins, the search argument's location can significantly affect results because an outer join includes rows that don't meet the join condition. (Remember, an outer join is an inner join with the addition of rows from the outer table that don't meet the join condition.) When you add search conditions to your outer-join queries, their placement dictates whether the conditions are applied as part of the inner join or to the entire result set (i.e. after the outer join). If you don't properly place your search conditions to describe the set of data you're looking for, you could end up with undesired results.
Location, Location, Location
Let's look at an example that demonstrates the benefits of strategically placing search conditions in the FROM and WHERE clauses in n-table inner joins and n-table outer joins. Suppose your company assigns customers to its salespeople alphabetically by last name in one of two groups: A-N or O-Z. And let's say that you, a salesperson, want to know which of your O-Z customers have purchased which products. You can start by using an inner join to return only your O-Z customers who have made a purchase. To fulfill the request, Show me all customers and their orders, then remove the customers who don't belong in my O-Z set, you could add the search condition in the WHERE clause (i.e., after the join), as the code at callout A in Listing 1 shows. Or to find only your customers and their orders, you could add the search condition in the FROM clause (i.e., before the join), as the code at callout A in Listing 2 shows. The result set and internal processing of the inner joins for the queries in Listings 1 and 2 are the same and return the four rows that Table 1 shows.
If you want to request the names of all your customers, whether or not they've made a purchase (in the O-Z range, Peter is the only customer who hasn't bought anything), you'd have to write an outer join. In this case, the location of the search argument could make a significant difference. The query in Listing 3 shows an outer join with the search condition in the WHERE clause. Table 2 shows this query's result set, which changes only slightly from the result set in Table 1 (it now includes customer Peter).
But what happens when you place the search condition in the query's FROM clause instead, as the code in Listing 4 shows? This query returns a larger result set, which Table 3 shows. Of the nine returned rows, the italicized rows are the result set of the inner join (note that Peter Parker isn't italicized), and the bolded rows are the result of the outer join. You might think that the query hasn't applied the search condition—but it has. If you review the information that's supplied for the ProductID, Product Name, and Category Name columns, you'll see a difference between the Product and Category data for customers who are in the O-Z range as opposed to those who aren't. Arthur Curry has made a purchase (as ProductID 6 shows), yet Product Name and Category Name are NULL. They're NULL because they didn't meet the inner- join criteria; they were added because you requested all rows in the intersection (the inner join that is part of an OUTER JOIN query) plus all rows that don't meet the join criteria (the rows that are added from the outer join). When you add this additional search condition to the FROM clause, you're actually making a slightly different request—Show me what all my customers (names from O to Z) purchased (defined by the intersection between customers and products), yet include all customers (even those outside the O-Z range) in the list. Therefore, the query completes the outer join after applying the search condition during the inner join.
Let's compare the data sets that Listings 3 and 4 return. Both queries include the intersection of customers whose last name falls within the O-Z range and their purchased products. For Listing 3, the search argument is in the WHERE clause and therefore applies to the entire set of data; only O-Z rows are returned. For Listing 4, the search argument further defines the join condition, meaning that you're asking to see only the purchases (i.e., the intersection) for customers whose names fall within the O-Z range. But you've also asked for an outer join, which tells SQL Server to include all customers—regardless of whether they've made a purchase or whether their last names fall in the O-Z range. Even if customers whose last names fall outside the O-Z range have made a purchase, you're not interested in seeing that data.
Avoid Using the Old Outer-Join Syntax
Especially for queries containing n-table joins and search conditions, I use the ANSI SQL-92standard syntax. By consistently using this syntax, as I've done throughout my series of columns about joins, you can control the placement of the search argument that I've described in this column. Before the ANSI SQL-92 standard was established, no ANSI standard existed for outer joins, and some database management products didn't support them. Before the release of SQL Server 6.5 (when the ANSI SQL-92 standard for outer joins was added), T-SQL let you write outer joins by strategically placing an asterisk (*) in the join conditions, which were listed in the WHERE clause along with the search conditions. Oracle allowed a similar syntax that used a plus sign (+). This old T-SQL outer-join syntax used an asterisk followed by an equal sign (*=) for a left outer join and an equal sign followed by an asterisk (=*) for a right outer join. (For more information about old right and left outer-join syntax, see "The Outer (Join) Limits," November 2001.) However, this simple T-SQL syntax didn't support full outer joins (i.e., *=* wasn't a valid syntax) and quickly became confusing because both join conditions and search conditions were lumped together in the WHERE clause.
Because I strongly urge you to avoid using the old syntax to write OUTER JOIN queries, I won't cover the old syntax's limitations here. Suffice it to say that in the old syntax, SQL Server always applies the search condition as part of the inner join and before the outer join rows are retrieved. For this reason, you don't have the flexibility or control to ask or return the data you want. As Listing 5 shows, you can use a simplified OUTER JOIN query written in the old T-SQL syntax to return the same customers (i.e., the correct number of rows and data—except for the Product Name and Category Name columns) that Listing 3 returns. However, when you use the old syntax, you can't generate the results that Listing 4 returns.
Tips for Join Success
This column is the last in my series of five tutorials about joins. Before I move on to other topics, I want to leave you with a recap of the ground rules for writing joins:
- Use aliases to help minimize typing and improve readability.
- Qualify all columns by using their table name or table alias name.
- Qualify all tables by using their owner name.
- Use the ANSI SQL-92 syntax.
- Focus on only two tables at a time.
- Review your entity relationship diagram (ERD), then consider and list all possible relationships that define the tables you're joining.
- Use a Venn diagram to anticipate the result set, then use the result set of each two-table join as the left table in the next join. Remember—focus on only two tables at a time.
- Place your search arguments in the WHERE clause if you want the criteria to apply to the entire result set.
- Place your search argument in the JOIN clause if you want the criteria to apply as part of the join (and before the outer join, if an outer join is requested).
- Test, test, test! Verify your results on small sets to make sure the search conditions, (search arguments in the WHERE clause), join criteria (search conditions in the JOIN clause), and join type (inner, outer) are exactly what you're looking for.
If you follow these guidelines, you'll likely get the data you need from joins—without having to try very hard.