n-Table Joins

Although n-table joins—joins between more than two tables—might seem more complex than two-table joins, n-table joins are just a series of two-table joins. Using Venn diagrams and database diagramming, you can predict and write complex n-table joins with greater ease and understanding.

n-Table Inner Joins

Before you write an n-table inner join, you must first analyze the tables that have the data you need. You only have to qualify the ambiguous columns, but if you qualify all the columns, you make more complex code easier to read.

To know what to output, you need a query and a question to answer about the data. Using the TSQLTutorJoins database that I provided as a download in "Joins 101," October 2001, ask the question, What categories of products have you sold to which customers?

After you analyze the tables, you can write the FROM clause; I recommend stepping through it slowly. Create a database diagram of the tables you're interested in, and let Enterprise Manager's Database Diagramming Wizard create a diagram of all related tables. If you use the wizard to add related tables, make sure you select the Add related tables automatically check box before you choose any tables. The tables you select, along with any related tables, will show up in the Tables to add to diagram box. You add three tables to the Tables to add to diagram box: Category, Order, and Product. You add the Customer table only to the Tables to add to diagram box because the Customer table isn't related to any tables other than those already listed. Figure 1 shows the Database Diagram as it looks before you make any of the modifications I discuss in this column.

After the wizard has diagrammed the tables you're interested in (and their related tables), you can see the correct number of tables to list—but not how to order them—in the FROM clause. In an inner join, the order in which you list the tables doesn't affect performance. However, when you're writing an n-table join, each table listed—except for the first table—must join to a table already listed in the FROM clause.

Let's look at a join between the Order, Product, and Category tables. If you joined these three tables together in an inner join, you couldn't list the Order table immediately followed by the Category table because these two tables have nothing relevant in common (they don't have a JOIN condition). However, you could list the Order, Product, and Category tables in one of the four ways that Figure 2 shows.

The Database Diagram shows that four tables in the TSQLTutorJoins database are relevant to the question, What categories of products have you sold to which customers? You can write the join in your FROM clause in many possible ways; however, for simplicity, I've listed only the two most logical (based on the flow of the relationships in the Database Diagram), which Figure 3 shows.

Using the four tables that Figure 3 shows, let's create a simple FROM clause without JOIN conditions:

FROM dbo.Customer AS Cust
  inner join dbo.\[Order\] AS O
  inner join dbo.Product AS P
  inner join dbo.Category AS Cat

Next, you need to thoroughly review the potential JOIN conditions among the tables, then add the applicable JOIN conditions to the FROM clause, which Listing 1 shows. If you're not sure what these JOIN conditions are, use the Database Diagram to determine the proper conditions. Using the Database Diagram in Figure 1, you right-click any relationship line between two tables to bring up a list of three options: Delete Relationship from Database, Show Relationship Labels, and Properties. Select Show Relationship Labels first to put the relationship names above each relationship line. Next, right-click the relationship line between the Order and Product tables, then select Properties. Your screen should look like the one that Figure 4 shows.

To determine the JOIN conditions, review the information displayed in the Primary key table and the Foreign key table sections on the Relationships tab of the Properties dialog box. The Primary key table is Product, and the Foreign key table is Order. The columns that define this primary keyforeign key relationship are ProductID in the Product table and ProductID in the Order table. So, your JOIN condition is Product.ProductID = Order.ProductID (alias names replace the table names in the actual query).

In this particular case, your tables easily join because a simple path—only one path—exists from end to end. In other words, reviewing the diagram and starting with one end (Customer or Category), each table only goes to one table. From the Customer table, you go to the Order table; from the Order table, you go to the Product table; and from the Product table, you go to the Category table (following the relationship lines). Progressively, each table joins to only one other table. However, some databases might have multiple paths and, therefore, multiple JOIN conditions. For example, when the right conditions exist, a common database design strategy that can improve performance and minimize the number of tables that the JOIN condition requires is to add redundant foreign keys.

Redundant foreign keys are columns that you add to a table to describe a relationship that might be numerous tables away. In the TSQLTutorJoins database, you could add a redundant foreign key to the Order table to describe the relationship between an order and the category of the product ordered. Remember, in this database, orders are simple orders—only one product is sold on an order—so, each order relates to only one category.

In the database diagram that Figure 1 shows, the relationship line between the Order table and the Product table goes from an infinity (ƒ) symbol to a key, which represents a many-to-one (M:1) relationship. Likewise, the relationship line between the Product table and the Category table also goes from an infinity symbol to a key. These two relationships are both M:1 relationships; however, you can also think of them directionally. In both cases, using the direction of Order to Product and Product to Category, you're going M:1. In this case, you can also add another relationship—Order to Category. It's almost as if transitivity applies here. (If a implies b and b implies c, then when a implies c, the relationship is said to be transitive.) If an Order has only one Product and a Product has only one Category, then an Order has only one Category. To demonstrate, let's add this relationship to the TSQLTutorJoins database. Use the code shown in Web Listing 1 to add a foreign key relationship between the Order and Category tables. The code adds the CategoryID column to the Order table, changing the structure and maintenance of the Order table and raising the following three questions:

How do you set and verify the value for Order.CategoryID when you insert or update orders?
How does adding the CategoryID column to the Order table change the database diagram?
How does adding the CategoryID column to the Order table change your JOIN condition?

The answer to Question 1 is very important. The foreign key redundancy adds a level of complexity to the database design and adds developer and user concerns. Nevertheless, the performance gains in a predominantly READ environment (i.e., SELECT, not UPDATE) can significantly offset the negatives. One possible option to ensure accuracy in the Order.CategoryID column is to allow NULL values on INSERT, then use a trigger to find the product's correct category. In effect, the inserts of supplied data for the Order table won't change, but the trigger has to update the new CategoryID column to ensure accuracy. The foreign key adds data integrity because all orders must have a valid CategoryID (from the Category table); however, the foreign key won't guarantee that the CategoryID is correct for that specific product in that order.

To guarantee that each order has the correct CategoryID for the product ordered, you must use another form of data integrity, such as a stored procedure or a trigger. The code for Web Listing 1 also provides this trigger with some comments; however, the ideas and code behind triggers are outside the scope of this column. If you're unfamiliar with triggers, take some time to review Web Listing 1. However, in this article, I focus more on Questions 2 and 3.

The biggest difference between the database diagram in Figure 5 and the one in Figure 1 is that in Figure 5, the Category, Order, and Product tables relate to each other in multiple ways. In an n-table join, the maximum number of possible combinations is n! (n factorial). For example, in a three-table join, you could potentially join the tables in six (3 * 2 * 1) combinations. However, when no data is required from the Product table (i.e., only data related to orders and their categories is needed), you can eliminate the Product table from the join. Eliminating a table from a join sometimes improves the join's performance. For the n-table inner join, your query has three options: leave the query as is, add the new JOIN condition, or remove the Product table from the join (if the query doesn't require any data from the Product table).

If you need data from the Product table or you continue to list the Product table in your query, you must choose one of the other options: leave the query as is or add another JOIN condition. By adding the JOIN condition, you give the SQL Server optimizer as many options as possible for internally processing the join. The FROM clause for a query that needs to return information from all three tables might look like the FROM clause that Listing 2 shows.

Finally, you can answer the question, What categories of products have you sold to which customers? For this query, the output should contain the name of the category and the first and last names of the customer. Because the query requires no data from the Product table, you have two options: you can include the Product table or leave it out of the JOIN. Listing 3 shows both queries.

So to accomplish an n-table join, always think in terms of two tables at a time. Start by determining which tables you need data from, then determining the possible paths between the tables by using a database diagram. Use your knowledge of the database and the database diagram to list all the relevant tables. Strive for the shortest path (i.e., when possible, remove a table if no data from that table is required) to minimize the number of tables in your query. And make sure you include all possible JOIN conditions to give SQL Server more options for internally processing the join. These strategies should make n-table joins easier to tackle and faster to execute. To practice n-table joins, see "T-SQL Tutor Project."

Comments

Plain text