One reason why DBAs often use relational (i.e., related) databases is that they avoid data duplication and hence save disk space. In relational databases, you use identifiers for items when possible. That way, you need to record the items’ textual descriptions in only one table. When you need to refer to an item in another table, you simply specify that item’s identifier. For example, suppose you have a database of movie data. In the database, you have a table named Genre that lists genre textual descriptions (e.g., Comedy, Documentary) and their identifiers (e.g., 1, 2). You now want to include a table similar to the MovieReview table in Lesson 4. The MovieReview table includes a column named Genre, which contains genre textual descriptions. Rather than list the genre textual descriptions, you can use the genre identifiers provided in the Genres table. When you need to display data from the MovieReview table with the genre textual descriptions in the Genre table, you can use a JOIN clause to join the tables.
JOIN clauses, which were first incorporated into SQL Server 7.0, let you build queries that retrieve data from two related tables. Prior to SQL Server 7.0, you had to join tables using a technique that consisted of listing the tables in a FROM clause and joining them in a WHERE clause that used special notations to represent different types of joins. SQL Server 2005 still lets you use the old join technique. However, future versions of SQL Server might not support it. Thus, you should learn how to use the JOIN clauses, no matter whether you’re first learning how to join tables or have joined tables for years using the old technique.
There are five types of JOIN clauses available in T-SQL:
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
- CROSS JOIN
At first glance, these names might seem confusing. To help you understand the differences and which type of join is suitable for which type of queries, I’ll walk you each type of join. But first, you need to create and populate two tables so that you can follow along with the examples.
The examples in this lesson rely on objects in the MyDB database, which was created in Lesson 2. If you didn’t previously create the MyDB database, you’ll need to do so. Go to “T-SQL 101, Lesson 2” to get the code and instructions.
You also need to create two new tables: Movie and Genre. You can create and populate these tables by following these steps:
1. Download the CodeToCreateMovie&-GenreTables. sql and CodeToPopulate- Movie&GenreTables.sql files. Click the "Download the Code" hotlink at the top of the article.
2. Create the Movie and Genre tables. Open SQL Server 2005’s SQL Server Management Studio (SSMS) or SQL Server 2000’s Query Analyzer and copy the code in CodeToCreateMovie&GenreTables.sql into the query window. Execute the code.
3. Populate the Movie and Genre tables by executing the code in CodeToPopulateMovie&GenreTables .sql in SSMS or Query Analyzer.
The INNER JOIN Clause
An inner join is the most common type of join between two tables. A column or columns in the first table are compared to a column or columns in the second table using a comparison operator such as the equals (=) operator. Whenever a match is found, the rows from the first table are connected to the rows in the second table. Unmatched rows are discarded. The structure you end up with after the join is almost a new table unto itself. It will contain the selected columns from both tables. For example, if you have Table1, which has ColumnA and ColumnB, and Table2, which has ColumnC, an inner join will produce a structure with ColumnA, ColumnB, and ColumnC. Whether the individual rows contain data depends on the result of the join.
Let’s explore inner joins further with the Genre and Movie tables. To begin, execute the code
SELECT * FROM Genre SELECT * FROM Movie
As Figure 1 shows, both tables have a GenreID column, which makes them an excellent join candidate. Each GenreID in the Movie table points to the appropriate genre record in the Genre table. (Note that there is one GenreID—10—in the Movie table that doesn’t have a corresponding record in the Genre table. This is intentional and will be used for demonstration purposes in “The OUTER JOIN Clauses” section.)
To create a list of movies sorted by their genres, you can use the INNER JOIN clause to join the Movie table to the Genre table in code such as
SELECT g.Genre, m.Movie FROM Movie m INNER JOIN Genre g ON g.GenreID = m.GenreID ORDER BY g.Genre, m.Movie
Let’s walk through this code line by line. The first line selects the columns you want to see in the result set, which in this case are the Genre column from the Genre table and the Movie column from the Movie table. Notice that the column names are prefixed with the letters g and m. These are table aliases. The alias for each table is specified in the FROM clause immediately following the table name and usually consists of a shortened version of the table name. In this case, m is the alias for Movie and g is the alias for Genre.
Aliases are used in this query because both tables have columns named GenreID. Alternatively, you could use the TableName.ColumnName naming convention (e.g., Movie.GenreID, Genre.GenreID) when specifying the columns, but I find this approach a bit long winded, especially when referencing tables with relatively long names. It really comes down to personal preference, so feel free to use either method.
For inner joins (and outer joins), tables participating in the join are listed in the FROM clause. In this case, the Genre table and the Movie table are being joined. In simple inner-join queries, the order in which you list the tables doesn’t matter. For example, the code
SELECT g.Genre, m.Movie FROM Genre g INNER JOIN Movie m ON g.GenreID = m.GenreID ORDER BY g.Genre, m.Movie
would provide the same results. That’s not the case in some other types of joins, as you’ll see shortly.
The ON keyword is used to indicate which columns should be matched together. In this case, we’re matching the GenreID columns from both tables. Other column comparisons can be added after the ON keyword by using the AND keyword.
Finally, the ORDER BY clause is used to sort the results alphabetically first by genre, then by movie. Figure 2 shows the results.
Now suppose you want the result set to show only those movies in the Sci-Fi genre, as Figure 3 shows.
This can be accomplished by using a WHERE clause to add a filter:
SELECT g.Genre, m.Movie FROM Movie m INNER JOIN Genre g ON g.GenreID = m.GenreID WHERE g.Genre = 'Sci-Fi' ORDER BY m.Movie
As you can see, the INNER JOIN syntax clearly separates the joining criterion from the filtering criterion. In all five types of joins, the WHERE clause filters data from the result set after the join has already taken place. This is a key difference between the JOIN clauses used in SQL Server 7.0 and later and the old join technique used in SQL Server 6.5 and earlier. In the old join technique, both the join and filter criteria are in the WHERE clause.
The OUTER JOIN Clauses
There are three types of outer joins: left outer join, right outer join, and full outer join. When considering outer joins, it helps to think of two tables arranged side by side. The table on the left is the first table in the join. The table on the right is the second table in the join. Left outer joins return all rows from the first table and only rows from the second table that meet the join criteria. Columns from the second table that don’t match the criteria will contain NULL values. Right outer joins return all rows from the second table and only rows from the first table that meet the join criteria. Columns in the first table that don’t match the criteria contain NULL values. Full outer joins return all rows from both tables. Whenever a row in one table has no match in the other table, the columns will contain NULL values.
Let’s perform a left outer join on the Genre and Movie tables to see whether there are any genres that don’t have a matching movie. In the following query, Genre is the left table and Movie is the right table:
SELECT g.GenreID AS 'Genre.GenreID', g.Genre, m.MovieID, m.GenreID AS 'Movie.GenreID', m.Movie FROM Genre g LEFT OUTER JOIN Movie m ON m.GenreID = g.GenreID ORDER BY g.Genre, m.Movie
As this code shows, the OUTER JOIN syntax is similar to the INNER JOIN syntax, except that it matters which table gets listed first in the FROM clause. The Genre table must be listed first because it’s the left table.
Figure 4 shows the results of this left outer join. As you can see, the columns from the Genre table for GenreID 7 contain actual data, whereas the columns from the Movie table contain NULL values. Thus, there is a GenreID 7 in the Genre table but not in the Movie table.
Now let’s perform a right outer join on the Genre and Movie tables to see whether there are any movies that don’t have a matching genre. In this query, Genre is still the left table and Movie is still the right table:
SELECT g.GenreID AS 'Genre.GenreID', g.Genre, m.MovieID, m.GenreID AS 'Movie.GenreID', m.Movie FROM Genre g RIGHT OUTER JOIN Movie m ON m.GenreID = g.GenreID ORDER BY g.Genre, m.Movie
As Figure 5, shows, in the first row, the Genre table’s columns contain NULL values, whereas the Movie table’s columns contain data. So, there is a GenreID 10 in the Movie table but not in the Genre table.
Finally, let’s perform a full outer join on these tables to identify records in each table that have no corresponding record in the other table. When performing a full outer join, the concept of left or right table doesn’t apply, so the table order doesn’t matter. The query
SELECT g.GenreID AS 'Genre.GenreID', g.Genre, m.MovieID, m.GenreID AS 'Movie.GenreID', m.Movie FROM Genre g FULL OUTER JOIN Movie m ON m.GenreID = g.GenreIDORDER BY g.Genre, m.Movie
produces the results in Figure 6. If you look at the results closely, you’ll see all the movies with matching genres, the genres with no matching movies, and the movies with no matching genres. In essence, the full outer join gives you the results of an inner join, left outer join, and right outer join in one handy package.
Full outer joins are especially useful for tracking down data integrity problems. For example, from the results in Figure 6, you can see that “To Kill a Mocking Nerd” in the Movie table has an invalid GenreID. If you change this movie’s Genre- ID from 10 to 7 in the Movie table and rerun the full outer join query, this data anomoly disappears.
The CROSS JOIN Clause
A cross join produces a Cartesian product of two tables, which means that every row in the first table is joined to every row in the second table. You can read more about Cartesian products (and the mathematics behind them) at en.wikipedia.org/wiki/Cartesian_product.
Because of the exponential nature of the results, the cross join is commonly used to generate large volumes of test data from relatively small tables. For example, the Genre table has only 7 rows and the Movie table has only 13 rows, but an unfiltered cross join would produce a result set containing 91 rows.
The cross join is also commonly used because it’s easy to implement. You don’t need to include the ON keyword in cross joins, and the order in which the tables are listed doesn’t matter. For example, let’s perform a cross join of the Movie and Genre tables. To limit the result set, let’s include a WHERE clause so that only comedies and thrillers are included. The query would look like
SELECT g.Genre, m.Movie FROM Movie m CROSS JOIN Genre g WHERE Genre = 'Comedy' OR Genre = 'Thriller' ORDER BY Genre, Movie
As Figure 7 shows, every Genre record is connected to every Movie record.
JOIN In on the Fun
As you can see, it’s easy to use the INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN clauses. Now that you know how to perform inner joins, outer joins, and cross joins, I hope that you’ll use your newfound join skills to create some sample reports. Just try to have fun and keep the cross joins to a minimum, especially when dealing with tables with thousands of rows or more. (Your DBA will appreciate it.)