Sharpen Your Skills: Joins, Groupings, and Data Types


Executive Summary:

Many Structured Query Language (SQL) functions seem similar at first glance but really aren’t. Learn the difference between SQL functions such as UNION and UNION ALL; INNER JOIN and OUTER JOIN; DISTINCT and GROUP BY; NVARCHAR, VARCHAR, and CHAR; and timestamp and GETDATE().

I frequently receive basic but interesting SQL questions on my Web site. It’s heartening to see so many junior DBAs and developers eager to thoroughly understand the basics. In addition to helping new technologists learn the fundamentals on my Web site, I started this series to give monthly guidance in SQL Server Magazine. I recommend SQL Server Books Online ( .aspx) if you want to check the accuracy of definitions and example syntax. I look forward to receiving questions from readers. I’ll plan future columns based on your concerns, so keep those emails coming!

Q: What is the difference between UNION and UNION ALL?

A: Both UNION and UNION ALL combine result sets of two SELECT statements. UNION performs a SELECT DISTINCT operation on the final result set, whereas UNION ALL combines both result sets without removing any duplicate records.

For example, suppose Table 1’s result set is Red, White,Blue,Orange, and Table 2’s result set is Red,Blue, Purple,Brown. Running UNION on both result sets gives you Red,White,Blue,Orange,Purple,Brown (duplicate values removed); running UNION ALL gives you Red,Red,White,Blue,Blue,Orange, Purple,Brown (duplicate values not removed). UNION ALL gives faster results than UNION does. Therefore, use UNION ALL if you know that all the records that will be returned are unique.

Q: What is the difference between INNER JOIN and OUTER JOIN?

A: INNER JOIN returns all rows from both tables where there’s a match on a common field. (Typically, this is the primary key in the first table and a foreign key in the second table.) OUTER JOIN returns all the records from the first table and, from the second table, only those records in which there’s a match on the common key field.

OUTER JOIN is further divided into three types of joins: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. LEFT OUTER JOIN and RIGHT OUTER JOIN are also known as LEFT JOIN and RIGHT JOIN. FULL OUTER JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. LEFT OUTER JOIN returns all rows from the first table whether or not there’s a match in the second table. For those rows in the first table that have a match in the second table, the join is completed normally. For those rows in the first table that don’t have a match in the second table, the values in the result set are padded out with NULL. RIGHT OUTER JOIN works similarly, except all rows from the second table are returned whether or not there’s a match in the first table, and all missing values are padded out with NULL. FULL OUTER JOIN returns all rows from both tables, padding out missing values with NULL where no match occurs.

Q: What is the difference between DISTINCT and GROUP BY?

A: DISTINCT and GROUP BY clauses typically generate the same execution plan. However, if a subquery is used, the plans will differ. Use a GROUP BY clause to apply aggregate operators to each column listed in the GROUP BY clause. Use a DISTINCT clause in the SELECT statement to remove any duplicate values returned in the result set. Listings 1, Listing 2, and Listing 3 provide examples of DISTINCT and GROUP BY clauses. (see .Zip File)

Q: What are the differences among NVARCHAR, VARCHAR, and CHAR with respect to storage?

A: NVARCHAR requires 2 bytes to store a single character (such as “A” or “a”). VARCHAR requires 1 byte to store a single character. The NVARCHAR data type enables multilingual characters because it represents all characters in Unicode. The VARCHAR data type doesn’t support multilingual characters. If you’re working only in English, and you’re sure you’ll never translate any part of your application into another language, then you might not need to use NVARCHAR.

CHAR is a fixed-length data type, and the disk storage size is equal to the maximum size of the column as defined when you created the table. VARCHAR is a variablelength data type, and the storage size equals the length of the data entered plus 2 bytes in the variable character column offset array.

If you’re concerned about storage, use VARCHAR rather than CHAR. For a thorough analysis, see Kalen Delaney, Inside Microsoft SQL Server 2005: The Storage Engine (Microsoft Press, 2006).

Q: What is the difference between timestamp and GETDATE()?

A: You can’t compare timestamp to a GETDATE() value, because timestamp is a data type and GETDATE() is a function. A timestamp is a unique binary number generated by SQL Server that doesn’t have anything to do with actual time. GETDATE() is a nondeterministic function that generates the current system date and time.

The timestamp data type is used mainly for marking different versions of a row in a table. When any data in a row are changed, the timestamp column is populated with a new unique binary number. This newly generated number compared with the previously stored timestamp for the same row indicates that some data in the row has been changed. If any actions (such as inserting rows in other tables) are to be taken when particular rows are updated, timestamp should be used. Because the timestamp changes when rows are updated, its previously stored value (in the other table) can be compared with the new values, rows that are changed can be identified, and actions can be taken accordingly.

GETDATE() returns the current system date and time, which can be stored in a column that is defined as a datetime data type. GETDATE() can be stored as a smalldatetime data type (4 bytes) or datetime data type (8 bytes).

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.