I used to think that UNION was useful only when you needed to combine the results of two queries from different sources (e.g., tables, views) into one result set. However, using UNION is sometimes the quickest way to select from just one table.
Suppose you need to retrieve all OrderIDs from the Northwind Orders table where the CustomerID is VICTE or the EmployeeID is 5:
SELECT OrderID FROM Orders WHERE CustomerID = 'VICTE' OR EmployeeID = 5
The Orders table has indexes on the CustomerID and EmployeeID columns, but SQL Server doesn't use them to execute the SELECT statement, as the following execution plan shows:
|--Clustered Index Scan(OBJECT: (\[Northwind\].\[dbo\].\[Orders\].\[PK_Orders\] ), WHERE:(\[Orders\].\[CustomerID\]='VICTE' OR \[Orders\].\[EmployeeID\]=5))
Instead, SQL Server scans the clustered index PK_Orders. SQL Server uses this plan because the OR operator in the query's WHERE clause makes the result satisfy both conditions at the same time, so SQL Server must double-scan the table. Scanning a clustered index in this case is almost the same as scanning the entire table; the server goes though the table record by record and checks for the specified predicate.
To improve the query's efficiency, you could create a composite index on CustomerID and EmployeeID:
CREATE INDEX IdxOrders001 ON Orders (CustomerID, EmployeeID)
The Index Tuning Wizard advises you to create just such an index to improve performance of the SELECT statement. SQL Server can use the new index to find all the records in which EmployeeID = 5, then scan only the resulting range of records to return the required result. The estimated execution plan for the new query shows that SQL Server uses the composite index:
|--Index Scan(OBJECT:(\[Northwind\] .\[dbo\].\[Orders\].\[IdxOrders001\]), WHERE:(\[Orders\].\[CustomerID\]='VICTE' OR \[Orders\].\[EmployeeID\]=5))
But an employee can make thousands of deals with thousands of customers. So even if you create a composite index, SQL Server will need to scan a range of records in the index and won't perform a seek operation, which uses indexes to retrieve records and is the fastest way for SQL Server to find information.
You need to make SQL Server use a seek operation instead of a scan, but SQL Server won't perform a seek when an OR operator is in the WHERE clause. You can solve this dilemma by using UNION to rewrite the SELECT statement:
SELECT OrderID FROM Orders WHERE CustomerID = 'VICTE' UNION SELECT OrderID FROM Orders WHERE EmployeeID = 5
SQL Server 2000's estimated execution plan for this statement is
|--Merge Join(Union) |--Index Seek(OBJECT:(\[Northwind\] .\[dbo\].\[Orders\].\[CustomersOrders\]), SEEK:(\[Orders\].\[CustomerID\]='VICTE' ) ORDERED FORWARD) |--Index Seek(OBJECT:(\[Northwind\] .\[dbo\].\[Orders\].\[EmployeesOrders\]), SEEK:(\[Orders\].\[EmployeeID\]=5) ORDERED FORWARD)
This execution plan looks longer than the original one, but both operators are index-seek operators. SQL Server doesn't use the composite index; instead, it uses two single-column indexes. You might think that two seek operations would cost more than one seek, but performance improves when you use this method. You can check performance by using SQL Trace to analyze the three versions of the SELECT statement. For the UNION query, my SQL Server 2000 system performed four reads to return the result. The first SELECT query required 23 reads, and the second SELECT statement, which created the composite index, required 11 reads.
This special use of UNION can help you avoid the OR operator's slow performance, but use it carefully. If you don't have the appropriate indexes (CustomersOrders and EmployeesOrders, in this example), you can double-scan the table.