The Auxiliary Table Advantage

Downloads
21411.zip

Simplify your life and improve your code

Editor's Note: Send your experts-only T-SQL tips to Itzik Ben-Gan at [email protected]. If we use your tip in the magazine, you'll recieve $100 and an exclusive T-SQL Black Belt shirt.

Sometimes, the use of an auxiliary table can help you write more efficient code than a solution that doesn't use an auxiliary table, as I showed in "Auxiliary Tables, Bit by Bit," July 2001. In the example in that article, I used an auxiliary table to calculate an aggregate bitwise OR. Here's another example I found in the public technical newsgroup microsoft.public.sqlserver.programming; this example demonstrates how you can use an auxiliary table to provide a solution that's efficient and easy to maintain.

Jan Hvarfvenius of STC MOBILELEKTRONIK AB in Sweden supplied this problem, which required him to find the first occurrence of a member according to a predetermined sequence. Hvarfvenius's company uses a table that maintains a log of repairs to items handed in to the company's workshops. By running the script that Listing 1 shows, you can create the Repairlog table and populate it with sample data.

Each item goes through a series of status changes during the repair process. Each time someone inputs a change in the item's status, SQL Server writes a row to the Repairlog table. The preferred path for each repair is

Item handed in: status = 0
Item transferred to workshop: status = 21
Work in progress: status = 20
Item ready: status = 30

However, for various reasons, the table doesn't have an entry for each status for each item during the repair process. For example, an item's status might go directly to 20 (work in progress), skipping 0 and 21. An item might also go from ready status to another status (e.g., if the repair technician forgot something), then back to ready status. So, keep in mind that each repair can have more than one row with the same status and that the chronological order of status changes might not reflect the preferred path.

Hvarfvenius had to write a query that for each repair returns a row containing the repair number (repairno) and a registration date (regdate), based on specific criteria, as part of a monthly warranty report and invoice that's sent to the manufacturer. First, if any existing rows represent a transfer to the workshop (status = 21), the code returns the regdate of the first occurrence of status 21. If no status 21 rows exist, the query checks whether any rows represent work in progress (status = 20). If so, the query returns the regdate of the first occurrence of status 20; otherwise, if any rows represent an item that was handed in (status = 0), the query returns the regdate of the first occurrence of status 0. If none of the previous rows exist, the query checks for rows representing an item that's ready (status = 30) and returns the regdate of the first occurrence of status 30. So, for each repair, the query should return the minimum regdate in the following order of preference: 21, 20, 0, and 30.

To fulfill these criteria, Hvarfvenius came up with the query that Listing 2, page 22, shows. This code uses the built-in COALESCE() function, which returns the first non-NULL value from the list of values that you supply to it as arguments. Table 1, page 22, shows the results of this query.

The Case for Auxiliary Tables

By using the COALESCE() function, Hvarfvenius's solution met the requirements, but this solution has some problems. Maintenance can be difficult; for example, if the company adds new repair statuses or changes the desired sequence of statuses in the future, the query will need revision. Also, the query's performance is inefficient: With the COALESCE() function, SQL Server needs to access the Repairlog table once for each status value. To handle both problems, you can use an auxiliary table that stores the possible status values for a repair along with a sequence number that reflects the desired sequence. In this case, you'd have the following pairs of status-sequence rows: 21-1, 20-2, 0-3, 30-4. You can use the code that Listing 3 shows to create and populate the Statuses auxiliary table.

Then, you can join the Repairlog table to the Statuses table to attach the correct sequence to each repair log row. You can filter the result of this join in such a way that it returns only the rows with the most recent date in each repair sequence. To achieve this result, you use a correlated subquery that returns the minimum sequence for the repair. Then, you simply group the result by repairno and request the minimum regdate. Listing 4 shows the query that produces the desired result. You can further simplify this query by creating a view that joins the Repairlog table and the Statuses table:

CREATE VIEW VRepairLog
AS
SELECT repairno, regdate, R.status, seq
FROM Repairlog AS R JOIN Statuses AS S
  ON R.status = S.status

Listing 5 shows a simplified query against the view.

Now, when any modifications introduce new status values or changes to the desired sequence of status values, you have to change only the Statuses table. The application that holds the query can remain unchanged.

Enlarge Your Toolbox

In many cases, code that uses an auxiliary table has two major advantages over a solution that doesn't use one: The auxiliary-table code performs better and is easier to maintain. The next time you face a problem that requires a T-SQL solution, try the techniques I've demonstrated, and you might end up with more time for other tasks.

Comments

Plain text