Some problems in SQL Server seem to require a cursor-based solution, but you can often find alternatives. Set-based solutions usually perform better, are more readable, and are easier to maintain than cursor-based solutions. Furthermore, set-based solutions are more in accord with the relational model, which is organized in terms of sets and not in terms of files. In "Sequential to Set-Based," November 2001, I showed how you can provide several different T-SQL set-based solutions for one such problem. I recently ran across another situation that posed a similar problem. A student in one of my classes claimed that the programmers in his company couldn't find a set-based solution for this problem, only a cursor-based solution.
In this scenario, a company stores its financial transactions in two tables, Trans1 and Trans2. Running the script in Listing 1 creates the Trans1 and Trans2 tables and populates them with sample data. Each table holds the transaction ID, account ID, and monetary amount of each transaction, and each table's primary key is defined on account_id, tran_id. The transaction IDs indicate the chronological order of the transactions within an account, starting with 1 and incrementing by 1 each time a new transaction takes place.
My student asked how to write a query that matches the chronological order of transactions with the same account ID and amount. So, a row in Trans1 that has an account ID of x and an amount of y and is chronologically the nth row with those values matches a row in Trans2 that also has account ID x and amount y and is chronologically the nth row with those values. For example, the following rows in Trans1 belong to account ID 1 and have an amount of 100.0000:
The following rows in Trans2 also belong to account ID 1 and have an amount of 100.0000:
Therefore, you need a query that matches the first transaction with the amount 100.0000 and account ID 1 in Trans1 with the first transaction with the amount 100.0000 and account ID 1 in Trans2, the second with the second, and so on. Thus, the script should match transaction IDs 1, 3, and 5 in Trans1 with transaction IDs 3, 4, and 7 in Trans2, respectively. Transaction ID 7 in Trans1 has no match in Trans2 because it's the fourth transaction with those account ID and amount values, but Trans 2 contains only three transactions with those account ID and amount values.
First, you need a way to determine, for each transaction, the chronological position within the same account and amount. You can find this position by using a subquery in the SELECT list, as the query in Listing 2 shows. The subquery counts the rows that have the same account ID and amount as the row in the outer query and that also have a transaction ID that's less than or equal to the transaction ID in the outer query. Remember that the transaction ID represents the chronological order of events. Table 1 shows the output of Listing 2's query against Trans1. You can write a similar query that retrieves the transactions and chronological positions from Trans2.
The next step is to match the results of both queries. To provide the final solution, you could write a complex query that uses derived tables, where the derived tables hold the queries against Trans1 and Trans2, but such a query would be very lengthy and hard to maintain. A more simplified approach is to create views containing the queries against Trans1 and Trans2. You can run the script that Listing 3 shows to create the views VTrans1 and VTrans2. After creating the views, you can find the matching transactions by running the query that Listing 4 shows. This query joins the views based on account IDs, amounts, and chronological positions. Table 2 shows the output of Listing 4's query.
Note in Table 2's results that if a transaction in one table doesn't have a matching transaction in the other table, the transaction doesn't appear in the output because Listing 4's query uses an inner join. In the scenario my student presented, only matching transactions are required in the output. For a similar solution that returns nonmatches as well, you need to change the inner join to a full outer join. You also need to slightly revise the SELECT list and ORDER BY clause so that in case one of the tables has a row with no match in the other table, the result comes from the table that has the data. Listing 5 shows the revised query, whose output appears in Table 3.
Chronological positions are important in determining which transactions match, but this example doesn't require them to appear in the output. You can write a query that joins Trans1 and Trans2 based on matching account IDs, amounts, and chronological positions. You can write subqueries to calculate and compare the chronological positions in the JOIN condition, without needing to specify them in the SELECT list. For example, you can replace the query in Listing 4 with the query in Listing 6. They both provide the same result, but the query in Listing 6 doesn't refer to any views.
You don't really need to create views in this case because the query in Listing 6 refers directly to the base table. I compared the performance of the solutions in Listing 4 and Listing 6, and they perform exactly the same, so you can use the solution that you feel more comfortable with.
Not Finished Yet
In your work with SQL Server, you'll find many examples of problems that you can solve by using a set-based solution, even though the solution might not be obvious at first. If, when you face the type of problems I describe here and in "Sequential to Set-Based," you first think of using cursors, it's time for you to make the move to set-based thinking. A good place to start is to practice the techniques I've demonstrated.
But we're not finished yet. In my next column, I'll present a more complex example and show you how to provide a set-based solution for it.