As I've discussed a lot recently, sometimes a cursor-based (rather than set-based) solution seems to be the simplest approach to solving a problem but isn't. In previous columns, I demonstrated some techniques that you can use to switch to set-based programming. I've received enthusiastic responses from readers who have shared their experiences of converting code that used cursors to code that implements a set-based approach. Readers also told of their attempts to solve the problems I presented before looking at the solutions. Your interest inspired me to give you a puzzle to test your skills in set-based coding and discuss a few possible solutions. I warn you that this puzzle is tough. Make sure you have enough free time to work on it. I presented this puzzle in a private SQL Server trainers' forum and received a couple of interesting solutions, which I also share with you. I'd like to thank SQL Server MCTs Dieter Noeth and Fernando G. Guerrero, who let me share their solutions with you.
The puzzle involves a sample Sales table, which you can create and populate by running the script that Listing 1 shows. I use the tempdb database, but you can revise it to any database you choose. The goal is to identify ranges of months with similar sales trends. A month's trend is determined by the month's quantity compared with the preceding month's quantity. A month's trend can be Up, Down, Same, or Unknown. For example, December 1999's trend is Unknown because it's the earliest month in the Sales table. January 2000's trend is Up because its quantity (110) is greater than December 1999's quantity (100). Similarly, you can determine May 2000's trend (Same) and June 2000's trend (Down).
Determining each month's trend is a fairly simple task; however, the puzzle requires the output to show ranges of months with similar trends, as Table 1 shows. For example, the months January 2000 to April 2000 are collapsed to a single range because all the months in that range have the same trend (Up). May 2000's trend is Same, meaning that it had the same sales value as April; note that May 2000 appears in its own range in the output because the adjacent months (April 2000 and June 2000) have different trends. Now you have all the information you need to start working on the puzzle. Oh yes, and cursors aren't allowed.
The first solution is the one that I came up with. As always, the best way to solve problems is to solve them in steps. The first step, which Listing 2, page 18, shows, is to create a view that identifies each month's trend. I use a subquery in the view's SELECT list to retrieve the current month's sales quantity along with that of the preceding month. I subtract the subquery's return value from the current month's quantity, wrapping the result with the SIGN() function. The SIGN() function returns -1 for a negative value (Down trend), 0 for zero (Same trend), and 1 for a positive value (Up trend). Table 2 shows the output of a SELECT * query against the VTrends view.
The next and most problematic step is to distinguish between one consecutive group of months with similar trends and other groups with the same trends. One way to express the difference is to write a query against VTrends that counts the number of rows that have a sales month greater than or equal to the current row's sales month but that have a different trend value. At this point, the explanation probably sounds like gibberish. To understand the trick I use, look at the view VTrendsDiff, which Listing 3 shows, and at Table 3, which shows the output of a SELECT * query against the view.
As an example, see whether you can tell the difference between the group of months January 2000—April 2000 and the group December 2000—February 2001, which have the same trend (Up). With the output in Table 3, it's easy. Each row in the first group has a differentiating factor of 11, which means that 11 rows in the Sales table have the same or greater date and a trend other than 1. A differentiating group factor is a value that helps you distinguish between one group and another. Each row in the second group has a differentiating factor of 5. The last step is fairly easy: a simple GROUP BY query that returns the minimum and maximum sales month (smonth) values for each group of differentiating factor plus trend, as Listing 4 shows. Running Listing 4's query produces the output we were trying to get.
Noeth's solution uses an approach similar to the one in Solution 1, which uses a differentiating factor. Running the script that Listing 5 shows creates the VSalesDiff view, which implements the first step in the solution. For each month, the query in the VSalesDiff view returns the difference between that month's quantity and the preceding month's quantity, plus the corresponding trend. The query uses a derived table that returns the sales month value and the difference between the month's quantity and the preceding month's quantity, using the same trick I use in Listing 2. The outer query uses a CASE expression to determine the trend: positive value = Up, zero value = Same, and negative value = Down. Web Table 1 (available at http://www.sqlmag.com, InstantDoc ID 23846) shows the output of a SELECT * query against the VTrendsDiff view.
Listing 6 shows the second and last step. The code implements the main logic of the query in the derived table T, where it calculates the differentiating factor—groupcol. In this case, for each row in VSalesDiff, a subquery returns the minimum sales month of the rows that have a greater sales month and a different trend than the row in the outer query. Let's focus again on differentiating between the group of months January 2000—April 2000 and the group December 2000—February 2001. The differentiating factor for each row in the first group is the minimum sales month value of the rows with a greater sales month value and a different trend, which is 200005. The differentiating factor for each row in the second group is 200103. Then, you just have to group the result by the differentiating factor—groupcol—and return the minimum and maximum sales month values to mark the boundaries of the range. The query also returns, for each group, the minimum trend value. Because Noeth didn't include the trend in the GROUP BY clause, he has to use an aggregate function such as MIN() to fetch the trend. All the trend values are the same in each group, so you can use MAX() instead of MIN() or, alternatively, include the trend in the GROUP BY clause.
Guerrero supplied the third solution, which approaches the problem differently from the previous solutions. The first step is creating the VSalesTrends view, as Listing 7 shows. The query in the VSalesTrends view performs a three-way join. The joins correlate each row from Sales with the rows of the preceding and following months. The DATEDIFF() function verifies that the difference is 1 month (in the case of the join that locates the following month) and -1 month (in the case of the join that locates the preceding month). Because DATEDIFF() expects a complete date (including year, month, and day), you can't just pass a value containing only the year and month, such as the one in smonth. The trick Guerrero uses here is to concatenate "01" to the existing smonth value to signify the first day of the month.
In the WHERE clause, Guerrero uses the SIGN() function to determine the trends of the current month and the following month, then filters only the rows in which the month's trend is different from the following month's trend. He uses a left outer join to ensure that the first sale and the last sale won't both disappear from the output. Remember that an inner join doesn't return rows from one input if the row has no matching row in the other input. In the Sales table instance with the alias S1, the first sales row has no matching preceding sale in the Sales table instance with the alias S2, and the last sales row in S1 has no matching following sale in S3. Guerrero uses COALESCE() for the same reason—to replace a NULL value, representing a nonmatch returned by the left outer join, with a 0 value. Without this trick, the first and last sales rows in S1 would disappear from the output. Web Table 2 shows the output of a SELECT * query against the VSalesTrends view.
Web Listing 1 shows the second and last step. Here, the main logic is in the SELECT list. Most of the work, which includes calculating the sales month groups and the trend, was already done in the VSalesTrends view. The SELECT list uses a simple CASE expression to format the trend value, then returns sm1 as the to_month value of the range. Another CASE expression determines the range's from_month value. Sm2 holds the minimum smonth value of the next range of months. The CASE expression that calculates the from_month values uses a subquery to retrieve the sm2 value of the previous range. The only special case is in the first row in Web Table 2's output, where the trend is NULL because there's no previous month for the first month. In such a case, the trend is Unknown, and the from_month value is the same as the to_month value—in other words, sm1.
You can find information about using these solutions in the Web-exclusive sidebar "Practical Implementation," InstantDoc ID 23951. The examples in this article provide further proof that most problems have adequate set-based solutions. Thanks for all your comments. And let me know how you fared with this month's puzzle.