In my previous post I provided a T-SQL challenge called Travel Puzzle that involved calculating the total miles travelled domestically in each country by each traveler. You can find the details of the puzzle here: http://www.sqlmag.com/blogs/puzzled-by-t-sql/tabid/1023/entryid/13023/TSQL-Travel-Puzzle.aspx. Here I’m going to cover the solutions. Thanks to all those who posted solutions including Jermy, Peter Larsson (Peso), Marcello Poletti (Marc), Regan Wick, and Colleen Morrow.
The solutions use four different strategies:
1. Identifying islands (Itzik 1, Peso)
2. Using a subquery that retrieves information from the “next” row (Regan, Colleen)
3. Using a function that returns the “next” row and a recursive query (Marc)
4. Matching adjacent rows using row numbers (Jermy, Itzik 2)
Now for the details…
1. Identifying islands (Itzik 1 and Peso)
The key part of the solution is to calculate two row numbers that are partitioned by travelerid—one ordered by travelstart, and the other ordered by destination, travelstart. The difference between the two is constant and unique per traveler and destination. You can then group the rows by traveler, destination and that diff, calculate the difference between the maximum miles traveled and minimum miles traveled, and this way get the total miles traveled in each leg of the itinerary. Finally, group the result by traveler and calculate the total miles traveled in all legs.
Here’s my solution implementing this approach along with performance measures:
-- Itzik 1
-- CPU time = 9501 ms, elapsed time = 8554 ms, logical reads 4978
WITH C1 AS
(
SELECT travelerid, destination, milestraveled,
ROW_NUMBER() OVER(PARTITION BY travelerid
ORDER BY travelstart) -
ROW_NUMBER() OVER(PARTITION BY travelerid
ORDER BY destination, travelstart) AS grp
FROM dbo.Travel
),
C2 AS
(
SELECT travelerid, destination,
MAX(milestraveled) - MIN(milestraveled) AS miles
FROM C1
GROUP BY travelerid, destination, grp
)
SELECT travelerid, destination, SUM(miles) AS totalmiles
FROM C2
GROUP BY travelerid, destination;
Peso implemented a very similar idea, only in the second row number he specified the destination attribute as a partitioning (as opposed to ordering) element. The effect is the same, though. Here’s Peso’s solution:
-- Peso
-- CPU time = 10950 ms, elapsed time = 8347 ms, logical reads 4978
WITH cteTravel(TravelerID, Destination, TotalMiles)
AS
(
SELECT TravelerID,
Destination,
MAX(MilesTraveled) - MIN(MilesTraveled) AS TotalMiles
FROM (
SELECT TravelerID,
Destination,
MilesTraveled,
ROW_NUMBER() OVER (PARTITION BY TravelerID ORDER BY TravelStart) AS Yak,
ROW_NUMBER() OVER (PARTITION BY TravelerID, Destination ORDER BY TravelStart) AS Yik
FROM dbo.Travel
) AS x
GROUP BY TravelerID,
Destination,
Yak - Yik
)
SELECT TravelerID,
Destination,
SUM(TotalMiles) AS TotalMiles
FROM cteTravel
GROUP BY TravelerID,
Destination;
The benefit in these solutions is that the data is scanned only once. One of the row number calculations can benefit from an ordered scan of an index. However, there’s no avoiding of one sort operation in the plan.
2. Using a subquery that retrieves information from the “next” row (Regan, Colleen)
The solutions in this category use a subquery retrieving the interesting information (milestraveled) from the “next” row; that is, in respect to the current row, the first row with the same travelerid value, a greater travelstart value, based on travelstart ordering, provided that the destination is the same. Regan implemented this approach using APPLY and a TOP query, like so:
-- Regan
-- CPU time = 11372 ms, elapsed time = 4552 ms, logical reads 3197603
SELECT
dbo.Travel.travelerid
,dbo.Travel.destination
,SUM(nextstop.milestraveled - dbo.Travel.milestraveled)