Insight and analysis on the information technology space from industry thought leaders.

T-SQL Feature Request: Add RESET WHEN Clause to Reset Window Partition

In this article, I present a couple of classic, representative T-SQL querying tasks, provide the solutions that are used currently, and explain how you would handle them with improved solutions using the RESET WHEN clause. Hopefully, by the end of this article you will be convinced of the importance of this feature and will add your vote to the connect item requesting it.

Itzik Ben-Gan

July 11, 2017

12 Min Read
T-SQL Feature Request: Add RESET WHEN Clause to Reset Window Partition

Window functions allow you to solve a wide variety of T-SQL querying tasks elegantly and efficiently. Still, as of SQL Server 2017, there are some tasks that are difficult to solve with efficient set-based solutions that could be handled easily and efficiently if T-SQL added support for a windowing clause called RESET WHEN. This clause resets the window partition when a certain condition—possibly one that is based on a window function—is met. This feature isn’t standard, but is currently supported by Teradata (see documentation article on the feature here). Thanks to Microsoft Data Platform MVP Alejandro Mesa for introducing the feature to me. In this article, I present a couple of classic, representative T-SQL querying tasks, provide the solutions that are used currently, and explain how you would handle them with improved solutions using the RESET WHEN clause. Hopefully, by the end of this article you will be convinced of the importance of this feature and will add your vote to the connect item requesting it.

Depleting Quantities

I’ll start with a challenge involving the computation of running totals that need to be depleted when a certain condition is met. Microsoft Data Platform MVP Geri Reshef presented the original challenge; later, Sharon Rimer of Naya Technologies presented a variation of the challenge.

The sample data for this challenge involves a table called Transactions, which you create and populate using the following code:

SET NOCOUNT ON;USE tempdb;DROP TABLE IF EXISTS dbo.Transactions;GOCREATE TABLE dbo.Transactions(  txid INT NOT NULL CONSTRAINT PK_Transactions PRIMARY KEY,  qty  INT NOT NULL);GOTRUNCATE TABLE dbo.Transactions;INSERT INTO dbo.Transactions(txid, qty)  VALUES(1,2),(2,5),(3,4),(4,1),(5,10),(6,3),        (7,1),(8,2),(9,1),(10,2),(11,1),(12,9);

 

The transactions keep adding quantities of some item to a container based on txid ordering. As soon as the cumulative quantity exceeds the container’s capacity (given as input), the container needs to be depleted. Your solution needs to show the state of the container (total quantity) after every transaction, with 0 as the total quantity after depletion, as well as the depletion quantity when relevant. Here’s the desired result for the given sample data and an input container capacity of 5:

txid        qty         totalqty    depletionqty----------- ----------- ----------- ------------1           2           2           02           5           0           73           4           4           04           1           5           05           10          0           156           3           3           07           1           4           08           2           0           69           1           1           010          2           3           011          1           4           012          9           0           13

Solution Based on Cursor

It’s common for people to try and solve such tasks using recursive queries. Such solutions can be elegant, but aren’t very efficient. People also solve such tasks using a method known as “quirky update,” which is very efficient, but not guaranteed since it relies on physical processing order. So far, I haven’t found efficient, guaranteed, set-based solutions for this task, and instead have been reluctantly using iterative solutions (T-SQL- or CLR-based).

Here’s an example for a simple iterative T-SQL solution using a cursor:

SET NOCOUNT ON;DECLARE @maxallowedqty AS INT = 5;DECLARE @C AS CURSOR, @txid AS INT, @qty AS INT, @totalqty AS INT = 0, @depletionqty AS INT = 0;DECLARE @Result AS TABLE(  txid         INT NOT NULL PRIMARY KEY,  qty          INT NOT NULL,  totalqty     INT NOT NULL,  depletionqty INT NOT NULL);SET @C = CURSOR FORWARD_ONLY STATIC READ_ONLY FOR  SELECT txid, qty  FROM dbo.Transactions  ORDER BY txid;OPEN @C;FETCH NEXT FROM @C INTO @txid, @qty;WHILE @@FETCH_STATUS = 0BEGIN  SELECT @totalqty += @qty, @depletionqty = 0;  IF @totalqty > @maxallowedqty  BEGIN    SET @depletionqty = @totalqty;    SET @totalqty = 0;  END;  INSERT INTO @Result(txid, qty, totalqty, depletionqty)    VALUES(@txid, @qty, @totalqty, @depletionqty);  FETCH NEXT FROM @C INTO @txid, @qty;END;SELECT txid, qty, totalqty, depletionqtyFROM @ResultORDER BY txid;

The code defines a cursor variable and uses it to fetch the transactions one at a time in chronological order. It accumulates the quantities in a variable called @totalqty. After every row fetched, the code checks if the cumulative quantity exceeds the container capacity, in which case it sets a variable called @depletionqty to the current @totalqty value, and then zeroes the current @totalqty value. The code then writes the current transaction info (txid and qty) along with the current @totalqty and @depletionqty values to a table variable. Once done iterating through all transactions, the code queries the table variable to produce the desired result.

Solution Using RESET WHEN (Not Supported as of SQL Server 2017)

The downsides of using iterative solutions are well known. The question is always whether a descent set-based alternative exists. So far, I haven’t found one myself for the task at hand with the existing tools that we have in T-SQL, and would love to see one created some day. As mentioned, Teradata supports a windowing clause called RESET WHEN that resets the window partition when a certain condition is met. The beauty in this clause is that the condition can use a window function, so you can ask about what accumulated until the previous row. With our task you would reset the window partition when the sum of the quantities from the beginning of the partition and until the previous row exceeds the input container limit, like so:

(Remember, this code currently isn’t supported in SQL Server.)

DECLARE @maxallowedqty AS INT = 5;SELECT txid, qty,  SUM(qty) OVER(ORDER BY txid                RESET WHEN                  -- reset window partition when                   -- running sum until previous row > @maxallowedqty                  SUM(qty) OVER(ORDER BY txid                                ROWS BETWEEN UNBOUNDED PRECEDING                                         AND 1 PRECEDING)                    > @maxallowedqty                ROWS UNBOUNDED PRECEDING) AS runsumFROM dbo.Transactions;

Had this code been supported, it would have produced the following output:

txid        qty         runsum   ----------- ----------- -----------1           2           2          2           5           7          -----------------------------------3           4           4          4           1           5          5           10          15          -----------------------------------6           3           3          7           1           4          8           2           6          -----------------------------------9           1           1          10          2           3          11          1           4          12          9           13          

As you can see, the window partition is reset after transaction IDs 2, 5 and 8.

To produce the final desired result, you set the total container quantity (call it totalqty) to zero when the running sum exceeds the container limit, and to the running sum otherwise. You then compute the depletion quantity (call it depletionqty) as the running sum minus the total quantity. Here’s the complete solution’s code:

DECLARE @maxallowedqty AS INT = 5;WITH C AS(  SELECT *,    SUM(qty) OVER(ORDER BY txid                  RESET WHEN                    SUM(qty) OVER(ORDER BY txid                                  ROWS BETWEEN UNBOUNDED PRECEDING                                           AND 1 PRECEDING)                      > @maxallowedqty                  ROWS UNBOUNDED PRECEDING) AS runsum  FROM dbo.Transactions)SELECT txid, qty, totalqty, runsum - totalqty AS depletionqtyFROM C  CROSS APPLY    ( VALUES( CASE WHEN runsum > @maxallowedqty THEN 0 ELSE runsum END ) )      AS A(totalqty);

As you can see, the solution is simple, concise and elegant!

Recently, Sharon Rimer of Naya Technologies introduced a variation of this challenge based on one of the company's customer cases where the desired result was to compute how many times the container exceeded the input limit. To handle this need, you would keep the same definition for the CTE C, and use the following outer query to compute the count:

SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceededFROM C;

The complete solution would look like this:

WITH C AS(  SELECT *,    SUM(qty) OVER(ORDER BY txid                  RESET WHEN                    SUM(qty) OVER(ORDER BY txid                                  ROWS BETWEEN UNBOUNDED PRECEDING                                           AND 1 PRECEDING)                      > @maxallowedqty                  ROWS UNBOUNDED PRECEDING) AS runsum  FROM dbo.Transactions)SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceededFROM C;

Again, short and elegant.

Complex Islands

There are many other challenges that perhaps do currently have a reasonable T-SQL set-based solution, but that can be solved more easily and elegantly with the RESET WHEN clause. A good example is handling complex islands tasks where you need to define a new island whenever a condition that compares something from the current row and something from the previous row is met. To demonstrate such a challenge, I’ll use a table called Stocks, which you create and populate by running the following code:

SET NOCOUNT ON;USE tempdb;DROP TABLE IF EXISTS dbo.StockRates;GOCREATE TABLE dbo.StockRates(  stockid  INT  NOT NULL,  dt       DATE NOT NULL,  val      INT  NOT NULL,  CONSTRAINT PK_StockRates PRIMARY KEY(stockid, dt));GOINSERT INTO dbo.StockRates VALUES  (1, '2017-08-01', 13),  (1, '2017-08-02', 14),  (1, '2017-08-03', 17),  (1, '2017-08-04', 40),  (1, '2017-08-05', 45),  (1, '2017-08-06', 52),  (1, '2017-08-07', 56),  (1, '2017-08-08', 60),  (1, '2017-08-09', 70),  (1, '2017-08-10', 30),  (1, '2017-08-11', 29),  (1, '2017-08-12', 35),  (1, '2017-08-13', 40),  (1, '2017-08-14', 45),  (1, '2017-08-15', 60),  (1, '2017-08-16', 60),  (1, '2017-08-17', 55),  (1, '2017-08-18', 60),  (1, '2017-08-19', 20),  (1, '2017-08-20', 15),  (1, '2017-08-21', 20),  (1, '2017-08-22', 30),  (1, '2017-08-23', 40),  (1, '2017-08-24', 20),  (1, '2017-08-25', 60),  (1, '2017-08-26', 80),  (1, '2017-08-27', 70),  (1, '2017-08-28', 70),  (1, '2017-08-29', 40),  (1, '2017-08-30', 30),  (1, '2017-08-31', 10),  (2, '2017-08-01', 3),  (2, '2017-08-02', 4),  (2, '2017-08-03', 7),  (2, '2017-08-04', 30),  (2, '2017-08-05', 35),  (2, '2017-08-06', 42),  (2, '2017-08-07', 46),  (2, '2017-08-08', 50),  (2, '2017-08-09', 60),  (2, '2017-08-10', 20),  (2, '2017-08-11', 19),  (2, '2017-08-12', 25),  (2, '2017-08-13', 30),  (2, '2017-08-14', 35),  (2, '2017-08-15', 50),  (2, '2017-08-16', 50),  (2, '2017-08-17', 45),  (2, '2017-08-18', 50),  (2, '2017-08-19', 10),  (2, '2017-08-20', 5),  (2, '2017-08-21', 10),  (2, '2017-08-22', 20),  (2, '2017-08-23', 30),  (2, '2017-08-24', 10),  (2, '2017-08-25', 50),  (2, '2017-08-26', 70),  (2, '2017-08-27', 60),  (2, '2017-08-28', 60),  (2, '2017-08-29', 30),  (2, '2017-08-30', 20),  (2, '2017-08-31', 1);

In this table, you keep track of daily closing stock rates. Say you need to identify periods (islands) where the stock value is greater than or equal to 50, and for each period you need to show when it started and ended, as well as what the maximum stock rate was during that period. The catch is that you need to tolerate (ignore) gaps of up to 6 days. Here’s the desired result for the given sample data:

stockid     startdate  enddate    maxvalue----------- ---------- ---------- -----------1           2017-08-06 2017-08-18 701           2017-08-25 2017-08-28 802           2017-08-08 2017-08-18 602           2017-08-25 2017-08-28 70

Observe, for example, that for the first island for stock ID 1 we ignored the gap between Aug. 9, 2017, and Aug. 15, 2017, since it wasn’t longer than 6 days, but we didn’t ignore the gap between Aug. 18, 2017, and Aug. 25, 2017, since it was 7-days long.

Currently Supported T-SQL Solution

As mentioned, there are currently supported set-based solutions for islands challenges like this one, but they are longer and more complex than how you would handle such tasks with the RESET WHEN clause. With one of the currently supported solutions, as the first step you compute a flag (call it isstart) that you set to 0 when it’s not the beginning of an island by comparing something from the current row and with something from the previous row (obtained with the LAG function). Otherwise, you set the flag to 1. In our case, after filtering only rows where the stock value is greater than or equal to 50, you set the flag to 0 when the difference between the previous date and the current date is less than 7 days; otherwise, you set the flag to 1. Here’s the code that implements this step:

SELECT stockid, dt, val,  CASE    WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt)           < 7      THEN 0    ELSE 1  END AS isstartFROM dbo.StockRatesWHERE val >= 50;

This code generates the following output:

stockid     dt         val         isstart----------- ---------- ----------- -----------1           2017-08-06 52          11           2017-08-07 56          01           2017-08-08 60          01           2017-08-09 70          01           2017-08-15 60          01           2017-08-16 60          01           2017-08-17 55          01           2017-08-18 60          01           2017-08-25 60          11           2017-08-26 80          01           2017-08-27 70          01           2017-08-28 70          02           2017-08-08 50          12           2017-08-09 60          02           2017-08-15 50          02           2017-08-16 50          02           2017-08-18 50          02           2017-08-25 50          12           2017-08-26 70          02           2017-08-27 60          02           2017-08-28 60          0

As a second step, you produce an island identifier by computing the running sum of the isstart flag. Finally, you group the data by stockid and isstart, and return the start and end dates of the island, plus the maximum stock rate during the period. Here’s the complete solution’s code:

WITH C1 AS(  SELECT *,    CASE      WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt)             < 7        THEN 0      ELSE 1    END AS isstart  FROM dbo.StockRates  WHERE val >= 50),C2 AS(  SELECT *,    SUM(isstart) OVER(PARTITION BY stockid ORDER BY dt                      ROWS UNBOUNDED PRECEDING) AS grp  FROM C1)SELECT stockid,  MIN(dt) AS startdate,  MAX(dt) AS enddate,  MAX(val) as maxvalueFROM C2GROUP BY stockid, grpORDER BY stockid, startdate;

Solution Using RESET WHEN (Not Supported as of SQL Server 2017)

Using the RESET WHEN clause, it would have been easier to solve the task since you can simply reset the window partition when the condition for starting a new island is met. You could then use the minimum date in the partition is the island identifier.

Here’s an example of how you would compute the island identifier (call it grp):

SELECT stockid, dt, val,  MIN(dt) OVER(PARTITION BY stockid               ORDER BY dt               RESET WHEN DATEDIFF(day,                  MIN(dt) OVER(                   PARTITION BY stockid ORDER BY dt                   ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),                 dt) >= 7               ROWS UNBOUNDED PRECEDING) AS grpFROM dbo.StocksWHERE val >= 50;

Here the code uses the MIN window function to obtain the date from the previous row by using the window frame extent ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. Alternatively, you could get the date from the previous row with the LAG function, like so (showing only the alternative RESET WHEN clause):

RESET WHEN DATEDIFF(day,   LAG(dt) OVER(PARTITION BY stockid ORDER BY dt),  dt) >= 7

Here’s what the output of this step would be, showing the computed group identifier for each island:

stockid     dt         val         grp----------- ---------- ----------- -----------1           2017-08-06 52          2017-08-061           2017-08-07 56          2017-08-061           2017-08-08 60          2017-08-061           2017-08-09 70          2017-08-061           2017-08-15 60          2017-08-061           2017-08-16 60          2017-08-061           2017-08-17 55          2017-08-061           2017-08-18 60          2017-08-061           2017-08-25 60          2017-08-251           2017-08-26 80          2017-08-251           2017-08-27 70          2017-08-251           2017-08-28 70          2017-08-252           2017-08-08 50          2017-08-082           2017-08-09 60          2017-08-082           2017-08-15 50          2017-08-082           2017-08-16 50          2017-08-082           2017-08-18 50          2017-08-082           2017-08-25 50          2017-08-252           2017-08-26 70          2017-08-252           2017-08-27 60          2017-08-252           2017-08-28 60          2017-08-25

Then you would need just one more step to group this data by stockid and grp, and return the beginning and end of each period, plus the maximum stock rate within the period, like so:

WITH C AS(  SELECT *,    MIN(dt) OVER(PARTITION BY stockid                 ORDER BY dt                 RESET WHEN DATEDIFF(day,                    LAG(dt) OVER(PARTITION BY stockid ORDER BY dt),                   dt) >= 7                 ROWS UNBOUNDED PRECEDING) AS grp  FROM dbo.Stocks  WHERE val >= 50)SELECT stockid,  grp AS startdate,  MAX(dt) AS enddate,  MAX(val) as maxvalueFROM CGROUP BY stockid, grpORDER BY stockid, grp;

As you can see, this solution is shorter and simpler than the currently supported one.

Conclusion

Window functions are profound; there’s no question about it. Wouldn’t you like to see them becoming even more powerful? The RESET WHEN clause, if implemented in T-SQL, will doubtless increase its utility and enable us to replace existing iterative solutions with elegant and more efficient set-based ones. If you find that the addition of this feature would be beneficial to you, be sure to add your vote to the connect item requesting it.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like