Running aggregations are calculations that continue to accumulate measures over a sequence—typically temporal, and possibly within partitions. An example of obtaining a running aggregate is finding a running sum of quantity for each employee and month, from a table that holds a row for each employee and month with measures such as quantity and value. That is, for each employee and month, you return the sum of all quantities from the beginning of the employee’s activity until the current month.
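To make the computation concrete, here is a minimal sketch in Python (my own illustration, not part of the solutions discussed in this series; the sample quantities are hypothetical) of what a per-employee running sum produces:

```python
from itertools import accumulate

# Hypothetical monthly quantities for two employees, ordered by month.
sales = {
    1: [10, 5, 7],   # employee 1: months 1..3
    2: [3, 8],       # employee 2: months 1..2
}

# The running sum restarts for each employee (the partition) and
# accumulates within it in month order.
running = {emp: list(accumulate(qtys)) for emp, qtys in sales.items()}
print(running)  # {1: [10, 15, 22], 2: [3, 11]}
```

Each output row pairs an employee and month with the sum of that employee's quantities from the first month through the current one.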
Last month I started a series of articles that explore the performance aspects and algorithmic complexity of different solutions to running aggregations. I covered two set-based solutions—one using subqueries and another using joins. (See "Subqueries and Joins for Running Aggregates.") This month I cover a solution based on cursors.
Some developers and database administrators consider cursors to be evil, even going so far as to recommend avoiding them at all costs. Indeed, set-based solutions have many advantages over cursor-based solutions in terms of being more aligned with the relational model, requiring less code, being more readable, involving less maintenance, and typically performing better. However, cursors perform better than set-based solutions in some cases. In this article I show that under certain circumstances, cursor-based solutions to running aggregations provide better performance than set-based solutions.
In this article I use the same Sales table as last month. Run the code in Listing 1 to create the table. The Sales table contains a row for each employee and month. The definition of the Sales table contains an attribute for the employee ID (empid), month (dt), quantity (qty), and value (val). The primary key (as well as the clustered index) is defined on the key list (empid, dt). The clustered index follows the indexing guidelines to support solutions to running aggregates—the key list should be the partitioning column(s) followed by the ordering column(s), and the aggregated measures should be covered by the index as well.
Run the code in Listing 2 to create the helper function GetNums, which returns a table of numbers with a requested number of rows. Next, run the code in Listing 3 to populate the Sales table with the desired number of partitions (employees) and the desired partition size (rows per employee).
Listing 5 shows the cursor-based solution for our running sum request. The solution is fairly straightforward. The code defines a table variable called @Result where the result will be stored, as well as a few local variables where the values from each row will be stored. The code declares a cursor based on a query that sorts the data by empid and dt. Based on this ordering, the code fetches the cursor records one at a time. In each iteration of the loop, the code handles the current row. If the current row’s empid value differs from the previous row’s value, the current row is the first in the partition (for the employee). In such a case, the code initializes the variable holding the current running aggregate (@sumqty) with 0. The code then adds the current quantity to @sumqty and inserts a row with the current details and the running sum into the table variable @Result. Finally the code queries the table variable to return the result.
The clustered index created on the attributes (empid, dt) as the key list is ideal for this solution. Figure 1 shows the execution plan for the query that the cursor is based on. As you can see in the plan, SQL Server performs an ordered scan of the index.
As I did last month, I’ll now analyze the solution’s algorithmic complexity to see how changing various factors in the solution affects its performance. The factors that I’ll examine are the number of aggregations to be calculated, the number of partitions involved, and the average number of rows per partition (assuming even distribution for the sake of simplicity). Again, I’ll use the letter p to represent the number of partitions, r to represent the average number of rows per partition, and a to represent the number of aggregations.
Let’s express the number of rows processed by the cursor-based solution when one aggregation is involved. As you can see in the plan that Figure 1 shows, the rows from the Sales table are scanned once from the clustered index, in index order. You can therefore express the number of rows processed as pr rows (again, assuming fairly even distribution of rows among partitions).
Note that individual row processing is more expensive when you process data with a cursor than when you use a set-based solution. For example, if you represent the cost of processing n rows with a set-based solution as n, you can express the cost of processing n rows with a cursor as n(1 + o), or n + no, where o represents the extra overhead associated with processing a row with a cursor. In terms of run time, n + no typically translates to several dozen times longer than just n. So if pr is the number of rows processed by the cursor-based solution, and you want to add the cursor overhead to the formula representing the solution’s cost, you can express it as pr + pro. This is in comparison with set-based code that just scans pr rows, and whose cost you express as pr.
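This cost model is easy to sketch numerically. Here is a small Python illustration (my own; the per-row overhead factor o is an arbitrary example value, not a measured one):

```python
def set_based_cost(n):
    """Cost of scanning n rows with a set-based plan (unit cost per row)."""
    return n

def cursor_cost(n, o):
    """Cost of processing n rows with a cursor: n * (1 + o) = n + n*o,
    where o is the extra per-row overhead of cursor processing."""
    return n * (1 + o)

p, r = 10_000, 10   # partitions and rows per partition
o = 30              # assumed per-row cursor overhead factor (illustrative)
n = p * r           # total rows scanned in both cases

print(set_based_cost(n))   # pr        = 100000
print(cursor_cost(n, o))   # pr + pr*o = 3100000
```

The same number of rows is scanned either way; the cursor's cost differs only by the constant per-row factor, which is what the rest of the analysis exploits.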
Effect of Number of Aggregates
Last month I covered in detail the set-based solution using subqueries, which Listing 4 shows. As you might recall, I expressed the number of rows to be processed by this solution as ap(r + r²)/2. I explained that unfortunately SQL Server’s optimizer performs a separate scan of the data for each subquery; if the number of aggregates to calculate is a, the work involved in handling each aggregate occurs a times. In other words, in terms of changes in a, this solution has linear complexity.
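The r-related part of that formula comes from the fact that the correlated subquery behind the i-th row of a partition scans i rows, and 1 + 2 + … + r = (r + r²)/2. A quick Python check of the single-aggregate case (a = 1), with arbitrary p and r values:

```python
def rows_scanned_brute_force(p, r):
    # In each of p partitions, the subquery behind row i scans i rows.
    return p * sum(i for i in range(1, r + 1))

def rows_scanned_formula(p, r):
    # Closed form: p * (r + r^2) / 2
    return p * (r + r * r) // 2

for p, r in [(1, 1), (10, 10), (1000, 100)]:
    assert rows_scanned_brute_force(p, r) == rows_scanned_formula(p, r)

print(rows_scanned_formula(1000, 100))  # 5050000 rows for p=1000, r=100
```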
Now let’s explore the effect that increasing the number of aggregations has on the cursor-based solution. The two main factors contributing to the cost of the cursor-based solution are the number of rows being processed (pr) and the cursor overhead associated with each record manipulation (o). These two factors remain constant when the only change is in the number of aggregations to calculate (a). In other words, with respect to changes in a, the cursor solution has constant complexity, or close to it. Of course, a few more CPU cycles are required to calculate the additional aggregates, and a bit more space is necessary for the expanded row in the table variable, but those shouldn’t have a dramatic effect on the solution’s performance. Therefore, theoretically, adding aggregations to the cursor solution should cause only slight performance degradation.
To test this theory, I used the code in Listing 6 to calculate four running aggregates instead of one. The effect on performance was only about 10 percent degradation, confirming the theory.
Effect of Number of Partitions
The next factor to evaluate is the number of partitions involved—that is, the effect of increasing the number of partitions on the solution’s performance. Regarding the set-based solution, when calculating a single aggregate, you can express its cost as p(r + r²)/2. If you increase the number of partitions by a factor of f, you get pf(r + r²)/2. This means that with respect to the number of partitions, the set-based solution has linear complexity; in other words, if you increase the number of partitions by a factor of f, the cost (which translates to run time) increases by a factor of f as well.
Earlier I expressed the cost of the cursor solution as pr + pro. If you increase the number of partitions by a factor of f, you get pfr + pfro, meaning that with respect to changes in the number of partitions, the cursor solution should also have linear complexity. Figure 2 shows a benchmark in which I started with 10,000 partitions with a constant partition size of 10, and I kept increasing the number of partitions by 10,000 every time until I reached 100,000 partitions. This benchmark clearly shows that the cursor solution also has linear complexity with respect to the number of partitions.
Effect of Partition Size
It’s interesting to see in Figure 2 that the cursor solution is significantly slower than the set-based solution when dealing with a very small partition size (10 rows in this case). Next, let’s evaluate the effect of increasing the partition size on the solution.
Regarding the set-based solution, if you increase the partition size by a factor of f, you can express the effect on the cost of the solution as p(rf + (rf)²)/2. This tells you that with respect to the partition size, the set-based solution has close to quadratic complexity. If the partition size increases by a factor of f, the cost increases by almost f². For example, if you increase the partition size by 2, the run time increases by almost 4 times.
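You can see this near-quadratic growth numerically. A short sketch (my own; the p and r values are arbitrary):

```python
def set_based_cost(p, r):
    # Rows processed by the subquery solution for one aggregate: p(r + r^2)/2
    return p * (r + r * r) // 2

p, r = 1000, 100
base = set_based_cost(p, r)
doubled = set_based_cost(p, 2 * r)

# Doubling the partition size almost quadruples the cost.
print(doubled / base)  # ~3.98
```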
As for the cursor solution, when you increase the partition size by a factor of f, the effect on the cost is prf + prfo. This means that with respect to the partition size, the cursor solution has linear complexity.
To test this theory, I ran a benchmark in which I started with 1,000 partitions with a partition size of 100, and I gradually increased the partition size by 100 each time until I reached a partition size of 1,000. Figure 3 shows the result of the benchmark. Even though the result merely confirms the theory, I find it fascinating. You can clearly see that there is a point where the cursor solution becomes faster than the set-based solution. In my benchmark, this point is around 500 rows per partition.
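Under these two cost models you can even estimate where the crossover should fall. Setting pr(1 + o) equal to p(r + r²)/2 and solving for r gives r = 2o + 1, so a crossover around 500 rows per partition would imply an effective per-row overhead factor o of roughly 250. The following Python sketch is my own back-of-the-envelope estimate; the value of o is back-derived from the benchmark's crossover point, not measured:

```python
def set_based_cost(p, r):
    # Subquery solution, one aggregate: p(r + r^2)/2 rows processed.
    return p * (r + r * r) / 2

def cursor_cost(p, r, o):
    # Cursor solution with per-row overhead factor o: pr(1 + o).
    return p * r * (1 + o)

o = 250   # assumed overhead implied by the observed ~500-row crossover
p = 1000  # cancels out of the comparison, kept for clarity

# First partition size at which the cursor becomes strictly cheaper.
crossover = next(r for r in range(1, 10_000)
                 if cursor_cost(p, r, o) < set_based_cost(p, r))
print(crossover)  # 502, just past r = 2*o + 1
```

Below the crossover the quadratic term (r + r²)/2 is still small enough that the cursor's constant per-row overhead dominates; above it, the quadratic term wins.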
A Necessary Evil
Perhaps you’ve grown used to the idea that cursors are evil. I must say that in many respects, I agree. However, the way SQL Server’s optimizer currently handles set-based solutions to running aggregates is with quadratic complexity (n²) with respect to partition size, whereas it handles cursor-based solutions with linear complexity. So in terms of performance, when dealing with small partition sizes—as many as a few hundred rows per partition—you’re better off with a set-based solution. However, when dealing with large partitions—more than 500 rows—you’re better off with a cursor-based solution. In addition, adding aggregates has only a minor effect on the performance of a cursor-based solution, whereas adding partitions causes linear performance degradation.
Next month I’ll explore even more solutions to running aggregates. Stay tuned!
Listing 1: Creating the Sales Table

SET NOCOUNT ON;
USE tempdb;

IF OBJECT_ID('dbo.Sales', 'U') IS NOT NULL DROP TABLE dbo.Sales;

CREATE TABLE dbo.Sales
(
  empid INT      NOT NULL,                 -- partitioning column
  dt    DATETIME NOT NULL,                 -- ordering column
  qty   INT      NOT NULL DEFAULT (1),     -- measure 1
  val   MONEY    NOT NULL DEFAULT (1.00),  -- measure 2
  CONSTRAINT PK_Sales PRIMARY KEY(empid, dt)
);
GO
Listing 2: Creating the GetNums Helper Function

IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)
  SELECT n FROM Nums WHERE n <= @n;
GO
Listing 3: Populating the Sales Table

DECLARE
  @num_partitions     AS INT,
  @rows_per_partition AS INT,
  @start_dt           AS DATETIME;

SET @num_partitions     = 10000;
SET @rows_per_partition = 10;
SET @start_dt           = '20090101';

TRUNCATE TABLE dbo.Sales;

INSERT INTO dbo.Sales WITH (TABLOCK) (empid, dt)
  SELECT NP.n AS empid, DATEADD(day, RPP.n - 1, @start_dt) AS dt
  FROM dbo.GetNums(@num_partitions) AS NP
    CROSS JOIN dbo.GetNums(@rows_per_partition) AS RPP;
Listing 4: Set-Based Solution Using Subqueries

SELECT empid, dt, qty,
  (SELECT SUM(S2.qty)
   FROM dbo.Sales AS S2
   WHERE S2.empid = S1.empid
     AND S2.dt <= S1.dt) AS sumqty
FROM dbo.Sales AS S1;
Listing 5: Cursor-Based Solution

DECLARE @Result AS TABLE
(
  empid  INT,
  dt     DATETIME,
  qty    INT,
  sumqty BIGINT
);

DECLARE
  @empid    AS INT,
  @prvempid AS INT,
  @dt       AS DATETIME,
  @qty      AS INT,
  @sumqty   AS BIGINT;

DECLARE C CURSOR FAST_FORWARD FOR
  SELECT empid, dt, qty
  FROM dbo.Sales
  ORDER BY empid, dt;

OPEN C;

FETCH NEXT FROM C INTO @empid, @dt, @qty;
SELECT @prvempid = @empid, @sumqty = 0;

WHILE @@fetch_status = 0
BEGIN
  IF @empid <> @prvempid
    SELECT @prvempid = @empid, @sumqty = 0;

  SET @sumqty = @sumqty + @qty;

  INSERT INTO @Result VALUES(@empid, @dt, @qty, @sumqty);

  FETCH NEXT FROM C INTO @empid, @dt, @qty;
END

CLOSE C;
DEALLOCATE C;

SELECT * FROM @Result;
Listing 6: Cursor-Based Solution Calculating Four Aggregates

DECLARE @Result AS TABLE
(
  empid  INT,
  dt     DATETIME,
  qty    INT,
  val    MONEY,
  sumqty BIGINT,
  avgqty NUMERIC(12, 2),
  sumval MONEY,
  avgval MONEY
);

DECLARE
  @empid    AS INT,
  @prvempid AS INT,
  @dt       AS DATETIME,
  @qty      AS INT,
  @val      AS MONEY,
  @sumqty   AS BIGINT,
  @sumval   AS MONEY,
  @count    AS INT;

DECLARE C CURSOR FAST_FORWARD FOR
  SELECT empid, dt, qty, val
  FROM dbo.Sales
  ORDER BY empid, dt;

OPEN C;

FETCH NEXT FROM C INTO @empid, @dt, @qty, @val;
SELECT @prvempid = @empid, @sumqty = 0, @sumval = 0, @count = 0;

WHILE @@fetch_status = 0
BEGIN
  IF @empid <> @prvempid
    SELECT @prvempid = @empid, @sumqty = 0, @sumval = 0, @count = 0;

  SELECT @sumqty = @sumqty + @qty,
         @sumval = @sumval + @val,
         @count  = @count + 1;

  INSERT INTO @Result(empid, dt, qty, val, sumqty, avgqty, sumval, avgval)
    VALUES(@empid, @dt, @qty, @val,
           @sumqty, 1.*@sumqty / @count, @sumval, @sumval / @count);

  FETCH NEXT FROM C INTO @empid, @dt, @qty, @val;
END

CLOSE C;
DEALLOCATE C;

SELECT * FROM @Result;