Calculating Concurrent Sessions, Part 3

A new set-based solution far exceeds previous solutions

Itzik Ben-Gan

February 9, 2010

8 Min Read
Calculating Concurrent Sessions, Part 3

In “Calculating Concurrent Sessions, Part 1” and “Calculating Concurrent Sessions, Part 2,” I covered a task to calculate the maximum number of concurrent sessions for each application. I started the series by presenting a set-based solution (call it Original Set-Based Solution) that didn’t perform well because it had quadratic algorithmic complexity. I also presented a cursor-based solution (call it Cursor-Based Solution) and explained that the cursor alternative performed better because it had linear complexity. In the second part of the series I presented a new set-based solution (call it New Set-Based Solution 1) with linear complexity. This solution performed better than the cursor-based solution, but it involved using a temporary table, a couple of scans of the data, plus a seek operation in an index for each row from the table.

I thought that New Set-Based Solution 1 was the best available, but I was pleasantly surprised to learn about a fantastic set-based solution (call it New Set-Based Solution 2) that performs even better. Several readers sent me this solution. New Set-Based Solution 2 doesn’t involve the use of any temporary tables, requires only two scans of the data, and has linear complexity. In a benchmark test that I ran, it proved to be an order of magnitude faster compared with New Set-Based Solution 1. I’d like to thank Ben Flanaghan, Arnold Fribble, and R. Barry Young for coming up with the new solution.

Creating and Populating the Table

When I originally presented the task, I provided code to create and populate a table called Sessions. Listing 1 contains the code to create and populate the Sessions table with sample data.

SET NOCOUNT ON;USE tempdb; IF OBJECT_ID('dbo.Sessions', 'U') IS NOT NULL DROP TABLE dbo.Sessions; CREATE TABLE dbo.Sessions(  keycol    INT         NOT NULL,  app       VARCHAR(10) NOT NULL,  usr       VARCHAR(10) NOT NULL,  host      VARCHAR(10) NOT NULL,  starttime DATETIME    NOT NULL,  endtime   DATETIME    NOT NULL,  CONSTRAINT PK_Sessions PRIMARY KEY(keycol),  CHECK(endtime > starttime));GO CREATE INDEX idx_nc_app_st_et ON dbo.Sessions(app, starttime, endtime);CREATE INDEX idx_nc_app_et_st ON dbo.Sessions(app, endtime, starttime); INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(2,  'app1', 'user1', 'host1', '20090212 08:30', '20090212 10:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(3,  'app1', 'user2', 'host1', '20090212 08:30', '20090212 08:45');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(5,  'app1', 'user3', 'host2', '20090212 09:00', '20090212 09:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(7,  'app1', 'user4', 'host2', '20090212 09:15', '20090212 10:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(11, 'app1', 'user5', 'host3', '20090212 09:15', '20090212 09:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(13, 'app1', 'user6', 'host3', '20090212 10:30', '20090212 14:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(17, 'app1', 'user7', 'host4', '20090212 10:45', '20090212 11:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(19, 'app1', 'user8', 'host4', '20090212 11:00', '20090212 12:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(23, 'app2', 'user8', 'host1', '20090212 08:30', '20090212 08:45');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(29, 'app2', 'user7', 'host1', '20090212 09:00', '20090212 09:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(31, 'app2', 'user6', 'host2', '20090212 11:45', '20090212 12:00');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(37, 'app2', 'user5', 'host2', '20090212 12:30', '20090212 14:00');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(41, 'app2', 'user4', 'host3', '20090212 12:45', '20090212 13:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(43, 'app2', 'user3', 'host3', '20090212 13:00', '20090212 14:00');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(47, 'app2', 'user2', 'host4', '20090212 14:00', '20090212 16:30');INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)  VALUES(53, 'app2', 'user1', 'host4', '20090212 15:30', '20090212 17:00');GO

Listing 2 contains code to create a helper table function called GetNums, which returns a table result with a sequence of integers of a requested size.

IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL  DROP FUNCTION dbo.GetNums;GOCREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLEASRETURN  WITH  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)  SELECT TOP(@n) n FROM Nums ORDER BY n;GO

Listing 3 contains the code to populate the Sessions table with a large set of rows to test the performance of the solutions.

SET NOCOUNT ON;USE tempdb; TRUNCATE TABLE dbo.Sessions; DECLARE @numrows AS INT;SET @numrows = 10000; -- Change this value according to your needs INSERT INTO dbo.Sessions WITH(TABLOCK)    (keycol, app, usr, host, starttime, endtime)  SELECT    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS keycol,    D.*,    DATEADD(      second,      1 + ABS(CHECKSUM(NEWID())) % (20*60),      starttime) AS endtime  FROM  (    SELECT      'app' + CAST(1 + ABS(CHECKSUM(NEWID())) % 10 AS VARCHAR(10)) AS app,      'user1' AS usr,      'host1' AS host,      DATEADD(        second,        1 + ABS(CHECKSUM(NEWID())) % (30*24*60*60),        '20090101') AS starttime    FROM dbo.GetNums(@numrows) AS Nums    WHERE n <= @numrows  ) AS D;GO

As a reminder, you’re supposed to calculate the maximum number of concurrent sessions (active simultaneously) for each application. If one application ends at the same time that another starts, you’re not supposed to consider the two as concurrent. Table 1 shows the desired results for the sample data provided in Listing 1.

Table 1: Desired Results

app

mx

app1

4

app2

3

New Set-Based Solution 2

Like the cursor-based solution that I covered previously, New Set-Based Solution 2 relies on unifying start and end events into one chronological sequence of events, associating a +1 event type to start events, and a -1 event type to end events. However, instead of using a cursor to process the records one at a time and keep aggregating the event type to calculate the count of concurrent sessions at each point, New Set-Based Solution 2 uses row numbers to calculate the count of concurrent sessions. The technique is ingenious and surprisingly simple. The first part of the solution involves calculating start ordinals (how many sessions started so far), as well as start-or-end ordinals (how many sessions either started or ended so far). Start ordinals are calculated by assigning row numbers, partitioned by app, ordered by starttime, only to start events, and assigning a place holder (e.g., a NULL) to end events, like so:

SELECT keycol, app, starttime AS ts, +1 AS type,  ROW_NUMBER() OVER(PARTITION BY app     ORDER BY starttime) AS start_ordinalFROM dbo.SessionsUNION ALLSELECT keycol, app, endtime, -1, NULLFROM dbo.Sessions;

To calculate start-or-end ordinals, define a common table expression (CTE—call it C1) based on this query, and then in the outer query, assign row numbers to the unified events, partitioned by app, and ordered by ts and type, like so:

WITH C1 AS(  SELECT keycol, app, starttime AS ts, +1 AS type,    ROW_NUMBER() OVER(PARTITION BY app       ORDER BY starttime) AS start_ordinal  FROM dbo.Sessions  UNION ALL  SELECT keycol, app, endtime, -1, NULL  FROM dbo.Sessions)SELECT *,  ROW_NUMBER() OVER(PARTITION BY app ORDER BY    ts, TYPE, keycol) AS start_or_end_ordinalFROM C1;

Table 2 shows the output of this code in abbreviated form.

Table 2: Ordinals

app

ts

type

start_ordinal

start_or_end_ordinal

app1

2009-02-12 08:30:00.000

1

1

1

app1

2009-02-12 08:30:00.000

1

2

2

app1

2009-02-12 08:45:00.000

-1

NULL

3

app1

2009-02-12 09:00:00.000

1

3

4

app1

2009-02-12 09:15:00.000

1

4

5

app1

2009-02-12 09:15:00.000

1

5

6

app1

2009-02-12 09:30:00.000

-1

NULL

7

app1

2009-02-12 09:30:00.000

-1

NULL

8

app1

2009-02-12 10:30:00.000

-1

NULL

9

app1

2009-02-12 10:30:00.000

-1

NULL

10

Now that you have start ordinals (start_ordinal attribute) and start-or-end ordinals (start_or_end_ordinal attribute), you can easily calculate how many sessions ended at each point: end_ordinal = start_or_end_ordinal - start_ordinal. To calculate the count of active sessions at each point, simply subtract end_ordinal from start_ordinal: cnt = start_ordinal - end_ordinal = start_ordinal - (start_or_end_ordinal - start_ordinal) = 2 × start_ordinal - start_or_end_ordinal. So the only thing left is to group the data by app, and calculate the maximum count for each application, like so:

WITH C1 AS(  SELECT keycol, app, starttime AS ts, +1 AS type,    ROW_NUMBER() OVER(PARTITION BY app       ORDER BY starttime) AS start_ordinal  FROM dbo.Sessions  UNION ALL  SELECT keycol, app, endtime, -1, NULL  FROM dbo.Sessions),C2 AS(  SELECT *,    ROW_NUMBER() OVER(PARTITION BY app       ORDER BY ts, TYPE, keycol) AS start_or_end_ordinal  FROM C1)SELECT app, MAX(2 * start_ordinal - start_or_end_  ordinal) AS mxFROM C2WHERE type = 1GROUP BY app;

Not only is this solution beautiful in its simplicity, it’s also extremely efficient. Figure 1 shows this solution’s execution plan.

Similar to what I described about the cursor-based solution in the previous articles in this series, SQL Server relies on index ordering to unify start and end events with a Merge Join operator, as well as to calculate both the start ordinals and start-or-end ordinals with the ROW_NUMBER function. Amazingly, this plan doesn’t involve a single sort operator! As you probably realize, this means that the solution’s algorithmic complexity is linear.

Figure 2 shows the benchmark results for this solution compared with the other three solutions that I covered previously. Note that the graph for New Set-Based Solution 2 seems to merge with the X axis; this solution is about an order of magnitude faster than New Set-Based Solution 1.

An Aha! Moment

My recent experience with the task of calculating the maximum number of concurrent sessions truly inspired me. After many years of thinking that a cursor-based solution was the best way to address the task, I found a better set-based solution, and then an even better one. This discovery gives me hope that there are other kinds of problems for which a good-performing set-based solution hasn’t yet been found, but exists. The experience also reinforced my opinion that it’s worthwhile to revisit such problems from time to time, as well as to let others look at problems, in hopes of gaining new insight. I’m continually amazed at the profoundness of calculations based on the OVER clause. Perhaps the SQL Server development team will finally see the light, and will enhance SQL Server’s support for the OVER clause with standard features that are still missing.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like