Calculating Concurrent Sessions, Part 3

In “Calculating Concurrent Sessions, Part 1” and “Calculating Concurrent Sessions, Part 2,” I covered a task to calculate the maximum number of concurrent sessions for each application. I started the series by presenting a set-based solution (call it Original Set-Based Solution) that didn’t perform well because it had quadratic algorithmic complexity. I also presented a cursor-based solution (call it Cursor-Based Solution) and explained that the cursor alternative performed better because it had linear complexity. In the second part of the series I presented a new set-based solution (call it New Set-Based Solution 1) with linear complexity. This solution performed better than the cursor-based solution, but it involved using a temporary table, a couple of scans of the data, plus a seek operation in an index for each row from the table.

I thought that New Set-Based Solution 1 was the best available, but I was pleasantly surprised to learn about a fantastic set-based solution (call it New Set-Based Solution 2) that performs even better. Several readers sent me this solution. New Set-Based Solution 2 doesn’t involve the use of any temporary tables, requires only two scans of the data, and has linear complexity. In a benchmark test that I ran, it proved to be an order of magnitude faster compared with New Set-Based Solution 1. I’d like to thank Ben Flanaghan, Arnold Fribble, and R. Barry Young for coming up with the new solution.

Creating and Populating the Table

When I originally presented the task, I provided code to create and populate a table called Sessions. Listing 1 contains the code to create and populate the Sessions table with sample data.

SET NOCOUNT ON;

USE tempdb;

 

IF OBJECT_ID('dbo.Sessions', 'U') IS NOT NULL DROP TABLE dbo.Sessions;

 

CREATE TABLE dbo.Sessions

(

  keycol    INT         NOT NULL,

  app       VARCHAR(10) NOT NULL,

  usr       VARCHAR(10) NOT NULL,

  host      VARCHAR(10) NOT NULL,

  starttime DATETIME    NOT NULL,

  endtime   DATETIME    NOT NULL,

  CONSTRAINT PK_Sessions PRIMARY KEY(keycol),

  CHECK(endtime > starttime)

);

GO

 

CREATE INDEX idx_nc_app_st_et ON dbo.Sessions(app, starttime, endtime);

CREATE INDEX idx_nc_app_et_st ON dbo.Sessions(app, endtime, starttime);

 

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(2,  'app1', 'user1', 'host1', '20090212 08:30', '20090212 10:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(3,  'app1', 'user2', 'host1', '20090212 08:30', '20090212 08:45');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(5,  'app1', 'user3', 'host2', '20090212 09:00', '20090212 09:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(7,  'app1', 'user4', 'host2', '20090212 09:15', '20090212 10:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(11, 'app1', 'user5', 'host3', '20090212 09:15', '20090212 09:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(13, 'app1', 'user6', 'host3', '20090212 10:30', '20090212 14:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(17, 'app1', 'user7', 'host4', '20090212 10:45', '20090212 11:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(19, 'app1', 'user8', 'host4', '20090212 11:00', '20090212 12:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(23, 'app2', 'user8', 'host1', '20090212 08:30', '20090212 08:45');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(29, 'app2', 'user7', 'host1', '20090212 09:00', '20090212 09:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(31, 'app2', 'user6', 'host2', '20090212 11:45', '20090212 12:00');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(37, 'app2', 'user5', 'host2', '20090212 12:30', '20090212 14:00');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(41, 'app2', 'user4', 'host3', '20090212 12:45', '20090212 13:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(43, 'app2', 'user3', 'host3', '20090212 13:00', '20090212 14:00');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(47, 'app2', 'user2', 'host4', '20090212 14:00', '20090212 16:30');

INSERT INTO dbo.Sessions(keycol, app, usr, host, starttime, endtime)

  VALUES(53, 'app2', 'user1', 'host4', '20090212 15:30', '20090212 17:00');

GO

Listing 2 contains code to create a helper table function called GetNums, which returns a table result with a sequence of integers of a requested size.

IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL

  DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)

  SELECT TOP(@n) n FROM Nums ORDER BY n;

GO

Listing 3 contains the code to populate the Sessions table with a large set of rows to test the performance of the solutions.

SET NOCOUNT ON;

USE tempdb;

 

TRUNCATE TABLE dbo.Sessions;

 

DECLARE @numrows AS INT;

SET @numrows = 10000; -- Change this value according to your needs

 

INSERT INTO dbo.Sessions WITH(TABLOCK)

    (keycol, app, usr, host, starttime, endtime)

  SELECT

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS keycol,

    D.*,

    DATEADD(

      second,

      1 + ABS(CHECKSUM(NEWID())) % (20*60),

      starttime) AS endtime

  FROM

  (

    SELECT

      'app' + CAST(1 + ABS(CHECKSUM(NEWID())) % 10 AS VARCHAR(10)) AS app,

      'user1' AS usr,

      'host1' AS host,

      DATEADD(

        second,

        1 + ABS(CHECKSUM(NEWID())) % (30*24*60*60),

        '20090101') AS starttime

    FROM dbo.GetNums(@numrows) AS Nums

    WHERE n <= @numrows

  ) AS D;

GO

As a reminder, you’re supposed to calculate the maximum number of concurrent sessions (active simultaneously) for each application. If one application ends at the same time that another starts, you’re not supposed to consider the two as concurrent. Table 1 shows the desired results for the sample data provided in Listing 1.

Table 1: Desired Results

app	mx
app1	4
app2	3

New Set-Based Solution 2

Like the cursor-based solution that I covered previously, New Set-Based Solution 2 relies on unifying start and end events into one chronological sequence of events, associating a +1 event type to start events, and a -1 event type to end events. However, instead of using a cursor to process the records one at a time and keep aggregating the event type to calculate the count of concurrent sessions at each point, New Set-Based Solution 2 uses row numbers to calculate the count of concurrent sessions. The technique is ingenious and surprisingly simple. The first part of the solution involves calculating start ordinals (how many sessions started so far), as well as start-or-end ordinals (how many sessions either started or ended so far). Start ordinals are calculated by assigning row numbers, partitioned by app, ordered by starttime, only to start events, and assigning a place holder (e.g., a NULL) to end events, like so:

SELECT keycol, app, starttime AS ts, +1 AS type,
  ROW_NUMBER() OVER(PARTITION BY app 
    ORDER BY starttime) AS start_ordinal
FROM dbo.Sessions

UNION ALL

SELECT keycol, app, endtime, -1, NULL
FROM dbo.Sessions;

To calculate start-or-end ordinals, define a common table expression (CTE—call it C1) based on this query, and then in the outer query, assign row numbers to the unified events, partitioned by app, and ordered by ts and type, like so:

WITH C1 AS
(
  SELECT keycol, app, starttime AS ts, +1 AS type,
    ROW_NUMBER() OVER(PARTITION BY app 
      ORDER BY starttime) AS start_ordinal
  FROM dbo.Sessions

  UNION ALL

  SELECT keycol, app, endtime, -1, NULL
  FROM dbo.Sessions
)
SELECT *,
  ROW_NUMBER() OVER(PARTITION BY app ORDER BY
    ts, TYPE, keycol) AS start_or_end_ordinal
FROM C1;

Table 2 shows the output of this code in abbreviated form.

Table 2: Ordinals

app	ts	type	start_ordinal	start_or_end_ordinal
app1	2009-02-12 08:30:00.000	1	1	1
app1	2009-02-12 08:30:00.000	1	2	2
app1	2009-02-12 08:45:00.000	-1	NULL	3
app1	2009-02-12 09:00:00.000	1	3	4
app1	2009-02-12 09:15:00.000	1	4	5
app1	2009-02-12 09:15:00.000	1	5	6
app1	2009-02-12 09:30:00.000	-1	NULL	7
app1	2009-02-12 09:30:00.000	-1	NULL	8
app1	2009-02-12 10:30:00.000	-1	NULL	9
app1	2009-02-12 10:30:00.000	-1	NULL	10

Now that you have start ordinals (start_ordinal attribute) and start-or-end ordinals (start_or_end_ordinal attribute), you can easily calculate how many sessions ended at each point: end_ordinal = start_or_end_ordinal - start_ordinal. To calculate the count of active sessions at each point, simply subtract end_ordinal from start_ordinal: cnt = start_ordinal - end_ordinal = start_ordinal - (start_or_end_ordinal - start_ordinal) = 2 × start_ordinal - start_or_end_ordinal. So the only thing left is to group the data by app, and calculate the maximum count for each application, like so:

WITH C1 AS
(
  SELECT keycol, app, starttime AS ts, +1 AS type,
    ROW_NUMBER() OVER(PARTITION BY app 
      ORDER BY starttime) AS start_ordinal
  FROM dbo.Sessions

  UNION ALL

  SELECT keycol, app, endtime, -1, NULL
  FROM dbo.Sessions
),
C2 AS
(
  SELECT *,
    ROW_NUMBER() OVER(PARTITION BY app 
      ORDER BY ts, TYPE, keycol) AS start_or_end_ordinal
  FROM C1
)
SELECT app, MAX(2 * start_ordinal - start_or_end_
  ordinal) AS mx
FROM C2
WHERE type = 1
GROUP BY app;

Not only is this solution beautiful in its simplicity, it’s also extremely efficient. Figure 1 shows this solution’s execution plan.

Similar to what I described about the cursor-based solution in the previous articles in this series, SQL Server relies on index ordering to unify start and end events with a Merge Join operator, as well as to calculate both the start ordinals and start-or-end ordinals with the ROW_NUMBER function. Amazingly, this plan doesn’t involve a single sort operator! As you probably realize, this means that the solution’s algorithmic complexity is linear.

Figure 2 shows the benchmark results for this solution compared with the other three solutions that I covered previously. Note that the graph for New Set-Based Solution 2 seems to merge with the X axis; this solution is about an order of magnitude faster than New Set-Based Solution 1.

Figure 2: Benchmark results

An Aha! Moment

My recent experience with the task of calculating the maximum number of concurrent sessions truly inspired me. After many years of thinking that a cursor-based solution was the best way to address the task, I found a better set-based solution, and then an even better one. This discovery gives me hope that there are other kinds of problems for which a good-performing set-based solution hasn’t yet been found, but exists. The experience also reinforced my opinion that it’s worthwhile to revisit such problems from time to time, as well as to let others look at problems, in hopes of gaining new insight. I’m continually amazed at the profoundness of calculations based on the OVER clause. Perhaps the SQL Server development team will finally see the light, and will enhance SQL Server’s support for the OVER clause with standard features that are still missing.

Comments

Plain text