Puzzled By T-SQL Blog

TSQL Challenge: Packing Date and Time Intervals

The relational model deals with date and time intervals and defines various operations on those. One of the fundamental operations is packing date and time intervals. Packing of intervals means that you want to merge all intervals that overlap into one contiguous interval. SQL Server doesn’t support a native interval type, so most represent an interval with two attributes holding the start and end points of the interval, or one of the points and a duration.

The challenge at hand involves addressing the task of packing intervals using TSQL efficiently, with a standard, set-based solution, that is reusable in other platforms—namely, use only standard SQL constructs.

I’ll provide both small and large sets of rows for sample data. Use the small set to test the validity of your solution, and the large one to test its optimality.

Here’s the DDL and sample data for the small set to test the validity of your solution:

-- DDL and sample data, small
SET NOCOUNT ON;
USE tempdb;

IF OBJECT_ID('dbo.Sessions') IS NOT NULL DROP TABLE dbo.Sessions;

CREATE TABLE dbo.Sessions
(
  id        INT          NOT NULL IDENTITY(1, 1),
  username  VARCHAR(14NOT NULL,
  starttime DATETIME2(3) NOT NULL,
  endtime   DATETIME2(3) NOT NULL,
  CONSTRAINT PK_Sessions PRIMARY KEY(id),
  CONSTRAINT CHK_endtime_gteq_starttime
    CHECK (endtime >= starttime)
);
GO

INSERT INTO dbo.Sessions VALUES('User1', '20111201 08:00:00.000', '20111201 08:30:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 08:30:00.000', '20111201 09:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 09:00:00.000', '20111201 09:30:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 10:00:00.000', '20111201 11:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 10:30:00.000', '20111201 12:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 11:30:00.000', '20111201 12:30:00.000');

INSERT INTO dbo.Sessions VALUES('User2', '20111201 08:00:00.000', '20111201 10:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 08:30:00.000', '20111201 10:00:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 09:00:00.000', '20111201 09:30:00.000');

INSERT INTO dbo.Sessions VALUES('User2', '20111201 11:00:00.000', '20111201 11:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 11:32:00.000', '20111201 12:00:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 12:04:00.000', '20111201 12:30:00.000');

INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:00:00.000', '20111201 09:00:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:00:00.000', '20111201 08:30:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:30:00.000', '20111201 09:00:00.000');

INSERT INTO dbo.Sessions VALUES('User3', '20111201 09:30:00.000', '20111201 09:30:00.000');
GO

The table holds information about user sessions against some service or application. Suppose that for billing purposes you need to pack the intervals for each user since you’re not supposed to charge the user multiple times for multiple concurrent sessions, but rather only for connection time regardless of number of sessions. The desired result for the given sample data is:

username  starttime               endtime
--------- ----------------------- -----------------------
User1     2011-12-01 08:00:00.000 2011-12-01 09:30:00.000
User1     2011-12-01 10:00:00.000 2011-12-01 12:30:00.000
User2     2011-12-01 08:00:00.000 2011-12-01 10:30:00.000
User2     2011-12-01 11:00:00.000 2011-12-01 11:30:00.000
User2     2011-12-01 11:32:00.000 2011-12-01 12:00:00.000
User2     2011-12-01 12:04:00.000 2011-12-01 12:30:00.000
User3     2011-12-01 08:00:00.000 2011-12-01 09:00:00.000
User3     2011-12-01 09:30:00.000 2011-12-01 09:30:00.000

To test the performance of your solution, use the following code:

-- helper function GetNums
IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
  DROP FUNCTION dbo.GetNums;
GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP (@n) n FROM Nums ORDER BY n;
GO

-- sample data, 5,000,000 rows
DECLARE
  @num_users          AS INT          = 1000,
  @intervals_per_user AS INT          = 5000,
  @start_period       AS DATETIME2(3) = '20110101',
  @end_period         AS DATETIME2(3) = '20110107',
  @max_duration_in_ms AS INT  = 3600000; -- 60 nimutes

TRUNCATE TABLE dbo.Sessions;

WITH C AS
(
  SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username,
      DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
        DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS starttime
  FROM dbo.GetNums(@num_users) AS U
    CROSS JOIN dbo.GetNums(@intervals_per_user) AS I
)
INSERT INTO dbo.Sessions WITH (TABLOCK) (username, starttime, endtime)
  SELECT username, starttime,
    DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), starttime) AS endtime
  FROM C;

This code fills the table with 5,000,000 rows.

Feel free to create indexes to support your solution.

 

Good luck!

BG

 

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish