Skip navigation

T-SQL Puzzle – Grouping consecutive rows with a common element

You will be working with a table called Attendance containing attendance
information of students in classes. Run the following code to create the
Attendance table and populate it with sample data:

SET NOCOUNT ON;
USE tempdb;
GO
IF OBJECT_ID('dbo.Attendance', 'U') IS NOT NULL
  DROP TABLE dbo.Attendance;
GO
CREATE TABLE dbo.Attendance
(
  student VARCHAR(10) NOT NULL,
  dt      DATETIME    NOT NULL,
  slot    INT         NOT NULL,
  attend  BIT         NOT NULL,
  PRIMARY KEY(student, dt, slot)
);

INSERT INTO dbo.Attendance VALUES('A', '20060801', 1, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060801', 2, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060801', 3, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060802', 1, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060802', 2, 0);
INSERT INTO dbo.Attendance VALUES('A', '20060802', 3, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060802', 4, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060803', 1, 1);
INSERT INTO dbo.Attendance VALUES('A', '20060803', 2, 0);
INSERT INTO dbo.Attendance VALUES('A', '20060803', 3, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060801', 1, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060801', 2, 1);
INSERT INTO dbo.Attendance VALUES('B', '20060801', 3, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060802', 1, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060802', 2, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060802', 3, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060802', 4, 0);
INSERT INTO dbo.Attendance VALUES('B', '20060803', 1, 1);
INSERT INTO dbo.Attendance VALUES('B', '20060803', 2, 1);
INSERT INTO dbo.Attendance VALUES('B', '20060803', 3, 0);
Contents of the Attendance table:
SELECT * FROM dbo.Attendance;

student    dt         slot        attend
---------- ---------- ----------- ------
A          2006-08-01 1           1
A          2006-08-01 2           1
A          2006-08-01 3           1
A          2006-08-02 1           1
A          2006-08-02 2           0
A          2006-08-02 3           1
A          2006-08-02 4           1
A          2006-08-03 1           1
A          2006-08-03 2           0
A          2006-08-03 3           0
B          2006-08-01 1           0
B          2006-08-01 2           1
B          2006-08-01 3           0
B          2006-08-02 1           0
B          2006-08-02 2           0
B          2006-08-02 3           0
B          2006-08-02 4           0
B          2006-08-03 1           1
B          2006-08-03 2           1
B          2006-08-03 3           0
The attendance table contains a row for each student, date and time slot (a
session). In a given date there might be several time slots (sessions)
represented by their ordinals (1, 2, 3, ...). As you can see in the sample data,
different dates can have different numbers of sessions.
The attend column represents the attendance status (1 - student attended the
session, 0 - student did not attend the session).

Your task is to write a set-based query (no cursors) that groups consecutive
rows (order determined by dt, slot) with the same attendance status for each
student. For each group, you need to return a row with the following attributes:
student, from date, from slot, to date, to slot, attendance status, number of
sessions/slots in the group. The tricky part is that the time slots/sessions are
not represented by a single attribute, rather by the combination of attributes
dt and slot.

For example, the following set of rows represents for student A a
consecutive period of time with the same attendance status:
student    dt                      slot        attend
---------- ----------------------- ----------- ------
A          2006-08-01 00:00:00.000 1           1
A          2006-08-01 00:00:00.000 2           1
A          2006-08-01 00:00:00.000 3           1
A          2006-08-02 00:00:00.000 1           1
Here, student A had no attendance status change from date 2006-08-01, slot 1
through date 2006-08-02, slot 1.
So the output should show the following row for this group of sessions:
student from_dt    from_slot  to_dt      to_slot  attend cnt
------- ---------- ---------- ---------- -------- ------ ----
A       2006-08-01 1          2006-08-02 1        1      4
Description of output columns:
from_dt, from_slot: date and slot when period started
to_dt, to_slot: date and slot when period ended
attend: attendance status in the period (1 - student attended session,
0 - student did not attend session)
cnt: number of slots/sessions in the period

Here’s how the complete output should look like, sorted by student,
from_dt, from_slot:

student from_dt    from_slot  to_dt      to_slot  attend cnt
------- ---------- ---------- ---------- -------- ------ ----
A       2006-08-01 1          2006-08-02 1        1      4
A       2006-08-02 2          2006-08-02 2        0      1
A       2006-08-02 3          2006-08-03 1        1      3
A       2006-08-03 2          2006-08-03 3        0      2
B       2006-08-01 1          2006-08-01 1        0      1
B       2006-08-01 2          2006-08-01 2        1      1
B       2006-08-01 3          2006-08-02 4        0      5
B       2006-08-03 1          2006-08-03 2        1      2
B       2006-08-03 3          2006-08-03 3        0      1
Remember, no cursors. Besides the solution’s code, please provide some
narrative explaining the logic behind your solution.

You have a week to work on the solution. We will give away two prizes: one
winner will be chosen randomly out of those who provided correct solutions;
another winner will be chosen based on the fastest correct solution.
Performance will be measured against a table with around 100,000 rows
(100 students, 365 days, 3 slots a day, random attendance status). I will use
the following code to populate the table for performance measures:
TRUNCATE TABLE dbo.Attendance;

INSERT INTO dbo.Attendance
  SELECT
    'S' + RIGHT('00000000' + CAST(Students.n AS VARCHAR(9)), 9) AS student,
    DATEADD(day, Dates.n - 1, '20060101') AS dt,
    Slots.n AS slot,
    ABS(CHECKSUM(NEWID())) % 2 AS attend
  FROM
    dbo.Nums AS Students,
    dbo.Nums AS Dates,
    dbo.Nums AS Slots
  WHERE Students.n 
Nums is an auxiliary table of numbers containing a column called n with a 

sequence of integers (1, 2, 3, ...).
You are not allowed to make design/schema changes to the Attendance table,
but you are allowed to create indexes as you see fit.

Enjoy, and good luck!

--
BG
 
Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish