First Look at System-Versioned Temporal Tables-Part 1: Creating Tables and Modifying Data

SQL Server 2016 introduces support for system-versioned temporal tables based on the ISO/ANSI SQL:2011 standard. A table without system versioning enabled holds only the current, most recent, state of its rows. You cannot query past, deleted or pre-updated states of rows. For the purpose of our discussion, I’m ignoring row-versioning capabilities related to concurrency control, like the multi-versioning concurrency control (MVCC) support of the In Memory OLTP engine, and the row versioning support of the snapshot and read committed snapshot isolation levels for disk-based tables. This article is the first in a two-part series that focuses on the ability to keep and query both current and past states of deleted and updated rows in the long run. With system versioning enabled for a table, SQL Server keeps both the current and the past states of modified rows. New query clauses allow you to request to see the data correct to a specified time, or even a specified period.

Note that with system-versioned temporal tables SQL Server is responsible for maintaining the different versions of the rows based on the time the user modifies the data (UTC time based on the system clock where the SQL Server instance runs). This means that you have only current and past versions of rows, and that you cannot define explicitly the validity period of a row like a planned future change. This makes system-versioned temporal tables a good tool for cases where the modification time needs to determine when the row state changes. Examples for such cases include auditing and time travel queries for legal-related purposes.

The SQL standard also supports a feature called application-time period tables, where you can explicitly define the validity period of a row, including in the future. SQL Server 2016 doesn’t support this feature, but let’s hope we’ll see such support in the future, as it will allow much more (and much needed) flexibility. For example, consider a table where you keep product prices. With a system-versioned table, the time when you apply the update is considered the time when the product price changes becomes effective; you need to time your modification very carefully to make sure the price change becomes effective when you need it to. With application-time period tables you define the validity period of the row explicitly, so in our example you can define a future time when the product price change will become effective.

There are a few other features related to temporal tables that the initial implementation in SQL Server 2016 is missing, but looking at the half full glass, it is certainly great to see support for temporal tables starting in SQL Server. Hopefully, it’s just the beginning of many more related features to come as this area is extremely important and practical.

In this series I provide a first look at system-versioned temporal tables. In Part 1 I cover creating tables and modifying data. In Part 2 I cover querying data and optimization considerations.

I’d like to thank several people who participated in discussions on the topic for their insights. From Microsoft: Borko Novakovic (the feature’s PM), Carl Rabeler and Conor Cunningham. Fellow SQL Server MVPs: Hugo Kornelis, Simon Sabin, Matija Lah, Phil Brammer, Louis Davidson, Jeffrey Moden and Erland Sommarskog.

Note: At the date of this writing the latest available public build of SQL Server 2016 is CTP2. Make sure you check the official product documentation for information about any changes and additions in later builds.

Creating System-Versioned Temporal Tables

You can enable system versioning for a table either when you create it as part of the CREATE TABLE statement, or after the fact as part of the ALTER TABLE statement. In order to make a table a system-versioned table it needs the following:

A primary key
The table option SYSTEM_VERSIONING to be set to ON
Two non-nullable DATETIME2(
) columns representing the start and end of the row’s validity period
- The start column should be marked with the option GENERATED ALWAYS AS ROW START
- The end column should be marked with the option GENERATED ALWAYS AS ROW END
Designation of the period columns: PERIOD FOR SYSTEM_TIME (, )
A linked history table (which SQL Server can create for you) to hold the past states of modified rows

In this article I’ll use a sample database called TemporalDB, and within it a system-versioned table called Employees to hold the current state of employee rows and a linked history table called EmployeesHistory to hold the history rows. Use the following code to create the sample database if it doesn’t already exist, and to cleanup any previous Employees and EmployeesHistory tables if they currently exist:

SET NOCOUNT ON;
IF DB_ID(N'TemporalDB') IS NULL CREATE DATABASE TemporalDB;
GO
USE TemporalDB;
GO
IF OBJECT_ID(N'dbo.Employees', N'U') IS NOT NULL
BEGIN
  IF OBJECTPROPERTY(OBJECT_ID(N'dbo.Employees', N'U'), N'TableTemporalType') = 2
    ALTER TABLE dbo.Employees SET ( SYSTEM_VERSIONING = OFF );
  IF OBJECT_ID(N'dbo.EmployeesHistory', N'U') IS NOT NULL
    DROP TABLE dbo.EmployeesHistory;
  DROP TABLE dbo.Employees;
END;

After the aforementioned requirements, the following code creates a system-versioned table called Employees, letting SQL Server create the related history table with the specified name EmployeesHistory:

CREATE TABLE dbo.Employees
(
  empid INT NOT NULL
    CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED,
  mgrid INT NULL
    CONSTRAINT FK_Employees_mgr_emp REFERENCES dbo.Employees,
  empname VARCHAR(25) NOT NULL,
  sysstart DATETIME2(0) GENERATED ALWAYS AS ROW START NOT NULL,
  sysend DATETIME2(0) GENERATED ALWAYS AS ROW END NOT NULL,
  PERIOD FOR SYSTEM_TIME (sysstart, sysend)
)
WITH ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.EmployeesHistory ) );

If the specified history table doesn’t already exist, SQL Server creates a new one based on the current table definition, but without a primary key and with a clustered index on (empid, systart, sysend).

Figure 1 shows how the system-versioned Employees table and the related history table appear in SQL Server Management Studio (SSMS).

Figure 1: Current and history tables in Object Explorer

Observe that the history table doesn’t appear as a separate item, rather below the related current table. Notice also that SSMS identifies the tables as System-Versioned and History in parentheses. If you want to tell programmatically whether a table is a temporal one, you can use the OBJECTPROPERTY function with the TableTemporalType property. The function returns 2 for a system versioned table and 1 for a history table.

If, when defining a table as a system-versioned table, you don’t provide your own name for the history table, SQL Server names it based on the pattern MSSQL_TemporalHistoryFor_.

Suppose that you already had a table called Employees with existing data and you wanted to turn it into a system-versioned table. To achieve this you would need to add the period start and end columns (with defaults since they must be non-nullable), the PERIOD designation, set system versioning to on, and connect the table to either a new or an existing history table. Here’s an example of how your code might look like to achieve this (don’t actually run this since your Employees table is already a system-versioned one):

BEGIN TRAN;

-- Add required period columns and designation
ALTER TABLE dbo.Employees ADD
  sysstart DATETIME2(0) GENERATED ALWAYS AS ROW START NOT NULL
    CONSTRAINT DFT_Employees_sysstart DEFAULT('19000101'),
  sysend DATETIME2(0) GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DFT_Employees_sysend DEFAULT('99991231 23:59:59'),
  PERIOD FOR SYSTEM_TIME (sysstart, sysend);

-- Remove temporary DEFAULT constraints
ALTER TABLE dbo.Employees
  DROP CONSTRAINT DFT_Employees_sysstart, DFT_Employees_sysend;

-- Turn system versioning on
ALTER TABLE dbo.Employees
  SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.EmployeesHistory ) );

COMMIT TRAN;

As I mentioned, the reason the code adds the period columns with default constraints is because they must be non-nullable. Once default values are assigned in the period columns in the new rows, you can drop the default constraints since in future modifications SQL Server will automatically assign the values in those columns based on the modification time.

Once system versioning is enabled for a table, you are not allowed to make DDL changes that add, alter or drop columns in both the current and the history tables. Also, you’re not allowed to modify data directly in the history table. You are allowed to create and drop indexes on the tables. If you need to make a change that is disallowed while system versioning is on, you need to first turn system versioning off, apply the change, and then turn it back on. It’s recommended that you perform the whole process in one transaction in order to take a schema lock and keep it until the change is complete.

To apply such a change your code might look like this (don't actually run it now; this is provided for illustration purposes):

BEGIN TRAN;

-- Turn system versioning off
ALTER TABLE dbo.Employees SET ( SYSTEM_VERSIONING = OFF );

... apply your change here ...

-- Turn system versioning back on
ALTER TABLE dbo.Employees
  SET ( SYSTEM_VERSIONING = ON
    ( HISTORY_TABLE = dbo.EmployeesHistory,
      DATA_CONSISTENCY_CHECK = ON ) );

COMMIT TRAN;

When enabling system-versioning and creating a link to an already existing history table, SQL Server will always perform a structural consistency check to verify that the schema requirements are met (primary key, period columns designation, type, nullability). However, you can control whether SQL Server will also perform a data consistency check (on by default). The data consistency check ensures that the period end column values are greater than or equal to the respective period start column values, and that the periods for the different versions of the same row (same primary key) don’t overlap.

Indexing

SQL Server allows you to control the indexing on the current and history tables. If you prefer to use columnstore technology (requires Enterprise edition) to optimize both storage and performance, you can use clustered columnstore indexes on both tables. But keep in mind that the current table requires a primary key, and a primary key has to be enforced with a unique rowstore B-tree based index. SQL Server allows you to use different indexing on the two tables. For example, over time the history table will likely be substantially bigger than the current table, and hence the use of columnstore technology there is more important. If it makes sense for you to use rowstore technology with the current table, you can do so and use columnstore technology only with the history table.

If you let SQL Server create the history table for you, it will automatically create a rowstore clustered index with page compression enabled, with the key list based on the columns: (, , ). In our example the key list will be (empid, sysstart, sysend). If you create the history table yourself, SQL Server won’t index it for you when you link it to the current table, so you need to remember to do it.

The Employees table in our sample database currently has only a nonclustered index on the empid column, which SQL Server created to enforce the primary key. Use the following code to create a clustered index on the table with the key list based on the primary key column followed by the system period start and end columns:

CREATE UNIQUE CLUSTERED INDEX ix_Employees
  ON dbo.Employees(empid, sysstart, sysend);

If you wanted to create a clustered columnstore index on the current table, you could have done so either as an inline part of the table definition or after the table creation, like so (don't actually run this code since we want to use the clustered rowstore index in our example):

CREATE CLUSTERED COLUMNSTORE INDEX ix_Employees ON dbo.Employees;

As for the history table, as mentioned, if you let SQL Server create it for you, it will also define a clustered index on it. Use the following code to see the indexing that SQL Server defined on our history table:

SELECT name, type_desc,
  STUFF(
    ( SELECT N',' + C.name AS [text()]
  FROM sys.sysindexkeys AS K
    INNER JOIN sys.columns AS C
      ON K.id = C.object_id
      AND K.colid = C.column_id
  WHERE K.id = I.object_id
    AND K.indid = I.index_id
  ORDER BY K.keyno
  FOR XML PATH('') ), 1, 1, N'') AS index_keys
FROM sys.indexes AS I
WHERE object_id = OBJECT_ID(N'dbo.EmployeesHistory');

This query generates the following output:

name         type_desc  index_keys
-------------------- ---------- ----------------------
ix_EmployeesHistory  CLUSTERED  empid,sysstart,sysend

You can also query the sys.partitions view to see that SQL Server used page compression. If you wanted to change the indexing strategy from rowstore to columnstore at this point, you could have done so with the following code (don't actually run this code since we want to use the default rowstore index in our example):

CREATE CLUSTERED COLUMNSTORE INDEX ix_EmployeesHistory
  ON dbo.EmployeesHistory WITH (DROP_EXISTING = ON);

In Part 2 of the series I get into further indexing considerations as part of the discussion about querying data.

Modifying data

One of the benefits of using the built-in feature of system-versioned temporal tables is that SQL Server is responsible for maintaining the older versions in the history table. The application submits its usual modifications against the current table, and SQL Server creates the older versions and writes those to the history table. You could roll your own solution using, for example, triggers, but this would involve more development resources. Furthermore, there are some aspects of the built-in feature that aren’t that straightforward to emulate.

When you insert new rows into the (current) table, SQL Server sets the system period start column value to the insert transaction time in UTC, and the system period end column value to the maximum supported value in the type. Since in our table we used precision zero (one second precision) for the period columns, the maximum supported value is 9999-12-31 23:59:59. Suppose that now is 2015-06-01 19:54:04 UTC (call it T1). You submit the following code at T1 to add two rows:

INSERT INTO dbo.Employees(empid, mgrid, empname)
  VALUES(1, NULL, 'David'),
    (2, 1, 'Eitan');

Note that in an explicit transaction SQL Server records the time when the BEGIN TRAN statement was executed and not the time when the first or current statement within the transaction was executed. For example, suppose that you submit the BEGIN TRAN statement in the following code at T2 (2015-06-01 19:54:20 on my system):

BEGIN TRAN; -- T2

  PRINT 'Transaction start time: ' + CONVERT(CHAR(19), SYSDATETIME(), 121);

  INSERT INTO dbo.Employees(empid, mgrid, empname)
    VALUES(4, 2, 'Seraph'),
      (5, 2, 'Jiru');
  WAITFOR DELAY '00:00:05';

  INSERT INTO dbo.Employees(empid, mgrid, empname)
    VALUES(6, 2, 'Steve');

  PRINT 'Transaction end time: ' + CONVERT(CHAR(19), SYSDATETIME(), 121);

COMMIT TRAN;

Even though there’s a delay of five seconds between the two INSERT statements within the transaction, all rows added by the transaction will have T2 recorded as their system period start time. Also, if there was a delay between the BEGIN TRAN time and the time the first statement within the transaction executed, the BEGIN TRAN time is what would count.

Run the following code to add a few more rows at T3 (2015-06-01 20:01:41 on my system):

INSERT INTO dbo.Employees(empid, mgrid, empname)
  VALUES(8, 5, 'Lilach'),
    (10, 5, 'Sean'),
    (3, 1, 'Ina'),
    (7, 3, 'Aaron'),
    (9, 7, 'Rita'),
    (11, 7, 'Gabriel'),
    (12, 9, 'Emilia'),
    (13, 9, 'Michael'),
    (14, 9, 'Didi');

Query the current table at this point:

SELECT *
FROM dbo.Employees;

On my system I got the following output:

empid      mgrid      empname  sysstart           sysend
---------- ---------- -------- ---------------------- ----------------------
1      NULL       David    2015-06-01 19:54:04    9999-12-31 23:59:59
2      1      Eitan    2015-06-01 19:54:04    9999-12-31 23:59:59
3      1      Ina      2015-06-01 20:01:41    9999-12-31 23:59:59
4      2      Seraph   2015-06-01 19:54:20    9999-12-31 23:59:59
5      2      Jiru     2015-06-01 19:54:20    9999-12-31 23:59:59
6      2      Steve    2015-06-01 19:54:20    9999-12-31 23:59:59
7      3      Aaron    2015-06-01 20:01:41    9999-12-31 23:59:59
8      5      Lilach   2015-06-01 20:01:41    9999-12-31 23:59:59
9      7      Rita     2015-06-01 20:01:41    9999-12-31 23:59:59
10     5      Sean     2015-06-01 20:01:41    9999-12-31 23:59:59
11     7      Gabriel  2015-06-01 20:01:41    9999-12-31 23:59:59
12     9      Emilia   2015-06-01 20:01:41    9999-12-31 23:59:59
13     9      Michael  2015-06-01 20:01:41    9999-12-31 23:59:59
14     9      Didi     2015-06-01 20:01:41    9999-12-31 23:59:59

Naturally, you will get different values in the sysstart column.

Query the history table:

SELECT *
FROM dbo.EmployeesHistory;

Since so far you only added rows, SQL Server had no older versions to write to the history table, and hence it is empty at this point:

empid      mgrid      empname  sysstart           sysend
---------- ---------- -------- ---------------------- ----------------------

When you delete a row from the (current) table, SQL Server moves the row from the current table to the history table with the system period end time set to the modifying transaction’s start time. When you update a row in the (current) table, SQL Server treats the operation as a delete followed by an insert; namely:

1. It moves the pre-updated state of the row to the history table with the system period end time set to the modifying transaction’s start time.

2. It inserts a row with the new, post-updated, state to the current table, with the system period start time set to the transaction’s start time, and the system period end time set to the maximum value in the type.

To demonstrate deletes and updates run the following code at T4 (2015-06-01 20:11:01 on my system):

BEGIN TRAN;

  DELETE FROM dbo.Employees
  WHERE empid IN (13, 14);

  UPDATE dbo.Employees
    SET mgrid = 3
  WHERE empid IN(9, 11);

COMMIT TRAN;

Then make some more changes at T5 (2015-06-01 21:32:20 on my system):

BEGIN TRAN;

  UPDATE dbo.Employees
    SET mgrid = 4
  WHERE empid IN(7, 9);

  UPDATE dbo.Employees
    SET mgrid = 3
  WHERE empid = 6;

  UPDATE dbo.Employees
    SET mgrid = 6
  WHERE empid = 11;

  DELETE FROM dbo.Employees
  WHERE empid = 12;

COMMIT TRAN;

Query the current table:

SELECT *
FROM dbo.Employees;

Table 1 has the output that I got on my system after the changes submitted at T5.

Table 1: Contents of Employees table after changes at T5

empid      mgrid      empname  sysstart           sysend
---------- ---------- -------- ---------------------- ----------------------
1      NULL       David    2015-06-01 19:54:04    9999-12-31 23:59:59
2      1      Eitan    2015-06-01 19:54:04    9999-12-31 23:59:59
3      1      Ina      2015-06-01 20:01:41    9999-12-31 23:59:59
4      2      Seraph   2015-06-01 19:54:20    9999-12-31 23:59:59
5      2      Jiru     2015-06-01 19:54:20    9999-12-31 23:59:59
6      3      Steve    2015-06-01 21:32:20    9999-12-31 23:59:59
7      4      Aaron    2015-06-01 21:32:20    9999-12-31 23:59:59
8      5      Lilach   2015-06-01 20:01:41    9999-12-31 23:59:59
9      4      Rita     2015-06-01 21:32:20    9999-12-31 23:59:59
10     5      Sean     2015-06-01 20:01:41    9999-12-31 23:59:59
11     6      Gabriel  2015-06-01 21:32:20    9999-12-31 23:59:59

Query the history table:

SELECT *
FROM dbo.EmployeesHistory;

Table 2 has the output that I got on my system after T5.

Table 2: Contents of EmployeesHistory table after changes at T5

empid      mgrid      empname  sysstart           sysend
---------- ---------- -------- ---------------------- ----------------------
6      2      Steve    2015-06-01 19:54:20    2015-06-01 21:32:20
7      3      Aaron    2015-06-01 20:01:41    2015-06-01 21:32:20
9      7      Rita     2015-06-01 20:01:41    2015-06-01 20:11:01
9      3      Rita     2015-06-01 20:11:01    2015-06-01 21:32:20
11     7      Gabriel  2015-06-01 20:01:41    2015-06-01 20:11:01
11     3      Gabriel  2015-06-01 20:11:01    2015-06-01 21:32:20
12     9      Emilia   2015-06-01 20:01:41    2015-06-01 21:32:20
13     9      Michael  2015-06-01 20:01:41    2015-06-01 20:11:01
14     9      Didi     2015-06-01 20:01:41    2015-06-01 20:11:01

Naturally, you will get different values based on the times you run your transactions. You can map your transaction start times to T1 through T5 and find those in the outputs that you get.

What’s next?

In this article I introduced system-versioned temporal tables—a new feature in SQL Server 2016. I explained how to enable system versioning for a table, indexing considerations, and what happens when you modify data. I explained the benefits compared to a custom solution in terms of the savings of development and testing resources. But wait, there’s more! In Part 2 I cover querying data and optimization considerations.

Comments

Plain text