SQL Server 2012 introduced support for the sequence object as an alternative to the identity column property for generating surrogate keys. I covered the sequence object in the articles "SQL Server 2012 T-SQL at a Glance – Sequences," "Sequences, Part 1," and "Sequences, Part 2."
Recently, I learned some surprising things about how SQL Server handles caching for sequence and identity; those discoveries have resulted in new recommendations and best practices that I'll share with you in this article.
I'd like to thank the following people for sharing not only information about this topic but also their own insights on the topic: Tobias Ternstrom, Umachandar Jayachandran, Erland Sommarskog, Hugo Kornelis, Rick Byham, and Ola Hallengren.
Note that the information in this article is based on testing on my own system, with SQL Server 2014 RTM and SQL Server 2012 SP2. Microsoft might change the default cache values without notice, so make sure to do your own testing before making decisions about the optimal cache size in your environment.
The Sequence CACHE Property
The sequence object's CACHE property defines how often SQL Server needs to write a recoverable sequence value to disk. For example, suppose that you create a sequence with MINVALUE 1 and CACHE 50 (also the undocumented default). When you request the first value, SQL Server writes to disk the recoverable value 50 and keeps in memory two members with the size of the sequence type—one holding the current value (1) and the other holding how many values remain (49) before the next cache-related disk write will take place. After the first 50 requests, the memory members hold the values current: 50 and remaining: 0. The next request will result in a write to disk of the recoverable value 100 and writes to memory of the values current: 51 and remaining: 49.
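To make this concrete, here's a minimal sketch of creating such a sequence with an explicit cache size and requesting the first value (the sequence name is arbitrary):

```sql
CREATE SEQUENCE dbo.Seq1 AS INT
  MINVALUE 1
  CACHE 50;

-- Returns 1; SQL Server writes the recoverable value 50 to disk
-- and keeps current (1) and remaining (49) in memory
SELECT NEXT VALUE FOR dbo.Seq1;
```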
If you shut down SQL Server cleanly with the SHUTDOWN statement or if you stop the SQL Server service from SQL Server Management Studio (SSMS), the Services applet, or SQL Server Configuration Manager, the current value is written to disk. But if there's an unclean termination of the process, such as in a power failure event, the current value isn't written to disk, and the recovery process will recover the on-disk value. In this case you can end up with a gap of up to the cache property value.
What some people don't realize is that even with no cache, neither identity nor sequence guarantees that you won't have gaps between the values. Upon request for a new value, SQL Server locks the internal resource only to increment the value but then immediately releases the lock to allow others to obtain new values. SQL Server doesn't wait until the transaction containing the request completes. Therefore, if the transaction is rolled back, the identity or sequence value change isn't undone. The next request for a value will be after a gap.
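You can observe this behavior with a quick test (the sequence name here is arbitrary):

```sql
CREATE SEQUENCE dbo.SeqGap AS INT MINVALUE 1;

BEGIN TRAN;
  SELECT NEXT VALUE FOR dbo.SeqGap; -- returns 1
ROLLBACK TRAN;                      -- the sequence increment is not undone

SELECT NEXT VALUE FOR dbo.SeqGap;   -- returns 2, leaving a gap
```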
When using identity or sequence, irrespective of the cache value you use, you must accept the fact that you can end up with gaps. If this is unacceptable, you'll need to use an alternative solution for generating keys. With this in mind, to determine the appropriate cache size for your environment, you must ask yourself how many values you're willing to lose in one shot if there's an unclean termination of the process. The greater the value, the better the performance you'll get. Later in this article, I share the results of a test I ran comparing performance with different cache values.
Default Cache Values
Microsoft doesn't publish the default cache values it uses for sequence and identity because it wants to be able to change them without notice as it sees fit. The SQL Server Books Online entry for the CREATE SEQUENCE command includes the following note concerning the default sequence cache value:
If the cache option is enabled without specifying a cache size, the Database Engine will select a size. However, users should not rely upon the selection being consistent. Microsoft might change the method of calculating the cache size without notice.
Even though the default cache values aren't officially published, you can figure them out with a simple test. I'll demonstrate a test with the INT type and then provide the results I obtained for other types as well.
The first step in the test is to create a sequence with the default cache and a table with an identity column in a user database (testdb) by running the following code:
IF DB_ID(N'testdb') IS NULL CREATE DATABASE testdb;
USE testdb;

IF OBJECT_ID(N'dbo.Seq1', N'SO') IS NOT NULL DROP SEQUENCE dbo.Seq1;
IF OBJECT_ID(N'dbo.T1', N'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE SEQUENCE dbo.Seq1 AS INT MINVALUE 1;
CREATE TABLE dbo.T1(keycol INT IDENTITY);
The next step is to request a couple of values by running the following code:
SELECT NEXT VALUE FOR dbo.Seq1;
GO 2

INSERT INTO dbo.T1 DEFAULT VALUES;
GO 2
Next, run the following code to query the current sequence and identity values:
SELECT current_value
FROM sys.sequences
WHERE object_id = OBJECT_ID(N'dbo.Seq1');

SELECT IDENT_CURRENT(N'dbo.T1');
You will get 2 as both the current sequence and the current identity values.
To force an unclean termination of the SQL Server process, open Task Manager (Ctrl+Shift+Esc), right-click the SQL Server process, and choose End task.
Next, restart the SQL Server process (from SSMS, the Services applet, or SQL Server Configuration Manager). Then query the current sequence and identity values again:
USE testdb;

SELECT current_value
FROM sys.sequences
WHERE object_id = OBJECT_ID(N'dbo.Seq1');

SELECT IDENT_CURRENT(N'dbo.T1');
I obtained the current values 50 for sequence and 1001 for identity. The default for the INT type with sequence is 50 and with identity is 1000. The reason the current value for identity after restart is 1001 and not 1000 is that the first cache write occurs after the second request, whereas with sequence it occurs after the first request.
I ran a similar test to check the default values for the other types. Figure 1 shows the results.
type      sequence  identity
--------- --------- ---------
TINYINT   50        10
SMALLINT  50        100
INT       50        1000
BIGINT    50        10000
NUMERIC   50        10000
As you can see, sequence has the default 50 with all types, but, curiously, identity has a different default depending on the type.
Running a similar test with identity in SQL Server versions prior to SQL Server 2012 shows that in those versions, identity has no cache. This explains why in SQL Server 2012 and later, people started getting a gap of 1000 after an unclean termination of the process or a failover in an AlwaysOn Availability Group. In fact, some consider this behavior a bug, as the Microsoft Connect entry "Failover or Restart Results in Reseed of Identity" shows.
However, the reality is that just as I explained previously, even with no cache you can have gaps. Nevertheless, Microsoft introduced trace flag 272 to allow you to revert to pre-SQL Server 2012 behavior, in which identity uses no cache. Just keep in mind that disabling the cache will have performance repercussions, as the next section demonstrates.
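If you do decide you need the pre-SQL Server 2012 identity behavior, you can enable the trace flag globally like so (it can also be set as the -T272 startup parameter); again, test the performance impact first:

```sql
DBCC TRACEON(272, -1);  -- disable identity caching (pre-2012 behavior)
-- ...
DBCC TRACEOFF(272, -1); -- restore the default identity caching
```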
Performance of Cache Values
The performance of generating surrogate keys with identity and sequence can vary depending on several factors, including whether you create the values in tempdb or a user database, which of the two features you use, and the cache value you use. I ran a performance test to measure the effects of the different factors and to compare the performance of sequence with identity. I used the code in Listing 1 to prepare the environment for the test.
-- Create database testdb
IF DB_ID(N'testdb') IS NULL CREATE DATABASE testdb;
ALTER DATABASE testdb SET RECOVERY SIMPLE;

-- Create sequence in testdb
USE testdb;
IF OBJECT_ID(N'dbo.Seq1', N'SO') IS NOT NULL DROP SEQUENCE dbo.Seq1;
CREATE SEQUENCE dbo.Seq1 AS INT MINVALUE 1;

-- Create helper function GetNums in testdb
IF OBJECT_ID(N'dbo.GetNums', N'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
             FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
GO

-- Create sequence in tempdb
USE tempdb;
IF OBJECT_ID(N'dbo.Seq1', N'SO') IS NOT NULL DROP SEQUENCE dbo.Seq1;
CREATE SEQUENCE dbo.Seq1 AS INT MINVALUE 1;
GO
The preparation includes creating a user database called testdb, creating a helper function called GetNums that returns a sequence of integers in the requested range, and creating sequences in both the user database and tempdb.
I then used the code in Listing 2 to conduct the actual performance test.
-- To enable TF 272: DBCC TRACEON(272, -1); to disable: DBCC TRACEOFF(272, -1)
SET NOCOUNT ON;
--USE tempdb; -- to test in tempdb
USE testdb;   -- to test in user database testdb

DECLARE
  @numrecords AS INT,
  @sizemb     AS NUMERIC(12, 2),
  @logflushes AS INT,
  @starttime  AS DATETIME2,
  @endtime    AS DATETIME2;

CHECKPOINT;

BEGIN TRAN

ALTER SEQUENCE dbo.Seq1 CACHE 50; -- try with NO CACHE, 10, 50, 1000, 5000, 10000

IF OBJECT_ID(N'dbo.T', N'U') IS NOT NULL DROP TABLE dbo.T;

-- Stats before
SELECT
  @numrecords = COUNT(*),
  @sizemb     = SUM(CAST([Log Record Length] AS BIGINT)) / 1048576.,
  @logflushes =
    (SELECT cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Log Flushes/sec'
       AND instance_name = 'testdb'  -- to test in testdb
    -- AND instance_name = 'tempdb'  -- to test in tempdb
    )
FROM sys.fn_dblog(NULL, NULL);

SET @starttime = SYSDATETIME();

-- Actual work
SELECT
  CAST(n AS INT) AS n            -- to test without seq or identity
-- NEXT VALUE FOR dbo.Seq1 AS n  -- to test sequence
-- IDENTITY(INT, 1, 1) AS n      -- to test identity
INTO dbo.T
FROM testdb.dbo.GetNums(1, 10000000) AS N
OPTION(MAXDOP 1);

-- Stats after
SET @endtime = SYSDATETIME();

SELECT
  COUNT(*) - @numrecords AS numrecords,
  SUM(CAST([Log Record Length] AS BIGINT)) / 1048576. - @sizemb AS sizemb,
  (SELECT cntr_value
   FROM sys.dm_os_performance_counters
   WHERE counter_name = 'Log Flushes/sec'
     AND instance_name = 'testdb'  -- to test in testdb
  -- AND instance_name = 'tempdb'  -- to test in tempdb
  ) - @logflushes AS logflushes,
  DATEDIFF(ms, @starttime, @endtime) AS durationms
FROM sys.fn_dblog(NULL, NULL);

COMMIT TRAN

CHECKPOINT;
The code measures the performance of populating a table with 10,000,000 rows, without sequence or identity, with sequence, with identity, in both the user database and in tempdb, with different cache values. The code measures the amount of logging (number of records and total size) by querying the fn_dblog function before and after the insert.
The measurements and the insert are executed in the same transaction to prevent the log from recycling itself. The code also measures the number of log flushes caused by the insert, by querying the performance counter Log Flushes/sec from the sys.dm_os_performance_counters view. Despite the fact that the counter name implies that it's a per-second measure, in practice it's simply the total number of flushes so far.
Figure 2 shows the results of the performance test I ran on my system.
Most of the columns are self-explanatory. The normduration column is durationms minus the duration of the insert without generating sequence or identity values. This measure gives you a sense of the time that should be attributed to the generation of the 10,000,000 sequence or identity values without the time of the insert itself.
Notice a few very interesting things about the results. The performance of identity and sequence in tempdb is quite similar when using the same cache values. Also, in tempdb the performance is much better than in a user database, especially with small cache values. The reason for this is that in tempdb there's no reason to log the cache-related writes to disk since upon restart nothing needs to be recovered. In a user database, a log record has to be written for every cache-related write to disk.
There's a curious, significant difference in performance between sequence and identity in a user database with no cache or low cache values. For example, observe that with no cache it takes less than 2 minutes to generate 10,000,000 identity values but almost an hour to generate that many sequence values. That's despite the fact that the amount of logging is similar.
The reason for this difference has to do with how the log buffer flushes are handled in the two cases. With identity, if a failure occurs, there's no problem upon recovery if a value that was inserted but not committed is reused. Therefore, there's no reason to flush the log buffer after every cache-related write. The log buffer is flushed when it's full.
With sequence, the situation is different. A client application can request a new sequence value and already use it even if the transaction didn't commit. If a failure occurs, there's no way to request the new sequence value back from the client. Therefore, SQL Server flushes the log buffer after every cache-related write to disk. Observe in Figure 2 that a sequence with no cache in a user database has as many (or more) log flushes as the number of sequence values generated, whereas identity with no cache has significantly fewer flushes.
Test Before Using
When you need to store transient data somewhere and generate surrogate keys for the rows, use tempdb whenever possible. Both sequence and identity perform significantly better in tempdb than in a user database. There's no logging or log flushes related to cache writes to disk. However, remember that even in tempdb, the performance of identity and sequence degrades as the cache value gets smaller. Therefore, when using a sequence, you need to specify a large cache value such as 10,000. With identity, you should avoid enabling trace flag 272. (This is generally true, not just in tempdb.) Remember that in any case, neither identity nor sequence guarantees that you won't have gaps between values.
In a user database, if you want to get similar performance when converting from identity to sequence, you need to compensate for the forced log flushes. This means that you need to use a cache value for sequence that's greater than the value identity uses for the same type. For example, with the INT type, identity uses a cache value of 1000; therefore, to get similar performance with sequence, you need to use a larger value such as 5000. In a user database, you need to avoid using sequence with NO CACHE or very low cache values because doing so results in very poor performance due to the frequent log flushes.
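For example, assuming an existing INT sequence called dbo.Seq1 that's replacing an identity column, you could set a larger cache as part of the conversion (the specific value 5000 is just the illustrative figure from the discussion above; test in your own environment):

```sql
ALTER SEQUENCE dbo.Seq1 CACHE 5000; -- larger than identity's default of 1000 for INT
```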
The information in this article is based on testing on my system, with SQL Server 2014 RTM and SQL Server 2012 SP2. Microsoft might change the default cache values without notice; therefore, you should do your own testing before making decisions about the optimal cache size for your environment.