At first glance, it seems there's nothing special to consider before you use the IDENTITY column property, SQL Server's auto-numbering mechanism. But especially in large environments, you'll first want to understand some of IDENTITY's workings and limitations. In some cases, you might prefer to use your own auto-numbering or sequencing mechanism.To help you decide on the best course, I provide some facts about IDENTITY, discuss some of its limitations, and show two alternatives.
Here are a few things that you should know about the IDENTITY property:
- You can use IDENTITY not only with integers, but also with any numeric data type that has a scale of 0. This is a good thing, especially when you need to support numbers larger than the largest one that the integer data type can hold.
- IDENTITY, combined with a PRIMARY KEY or UNIQUE constraint, lets you provide a simple unique row identifier without the need to maintain it yourself. SQL Server does the work for you.
- Unlike Oracle's Sequence function, SQL Server doesn't provide an independent sequencing mechanism. Instead, the sequence provided by the IDENTITY property applies to a single table. When you need a sequencing mechanism that generates numbers for use in different tables—as, for example, with keys that must not overlap across tables—you'll want to choose an alternative to IDENTITY.
- The IDENTITY property and the column it's attached to are inseparable, which means that you can't add or remove IDENTITY from an existing column.You can use workarounds, but these can be very slow, especially with large tables. For example, to add IDENTITY to an existing column, you can create a new table, copy the data from the original table, drop the original table, and then give the new table the original table's name.You can use a similar process to remove IDENTITY from an existing column. Remember, though, that these activities require table downtime—which can be lengthy for large tables—and lots of logging.
- When you use an INSERT statement against a table that has an IDENTITY column, the IDENTITY value increases whether the INSERT succeeds or fails—even if it fails on a PRIMARY KEY or CHECK constraint violation. If a transaction is rolled back, whether implicitly or explicitly, the new IDENTITY column value isn't rolled back, so you might end up with gaps in your sequencing even when you're not deleting any rows. This can be a problem when you're creating sequences that can't contain gaps, such as for invoice numbers.
- You can't update partitioned views that are based on tables that have an IDENTITY column.
- The TRUNCATE TABLE statement resets the IDENTITY property in the table to the seed. To retain the IDENTITY counter, use DELETE instead of TRUNCATE TABLE, but keep in mind that DELETE is much slower because it's fully logged.
Creating Custom Sequencing
Because of the IDENTITY limitations I've described, you might prefer to use your own sequencing mechanism. Let's look at alternatives for two common situations.
When you want the change in the sequence to be rolled back when a failure occurs or a transaction is rolled back, you need a mechanism that will lock the sequence, causing requests for subsequent sequence values to wait in a queue. In this case, for each sequence that you need to maintain, you can simply create a table with one row and one column, like this:
CREATE TABLE Seq(val INT NOT NULL); INSERT INTO Seq VALUES(0);
When you need a new sequence value, run the following specialized update, which increments and then retrieves the sequence value:
DECLARE @i AS INT; UPDATE Seq SET @i = val = val +1; SELECT @i;
Every time you run this update, SQL Server places an exclusive lock on the sequence, increments the value, stores it in the @i variable, and then releases the lock. You can then use the @i variable for any purpose. Note that if this code appears in an external transaction, SQL Server releases the lock only when the whole transaction is complete. Thus, if the transaction fails, the sequence won't be incremented and no gaps or unaccounted-for numbers will be generated.
You can encapsulate this logic in a stored procedure that returns the new sequence value as an output parameter:
CREATE PROC usp_newseq @v AS INT OUTPUT AS UPDATE Seq SET @v = val = val + 1;
Whenever you need a new sequence value, call the procedure, like this:
DECLARE @i AS INT; EXEC usp_newseq @v = @i OUTPUT; SELECT @i;
If you want an independent sequencing mechanism and you don't care about gaps in the sequence numbers, there's an even more efficient approach than the one I've just described. If gaps aren't a problem, there's no reason to lock the sequence until the end of the transaction. You can release it as soon as the value is incremented and you retrieve the sequence value. However, there's no way to tell SQL Server within an existing transaction that you want a certain part of the transaction to be committed, releasing the lock. SQL Server holds an exclusive lock until the end of the transaction, no matter what you do. But remember what I mentioned earlier about how the IDENTITY property increments regardless of whether a transaction succeeds or fails. There's a reason for this behavior. SQL Server locks the IDENTITY resource for a table only for the time required to increment the resource.
With this in mind, you can create a custom sequence based on the IDENTITY property. Create a table to maintain the sequence values, but this time with a column that has an IDENTITY property and no rows at all:
CREATE TABLE Seq2 (val INT NOT NULL IDENTITY);
Every time you need a new sequence, use the code in Listing 1, which inserts a new row in the sequence table and retrieves the new IDENTITY value generated using the SCOPE_IDENTITY function. But there's a problem: Because all you do is insert a new row and then retrieve the new value, the sequence table keeps growing and you must clear the table once in a while. You can't clear it using the fast TRUNCATE TABLE statement because TRUNCATE TABLE resets the IDENTITY value to the seed. You can use a DELETE statement to clear the table, but as I mentioned earlier, DELETE can be slow because it's fully logged.
Here's a trick: Use a transaction with a savepoint to roll back the INSERT without affecting any external transaction in which this code might appear. You'll continue to get new sequence values, but the sequence table will never get a commit row and therefore won't continue to grow. You might want to encapsulate this code in a stored procedure like the one in Listing 2.
When you need a new sequence value, use this code:
DECLARE @i AS INT; EXEC usp_newseq2 @v = @i OUTPUT; SELECT @i;
Ironically, you've used IDENTITY to avoid using IDENTITY. But of course, you've used IDENTITY in the sequence table to avoid using it in your base data tables, and now you have an independent sequencing mechanism.
Choosing IDENTITY or a Custom Sequence
Now that you've seen some of the limitations of using the IDENTITY property and also some alternatives, you can choose the best course for your situation. There are times when using the IDENTITY property makes sense. For example, you can let SQL Server generate key values in your table when the key column has no special meaning other than uniquely identifying the row. Before doing this, though, make sure you understand the limitations I've discussed here. For example, using IDENTITY to generate invoice numbers is a bad choice because failed INSERT statements generate gaps in the identity sequence.
When you need an independent sequencing mechanism (e.g., invoice numbers that you maintain in multiple tables), you'll have to develop your own. I've shown you two techniques, one that prevents gaps but is less scalable, and one that allows gaps and is more scalable.
A Note About Puzzles
A few months ago, when I first introduced logical puzzles to my column, I intended to provide a few puzzles, then stop. I wanted to show how logical SQL and T-SQL are and encourage you to practice logical puzzles. Besides being fun, puzzles improve your T-SQL problem-solving skills.
The feedback I've received about logical puzzles has been amazing, and I thank everyone who's contacted me. At times, the logical puzzles have spurred more interest than the T-SQL content, so I've decided to continue the puzzles. This month I'm introducing a new one (actually it's an old puzzle, but a new one for me) that I got from my father. You can find it in the sidebar "The Logical Puzzle." Enjoy!