UDFs Provide a New IDENTITY

Downloads
16354.zip

Use SQL Server 2000's UDFs to solve database programming problems

Recently, I was pondering the complexities of IDENTITY values, and I ran across a user-defined function (UDF) trick that Zoltán Kovács, a SQL Server MCT from Hungary, created. His use of UDFs prompted me to write this article, which shows you how SQL Server 2000's new UDF feature can provide an alternative to the IDENTITY property.

Using the IDENTITY property to create IDENTITY columns lets you fill those columns with automatically incremented values without user intervention. However, if many DELETE and INSERT statements modify the database, the final IDENTITY column values won't be consecutive because when you delete a row, SQL Server doesn't reuse its IDENTITY value. The standard data type for IDENTITY columns is int, usually unsigned, which gives you values up to 2,147,483,647. However, if you expect a table to have more than 2 billion insertions in its lifetime, you can select a different data type (such as numeric) for the IDENTITY column. You might use SQL Server 2000's new bigint data type to increase the limit to 9,223,372,036,854,775,807 INSERT operations. But bigint uses 8 bytes of storage space, making it more expensive than the int data type. If you don't expect to have more than 2 billion insertions into the table, you might apply a technique to automatically reuse lost IDENTITY values. Or, you can use UDFs instead of IDENTITY.

Providing Default Numeric Values

Any SQL Server built-in function can be a default value for a particular field. You can extend SQL Server 2000's functionality by defining your own scalar UDFs. In this way, you encapsulate complex operations in an easy-to-use function, and invoke your own UDF wherever SQL Server allows an expression. (For more information about UDFs, see Robert Vieira, "User-Defined Functions," November 2000.) You can use a UDF to provide a default value for a primary key column in the same way that you might use the NEWID() function or the IDENTITY property to provide a default value. For example, let's look at an IDTest table with two columns: ID and value.

CREATE TABLE IDTest(
	ID int PRIMARY KEY CLUSTERED,
	value varchar(40) NULL
)
GO

To use the IDTest table, you must supply values for the ID column because it isn't an IDENTITY column. Let's create a UDF called fn_GetNewID() to return unique autonumbered values for the ID column:

CREATE FUNCTION dbo.fn_GetNewID()
  RETURNS int
AS
BEGIN
DECLARE @newid AS int
IF EXISTS (SELECT * FROM IDTest WHERE ID = 1)
  SELECT @newid = MIN(ID) + 1 FROM IDTest AS A
WHERE NOT EXISTS(SELECT * FROM IDTest 
  WHERE ID = A.ID + 1)
ELSE
  SET @newid = 1
RETURN @newid
END

The fn_GetNewID() function returns the first available out-of-sequence ID value. Whenever you insert a new row, fn_GetNewID() searches the IDTest table for the first nonconsecutive entry and returns that ID value.

Before you apply fn_GetNewID() to generate the IDTest table's default ID column values, let's try to predict the function's performance. Consider the following query:

SELECT @newid = MIN(ID) + 1 FROM IDTest AS A
WHERE NOT EXISTS(SELECT * FROM IDTest 
WHERE ID = A.ID + 1)

The number of scans required to execute this query equals the number of rows in the IDTest table because SQL Server must test the ID value in every row for the nonexistence of the ID+1 value. (In this case, the scans are clustered index scans because this table has a clustered index.) If, during the execution of this query, another connection exclusively locks any record in the IDTest table, that connection blocks the query, so the proposed fn_GetNewID() function isn't very scalable.

However, you could avoid searching for nonconsecutive numbers by storing every end-of-sequence number for the IDTest table in another table called sysIDDiffs. The process for tracking end-of-sequence values, as State 1 in Figure 1, page 46, shows, begins with the IDTest table empty. You keep only one row in the sysIDDiffs table—the row with ID = 0. In State 2 in Figure 1, inserting one row in IDTest gives you ID = 1. The sysIDDiffs table contains only one row—ID = 1—the last ID sequence in IDTest. In State 3 in Figure 1 (after you insert nine more rows), the only entry in sysIDDiffs is ID = 10, because 10 is the last value in IDTest's ID sequence.

However, in State 5 in Figure 2, page 46, you delete rows 3, 5, 8, and 9, and the new end-of-sequence values in sysIDDiffs become 2, 4, 7, and 11. In State 6 in Figure 2, a new row receives ID = 3, the first end-of-sequence entry available in sysIDDiffs. Because 2 is no longer an end-of-sequence value, you remove it from sysIDDiffs. The only values remaining in sysIDDiffs are 4, 7, and 11.

You might store only one value in the sysIDDiffs table--a value that corresponds to the lowest end-of-sequence number. But if you did, every time you inserted a new row, you'd have to recalculate the entry in sysIDDiffs by using a technique similar to the initially proposed fn_GetNewID() function. This approach would minimize the benefits of the double-table structure.

Now you can use the proposed fn_GetNewID() function to search the sysIDDiffs table for the lowest available end-of-sequence value and, eventually, insert this value into the IDTest table. However, you can't modify tables inside a UDF, so you must use a different procedure to update sysIDDiffs' contents. You create triggers on the IDTest table to update the contents of sysIDDiffs whenever the ID field in the IDTest table gets a new value. As you use this strategy for more tables, you can identify each row in sysIDDiffs by the table to which the value belongs.

To create the sysIDDiffs table, enter

CREATE TABLE sysIDDiffs(
TableID int,
ID int,
CONSTRAINT PK_sysIDDiffs PRIMARY KEY CLUSTERED (TableID, ID))
GO

where TableID is the value that system function OBJECT_ID(@ObjectName) returns. For the IDTest table, the TableID field is equal to OBJECT_ID('IDTest'). Having a clustered index on (TableID, ID) guarantees fast data retrieval. Using the clustered index, sysIDDiffs will be smaller than IDTest. The worst-case scenario occurs if you delete every other row in IDTest. In that case, IDTest and sysIDDiffs will have almost the same number of rows. Even then, however, you benefit from the fact that the relatively small row size of sysIDDiffs results in faster access than the production table, which typically has more columns than IDTest does.

To modify the fn_GetNewID() function so that it accepts TableID as a parameter and searches for values in the sysIDDiffs table, use the following code:

ALTER FUNCTION dbo.fn_GetNewID(
@TableID int)
RETURNS int
AS
BEGIN
		DECLARE @ID int
		
		SELECT @ID = MIN(ID)
			FROM sysIDDiffs (XLOCK)
			WHERE TableID = @TableID
		SET @ID = ISNULL(@ID, 0) + 1
		RETURN @ID
END

The modified fn_GetNewID() function searches for the minimum value of ID, which corresponds to the table defined by @TableID (the result of the OBJECT_ID() function). Fn_GetNewID() then stores this minimum value in the integer variable @ID. If the sysIDDiffs table doesn't contain a TableID entry, the SELECT statement returns NULL. Using the ISNULL() function, fn_GetNewID() determines whether @ID is still null. If it is, ISNULL() uses the second argument, converting @ID to 0; otherwise, @ID keeps its value. Regardless of whether @ID keeps its value, fn_GetNewID() increases @ID by 1 so that fn_GetNewID() retrieves the next end-of-sequence value. Last, fn_GetNewID() returns @ID's value.

Note that the fn_GetNewID() function uses the XLOCK optimizer hint, which Microsoft introduced with SQL Server 2000, after the sysIDDiffs table reference in the FROM clause. Using XLOCK here forces an exclusive lock on the selected row, which prevents other connections from modifying this entry or reading it until the transaction that holds the lock terminates. In this situation, XLOCK prevents two connections from getting the same value at the same time, but it doesn't prevent other connections from inserting other values into sysIDDiffs.

Using Triggers to Maintain sysIDDiffs

To modify the IDTest table so that it uses the new fn_GetNewID() function as a default constraint to generate ID column values, execute the following code:

ALTER TABLE IDTest
ADD CONSTRAINT def_IDTest
DEFAULT (dbo.fn_GetNewID(OBJECT_ID('IDTest'))) FOR ID

The ID column doesn't accept NULL values because this column has a primary key constraint, but SQL Server evaluates the default constraint first. In this case, the fn_GetNewID() function retrieves a valid value for the ID column before checking the primary key constraint. Therefore, the following statements are valid:

INSERT IDTest (Value)
VALUES ('test')

Let's take a closer look at this functionality: To evaluate a default constraint on the IDTest table, you select data from a different table, the sysIDDiffs table. This functionality wasn't available in earlier SQL Server releases because it requires using UDFs, which are a new feature with SQL Server 2000. You could also use UDFs in CHECK constraints, expanding the possibilities to check for values in other rows or tables. In theory, DEFAULT provides constant values for columns. You don't want to provide a constant value for a primary key column because such a column won't accept duplicates. But if you use UDFs, you can provide a unique default value for each row.

You're now ready to create triggers to update sysIDDiffs whenever an INSERT, DELETE, or UPDATE occurs in IDTest. You can consider any Data Manipulation Language (DML) statement as a DELETE-INSERT sequence. INSERT statements don't have the DELETE component, and DELETE statements don't have the INSERT component, but you can think about UPDATE statements as a DELETE followed by an INSERT. Web Listing 1 shows how you can create one trigger on the IDTest table to trace all three actions (for download instructions, see the More on the Web box, page 48).

The trigger has two sections. First, the trigger applies the required DELETE operations in the sysIDDiffs table according to the DELETE operations in the IDTest table. If an UPDATE operation in IDTest fires the trigger, the trigger considers this action a DELETE-INSERT sequence. For an INSERT action, the trigger skips this operation and doesn't change any data in sysIDDiffs. Second, the trigger applies the required INSERT operations in sysIDDiffs according to the INSERT operations in IDTest.

Testing fn_GetNewID()

Let's test the complete process for trying to reproduce the states that Figures 1 and 2 show. If you already created the IDTest table, drop it by executing

DROP TABLE IDTest

To test the system from the beginning, execute the script that Web Listing 2 shows. This script creates the sysIDDiffs table and the fn_GetNewID() function. Then, create IDTest with the default constraint and the udt_IDTest trigger by executing the code in Web Listing 1. You should now have empty IDTest and sysIDDiffs tables. To determine whether the tables are empty, execute the following query:

SELECT *
FROM IDTest

SELECT ID
FROM sysIDDiffs
WHERE TableID = OBJECT_ID('IDTest')
GO

The result should be

ID        value
--------- -------------------
ID 
---------

You are now in State 1 of Figure 1 with one exception: IDTest never used the fn_GetNewID() function, so sysIDDiffs is still empty. To create exactly the same situation as State 1, you can insert and delete one row in IDTest, as follows:

INSERT IDTest (Value)
VALUES ('dummy')

DELETE IDTest

SELECT * FROM IDTest

SELECT ID
FROM sysIDDiffs
WHERE TableID = OBJECT_ID('IDTest')
GO

Executing this code produces the following output:

ID        value
--------- ----------------
ID
---------
0.00

This result is the same as State 1 in Figure 1. Web Listing 3 provides the code to reproduce each state from Figures 1 and 2.

Multirow Considerations

If you try to insert multiple rows in the IDTest table at one time with a single INSERT...SELECT statement, you get an error message. To test this error, create a temporary table, and insert some rows into it, as follows:

CREATE TABLE #Multirow (Value sql_variant)

INSERT #Multirow
SELECT '1st multivalue'

INSERT #Multirow
SELECT '2nd multivalue'

INSERT #Multirow
SELECT '3rd multivalue'

Then, use INSERT...SELECT to insert the contents of the #Multirow temporary table into the IDTest table:

INSERT IDTest (Value)
SELECT Value
FROM #Multirow

Executing the preceding statement produces the following error message:

Server: Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 
'PK__IDTest__40058253'. Cannot insert duplicate key in
object 'IDTest'.
The statement has been terminated.

The error occurs because SQL Server applies the default constraint first, so fn_GetNewID() executes for each row to retrieve the next valid ID value. However, the function retrieves the same value each time, because the trigger hasn't modified the sysIDDiffs table yet. After fn_GetNewID() evaluates the default value, the primary key checks for uniqueness and raises the error because the three rows in IDTest have exactly the same ID.

SQL Server 2000's new INSTEAD OF trigger can help with this error. INSTEAD OF INSERT triggers fire before any actual data modification but after fn_GetNewID() produces the default values. In the virtual table inserted, the INSTEAD OF INSERT trigger shows the rows you just tried to insert, including the default values that the fn_GetNewID() function already generated for the ID column. You can also check inside the trigger to determine whether the original INSERT operation affected only one row; if so, you can proceed with a normal INSERT operation. Otherwise, you can convert the multirow INSERT operation into multiple single-row INSERT operations.

Web Listing 4 shows the code for the isr_IDTest INSTEAD OF INSERT trigger. You use a cursor on the inserted table to evaluate the INSERT operation, row by row. An extra benefit of the INSTEAD OF INSERT trigger for multirow operations is that the trigger checks to determine whether the user tried to insert duplicate values in the ID field, in which case the trigger uses the fn_GetNewID() function to generate new values. After you execute the code in Web Listing 4 to create the INSTEAD OF INSERT trigger, execute the INSERT operation again:

INSERT IDTest (Value)
SELECT Value
FROM #Multirow

SELECT *
FROM IDTest

SELECT ID
FROM sysIDDiffs
WHERE TableID = OBJECT_ID('IDTest')

Running this code produces no errors and returns the results that Figure 3 shows.

Another potential problem occurs if you use the TRUNCATE statement to remove every row in the table. Because SQL Server doesn't register in the transaction log any deletion of rows during the execution of the TRUNCATE statement, TRUNCATE doesn't fire any trigger (not even an INSTEAD OF trigger). Therefore, you can't automatically reset the entries in the sysIDDiffs table after the execution of the TRUNCATE statement, in the same way that SQL Server resets the IDENTITY values after TRUNCATE. However, you can use a stored procedure called sp_RebuildIDDiffs to regenerate the values in sysIDDiffs after an unlogged operation. The sp_RebuildIDDiffs stored procedure accepts a table name as a parameter, searches for end-of-sequence values in the selected table's ID field, and inserts those values into sysIDDiffs.

You can also use sp_RebuildIDDiffs to generate entries in sysIDDiffs for any preexisting table if the table has a unique integer field called ID with no defined default value. In this case, you create triggers similar to those for IDTest to keep the data in your new table and sysIDDiffs consistent. Web Listing 5 shows the code for sp_RebuildIDDiffs, which has three steps:

Delete all previous entries in sysIDDiffs related to the selected table; the @Table parameter contains the name of the selected table.
Build a dynamic T-SQL statement to insert into sysIDDiffs every end-of-sequence value from the ID field in the @Table table. You must use dynamic execution to execute this statement because the INSERT...SELECT statement doesn't accept a variable as a table name.
Build a dynamic T-SQL statement to insert the ID = 0 entry in sysIDDiffs if the ID field from @Table doesn't contain the value ID = 1. You execute this statement dynamically.

After executing the sp_RebuildIDDiffs stored procedure, you can continue to insert rows into the selected table. The fn_GetNewID() function will automatically provide new values for the ID field.

UDFs Everywhere

SQL Server 2000 gives you more flexibility than earlier releases of SQL Server by letting you define UDFs as table or column constraints. And using the new INSTEAD OF trigger feature helps programmers solve complex updating operations. The more you explore UDFs, the more you realize that you can solve many database programming problems by using UDFs as scalar values or dynamic resultsets in the same way that you define and use functions in programming languages.

UDFs give you more flexibility than stored procedures because you can use UDFs on almost any T-SQL statement. UDFs are more efficient than views because UDFs accept parameters, and they're more flexible because a UDF's definition can contain more than one SELECT statement. If you have to design a new stored procedure or a new view, ask yourself: Could I achieve the same results with a UDF?

Comments

Plain text