Don't Overlook the New SQL Server 2005 NEWSEQUENTIALID() Function

When a new version of SQL Server 2005 is released, it’s only natural that a subset of the new features gets a lot of exposure, while other features get little or no exposure. However, some of the overlooked features are very interesting and have important practical uses. For example, NEWSEQUENTIALID() is a new function in SQL Server 2005 that I almost overlooked until SQL Server MVP Steve Kass pointed it out to me. This new function lets you generate sequential globally unique identifiers (GUIDs).

The NEWSEQUENTIALID() function generates a 16-byte value of data type uniqueidentifier that's always greater than any GUID generated previously on the same computer by this function. Despite the function's name, the values that it generates are not really sequential; they're just greater than the previous value. This function provides an alternative to the NEWID() function, which generates random GUIDs and provides no guarantee that it will generate sequential values. GUIDs are used for different purposes, such as providing unique keys across tables, databases, instances, and servers. If you need to generate GUIDs that are always greater than any previously generated one on the same machine, use the NEWSEQUENTIALID() function to do so.

NEWSEQUENTIALID() Shortcomings

NEWSEQUENTIALID() returns values that include the MAC address of the NIC, if one is detected in the computer. Usage of the MAC address guarantees that the generated value will be globally unique (across servers). If you use the NEWSEQUENTIALID() function on a computer that doesn’t have a NIC installed, the values generated aren't guaranteed to be unique across servers, rather unique only for that computer. Also, the value generated by NEWSEQUENTIALID() isn't as secure as the one generated by NEWID() because a hacker can readily identify the NIC's MAC address within the generated value and anticipate what the next generated value will be. NEWSEQUENTIALID() invokes internally a Windows API called UuidCreateSequential(), which generates a value based on the NIC's MAC address and an internal hardware timestamp. If you want the gory details about UuidCreateSequential() and the structure of the values it generates, read the Microsoft article "UuidCreateSequential" at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/rpc/rpc/uuidcreatesequential.asp.

Using NEWSEQUENTIALID()

You can use the NEWSEQUENTIALID() function only in a DEFAULT constraint of a column in a table. You can't invoke it independently by using SELECT or SET statements. If you enter the statement

SELECT NEWSEQUENTIALID();

you'll get the error message: Msg 302, Level 16, State 0, Line 1. Instead, you can use the NEWSEQUENTIALID() function in a DEFAULT expression for a column of data type uniqueidentifier in a CREATE TABLE or ALTER TABLE statement. You can't combine this function with other operators to form a complex scalar expression.

To understand how to use this function, run the code in Listing 1 to create and populate the T1 table. The code creates a DEFAULT constraint on the column ID and invokes the NEWSEQUENTIALID() function. Notice the six INSERT statements and the delays between the INSERTs to allow gaps between the GUID values. It should take about one minute to run this code.

After running the code in Listing 1, query the table by using the following statement. You should get output similar to what Table 1 shows (but with different GUIDs and date time—DT—values):

SELECT * FROM dbo.T1 ORDER BY id;

Notice in the preceding statement that I want the data sorted by ID; observe in Table 1 that the sorted data reflects the entry order indicated by the DT column. You might find it hard to identify which bytes within the GUIDs are most significant (for comparison purposes) and which ones are least significant. The uniqueidentifier representation doesn't organize the bytes in order by their precedence.

Displaying Uniqueidentifier Values in Binary Format

You might want to keep the GUIDs in a binary string, organized from left to right in most-significant to least-significant order. This way, it will be easier for you to tell which value is greater than another. However, even the binary representation of uniqueidentifier values doesn't reflect the true comparison or sorting behavior. To see this type of behavior, run the following query to convert the ID values to binary strings and sort the rows by the binary representation of the IDs

SELECT CAST(id AS BINARY(16)) AS bin_id, dt, val
FROM dbo.T1
ORDER BY bin_id;

Now, look at the output in Table 2. If you sorted the values by their binary representation, the assignment order of the values isn't reflected in the output. Instead, you'll see that the rows that were inserted at 20:13 were sorted before rows that were inserted at 20:12. Note that when you attempt to run the preceding statement, you'll get different GUIDs and DT values than the ones that appear in Table 2 because your T1 table was populated with different values than mine by the NEWSEQUENTIALID and GETDATE functions. The converted binary values of the GUIDs don't appear in the correct sort order; this might or might not be apparent in your case. The different segments within a uniqueidentifier value aren't organized by comparison precedence from left to right.

To figure out comparison precedence of the different bytes, you can populate a table with 16 uniqueidentifier values, where all bytes are zeroed save one. In each value, set only one of the bytes to a nonzero value (e.g., 0x11). Run the code in Listing 2 to create a temporary table called #T and populate it with valid data. Next, run the following query to sort the values from #T table by ID in descending order, and you'll get the output that Table 3 shows.

SELECT id FROM #T ORDER BY id DESC;

You can learn about the comparison precedence of the different bytes in the formatted uniqueidentifier by looking at the order of the values shown in the results. To learn about the comparison precedence of the bytes in the binary representation of the values (which is different than in the formatted values), enter the following statement to convert the values to binary ones, and sort them by the binary strings in descending order:

SELECT id, CAST(id AS BINARY(16)) AS bin_id FROM #T ORDER BY id DESC;

Now, look at the output that Table 4 shows. You'll see that the binary representation of uniqueidentifier values isn’t the same when compared with the way those values are formatted for both presentation and entry purposes. If you prefer to store sequential GUIDs as binary strings that maintain correct comparison behavior, you need to rearrange the different bytes by their comparison precedence. To figure out comparison precedence of the bytes in the binary string, use the following query to generate the output that Table 5 shows:

WITH BinIDs AS
(
  SELECT CAST(id AS BINARY(16)) AS bin_id,
    ROW_NUMBER() OVER(ORDER BY id DESC) AS rownum
  FROM #T
)
SELECT rownum, CHARINDEX(0x11, bin_id) AS byte
FROM BinIDs
ORDER BY rownum;

The rownum column represents the sort precedence of the byte number specified in the byte column. For example, rownum 1, byte 11 means that the eleventh byte is highest in sort precedence, or in other words, is the most significant for sorting and comparison purposes. The common table expression (CTE) query calculates row numbers based on the order of ID DESC. Remember that each ID value has a different byte that isn't zeroed. The outer query uses CHARINDEX to identify the position of the nonzero byte within the binary string. By using the information obtained in the previous query, you can use code like the following to rearrange the bytes—from left to right, most significant to least significant—so that the binary strings get correct sorting and comparison behavior.

WITH BinIDs AS
(
  SELECT CAST(id AS BINARY(16)) AS bin_id, dt, val
  FROM dbo.T1
)
SELECT
    SUBSTRING(bin_id, 11, 6)
  + SUBSTRING(bin_id,  9, 2)
  + SUBSTRING(bin_id,  7, 2)
  + SUBSTRING(bin_id,  5, 2)
  + SUBSTRING(bin_id,  1, 4) AS good_bin_id,
  dt, val
FROM BinIDs
ORDER BY good_bin_id;

You can also add a computed column (bin_id) to table T1 that generates the binary representation of the IDs and even creates a UNIQUE constraint on the IDs, as the following code shows:

ALTER TABLE dbo.T1
  ADD bin_id AS 
      SUBSTRING(CAST(id AS BINARY(16)), 11, 6)
    + SUBSTRING(CAST(id AS BINARY(16)),  9, 2)
    + SUBSTRING(CAST(id AS BINARY(16)),  7, 2)
    + SUBSTRING(CAST(id AS BINARY(16)),  5, 2)
    + SUBSTRING(CAST(id AS BINARY(16)),  1, 4) UNIQUE;

Now you can rely on bin_id as a sequential GUID holder and expect the output in a format that's more readable. For example, the following query returns the rows from table T1 sorted by bin_id:

SELECT bin_id, dt, val FROM dbo.T1 ORDER BY bin_id;

Table 6 shows the query's results.

Generate NEWSEQUENTIALID() Before the INSERT Statement

As I mentioned previously, you can specify the NEWSEQUENTIALID() function only in a DEFAULT column constraint, meaning that you can't get a new value before you INSERT a row. If you use the NEWID() function, use a SET or SELECT statement to invoke the function and generate the value before the INSERT statement. If your application needs to generate a new key value before the actual INSERT statement—for example, to store the key on a different computer from the one where the database resides—you can still do so. You can create a sequencing table (call it NewSeqIDs) with an identity column as the primary key (call it new_identity) and a column with a DEFAULT constraint that invokes the NEWSEQUENTIALID() function (call it new_seqid). You can also create a column (call new_bin_seqid) that will compute the GUID as a binary string (that compares and sorts correctly) if you prefer the binary representation. Every time the application needs a new key, you insert a row in the sequence table and get the newly generated value by querying the row in which new_identity is equal to SCOPE_IDENTITY(). The value is then available to you before you actually use it in an INSERT statement against the target data table. As an aside, if you don’t need to enforce uniqueness of primary keys outside the scope of the instance, you can use the generated identity values as your global instancewide sequential keys. Identity values, which are stored as BIGINT, are more compact than GUIDs (8 bytes versus 16 bytes), but GUIDs have the advantage of being globally unique across instances and computers (assuming a NIC exists in a computer).

Use a stored procedure (call it usp_get_new_seqid) to invoke the INSERT statement against the NewSeqIDs table. The stored procedure returns three output parameters: the new identity value, new sequential ID, and binary form of the new sequential ID. Run the code in Listing 3 to create the NewSeqIDs table and the usp_get_new_seqid stored procedure.

Notice in Listing 3 that the stored procedure’s code defines a savepoint called S, issues the INSERT statement, then rolls back the procedure’s activity that took place since the savepoint (essentially the INSERT statement). The rollback to the savepoint helps to avoid the need to clear the sequence table from time to time to keep it small. Note that new identity values or new sequential IDs are still generated although you roll back the insertion of the row. Subsequent invocations of the stored procedure will generate higher values. Also, neither the identity resource nor the internal hardware timestamp resource used by NEWSEQUENTIALID() gets locked by an external transaction. Thus, this sequencing mechanism is asynchronous and won't block or cause queuing problems when multiple, different transactions invoke the stored procedure.

To test the stored procedure, run the code in Listing 4, and you'll get the output that appears in Table 7 with a newly generated identity value and sequential ID (both as UNIQUEIDENTIFIER and as BINARY). Note that I set NULL default values for the three parameters in the stored procedure. If want to see only one of the return values, you can provide a variable to that relevant output parameter; you don’t have to provide a variable for each parameter.

To demonstrate the application activity that generates new sequential IDs (say, the binary form) and then use these IDs in INSERT statements, run the code in Listing 5. The code creates a table called T2 and issues several INSERT statements after invoking the usp_get_new_seqid procedure to get binary sequential IDs. Next, run the following query to return the rows from table T2 sorted by the binary sequential IDs (ID column). Your output should look similar to the results listed in Table 8.

SELECT * FROM dbo.T2 ORDER BY id;

Powerful New Sequential IDs

For many applications, generating primary keys is a crucial, integral part of the system. Remember that how you design the primary keys and the mechanism you use to generate them can have many implications, such as integrity, concurrency, and performance. Use the NEWSEQUENTIALID() function to generate GUIDs that are always greater that any previously generated GUID on the same machine. In upcoming columns, I’ll continue to explore other similarly useful new T-SQL 2005 features that you might have overlooked.

Comments

Plain text