T-SQL programmers often use user-defined functions (UDFs) to encapsulate algorithm implementations. However, several aspects of UDFs constrain programmers' capabilities. In "UDF Back Doors" (April 2004, InstantDoc ID 41845), I discussed some UDF limitations and some back doors that let you circumvent them. I showed how to create scalar and table-valued special UDFs and how to modify data through UDFs. This month, I cover another aspect of UDFs—related to their atomicity—that limits programmers' capabilities and a couple of back doors that circumvent this limitation. I have to stress here that I don't recommend using back doors; rather, I want to highlight the limitations of UDFs that cause people to use the back doors. I also discuss a supported UDF feature that most people simply don't know about: modifying data through inline UDFs.
When you use nondeterministic T-SQL functions (functions that can return different results in different invocations even when they have the same input) such as RAND() and GETDATE() in your queries, SQL Server invokes those functions only once per query, not once per row. Therefore, a query such as
USE Northwind SELECT RAND() AS rnd, GETDATE() AS dt, * FROM Orders
returns the same values in the rnd and dt result columns for all rows. If you're not aware of this behavior, some of your query results might surprise you. For example, if you want to return a random row from the Orders table, you might try to run the following query:
SELECT TOP 1 * FROM Orders ORDER BY RAND()
Specifying an expression in the ORDER BY clause is logically similar to adding it to the SELECT list as a result column, then instead of returning the result in the output, just sorting the rows by that column. By specifying ORDER BY RAND(), you expect SQL Server to invoke the RAND() function for each row and to sort the rows by the various random values SQL Server generates for the different rows. But because SQL Server invokes RAND() only once for the query, all rows get the same value (a float in the range 0 to 1), so the value doesn't affect the order of the result rows and you don't get a different random row every time you invoke the query.
SQL Server also invokes a nondeterministic function only once per query when you have an INSERT SELECT statement that invokes the GETDATE() function. Suppose you try to insert the results of a query into a table, using the GETDATE() function in the SELECT list to store the date and time SQL Server inserted the row. SQL Server assigns all the rows the same result value of the single GETDATE() invocation; therefore, the value represents not the time that SQL Server inserted each row but rather the time that SQL Server invoked the function.
The atomicity-of-functions rule has one exception. The NEWID() function, which returns a globally unique identifier (GUID), must return a different value with each invocation. This is the only nondeterministic function that SQL Server invokes once per row instead of once per query. For example, running the following query gives you a different value in the GUID column for each row:
SELECT NEWID() AS GUID, * FROM Orders
NEWID() returns a fairly random value if you run it on Windows 2000 and later releases. On earlier releases, the Media Access Control (MAC) address of the network card appears unscrambled in the GUID that NEWID() generates, so the values you get aren't random. On these releases, you need to use the following query to get a random row with each invocation:
SELECT TOP 1 * FROM Orders ORDER BY NEWID()
You might try to circumvent SQL Server's atomicity-of-functions limitation by creating a function that returns the result of the RAND() or GETDATE() functions:
CREATE FUNCTION dbo.fn_perrow_rand() RETURNS float AS BEGIN RETURN RAND() END
But such an attempt fails, generating the following error message:
Server: Msg 443, Level 16, State 1, Procedure fn_perrow_rand, Line 4 Invalid use of 'rand' within a function.
This use of the RAND() function is invalid because you can't invoke nondeterministic functions from within UDFs.
However, you can circumvent this limitation. A couple of undocumented and unsupported back doors let you create UDFs that invoke nondeterministic functions and operate separately for each row within a query. The first UDF uses the OPENQUERY() function similarly to the way I used it in April's examples. Remember that SQL Server doesn't parse the string a query submits to a linked server, so you can issue any query you want as long as it returns a table. The OPENQUERY() function appears in a query's FROM clause, so it has to return a table. If you refer to your own server as the linked server, your server submits the query to itself. Along those lines, you can implement the fn_perrow_rand() function by referring to the RAND() function through OPENQUERY() instead of directly:
CREATE FUNCTION dbo.fn_perrow_rand() RETURNS float AS BEGIN RETURN (SELECT rnd FROM OPENQUERY(\[server_name\], 'SELECT RAND() AS rnd')) END GO
To test the function and see that it returns a different value for each row, run the following query:
SELECT dbo.fn_perrow_rand() AS rnd, * FROM Orders
However, using OPENQUERY() significantly degrades query performance because you invoke a distributed query with each invocation of the UDF—once for every row.
Another back door—using a view to produce random results—gives much faster performance. First, create a view that returns the result of a nondeterministic function, such as the following VRand view:
CREATE VIEW VRand AS SELECT RAND() AS rnd GO
You can then create a function that retrieves the rnd column from the view and invoke this function once per row within a query. To test this approach, first create the fn_perrow_rand2() function:
CREATE FUNCTION dbo.fn_perrow_rand2() RETURNS float AS BEGIN RETURN (SELECT rnd FROM VRand) END GO
Then, issue the following query several times:
SELECT TOP 1 * FROM Orders ORDER BY dbo.fn_perrow_rand2()
Notice that you get a different row with almost every invocation.
In a similar manner, you can create and use the dbo.fn_perrow_getdate() function and a view to return the result of the nondeterministic GETDATE() function:
CREATE VIEW VGetDate AS SELECT GETDATE() AS dt GO CREATE FUNCTION dbo.fn_perrow_getdate() RETURNS datetime AS BEGIN RETURN (SELECT dt FROM VGetDate) END GO
You could then, for example, invoke the function from your INSERT SELECT statements.
Unfortunately, Microsoft closed these back doors in the latest beta builds of SQL Server 2005, and chances are that it will block them in a future service pack of SQL Server 2000. Meanwhile, you can create functions of your own to handle row atomicity.
The second UDF back door I want to discuss this month is fully supported, but I consider it a back door because few people know about it. SQL Server supports two types of table-valued UDFs (UDFs that return a table): multistatement and inline. The former UDF has a body containing T-SQL statements whose purpose is to populate the returned table. The latter is called inline because it contains one SELECT statement that returns a table, and you embed that statement in the query that invokes the function. SQL Server supports modifying data through inline table-valued functions, much like you can with views. I like to think of inline UDFs as parameterized views because they're similar in how you write them and in how the optimizer treats them. The only difference I see is that inline UDFs let you use input parameters and views don't.
To see how to modify data through inline functions, run the code that Listing 1 shows to create the fn_CustOrders() UDF in the Northwind database. This UDF accepts a customer ID as an argument and returns a table containing the orders of that customer. To test the function, run the following query to return the orders of customer ALFKI:
SELECT * FROM fn_CustOrders(N'ALFKI')
You can now insert, update, and delete values from the Orders table through the fn_CustOrders() UDF. And, as with views, you can grant users permission to modify data only through the UDF without giving them direct access to the Orders table. In this way, you allow modifications to orders of only one customer at a time. For example, run the following code, which returns ALFKI's orders, adds 1 day to the order date, and returns customer ALFKI's orders after the modification, then rolls back the transaction:
BEGIN TRAN SELECT * FROM fn_CustOrders(N'ALFKI') UPDATE fn_CustOrders(N'ALFKI') SET OrderDate = OrderDate + 1 SELECT * FROM fn_CustOrders(N'ALFKI') ROLLBACK TRAN
Figure 1 shows the results of running this code. Callout A shows ALFKI's orders before the modification, and callout B shows them after the modification. Notice that the order dates in callout B are exactly 1 day later than the ones in callout A. Similarly, when your business rules demand, you can restrict users to inserting and deleting the orders of only one customer at a time, only through the function, and only if the users have the correct permissions on the function.
To Use or Not to Use?
As you've seen, SQL Server invokes nondeterministic functions such as RAND() and GETDATE() only once per query, which might limit you in providing solutions to some problems. Also, you're not allowed to invoke nondeterministic functions within your UDFs. You can use UDF back doors to get around this limitation, at least for now. However, I strongly recommend not using these back doors; in the latest SQL Server 2005 beta builds, the doors were closed, so they might also be closed in a future SQL Server 2000 service pack. But to limit the rows that a modifying statement can affect, you can modify data through inline UDFs—a fully supported feature that most people overlook.
So why bother knowing about back doors that are going to be closed? For one thing, you might need to maintain code that other people wrote. Some programmers use the back doors because they need the functionality. I hope that Microsoft's SQL Server developers will soon realize how important it is to introduce those features as supported functionality. Next month, I'll discuss back doors to views.