User-defined functions (UDFs) in SQL Server 2000 make reusing logic easy and help simplify complicated SQL statements. But do UDFs have any hidden performance costs?
I use UDFs regularly, and if you use them properly, they're efficient and effective. However, as with other SQL Server solutions, you need to be aware of the performance implications of using UDFs. And to understand those implications, you need to know how SQL Server handles T-SQL.
You've probably heard the mantra, "Row-by-row operations are bad; set-based operations are good." Many DBAs follow this rule of thumb when they use server-side cursors, but most people don't understand how row-by-row processing affects performance when you use UDFs. Let's look at some examples that show the potentially negative effects of using a UDF. Listing 3 shows the T-SQL code for the two UDFs in these examples.
The function DoAlmostNothing() does just that: It returns an integer value of 10. The function PerformHeavyIO() also does what its name implies: It returns the count of rows in the master database's sysmessages table. That value is meaningless, but the query is a simple way to write a function that generates a predictable, relatively high number of logical reads. SQL Server Profiler reports that the query against sysmessages requires 170 logical reads.
Now, in the Northwind database, let's run the three queries that Listing 4 shows and use Profiler to capture some information about the queries as they run. Each query returns 100 rows from the Northwind Orders table, but the second and third queries in Listing 2 also reference the above UDFs. Both the UDFs are simple, but the UDF in the third query forces SQL Server to perform 170 logical I/Os each time the UDF executes.
Table 1 shows the results of the test queries. The second query, which references DoAlmostNothing(), is relatively efficient when you compare it with the base query, which doesn't reference a UDF. But when you run the third query, which references the PerformHeavyIO() UDF, the number of reads jumps from 412 to 17,412, and the average time required to process each row jumps from 1.3ms to 21.23ms. Calling this UDF is inefficient compared with simply running the first query, which doesn't contain a UDF. Why did the processing requirements increase so dramatically?
If you're capturing the SP:Completed event in Profiler, the answer is obvious. Profiler shows that SQL Server needs to execute the UDF once for each row in the result set. PerformHeavyIO() might not seem expensive if you think it requires only 170 reads. However, when you multiply that number by 100 rows, you get 17,000 logical reads, which is very expensive. The query still runs quickly, but those 17,000 logical I/Os can create a problem if the query executes frequently in a high-transaction—volume environment. You might not notice a performance problem during development and unit testing, but performance might be unacceptable when you run the application under a heavier production workload.