More Options for Handling Arrays as Inputs

Downloads
100975.zip

Executive Summary:
Use a Common Language Runtime (CLR) split user-defined function (UDF) or SQL Server 2008’s new table-valued parameters to handle arrays as inputs in T-SQL.

SQL Server DBAs need to know how to handle arrays as inputs. One solution is to use dynamic SQL, but as I explain in “Handling Arrays,” this solution has both security and performance problems. Another solution uses a T-SQL user-defined function (UDF) that splits an array into its elements. The function accepts a string with a comma-separated list of values as input and returns a table result with a row for each element. This solution is secure and more efficient than the dynamic SQL approach, but it’s also quite complicated, as I explain in “Handling Arrays as Inputs.”

I discuss two additional solutions here. The first approach implements a Common Language Runtime (CLR) UDF that splits an array into its elements. The second technique uses SQL Server 2008’s new table-valued parameters.

CLR Split UDF

SQL Server 2005 introduced CLR integration, which lets you develop routines with .NET languages. T-SQL is useful for certain types of tasks, especially tasks that involve data manipulation; however, T-SQL isn’t ideal for tasks such as splitting arrays that involve string manipulation and iterative logic. Procedural languages such as .NET languages are better than T-SQL at performing such tasks. .NET languages typically provide better performance, richer capabilities, and simpler implementation. (For more information about T-SQL vs. CLR, see “CLR or Not CLR: Is That the Question?”)

A CLR version of a UDF that splits a string with a comma-separated list of values to the individual elements is simple to implement because the string type supports a method called Split that already implements the split logic. Listing 1 uses C# to define such a UDF. (This listing is courtesy of Dejan Sarka and Steve Kass.) Listing 1’s function header sets the FillRowMethodName attribute to “ArrSplitFillRow.” ArrSplitFillRow is a method that is defined after the fn_SplitCLR function’s definition, and its purpose is to convert the input object to a string. The header also defines an attribute called TableDefinition with the schema of the output table; this attribute is required in the header only if you want to deploy the assembly in the database from Visual Studio as opposed to using manual deployment. The function invokes the built-in Split method of the string type to split the input array.

Before you deploy the function in the Northwind database, run the following code to ensure that any existing version of the fn_split UDF is dropped:

USE Northwind;

IF OBJECT_ID('dbo.fn_split', 'IF') IS NOT NULL DROP FUNCTION dbo.fn_split;

Next, follow the instructions in “5 Steps for Developing and Deploying CLR Code in SQL Server” to deploy the fn_split function from Listing 1 in the Northwind database. Name the project SplitCLR, and specify the C drive as the location for the project folder. Then run the code in Listing 2 to create the usp_getorders procedure. This procedure accepts a string with a comma-separated list of order IDs as input and returns information about the requested orders. It does so by joining the result of the fn_split function with the Orders table. Note that in case the input string contains multiple occurrences of an order ID, the procedure will return multiple occurrences of that order in the output. If you need to return unique occurrences in such a case, add the DISTINCT clause to the SELECT list, or rewrite the query to use the EXISTS or IN predicate.

Run the following code to test the stored procedure, executing it with two different arrays:

EXEC dbo.usp_getorders N'10248,10250,10249';

EXEC dbo.usp_getorders N'10260,10270,10265,10290';

Then query the syscacheobjects view to analyze plan reuse behavior:

SELECT objtype, usecounts, sql

FROM sys.syscacheobjects -- in 2000 query master.dbo.syscacheobjects

WHERE SQL LIKE '%89CE9E3A-943A-42F2-AEB6-0C924CB18387%'

  AND sql NOT LIKE '%sys%';

You’ll get the output in Table 1, which indicates that the plan was reused.

The CLR implementation of the fn_split function is several times faster than the T-SQL implementation. Like the T-SQL implementation, the CLR implementation is secure because it doesn’t expose the system to SQL injection. In addition, the CLR implementation allows efficient reuse of execution plans. (For information about enhancements to CLR table-valued user-defined functions in SQL Server 2008, see the web sidebar “Ordered Common Language Runtime Table-Valued User-Defined Functions.”)

SQL Server 2008 Table-Valued Parameters

SQL Server 2008 introduces support for table types and table-valued parameters. A table type is simply a way to persist a table definition in the database for later use as the type for a table variable or a table-valued parameter. For example, the following code creates a table type called OrderIDs:

IF TYPE_ID('dbo.OrderIDs') IS NOT NULL DROP TYPE dbo.OrderIDs;

CREATE TYPE dbo.OrderIDs AS TABLE

  pos INT NOT NULL PRIMARY KEY,

  orderid INT NOT NULL UNIQUE

);

As an example for using the table type, the following code declares a table variable of the OrderIDs type and manipulates the variable with INSERT and SELECT statements:

DECLARE @T AS dbo.OrderIDs;

INSERT INTO @T(pos, orderid) VALUES(1, 10248),(2, 10250),(3, 10249);

SELECT * FROM @T;

This code returns the output in Table 2.

The benefit of table types isn’t only the ability to use them as types for table variables, but more importantly the ability to use them as types for input parameters of stored procedures and functions. Thus, you can pass multiple values as input to your routines instead of relying on cumbersome solutions such as passing strings with comma-separated lists of values. As an example, run the code in Listing 3 to create a new version of the usp_getorders stored procedure that accepts a table-valued parameter as input instead of a string with a comma-separated list of values.

Note the use of the READONLY clause, which is mandatory in SQL Server 2008. SQL Server 2008 allows reading from table-valued parameters but not writing to them.

When you need to execute a stored procedure, you must declare a variable of the table type, populate it with rows, and pass it as input to the stored procedure. As an example, run the code in Listing 4 to execute the stored procedure with two different inputs. You can query the syscacheobjects view to observe the plan reuse behavior:

SELECT objtype, usecounts, sql

FROM sys.syscacheobjects -- in 2000 query master.dbo.syscacheobjects

WHERE SQL LIKE '%84B72B28-8D27-45FC-89BB-2A84226AE0E8%'

  AND sql NOT LIKE '%sys%';

Table 3 shows the output of this query. The usecounts attribute in the table indicates that the plan was used twice.

Internal handling of table-valued parameters is similar to handling of table variables. For example, SQL Server doesn’t maintain distribution statistics (histograms) on table-valued parameters, and their scope is limited to the local batch.

SQL Server 2008 client APIs are also enhanced to support table-valued parameters. Like the CLR split UDF approach, the implementation of stored procedures using table-valued parameters is secure, and it allows efficient reuse of previously cached execution plans.

Recommendation

The necessity of passing an unknown number of values as input to a routine is common in T-SQL. You can use one of four techniques that I demonstrated to accomplish this task: dynamic SQL, a T-SQL split function, a CLR split function, and table-valued parameters. The number of options available depends on the version of SQL Server you’re using. I don’t recommend the dynamic SQL approach because it has security and performance problems. The T-SQL split function is secure and efficient, but it’s complicated. The CLR version of the split function is faster and less complicated than the T-SQL version, but it requires a non-T-SQL implementation. If you’re running SQL Server 2008, I recommend the table-valued parameter solution because it’s secure and efficient, and it uses T-SQL exclusively. For more information about using arrays as inputs in SQL Server stored procedures, see SQL Server MVP Erland Sommarskog’s research and findings, at www.sommarskog.se/arrays-in-sql.html.

SQL Server 2008 Table-Valued Parameters

Recommendation

Comments

Plain text