Handling Arrays

Downloads
100453.zip

Executive Summary: One way to use arrays as input for stored procedures in Microsoft SQL Server is to employ dynamic SQL.

Stored procedures don’t support arrays as inputs, but there’s a common need to develop procedures that accept an unknown number of input parameters. For example, suppose you need to create a procedure in the Northwind database that accepts an unknown number of order IDs as input and returns information from the Orders table about the requested orders. One of the most obvious solutions is to use dynamic SQL. This option is available in SQL Server 2008, 2005, and 2000.

Implementing the Solution

To use the dynamic SQL solution, run the code in Listing 1 to create the stored procedure usp_getorders. The implementation of the procedure is very simple. The procedure accepts a character string parameter called @arr that holds a comma-separated list of order IDs as input. The procedure constructs a character string with a query against the Orders table in a variable called @sql. In the query’s filter the code concatenates the contents of @arr in the parentheses of the IN predicate. Finally, the code uses the sp_executesql procedure to dynamically execute the code residing in the @sql variable.

I restricted the input parameter @arr to 2,000 characters. If you need to support longer strings, SQL Server 2000 lets you go up to 4,000 characters with the NVARCHAR data type and 8,000 characters with VARCHAR (which will require you to use EXEC instead of sp_executesql). In SQL Server 2005, you can use NVARCHAR(MAX) or VARCHAR(MAX) and go up to the maximum supported large object size (2GB).

Run the following code to test the procedure with two different sets of order IDs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders
  N'10260,10270,10265,10290';

For each invocation, the procedure constructs a query string in which the comma-separated order IDs appear as constants, then executes it. The query returns information from the Orders table about the requested orders. For example, the first invocation of the procedure returns the output in Table 1.

Limitations

Unfortunately, the dynamic SQL solution has serious disadvantages. These drawbacks include security, performance, and limited control over the order of rows in the output.

Security. The procedure uses dynamic SQL without validating the input, which completely exposes the procedure to SQL injection. To see how easy it is to inject code here, run the following command:

EXEC dbo.usp_getorders N'0); RAISERROR( 'This
  could have been much worse than a RAISERROR
  statement.', 16, 1); --';

The dynamic batch that is constructed by the stored procedure is:

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderID IN(0); RAISERROR( 'This could have
  been much worse than a RAISERROR statement.',
  16, 1); --);

In this example I injected a RAISERROR statement to the code for demonstration purposes, but a real injection would likely be much worse. This call to the stored procedure produces an empty set as output because I specified only one order ID that doesn’t exist (order ID 0), followed by the output of the injected RAISERROR statement.

You can try to prevent the injection by inspecting the input. In our case the input should have only digits and commas, so input validation is simple. Add the following code at the beginning of the stored procedure to check whether the input contains any character other than a digit or a comma, and if it does, raise an error and return from the procedure without executing the dynamic batch:

IF @sql LIKE N'%\[^0-
9,\]%'
BEGIN
  RAISERROR('Input
may contain SQL
injection. Procedure
aborted.', 16, 1);
  RETURN;
END

Run Listing 2 to create the revised procedure.

Continue to page 2

Try injecting code through the revised procedure by running the following command:

EXEC dbo.usp_getorders N'0); RAISERROR( 'This
  could have been much worse than a RAISEERROR
  statement.', 16, 1); --';

This time you get the following error:

Msg 50000, Level 16, State 1, Procedure usp_
  getorders, Line 11
Input may contain SQL injection. Procedure
  aborted.

However, if the input isn’t restricted to digits and commas (e.g., when the elements in the array are supposed to be character strings), you’ll find it difficult to guarantee a validation that leaves absolutely no exposure to SQL injection. (For more information, see SQL Server Books Online—BOL—“SQL Injection” at msdn.microsoft.com/en-us/library/ms161953.aspx.)

Performance. When a query has a plan in cache, in order for subsequent query invocations to be able to reuse the cached plan, the query strings (that of the query for which the plan was cached and those of the subsequent query invocations) need to be the same. The exception is when SQL Server auto parameterizes the query, but this exception doesn’t apply to our procedure. For each unique input to our procedure, a different query string is constructed, and therefore a new plan is produced and cached. Instead of optimizing the query only once and reusing the cached plan in all subsequent invocations of the procedure, SQL Server will reoptimize almost every time the procedure is invoked, thus generating a large number of execution plans. To demonstrate plan caching and reuse behavior, I’ll use the version of the procedure that you create by running the code in Listing 3.

As you can see in the procedure’s code, I incorporated a GUID as a block comment within the query. Based on this GUID, it will be easy to locate the plans specifically associated with the query of interest in cache. After creating the procedure with the GUID as a comment, run the following code to execute the procedure with two different inputs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders
  N'10260,10270,10265,10290';

You can locate the plans associated with our procedure and dynamic batch by querying sys.syscache objects (or master.dbo.syscacheobjects, in versions before SQL Server 2005). This object has a row for each plan in cache. You can look for plans in which the SQL text (sql attribute) contains the specific GUID you planted as a comment. Assuming you executed the stored procedure twice with different arguments as I suggested, run the following code to inspect the plans associated with our query in cache:

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master.
  dbo.syscacheobjects
WHERE SQL LIKE '%DDAF1499-A7FF-4A9C-AE83-
  FCE5A33F5151%'
 AND sql NOT LIKE '%sys%';

For each plan in cache, the query returns the object type (Proc for procedure plan, Prepared or Adhoc for a prepared statement such as one prepared by sp_executesql), a count of how many times the plan was used, and the SQL text for which the plan was created. This query returns the output in Table 2. Note that in SQL Server 2008 the objtype attribute will show Adhoc instead of Prepared.

Dynamic SQL operates in its own batch and isn’t considered part of the batch of the calling procedure; therefore, you get separate plans for the procedure (Proc object type) and the dynamic batches (Prepared or Adhoc object type). The procedure plan is just a shell in our case, simply encapsulating a call to the dynamic batch. As you can see, SQL Server did reuse the procedure plan for the second invocation of the procedure. However, you can see that there’s a separate prepared statement plan for each unique dynamic batch, meaning that the actual query plan wasn’t reused. Think of the implications for performance when the procedure is invoked very frequently with different sets of order IDs; consider the cost of all those compilations, and the memory required to hold all the plans.

Note that as of SQL Server 2005, recompiles are done at the statement level rather than the batch/ procedure level; this method enables a RECOMPILE query hint (in addition to the existing RECOMPILE procedure option). When the RECOMPILE query hint is specified, SQL Server doesn’t bother to keep the plan in cache. Because with the dynamic SQL solution chances for plan reuse are very low to begin with, at least you can mitigate the memory pressure issue by adding OPTION(RECOMPILE) at the end of the query. Run the code in Listing 4 to create the revised stored procedure (notice the change in GUID as well).

Then, execute the revised procedure with different inputs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders N'10260,10270,10265,10290';

Query the cached plans:

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master
  .dbo.syscacheobjects
WHERE SQL LIKE '%995F4DF8-6518-47EE-AB9E-
  2E2876DA1700%'
 AND sql NOT LIKE '%sys%';

You get the output in Table 3, indicating that although the procedure’s plan was reused (the shell for the call to the dynamic batch), the plans for the prepared statements weren’t cached.

Control. To work around the problem of limited control over the order of the rows in the output, you can add an ORDER BY clause to the query to sort the output by existing attributes from the Orders table. However, there’s no straightforward way for you to ask to sort the output by the order of appearance of the order ID in the input array. For example, given the input array 10248,10250,10249, you want the output to be sorted with 10248 first, 10250 second, and 10249 third. Although this task isn’t impossible, solutions based on the existing implementation of the procedure are tricky and inefficient.

Coming Up

Although using dynamic SQL to handle an unknown number of input elements in a stored procedure is a common solution, I don’t recommend it. You should consider a less problematic solution, such as using a T-SQL split function, a CLR split function, or SQL Server 2008’s table valued parameters. I’ll discuss these alternatives in subsequent articles.

Comments

Plain text