Working with Variables in T-SQL, Part 2

In "Working with Variables in T-SQL, Part 1," January 2002, I explained how to declare and assign values to local variables in T-SQL. You can use variables in T-SQL in much the same way you would use them in other development environments, such as Visual Basic (VB). For example, you can use a variable as a loop counter and store intermediate results of long or complex calculations for later reuse. However, some uses for variables are specific to T-SQL, letting you write efficient, short, and dynamic code. For example, you can use variables when you update data in a table. Remember that one T-SQL UPDATE statement can affect many rows. While SQL Server processes each row, it can read the variable's data, store the data in a column, and change the variable's content. An UPDATE statement that involves variables can be very efficient when used properly, but if you don't understand exactly how the UPDATE statement works, you can write code that's logically incorrect. In this article, I discuss the special UPDATE syntax in T-SQL, when to use it, and more important, when not to use it. You can also use variables when you perform dynamic execution in T-SQL. Dynamic execution lets you construct a T-SQL statement in a variable, then execute it. I discuss dynamic execution as well as an undocumented feature of T-SQL that lets you assign values to variables in a dynamically executed statement.

Using Variables to Update Data

You can use variables in some interesting ways to update and aggregate data. For example, you might want to maintain your own data sequences instead of using the IDENTITY column property. SQL Server implements the IDENTITY property at the table level, not at the database level. SQL Server limits a table to one IDENTITY column, but you can create a table to supply your own sequences and assign each row in the table a different sequence. Then, you can use variables to both retrieve the next sequence and increment it. One column in the table holds the sequence name, and another column holds the value of the sequence, as the following example shows:

CREATE TABLE Sequences
(
  seq_name varchar(10) NOT NULL 
PRIMARY KEY,
  seq_val int NOT NULL DEFAULT (0)
)

To create 10 new sequences, you insert 10 rows with your choice of sequence names, as Listing 1 shows. To obtain a new value from Listing 1's SEQ4 sequence, you'd typically write an UPDATE statement, followed by an assignment statement, as Listing 2 shows. The output from executing the code in Listing 2 equals 1, meaning that you've returned the first sequence value. The problem with this code is that because it's not wrapped in an explicit transaction, someone else might update the SEQ4 sequence between the execution of your UPDATE and SET statements, and your variable could contain the wrong value.

To avoid this problem, you can wrap your code with an explicit transaction. The UPDATE statement keeps an exclusive lock on the modified row until the transaction ends, which guarantees that no one can modify the row until SQL Server commits or rolls back the transaction. This workaround doesn't make any special use of variables, but rather exploits the transactional capabilities of SQL Server. Another way to avoid the problem is to use a T-SQL-specific UPDATE syntax that lets you modify a row and assign a value to a variable—all in the same statement, as Listing 3 shows. Note that this special UPDATE syntax isn't ANSI-compliant. When you execute the code in Listing 3, you return an output of 2, which means that you've obtained the second sequence value.

You can also aggregate data in a local variable during an UPDATE statement. For example, suppose you want to add 10 percent to the unit price of all the order lines in order 10248. (Order 10248 is in the Northwind database's Order Details table.) And suppose you also want to calculate a new total volume (quantity * price) for those order lines after the 10 percent price increase. Typically, you'd issue an UPDATE statement, then follow it with a subquery containing an assignment, as Listing 4 shows. The code in Listing 4 works correctly; however, it accesses the Order Details table twice—once to update the unit prices and once to retrieve the total volume. To streamline performance, you can use just one UPDATE statement, as Listing 5 shows. The code in Listing 5 aggregates the values in the variable while the data is being updated, so the code accesses the Order Details table only once.

When Not to Use Variables to Aggregate Data

Understanding exactly how to use the variable techniques I discuss in this article to aggregate data is important if you want your code to work efficiently. But understanding when not to use variables is just as important. Avoid scenarios in which you can't guarantee that the variable techniques will produce the results you expect. For example, consider the OrderLines table that Listing 6 creates. Suppose you had to produce a query to return the order lines details as well as the total quantity of all the order lines with the same order ID and an order line that's less than or equal to the current row's order line. To achieve the desired results, run the SELECT statement that Listing 7 shows. You should get the output that Figure 1 shows. The problem with this solution is that the query accesses the table many times, rendering poor performance.

To retrieve the order lines data with better performance, you might be tempted to use the techniques that I demonstrated earlier, such as adding another column to the table to store the running sum or using an UPDATE statement that scans the table only once, as Listing 8 shows. While the UPDATE statement scans the rows, you might use the variable to aggregate the values and store the intermediate results of the aggregation in the running_sum column. But here's a case in which the technique might not produce the results you expect.

Although the running_sum column seems to store the correct values after the update, this solution has a problem. The UPDATE statement assumes that SQL Server will physically access the rows in the same order as they were inserted into the table (sorted by orderid, orderline). If the rows aren't accessed in orderid, orderline order, SQL Server updates the running_sum column with incorrect values. This erroneous update might occur because of the fundamental nature of T-SQL. A T-SQL UPDATE statement is a set-based statement issued against a relational database, meaning that you can tell SQL Server what you want, but you can't control how SQL Server accomplishes the task behind the scenes. Neither T-SQL nor ANSI SQL supports an ORDER BY clause in an UPDATE statement, so you can't force SQL Server to make the update scan the table in orderid, orderline order. SQL Server can use many different access methods to perform the same task, each of which can access the rows in a totally different order.

I suggest that you use the solution in Listing 7 instead, which doesn't use variables. The same suggestion applies every time your UPDATE statement relies on a specific physical order of rows being processed. If you want to aggregate data and you don't have to worry about the physical access order of the rows, you can safely use variables in an UPDATE statement. For example, suppose you want to add a surrogate key to a table that contains duplicates. Run the script that Listing 9 shows to create a table called T1 containing duplicate rows and no primary key. If you want to add to T1 a surrogate key containing arbitrary values that have no meaning other than to uniquely identify the rows, you can add an IDENTITY column that will serve as the primary key. If you prefer not to use the IDENTITY property, you can use an UPDATE statement containing a variable that generates a different value for each row. To test the approach that uses the special UPDATE statement, run the code that Listing 10 shows. First, the code adds an integer column that allows NULLs. (You can't add to an existing table a column that doesn't allow NULLs and doesn't have a default value.) Next, the script issues an UPDATE statement by using a variable to assign a sequence of values to the integer column, altering the integer column so that it won't allow NULLs. Then, the logic designates that column as the primary key. (You can't assign a primary key to columns that allow NULLs.) Figure 2 shows the output for Listing 10.

Using Dynamic Execution to Assign Values to Variables

We've seen how to use variables to write efficient UPDATE statements, and we've explored other situations in which variables can be useful. However, T-SQL has some limitations when using variables. Let's look at those limitations and some ways to circumvent them.

You can substitute expressions for the variables in your query, but you can't substitute object names. For example, if you need to select from a table whose name is stored in a variable called @table_name, you might be tempted to use the following syntax, which T-SQL doesn't support:

DECLARE @table_name AS sysname
SET @table_name = N'Orders'

SELECT * FROM @table_name

Instead, you have to use dynamic execution, which lets you construct your T-SQL statement dynamically in a variable and execute it by using the EXEC command or the sp_executesql stored procedure. For example, to make the above example work, you can construct the SELECT statement in a character string, then execute the statement by using the sp_executesql stored procedure, as Listing 11 shows.

Now for a Puzzle

Now that you understand how dynamic execution works, let's look at a situation in which you want to perform a dynamic calculation and store the result in a variable. Let's say you have a table name, the name of the primary key column, the names of two data columns, and a key value stored in five variables called @table_name, @key_col_name, @col1_name, @col2_name, and @key_val, respectively. You need to extract the column values (whose names are in the variables @col1_name and @col2_name) from the table (whose name is in the variable @table_name) from the row (whose key column name is in the variable @key_col_name) and the key value (which is in the variable @key_val). You need to store the product of the two column values in a variable called @result. For example, when the values that Listing 12 shows are stored in the variables, the output should be 15 (EmployeeID x ShipVia = 5 x 3 = 15).

You need to use dynamic execution here because the table and column names are stored in variables; the main problem is to assign the result to the variable @result. The sp_executesql stored procedure has an undocumented feature that lets the stored procedure set a value to a variable that's declared in the batch that invoked sp_executesql. SQL Server Most Valuable Professional (MVP) Umachandar Jayachandran was one of the first to discover this undocumented feature. If you're familiar with the basics of using I/O parameters in a stored procedure, you understand how to use output parameters with sp_executesql. This stored procedure employs elements that T-SQL uses when creating and invoking a stored procedure. For example, the CREATE PROCEDURE statement has the following syntax:

CREATE PROC 
  
AS



GO

And you invoke the stored procedure using the following syntax:

EXEC

If you wanted to create a stored procedure that has two input arguments and returns the product of those arguments in an output parameter, you would use the code that Listing 13 shows to create the procedure. Then, you would invoke the stored procedure, as Listing 14 shows. The sp_executesql command operates as if it creates and executes a stored procedure by using the same elements, but it has the advantage of constructing the > section dynamically. Using the elements shown earlier in the CREATE PROC and EXEC statements, you construct the sp_executesql statement as follows:

EXEC sp_executesql
  @stmt = N'',
  @params = N'',

Before looking at the puzzle's solution, let's convert the usp_Product stored procedure to sp_executesql. You can use the techniques demonstrated in this simple example to solve the puzzle. To convert the usp_Product stored procedure to sp_executesql, simply substitute the appropriate elements from the CREATE PROC and EXEC usp_Product statements from Listing 13 and Listing 14 into sp_executesql, as Listing 15 shows. Assign the section of the stored procedure (N'SET @r = @p1 * @p2' ) to the @stmt parameter, the section (N'@p1 AS int, @p2 AS int, @r AS int OUTPUT') of the stored procedure to the @params parameter, and follow with the section (@p1 = @myp1, @p2 = @myp2, @r = @myr OUTPUT). The result is 15, which means that the static usp_Product was successfully converted to sp_executesql by using dynamic execution. If you understand the code in Listing 15, you have all the tools you need to solve the puzzle: Just add the table name, data and key column names, and key value to the equation, as Listing 16 shows.

The Importance of Understanding the Details

To efficiently and effectively work with variables in T-SQL, you need to be aware of the many small details surrounding how SQL Server processes and uses variables. After you've mastered those details, you can use variables in many interesting and sophisticated ways, such as using custom sequences and calculating aggregations to produce efficient T-SQL code. If you've found other innovative ways to use variables in T-SQL, send them to me; I'll try to share them in future issues.

Comments

Plain text