Using Fake Temporary Tables

Downloads
16377.zip

Unlike old-time cursor-based database systems, which work with single rows of data, modern relational database management systems (RDBMSs) such as SQL Server work with sets of data. However, SQL Server's set-oriented database engine doesn't let you directly send or receive data sets as parameters in interprocess communications such as triggers or stored procedure calls. SQL Server limits parameters to standard data types, and the ANSI SQL-92 standard doesn't have a data type for data sets (or recordsets, in ADO terminology). However, you can trick SQL Server into accepting data sets as parameters for interprocess communications by using a technique I call fake temporary tables.

Temporary tables let you store intermediate results or reuse results from previous queries in a complex, long-running report. Temporary tables also let you easily represent private data sets, which are available only to the session that created the table. However, you can't create temporary tables (or any tables, for that matter) in triggers except in SELECT INTO statements, and stored procedures can't accept temporary tables as parameters. But you can use fake temporary tables—permanent tables that emulate temporary tables—inside triggers and as parameters for stored procedures. Let's look at how to create fake temporary tables, then explore how you can use the tables inside triggers and in stored procedure calls.

About Temporary Tables

A temporary table, which SQL Server stores in its tempdb database, is a private set of rows and columns that belongs to one execution context. Because a temporary table belongs to a specific execution context (or connection), when someone closes that connection, every temporary table created in that connection disappears.

The names of local temporary tables—as opposed to global temporary tables, which I talk about later—begin with the pound (#) character. For example, executing the CREATE TABLE statement

CREATE TABLE #tmpA (
	id INT,
	value FLOAT)
GO
USE tempdb
GO
EXEC sp_help #tmpA

generates a temporary table with a name such as

#tmpA__. . .__000000000013

The numbers at the end of the name represent the table's unique relationship to the connection that created it. If another connection executes the same CREATE TABLE script, SQL Server creates a second #tmpA temporary table that has a different name suffix and that no one outside the second connection can see. The different temporary tables can even have different structures, although I recommend that you preserve a consistent table structure among processes to prevent maintenance problems.

A stored procedure that you call from inside a process can use temporary objects created in that process. You can even create within that stored procedure a temporary table with the same name as the original temporary table, and SQL Server will give the stored procedure's temporary table a unique name to avoid conflicts. Even better, SQL Server will automatically destroy the stored procedure's temporary table after the stored procedure finishes execution. However, you can't pass the stored procedure a temporary table or have the procedure return a temporary table to the calling procedure.

Global temporary tables also belong to a specific connection, but unlike local temporary tables, global tables are visible from other processes. The names of global temporary tables begin with two pound symbols (##). But SQL Server doesn't add a suffix to global temporary tables to identify the processes that created them, so trying to create the same global temporary table from another process produces a name conflict.

Many standard processes require you to join permanent tables to temporary tables. However, SQL Server always creates temporary tables in tempdb, which forces SQL Server to inefficiently cross database boundaries to join tables from different databases. But you can't create temporary tables outside of tempdb; so, what can you do to create a table outside tempdb to hold private, intermediate data sets?

Creating Fake Temporary Tables

Suppose you have a complex stored procedure that produces a long sales report. The report includes multiple sections, and your current stored procedure creates a temporary table, which the stored procedure reuses throughout its processing to optimize queries that produce other sections of the report. You want to avoid unnecessary joins to the original Products table, so you create a temporary table with all the necessary data. This table might have a structure such as

CREATE TABLE #SelectedProducts (
	Prod_ID INT NOT NULL PRIMARY KEY,
	Supplier_ID INT NOT NULL,
	Description VARCHAR(200),
	Price MONEY)

However, what if multiple users need to simultaneously use the sales reporting process? In addition, you might need to call an external process, which would also need access to the temporary table. You might use OLE automation system stored procedures to call an external component, use the sp_start_job system stored procedure to execute a job, or use xp_cmdshell to call an external program. However, you can't send the temporary table to an external process, and sending just the name of the temporary table doesn't work because the unique name that SQL Server generates for the table is different from the one you used in the CREATE TABLE statement.

Instead of using the previous CREATE TABLE statement, you could create a permanent table, such as

CREATE TABLE fttSelectedProducts (
	Inst_ID INT NOT NULL,
	Prod_ID INT NOT NULL,
	Supplier_ID INT NULL,
	Description VARCHAR(200) NULL,
	Price MONEY NULL,
CONSTRAINT PK_ftt_SelectedProducts
PRIMARY KEY NONCLUSTERED (Inst_ID, Prod_ID))
GO
/* Optional nonclustered index to optimize searching on Prod_ID */
CREATE NONCLUSTERED INDEX idxSelectedProducts_Prod_ID
	ON fttSelectedProducts (Prod_ID)

This example creates a permanent table called fttSelectedProducts. But notice the new column Inst_ID, which identifies the execution context that you created the table in and which enables the permanent table to act like a local temporary table. Specifying NULL contents for every nonprimary key field is important if you want to easily insert dummy rows in this table, such as the initial row that you insert to block the Inst_ID value.

To generate a new Inst_ID value for every new execution of the CREATE statement, you can call a stored procedure such as GetInstID_SelectedProducts, which selects, locks, and returns a new unused value for the instance ID column. Listing 1 and Listing 2 show two examples of GetInstID_SelectedProducts. Listing 1's version of GetInstID_SelectedProducts uses the RAND() function, which works with any SQL Server release to generate random float values for Inst_ID. Listing 2's version uses the NEWID() function, which Microsoft introduced with SQL Server 7.0, to assign a unique Inst_ID value of type uniqueidentifier. Although the NEWID() solution is simpler and easier to maintain than the RAND() solution, the NEWID() version uses 16 bytes to store the uniqueidentifier field value instead of the 4 bytes that RAND() uses to store its integer field. Later, I cover more performance considerations that you should explore before deciding whether to use RAND() or NEWID().

When you have a way to generate unique Inst_ID values, you can use the appropriate subset of fttSelectedProducts as if it were a temporary table. Just include WHERE Inst_ID = @Inst_ID to limit the results to the appropriate private data set. To call a stored procedure or an external utility to process the data set, you need to pass the procedure or utility only one parameter: the value of @Inst_ID. Just make sure that no process uses the dummy entry (in this example, Prod_ID=0), which this example uses only to block the new Inst_ID value.

When you no longer need the data in the fake temporary table, just use a stored procedure or a simple DELETE statement to remove the data set:

DELETE FROM fttSelectedProducts
WHERE Inst_ID = @Inst_ID

If you forget to remove the data set, your fake temporary table will contain wasted space. To guard against this wasted space, you can create an autostart job to clean out the table every time SQL Server starts. Or, you can define a job that tests whether the table is in use and truncates the table if it isn't.

Now that you understand how to create fake temporary tables, you can use them to solve real-world problems. Let's explore two examples: how to use fake temporary tables inside triggers and how to use fake temporary tables to execute stored procedures affecting a batch of changes. (Also, for implications of using this technique with SQL Server 2000, see the sidebar "Fake Temporary Tables and SQL Server 2000.")

Using Fake Temporary Tables Inside Triggers

Suppose you want to call extended stored procedure xp_sendmail from a trigger to send managers a message that contains information about modified salary rows in the Employees table. You can't read Inserted or Deleted tables from xp_sendmail because these virtual tables are private to the execution of the trigger, and you can't create a temporary table inside the trigger. However, you can use a permanent table inside the trigger to hold information about the modified rows. And a fake temporary table is nothing more than a permanent table that holds information private to a specific execution context. Inside the trigger, you need to simply call a stored procedure to generate a new random value that identifies the execution context of the trigger. Note that writing too much application logic inside triggers isn't safe or efficient; you're better off storing such logic in stored procedures. The following example demonstrates how to select a data set from a trigger, then send the data set to another procedure.

First, create a fake temporary table that includes the fields you require from the Inserted and Deleted tables, including the common primary key fields and Inst_ID. The fake temporary table must include previous and new salary information and employee name:

CREATE TABLE fttEmployeesSalary (
	Inst_ID INT NOT NULL,
	Emp_ID INT NOT NULL,
	Employee_Name VARCHAR(50) NULL,
	PreviousSalary MONEY NULL,
	NewSalary MONEY NULL,
	
CONSTRAINT PK_ftt_EmployeesSalary
	PRIMARY KEY NONCLUSTERED (Inst_ID, Emp_ID))
GO
/* Optional nonclustered index to speed searching process on Emp_ID */

CREATE NONCLUSTERED INDEX idxEmployeesSalary_Emp_ID
	ON fttEmployeesSalary (Emp_ID)

Next, create a stored procedure called GetInstID_EmployeesSalary, which generates a unique Inst_ID value as Listing 1's and Listing 2's different versions of GetInstID_SelectedProducts do. You can now create your trigger, udtEmployeesSalary, which Listing 3 shows. When anyone modifies salary data in the Employees table, the trigger sends an email notification to the employees' appropriate managers.

Let's look more closely at how the udtEmployeesSalary trigger works. First, the trigger executes the GetInstID_EmployeesSalary stored procedure to retrieve the process's instance identifier from the @Inst_ID variable. Using this instance identifier, the trigger inserts into the fttEmployeesSalary fake temporary table the Emp_ID and Employee_Name of the employee whose information has been modified, as well as the employee's old and new salaries. The trigger then creates a message containing as parameters the name of the employee who modified the data and the date and time of that modification. Next, the trigger creates a query that it will send through email. The query, which contains information about the changes to the Employees table, uses the fake temporary table to filter the information by instance ID (Inst_ID = @Inst_ID) and to avoid the dummy entry (Emp_ID > 0). The trigger then uses xp_sendmail to send the query result as an email message and deletes from the fake temporary table any rows marked with the current instance ID.

You could create a narrower scope for your fake temporary table, creating it to store only modified data and leaving the employee name in the Employees table. But to store only modified data, you'd have to join fttEmployeesSalary to the Employees table in every query so that you could retrieve additional employee information. My example, which uses only the fake temporary table, consumes more storage space, but it's more efficient than storing only salary changes because it uses fewer pages to produce the final result.

Batch Execution of Stored Procedures

You can use stored procedures to optimize your system, and you can parameterize stored procedures to accept different values, as long as you use consistent data types. Many times, you might find sending a table as a parameter to a stored procedure useful. However, T-SQL doesn't provide an array, object, or data set data type. You can pass a table name to a stored procedure only as a string—you can't pass a table as a data set. Even when you pass a table name, you must rely on SQL Server's dynamic execution to use the name—an inefficient operation that requires SQL Server to optimize the query on every execution and that bypasses the benefits of stored procedure reuse. So, how can you send a data set to a stored procedure for further processing?

Suppose you have a stored procedure called CheckInvoice, which converts proposed invoices into permanent ones. The procedure performs three actions:

It updates the Invoices table to set the Checked field to 1 for checked invoices.
It updates two denormalized fields in the Clients table: TotalPreInvoiced (the total value of proposed invoices) and TotalInvoiced (the total amount invoiced).
It notifies the finance department of which invoices to print and send to clients by using xp_sendmail to email the appropriate Invoice_No values.

To perform these functions, you would typically call the CheckInvoice procedure once per invoice by sending the procedure the Inv_ID value, which identifies the invoice. But with fake temporary tables, you can make this checking process more efficient. Simply modify the CheckInvoice procedure to use a fake temporary table that lets you check all the invoices at once. The fake temporary table uses the instance identifier to specify which invoices the operator has marked as permanent. Let's call the modified stored procedure CheckInvoices to differentiate it from the original CheckInvoice procedure.

For this example, you create a Visual Basic (VB) application that shows proposed invoice details and lets the operator mark them as permanent by using a check box called chkCheckInvoice. When the operator has finished checking the invoices and wants to send the changes to the server, the operator clicks Commit Command (cmdCommit) to send changes to the server.

The application's back end uses a fake temporary table to hold the checked invoices. Whenever the operator selects an invoice as permanent, the application inserts a new record into the fake temporary table. And whenever the operator clears a previously selected invoice, the application deletes the corresponding record from the table. After the operator makes the changes permanent, the application calls the CheckInvoices procedure to execute the required actions. To optimize this process, you can create stored procedures to insert entries into and delete them from the fake temporary table. (Note that you can't use a similar procedure to convert permanent invoices back to proposed invoices because that process has legal and financial ramifications.)

Listing 4, page 51, shows the T-SQL code for creating the fake temporary table. (The online version of Listing 4 also shows the T-SQL code for the stored procedures that generate a unique Inst_ID and that insert and delete invoices from the table. See the More on the Web box for download instructions.) Web Listing 1, available online, shows the simplified VB code for the application. Here are the steps you'd take to create such an application:

Create the fake temporary table fttCheckedInvoices, which contains only two fields: Inst_ID and Inv_ID.
Create the GetInstID_CheckedInvoices stored procedure, which searches for an unused Inst_ID instance identifier value on fttCheckedInvoices whenever the operator starts a new session.
Create stored procedure AddCheckedInvoice, which inserts new entries into fttCheckedInvoices whenever the operator selects an invoice.
Create stored procedure DelCheckedInvoice, which removes entries from fttCheckedInvoices whenever the operator clears an invoice.
Create the CheckInvoices stored procedure, which accepts only Inst_ID as a parameter, to send the data set of permanent invoices to the server when the operator clicks Commit.
Remove entries from fttCheckedInvoices after the procedure executes successfully.

With this fake temporary table strategy, you avoid concurrency conflicts, because every connection uses its own set of data. If you think that page locks might be a problem in SQL Server 6.5, you could create the fake temporary table's primary key as CLUSTERED so that you spread entries from different connections onto different pages (remember that Inst_ID is the first field in the primary key).

Optimizing Fake Temporary Tables

Fake temporary tables are simple and leave you little functionality to optimize. You need to define a primary key to ensure uniqueness, and you need to define a nonclustered index on the original primary key—in case you have to link the fake temporary table to other tables to retrieve extra information. Deadlock situations with fake temporary tables are unlikely. Every connection uses its own set of rows, which the Inst_ID column identifies, and you have an index starting with Inst_ID, so only one connection uses the unique path seek of each Inst_ID value.

Creating tables, temporary or not, involves some overhead. SQL Server inserts information into system tables to define the new table and creates allocation structures to support storage in the table. However, a fake temporary table is a permanent table that already exists: For example, the system catalog already contains its definition, and SQL Server has already created the necessary allocation pages to control data storage. If a stored procedure uses a temporary table, SQL Server must create and destroy the temporary table on every execution of the stored procedure. So, using fake temporary tables in stored procedures also improves the procedures' execution speed. And applications use the values entered in a fake temporary table immediately after their insertion, so SQL Server keeps the values in cache and removes them when the connection ends, keeping physical disk reads to a minimum.

As you saw earlier, using NEWID() to generate the Inst_ID value as a uniqueidentifier data type would simplify your code, but at a cost: Uniqueidentifiers use 16 bytes. Using the NEWID() function will provide completely different 16-byte numbers, as Listing 5, page 51, shows. Extracting 4 bytes only, to get performance similar to that of random integer numbers, is difficult because the NEWID() function guarantees the uniqueness of 16 bytes but not of any part of the number in isolation. The results of Listing 6 and Listing 7 show that using the RAND() function to generate and insert random integer values takes about 15 percent longer than using NEWID() to insert uniqueidentifier values, but searching for integer values is twice as fast as searching for uniqueidentifier values.

Listing 8 shows a comparison of values that RAND() manually generates versus values that an algorithm based on NEWID() generates. As you can see, the NEWID() algorithm was 40 percent faster than the manual RAND() at producing instance identifier values. But the real values depend on the actual fake temporary table configuration. The best advice is to test your real-world situation.

Fake temporary tables can open up a whole new world of possibilities for designing your database for more efficient processing and for more functionality. Your specific database needs and your developer imagination will surely suggest other uses for this technique.

Comments

Plain text