Congratulations to Ronel Fernandez, software engineer at R Systems in El Dorado Hills, California, and Brian J. Parker, systems coordinator at Pittsburgh-based UPMC Health System. Ronel won first prize of $100 for the best solution to the February Reader Challenge, "Inserting NULL Values." Brian won second prize of $50. Honorable mention goes to Marek Skotnica, who was the first-place winner in our October Reader Challenge. Here’s a recap of the problem and the solution to the February Reader Challenge.
Kurt is the database architect for a company that uses SQL Server 2000 for data warehousing. He needs to expose one of the tables in the data warehouse—the Test table—to external components. Among the components that access this table is a Windows NT batch script that uses the SQL Server Bulk Copy utility to import data from a file. The Test table has the following structure:
CREATE TABLE Test ( TestID int NOT NULL PRIMARY KEY CLUSTERED, Col1 varchar( 10 ) NOT NULL )
Components typically insert data directly into the table. However, in this example, some of the inserted rows don’t contain valid values for TestID, the primary key column (i.e., some of the rows’ values are NULL), and Kurt can’t modify the component to generate primary key values for the invalid rows. Help Kurt design a solution that lets the component insert rows containing both valid and NULL primary key values into the table but requires minimal changes to the component.
Kurt can use the INSTEAD OF INSERT trigger introduced in SQL Server 2000 to replace the component’s INSERT statement action, then manipulate the inserted rows. He knows that defining an INSTEAD OF INSERT trigger on the table directly won’t let him insert rows with NULL primary key values, so he defines the following view on the table:
CREATE VIEW ImportTest ( TestID, Col1 ) AS SELECT TestID + 0, Col1 FROM Test
The component can now insert data through this view because the view provides a structure similar to that of the Test table. The one difference between the two structures is that in the view, you can represent NULLs by adding zero to the TestID column (or by using an expression like TestID * 1), so that where no value exists, you have an acceptable value in that field. After you’ve defined the view, you can define the INSTEAD OF INSERT trigger that lets you insert the NULL values into the primary key column. When Kurt defines an INSTEAD OF INSERT trigger on the view, the trigger code manipulates the inserted rows and performs the actual insert into the Test table. The following complete statement includes the INSTEAD OF INSERT trigger that performs those operations:
CREATE TRIGGER InsertTest ON ImportTest INSTEAD OF INSERT AS SELECT IDENTITY( int ) AS seq, i.Col1 INTO #NullRows FROM inserted AS i WHERE i.TestID is NULL INSERT INTO Test ( TestID, COl1 ) SELECT i.TestID, i.Col1 FROM inserted AS i WHERE i.TestID IS NOT NULL INSERT INTO Test ( TestID, COl1 ) SELECT COALESCE( ( SELECT MAX( t.TestID ) FROM Test AS t WITH( UPDLOCK ) ), 0 ) + i.seq, i.Col1 FROM #NullRows AS I
Now Kurt can instruct the component to insert data into this view instead of into the Test table. After the INSTEAD OF INSERT trigger traps the INSERT INTO statement, the rows will be available in the "inserted" virtual table for manipulation. First, Kurt retrieves the rows that have NULL values in the TestID column, then inserts those rows into a temporary table that includes an identity column. For each row, the identity column generates a sequence number that he can use later.
Let’s walk through the statement step by step. The code’s first three rows set up the trigger:
CREATE TRIGGER InsertTest ON ImportTest INSTEAD OF INSERT AS
The statement that follows performs the retrieval and insertion of the rows that contain NULL primary key values:
SELECT IDENTITY( int ) AS seq, i.Col1 INTO #NullRows FROM inserted AS i WHERE i.TestID is NULL
Next, Kurt needs to generate the new primary key values for the rows containing NULL values by using the following SELECT statement:
SELECT COALESCE( ( SELECT MAX( t.TestID ) FROM Test AS t WITH( UPDLOCK ) ), 0 ) + i.seq, i.Col1 FROM #NullRows AS i
The SELECT statement obtains each new primary key value by taking the current maximum value from the Test table, then adding the generated sequence number to the value. The trigger always starts an implicit transaction, and the UPDLOCK hint in the SELECT statement ensures that other concurrent transactions can’t obtain the same maximum value. Now, Kurt can insert the rows containing NULL values and the rows with valid primary key values into the Test table by executing the INSERT statements:
INSERT INTO Test ( TestID, COl1 ) SELECT i.TestID, i.Col1 FROM inserted AS i WHERE i.TestID IS NOT NULL INSERT INTO Test ( TestID, COl1 ) SELECT COALESCE( ( SELECT MAX( t.TestID ) FROM Test AS t WITH( UPDLOCK ) ), 0 ) + i.seq, i.Col1 FROM #NullRows AS i
To test the solution, Kurt can use the following INSERT statement:
INSERT INTO ImportTest ( TestID, Col1 ) SELECT 1, 'a' UNION ALL SELECT NULL, 'b' UNION ALL SELECT 2, 'c' UNION ALL SELECT NULL, 'd'
By using the INSTEAD OF INSERT trigger feature in SQL Server 2000, Kurt can provide an efficient solution that requires minimal modification to the external components.
Now, test your SQL Server savvy in the March Reader Challenge, "Leveling the Load" (below). Submit your solution in an email message to [email protected] by February 20. SQL Server MVP Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winners in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.
Here’s the challenge: Frank is the data warehouse architect for a company that hosts its databases on SQL Server 2000 and 7.0. Every week, the company imports between 1 million and 5 million orders into the data warehouse’s Orders table. The weekly import contains orders placed only during that week, but the Orders table contains an aggregate of data from many weekly imports. The following statement creates the Orders table and its relevant columns:
CREATE TABLE Orders ( OrderID int NOT NULL PRIMARY KEY CLUSTERED, CustomerID int NOT NULL REFERENCES Customers, OrderDate datetime NOT NULL ) CREATE NONCLUSTERED INDEX Orders_CustomerID ON Orders ( CustomerID )
Frank is particularly interested in reducing the time required to load the orders data, increasing the efficiency of the queries that access the table, and streamlining data archival. The queries that he wants to optimize access several weeks of data. Assuming that Frank has an efficient loading mechanism that uses the SQL Server BULK INSERT API, how can he redesign the table to meet his objectives?