Copying Data with Dependencies

Recently I received a request from a customer for a solution to a problem related to copying data from one set of tables to another. The request seemed very straightforward, but as it turned out, the solution wasn't that trivial. Because the request seemed like a common need, I figured that others would be interested in the topic as well. The article starts by presenting the task in question, then it covers a failed attempt to handle the task, and then a couple of solutions that work.

Copying Data

The task that's the focus of this article is to copy data from one set of related tables to another, including dependent rows, while preserving the correct relationships between the rows. I'll use a simple generic data model to demonstrate the situation and to describe solutions. Figure 1 is a diagram that shows the data model for the source and target sets of tables.

Ben-Gan SQL557 Fig1-lg
Figure 1: Data Model

Listing 1 contains the code to create the source tables and fill them with sample data. Query the source tables by running the following code:

SELECT * FROM dbo.A;
SELECT * FROM dbo.B;
SELECT * FROM dbo.C1;
SELECT * FROM dbo.C2;

Tables 1 through 4 show the current contents of the source tables.

Ben-Gan SQL557 Table1

Ben-Gan SQL557 Table2
Ben-Gan SQL557 Table3
Ben-Gan SQL557 Table4
Run the code in Listing 2 to create the target tables. Notice that each of the source and target tables has a surrogate key created by an identity property. The surrogate keys are the ones used to relate rows between the tables.

Your task is the following: Given an input key id_a (@id_a) in table A, copy the row and all related rows in B, C1, and C2 into the tables AA, BB, CC1, and CC2, respectively, such that:

New surrogate keys are generated in the target tables.
The correct relationships are kept between the target rows as they were in the source tables.

In this example, the request is to copy only one row from table A plus all dependencies in the other source tables; however, your solution should work correctly even when multiple rows from A (and their dependencies) need to be copied.

Solution Using the MERGE Statement

As the first step in your solution, you need to copy the relevant rows from A into AA, and you need to record somewhere which target surrogate key was generated in AA for each source surrogate key in A. You need to record this information to enable the subsequent step that copies the relevant rows from B into BB, while preserving the correct relationships with the rows in AA based on the source relationships. Then, in a similar manner, you need to copy the relevant rows from C1 and C2 into CC1 and CC2, while maintaining the correct relationships with the rows in BB based on the source relationships.

You might try to implement the first step by using an INSERT statement with an OUTPUT option, and using the INTO clause, try to record the relationships between the source and target surrogate keys in a table variable. But if you try this approach you'll stumble upon a problem—the OUTPUT option in an INSERT statement allows only returning elements from the inserted row. It doesn't allow returning elements from the source row—such as the source surrogate key. As an example, try running the following code:

DECLARE @id_a AS INT = 2;

DECLARE @A_AA AS TABLE
(
  id_a INT NOT NULL,
  id_aa INT NOT NULL
);

INSERT INTO dbo.AA(val)
  OUTPUT SRC.id_a, inserted.id_aa
  INTO @A_AA(id_a, id_aa)
    SELECT val
    FROM dbo.A AS SRC
    WHERE id_a = @id_a;

You get the following error indicating that you're not allowed to refer to the source table's elements:

Msg 4104, Level 16, State 1, Line 10
The multi-part identifier "SRC.id_a" could not be bound.

Surprisingly, T-SQL does allow referring to the source table's elements in the OUTPUT clause when using the MERGE statement. You can find the details in "MERGE Statement Tips." So you write an artificial MERGE statement that implements an INSERT action. The tricky part about this approach is that the MERGE statement allows an INSERT action only in the WHEN NOT MATCHED clause, and this clause is activated when the MERGE predicate is false. So you write a MERGE statement with a predicate that's always false (e.g., ON 1 = 2) and then issue the INSERT action in the WHEN NOT MATCHED clause. Now the OUTPUT option allows you to return both source and target elements. Here's an example for using this approach to implement the first step in the solution (don't run the code yet):

DECLARE @id_a AS INT = 2;

DECLARE @A_AA AS TABLE
(
  id_a INT NOT NULL,
  id_aa INT NOT NULL
);

MERGE INTO dbo.AA AS TGT
USING (SELECT *
       FROM dbo.A
       WHERE id_a = @id_a) AS SRC
  ON 1 = 2
WHEN NOT MATCHED THEN
  INSERT(val) VALUES(SRC.val)
OUTPUT SRC.id_a, inserted.id_aa
INTO @A_AA(id_a, id_aa);

Based on this trick, Listing 3 provides the complete solution for copying the row from A with all of its dependencies in B, C1, and C2 to the target tables AA, BB, CC1, and CC2 respectively.

After running the code in Listing 3, query the target tables:

SELECT * FROM dbo.AA;
SELECT * FROM dbo.BB;
SELECT * FROM dbo.CC1;
SELECT * FROM dbo.CC2;

I got the outputs shown in Tables 5 through 8.

Ben-Gan SQL557 Table5
Ben-Gan SQL557 Table6
Ben-Gan SQL557 Table7
Ben-Gan SQL557 Table8

Solution Using Sequence Objects

Suppose that the copying task involves an additional requirement: Generate the target keys based on the order of source keys. Unfortunately, the solution with the MERGE statement doesn't give you any guarantees about the order in which the keys are generated. If you're using SQL Server 2012, you're in luck. You can replace your use of the identity column property with the new sequence object. You can find details about the sequence object in "Sequences, Part 1" and "Sequences, Part 2."

Using the sequence object, you don't need to use the trick with the MERGE statement because you can copy the rows from the source table to a table variable first, and in the same INSERT statement use the NEXT VALUE FOR function to generate the new keys. Then you can copy the rows from the table variable into the target table. This way you keep the connection between the source and target surrogate keys. As for the order in which the target keys are generated, the NEXT VALUE FOR function was implemented in T-SQL with support for an OVER clause where you can specify the desired order.

Use the code in Listing 4 to re-create the target tables. This time, there's no use of the identity property, but sequence objects instead. A DEFAULT constraint is used to request a new sequence value as an alternative to the identity property, so the key generation is still automated.

The following code demonstrates how to implement the first step in the copying task to copy the rows from A to AA while preserving the relationships between the source and target surrogate keys in a table variable (don't run the code yet):

DECLARE @id_a AS INT = 2;

DECLARE @A_AA AS TABLE
(
  id_a INT NOT NULL,
  id_aa INT NOT NULL,
  val VARCHAR(10) NOT NULL
);

INSERT INTO @A_AA(id_a, id_aa, val)
  SELECT A.id_a,
    NEXT VALUE FOR dbo.SEQ_AA OVER(ORDER BY A.id_a) AS id_aa,
    A.val
  FROM dbo.A
  WHERE id_a = @id_a;

INSERT INTO dbo.AA(id_aa, val)
  SELECT id_aa, val
  FROM @A_AA;

You can then use the table variable in the next step to copy the related rows from B to BB, and so on. You can find the complete new solution for the copying task in Listing 5.

After running the code in Listing 5, query the target tables and observe that the order of the keys in the target is the same as in the source:

SELECT * FROM dbo.AA;
SELECT * FROM dbo.BB;
SELECT * FROM dbo.CC1;
SELECT * FROM dbo.CC2;

Of course, observing that the order is the same isn't by itself a guarantee, but as I mentioned, the NEXT VALUE FOR function used with the OVER clause does give you this guarantee.

A Not So Trivial Pursuit

Sometimes tasks that seem to be trivial don't have trivial solutions. Copying rows with dependencies from one set of tables to another is such an example. What makes the task complicated in our case is the fact that all keys are surrogate keys generated by identity.

This article demonstrates two solutions to this need. One solution uses the MERGE statement with the OUTPUT clause, allowing you to return both the source and the target surrogate keys. But this solution doesn't give you any guarantees regarding the order in which the target keys are generated.

The second solution relies on the sequence object instead of the identity column property—and it does satisfy the ordering requirement, thanks to the fact that T-SQL supports an OVER clause with the NEXT VALUE FOR function, allowing you to specify the order in which the keys are generated.

Listing 1: DDL and Sample Data for Source Tables

SET NOCOUNT ON;
USE tempdb;

IF OBJECT_ID('dbo.C1') IS NOT NULL DROP TABLE dbo.C1;
IF OBJECT_ID('dbo.C2') IS NOT NULL DROP TABLE dbo.C2;
IF OBJECT_ID('dbo.B') IS NOT NULL DROP TABLE dbo.B;
IF OBJECT_ID('dbo.A') IS NOT NULL DROP TABLE dbo.A;

CREATE TABLE dbo.A
(
id_a INT NOT NULL IDENTITY PRIMARY KEY,
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.B
(
id_b INT NOT NULL IDENTITY PRIMARY KEY,
id_a INT NOT NULL REFERENCES dbo.A(id_a),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.C1
(
id_c1 INT NOT NULL IDENTITY PRIMARY KEY,
id_b INT NOT NULL REFERENCES dbo.B(id_b),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.C2
(
id_c2 INT NOT NULL IDENTITY PRIMARY KEY,
id_b INT NOT NULL REFERENCES dbo.B(id_b),
val VARCHAR(10) NOT NULL
);

SET IDENTITY_INSERT dbo.A ON;

INSERT INTO dbo.A(id_a, val)
VALUES(2, 'a1'),(3, 'a2');

SET IDENTITY_INSERT dbo.A OFF;

SET IDENTITY_INSERT dbo.B ON;

INSERT INTO dbo.B(id_b, id_a, val)
VALUES(2, 2, 'b1'),(5, 2, 'b2'),(11, 3, 'b3'),(17, 3, 'b2');

SET IDENTITY_INSERT dbo.B OFF;

SET IDENTITY_INSERT dbo.C1 ON;

INSERT INTO dbo.C1(id_c1, id_b, val)
VALUES(2, 2, 'c1_1'),(3, 2, 'c1_2'),(5, 5, 'c1_3'),(7, 5, 'c1_4'),
(11, 11, 'c1_1'),(13, 11, 'c1_2'),(17, 17, 'c1_3'),(19, 17, 'c1_4');

SET IDENTITY_INSERT dbo.C1 OFF;

SET IDENTITY_INSERT dbo.C2 ON;

INSERT INTO dbo.C2(id_c2, id_b, val)
VALUES(1, 2, 'c2_1'),(2, 2, 'c2_2'),(4, 5, 'c2_3'),(6, 5, 'c2_4'),
(9, 11, 'c2_1'),(10, 11, 'c2_2'),(12, 17, 'c2_3'),(15, 17, 'c2_4');

SET IDENTITY_INSERT dbo.C2 OFF;

Listing 2: DDL for Target Tables

IF OBJECT_ID('dbo.CC1') IS NOT NULL DROP TABLE dbo.CC1;
IF OBJECT_ID('dbo.CC2') IS NOT NULL DROP TABLE dbo.CC2;
IF OBJECT_ID('dbo.BB') IS NOT NULL DROP TABLE dbo.BB;
IF OBJECT_ID('dbo.AA') IS NOT NULL DROP TABLE dbo.AA;

CREATE TABLE dbo.AA
(
id_aa INT NOT NULL IDENTITY PRIMARY KEY,
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.BB
(
id_bb INT NOT NULL IDENTITY PRIMARY KEY,
id_aa INT NOT NULL REFERENCES dbo.AA(id_aa),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.CC1
(
id_cc1 INT NOT NULL IDENTITY PRIMARY KEY,
id_bb INT NOT NULL REFERENCES dbo.BB(id_bb),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.CC2
(
id_cc2 INT NOT NULL IDENTITY PRIMARY KEY,
id_bb INT NOT NULL REFERENCES dbo.BB(id_bb),
val VARCHAR(10) NOT NULL
);

Listing 3: Complete Solution Using MERGE with OUTPUT

Listing 3: Complete Solution Using MERGE with OUTPUT
DECLARE @id_a AS INT = 2;

DECLARE @A_AA AS TABLE
(
id_a INT NOT NULL,
id_aa INT NOT NULL
);

DECLARE @B_BB AS TABLE
(
id_b INT NOT NULL,
id_bb INT NOT NULL
);

MERGE INTO dbo.AA AS TGT
USING (SELECT *
       FROM dbo.A
       WHERE id_a = @id_a) AS SRC
ON 1 = 2
WHEN NOT MATCHED THEN
INSERT(val) VALUES(SRC.val)
OUTPUT SRC.id_a, inserted.id_aa
INTO @A_AA(id_a, id_aa);

MERGE INTO dbo.BB AS TGT
USING (SELECT K.id_aa, B.id_b, B.val
       FROM @A_AA AS K
         JOIN dbo.B
           ON K.id_a = B.id_a) AS SRC
ON 1 = 2
WHEN NOT MATCHED THEN
INSERT(id_aa, val) VALUES(SRC.id_aa, SRC.val)
OUTPUT SRC.id_b, inserted.id_bb
INTO @B_BB(id_b, id_bb);

INSERT INTO dbo.CC1(id_bb, val)
SELECT K.id_bb, C1.val
FROM @B_BB AS K
    JOIN dbo.C1
      ON K.id_b = C1.id_b;

INSERT INTO dbo.CC2(id_bb, val)
SELECT K.id_bb, C2.val
FROM @B_BB AS K
    JOIN dbo.C2
      ON K.id_b = C2.id_b;

Listing 4: DDL for Target Tables Using Sequence Objects

Listing 4: DDL for Target Tables Using Sequence Objects
IF OBJECT_ID('dbo.CC1') IS NOT NULL DROP TABLE dbo.CC1;
IF OBJECT_ID('dbo.CC2') IS NOT NULL DROP TABLE dbo.CC2;
IF OBJECT_ID('dbo.BB') IS NOT NULL DROP TABLE dbo.BB;
IF OBJECT_ID('dbo.AA') IS NOT NULL DROP TABLE dbo.AA;

IF OBJECT_ID('dbo.SEQ_CC1') IS NOT NULL DROP SEQUENCE dbo.SEQ_CC1;
IF OBJECT_ID('dbo.SEQ_CC2') IS NOT NULL DROP SEQUENCE dbo.SEQ_CC2;
IF OBJECT_ID('dbo.SEQ_BB') IS NOT NULL DROP SEQUENCE dbo.SEQ_BB;
IF OBJECT_ID('dbo.SEQ_AA') IS NOT NULL DROP SEQUENCE dbo.SEQ_AA;

CREATE SEQUENCE dbo.SEQ_AA AS INT MINVALUE 1;
CREATE SEQUENCE dbo.SEQ_BB AS INT MINVALUE 1;
CREATE SEQUENCE dbo.SEQ_CC1 AS INT MINVALUE 1;
CREATE SEQUENCE dbo.SEQ_CC2 AS INT MINVALUE 1;

CREATE TABLE dbo.AA
(
id_aa INT NOT NULL DEFAULT(NEXT VALUE FOR dbo.SEQ_AA) PRIMARY KEY,
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.BB
(
id_bb INT NOT NULL DEFAULT(NEXT VALUE FOR dbo.SEQ_BB) PRIMARY KEY,
id_aa INT NOT NULL REFERENCES dbo.AA(id_aa),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.CC1
(
id_cc1 INT NOT NULL DEFAULT(NEXT VALUE FOR dbo.SEQ_CC1) PRIMARY KEY,
id_bb INT NOT NULL REFERENCES dbo.BB(id_bb),
val VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.CC2
(
id_cc2 INT NOT NULL DEFAULT(NEXT VALUE FOR dbo.SEQ_CC2) PRIMARY KEY,
id_bb INT NOT NULL REFERENCES dbo.BB(id_bb),
val VARCHAR(10) NOT NULL
);

Listing 5: Complete Solution Preserving Order

Listing 5: Complete Solution Preserving Order
DECLARE @id_a AS INT = 2;

DECLARE @A_AA AS TABLE
(
id_a INT NOT NULL,
id_aa INT NOT NULL,
val VARCHAR(10) NOT NULL
);

DECLARE @B_BB AS TABLE
(
id_b INT NOT NULL,
id_bb INT NOT NULL,
id_aa INT NOT NULL,
val VARCHAR(10) NOT NULL
);

INSERT INTO @A_AA(id_a, id_aa, val)
SELECT A.id_a,
    NEXT VALUE FOR dbo.SEQ_AA OVER(ORDER BY A.id_a) AS id_aa,
    A.val
FROM dbo.A
WHERE id_a = @id_a;

INSERT INTO dbo.AA(id_aa, val)
SELECT id_aa, val
FROM @A_AA;

INSERT INTO @B_BB(id_b, id_bb, id_aa, val)
SELECT B.id_b,
    NEXT VALUE FOR dbo.SEQ_BB OVER(ORDER BY B.id_b) AS id_bb,
    K.id_aa, B.val
FROM @A_AA AS K
    JOIN dbo.B
      ON K.id_a = B.id_a;

INSERT INTO dbo.BB(id_bb, id_aa, val)
SELECT id_bb, id_aa, val
FROM @B_BB;

INSERT INTO dbo.CC1(id_cc1, id_bb, val)
SELECT
    NEXT VALUE FOR dbo.SEQ_CC1 OVER(ORDER BY C1.id_c1) AS id_cc1,
    K.id_bb, C1.val
FROM @B_BB AS K
    JOIN dbo.C1
      ON K.id_b = C1.id_b;

INSERT INTO dbo.CC2(id_cc2, id_bb, val)
SELECT
    NEXT VALUE FOR dbo.SEQ_CC2 OVER(ORDER BY C2.id_c2) AS id_cc2,
    K.id_bb, C2.val
FROM @B_BB AS K
    JOIN dbo.C2
      ON K.id_b = C2.id_b;

Comments

Plain text