Maintaining Referential Integrity

As a Relational Database Management System (RDBMS), SQL Server uses the referential integrity constraint to ensure that data in one table points to data in another table—and doesn’t point to data that doesn’t exist. SQL Server uses constraints, triggers, rules, and defaults to enforce referential integrity. In this article, I discuss the two main mechanisms that enforce referential integrity—constraints and triggers.

You don’t need to use constraints or triggers to enforce every relationship between tables. Instead, you can encapsulate data integrity in the client application or the stored procedure’s logic. Or, you can choose not to enforce a relationship. Choosing not to enforce referential integrity can yield better database performance because SQL Server doesn’t necessarily need to fetch data from a disk for validation. However, you run the risk of violating referential integrity. If you use constraints or triggers, you don’t need to worry about violating referential integrity. So, the choice is simple: If you don’t have special reasons to enforce integrity in your application—or not enforce it—you can choose between constraints and triggers.

Constraints define rules that enforce data integrity. Constraints are simple to maintain and, unlike other mechanisms, they are proactive. In other words, constraints occur before a change operation occurs. But constraints aren’t useful in situations where you need to maintain complex logic, such as when you’re conducting complex calculations, checking data in other tables, or cascading changes to other tables. You can use triggers to complete these tasks, but triggers are reactive—i.e., they occur after a change operation occurs. Moreover, triggers are more complex and might incur more processing overhead than constraints.

Your database design will dictate the method you choose to enforce referential integrity. A poorly designed database leads to the use of complex, resource-consuming mechanisms such as triggers. In most cases, if you’re working with a well-designed database, you can use the preferred mechanism, constraints. But different scenarios need different approaches in enforcing referential integrity.

Entity Integrity

Entity integrity defines a row as a unique entity in a table. You can use a single attribute (for example, the OrderID column in the Orders table) to uniquely distinguish a row from other rows, or you can use a combination of attributes (for example, OrderID and PartNo columns in the OrderDetails table).

SQL Server uses the primary key constraint to enforce entity integrity (for more information, see Poolet, "How to Choose a Primary Key," April 1999). A table can have only one primary key. The primary key constraint doesn’t allow duplicate values and doesn’t allow NULLs. It automatically creates an underlying unique index to disallow duplicate values. If you try to create a primary key on a column that allows NULLs, the action will fail and SQL Server will generate a system error.

SQL Server also supports the enforcement of an alternate key, which is another attribute or combination of attributes (column or columns) that uniquely identifies an entity, or a row in a table. SQL Server uses a unique constraint to enforce the alternate key. A table can have multiple unique constraints. Like the primary constraint, the unique constraint doesn’t allow duplicate values, but it allows one instance of NULL. If a unique constraint is defined on a single column, only one NULL is allowed. SQL Server treats the NULL values as equal, although NULLs are not necessarily equal because NULL is another way of representing an unknown value. If the unique constraint is defined on a combination of columns, SQL Server allows any unique combination of NULLs and known values. The unique constraint also creates a unique index to disallow duplicate keys. You can define the unique constraint on a column that allows NULLs. But the standard way to use the unique constraint is to combine it with NOT NULL. The UNIQUE NOT NULL property precedes the primary key constraint. The SQL Server query processor can use the index created by the primary key or unique constraints to get faster access to the data.

Another device SQL Server uses to maintain entity integrity is the identity property, which you can apply to a column of a numeric data type (tinyint, smallint, int, decimal(p,0), numeric(p,0)). The identity property is an automatic counter that you can define with a seed value and an increment value. Seed is the value for the first row loaded into the table. Increment is the incremental value that is added to the identity value of the previous row that was loaded. The default values for seed and increment are (1,1).You use the identity property if the value of a column is generated on the fly as an incremental value (for example, an order ID) or to uniquely identify a row that doesn’t have a good candidate for a primary key. You can define only one column per table with the identity property. You can tell SQL Server to fetch the identity property in a SELECT statement by stating the IDENTITYCOL keyword instead of the column name. SQL Server won’t let you define the identity property on a column that allows NULLs. The identity property itself doesn’t enforce uniqueness. If left unchanged, the column assigned with the identity property won’t incur duplicate values. But you can use the SET IDENTITY INSERT ON command to temporarily disable the identity property so that you can enter explicit values. In this case, SQL Server allows duplicate values. If you intend to use identity property as a means to maintain entity integrity, you need to use it with the primary key or unique constraints, or never disable it.

Referential Integrity Rules and the Foreign Key

In most cases, a foreign key constraint that references the primary table’s primary key maintains referential integrity. A foreign key is a column whose value matches the primary key in another table. Referential integrity is also called declarative referential integrity (DRI) because you define it as part of a table’s definition.

The simplest way to ensure referential integrity is to not allow changes to the primary key. But if you can’t follow this recommendation, you can use a more complex approach. Screen 1 shows relevant columns from a primary table and its related secondary table. The primary table, the Orders table, contains a row for each order. Each row contains the customer ID, order date, and other details. The secondary table, the OrderDetails table, contains one or more rows for each order, and each row contains the parts and quantities ordered. An order ID uniquely identifies the order. A combination of an order ID and a part number uniquely identifies the OrderDetails.

Screen 2 shows the relevant columns of a self-referencing table scenario in which the table plays both primary and secondary roles. For example, an Employees table can contain a row for each employee, and each row can contain the employee name, address, hire date, and other details. The table also contains the ID of the employee’s manager, who is also an employee. Corresponding employee IDs uniquely identify the employee and the manager.

You need to enforce four referential integrity rules for dependent, one-to-many relationships. Independent one-to-many relationships don’t follow these restrictions.

Users can’t delete a row from PrimaryTable if it refers to a row in SecondaryTable. In the first example above, users can’t delete an order that has order parts. In the second example, users can’t delete a manager who is in charge of employees.
Users can’t update PrimaryTable.col1 if it refers to rows in SecondaryTable. In the first example, users can’t update an order ID if that order has order parts. In the second example, users can’t update a manager ID if that manager is in charge of employees.
Users can’t insert a row into SecondaryTable if no related row is in PrimaryTable. For example, users can’t insert an order part for an order that doesn’t exist. Also, users can’t insert an employee if the entry for the employee’s manager doesn’t exist.
Users can’t update SecondaryTable.col1 if it doesn’t have a related row in PrimaryTable. For example, users can’t shift an order part to an order that doesn’t exist. And users can’t assign an employee to a manager if an entry for the manager doesn’t exist.

When users modify data in a table, SQL Server needs to check the data in the related table or tables to verify that the modification doesn’t violate the referential integrity rules. SQL Server checks data by adding steps to the execution plan. Therefore, performance decreases as the number of table references increases. To enforce referential integrity rules, you can create a foreign key and references constraint, as the statement in Listing 1 shows.

Also, you can establish the reference between SecondaryTable and PrimaryTable when you create SecondaryTable. And you can use the same constraint to enforce referential integrity in the self-referencing table, as Listing 2 shows.

To improve performance, you can create an index of the column or columns that define the foreign key. SQL Server uses indexes in queries that join the primary table and the secondary table, and the SQL Server query processor uses indexes to efficiently perform the JOIN operation. SQL Server doesn’t automatically create an index when you create a foreign key constraint, as it does with primary key and unique constraints.

Triggers and Cascading Changes

If you want to configure SQL Server to allow changes to the primary table and to cascade those changes to the secondary table, you need to decide whether to allow cascading delete and update operations. If you allow cascading delete and update operations, you need to enforce all referential integrity rules with triggers. A trigger can contain any programmatic logic that a stored procedure can. You can define a trigger for individual INSERT, UPDATE, or DELETE operations, or a combination of those operations.

Triggers fire after change operations occur. A trigger uses the transaction log to construct temporary tables called deleted and inserted. The trigger keeps the rows affected by the change in their old and new forms in the transaction, and the trigger can access and compare the deleted and inserted temporary tables. Without the use of triggers, SQL Server would enforce the foreign key and references constraint before the change operation occurred, thus preventing the cascade operation from triggering. If a foreign key or references constraint is enabled, the cascade operation won’t fire. Figure 1 shows a flow diagram of SQL Server referential integrity enforcement mechanisms.

Cascading DELETE Operations

The trigger in Listing 3 cascades DELETE operations from the primary table to the secondary table. If you implement cascading modifications, you have to disable existing referential integrity constraints and replace them with equivalent logic triggers. Otherwise, the cascading triggers won’t fire. A trigger is an implicit transaction, and as such, does not need an explicit BEGIN TRANSACTION statement. You can issue a ROLLBACK TRANSACTION statement that executes if a certain condition is met. The ROLLBACK operation reverses the changes made inside the trigger and reverses the changes that invoked the trigger. ROLLBACK operations decrease database performance because if an implicit or explicit ROLLBACK operation aborts an operation, a series of actions follows: First, the modification occurs, then SQL Server writes the modification to the transaction log, then SQL Server rolls back the modification.

The trigger above creates the deleted table. Because the deleted table resides in the cache, SQL Server can access it with little overhead. The deleted table contains the deleted rows from the table it references, in this case, the primary table. You can use a JOIN statement to implement the DELETE cascade between the secondary table and the deleted table.

Screen 3 shows the results when you issue a DELETE statement against the primary table (in this example, the Orders table). Unlike previous versions of SQL Server, SQL Server 7.0 allows more than one trigger of the same type on the same table. Cascading DELETE operations in a self-referencing table involves recursive deletes. This operation is usually application-specific and needs to be handled programmatically. If you decide not to cascade DELETE operations, but the table already has an UPDATE trigger that cascades updates, you need to enforce an invalid attempt to delete a row in the primary table that has related rows in the secondary table. You can use the DELETE trigger in Listing 4 to issue a ROLLBACK TRANSACTION statement that reverses the invalid operation, in this case, the invalid DELETE attempt.

Cascading UPDATE operations

You can use the trigger in Listing 5 to cascade UPDATE operations. You can use the function UPDATE(col1) to see whether the relevant column in the primary table was updated. This function returns true if col1 was updated. In an UPDATE operation, the UPDATE trigger creates two tables: The inserted table contains the new image of the affected rows, and the deleted table contains the old image of the affected rows. You can determine the number of rows that the UPDATE operation affected by checking @@rowcount. If no row was affected, you don’t need to do anything. If one row was updated, you can use a simple SELECT statement to pull the new col1 from the inserted table, and you can use the statement to update the secondary table by joining it to the deleted table. Avoid multiple updates to the primary key of the primary table.

Screen 4 shows the results when you issue an UPDATE statement against the primary table (in this example, the Orders table). This update trigger is sufficient for a self-referencing table, too.

If you decide not to cascade UPDATE operations, but you have a DELETE trigger on the table that cascades deletes, you need to enforce an invalid attempt to update a row in the primary table that has related rows in the secondary table. You can use the UPDATE trigger in Listing 6 to enforce an invalid attempt.

Preventing Invalid Changes to SecondaryTable

The preceding examples implement rules 1 and 2, which I presented at the beginning of this article. You can use the trigger in Listing 7 to implement rules 3 and 4.

One trigger is sufficient to prevent invalid inserts and updates to the secondary table. In both cases, you need to ensure that every row in the result table has a matching row in the primary table. The UPDATE and INSERT triggers create the inserted table that contains the new rows, so you can simply check whether the row count in the inserted table is the same as @@rowcount that contains the number of rows that the operation affects.

Encapsulating the Logic

You can create a stored procedure that encapsulates the logic that this article presents. Listing 8 contains the stored procedure definition.

To implement the stored procedure, you simply collect users’ input and combine it in the ALTER TABLE ADD CONSTRAINT or CREATE TRIGGER commands.

Choosing the Right Mechanism

Armed with knowledge of how constraints and triggers work, you can choose the best method to enforces referential integrity in your database. For more information about referential integrity, see Inside SQL Server (Microsoft Press) by Kalen Delaney and Ron Soukup.

Comments

Plain text