SQL By Design: The Four Integrities

In previous issues, I talked about various concepts and looked at many examples of database design to help you ensure the databases you design and develop are useful and meet the needs of the people they serve. Now let's pull together some of these concepts, the four database integrities, and look at them more closely. Proper use of the four integrities—entity integrity, referential integrity, domain integrity, and business integrity—can ensure that your databases' content is accurate and consistent.

Entity IntegrityEntity integrity ensures that each row of a table is uniquely identified, so that it can be retrieved separately if necessary. The concept of entity integrity is basic to database design and implementation. The primary key of a table is one or more columns that you use to uniquely identify each row of the table. The value set of a primary key is unique; no two values may be the same. See "How to Choose a Primary Key," April 1999, for more information about primary keys, what they are, and why they're important.

The primary key enforces entity integrity. The rules of entity integrity state that no primary key can be null and that no change can render the primary key null. These rules guarantee that every row of a table is accessible, whether you're retrieving data or modifying it. You can retrieve each row separately by specifying the value of a primary key.

Null isn't a space (for character and date data types) or a zero value (for number data types). Null is a condition that represents at least one of three states of the data value: not applicable, applicable but not available, or applicability unknown. (See "The Reason for NULL," October 1999, for more information on nulls.)

The standard technique for enforcing entity integrity in the SQL Server environment is to specify a primary key for each table. You can add the specification to a CREATE TABLE statement or an ALTER TABLE statement, as Listing 1 shows. When the primary key command executes, SQL Server builds a unique index on the primary key. This approach enforces the rule that says you can't insert a duplicate value into a column that has a unique index built on it.

Referential Integrity

Referential integrity ensures that a value in one table references an existing value in another table. The rule of referential integrity states that the value of a foreign key must be within the domain of its related primary key, or it must be null. A domain is the set of valid values for any column.

Foreign keys—table columns that establish links from one table (the master, or referenced table) to another (the detail, or referencing table)—are the implementation of a one-to-many (1:M) relationship between two tables. A foreign key (a column in the referencing table) must always have a corresponding primary key, which is a column having the same data type and length on the referenced side of the relationship. The domain of a foreign key can't fall outside the domain of its corresponding primary key. The domain must be the same, or the foreign key can be null.

A foreign-key value of null signifies an independent, or non-identifying, 1:M relationship. (See "The Foreign Key," August 1999, for an explanation of dependency in 1:M relationships.) In an independent 1:M relationship, a row in the referencing table can exist without a related row in the referenced table. Therefore, you don't need a foreign key value.

In a dependent (identifying) relationship, every row in the referencing table must have a related row in the referenced table. You must enforce a dependent 1:M relationship in the database. You enforce this rule by declaring a foreign-key reference at the time of table creation (in the CREATE TABLE or ALTER TABLE command) or through triggers. Declaring a reference at the time of table creation is known as declarative referential integrity (DRI). Although DRI code executes faster than trigger code, both schemes monitor and maintain referential data integrity. You can't insert into the referencing table a foreign-key value that doesn't already exist in the referenced table. Listing 2 is an example of DRI: The pub_id column references the primary key column of a table called Publishers.

Triggers

A trigger is a special kind of stored procedure that is called automatically in response to some action on a table, such as the successful insert, delete, or update of a row. The code embedded in a trigger tests a condition before the action (e.g., update, delete) is taken on a row or before a row is inserted into the table. Then, based on the test results, the rest of the code in the trigger might execute. Because triggers are code, you can write very complex tests and instructions in a trigger. For instance, you can enforce cross-database referential integrity with triggers, and you can cascade delete operations from the referenced table to associated rows in the referencing table. Trigger code fires only after constraints, including DRI, are checked. DRI executes faster than triggers, but is more restricted in what it can do. You never want to apply both DRI and referential integrity triggers to enforce the same relationship, because DRI always executes first, and even if the DRI failed, the trigger would never fire. Listing 3 is an example of using triggers to enforce referential integrity. The code tests for the presence of a corresponding row in the referenced table when a row is inserted into the referencing table.

Domain Integrity

Domain integrity ensures that all the data items in a column fall within a defined set of valid values. Each column in a table has a defined set of values, such as the set of all numbers for Pubs..zip (five-digit), the set of all character strings for Pubs..au_lname, and the set of all dates for Sales..ord_date. When you limit the value assigned to an instance of that column (an attribute), you are enforcing domain integrity. Domain integrity enforcement can be as simple as choosing the correct data type and length for a column. In the Pubs database's titles table, the column PubDate has a datetime data type and is not nullable. The creator of the titles table chose a non-null condition and a datetime data type for this column. This choice ensures that a value is always present for the PubDate column, and that the value is a valid date, such as 11/22/99. In this way, the creator put in place some controls to maintain domain integrity.

The ANSI SQL-89 and SQL-92 standards define a CREATE DOMAIN statement, which Transact SQL (T-SQL) supports as user-defined data types (UDTs) combined with check constraints. The ANSI SQL-89/92 domain is derived from existing base data types, as in the following pseudocode example:

CREATE DOMAIN wholesale_price AS DECIMAL(5,2) 
  CONSTRAINT whsale_price_not_negative
  CHECK (value >=0) NOT DEFERRABLE

You can create elementary domains in T-SQL by creating UDTs built on the base SQL Server data types. Also, you can add nullability options to a T-SQL UDT, as in the following example from the T-SQL online reference guide:

sp_addtype birthday, datetime, NULL

When you use this UDT in a CREATE or ALTER TABLE statement, you have to complete the domain integrity enforcement process by adding check constraints. A check constraint, applied at the time of table creation, enforces domain integrity by limiting the set of values that can be assigned to a column. Listing 4 contains an example of a check constraint.

DRI also is a form of domain integrity. In an enforced 1:M relationship, the domain of a foreign key and the domain of its corresponding primary key must be the same. In Listing 4, you can never insert a value for pub_id that doesn't already exist in the Publishers table. Thus, you've restricted the domain of title6.pub_id by the foreign-key reference.

Along this line, when you create a table specifically to restrict values in a column in another table (enforce domain integ-rity), the first table is called a reference, or lookup, table. The lookup table has a 1:M relationship with the table it modifies, and this relationship is always en-forced. For instance, in Listing 5 the column Pubs.type references the type column in a table called TypeTable. Listing 5 contains a description of the structure of the lookup table TypeTable and sample values. If you try to insert a row into Pubs..title7 and you use a value for type that isn't present in the TypeTable database, your insert operation fails. The DRI rule specified in Pubs..title7, coupled with the list of values in the TypeTable table, enforces domain integrity.

You can use constraints in T-SQL code to enforce domain integrity. For example, if you want to enforce the rule that a phone number is always null or a 10-digit value, you can use code similar to that in Listing 6.

Business Integrity

Business integrity, also called user integrity, ensures that the database enforces user-defined business rules, regulations, policies, and procedures. You usually enforce business integrity by using stored procedures and triggers. A stored procedure is a query that resides on the database server and processes rows and returns results. Triggers can enforce business integrity rules behind the scenes, because they fire without the user even realizing that they've been activated. Listing 7 contains an example of trigger code that enforces business integrity.

Integrity Counts

Data integrity is vital to a database. If the data values are questionable, the information derived from the data will be useless, or at least of greatly diminished value. The four database integrities are a set of rules that you can enforce in your database. Properly defined and applied, they work as a team to ensure that the data stored in your database is accurate and consistent.

Comments

Plain text