Certifiably SQL: Data Constraints and Integrity

Downloads
7450.zip

Tips for SQL Server certification exams

This month, I look at another table-centric aspect of a topic from Exam 70-29, the "Implementing a Database on SQL Server 7.0" course exam. The topic is creating tables that enforce data integrity and referential integrity, and implementing constraints. That sentence looks like two topics, but constraints are the preferred way to enforce integrity. In this article, I look at the different types of constraints and when and why you use them. I discuss how to implement constraints, and when you might need to disable them.

Constraints and Integrity

A constraint is a limitation that you place on the data that users can enter into a column or group of columns. A constraint is part of the table definition; you can implement constraints when you create the table or later. You can remove a constraint from the table without affecting the table or the data, and you can temporarily disable certain constraints.

The distinction between column-level and table-level constraints is that you define a column-level constraint on one column only. Therefore, you can define it as part of the column definition. However, to define a constraint on more than one column—such as a primary-key constraint defined on two columns—you need to define the columns in the table first, then add the constraint definition at the end of the table definition. That constraint type is at the table level. In practice, you don't need to worry about the distinction because the syntax doesn't change.

You can ensure data integrity by various means, including the use of triggers and other procedural code. But in part because constraints are ANSI-SQL compliant, they're the method Microsoft recommends. Different types of constraints address different types of integrity, and looking at constraints in the context of the integrity types they support and enforce can be helpful.

The Primary-Key Constraint

The primary-key constraint designates a column, or a combination of columns, as the primary key for the table. This action enforces entity integrity, which requires each row to have a unique identifier, so that data modifications or queries always refer to a specific row without ambiguity. When you place a primary-key constraint on a column, you're requiring each row in that column to have a unique value, which can't be NULL. A primary-key constraint on multiple columns requires the combination of values in those columns to be unique. Examples of primary keys include invoice number, employee ID, purchase order number, and item or part number.

You can have only one primary-key constraint per table. When you define a primary-key constraint, SQL Server automatically builds an index on the designated column or columns. By default, the index is clustered, although you can specify a nonclustered index. This unique index enforces the entity integrity for the primary key. You can't have a primary-key constraint without the index, and you can't drop the index. To remove the index, you must drop the constraint.

When you define the constraint and build the index, any existing data must already meet the constraint requirements of no duplicate values and no nulls. If it doesn't, SQL Server can't build the index, and the constraint definition fails. So you can't disable the function that checks existing data when you add a primary-key constraint. After you define the constraint, all new data and any modified rows must satisfy the constraint. With a defined primary-key constraint, you don't need to add any code to enforce entity integrity. However, if you're accessing SQL Server through an application such as Microsoft Access or a Visual Basic (VB) application, that interface must be able to handle the returned condition when a constraint violation occurs. The syntax for adding a primary-key constraint, using an example from the Pubs database, is

ALTER TABLE \[dbo\].\[authors\] 
ADD CONSTRAINT \[UPKCL_auidind\] PRIMARY KEY
CLUSTERED (\[au_id\] )

Note the naming convention SQL Server uses: UPKCL means Unique, Primary Key, CLustered. Name your constraints when you create them, and use a naming convention you're comfortable with. If you don't, SQL Server will generate a name that includes a string of hexadecimal characters, such as UPKCL_authors_ auid_01a5b84e45d. This naming system makes it hard to refer to the constraint later.

The Unique Constraint

The unique constraint enforces uniqueness in a column or combination of columns. You'd typically use it for a candidate, or surrogate, key—a column or columns that you could have chosen as a primary key, but didn't. One example is a table that includes employee ID as the primary key but also enforces uniqueness on Social Security number (SSN). Another example is a delivery service storing drivers' license numbers, but because two states might have issued the same number, the company enforces the unique combination of state plus driver's license number.

The unique constraint also enforces uniqueness by building an index on the designated column or columns. The default is a nonclustered index, but you can specify a clustered index. A unique index permits one NULL value. A second NULL value produces a constraint violation error. As with the primary-key constraint, you can't drop this index unless you also drop the constraint. And you can't disable the checking of existing data because SQL Server can create the index only on data that contains no duplicate values. Because of the presence of this index, you can't temporarily disable checking on new data, either.

You can have multiple unique constraints on a table. For example, the employees table might have unique constraints on SSN, driver's license + state, telephone extension, and security badge number.

The Foreign-Key Constraint

The foreign-key constraint defines the relationship between a column or combination of columns in the current table and a column or combination of columns in another table. In other words, it enforces referential integrity. This relationship might be one to one, such as in the case of an employee in the payroll table who must already exist in the employees table. Or it could be a many-to-one relationship. A typical example of a many-to-one foreign-key relationship is the Customer ID in the invoices table. In the invoices table, the CustomerID column is the foreign key; it refers to the customers table's CustomerID column, which is the primary key. The invoices table might have several entries for one customer.

Placing a foreign-key constraint on CustomerID ensures that the CustomerID a user enters into an invoice is valid. In other words, you can't invoice a nonexistent customer. But it also enforces another rule: You can't delete a customer with an associated entry in the invoices table. This restriction removes the risk of orphaned records.

A foreign-key constraint in a table must refer to a column or columns in the referenced table that either form the primary key in that table or have a unique set of values that a unique constraint enforces. A foreign-key constraint doesn't automatically build an index. However, building one is a good idea because it speeds the process of referencing and joining the tables. The columns that participate in the foreign-key relationship are also the columns you use to join the tables. If you need a customer address on an invoice, you join the invoices table to the customers table by using CustomerID as the join column.

When you create a foreign-key constraint, it checks existing data by default, but you can use the WITH NOCHECK option to specify that it skip this operation if you know that the data already meets the requirements of the constraint. Or perhaps you're aware that the existing data doesn't meet the requirements. For example, suppose you add a SalespersonID column to the orders table, to track who took the order. You place a foreign-key constraint on this column to enforce a rule that the SalespersonID must have a valid corresponding entry in the employees table. But you don't want to check the old orders because they'll fail the check. In that case, you use WITH NOCHECK. A foreign-key constraint can also reference other columns in the same table; in that case, you use the REFERENCES keyword, but you can omit FOREIGN KEY. You might have a constraint specifying that for any employee, the supervisorID must be an existing employee.

A table can have multiple foreign-key constraints. The orders table might have foreign-key references to the customers table for the customerID, the employees table for the salesperson taking the order and the person packing the order, and the shipping table for the freight company's name and contact data. The syntax for adding foreign-key constraints to the sales table in the Pubs database looks like this:

ALTER TABLE \[dbo\].\[sales\] 
   ADD FOREIGN KEY (\[stor_id\]) REFERENCES
\[dbo\].\[stores\] (\[stor_id\]),
   FOREIGN KEY (\[title_id\]) REFERENCES \[dbo\].\[titles\]
   (\[title_id\])

This statement adds two foreign keys. For any entry in the sales table, the stor_id must be a valid entry in the stores table, and the book sold must have a valid title_id in the titles table.

The Default Constraint

The default constraint tells SQL Server what value to place in a column if you don't specify a value during data entry. This constraint applies only to INSERT statements; when you're updating existing data, it leaves the current value (or the NULL) in place. You can have only one default value per column, and this constraint always applies to just one column. Because it applies only during INSERT statements, you don't need to check existing data when you create the constraint. The default you specify must be of the correct data type and must meet the requirements of the check constraints and rules you've applied. The syntax for adding a default constraint—in this case, unknown for the phone number—in the authors table follows:

ALTER TABLE \[dbo\].\[authors\] ADD 
CONSTRAINT \[DF_authors_phone\] DEFAULT ('UNKNOWN')
   FOR phone\]

The Check Constraint

The check constraint enforces domain integrity. You can use it to ensure that an entered value falls within an acceptable range or follows a particular pattern, such as a telephone number or a ZIP or postal code. A table can have multiple check constraints. You can even impose multiple conditions on the data in a column by adding more than one constraint or by writing one complex constraint. You can check existing data when you create the constraint, or you can temporarily suspend the check constraint.

A check constraint can reference other columns in the same row of the table, but can't go outside the table. So, for example, you can't use a check constraint to check whether the quantity of an item ordered is greater than the quantity in stock. To do that, you need to use triggers. The syntax for the check constraints on the authors table adds two constraints: one on the author ID format and one on the ZIP code.

ALTER TABLE dbo.authors WITH NOCHECK ADD 
CHECK ((au_id LIKE '\[0-9\]\[0-9\]\[0-9\]-\[0-9\]\[0-9\]-
   \[0-9\]\[0-9\]\[0-9\]\[0-9\]')),
CHECK ((zip LIKE '\[0-9\]\[0-9\]\[0-9\]\[0-9\]\[0-9\]'))

Constraints vs. Triggers

So how do you choose between a constraint and a trigger? If a constraint will do the job, such as enforcing referential integrity, use it. If you need more complex logic, use a trigger. Understanding the basic differences can help you choose between the two.

First, constraints are proactive: They prevent unwanted actions from happening. Triggers are reactive: they undo the damage. A constraint tells you not to delete a customer with outstanding invoices. A trigger rolls back the DELETE operation if you try to delete a customer with outstanding invoices. Constraints apply to the current table. Triggers can reference other tables, even in another database or server. You need to program the action you want a trigger to take. With constraints, SQL Server defines the actions.

Also, triggers allow custom error messages. A trigger can tell you, "Sorry, but that customer you just deleted has outstanding invoices, so I rolled back the DELETE operation." A constraint just returns a cryptic message about a constraint violation of the foreign-key relationship, which means something to you, but not to most database users. Most of the time, your users will connect to SQL Server through an interface such as Access or a VB application, so you can handle the returned error codes and translate them to a friendly message from the client application to the user.

Constraints can be useful in helping you maintain data integrity. The syntax looks complicated, but if you approach it one step at a time, it makes sense. For more examples of the syntax to use when creating constraints, see the online lab files that accompany this article. Subscribers can download these files and the answers to the questions in the sidebar "Practice Questions: Constraints and Integrity," page 70, at the link to this article at http://www.sqlmag.com. For the answers to the November practice questions, see "Answers to November Practice Questions: Creating SQL Server 7.0 Databases."

Comments

Plain text