SQL By Design: The Circular Reference

Downloads
5448.zip

Come around to better database design

This month, instead of starting with an idea and creating a model to support the concept, I'll analyze a poor design construct: a situation I call the circular reference. A circular reference is a recursive condition in which one table references a second table, which in turn references the first table.

The Scenario

The example is a fairly well-normalized design for a contact management database. The portion of the design I'll use to highlight the circular reference contains three tables:

Customer (for storing Company name), CustLocation (for storing company office addresses), and CustContact (for storing people contact information). If you are a WebSQL subscriber and want to test this setup, you can obtain the code from the code library link at http://www.sqlmag.com/.

Figure 1 illustrates the circular reference by depicting the interrelationships among these three tables. The heavy lines connecting the tables represent the primary relationship; the light lines represent a secondary relationship between the same tables. Read the relationships from left to right. Thus, one primary relationship is that each company (Customer) has one or more locations where it does business (CustLocation), but each CustLocation works with only one company (Customer). Similarly, each business location (CustLocation) houses any number of business contacts (CustContact), but each business contact works out of only one business location (CustLocation). Starting again from the left, the Customer table has an attribute (BillingSiteNo) that references the SiteNo of table CustLocation. Likewise, the CustLocation table has an attribute (PrimaryContactNo) that references ContactNo of the CustContact table.

According to design documents, the developer intended to store the site number of the billing location for each customer in the Customer table and the contact number of the primary contact person in the CustLocation table. But despite the good intentions, the developer created circular references. CustLocation references Customer, which in turn references Cust-Location. CustContact references CustLocation, which in turn references CustContact.

Screen 1 illustrates the circular references in a logical manner and in greater detail. The heavy arrows represent the primary relationship between the tables. They point to the referenced column. For example, the tblCustLocation table's CustNo column has a heavy arrow pointing to the tblCustomer table's CustNo column. These two columns each store customer numbers. In tblCustomer, CustNo is the primary key, the unique identifier for each row in this table. In tblCustLocation, CustNo is also the customer number, but it's a foreign key that refers to CustNo in tblCustomer. These two columns represent a relationship between these two tables. If tblCustLocation.CustNo references tblCustomer .CustNo, then tblCustLocation references tblCustomer. This situation means you have a one-to-many (1:M) relationship between the two tables.

These two tables have a second referencing relationship, which the light arrows represent in Screen 1. For example, the tblCustomer table's SiteNo column has a lighter arrow pointing to the tblCustLocation table's SiteNo column. This light arrow represents the designer's intention to have the Customer table store the site number of the billing location for each customer. More important, the light arrow points in the opposite direction of the heavy arrow. The light arrow also specifies a 1:M relationship, but this time it's from tblCustLocation to tblCustomer—just the opposite of the relationship expressed by the heavy arrow! These two 1:M relationships, pointing in opposite directions between the same two tables, form the circular reference.

In an implemented database, the best approach is usually to enforce foreign key references in the CREATE TABLE statement, in the ALTER TABLE statement, or with trigger code. Enforced referential integrities (RIs) prevent a user from entering into the referencing table a value that doesn't exist in the referenced table. In other words, enforced RIs prevent you from entering into tblCust-Location a customer number that isn't already present in tblCustomer.

However, if you enforce the 1:M relationship between tblCustomer and tblCustLocation and the opposing 1:M relationship between tblCustLocation and tblCustomer (SQL Server 6.5 and 7.0 both allow this setup), you won't be able to insert a row into either tblCustomer or tblCustLocation. To insert a row into tblCustomer with these enforced RIs in place, you have to use a value for SiteNo that is already present in tblCustLocation. But you can't insert a row into tblCustLocation until you've listed that customer number in tblCustomer. Then, you can't insert that customer number in tblCustomer until the site number is already present in tblCustLocation. But you can't . . . and 'round and 'round you go.

The relationships between tblCustLocation and tblCust-Contact are similar. The heavy arrow (the primary relationship) implies that tblCustContact.SiteNo references tblCustLocation .SiteNo. The light arrow implies that tblCustLocation.ContactNo references tblCustContact.ContactNo. If you enforce these references to maintain data integrity, you won't be able to insert a row into either tblCustLocation or tblCustContact. This situation defeats the purpose of a database.

A Better Way

Figure 2 shows a better way to conceptually represent the billing address and the primary contact. This entity relationship diagram (ERD) uses a supertype-subtype construct (for a description of these types, see SQL By Design, "Supertypes and Subtypes," May 1999) to distinguish between a billing address and other address types and between the primary contact and all other types of contact people. The conceptual representation is more complex than the eventual database implementation, as you'll see in a moment.

In this model, each Customer relates to one or more CustLocations, and each CustLocation belongs to one Customer. At each CustLocation, you'll find any number of customer contacts, and each Cust-Contact operates out of one CustLocation. A CustLocation can be one of two types, either a billing address or other address, but it cannot be both. The ERD specifies that exclusion by using the double vertical line at the bottom of the CustLocation entity leading into the letter d (meaning disjoint) in the circle. The subtype discriminator, AddressType, will be either a B (billing) or an O (other) for each CustLocation in this table. If you need to specify more than just billing or other location types for your design, you can easily add more subtypes, such as W (for warehouse) or R (for retail outlet).

Similarly, each CustContact can be one of two types, either a primary or a secondary contact. A contact must be one or the other; it cannot be both. Again, the ERD specifies this exclusion using the double vertical line at the bottom of the CustContact entity, which leads into the letter d in the circle. The subtype discriminator, ContactType, will be either a P (primary) or an S (secondary) for each CustContact in this table.

Implementing the ERD concept is easier than conceptualizing it because of the similarity of the subtype attribute sets. The attributes for BillingAddress and OtherAddress are the same, as are those for PrimaryContact and SecondaryContact. When you map a conceptual design (ERD) to a logical and subsequently physical model and you have this type of situation, you usually combine the subtypes and the supertype into one table. Thus, the set of tables derived from this model will look like that in Screen 2.

Only one arrow goes between each pair of tables in this logical model, specifying a 1:M relationship from the topmost table down. You can enforce both of these relationships in database code to maintain data integrity.

For data entry in this set of tables, you must first insert a new Customer record into tblCustomer. Then you can insert a business location into tblCustLocation. Finally, insert contact names and phone numbers into tblCustContact. You'll never have a record for a contact person who works at a nonexistent office of a nonexistent company if you follow this design.

Comments

Plain text