What's the best way to handle the explosion of communications methods we're seeing in today's lifestyles? We're experiencing a greater and greater demand to capture all types of phone numbers—home phone, business phone, cellular phone, mobile phone, pager, fax phone, home fax. Add to that demand the proliferation of email addresses, usually more than one per person, that we also need to capture. Many database administrators and designers struggle to adapt existing database architectures to accommodate all this information.
Many database people resort to a technique I call the simple addition solution. With each new type of telephone number or email address, you add the new means of contact to the appropriate table on an as-needed basis. Screen 1, is a data entry/edit form from a contact management database; it shows the different types of phone numbers and an email address. As you use this database, if you need another type of phone number or space for a second email address, you simply add a new field to the table, then add a corresponding text box to this form.
Unfortunately, adding columns to a table to accommodate home phone, second home phone, work phone, email1, email2, etc., is the same as creating an array. An array is a structure that can accommodate multiple occurrences of a data value. Table 1 shows weeks (numbered as 1 and 2) and days of the week in an array. In Visual Basic (VB), you might define an array that could store the days of the week in this way:
Dim MyWeek, MyDay MyWeek = Array("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun") 'Return values assume lower bound set to 1 (using Option Base statement) MyDay = MyWeek(2)' MyDay contains "Tue" MyDay = MyWeek(4)' MyDay contains "Thu"
A relational database management system (RDBMS) presents data in two dimensions, as a set of rows and columns, with each intersection of a row and a column occupied by a data value or a NULL condition (lack of data). Neither the RDBMS architecture nor the SQL programming language (the 1992 SQL-2 standard) supports arrays. So if you want to build an array in a relational database management system, you have to restructure it to match the architecture in Table 2.
Table 2 shows how this same array would appear in an RDBMS table. Compare Table 2 to the array in Table 1, where the week number occurs once, but for each week number, you have seven entries for days of the week. In the relational database representation of the data (Table 2), each day is in a separate row, with an associated week number. As a result, the week number is repeated for each day.
Many database tables contain arrays. Some are subtly identified, such as home phone, work phone, and fax phone. Others are much more obvious, such as Phone_1, Phone_2, and Phone_3.
Although the array is appealing for informational data retrieval, as in an OLAP or decision-support system, such arrays are difficult and awkward to use in an OLTP environment because of the high level of data redundancy (week number repeated many times) and the uncertainty regarding data maintenance. The function and purpose of an OLAP/decision-support system is to provide storage and a means of retrieving information, not just disparate bits of data. Compared with data stored in an OLTP database, data stored in an OLAP/decision-support environment is significantly less fragmented, and when retrieved, can to present a more complete picture of the business environment. Furthermore, OLAP (and often decision-support) databases don't let the end-users modify the data.
In an OLTP, or transactional, database, a flattened array can mean complications with data management and data retrieval. For instance, in the contact manager on Screen 1, the following situations could arise, leading to misinterpretations, data maintenance headaches, or violations of data integrity:
- If the company changes its main phone number or fax number, you must change all occurrences of the old number for all Cascade Coffee Roasters contacts, or the data in your Contact Manager will be out of sync.
- If Cascade Coffee Roasters adds a second phone line for public access, you'll have to decide how to handle this change. Should you change the work phone number of half your Cascade Coffee Roasters contacts to this new number? Or should you campaign to have a new field added to the form (and thus a new column to the underlying table) in which to store the new phone number? If you opt for the latter, must the second work phone field remain blank (null) for all contacts in your database who do not work for Cascade Coffee Roasters?
- If you are successful in your request for a second work phone number field and you receive a directive to produce a phone list of company contacts, how do you handle the situation? Because only a few contacts will have a second work phone number, do you leave the second phone number out of the report entirely, thus excluding some information? Or do you include it, and end up with a report that has blank space for most of the contacts' second work phone numbers?
- You learn that Nancy Davolio's phone number has changed to (206)-555-9645. You have two issues to deal with in this situation. First, which of Nancy's phone numbers changed? Second, do you want to retain the old number, just in case the source of your information proves less than reliable?
The contact manager is of limited value for a serious contact management function. How-ever, some minor changes and enhancements could make it a useful source of information on client companies and their employees; business contacts and how to reach them; and telephone numbers, email addresses, and Web sites of everyone on your contact list. Proper normalization of the various parts of this contact manager and its underlying database will render this tool useful. (For details on how to normalize data, see SQL By Design, "Why You Need Data Normalization," Premiere issue).
Figure 1 is an entity model of the company-contact-communications portion of this contact manager database. Briefly, an entity is some-thing, such as a company (tblCompany), about which you want to store data, such as company name and address. An attribute is a property that describes an entity (CompanyName of tblCompany). A relationship is how one entity associates itself to other entities in the model. (I will describe the entity model in detail later.)
In this model, you can see the relationships between different entities. Each company can have many associated contacts. Each Contact has any number of associated phone numbers, email addresses, and Web sites (ContactComm). A value contained within the CommType table categorizes each means of contact. A company need not have a contact person of record; a company may have multiple phone numbers directly associated with it. (A more detailed description of the architecture is at the end of this article.)
Data manipulation and management is easy and straightforward with this contact manager schema. When you create the physical model in SQL Server, each entity will become a table. The attributes for each entity will become columns in the associated table. You'll relate the tables to one another as follows:
- for each 1:1 relationship (this model has no examples of 1:1 relationships), the two tables will have the same primary key;
- for each 1:M relationship (example, tblCompany to tblContact), the primary key of the master table (tblCompany) becomes a foreign key column in the detail table (tblContact).
One report you'll probably want to generate is a list of company contacts and phone numbers. The code in Listing 1 gives you the result you see in Table 3.
A list of company URLs might be handy, in case you need to visit your client's Web sites. Listing 2 produces a list of URLs for companies only, as Table 4 shows.
Data entry and on-screen viewing of the data will be nearly as easy and convenient with the new schema as with the old. Screen 2, is an example form constructed on the new schema. The upper part of the screen contains data from tblContact and tblCompany. The lower part of the screen is a subform that lists all phone numbers, email addresses, and URL values that belong to a contact—in this case, Nancy Davolio.
The Entity-Relationship Diagram
An entity model graphically displays the informational requirements of a database, as shown in Figure 1. Many people use the entity model to graphically display the database schema—the tables and the columns in a table—because the software used to create the entity model displays this structure so clearly. Terms that describe an entity model (e.g., entity, attribute) are not the same as those that describe the physical model (e.g., table, column, or field). Table 5, page 64, translates the terms from one environment to another. For this discussion, we'll use the relational database design terms. Remember that an entity is analogous to a table, and an attribute is analogous to a column. One entity is like a single row in a table.
The model in Figure 1 contains four entities. The focus of this model is split between tblCompany and tblContact. The tblCompany entity maintains the register of company names and addresses, in simplest form. The attributes that are part of tblCompany are company name, a street address in two parts (AddressLine1 and AddressLine2), city, state (StateCode), ZIP code (PostalCode), region, and country. Including a Region attribute gives you the capability to filter by specific areas of the country or the world. In this simple model, a company can be part of one and only one region. The primary key (see "How to Choose a Primary Key," April 1999), CompanyID, uniquely identifies each company.
The tblContact entity represents a list of people who work for the organizations in tblCompany. The ContactID surrogate primary key uniquely identifies each contact, and has a first name, a last name, a salutation for letters, and a title designating position or function in the company. The attribute CompanyID links each contact to one and only one company. The attribute for LastMeetingDate needs to be a Long DateTime data type to adequately deal with the year-2000 issue and to give as much information as possible (day of the week, time of the day). The Notes attribute, which is a memo data type, can hold any amount of text, and acts as a notepad to record points made at a meeting, or reminders for future meetings.
The tblCommType entity represents all the different means by which you might be able to get in touch with a person: phone number (home, business), fax phone, cellular phone, pager, email address (personal, corporate), even Web site URL. The attribute CommType is the primary key.
The fourth and last entity in the group is tblContactComm, which associates companies or contacts with the various CommTypes. TblContactComm has its own surrogate primary key, CCID, and two foreign keys (fkeys). The fkey, ContactID, relates to either tblContact or tblCompany. TblCompany. CompanyID, tblContact.ContactID, and tblContactComm .ContactID are all long integer data types. Thus, you can refer to either tblCompany.CompanyID or tblContact.ContactID by tblContactComm.ContactID.
Limiting the domains of CompanyID and ContactID so that they don't overlap is a good idea, to avoid confusion when you join tables. For example, if you limit CompanyID to values less than 50,000, and you have more than 50,000 ContactID values, no CompanyID value and ContactID value will ever be the same. Thus, you eliminate confusion as to whether a phone number belongs to a company or a contact.
The other foreign key in tblContactComm is CommType, which relates back to tblCommType. It identifies each instance of the entity as a home phone or a work phone or an email address. The attribute CommValue is the phone number or email address or URL. The attribute PreferredMethod is a yes/no/unk (unknown) and identifies, for each contact or company, whether this is the preferred method of contact. There is no restriction on how many phone numbers or email addresses you can list as the preferred method of contact for a person or company. Last, CommNotes gives space for notes or comments for each phone number, email address, and URL.
Each of the entities has a common set of attributes, LastUpdate and ByWhom. These attributes track when each instance of an entity is inserted or updated, and by whom. You can assign LastUpdate a default value of GetDate(). This value will ensure that when someone inserts a row into the table, the current date (and time) is included.
To automatically LastUpdate when updating a row, you'll have to write a stored procedure or trigger code. You can assign ByWhom a default value of System_User, which captures the login of the person making the addition or change to the table.
Relationships between entities ned to reflect the business rules by which a company operates. The relationship between tblCompany and tblContact is 1:M; a company has many contact people, a contact person represents a single company. The relationship between tblCompany and tblContactComm is also 1:M: A company can have many methods of contact of record; a contact value (phone number, email address, or URL) relates back to a single company. TblContact and tblContactComm also have a 1:M relationship; a contact person can have many phone numbers, email addresses, and URLs on file, and a contact value (phone number, email address, or URL) relates back to a single contact person. And finally, the relationship between tblCommType and tblContactComm is 1:M; each type of communications scheme (home phone, work phone, fax phone, email, URL) can be used many times in tblContactComm, but each CommType in tblContactComm relates back to a single CommType in tblCommType.
Although some people may say that the entity-relationship model is not the best design for person communications (that is, it's not in fifth normal form), this design is more flexible than the flattened array. You'll find that you have more capability and less redundancy, and thus fewer data maintenance headaches, with this design.