This article is the fourth in a series about data modeling. In "Data Modeling," April 2000, I discussed gathering the requirements for the project. Next, I covered process modeling in "Process Modeling," May 2000, to illustrate what happens to the data as it moves through the system. Then, in "Entity Modeling," June 2000, I developed a concept model, or entity relationship diagram (ERD), of the database. This month, I forward-engineer that ERD into a logical model, which is a closer representation of the evolving database.
Like the ERD, a logical model is a representation of reality. You usually derive it from the ERD, although you can create a logical model by using only normalization techniques and the tests for first, second, and third normal form. (For a full explanation of data normalization, see SQL by Design, "Why You Need Data Normalization," March 1999.) The logical model contains tables, columns, portable data types (see the sidebar "Portable Data Types" for a definition), primary and foreign keys, column nullability (NULL or NOT NULL), suitability for indexing, default values, and check constraints (if appropriate). The logical model doesn't look like the ERD; it's more like a view into the database catalog. The logical model, like the conceptual model, is vendor-neutral. It represents a normalized design and shows what the database or system will look like, but it doesn't explain how to implement the database. You need to make the logical model general enough for easy conversion to a physical model for any vendor's database environment.
Data modeling isn't an exact science. Different software products promote different methodologies or ways of creating and implementing a data model. Various CASE tools use different names for similar stages of data-modeling development. One tool's concept model is another's logical model, and many different notations (e.g., OMG, ORM, UML, IDEF1X, Crow's Foot) add to the confusion. No matter which software tool you use, or even if you use none at all, remember this: Modeling and planning are two fundamental aspects of the same process. You must do both before you build your database.
For this column, I create the logical model manually, without using a CASE tool. This approach gives me a chance to review the entities, attributes, and relationships, and to make sure I've defined everything correctly. I can also review the data types I chose for the ERD, making additional modifications if necessary.
Creating a Logical Model
Creating a logical model is easy when you work from an accurate ERD. In this column, I build a logical model by top-down design, or forward-engineering the ERD. The alternative, bottom-up design, builds the model by applying the rules of data normalization.
The rules of forward-engineering (mapping) an ERD to a logical model are generally straightforward. Entities in the ERD become tables in the logical model. Attributes become columns in the table. Each primary identifying attribute becomes a primary key (pkey). Figure 1 shows a mapping of the PUBLISHER and PUBLICATION entities to the PUBLISHER and PUBLICATION tables and shows the foreign key (fkey) PubID in the PUBLICATION table.
For two entities in a one-to-one (1:1) relationship, both tables will have the same primary key. In a nonidentifying one-to-many (1:M) relationship, an instance of either entity can exist without a related instance of the other. In the Pubs database, you could have an employee without a job title. In this case, the relationship between JOBS and EMPLOYEE would be nonidentifying. As a result, EMPLOYEE would inherit the column JobID from the JOBS table as a foreign key. Because this relationship is nonidentifying, the foreign key can be NULL.
Forward-engineering an identifying 1:M relationship is a little more complex. In the Pubs database, tables TITLE and TITLEAUTHOR have an identifying 1:M relationship. A row in TITLEAUTHOR can't exist without a related row in TITLE. The primary key of the identifying entity (TITLE) becomes a foreign key of the weak entity (TITLEAUTHOR). A weak entity is one on the many side of the relationship that can't exist without a related entity on the one side of the relationship. You then combine the foreign key with an identifier from the weak entity to form a composite primary key candidate.
Binary many-to-many relationships (two entities related —M:N) generate three tables in the logical model. Each entity becomes a table, and the relationship becomes an intersection (or associative) table. The intersection table inherits the primary keys of the two tables in the M:N relationship initially as foreign keys, but also as possible choices for the composite primary key.
Ternary relationships (three entities related many-to-many-to-many‚ M:N:O) generate four tables in much the same way as the binary many-to-many relationship. This time the possible choice for the intersection table's primary key is a composite of the primary keys of the three tables in the M:N:O relationship.
Figure 2 shows a quaternary (four-way) relationship: EMPLOYEE to TASK to ARTICLE to PUBLICATION. The intersection table that relates these four tables to one another is PUBL_ACTIVITY. The primary key candidate is a composite of EMPLOYEE.PersonID, TASK.TaskID, ARTICLE.ArticleID, and PUBLICATION.PublicationID.
Note that simply because a column or set of columns is a possible primary key doesn't mean that it's the best choice. The primary key here is a surrogate column named PublActivityID. A surrogate column represents or stands in for another column or columns. I don't like using a composite for the primary key because composites can slow client/server applications, so I chose the surrogate column PublActivityID as the primary key for PUBL_ACTIVITY.
When modeling data, pay special attention to supertype/subtype relationships. As Figure 3 illustrates, a Person can be an Employee, an Author, or both. Conversely, each Author is a Person, each Employee is a Person, and a Person can be both Employee and Author. The forward-engineering rule states that if the subtypes' attribute sets differ substantially from one another, the supertype entity and each subtype entity become separate tables. If little or no difference exists in the subtype attribute sets, the supertype and subtype entities combine to form one table. To distinguish between one subtype and another, you must add to the table a new column (a discriminator) called status, person type, or person category. As Figure 3 shows, I chose to map the supertype/subtype relationship into three separate tables: PERSON, EMPLOYEE, and AUTHOR. All three tables share the same primary key, PersonID.
The logical model is a great way to document your database design. Tables 1 and 2 (available online at http://www.sqlmag.com) show the complete logical model of the evolving database. Table 1 describes the tables that will be in the database. Table 2 lists the tables, associated columns, and column properties.
Creating a logical model lets you review your ERD. (While preparing this column, for example, I discovered that I had defined the ERD's primary identifying attribute of ARTICLE as text(50); it should have been an integer data type. I've made the change in the logical model.) And because the logical model helps people visualize the populated tables, you can easily display it to clients. If it's in electronic form, such as a Microsoft Excel spreadsheet, you can easily search and find specific column and table names. And if you load the logical model into a database, you can share it with your organization to illustrate what kind of data each database stores.