Data-Modeling Contention

It’s Sunday night, and my column is due on my editor’s desk tomorrow morning. Sometimes, I just can’t think profound or even clever thoughts, and this is one of those times. So, I thought I would let you, the reader, take an active part in this column. Over the years, I’ve talked about many things that have generated strong opinions,interesting feedback, and even a flame or two.This month, I thought I’d introduce a few contentious topics, then discuss them with you on SQL Server Magazine’s Forums area.

I’m going to take a stand:We shouldn’t avoid contentious topics in data modeling. In many cases, the very fact that a topic is contentious means that it’s significant and that it might be of vital importance to the accuracy, integrity, and efficiency of the data, the data model, and management of the data itself. My goal isn’t to change anyone’s mind; rather, my goal is enlightenment. My hope is that, by participating in or even observing these forum discussions, you’ll understand and maybe even appreciate another person’s point of view or be exposed to ideas that you hadn’t considered before.

You’ve formulated your current viewpoint as a result of your job, the organization’s structure, and your personal experiences. Your working environment—be it your job,the organization’s structure, technical standards, and your personal experiences in the trenches—affects your perception and most likely your deeply held opinions. There’s nothing like hands-on experience in a multitude of situations to show you what works and what doesn’t.

So let’s review some of the more popular contentious topics in data modeling. I’ve included six broad topics with specific areas of interest, in hopes that you’ll go to the Database Design forum (http://sqlforums.windowsitpro.com/web/forum/categories.aspx?catid=69&entercat=y) to share ideas, comments, questions, and helpful tips.

Class Words in Your Naming Conventions

Let’s start with an easy topic:naming conventions (aka naming standards or class words). Has your organization or group defined naming conventions or standards that you need to follow when creating a name for entities or tables, attributes or columns, relationships, constraints, or other database objects? Do you use naming conventions all the time? Does your organization enforce the use of naming conventions, or class words, and if so, how?

Here’s an example: Suppose you create a table named Customer. In the Customer table, there’s an attribute that represents each customer’s first name.Would you call that attribute CustomerFirstName, CustFirstName, or just FirstName? Why?

Do you use Hungarian notation, the naming convention popularized by Microsoft Access? Recently, I worked on a project with an application development group that had established and was using naming conventions for all aspects of the project. One comment that caught my eyewas,“Hungarian notation is so yesterday!”

Some folks think that class words are metadata. Metadata is more than just data about data. Metadata is another dimension of data that gives meaning and structure to data. Metadata also helps you identify, assess, and manage data. By adding the name of a table as a prefix to an attribute, you can argue that you’re associating the attribute with the table and strengthening the attribute’s “identity.” In this respect, I think class words can be viewed as meta-data. What do you think?

Surrogate Keys vs. Natural Keys

I think the discussion of surrogate versus natural primary keys can attract even the most reluctant lurker to the forum to weigh in with an opinion. In the SQL Server world, a surrogate key is often an integer column that has an identity property: an autonumber function. A natural key is, well, a natural key. You can read more about what constitutes a surrogate or a natural primary key in the article “Surrogate Key vs. Natural Key” (February 2002, InstantDoc ID 23449). No mudslinging now; I want only good,well-reasoned arguments about this topic.There are pros and cons to each type of primary key, so go ahead and make a comment.

Design for Compliance or Ignore It

Regulatory compliance refers to systems and procedures that organizations should have in place to ensure that workers are aware of and comply with the laws and regulations that govern your industry. Regulatory compliance is a must for almost every industry—financial, manu- facturing, real estate, retail, and health care—and it isn’t unique to the United States. For example, the European Union (EU) has a long history of regulatory compliance,with many regulations inher- ited from its component countries.

With the Sarbanes-Oxley (SOX) Act and Health Insurance Portability and Accountability Act (HIPAA) requirements hanging over everyone’s head;with increasing demands for financial accountability, traceability, and reliability; with antispam legislation on the docket and more promised; and with the focus on privacy and security, the demands for strong data integrity have never been higher. Does your company require you to design applications that meet the compliance regulations, or do you just ignore the regulations? If you’re working under a tight deadline,do you neglect to add the additional audit controls dictated by these compliance requirements? Or do you stand your ground and insist on additional time and resources so that your model and implementation can pass the compliance litmus test?

Using Supertypes and Subtypes

In the past, readers have asked me how to design and implement supertypes and sub-types.I use supertypes and subtypes to better understand how all the components of a PERSON entity fit together, what the components mean, and how they interrelate. Figure 1 shows an example of a supertype or subtype structure. This structure is based on a conceptual data model for part of an organization, and the model makes the assumption that I can use the PERSON supertype to represent all people. In addition to being a PERSON,each individual within the organization might also be a Board Member, a Benefactor, a Director, or a Field Investigator. Figure 1 shows eight subtypes, or subcategories, of PERSON. Supertype and subtype structures are classified as complete or incomplete. To determine whether the structure is complete or incomplete, you need to ensure that every PERSON falls into at least one of the subtype categories. If you find that a PERSON doesn’t fit into a category, you know that there’s a “missing” category that you’ll have to add in the future.

Supertype and subtype structures are also classified as mutually exclusive or overlapping. If supertypes and subtypes are mutually exclusive, a PERSON is limited to participating in only one of the subtypes; if they’re overlapping, a PERSON can be part of two or more subtype classifications. Sometimes, because of a PERSON’s role, subtypes can’t overlap; for example, the same PERSON usually can’t occupy the Director and Assistant Director roles at the same time. In Figure 1, you can infer that Field Investigators and Field Investigator Trainees report to the Director(s). Relationships between the various subtype entities aren’t uncommon. Mapping them out in a data model provides a clear and unambiguous picture of the situation.You can read more about supertypes and subtypes in “Supertypes and Subtypes” (May 1999, InstantDoc ID 5226) and “The Case of the Overlapping Subtypes” (November 2003, InstantDoc ID 40280).

I believe that there’s a time and a place for using supertypes and subtypes, but I’m more interested in your opinions and practices.Do you design supertypes? Or as object modelers, do you generalize your class objects or do you use specialized class objects when you implement the model? How do you implement designs that include supertypes? Do you create one supertype entity or a bunch of subtype entities?

Process Models or Data Models

When your group or company starts a project, what do you use to determine the requirements and guide the process? I think there’s usually a correlation between the maturity of the organization and the amount of time a project group spends on defining and documenting the requirements and doing the design work before they write a single line of code. Currently, I’m working on a project in which the lead programmer is assuming many of the responsibilities of a project manager. He tends to lean toward a process-centric approach for every aspect of the project. I, of course, take a strong data-centric approach to the development process. Have you had this experience?

Which drives your development efforts: the process models or the data models? In other words, is your organization process-centric or data-centric, and how do you feel about that? When you answer, indicate what type of work you do.

Making Database Changes

We all know that during the development process people will suggest changes for a database.Who makes these changes in your organization—a data administrator, a DBA,an application programmer, the CIO, or all four? When these changes are made,do you have a process for documenting them? Is everyone who makes schema changes required to record these changes? How do you think schema changes should be handled,and who should be in charge of making these changes? When you weigh in on this topic, please preface your opinion with your job title.

Head Over to the Forum

These six hot-button topics should be enough to get you thinking and to get a good discussion started. Don’t try to address all the topics at once. Choose one, voice your opinion, then go on to the next topic. See you on the Database Design forum at http://sqlforums.windowsitpro.com/web/forum/categories.aspx?catid=69&entercat=y.

Comments

Plain text