In "Data-Modeling Contention," September 2006, I discussed using supertypes and subtypes when modeling data. Since that article's publication, I've received email from readers asking for more information about when and how to use supertypes and subtypes both for data modeling and implementing in a database. I'll try to answer these questions, but first let's discuss why you'd use supertypes and subtypes.
When to Use Supertypes and Subtypes
You can use supertypes and subtypes when you're creating the conceptual data model. Supertyping and subtyping are techniques that data modelers use to better understand how data is structured and to understand the metadata. Supertypes and subtypes are structures that are used in the requirements-gathering stage of database development.
One might argue that subtyping is simply establishing a set of categories for a generalized entity, so why don't we just use a simple lookup table instead of the supertype-subtype structure? Sometimes we do. For example, when a category is simple and lends itself to a one-or two-word description, such as Microsoft Outlook's Task Status—Not Started, In Progress, Completed, Waiting on someone else, or Deferred— using a lookup or reference table and creating a foreign key reference to the main entity is the right thing to do. In that case, Task Status to Task would be a one-to-many (1:M) relationship, as the conceptual data model in Figure 1 shows.
However, some categorizations require more than one word or phrase to describe their properties.The simple 1:M reference won't work when an entity can fall under two or more categories at the same time. Keywords that define a product are prime examples of this condition. For example, a jacket can be categorized as menswear, outerwear, and winter wear. The keywords are brief enough to justify not using a subtype structure; however, you'll have to create a many-to-many (M:N) relationship between Keyword and Product, as the conceptual data model in Figure 2 shows.
Once the categorization begins to take on a life of its own, with many attributes that describe how the category looks and behaves, it's time to invoke the supertype-subtype structure. I'm going to start this discussion assuming that you're acquainted with the basics of supertypes and subtypes. If you need a review, read "Supertypes and Subtypes," May 1999, and "The Case of the Overlapping Subtype," November 2003.
Data modelers use two kinds of supertype-subtype structures: mutually exclusive and overlapping. In a mutually exclusive structure, an entity falls into no more than one subtype category. In an overlapping structure, an entity can be classified as zero, one, two, or more subtypes. The exclusive supertype-subtype is analogous to a 1:M relationship; the overlapping supertype-subtype is more like an M:N relationship. Figure 3 shows a generic model of a supertype-subtype structure.
Implementing Exclusive Supertype-Subtype Structures
Within the exclusive supertype-subtype structure, you can implement the data model in three ways: supertype table only, subtype tables only,and supertype table plus subtype tables. Let's look at an example of each type of implementation and discuss why I've implemented them as I have. Supertype table only: North Pole Clothiers. The North Pole has one garment manufacturer, appropriately named North Pole Clothiers.The company makes men's clothing, women's clothing, clothing for elves, and the occasional unisex, unispecies garment.This scenario is an example of a supertype-subtype structure in which the characteristics of each subtype are the same (i.e., size, color, inseam). Equally important, the processes that go into creating a garment for men, women, or elves—or even a unisex, unispecies garment—are also all the same. If the subtype entities don't have attributes of their own, then it's appropriate to implement just the supertype table. In this case, for example, a supertype table called Garment could account for all the characteristics in all the garment categories.
Subtype tables only: North Country Outfitters. North Country Outfitters arranges excursions into the outback and offers canoe and kayak rentals by the day in addition to selling camping, hiking, and outdoor activity products.The Goods and Services subtypes are so different that you'll want to implement subtype tables only. Although Goods and Services have a few attributes in common (e.g., Name, CurrentAvailability, DateAddedToInventory), most of the characteristics of Goods (products for sale or rent) are very different from those of Services. So a supertype table called Inventory would be redundant or unnecessary.
Both subtypes are related to entities such as Sale and Sale Return, but they also have relationships independent of each other. For instance, Services—such as a guided trip— can be reserved for a future time (associated with the reservation system) and can be associated with contract employees. Goods and Services are even related to each other, because certain services (e.g., canoe trips) require that the company provide goods (e.g., canoes, paddles, life jackets). In this subtype-only implementation, each subtype table will inherit the attributes of the supertype table, with the exception of the supertype surrogate primary key, if one exists.
Supertype table and subtype tables: MyDreamHome Construction Company. MyDreamHome is a residential construction company that builds custom homes, develops entire tracts of low-to medium-priced housing, and remodels older homes. These three types of projects have attributes and relationships in common, so you'd create a supertype table, Building Project, containing an extensive attribute set that's relevant to all three subtypes—Custom Home, Tract Development, and Remodel. However, because the subtype activities are quite different, each subtype is exclusive of the others and will have attributes that don't pertain to the other subtypes. There won't be any relationships between subtypes, but some of the subtypes might have the same or similar relationships to outside entities— for example, both Custom Home and Remodel could be related to outside architectural firms for custom plans, whereas Tract Development will use blueprints created specifically for the development and won't be related to outside entities.
In a situation such as this—when you have a full set of attributes in common in the supertype and an equally full set of attributes as well as varying relationships for each subtype—it's appropriate to implement both a supertype table and a subtype table. For queries that need to return data from both the supertype table and the subtype tables in such an implementation, realize that you'll be joining the two tables.
The supertype table in this configuration should contain a flag field called a subtype discriminator—a number or short code that indicates which subtype a row in the supertype is associated with. Single letters work best; for example, you could use C to represent Custom Home,T for Tract Home, and R for Remodel. Think of the subtype discriminator as a pointer from the subtype table to the supertype. Its purpose is to let you avoid unnecessary two-table joins.If you need data from only the supertype table, but just for a certain subtype, such as Custom Home, you can create a query that returns only data for which the subtype discriminator equals C.
Implementing Overlapping Supertype-Subtype Structures
Within the overlapping supertype-subtype structure, you can implement the data model in two ways: with a supertype table only or with a supertype table plus subtype tables.You wouldn't want to implement only subtype tables because of the danger of data redundancy every time you have a row of data that can fit into more than one subtype category (such as Person, which can fit under Prospect, Customer, or Vendor). Let's look at a brief example of each implementation and discuss why I've implemented them as I have.
Supertype table only: City University Class Scheduling System. City University has offered conventional classes for many years and recently began offering the same classes online. In the example model that Figure 3 shows, the supertype table would be Class, with Online and Onsite as subtypes. Course information, such as course code, long and short course titles, and description, are contained in an entity called Course, which is outside the supertype-subtype structure. Whenever a course is offered, it must have a time,location,instructor(s),and student(s),and there's an associated entry in Class.The online classes aren't offered on demand, so they must be scheduled the same way they would be if they were being held in a classroom. Both types of classes also need teachers, books and materials,and students.The subtypes represent different methods of delivery for a class, but neither subtype contains attributes that are specific to just that subtype. However, the supertype entity contains all the attributes that describe both subtypes and participates in all relationships that either subtype requires.The best implementation solution here is to implement just the supertype table.
Supertype table and subtype tables: County Library Titles Database. County Library has books, videos, and DVDs.The same title can appear in more than one format; print books that are reproduced as books on tape are also produced on DVD. These differing formats require different policies for retention, length of checkout, and storage. These titles have many attributes in common, but each has attributes and relationships that are specifically its own.For that reason,I'd implement a supertype table called Title plus a set of subtype tables that include Books,Videos, and DVDs.
Understanding Your Data
Supertypes and subtypes can help you understand your data requirements. If you take the time to carefully map out supertype and subtype relationships for your data, you'll create a more effective conceptual data model and a better-performing database.