The Case of the Overlapping Subtype

Margaret has a problem. A database designer, Margaret is responsible for creating a database that the American Association of Retired Persons (AARP) will use to keep track of its members and other people involved with the organization. However, because the people in the database can participate in the organization in many different ways (or might not be actively involved at all), Margaret's database has several empty subtypes, and she's not sure how to deal with them. She's still working on the database design, and she wants to avoid the performance and coding problems that can result from a badly designed table.

Margaret decided to consult the experts. First, she found Jeffrey Bane's article "How Many's Too Many?" June 2003, but the article didn't answer her questions because, in the article, each person in the database is associated with at least one subtype. Margaret's database contains people who aren't associated with a subtype. To keep her design intact, Margaret also wonders whether she should create extra tables that have no attributes (other than the person identifier) to contain the subtypes that have no extra attributes.

Bane referenced one of my columns, "Supertypes and Subtypes," May 1999, and although that article addressed Margaret's concerns about including empty subtypes, it left her wondering what might be the best way to handle those empty subtypes. Margaret wrote to me for advice about creating the best design for her situation.

Working with an Atypical Design

Margaret's situation isn't unusual. An "empty" subtype occurs in a supertype-subtype relationship in which the subtype doesn't have any attributes. Margaret explained her database design like this: She has a supertype entity she calls Person. As Figure 1 shows, each person can be a volunteer, part of the literacy group, a member of AARP, a board member, or an advisor. A person can belong to all, none, or some of the subtypes. The subtypes overlap, and Margaret's organization doesn't store any information about the subtypes.

Regarding the two articles she read, Margaret notes that the schemas in both articles work because each person is associated with at least one subtype. In my article, I explained that data that's specific to each subtype (the subtype's attributes) is stored in separate tables. In Margaret's situation, when subtypes have no extra attributes—indicating that a person isn't associated with any subtypes—she needs to know whether to create extra tables that have no attributes (other than the person identifier) just to keep the database design intact.

Margaret's a pretty savvy DBA, so she came up with her own solution. She decided that creating five nearly empty subtype tables in her production environment would be silly, so she created one table—PersonType—for the subtypes, as Figure 2 shows. The PersonType table contains the foreign key PersonID and a second attribute, PType. Each record in PersonType contains for each PersonID a subtype value of Volunteer, Literacy, AARP, Board Member, or Advisor.

Margaret isn't entirely happy with her solution for two reasons. First, having the PersonType table requires her to write more-complicated queries; to find all the people who have an attribute (e.g., all people who are volunteers), she not only has to relate the Person table to the PersonType table, she has to filter on the PType attribute. Margaret is looking for a way to categorize people that doesn't require her to constantly confirm their existence in the PersonType table. Second, Margaret is concerned that by combining the five nearly empty subtype tables into the PersonType table, she's breaking the rule that says the supertype-to-subtype relationship should be 1:1. She's right to be concerned: PersonType isn't a subtype table; it's a child of the Person table.

Conceptually, Margaret's design doesn't break the supertype-to-subtype 1:1 rule. A person can be a volunteer, a person can belong to the Literacy group, a person can belong to AARP, a person can be a board member, a person can be an advisor—these are all 1:1 relationships. However, the subtypes overlap—a supertype can be identified with more than one subtype. Margaret's situation is more complicated and more difficult to solve than the disjoint or exclusive subtype condition that Bane's article describes. And in "Supertypes and Subtypes," I didn't address the various ways to handle overlapping subtypes in a production environment.

You probably already know that if a subtype has attributes, you need to create a table for that subtype. For example, if in addition to the PersonID, the subtype Board Member had attributes such as YearStart, LengthOfTerm, and PositionHeldOnBoard, these columns would comprise the subtype table BoardMember. However, if a subtype doesn't have attributes and you have several such empty subtypes that overlap, you need to create only a supertype table and a related child table—which Margaret has done in the solution that Figure 2 shows.

Logically, Margaret's solution is the best way to resolve the situation of one person belonging to multiple groups. She's storing only non-null data in the PersonType table—a person gets an entry in the PersonType table only if he or she belongs to one of the listed groups. Margaret can add a Group table to list the groups and attributes such as GroupCode, GroupLongName, GroupAddress, GroupStartDate, and GroupMembershipAmount. Then, PersonType becomes the associative table that links a person to a group. Margaret has built extensibility into her design.

Margaret's concern about the complexity of the queries she'll write is valid. For example, every time she wants to find AARP members, she has to write queries such as

SELECT * FROM Person
 WHERE PersonID IN (SELECT PersonID FROM 
 PersonType WHERE PType = 'AARP')

SELECT *
FROM Person p JOIN PersonType t ON 
p.PersonID = t.PersonID WHERE t.PType = 'AARP'

The queries will become even more cumbersome when Margaret needs to retrieve a list of people who belong to two or more groups. For example, to retrieve the names of people who are both AARP members and volunteers, she'll have to write a query like the following:

SELECT * FROM Person
 WHERE PersonID IN (SELECT PersonID 
 FROM PersonType WHERE PersonType = 'AARP')
 AND PersonID IN (SELECT PersonID FROM 
 PersonType WHERE PersonType = 'Volunteer')

Such complicated queries can seriously impede performance.

In Figure 1's Person table, you can see an attribute named PersonSubType. This is the subtype discriminator, which you add to the supertype table for a disjoint condition (or non-overlapping condition). The purpose of the subtype discriminator is to help you avoid having to write subqueries or joins just to find information such as the names of all AARP members. But when subtypes overlap, this simple discriminator structure doesn't work. You have to modify the structure in one of two ways.

Option 1. In the Person supertype table, the subtype discriminator could be a column named GroupSubType, which is a varchar(5) data type and nullable. A single character—V, L, A, B, or D (for Volunteer, Literacy, AARP, Board member, and aDvisor, respectively)—represents each subtype group. Now, the GroupSubType column might be null, or it might contain the letter V (for a volunteer), VB (for a volunteer and board member), or VLABD (if someone is a groupie). A query that returns the names of people who are AARP members would end up looking like this:

SELECT * FROM Person WHERE GroupSubType LIKE '%A%'

Option 2. The second way to implement the overlapping-subtype discriminator is to create not one but five subtype discriminators, one for each group. You can define each subtype discriminator column as a bit data type (1 or 0) and nullable. Now, if a person is a volunteer only, you turn on just the SubTypeVolunteer column. For the groupie, all five subtype columns are turned on. A variation of this technique would be to make all five subtype columns a char(1) data type and nullable and use the single-letter codes instead of the bit data type. If you wanted to create indexes on these subtype columns, you'd have to create them as char, not bit, data types. With this technique, the query that returns the AARP members would be

/* If using the bit data type for the 
subtype discriminator columns: */

SELECT * FROM Person 
WHERE SubTypeAARP = 1

/* If using the char(1) data type for 
the subtype discriminator columns: */

SELECT * FROM Person 
WHERE SubTypeAARP = 'A'

The greatest benefit of either technique is that you don't have to join tables or write subqueries to get the data you want. But each technique has a drawback. Option 1's drawback is that each query contains a wildcard character, such as the percent sign (%), at the beginning of a search string. Wildcards cause SQL Server to perform a table scan instead of using any index that exists on the column GroupSubType. The table scan could slow Margaret's query performance to an unacceptable degree. Option 2's drawback is that if you need to add more subtype groups, you have to alter the Person table's structure to accommodate them. During the alteration, the table will be unavailable to users. The number of rows in your Person table, the amount of query activity against the Person table, and the amount of time you set aside for database maintenance will influence your solution choice.

Neither solution is elegant; each has its drawbacks. However, although these solutions are less than ideal, they solve Margaret's problem and let her query the Person table without having to reference the PersonType table in the same query. Margaret chose Option 2 because her organization doesn't use the database 24 x 7, and if she has to expand the table occasionally by adding a new subtype discriminator field, she can do so easily. Like the other savvy database designers and DBAs who regularly read this magazine, Margaret is discovering the finer points of how to handle supertypes and subtypes.

Comments

Plain text