In "The Trouble with Type Tables," I discussed the importance of creating type tables and how to set them up correctly in a normalized database structure. One aspect that I didn't discuss is including columns in type tables for the purpose of effective dating (i.e., documenting when something goes into effect and when it's no longer in effect). So, after I describe when you'd use effective dating and what problems it can prevent, I'll provide an overall strategy for working with effective dating in type tables.
Knowing When to Use Effective Dating
As I discussed in "The Trouble with Type Tables," you use type tables to avoid embedding lookup data in an application. A type table is nothing more than a normalized database table that provides at a minimum two primary pieces of data: a unique ID and a description attached to a unique ID.
There are two common kinds of type tables you'll encounter in a normalized database structure: statically defined type tables and dynamically defined type tables. In a statically defined type table, the values rarely change over time and any changes are carefully planned and executed. These type tables are intentionally configured to not overlap or reuse existing values. Instead, the data is often additive in nature. That is, the table will grow but typically will never contract because expired or retired data is never removed so that historical context isn't lost. Even when a description attached to an ID changes, this change is usually accounted for during the development process.
In a dynamically defined type table, values are usually modifiable by administrators. Typically, administrators are provided with a base set of values for a given dynamic type table. Administrators might then dynamically change the type tables by adding to, deleting from, or updating the existing data. I'll concentrate on this kind of type table because it's prone to problems.
Understanding the Potential Problems
Dynamically defined type tables can present problems in relational designs. Changing values can have a huge impact on reporting as well as data archiving. For example, suppose you have a type table containing data about auto dealers. In this table, you have a record in which the ID number is 1 and the description is Dave's Auto Wholesaler. As the production database is used, the records in that database have this record attached to them. This is the expected and desired behavior. From a reporting standpoint, data that's migrated out of the system (e.g., to a corporate data mart) also has this information attached to it. As a result, when end users run a report, the data for Dave's Auto Wholesaler is attached to the records that appear in the report.
Suppose that Dave's Auto Wholesaler goes out of business. Because it's no longer a valid dealership, the administrator needs to change the dynamically defined type table. On the one hand, there's a history attached to Dave's Auto Wholesaler, so the record can't simply be deleted. On the other hand, that record can no longer be used in the production database. This is where dealing with dynamically defined type tables gets sticky.
Tracking the Removal of Records
There are several possible approaches to removing records that shouldn't be in the production database. One approach is to simply inactivate the record by extending your type tables to include an IsInactive flag. This flag is simply a bit field with a value of 0 or 1. You can use it to filter out inactive records in a drop-down list box, for example, so that inactive records can't be referenced. The use of the IsInactive flag preserves the old record, allowing the history to be maintained, but keeps the record from being reused because the application will filter it out as a possible selection. Type tables that use this approach follow the design pattern shown in Figure 1.
Inactivating a record is a valid way to handle removing a particular type table record from common usage and is one that I've used in my own designs. However, beyond simply being a filter, it doesn't provide any context to the data. This approach might not be appropriate for all situations. A good example of this is tracking mortgage rates. It might be critical to know the dates when a particular mortgage rate was in effect.
A more comprehensive approach to effective dating of reference data is to couple the use of the inactive flag with the use of start and end dates. Type tables that support effective dating follow the design pattern shown in Figure 2.
As you can see, the start and end dates for each record are tracked in the StartDate and EndDate columns in the type table. So, to remove a record, an administrator needs to set its IsInactive flag to 1 (inactive) and enter the date and time when the record was marked as inactive in the EndDate column. This provides the historical context for the data.
If marking a record inactive and setting the end date to a non-NULL value seem redundant, you're partially correct. Setting the IsInactive flag indicates only that a record is no longer in use. Start and end dates indicate when a particular record was in effect. Because of this additional metadata (i.e., data about the data), anyone who is reporting off of this type table data knows when a particular record was in effect and hence valid.
In Figure 2, notice that StartDate has an asterisk next to it. This indicates that it's defined as NOT NULL in my modeling tool. All type table records must have a start date, but they don't need to have an end date. In fact, the absence of an end date means that a given record is an active record.
Tracking the Creation and Modification of Records
Besides tracking when records are removed, effective dating can be used to track when records are created or modified in a type table. When administrators create a new record, the new record will, by default, be set to active and the start date will be set to the date and time when the record was saved. There is no end date in this case because the record is valid and available for use.
Sometimes tracking the start and end dates of type table records isn't enough and you need to know about other changes made to the records. To handle modifications to an existing record, the concept of data lineage comes into play. Let's revisit the Dave's Auto Wholesaler example. Suppose that after an administrator inactivates the record for Dave's Auto Wholesaler, someone else edits that record, reusing its ID (ID 1) for a new auto dealer named Bob's Dynamite Autos. ID 1 now has two different descriptions. In an archival or reporting database, ID 1 has the description Dave's Auto Wholesaler attached to it, whereas in the production database, ID 1 now has the description Bob's Dynamite Autos. In database design, having two "truths" like this isn't good practice because it leads to misinterpretation of the data. In this case, the start and end dates aren't sufficient to identify when ID 1's old description (Dave's Auto Wholesaler) was inactivated and when ID 1's new description (Bob's Dynamite Autos) took effect.
Technically, you can handle this situation by simply inactivating the ID 1 - Bob's Dynamite Autos record and creating a new one with a different ID. However, this breaks any historical view of the data. Using data lineage, you're able to remedy this situation and determine the history of a record. To implement data lineage, you need to add one more column named PredecessorID to the type table. The PredecessorID column contains the pointers, or record IDs, that show the progression of a record from one version to another over time. Figure 3 shows the design pattern with the PredecessorID column.
With this design pattern, administrators will be able to fix the problem by:
- Closing out the ID 1 - Bob's Dynamite Autos record (i.e., inactivating it and entering the end date)
- Creating an entirely new record (ID 2 - Bob's Dynamite Autos)
- Setting the new record's PredecessorID value to 1
Figure 4 shows what this would look like in the type table.
Notice that record ID 1 has no predecessor ID attached to it. This means that this is the root record in the chain.
This solution satisfies the requirement of knowing the record's history, decouples the second record from the first one, and lets you know that record ID 2 is a child of record ID 1. Although you have a new ID to deal with, you're no longer representing two truths for record ID 1. Record ID 1 has only one meaning. This solution also avoids the situation where you have compound primary keys for a type table record, which is a design pattern that you typically want to avoid. In my designs, I make sure that the ID values are always unique, numeric, and meaningless, which is the true definition of a surrogate key.
An Elegant Solution
Although there are several other approaches to working with type tables that fundamentally achieve the same result, this solution is clean and elegant. It preserves record history and doesn't create multiple truths for a given ID. It's more work for the application developers, but it works better from a reporting, data management, and historical maintenance perspective.