In the early 1970's, Edgar F. Codd, a pioneer of the relational model for databases, gave us the rules of normalization to help us design and maintain relational databases. These rules have been at the core of relational database design for over 40 years now. Unfortunately, these great ideas are still often poorly implemented, either through the lack of understanding of Codd's rules, organic growth, or too little time to implement a design correctly.
Related: Tips for Using Foreign Keys
It's kind of like the way we drive. We know what the speed limit is in our neighborhood, and we know that we are required to wear a seatbelt while driving, but we are just going down to the corner to pick up some ingredients for dinner that we forgot to purchase at the store earlier. Pretty soon, it becomes a habit of not wearing a seat belt and speeding through the neighborhood. At least it's our habit until we're pulled over for speeding and are also cited for the lack of a seatbelt. Oops. (No, this did not happen to me.)
Living up to "the letter of the law" for Codd's rules, but not following through to make sure they are adhered to, is the same thing as continuing to break driving laws. Not following the rules becomes a bad habit that you have to pay for later. Thank goodness we have SQL Server Integrations Services (SSIS) to clean up data messes. However, it would be nice if we could simply use SSIS to combine and aggregate data and not for cleaning up messes that should never have been in our database in the first place causing inaccurate metrics and implementation headaches as a database matures. Let's take a look at a few common bad habits that need to be broken in order to provide better, consistent data.
Foreign Key Constraints Are Important Too
Codd's first rule states that the data in the database should not be repeated. Data should be uniquely represented. Once we apply this rule to our tables, we end up with the data broken-up across several tables. An example of this would be a CustomerOrder table that contains Orders with a foreign key to an OrderState table that tells us what State our Customer Order is currently at, and a SalesAssociate table that tells us which Sales Associate created the Order for our Customer, as shown in Figure 1.
If one more step is taken, adding of foreign key constraints, then many headaches can be prevented in the future as show in Figure 2. The beauty of foreign keys is two-fold. First, and foremost, adding constraints guarantee that the data in the foreign keys are valid. Second, adding constraints doesn't allow the data from the primary table that is being used in a foreign key, to be deleted.
Let's take a look at some of the silent killers that I have seen when foreign key constraints are not added to the database model. I'll continue to use the tables I mentioned above in my discussion.
There Are Dust Bunnies in the Tables
In the following scenario, all three tables exist, but bad data is showing up in the CustomerOrder table. Here are some of the symptoms that are appearing.
- Orders are "stuck" and are not moving through the system, but no errors are occurring.
- The reports that the Sales Associates are receiving don't match the summary reports that managers are receiving.
- The data in the SalesAssociate table was deleted in production.
After comparing the data in the OrderStateID and SalesAssociateID fields to the tables that the values should be coming from, OrderState and SalesAssociate respectively, it is discovered that the tables are out-of-date. Nobody has been maintaining the tables, and thus, there are dust bunnies hiding out in there (and they have big sharp fangs).
Foreign key constraints would have prevented these problems. Here's how.
- The Orders were getting "stuck" because a new manual process had been introduced for special orders. Since there wasn't a foreign key constraint in place, there wasn't a process to stop the new OrderStateID from being entered into the system. Since the application didn't know about it, the Order couldn't move forward.
- The new Sales Associates were entered into the SalesAssociate table. They were all given their SalesAssociateIDs for entering Orders, but they were given the wrong ones. The IDs they were given don't exist in the system. When the reports were generated, INNER JOINS were used since only existing Sales Associates should be included in the report. This excluded all the new Sales Associate's Orders because their IDs didn't exist in the database.
- When a foreign key constraint is in place, the rows in a table that are referenced in another table through a foreign key, cannot be deleted. This doesn't stop rows that aren't referenced from being deleted, but at least records in the other tables won't be orphaned.
The Case of the Missing Order States
I've seen this problem too many times to count. I inherit a database. There are foreign keys in a table, but the table they should map to is nowhere to be seen. Since there are no foreign key constraints, I can't "follow the lines" to find the table. I have to go through all the tables to verify that the data isn't in a poorly named table or in a named value pair table.
This leads me to a trip to the application developers in search of the Order State values. Here are some of the responses I get.
- The application is taking care of all the Order States. The application is making sure that they are accurate.
- I'm verbally told all the Order States that are in the application. (But I find there are even more than they told me actually being used.)
- I'm told to go to a certain location in the application where there is a hard-coded drop down list. The Order States are listed there.
- The application developers have to go through the code because the Order State values are throughout the code.
- The application developers don't know. They don't have the source code (worst news).
There are couple of different things you need to do. First, convince the application team to change the code to get the Order State values from the database if possible. Here are some reasons you can use to convince them.
- If the CustomerOrder table is updated outside of the application, then the wrong data can be inserted into the database.
- Upon researching the data, it was discovered that bad data has already been put in the database. (One time, I found a question mark in a field that was only supposed to have numbers. Yes, it was also suffering from having an incorrect data type.)
- If the list of possible Order States changes, then only the database needs to change. (Note: Some work may still need to be done if workflow logic is programmed into the application.)
- Having the Order States in its own table will allow for foreign key constraints to be added to the database to help with the accuracy of the data.
After you convince the application team to make the changes described above, the second thing you need to do is confirm that all the values they give you are being used. The best scenario will be that the data matches on both sides. The worst scenario is that the database has more values than the application has. This will take research to determine if the unaccounted values are bad data, discontinued data, or simply forgotten data.
Once all the data is accounted for, then you can create a table with the values and create a foreign key constraint to the new table.
All of these lurking silent killers can be prevented if the initial implementation had used foreign key constraints to the foreign keys themselves.
Mickey Stuewe is a database architect. Her focus is on database modeling, T-SQL optimization, requirements gathering, and report design using the SQL Server stack. Stuewe blogs regularly at mickeystuewe.com and co-authored the book, SQL Server 2012 Reporting Services Blueprints. Follow Stuewe on Twitter @SQLMickey.