Luddites are back! Actually, Luddites have never totally disappeared, not since someone coined the term in the early 1800s, naming the movement after its probablymythical leader, Ned Ludd. A Luddite is someone who resists—sometimes violently—change. Originally, Luddites were involved in a social movement among English workers who rioted and destroyed machinery during the Industrial Revolution because it was bringing change to manufacturing and replacing thousands of skilled workers and artisans with machines. Technology was forcing lifestyle changes. Since then, we've used the term Luddite to describe anyone who is adamantly opposed to change, especially when the change is initiated by technology. A Neo-Luddite is one who resists technological change and the accompanying social changes. I'm going to go one better: I'm going to coin the term relational Luddite. A relational Luddite sees data as a single dimension—as a flat file (read about flat files in Solutions By Design, "Responsible Denormalization," October 2000, InstantDoc ID 9785). A relational Luddite can't get his or her head around the idea of dividing data into a set of relational tables.
Working with a Relational LudditeRecently, I had occasion to work with an individual—let's call him Bob—whom I would classify as a relational Luddite. Bob almost single-handedly runs an international nonprofit organization. When he came to the organization, membership data was recorded on one spreadsheet. Bob led the organization into the twentieth century when he imported the spreadsheet into Microsoft Access—but that's as far as he got. Bob simply can't visualize separating data into one-to-many (1:M) or many-to-many (M:N) relations. Because Bob had no formal training in relational databases, he wasn't comfortable normalizing the data himself, and he couldn't afford to hire someone to properly normalize the data. As a result, he grew used to seeing the data in a flat-file format.
By switching over to a database management system (from Microsoft Excel to Access), Bob became more productive. He taught himself to write queries and learned the benefits of using an auto-numbered primary key—a unique identifier for each member that let him carry over a member's name and address to financial spreadsheets without having to re-enter this information. In a broad, functional sense, Bob really was "getting" the idea of relational databases, but he'd become comfortable with his single, large Access table.
Bob is ultimately responsible for the data and its accuracy, which includes maintaining member information, generating mailing lists and labels, mailing letters to remind members about their annual fee and upcoming events, processing purchase orders for products (e.g., books, t-shirts), and keeping the organization running in general. Recently, Bob brought me in as a consultant to help him improve productivity. The first thing I wanted to do was to normalize the membership table, but Bob wasn't ready to take that step. He explained that after I was gone, he would still be responsible for the data and he wasn't sure if he was up to a relational database.
I understood where Bob was coming from. I might not have agreed with him, but I understood. Many people today know how to run their business but have a limited understanding of how to use technology (e.g., database concepts) to improve business processes. Don't assume that that statement justifies bad design or lack of proper data modeling, planning, and implementation. Any organization that employs a database professional would be remiss if it limited itself to the solution I'm about to describe.
Big Strides Toward RelationalitySo what could I do to help Bob stabilize his database environment and data while he learned to trust in the concept of relationality? First, I looked at Bob's original data and table structure in the Members table and determined which fields contained the information he uses the most, then organized it into five queries: qryMemberNameInfo, qryResidenceAddressInfo, qryMailingAddressInfo, qryMemberContactInfo, and qryMembershipActivityInfo. You can see an illustration of these queries in Web Figure 1 at http://www.sqlmag.com, InstantDoc ID 50205. Second, Bob was using the autonumber property for MemberID, but he made some mistakes when he set up the Members table. He wanted to know whether more than one person in a household is a member of his organization, but he also wanted to limit mailings to one per household. The new qryMember-NameInfo table includes a Family Member ID column. Currently, if the value under Family Member ID is zero, the individual is the head of the household or the only person in that family who is a member of the organization. If a number greater than zero is under Family Member ID, then the individual is the head of the household, implying a family group. This flag is what limits mailings to one per household. This design has lots of inherent problems, but I'll leave that discussion for another column.
Third, Bob was storing duplicate data in the Members table because he entered a residence and a mailing address, even when they're the same address. When someone moved, he updated the residence address. But if that person had mail sent to a different address, Bob had to make a decision: Should he update the mailing address to match the new residence address or leave the mailing address as is? Most often, Bob had to telephone the member to clarify the situation.
Fourth, Bob, like other database neophytes, had created columns out of data values. He wanted to keep track of the people who were engaging in organization activities, such as holding office and becoming a committee member. Bob made a separate column for each office, committee, and activity in the organization. If a member was currently holding an office such as section director, Bob could insert the section name in the Section Director column. If a member was a researcher or field investigator, Bob inserted a Y (for Yes) in the Researcher or Field Investigator column, and so on. Bob was happy with this database because he felt it gave him a lot of flexibility. I believed that he could craft a much better solution by following the examples shown in the articles SQL By Design, "Supertypes and Subtypes" (May 1999, InstantDoc ID 5226) and "The Case of the Overlapping Subtype" (November 2003, InstantDoc ID 40280). However, as I said, Bob isn't ready to normalize his data.
Finally, Bob lost historical data over the years. He recorded the initial membership date and the last renewal date for each member. However, he couldn't tell which members might have dropped away from the organization and returned—he could only query for members who hadn't renewed. Also, when Bob looked at qryMembershipActivityInfo, he couldn't determine who held offices or was a committee member in the past; he could query only for current directors or committee members. Retaining and keeping track of the members who are most active in an organization gives the organization an edge on understanding what it takes to attract and retain new members. Unfortunately, Bob couldn't do this type of analysis because his database wasn't set up to support member attraction and retention data.
Small but Comfortable StepsI can do a lot for Bob. He had already discovered the concept of the primary key; thus, he enforced entity integrity by using the auto-number property with MemberID. I can help him maintain domain integrity by creating as many reference or lookup tables as necessary and relating them to the main Members table. In so doing, I can enforce good naming standards on the lookup tables and index the Members table's foreign keys to improve performance. (You'll find additional information about these database concepts in the articles listed in the Related Reading box.)
Bob needs a lookup table to manage the mailings. If a member wanted to receive organization information by some method not already listed in the database, Bob would have to add a new column to the Members table. Since Bob is the only person using this database, adding a column to a table isn't a problem for him. However, if Bob were to hire an assistant and had to share the database, he would realize that any time he wanted to make a change to the underlying table structure he would run the risk of disrupting his assistant's work. A best-practices change would be to create a reference or lookup table, which Web Figure 2 shows, and relate it to a modified version of the Members table, which Figure 1 shows.
I can review the data in the Members table and, where appropriate, add constraints such as the current date and time for the field Date Renewed. I can also add an input mask to help with data entry for the fields Res Zip Code and Mail Zip Code. These constraints also help to enforce domain integrity and simple business-rules integrity. I can make suggestions regarding data types and help Bob deal with the changes if he decides to follow my advice. I can help Bob optimize his queries and create new ones. I can create schemes to find duplicate data and show him how to research and remedy duplicate data problems. I can create data entry forms for Bob, thus making his job of adding and updating members much easier than when he was using the gridlike table. I can help Bob develop techniques for using Access as a Mail Merge data source in Microsoft Word and Microsoft Publisher. I can show Bob how to link Excel to the Members table and use that application to create reports that look like spreadsheets for various organizational units scattered around the world, as Web Figure 3 shows.
Working with Bob has been an interesting learning experience. I've been able to provide him with a good return on his investment, even though his database violates the rules and regulations that so tightly govern relational databases. Bob and I are slowly building a trust relationship, and maybe in time I'll be able to bring his membership database into compliance with best practices in the relational database world. Here's hoping!