Corporate developers and DBAs often inherit databases created by others—and inherit the problems that come with them. If you've ever begun a "review and improve" project by creating a database diagram in Enterprise Manager only to have the new diagram displayed as one long row of tables, you know the frustration of inheriting a database with no foreign keys and probably no referential integrity. Your next step is to identify where these relationships should exist so that you can create the physical relationships in the database that enforce the logical relationships between the tables. Then, you need to locate existing records that violate these relationships (the "garbage data") so that you can clean the data before you create the foreign keys. This process can certainly be exhausting and tedious. I've created some scripts to make these steps easier.
First, let's create some tables to serve as an example. The script in Listing 1 creates three tables (based on the Northwind database) and populates them with sample data to demonstrate the problems that arise when your database is lacking referential integrity. The tables are logically related, but no foreign keys exist to enforce the relationships.
By using the four information schema views COLUMNS, CONSTRAINT_COLUMN_USAGE, TABLE_CONSTRAINTS, and TABLES, you can identify foreign key candidates in your database, then determine what garbage data has already been inserted. (For information about how to use information schema views, see Kalen Delaney, Inside SQL Server, "Property Functions, Schema Views," April 2000, InstantDoc ID 8238.) The script in Listing 2 locates all likely foreign key fields by searching for all columns that have the same name and data type as a primary key or unique key field in another table, returning the common field name and each table's name. Figure 1 shows the output that you get when you run Listing 2's script.
Listing 3 shows how you can use the query in Listing 2 in a cursor that generates and executes SELECT statements to return rows in each child table that would violate the foreign key relationship if you attempt to create it. In addition, the script will print the SELECT statement you use to list these records. Because you can't create the necessary foreign key constraints until this garbage data is modified or removed, automatically identifying these records can be a significant time saver. Figure 2 shows the output you get when you run Listing 3's code.
Because Listing 2's script relies on a consistent naming convention in which the Foreign Key field and the Primary Key field both have the same field name, it won't work in all situations. For example, consider the relationships between ReportsTo and EmployeeID and between ShipVia and ShipperID. Although the logical relationships exist, the field names are different, so these scripts won't identify them. Despite these limitations, I've had several occasions to use these techniques over the past few years, and the output serves as an excellent starting point for the cleanup that is inherent in using someone else's database.