Using Information Schema Views

Microsoft added information schema views in SQL Server 7.0 to allow safe access to metadata from the system tables. According to SQL Server 2000 Books Online (BOL), using information schema views is one of only three supported methods of obtaining metadata (you can also use system stored procedures or system functions). In "Investigating System Objects," January 2002, I briefly looked at information schema views and noted that Microsoft created these kinds of utilities so that people wouldn't have to access the system tables directly—a method that Microsoft discourages. SQL Server 2000 manages information schema views differently than SQL Server 7.0 does, so you must pay attention to which version you're using when you read about information schema views.

Viewing the Difference

Much of the technical description of information schema views in SQL Server 2000 BOL seems to be left over from SQL Server 7.0 BOL. In "Investigating System Objects," I explained that in SQL Server 7.0, each database has its own set of information schema views. SQL Server 2000 has only one set—in the master database. But when you reference one of these views from within a user database, the contents of the view reflect your current database. However, you'll find that the SQL Server 2000 BOL documentation assumes that separate sets of information schema views still exist in each database.

For example, let's consider the TABLES view, which lists all the tables and views in a database (a view is considered a virtual table). If you use the following query to look at the contents of the TABLES view in the master database, you'll see all the information schema views—and all the user tables and user-defined views.

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

The names of the information schema views are in all uppercase characters. I never create object names by using all uppercase characters, so I can easily see that this query, when executed from the master database on a SQL Server 2000 server, returns the names of the information schema views with all the other user table and view names. But if you run this query in any other database in SQL Server 2000, you won't see the information schema views. And if you run this query in SQL Server 7.0, you'll see the names of the information schema views regardless of which database you're in when you run the query.

Figure 1 shows the names of the 20 information schema views in SQL Server 2000. I retrieved this list by using the following query:

SELECT  TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'VIEW'
AND TABLE_SCHEMA = 'INFORMATION_SCHEMA'

As I mentioned in "Investigating System Objects," Microsoft added information schema views to SQL Server for ANSI compliance. However, the ANSI-supported methods for referencing objects in a database are different from the methods that SQL Server uses. ANSI doesn't support the concept of an object owner; instead ANSI defines a schema. SQL Server 2000 BOL defines a schema as "in the SQL-92 standard, a collection of database objects that are owned by a single user and form a single namespace. A namespace is a set of objects that cannot have duplicate names. For example, two tables can have the same name only if they are in separate schemas, no two tables in the same schema can have the same name." These differences lead to cumbersome object names and the need to use the username Information Schema when accessing the objects. This username lets you access SQL Server objects in a manner similar to the ANSI standard.

BOL also says, "In Transact-SQL, much of the functionality associated with schemas is implemented by database user Ids." So, whereas Information Schema might seem strange for a username in SQL Server, the name sounds more reasonable as the name of a schema or a collection of objects.

Using the Views

Two common database-administration questions are "How can I tell whether a column is a foreign key?" and "How can I find all the foreign keys that reference a particular table?" Let's look at how you can use information schema views to find the answers. I'd like to thank SQL Server Most Valuable Professional (MVP) Dejan Sarka for providing this solution. I've changed the object names so that the solution is generic.

Before we examine the code for accessing foreign key information, let's review the vocabulary we use to talk about foreign keys. You're probably aware that a referential constraint is the same as a foreign key constraint, but many names exist for the two tables involved in foreign key relationships. Some people call the tables the master and detail tables; others dub them the parent and child tables. Both conventions imply a hierarchy, suggesting that one of the tables is superior to the other. In a relational database, this hierarchy doesn't exist. Two tables simply have a relationship—neither is superior to the other. So I call the two tables the referenced table and the referencing table, as Joe Celko suggests in his book Joe Celko's SQL for Smarties: Advanced SQL Programming, 2nd edition (Morgan Kaufmann Publishers, 1999).

In SQL Server, you can define a foreign key to reference any columns defined as having either a primary key or a unique constraint (both of which must be unique). The referenced table is the table that contains the unique value, and the referencing table is the one that contains the foreign key column or columns that must match the unique key.

Look again at the list that Figure 1 shows. The most important information schema view for our purposes is REFERENTIAL_CONSTRAINTS, which shows two sets of three columns that name constraints. The first three columns are the three parts of the name of the constraint in the referencing table: CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA (the owner of the referencing table), and CONSTRAINT_NAME. Next, you see three columns for the three parts of the name of the constraint in the referenced table: UNIQUE_CONSTRAINT_CATALOG, UNIQUE_CONSTRAINT_SCHEMA (the owner of the referenced table), and UNIQUE_CONSTRAINT_NAME. Because the REFERENTIAL_CONSTRAINTS information schema view lists the catalog name for each table, which maps to the database name in SQL Server, you might think that the foreign keys can reference objects in another database. However, SQL Server foreign key constraints can reference objects only in the local database.

Now that you've found the names of the constraints involved, you can use that information to find the names of the tables and columns involved in the constraint. To do so, you need the two information schema views CONSTRAINT_COLUMN_USAGE and CONSTRAINT_TABLE_USAGE. You need to access each of these views twice; each is joined once with the name of the foreign key constraint and once with the name of the unique (or primary) key. So although you need only three information schema views to find all the foreign keys in a database, your query looks like a five-table join because two of the views appear twice.

Listing 1 shows the query. The alias Refs refers to the REFERENTIAL_CONSTRAINTS view. The alias FKtable refers to the CONSTRAINT_TABLE_USAGE view when accessing the foreign key information, and the alias PKtable refers to the CONSTRAINT_TABLE_USAGE view when accessing the primary key information. Similarly, the alias FKcolumns refers to the CONSTRAINT_COLUMN_USAGE view when accessing the foreign key information, and the alias PKcolumns refers to the CONSTRAINT_COLUMN_USAGE view when accessing the primary key information.

If you want to see only the foreign keys in a particular table, you can add a WHERE clause and specify the value for FKtable.table_name. For example, if you want to see the foreign keys in the titleauthor table in the Pubs database, you can add the following WHERE condition to the query in Listing 1:

WHERE FKtable.table_name = 'titleauthor'

Figure 2 shows the results that the WHERE condition returns. Note that the titleauthor table has two foreign keys: The au_id column references the authors table, and the title_id column references the titles table.

To make this code more generic, I could write my own system stored procedure and pass a table name in as a parameter, then return all the foreign keys in that table.

But this solution isn't perfect. It works for foreign keys based on a single column, but for composite keys, it returns extraneous information. Extraneous information is a common problem in self-joins, and one for which no easy, generic solution exists. For more information about self-joins, see "Using Self-Joins" in BOL. Consider the two tables that the code in Listing 2 creates. If you run the five-table join above, adding the WHERE clause

WHERE FKtable.table_name = 'tab2'

you get the results that Figure 3 shows. Each foreign key column in tab2 is combined with each primary key column in tab1. So in Figure 3, we can't tell whether column c in tab2 corresponds to column a or b in tab1 because the join returned both combinations. One solution might be to insist that referencing columns always have the same name as the columns they reference, but because such systematic referencing isn't required, it seems overly restrictive.

To make the query in Listing 2 return only relevant information, you'd have to know which column in the constraint definitions was the first column, which was the second, and so on (a foreign key can have as many as 16 columns). Basically, you'd need to add a condition that specified that the column positions were the same for the columns in each row of the output. In a single-column key, both the referencing column and the referenced column would be in position 1, so such a solution would work even if we didn't have composite keys.

To make this solution work—even in the case of composite foreign keys—you need to use the information schema view KEY_COLUMN_USAGE. In the next T-SQL Admin column, I'll show you a complete solution that references this information schema view. I'll also show you some problems that you can't solve by using only information schema views, system stored procedures, and system functions. Information schema views contain a wealth of information, and you can't really begin to appreciate their full potential until you start using them all and examining the T-SQL code that defines them. Although these system-supplied utilities provide enough information that, for most purposes, you'll never need to directly access the system tables, direct system table access is still required for some metadata.

Comments

Plain text