Recently I was working on a new database design for a client, and I was asked to prepare a data dictionary of the database. Instead of creating a dedicated single-use document containing the data dictionary, I thought it might be more convenient to design a general mechanism for generating data dictionaries in a database. Such logic could be encapsulated within stored procedures, and the logic would be reusable.
In terms of output, I thought it would be helpful to see the dictionary information in the form of an online SQL Server Reporting Services (SSRS) report. In this way, a data dictionary could be hosted on a company’s departmental reports site, and different team members could refer to the data dictionary report as needed. In addition, because SSRS allows reports to be saved in Microsoft Excel, Microsoft Word, XML, and other formats, the data dictionary information could easily be exported into another report or document.
In this article I present a set of stored procedures that you can use to generate a data dictionary of schemas and tables within a database. I also introduce a simple SSRS report that you can use to publish the data dictionary online. I tested the accompanying code in SQL Server 2008; it also works in SQL Server 2005.
All data dictionary information within SQL Server is set and accessed through extended properties. According to SQL Server Books Online (BOL), extended properties let you “add text, such as descriptive or instructional content, add input masks, and add formatting rules as properties of objects in a database or of the database itself. For example, you can add an extended property to a schema, a schema's view, or to a column in the view. Because extended properties are stored in the database, all applications reading the properties can evaluate the object in the same way. This helps enforce consistency in the way data is treated by all the programs in the system.”
Extended properties have several different levels. SQL Server BOL explains these levels as follows: “For specifying extended properties, the objects in a SQL Server database are classified into three levels: 0, 1, and 2. Level 0 is the highest level and is defined as objects that are contained at the database scope. Level 1 objects are contained in a schema or user scope, and level 2 objects are contained by level 1 objects. Extended properties can be defined for objects at any of these levels.”
To set an extended property, you use a T-SQL command such as sp_addextendedproperty (to add a new property) or sp_updateextendedproperty (to update an existing property). Similarly, you use the sp_dropextendedproperty command to drop an existing extended property in the database.
The spAddToDataDictionary Stored Procedure
I created the spAddToDataDictionary stored procedure as a simple wrapper around some logic to add an extended property in the database (or update it, in cases in which the property already exists). Web Listing 1 contains the code for the spAddToDataDictionary stored procedure. You can use this procedure to add new data dictionary entries for tables and table columns in a database. It’s a quick way to load data dictionary information into a database if it isn’t already available.
The spAddToDataDictionary stored procedure accepts arguments for the schema name, table name, and column name, as well as a description. It automatically handles the level 1 (e.g., schema) and level 2 (e.g., table) specifications as appropriate. As an example, suppose your database contains a table definition for vehicles defined as follows:
CREATE TABLE dbo.Vehicle
VehicleID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
VehicleName VARCHAR(32) NOT NULL
You might use the following set of stored procedure calls to enter the corresponding data dictionary entries into the database:
-- Set the Dictionary Entry for the Table
EXEC \\[dbo\\].spAddToDataDictionary 'dbo', 'Vehicle', 'A Table Listing the Different Vehicles';
-- Set the Dictionary Entry for the Table Columns
EXEC \\[dbo\\].spAddToDataDictionary 'dbo', 'Vehicle', 'VehicleID', 'A Unique ID for the Vehicle';
EXEC \\[dbo\\].spAddToDataDictionary 'dbo', 'Vehicle', 'VehicleName', 'The Vehicle Name';
Notice that the schema name is dbo and the table name is Vehicle. For the first call, you’re specifying data at the table level, so it isn’t necessary to specify a column name. In the second two calls, you’re specifying column-level information, so a column name (VehicleID and VehicleName, respectively) must be passed in.
The spGenerateDataDictionary Stored Procedure
After you enter the data dictionary table and column information into the database, you can use the spGenerateDataDictionary stored procedure to generate data dictionary report output. Web Listing 2 contains the code for this procedure.
The spGenerateDataDictionary stored procedure retrieves information from several catalog views (sys.schemas, sys.tables, sys.columns, sys.types) and pulls in information about schemas, tables, columns, and column types to the database. The procedure then uses the sys.extended_properties catalog view to obtain the extended property descriptions for these objects. (These extended property descriptions provide the information for the data dictionary table descriptions and table column descriptions.)
The procedure has two optional parameters: @SchemaName and TableName. If the parameters aren’t specified, they both default to wildcard (\\[All\\]), meaning that all schemas and/- pullinggcolumnready availablehed to that DBAs and developers can have access to the data dictionary information.or my case or all tables are included.
As an example, suppose you created the spGenerateDataDictionary stored procedure within the SQL Server 2008 AdventureWorks database. Running the following command
would return information about all the schemas and tables in the AdventureWorks database.
Similarly, if you ran the command
EXEC \\[dbo\\].\\[spGenerateDataDictionary\\] 'HumanResources';
the procedure would return only information about tables in the HumanResources schema of the AdventureWorks database.
Similarly, the following command
EXEC \\[dbo\\].\\[spGenerateDataDictionary\\] 'HumanResources', 'Department';
would return only information about a particular table (i.e., Department) within the HumanResources schema of the AdventureWorks database.
Figure 1 shows the output of the spGenerateDataDictionary stored procedure. Note that the value in the first column is simply the server name (@@SERVERNAME). For each table group in the output, a separate row describes the information for that table (corresponding to ColumnName = \\[N/A\\]), with one or more rows for each table field. Separate columns contain the schema name, table name, data type, length, nullability, and description—as you might expect. The XType column identifies whether a particular column is a primary key (PK), foreign key (FK), or both (PK, FK).
If a data dictionary description isn’t available for a particular table or table column (e.g., if the extended property for that table or column was never set), the stored procedure sets the Description field in the output to \\[Missing Definition\\]. Indeed, you can run the spGenerateDataDictionary procedure routinely (e.g., once a week) and look for occurrences of \\[Missing Definition\\] in the output to find objects that might not be fully notated in the data dictionary. You can then use the spAddToDataDictionary stored procedure or another mechanism to manually enter the missing information for these entities. This approach is a good way to keep the dictionary up-to-date. When a column has a type such as XML, VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX), the Length (MaxLength) field will show a value of -1, because SQL Server might not know the actual column length in these cases.
Building the Data Dictionary Report
I created an SSRS report that you can use to publish the data dictionary. You can use SQL Server 2008 Business Intelligence Development Studio (BIDS) to open the SSRS report project, called DataDictionaryReportProject.sln, which contains the report called DataDictionaryReport.rdl. This simple report form lets you call the spGenerateDataDictionary stored procedure with appropriate parameters and display the output onscreen.
Within the RDL file (i.e., DataDictionaryReport.rdl), you need to edit the data source definition to point to the server and database you want to use, as callout A in Listing 1 shows. Note that because the report form uses integrated security, you don’t need to specify a user ID and password. If the data dictionary report will be published to a remote reports site on a different server in your enterprise, you might need to tweak the connection information and/or security credentials to enable connection. If you aren’t using an administrative account, you’ll need to grant explicit database access rights to the account you use to connect to the database. The account should have execute rights to the spGenerateDataDictionary procedure within the database, as well as to any other procedures your reports call.
When you run the report, you’ll be prompted to select a schema, as Figure 2 shows. As you can see in Figure 2, the schemas for the AdventureWorks database include dbo, HumanResources, Person, Production, Purchasing, and Sales. When you select a specific database schema, the report output is filtered to show only the tables within that schema. If you don’t need to filter the schema name, you can select \\[All\\], which includes all schemas in your report’s database.
After you select the schema, the Table parameter’s drop-down menu will include the available table names for the selected schema. As Figure 3 shows, the tables for the HumanResources schema include Department, Employee, EmployeeAddress, EmployeeDepartmentHistory, EmployeePayHistory, JobCandidate, and Shift. Again, you can select \\[All\\] to include all the tables within the selected schema.
Click View Report to generate the report. Figure 4 shows the report output.
Because data dictionaries are useful for both DBAs and developers, it’s handy to have a quick mechanism for generating them. This article describes some basic stored procedures that you can use to build a data dictionary of tables in a SQL Server 2008 database, then use a simple SSRS report to report on that dictionary. Although this article is limited to tables, you can easily enhance the underlying stored procedures to support other objects in the database (e.g., views, user-defined functions). You can publish a data dictionary report either locally or to a departmental reports site; your organization’s DBAs and developers can then access the information as necessary. For more information about creating data dictionaries, see “Use Extended Properties to Create a Data Dictionary” and “Script to build a SQL Server Data Dictionary and report with Microsoft Excel.”