Synchronize Metadata Across SQL Server Database Copies

DBAs often encounter change-tracking problems when they try to keep multiple copies of a database synchronized across various environments. In addition to the standard development, test, quality assurance, and production versions, there can be copies of the database for implementations by divisions, departments, and development teams. You need to be able to check the status of database definitions quickly and easily to see if they're synchronized to support application changes that depend on the database structure.

We've devised a solution that lets you see whether multiple copies of a database are synchronized, even when there's no direct access between the SQL Server systems containing those databases. The solution produces a database synchronization report that shows discrepancies between copies of a database. You can implement this solution whenever a database change is made or at regular intervals.

The database synchronization solution is relatively simple to implement because it uses metadata (i.e., data about the database objects) that's extracted by running T-SQL queries against databases. The metadata is extracted from system tables and system views in two or more databases and stored in flat files. The flat files are transported to a central location and loaded into a composite table that's queried to group and compare the metadata. A comparison is done on each column's metadata, which consists of the column's object, name, and attributes. The comparison verifies that the data type, length, nullability, and primary key position are the same for the column definitions in all the databases.

The results of the comparison are put in the database synchronization report, which shows the columns and their attributes for each table, view, and index in each database. With just a glance at the report's Summary field, you can find out whether there are discrepancies between the databases' contents.

To implement database synchronization solution, you need to perform the following steps:

Create the composite table in a central location.
Prepare the queries to extract the databases' metadata.
Run the queries and transport the results.
Load the results into the composite table.
Run the database synchronization report.

We'll walk you through each step, then show you what to look for in the report.

Step 1: Create the Composite Table

The process to obtain a database synchronization report starts with creating the composite table, which is named DB_META, in a central location. This table will hold the metadata extracted from each database. Listing 1 contains the code you can use to create it.

Step 2. Prepare the Queries

As mentioned previously, a T-SQL query is used to extract the metadata from each database. You can find this query in the MetadataExtractionQuery.sql file, which you can download by going to www.sqlmag.com, entering 101901 in the InstantDoc ID text box, and clicking the 101901.zip hotlink. This query extracts metadata from columns in a database's tables, views, and indexes. A UNION operation is used to combine these data sets, and the metadata is inserted into a table named ZDB_META.

Table 1 shows the 10 fields in the ZDB_META table. Note the first field named DB_Identifier. Each database you want to include in the database synchronization report must have a database identifier, which is a one-character value that uniquely identifies that database. You can use uppercase and lowercase letters and single digits, so you can include up to 62 databases in the report without using special characters. The one-character values allow the results for each column to be aligned in the report's Summary field, making it easy to compare those results.

Table 1: The 10 Fields in the ZDB_META Table.

You need a customized query file for each database you want to include in the database synchronization report. Creating a customized query file is simple. First, make a copy of MetadataExtractionQuery.sql and rename it so that the filename includes the database identifier. For example, if you intend to run the query file against the database that has the identifier of 1, you can name the file something like MetadataExtractionQuery1.sql.

After you've renamed the query file, you need to open and modify it. In two spots, you'll find the code

'database-number-assigned' AS \[DB_Identifier\],

In this code, replace database-number-assigned with the database identifier.

In two spots, you'll find the code

isc.table_catalog = 'database-name'

In this code, replace database-name with the database's name.

If you need to specify a specific schema to identify the database columns, you must make an additional modification. You need to uncomment the line

-- AND isc.table_schema = 'schema-name'

and replace schema-name with your schema's name. MetadataExtractionQuery.sql's comments indicate where to make the four required and one optional modifications.

Step 3. Run the Queries and Transport the Results

At this point, you're ready to run the customized queries against the databases. For each database, you can create a batch file that not only runs its customized query but also transports the query's results to a flat file on an FTP server. (The data from individual databases can be extracted and collected in a variety of ways. We chose a simple process of extracting, transporting, and reloading the data.)

Listing 2 shows how to set up a batch file that can be scheduled to run as a background process. The first command in Listing 2 uses the sqlcmd.exe utility to open a command window and run the customized query. (Alternatively, you could use osql.exe or an equivalent SQL Server utility to run the query.) In this command, you use the -U and -P options to specify your login credentials and the -d and -S options to specify the database you want to run the query against and the server on which that database resides. You use the -i option to specify the customized query file's pathname (e.g., \\SQL3\DBSyncReport\MetadataExtractionQuery1.sql). The -o option indicates the pathname of the output file that will contain the query's results (e.g., \\SQL3\DBSynchReport\MetadataExtractionQuery1.txt). The -h and -w options specify the header and width information. You don't need to change these values.

The second command in Listing 2 uses the bcp.exe utility to transfer the query's results to a flat file. In this command, you specify the table you want to export (which will always be DB_META), the database in which that table resides, and the database's owner in the format \[\[database.\]owner.\]DB_META. You use the -o argument to specify the name of the flat file that will receive the exported data (e.g.,\\SQL3\DBSynchReport\MetadataExtractionData1.txt). You use the -S argument to identify the SQL Server instance containing the DB_META table and the -U and -P arguments to provide your login credentials. The -c argument indicates that the operation needs to be performed using a character data type.

The last command in Listing 2 uses the FTP command to copy the flat file to an FTP server. Specifically, the command uses the -s parameter to specify a text file that contains the commands to run. For example, you might specify \\SQL3\DBSynchReport\MetadataExtractionFTPCommand1.txt as the text file and include the following commands in it:

OPEN servername-or-IP-address
userid
password
PUT outfile-name
BYE

where servername-or-IP-address is the name or IP address of the FTP server you want to access using the specified credentials (userid and password) and outfile-name specifies the name of the flat file you want to copy to the FTP server. This filename should be the same one you used with the -o argument in the BCP command. Note that the exact commands in the text file will depend on your FTP version. Check your version's documentation.

Step 4. Load the Results into the Composite Table

The next step is to copy each flat file from the FTP server to the central location (i.e., the database that contains the composite DB_META table), then load its contents into the composite table. Once again, you can use a batch file to automate these tasks, as Listing 3 shows. The load process executes a T-SQL command to delete existing rows in the composite DB_META table and then executes the BCP SQL utility to load the new data records into the composite DB_META table. The scripts are very similar and reference a SQL command file and the BCP data file as input (-I) rather than output (-o).

Using batch files to run the queries, transport the queries' results, and load the results into the composite table eliminates the need for a last-minute rush to initially collect the metadata. You can also use batch files for the ongoing collection of metadata. The batch files to run the queries and transport the results would be similar on each server, as would the batch files to load the data into the composite table, but some details would need to change.

When there's a new flat file that contains updated metadata for a database, you can run a batch file that copies the new flat file from the FTP server to the central location, deletes all the existing rows in the composite table for that database, then runs bcp.exe to append the new flat file's contents to the composite table. This batch script file might look like the one in Listing 4.

Step 5. Run the Report

After the composite table contains the metadata for all the databases, you can run DatabaseSynchronizationReport.sql, which you'll find in the 101901.zip file, to create the report. DatabaseSynchronizationReport.sql is written to compare five databases. To compare more than five databases, you need to make three modifications.

For example, suppose you want to compare six databases. To do so, you first need to change the Summary column code in Listing 5. Specifically, you need to change the code in callout A to

'.'+RTRIM(ISNULL(DB5.DM_DBID,'-'))+
'.'+RTRIM(ISNULL(DB6.DM_DBID,'-'))+
AS \[Summary\]

Next, you need to add an additional LEFT OUTER JOIN statement, which Listing 6 shows. You'd put this statement after the last LEFT OUTER JOIN statement in DatabaseSynchronizationReport.sql.

Finally, you need to change the GROUP BY code in Listing 7. Specifically, you need to change the code in callout A to

'.'+RTRIM(ISNULL(DB5.DM_DBID,'-'))+
'.'+RTRIM(ISNULL(DB6.DM_DBID,'-'))

Understanding the Report

The database synchronization report presents the contents of all the queried databases so that the information can be viewed together. The data for each object is grouped and sorted by object and column name. Figure 1 shows a sample report for three queried databases.

Figure 1: Annotated Sample Database Synchronization Report.

The report's Summary field shows whether there's a discrepancy across the databases. Ideally, the Summary field of each column record should list the database identifiers for all the queried databases (e.g., "1.2.3.4.5" for five queried databases, "1.2.3.-.-" for three queried databases). This makes missing columns easy to spot. For example, in the sample report in Figure 1, we highlighted a missing-column discrepancy in yellow.

Attribute discrepancies appear as multiple entries for a column. In Figure 1, we highlighted this type of discrepancy in purple. The information in the Data-Type, Max-Length, Key-Pos, and Null fields can help identify what attribute differs between the databases.

Note that the report includes an object-separator row at the start of each new object. These rows split the results into groupings, making it easier to find and differentiate between objects.

Adapt as Needed

The database synchronization solution is highly adaptable. For example, although we designed MetadataExtractionQuery.sql and DatabaseSynchronizationReport.sql to work with metadata in SQL Server 2008, 2005, and 2000 databases, you can adapt these scripts to work with similar data in other database systems, such as Oracle or DB2.

Similarly, we designed the solution to work in an environment in which multiple copies of a database reside on servers that can't directly access each other. However, this solution will work in all types of environments. If there's network access between servers or if multiple database copies (with different names or in different instances) reside on the same server, you can streamline the process. For example, you might be able to load the metadata in the ZDB_META tables directly into the DB_META table. Your environment will determine the best extraction, transport, and load methods to use.

You can even adapt the database synchronization report to better meet your needs. For example, you can remove the header row or the object-separator rows. Or if all you want to see is the Summary field for each column record, you can add a WHERE clause to the report query to select only those columns. If there are five queried databases, you'd add the clause

WHERE RTRIM(ISNULL(DB1.DM_DBID,'-'))+
 '.'+RTRIM(ISNULL(DB2.DM_DBID,'-'))+
 '.'+RTRIM(ISNULL(DB3.DM_DBID,'-'))+
 '.'+RTRIM(ISNULL(DB4.DM_DBID,'-'))+
 '.'+RTRIM(ISNULL(DB5.DM_DBID,'-'))
 <> '1.2.3.4.5'

Find and Fix Discrepancies Before They Cause Problems

Discrepancies between copies of a database might not seem like a significant problem, but the impact of a missing database change can be quite serious. A major database-definition error can cause a new release to be backed out of production. Even correcting a minor database-definition error during a project's implementation can take several weeks. These delays can hurt your schedules and your project's credibility.

Manually rechecking the column definitions in large numbers of tables, views, and indexes is an extremely tedious and error-prone task. To save time and avoid errors, try the database synchronization solution. Using the information that the database synchronization report provides, you'll be able to quickly find and fix all discrepancies so that your database copies are perfectly synchronized.

Comments

Plain text