Suppose you're a DBA using SQL Server 7.0 to support online transaction processing (OLTP) databases of 50GB or more in a virtually 24 * 7 operation with a narrow maintenance window. Suppose also that your users require ad hoc query access to your data for either reporting purposes or data warehouse development. For this example, you would want to maintain a copy of the production data on a separate server to avoid database performance problems such as blocking caused by users issuing long-running queries against tables that programs are also issuing inserts and updates on.
Transactional replication is the process of reissuing against remote secondary servers those transaction commands that SQL Server has initially committed on a primary production server. Transactional replication is a popular method of maintaining an updated copy of a production database. The term for the remote secondary servers is Subscribers; the term for the primary production server is Publisher. Transactional replication maintains the subscriber servers' data with minimal latency and requires a minimum of DBA maintenance. Latency determines the amount of time between when SQL Server commits the transactions against the production database and when SQL Server applies those same transactions to the Subscribers' copy of the data.
SQL Server 7.0 lets you configure a transactional replication scenario in which one or more Subscribers receive the transactions committed against the Publisher. The Microsoft Snapshot Agent facilitates an initialization process that synchronizes the Publisher's schema and data with all Subscribers. Although using the Snapshot Agent is standard practice, its synchronization process could exceed your maintenance window, depending on the size of your database. However, you can set up transactional replication for your database without using the Snapshot Agent to initialize the database schema and data. For the example in this article, you must perform a full Publisher database restore to the Subscriber server during a nonproduction period to manually synchronize the schema and data.
Implementing Transactional Replication
Implementing transactional replication by backing up the Publisher database and restoring it at the Subscriber server requires performing the following tasks in sequential order. You must configure the Distribution, Publisher, and Subscriber servers; construct publications; restore the Publisher database at the Subscriber server; create the stored procedures required for replication to take place; disable publishing on the Subscriber database; and either push the subscriptions to the Subscribers or set up the Subscribers to pull the subscriptions. Let's walk through each of these steps.
Configuring the servers. The Distribution server receives all replicated transactions from the Publisher server, stores these transactions in the Distribution database, and distributes them as transaction commands to the Subscriber server. You must configure the Distribution server first. To configure the Distribution server (which can be the same server as the Publisher server), start SQL Server 7.0 Enterprise Manager. From the Enterprise Manager menu, select Tools, Replication, Configure Publishing, Subscribers, and Distribution. Click Next on the resulting Configure Publishing and Distribution Wizard window.
The next window asks you to select a distributor. Click Next, and the Specify Snapshot Folder window appears. You won't copy any data to the Snapshot folder because you're going to manually synchronize the data. So, click Next to customize the Distribution server's configuration. For this example, take the default option, which configures the current server as its Publisher and Distribution server and enables all registered SQL Server systems to be Subscriber servers. Then, click Next, Finish; SQL Server configures your server as a Distribution server and creates the Distribution database on it.
Constructing a publication. A publication is a set of articles. Articles are tables on which you can define horizontal or vertical filters so that whenever SQL Server commits a transaction against an article, SQL Server replays that transaction command at the Subscriber database. To create a publication through Enterprise Manager, select Tools, Replication, Create and Manage Publications. The Create and Manage Publications on YourServerName window appears, as do the databases from which you can define publications. Click Create Publication to access the Create Publication Wizard. Click Next to choose a publication database. For this example, select the Northwind database. Click Next to select the publication type from three options: Snapshot Publication, Transactional Publication, or Merge Publication. Choose Transactional Publication, and click Next. The next window asks whether you want to allow immediate-updating subscribers, which replicates data modifications made at the Subscriber database back to the Publisher database. Don't select this option, because the Subscriber database is read-only, and users will use it for ad hoc queries and other reporting purposes. The next window is Select Subscriber Types. Choose Servers running SQL Server 7.0, and click Next.
Now, you can specify the articles for your publication. For this example, choose the Employees table, and click Next. The Select your publication name and description window appears; enter Northwind_Pub1 in the Publication name text box, and click Next. Then, you customize the publication's properties. Use the default selection, which reads No, create the publication as specified. Click Next, Finish to complete the create-publication process.
Backing up the Publisher database and restoring the database at the Subscriber server. You're now ready to perform a full backup of the Publisher database on the Publisher server and restore it on the Subscriber server. When the backup completes, either shut down the SQL Server service or put the server into single-user mode. This step is crucial to ensuring data synchronization. Let's review how the replication process works to understand why this step is important.
The replication process begins with SQL Server marking for replication in the Publisher server's database transaction log the transactions against all articles in the publication. The Distribution server's Log Reader Agent clears the transactions marked for replication from the transaction log and copies them into the Distribution database. After the Distribution Agent starts, it replicates these transactions to the Subscriber database. However, the Distribution Agent can begin only after you've initialized the Subscriber database's schema and data. Therefore, you initialize schema and data by restoring the Publisher database at the Subscriber server. If the Publisher database were available to users for updating during the restore period, some job process or scheduled application might log in to the server and make data modifications. This action would make the data immediately out of sync between the Publisher database and the Subscriber database. After the restore, you enable a push subscription to the Subscriber database and restart the Publisher server in multiuser mode. The Subscriber database can now accept replicated transactions.
Creating the stored procedures. In addition to synchronizing the database schema and data, the Snapshot Agent also creates special stored procedures that the Subscriber database needs to process replicated transaction commands. Because this example bypasses the Snapshot Agent, you need to create these stored procedures manually.
Each insert, update, or delete operation against a publication article must have a corresponding insert, update, or delete stored procedure on the Subscriber database. Begin by using sp_helppublication to obtain the names of all publications in the Publisher database. The following shows the code you need to obtain this information:
USE Northwind GO EXEC sp_helppublication GO
This query returns Northwind_Pub1, which is the name you chose when you created the publication.
Next, you use the sp_helparticle stored procedure to obtain the article_id value associated with each publication, as follows:
USE Northwind GO EXEC sp_helparticle 'Northwind_Employees' GO
For the publication Northwind_Pub1's Employees article, the article_id value is 1. For subsequent articles in this publication, SQL Server increments the article_id values by 1.
You can now create the insert, update, and delete stored procedures for the Subscriber database. First, execute the commands that Listings 1 and 2 show to create the Subscriber database's insert and delete stored procedures, respectively. (To create the update stored procedure, execute the code in Web Listing 1. For download instructions, see the More on the Web box, page 57.) Note that the Employees table contains an IDENTITY column. To verify that the IDENTITY property is active, execute the following command from Query Analyzer:
SELECT name FROM Northwind..sysobjects WHERE type = 'U' AND (objectproperty(object_id(name), 'tablehasidentity') = 1)
The Employees table will appear among the table names in the result set. If you had used the Snapshot Agent to synchronize the schema and data, the Employees table wouldn't appear in the result set (i.e., the Snapshot Agent doesn't transfer the IDENTITY property as part of the schema synchronization process). You must add the SET IDENTITY_INSERT statement to the insert stored procedure to override the auto-incrementing action of the IDENTITY property because any rows inserted into the Publisher database's Employees table should also be inserted into the Subscriber database's Employees table.
The IDENTITY column's auto-incrementing action in the Publisher database increments the IDENTITY value, but you must explicitly insert this row into the Subscriber database's Employees table. Thus, we added the statement SET IDENTITY_INSERT Employees ON before the INSERT statement and SET IDENTITY_INSERT Employees OFF after the INSERT statement, as Listing 1 shows. In addition, because SQL Server disallows updating an IDENTITY column's value, we modified the update stored procedure by commenting out the line that would cause SQL Server to update the IDENTITY column. For this example, the IDENTITY column is the Employees table's EmployeeID column.
In addition to the IDENTITY property, the Subscriber database might also have triggers on database tables. You should disable all triggers because you don't want SQL Server to fire the trigger on the Subscriber database when the database receives an insert, update, or delete operation. Because the trigger has already fired on the Publisher database, firing the trigger on the Subscriber database might repeat the transaction.
Disabling publishing on the Subscriber database. You've manually synchronized schema and data by restoring the Publisher database at the Subscriber. In addition, recall that in the sequence of steps outlined here, you created the publication on the Publisher before backing up the database. Once a publication exists, SQL Server updates the row in the sysdatabases system table and sets its category value to 1. For this example, this action indicates that SQL Server has enabled the Northwind database for publishing. Because you restored the database on the Subscriber server, the Northwind database would be enabled for publishing on the Subscriber server also. To verify that this statement is true, execute the following query from Query Analyzer on the Subscriber server:
SELECT * FROM master..sysdatabases WHERE name = 'Northwind'
The categoryid is equal to 1. You should disable this property because you don't want SQL Server to create any publication or enable any subscriptions for the Subscriber database. To disable publishing for the Subscriber database, execute the commands in Listing 3 on the Subscriber server.
Pushing or pulling the subscriptions. Pushing a subscription means letting the Subscriber server passively accept transaction commands from the Distribution server; pulling a subscription requires that the Subscriber server establish a connection with the Distribution server and retrieve the replicated transaction commands. Thus, a subscription involves propagating data and updates to a Subscriber server, which is the server on which the copy of the production database is maintained and updated.
To push a new subscription, start Enterprise Manager, then select Tools, Replication, Push Subscription to Others. Under the Northwind database, highlight the publication Northwind_Pub1, click Push New Subscription to access the Push Subscription Wizard, then click Next. On the Choose Subscriber window, highlight the Subscriber server among all enabled Subscriber servers that you want to receive the replicated transactions, and click Next. In the Choose Destination Database window, click Browse to preview all the available databases on the Subscriber server. Highlight the Northwind database (i.e., the database to which you want to replicate the transactions against articles in your publication), and click OK. The Northwind database should now appear in the Subscription database name text box. If the database's name doesn't appear, enter it into the text box and click Next.
Now that you've established how to get the publication to the Subscriber server, you're ready to set the Distribution Agent's schedule. The Distribution server through the Distribution Agent ensures that SQL Server replicates updates made at the Publisher server to the Subscriber database. You can schedule the Distribution Agent to run at specified times or let it run continuously. The default is to let the Distribution Agent run continuously. If the Distribution server and Publisher server are the same, if you expect a large volume of user connections and transactions, and if you have multiple Subscriber servers, then the overhead that the Distribution Agent requires, in terms of memory and CPU utilization, could cause performance problems. However, with only one Subscriber server, you shouldn't have any performance problems.
Accept the default—to let the Distribution Agent run continuously—and click Next to display the Initialize Subscription window, which asks whether you want to use the Snapshot Agent to initialize the destination database's schema and data. For this example, choose No, the subscriber already has the schema and data. This option requires that you restore the Northwind database from your Publisher server backup. Click Next to have SQL Server check the SQLServerAgent service's status. The SQLServerAgent must be running because this agent monitors and schedules the replication agents. Click Next, Finish to complete the Push Subscription Wizard. Now, you have your transactional replication implementation ready to push the Northwind_Pub1 publication's subscription from the Northwind publication database to the Subscriber server.
Now that you've disabled publishing for the Subscriber database, you can start the Distribution Agent on the Distribution server to begin replication. The Distribution Agent automatically creates a table named msreplication_subscriptions in the Subscriber database. The table contains information about when the Distribution database's transaction batch began. In some cases, you might need to modify this table's entry_time column value. For example, if you decided to enable a push subscription to the Subscriber database before your database backup and the Publisher server was in multiuser mode during the Subscriber database restore, data modifications might have occurred. These data modifications would result in entering transactions into the Distribution database through the Log Reader Agent. After the restore has completed, SQL Server wouldn't yet have committed these transactions to the Subscriber database because the backup wouldn't reflect these transactions, so the Subscriber database wouldn't reflect them either after the restore. Also, the time at which SQL Server put these transactions in the Distribution database will be earlier than the time at which the restore completed. When you start the Distribution Agent, the entry_time column value reflects only the time at which SQL Server created the msreplication_subscriptions table. The difference between the entry_time column value in the msreplication_subscriptions table and the time at which SQL Server copied the first batch of transactions into the Distribution database could cause data synchronization problems between the Publisher and Subscriber databases.
To update the entry_time column in the msreplication_subscriptions table with the value corresponding to the time at which the Publisher database backup completed, execute the following query against the Distribution database.
SELECT * FROM msrepl_transactions WHERE entry_time >= Backup Completion Time
In the result, which Figure 1 shows, the Xact_SeqNo column contains a hexadecimal value that corresponds to the entry_time column value. Next, update the timestamp in the Subscriber database's msreplication_subscriptions table by using the following code; this update lets you restart the Distribution Agent with the next available transaction ID:
UPDATE msreplication_subscriptions SET transaction_timestamp = 0x00000DB70001D9B10032
Last, because replication replays some transactions that already exist in the Subscriber database, you might experience some primary key violations. Conversely, if you start the Distribution Agent and receive no primary key violations, replication might have missed some transactions. You can make sure you have all the transactions and no duplicates by using the stored procedure sp_publication_validation to validate the Subscriber data by both rowcount and checksum.
Virtually Maintenance Free
Many IT shops have large SQL Server 7.0 databases supporting mission-critical OLTP applications. These applications might also support business functions almost continuously and have limited database maintenance time. Often, companies also require that IT provide reporting capabilities and maintain a reporting server.
If time doesn't permit you to use the Snapshot Agent for transactional replication, you might resort to either log shipping (restoring to a secondary server each transaction log backup from a primary server) or a daily restore from the previous full backup to maintain a copy of the reporting server's database. Either way, the reporting server's data lags significantly behind the production server's data. In addition, the reporting server might be unavailable because of an attempted restore of corrupted transaction logs or because of how long a full database restore requires. These scenarios could require much of your time and attention.
We've shown you how to set up transactional replication without the Snapshot Agent. Although this setup process can require significant time and effort—especially for databases with many tables—the completed replication implementation should be virtually maintenance free. In addition, if problems occur that cause the Subscriber and Publisher databases to fall out of sync, the time required to reinitialize the data will be minimal; you'll already have completed all the work involved in creating the stored procedures required for replication, and you can devote your DBA resources to other important tasks.