More and more businesses today need their data to be available on multiple servers and at remote sites, and they need the data to be as closely synchronized at these locations as possible. When multiple copies of valuable data exist, availability of that data improves. For example, when one site goes down, you can divert traffic to another site or server. In addition, DBAs can spread the load across servers to avoid overloading one server and to improve response time for user queries—especially when the server is physically close to the users.
Let's briefly consider a scenario that illustrates failover and load-sharing needs for a database system that includes a three-tiered application at two geographic sites. Each site runs a Web server, an application server, and a database server. When the system is operating efficiently, the Web server and application server at each site split their user requests between the database servers so that the two database servers can share the request load. However, when one of the database servers or databases is unavailable, the Web and application servers can fail over all their database requests to the database server at the other site. After the database server at the first site is back in operation, the two database servers again share user requests.
When an organization maintains one active site and reserves the other site as read-only, a DBA's duties are relatively simple. However, responsibilities soon become complex when the organization decides to place multiple sites in active mode and to synchronize data among all active sites. One option SQL Server provides for this setup is transactional replication. Explaining the basics of transactional replication is beyond the scope of this article; for more information about this type of replication, see Cedric Britt and Samuel Penn's article "Transactional Replication without the Snapshot Agent," May 2001, InstantDoc ID 20005, and the "Transactional Replication" section of SQL Server Books Online (BOL).
SQL Server 2000 offers two transactional replication options that let data updates take place at the Subscriber. In the first option—Transactional Replication with Immediate Updating Subscribers—SQL Server uses a two-phase commit to simultaneously update the Publisher in the same transaction as the Subscriber. A two-phase commit locks the affected row at all sites involved in the replication while an update takes place at one site. This locking ensures that no latency occurs between the time a Subscriber is updated and the time the Publisher reflects the update. Note that this option requires the Publisher and Subscriber to be running and connected at all times; if they're not, users can't make updates to the Subscriber.
The second option is Transactional Replication with Queued Updates, which I call TRQU in this article. In contrast to Transactional Replication with Immediate Updating Subscribers, TRQU requires latency between the time an update takes place at the Subscriber and the time the same update appears at the Publisher. One disadvantage of using this replication method is that it introduces the possibility that a row might be updated with different data at multiple sites simultaneously and that the data won't be consistent across sites until a conflict-resolution mechanism removes the data inconsistency. You define the rules for resolution—such as Publisher wins or Subscriber wins—in the TRQU setup. Consequently, updates at one site can overwrite updates at the other site. The advantage of using TRQU is that the Publisher and Subscriber don't always need to be connected, and the Publisher can be down while the Subscriber is being updated. Therefore, TRQU offers higher data availability to users than Transactional Replication with Immediate Updating Subscribers does. Note that you also can use merge replication to replicate data in a system that keeps more than one site active. However, although transactional replication supports two-phase commit, merge replication doesn't support it. For more information about merge replication, see "Merge Zone" (November 1999, InstantDoc ID 6193) by Bob Pfeiff and Ted Daley.
TRQU's unique properties make it appropriate for business situations that
- require you to perform data updates at the Publisher and at the Subscriber
- don't need the data at the Publisher and Subscriber to be in perfect sync at all times and can tolerate latency between updates at one site and the same updates at the other site
- accept the possibility that SQL Server can update the same row of data simultaneously at both the Publisher and the Subscriber and that updates at one will overwrite updates at the other
- require that SQL Server maintain transactional integrity
- might require temporary disconnection of the Publisher and the Subscriber or might require tolerance of such a disconnection without stopping updates at both sites
User-profile data, for example, fits these business conditions for organizations that consider this type of data noncritical. However, other organizations might consider this data to be crucial. Therefore, the utility of TRQU depends on what an organization uses the data for. In this article, the first of a series, I show how to set up TRQU and describe the resulting changes that take place to the database schema.
Setting Up TRQU
The process of setting up TRQU involves more than 40 screens, so I can't include them all in this article. Instead, I list the steps in the process and provide important screen shots at milestone junctures so that periodically you can take your bearings.
TRQU setup is divided into three steps. Step 1 is to configure the Distributor, the distribution database name and location, the snapshot folder, the Publisher, and the Subscriber and to enable the publication database. Typically, you perform this step on the Distributor. In Step 2, you configure the publication and articles within the publication. Typically, you perform this step on the Publisher. Finally, in Step 3, you configure a push subscription that involves the publication you defined in Step 2 and a Subscriber and subscription database to which you'll push the publication. Typically, you perform this step on the Publisher. For this article, I use a push subscription to replicate the authors table from the Pubs database to a table of the same name in the Northwind database.
Step 1: Configuring the Distributor and Publisher
Before you proceed, note the contents of the sysservers and sysxlogins tables in the master database on the Distributor and Publisher so that you can identify the new rows that are added to these tables during TRQU setup. Pay attention also to the SQL Server Agent jobs defined on each of these machines so that you can identify additions that take place during the process. In my example, I've combined roles in one machine, but connecting three machines to one another is more efficient.
Configuring the Distributor. In Enterprise Manager, highlight the machine that you've designated as the Distributor (the Distributor name in the figures is MOHAN), then in the Tools menu, select Replication, Configure Publishing, Subscribers, and Distribution, as Figure 1 shows. If you're configuring replication for the first time, the Welcome to the Configure Publishing and Distribution Wizard screen appears. Click Next. The following screen, Select Distributor, lets you either specify the currently selected machine as the Distributor or choose another machine as Distributor. Select the current machine, then click Next.
The Specify Snapshot Folder screen asks you to specify a folder (preferably on the Distributor machine) where you want to store the snapshots. Snapshots are a set of schema and data files that define the state of an article (a table in this scenario) at a given time. Enter the path to a folder or select the default path, C:\Program Files\Microsoft SQL Server\MSSQL\ReplData, then click Next. The Customize the Configuration screen lets you customize your setup or accept the default setup. Choose the first option to customize your setup, then click Next.
In the Provide Distribution Database Information screen, you specify the name of the distribution database and the location of its data and log files. The distribution database is an important database on the Distributor machine; IT holds detailed information about the replication setup and provides intermediate storage for transactions replicated from the Publisher to the Subscriber. This database also maintains replication agent history data. Enter the database name and data and log file locations in the appropriate text boxes, then click Next. Figure 2 shows the default distribution name for the database and the default C:\Program Files\Microsoft SQL Server\MSSQL\Data folder for the data and log files.
Configuring the Publisher. In the Enable Publishers screen, select the machine you want to designate as Publisher for this Distributor, then click Next. Figure 3 shows the MOHAN server as the Publisher. On the Enable Publication Databases screen, select the Trans (i.e., transactional replication) check box in front of the Pubs database from which you want to replicate data. In this example, the Pubs database is designated as Publisher. Click Next. In the Enable Subscribers screen, select the machine you want to designate as the Subscriber. Figure 4 shows the MOHAN server selected as the Subscriber. Click Next. On the Completing the Configure Publishing and Distribution Wizard screen, verify that all selections you've made so far are correct, then click Finish.
Configuring queued updates. After seeing a screen that displays the setup progress, you'll see a dialog box that says Enterprise Manager successfully installed whichever server you chose as the Distributor for whichever server you chose as the Publisher. Click OK. Then, a dialog box says that the Replication Monitor will be added to the console tree in Enterprise Manager on the Distributor. Click OK. If you've configured the Distributor and Publisher correctly so far and no error messages have appeared, you should see the following configurations in Enterprise Manager. On the Distributor, you should see the distribution database (called distribution in the figure), a Replication Monitor folder and Agents subfolder on the Distributor, and one sub-subfolder for each replication agent. On the Publisher, a hand appears under the database that you chose as the publishing database. (In the example, I chose the Pubs database on the same machine, MOHAN.) When you expand the publishing database node, you should see a folder named Publications. At this stage, the Publications folder has no named publication under it because you haven't yet defined a publication.
Now look at the contents of the sysservers and sysxlogins tables in the master database on the Distributor and Publisher machines. Note the additions to these two tables—among them a server named repl Distributor and a login named Distributor admin. Also note that the setup process has added jobs at the Distributor. At this stage, the created jobs are merely replication cleanup jobs, which I'll describe in greater detail in a future article.
Step 2: Defining the Publication
After you've configured the Subscriber and the Publisher, the next step is to create a publication. This publication contains the data that you want to share with the Subscribers.
Configuring the Publisher. In Enterprise Manager, highlight the Publisher (e.g., MOHAN), then on the Tools menu, choose Replication, Create and Manage Publications. On the resulting Publisher screen, highlight the database you specified for publishing (e.g., Pubs). Click Create Publication. On the Welcome to the Create Publication Wizard screen, be sure to select the Show advanced options in this wizard check box, as Figure 5 shows. Click Next. In the Choose Publication Database screen, highlight the publication database and click Next. In the Select Publication Type screen, select Transactional Publication and click Next.
In the Updatable Subscriptions screen that Figure 6 shows, select the second option, Queued updating. Note that if you didn't select Show advanced options in this wizard, as I mentioned earlier, this screen doesn't appear. Click Next. On the Specify Subscriber Types screen, select the Servers running SQL Server 2000 only option because queued updates is a new feature in SQL Server 2000. Click Next.
Configuring the articles. On the left side of the Specify Articles screen, which Figure 7 shows, be sure to check the Show column box in the Tables object type row. Note that only tables that have defined primary keys are eligible for TRQU. Select the tables you want to replicate (e.g., authors), then click Next. The Article Issues screen cautions you that the setup process will add a unique identifier column to the selected table if one doesn't already exist. SQL Server uses this column to detect a change in the state of a row. Click Next.
In the Select Publication Name and Description screen, type a name for the publication and a brief description or accept the default values. In this scenario, I chose the Pubs default name. Click Next. On the Customize the Properties of the Publication screen, you can select one of two options. The second option lets you create a publication based on the definition you've created so far. The first option lets you further customize publication options such as horizontal or vertical filtering, anonymous subscriptions, and scheduling the Snapshot Agent. When you complete this step, you reach the Completing the Create Publication Wizard screen, which summarizes all your selections so far. If everything's OK, click Finish.
This action brings up a screen that shows the progress of the publication creation, followed by a screen announcing that you successfully created the publication. Click Close. The next screen shows the publication that you defined (e.g., Pubs) within the publication database. You can also verify definition of the publication by looking in the Publications folder under the publication database in Enterprise Manager.
Step 3: Configuring a Push Subscription
In Enterprise Manager, highlight the Publisher and choose Tools, Replication, Push Subscriptions to Others. In the Create and Manage Publications on Publisher screen, select the publication you created (in this case, Pubs), then click Push New Subscription, as Figure 8 shows. On the Welcome to the Push Subscription Wizard screen, select the Show advanced options in this wizard check box and click Next. In the Choose Subscribers screen, select the SQL Server machine you want to designate as Subscriber, then click Next. In the Choose Destination Database screen, enter the name of the database at the Subscriber that you want to replicate to and from. (In TRQU, data replication takes place in both directions—Publisher to Subscriber and Subscriber to Publisher.) In this scenario, I use one system, so I designated another database, Northwind. Click Next. On the Select Set Distribution Agent Schedule screen, you can select either the option to run the agent continuously or the option to run the agent on a specified schedule. For this example, I chose to run the agent continuously. Then, click Next.
In the Initialize Subscription screen, choose the first option—Yes, initialize schema—and select Start Snapshot Agent immediately to begin initialization immediately. You select this option to ensure that the Publisher and Subscriber have identical copies of data when the replication starts. Click Next. On the Updatable Subscriptions screen, select the Queued updating option, then click Next. The Start Required Services screen should show that the SQL Server Agent service is running on the Distributor. Click Next. The Completing Push Subscription Wizard screen summarizes all your selections so far. Click Finish. The next screen shows the progress of the subscription creation, followed by a screen that shows successful creation of the subscription. Click Close.
In Enterprise Manager, you can display the subscription you defined for the given publication. Note that the TRQU setup process
- added new jobs at the Distributor (a new job for each agent involved in TRQU at the Distributor)
- listed all agents involved in the replication in the Replication Monitor folder
- changed the structure of the tables involved in TRQU
- added triggers on these tables at the Publisher and Subscriber
- added new stored procedures in the databases—procedure names all begin with sp_MS
To start TRQU, run the Snapshot Agent by right-clicking it, then clicking Start. Next, to perform the initial synchronization, click the Distribution Agent. Your TRQU setup should now be ready. Test it by entering or modifying data in the Publisher table and the Subscriber table and see how and when the changes propagate to the other system or database. Then, try modifying the same row from two places simultaneously and see what happens. When you change a row at the Publisher table, you should see the change reflected at the Subscriber table within a few seconds if you left the log reader and distribution agents to run continuously. When you make a change at the Subscriber table, that change should show up at the Publisher within a few seconds if your queue reader job is running continuously. Now try modifying the same row from two places simultaneously by opening two Query Analyzer sessions, one each to the Publisher and Subscriber. If you left the default Publisher wins policy as the conflict-resolution rule, the Publisher change overrides the Subscriber change. You can change the conflict-resolution policy after the publication is created as long as there are no subscriptions to the publication.
A detailed discussion of possible errors in the TRQU setup is beyond the scope of this article, but one problem in particular is worth noting. Most of TRQU's difficulties occur when the SQL Server Agent doesn't have sufficient permissions for the Publisher and Subscriber to do its job. Remember that the SQL Server Agent at the Distributor is doing most of the work connecting to the Publisher and Subscriber. To avoid this problem, make the SQL Server Agent service run under one domain account on the Publisher, Distributor, and Subscriber and make the account part of the Local Administrators group at the Windows OS level and part of the sysadmin server role within SQL Server.
Now that you've seen the steps involved in setting up TRQU, you can effectively use TRQU to keep business data closely—but not totally—synchronized at multiple sites while allowing data updates at all sites involved in replication. In my next article, I look at how SQL Server moves data between servers and evaluates applicable conflict-resolution algorithms.