Solving Merge Replication Performance Problems

One of my clients in the insurance industry has an application that has two types of users: online users who directly access the application's database,ApplicationDB, via the LAN and mobile users who access a local copy of the database on their laptops and then use SQL Server merge replication to synchronize the changes with the central database via VPN connections.

The client's IT department observed that application performance for the LAN users was slow, and numerous tests showed that the application would work much faster if only LAN users used it.The merge replication operations running in the background were significantly affecting online performance. Migrating the application to SQL Server 2005 had resolved many performance issues for mobile users, but online users continued to experience long response times for their queries.

Fortunately, we were able to point online and mobile users to different instances of the application database on different servers to boost application performance for online users and still let the company benefit from the real-time data transfer that merge replication offers. Here's how we did it.

Using Transactional Replication

We can achieve separation of online and mobile users by placing a copy of ApplicationDB on a separate server for online users and connecting the copy to a merge replication publisher database. Online users then query their own database, and a data exchange mechanism synchronizes the data entered by the mobile and online users.

There are three possible scenarios for keeping the separate databases synchronized:

Adding the online database as another subscriber in merge replication.
Using SQL Server 2005 Integration Services (SSIS—or DTS in SQL Server 2000).
Using bidirectional transactional replication to sync up the data between an online transaction processing (OLTP) database for the online users and the merge replication publisher database. I call the server that runs the merge replication publisher database the bridge server to reflect the fact that it connects the online and mobile users.

The first possibility,adding the online database as another subscriber,won't solve the problem. The same trigger-based mechanism will affect the performance of online operations.

Using SSIS, the second possible method, will add latency to the data synchronization process, and it requires designing a special "loopback prevention" mechanism (to prevent transactions from being sent back to the originating server).This method would also probably require the creation of additional data-staging areas on both the OLTP and merge replication sides.

Bidirectional transactional replication is the best choice for our needs. It has a built-in loopback detection mechanism. It transmits data much more quickly than merge replication, and its effect on online operations is minimal because the log reader in transactional replication operates with already committed transactions in the database log. Figure 1 shows the architecture for a bidirectional transactional replication solution. For additional information about bidirectional replication, see "How to: Configure Nonpartitioned,Bidirectional,Transactional Replication (Replication Transact-SQL Programming)" in SQL Server 2005 Books Online.

There are a couple of difficulties in dealing with bidirectional replication in this situation: We must create a conflict-resolution mechanism on both sides, and the merge replication publisher must accept the transactional replication load. Let's look at how to address these issues.

Conflict-Resolution Mechanism

We can resolve conflicts that occur when users make simultaneous changes to the different copies of the database by modifying three custom stored procedures. BOL contains this description of the stored procedures: "By default transactional replication makes all data changes at Subscribers through stored procedures that are generated by internal procedures for each table article in the publication.The three procedures (one each for inserts, updates, and deletes) are copied to the Subscriber and execute when an insert, update, or delete is replicated to the Subscriber." We must modify the procedures for each article in each publication.

One column of type datetime is required in each article to implement the conflict resolution mechanism in bidirectional transactional replication. If a table doesn't have such a column that stores the Universal Time Coordinate (UTC) time at which the transaction was created or modified, we must add the column to the table scheme. We'll name this column Version. The default value for this column should be getUTCdate().

To ensure that the value of column Version always changes when the record is updated, we must create a simple update trigger on each table. (This step might not be required if our developers confirm that each update statement also updates the value in the Version column.)

For this particular client, the rules needed to specify that when an online transaction and a mobile transaction were delivered to the bridge server at about the same time, the online transaction should be carried out. This preference for the online transaction reflects the fact that in most cases, the mobile transaction will have occurred earlier in time than the online one that arrives on the bridge server at about the same time—thus the online transaction will probably better reflect the current business situation.

You can see that the conflict-resolution stored procedures need to be different on the bridge and online sites—the two servers will have opposite conditions for accepting or rejecting incoming transactions.We must also allow for server time difference and replication latency in our conflict-resolution stored procedures. We might say that two records were updated at the same time if the difference between the transaction times is less than or equal to 15 seconds.

We could express the condition to reject incoming changes on the OLTP server as

IF DATEDIFF (ss, @curr_Version,
  @Version) <= 15

where @curr_Version is the value in the Version column in the current record on the destination server (the OLTP server, in this case) and @Version is the value in the Version column in the new record that was replicated to the OLTP server.

The opposite condition would exist for rejecting an incoming change on the bridge server, as in

IF DATEDIFF (ss, @Version,
  @curr_Version) > 15

In this expression, @curr_Version is the value in the Version column in the current record on the destination server (the bridge server, in this case) and @Version is the value in the Version column in the new record that was replicated to the bridge server.

Configuring the Bridge Server to Accept the Replication Load

To configure the bridge server to accept the load from both merge and transactional replication, we modify the value of the published_in_tran_pub property to true in each article in the merge replication publication.This property, which was introduced in SQL Server 2000 Service Pack 3 (SP3) is false by default. When true, the published_in_tran_pub property indicates that an article in a merge publication is also published in a transactional publication.

When we change this parameter in sp_changemergearticle, we must invalidate the snapshot and reinitialize the subscribers. Listing 1 shows the complete command syntax for our case in this article. If you were to use the command in your environment, you would need to substitute the publication name and article name in Listing 1 with the proper names for your situation.

The value of 1 for the @force_invalidate_ snapshot parameter means that a new snapshot is required and gives permission for the existing snapshot to be marked as obsolete and a new snapshot generated.The value of 1 for the @force_reinit_subscription parameter means that changes to the merge article will cause existing subscriptions to be reinitialized, and it gives permission for the subscription reinitialization to occur.

This command will succeed only if the default setting of @pre_creation_cmd = N'drop' is used on the merge replication article.We can start snapshot agent jobs on the bridge server to regenerate snapshots for all merge publications.

Installing the Demo

I've built a small example to demonstrate the configuration of one SQL Server system as the OLTP server, one SQL Server system as the bridge server, and one SQL Server system as a mobile (merge replication) pull subscriber. Figure 2 shows this demo configuration.This demo uses one table,Test1,created in the Pubs database. We can implement this solution in SQL Server 2000 SP4 or in SQL Server 2005 SP1. It can't be configured in the original release of SQL Server 2005 due to a minor bug, but the problem was resolved in SP1.

To install the demo,follow the steps below:

Install three instances of SQL Server 2005 SP1 or SQL Server 2000 SP4. In my demo code, I use the following names for the SQL Server instances: A2005 for the OLTP server, B2005 for the bridge server (merge replication publisher), and C2005 for the mobile server (merge replication subscriber).
Create tableTest1 in the Pubs database on each of the three servers by running the script that Web Listing 1 shows.
Using SQL Server Management Studio, configure distribution and publishing on the bridge server. Make the B2005 server its own distributor and the only publisher. Enable the Pubs database for both transactional and merge replication. If you're running SQL Server 2000, enable the OLTP server A2005 and the mobile server C2005 as subscribers.
Using Management Studio, configure distribution and publishing on the OLTP server. Make the A2005 server its own distributor and the only publisher. Enable the Pubs database for transactional replication. If you're running SQL Server 2000, enable the bridge server B2005 as a subscriber.
Using Management Studio, create a merge publication in the Pubs database on the bridge server B2005. In the sample code,Pubs_Merge_Test1 is used as the name of the publication (see Listing 1). For the purposes of this demo only, configure the merge agent to run continuously.
6. Using the Pull Subscription Wizard in Management Studio on the mobile server C2005, create a new pull subscription to the already configured merge publication (configuring the bridge server B2005 as the distributor and publisher).
In Management Studio on the bridge server B2005, run the script (in Listing 1) that changes the published_in_tran_pub property of the Test1 article. If desired, restart the snapshot agent to regenerate a snapshot, then restart the merge agent.
In Management Studio on the OLTP server A2005, run the script in Web Listing 2. This script creates the Pubs_Test1 transactional publication in the Pubs database on A2005, adds theTest1 article to this publication, and creates a subscription to this publication on the bridge server B2005. For additional information, see "How to: Configure Nonpartitioned, Bidirectional,Transactional Replication ( Replication Transact-SQL Programming)."
In Management Studio on the bridge server B2005, run the script in Web Listing 3. This script creates the Pubs_Test1 transactional publication in the Pubs database on B2005, adds the Test1 article to this publication,and creates a subscription to this publication on the OLTP server A2005.
In Management Studio on the bridge server B2005, run the script in Web Listing 4 to install custom stored procedures for transactional replication on the bridge server.
In Management Studio on the OLTP server A2005, run the script in Web Listing 5 to install custom stored procedures for transactional replication on the OLTP server.
For the purposes of the demo, you can modify the merge agent and distribution agent profiles to speed up data transmission. First, on the OLTP server, create a new profile called Speed for the distribution agent with the same values as the Default profile. Change the default value of the PollingInterval property to 2. Assign the Pubs_Test1 publication the new Speed profile, then stop and start the distribution agent on the OLTP server. Second, repeat these steps for the distribution agent on the bridge server.
Third, on the bridge server, create a new profile called Speed for the merge agent with the same values as the Default profile. Change the default value of the PollingInterval property to 2. Figure 3 shows the windows that you need to open to configure the profile for the merge agent.Assign the Pubs_Merge_Test1 publication the new Speed profile, then stop and start the merge agent on the bridge server.
Test data replication among all three servers.

Tried and True

The technology I've described was successfully implemented in production at the customer site in fall 2005.We separated the data loads of the mobile and online users without major changes in the application topology. We made the following relatively minor changes:

configured an additional instance of SQL Server
pointed online users to query the new database server
wrote three custom stored procedures for all the current articles in the merge publications
reinitialized all pull subscriptions after changing the value of the published_in_tran_pub property for all the articles

The customer wrote a code generator in TSQL to simplify the process of preparing custom stored procedures for 200-plus replicated tables.

Bidirectional transactional replication is transparent for both the SQL Server 2000 SP4 and SQL Server 2005 SP1 environments. The bidirectional and new peer-to-peer replication offered in SQL Server 2005 doesn't offer a conflict resolution mechanism. The improved merge replication mechanism in SQL Server 2005 is processing transactions of mobile users more quickly, but it remains trigger-based. So separating the online and mobile users in mixed environments could be necessary even after migration to SQL Server 2005 if you have hundreds of mobile users and a large number of online users.

Comments

Plain text