Looking Under the Hood

Maintaining multiple database servers at multiple sites in active mode and closely synchronizing copies of the data on all servers is a challenge for any DBA. But as long as you can tolerate a little latency, one good option for keeping your data current at all locations is to use SQL Server’s transactional replication with queued updates (TRQU). In the first article in this series, “Queuing Up,” December 2003, I showed how to set up TRQU. In this article, I look at how SQL Server replicates data from Publisher to Subscriber and from Subscriber to Publisher and describe how SQL Server resolves data conflicts. And, for information about TRQU security, see “Login Security for TRQU.”

Modifying the Database Schema

Setting up TRQU modifies the database schema at the Publisher and the Subscriber and adds jobs at the Distributor. Knowing the details of these modifications will help you understand the entire TRQU process.

Modifications at the Publisher. For this article’s illustration, I set up the Authors table in the Pubs database as part of the pubs publication. For each table involved in the replication, the TRQU setup process adds the msrepl_tran_version uniqueidentifier column and the sp_MSsync_upd_trig_Authors UPDATE trigger to the Authors table. In the Publisher database, TRQU adds the sp_MSSync_ins_Authors, sp_MSSync_del_Authors, sp_MSSync_upd_Authors, and sp_MScft_pubs_Authors stored procedures and the conflict_pubs_Authors conflict table.

Modifications at the Subscriber. In the subscribing database (the Northwind database in this scenario), the TRQU process adds the msrepl_tran_version uniqueidentifier column to each table that receives and sends data to the corresponding table at the Publisher—in this example, a table that has the same name and structure as the Authors table. For each replicated table, the setup process also adds to the subscribing database:

The conflict_pubs_Authors conflict table
The MSReplication_objects, MSReplication_queue, MSReplication_subscription, MSSubscription_agents, MSSubscription_articles, and MSSubscription_properties system tables
The trg_MSSync_upd_Authors, trg_MSSync_ins_Authors, and trg_MSSync_del_Authors triggers
The sp_MS_ins_Authors, sp_MS_del_Authors, and sp_MS_upd_Authors stored procedures

Modifications at the Distributor. The TRQU setup process adds four jobs at the Distributor. The Snapshot Agent job creates the snapshot in the repl_snapshot job category. The Log Reader Agent job reads the publishing database log and writes to a staging table on the Distributor in the repl_logreader job category. The Distribution Agent job applies the changes from the staging table on the Distributor to the destination table on the Subscriber in the repl_distribution job category. And the Queue Reader Agent job, in the repl_queuereader job category, reads the queue at the Subscriber and applies those changes to the Publisher.

Updating the Data

The setup process creates other jobs such as history cleanup, but these jobs aren’t relevant to this article’s discussion. So, now let’s look under the hood and determine what happens when TRQU updates data.

Changes at the Publisher. Insertions, deletions, and updates take place in the publishing table. An UPDATE statement, for example, fires the sp_MSSync_upd_trig_Authors trigger. This trigger updates a row’s msrepl_tran_version column with a new globally unique identifier (GUID) value. The new GUID distinguishes the row from its previous version. SQL Server uses this GUID value to detect data conflicts by comparing the row versions at the Publisher and the Subscriber.

Next, in the Publisher database log, SQL Server marks an entry for replication. The Log Reader job runs on a set schedule on the Distributor, connects to the Publisher, reads from the Publisher log the transactions that are marked for replication, then moves commands for those marked transactions to the msrepl_commands table in the distribution database on the Distributor. Finally, the Distribution Agent job runs according to the set schedule on the Distributor (for a push subscription), picks up the pending commands from the Distributor’s msrepl_commands table, then applies them to the corresponding table at the Subscriber.

To insert, delete, or update rows in the destination table, the Distribution Agent job typically executes the sp_MS_ins_Authors, sp_MS_del_Authors, and sp_MS_upd_Authors stored procedures in the subscribing database. These stored procedures contain INSERT, DELETE, and UPDATE statements. Microsoft documentation says that you can append custom code to these stored procedures if your business rules warrant the addition. Note that when the Distribution Agent applies the commands to the table, the triggers on the destination table at the Subscriber don’t fire because the triggers’ code contains a NOT FOR REPLICATION declaration.

Changes at the Subscriber. Next, insertions, deletions, and updates take place in the subscribing table. SQL Server takes four steps in applying these commands. First, SQL Server fires the trigger that corresponds to one of three commands: trg_MSSync_ins_Authors, trg_MSSync_del_Authors, or trg_MSSync_upd_Authors. Next, each trigger captures the appropriate change and writes it to the MSReplication_queue table. Then, as the trigger carries out these actions, it captures the row’s before and after states with the GUID value of the corresponding row versions. For example, the trigger captures an after state for an INSERT, a before state for a DELETE, and before and after states for an UPDATE. Finally, the Queue Reader job runs at the Distributor on a defined schedule, connects to the Subscriber, picks up the transactions from the MSReplication_queue table, connects to the Publisher, then applies the changes to the Publisher. The job applies these changes by using the appropriate stored procedure in the publishing database: sp_MSSync_ins_Authors, sp_MSSync_del_Authors, or sp_MSSync_upd_Authors.

Resolving Conflict

When the Queue Reader job executes one of the three stored procedures (sp_MSSync_ins_Authors, sp_MSSync_del_Authors, or sp_MSSync_upd_Authors), SQL Server checks for any data conflicts. If conflicts are evident, SQL Server preserves the “losing” row in the appropriate conflict table—in this scenario, the conflict_pubs_Authors table—at the Publisher and at the Subscriber, if you have chosen to view conflicts at the Subscriber by selecting the publication property that allows decentralized conflict reporting.

When you permit simultaneous data updates at the Publisher and the Subscriber, you risk having updates to the same row on both servers at once. Therefore, to maintain data consistency across the Publisher and Subscriber, you need to decide which change you want to override the other. TRQU offers three override options. When you select the Publisher always wins option, the change at the Publisher overwrites the change at the Subscriber; consequently, the change at the Subscriber is lost. Selecting the Subscriber always wins option means that the change at the Subscriber overwrites the change at the Publisher, and the change at the Publisher is lost. When you choose the option Publisher always wins and subscription is re-initialized, the change at the Publisher overwrites the change at the Subscriber; then SQL Server completely resynchronizes the Publisher and Subscriber, using the Publisher’s data as the master version. In this case, the change at the Subscriber is again lost.

To better understand the conflict resolution process, let’s look at the Publisher always wins option in detail. For illustration purposes, I’ve chosen the most complex algorithm from the conflict-resolution algorithms that you can use. Here’s the scenario: SQL Server has updated a row at the Subscriber table, and the Queue Reader Agent is applying the change to the Publisher table by using the sp_MSSync_upd_Authors stored procedure in the Publisher database. Note that if the Distribution Agent detects a conflict at the Subscriber when you’re pushing data from the Publisher to the Subscriber, the Distribution Agent takes no action, leaving the Subscriber row untouched during this run. During data reconciliation, the Queue Reader Agent puts compensating commands into the replication commands queue to ensure that the publisher data change overrides the subscriber data change; during the subsequent run, the Distribution Agent applies these changes to the Subscriber.

The flowchart that Figure 1 shows illustrates the steps involved in the conflict-resolution process for this scenario. Note that I focus my explanation on the core process, not on the numerous other operations that the program logic performs. I developed the flowchart by examining the sp_MSSync_upd_Authors stored procedure code. Moving from top of the chart downward, I’ll follow one branch of the algorithm to illustrate the general approach and flow of logic. You then can use the flowchart to understand all the branches in the algorithm simply by extending the following logic.

The sp_MSSync_upd_Authors stored procedure has as its main input parameters the old values of all the row’s columns at the Subscriber before the update, as well as the corresponding new values after the update. Important for the discussion here are the primary key values (au_id in the case of the Authors table), which uniquely identify the rows, and the msrepl_tran_version GUID values, which uniquely identify a specific version of each row.

In the first step, SQL Server looks for a row in the publishing table (hereafter called the Publisher row) that has the same primary key and row version GUID as the Subscriber row’s corresponding old values that you’re passing to the stored procedure. If SQL Server finds such a row, no simultaneous update has occurred at the Publisher, and the current Publisher row is the same version as the old Subscriber row before the update. SQL Server concludes that no conflict exists. However, if no matching row exists, a simultaneous update of the same row at the Publisher has occurred, and you have a conflict, as callout A in Figure 1 shows.

Next, SQL Server checks whether a primary key update has occurred at the Subscriber. As callout B shows, if the primary key hasn’t been updated, SQL Server looks for a row in the Publisher table that has the old primary key value. If SQL Server doesn’t find such a row, it uses the sp_MS_del_Authors stored procedure and the old primary key to generate a compensating DELETE command for the row, as callout C shows. If SQL Server finds a row in the Publisher table that has the old primary key value, SQL Server uses the sp_MS_del_Authors stored procedure and the old primary key to generate a compensating DELETE command for the row, as callout D shows. Finally, SQL Server uses the sp_MS_ins_Authors stored procedure and the current values in the Publisher row that has the same primary key to generate a compensating INSERT command for the row. In addition, SQL Server preserves the overwritten row in the appropriate conflict table—in this case, conflict_pubs_Authors—at the Publisher and, if you chose it as a publication property at the time you created the publication, also at the Subscriber.

Note that in all of the preceding scenarios, SQL Server puts the compensating commands into the msrepl_commands table on the Distributor server, bypassing the Publisher database log. Why does SQL Server take this action? SQL Server records in the Publisher database log the update that the Queue Reader Agent applies to the Publisher table; then, the Log Reader Agent reads the update again. Issuing compensating commands is SQL Server’s way of creating the same state at the Subscriber table as existed before the Queue Reader Agent applied the update. When the same state exists, the UPDATE command making its way through the Publisher database log follows the compensating command and updates the row correctly. The logic seems circuitous, but observing a few iterations of the conflict-resolution process should help you understand this concept.

Let’s look again at callout C. At that point in the flowchart, no rows at the Publisher match the old Subscriber row in primary key and version GUID. Second, the transaction didn’t update the Subscriber primary key. And at the Publisher, no rows have the same primary key as the old Subscriber row. So SQL Server knows that the row with the given primary key was deleted at the Publisher and therefore now needs to be deleted from the Subscriber.

When you select the Publisher always wins option, SQL Server needs to return to the Subscriber table and delete the row that has the same primary key as the row that was modified at the Subscriber and deleted at the Publisher, so that neither the Publisher nor the Subscriber has a row that contains that primary key. SQL Server will synchronize the Publisher and Subscriber data later. In a Publisher always wins setting, no updating occurs at the Publisher, and the Queue Reader Agent doesn’t modify the Publisher row when a conflict occurs.

To delete the modified row at the Subscriber, SQL Server uses the primary key to generate a compensating DELETE command. Then, SQL Server puts the command into the distribution database so that the next time the Distribution Agent runs, it picks up the command to run the sp_MS_del_Authors stored procedure, which deletes the appropriate row at the Subscriber.

Now, let’s take another look at the step that callout D depicts. At that point in the flowchart, a row doesn’t exist at the Publisher that matches the old Subscriber row in primary key and version GUID. Also, the transaction didn’t update the primary key. Furthermore, at the Publisher, a row exists that has the same primary key as, but a different GUID than the old Subscriber row. Now you can tell that the row with the given primary key was updated at both the Publisher and the Subscriber.

When you’ve selected the Publisher always wins option, SQL Server needs to delete the Subscriber table row that has the same primary key as at the Publisher, then insert a row in the Subscriber table that exactly matches all the current column values in the Publisher table that have the same primary key. Again, no updating occurs at the Publisher.

SQL Server uses the primary key to generate a DELETE compensating command, then drops the command into the distribution database. The next time the Distribution Agent runs, it picks up the command to run the sp_MS_del_Authors stored procedure, which deletes the appropriate row at the Subscriber.

SQL Server uses the current values in the Publisher row that contains the primary key to generate an INSERT compensating command, then drops the command into the distribution database. The next time the Distribution Agent runs, it picks up the command to run the sp_MS_ins_Authors stored procedure, which inserts the appropriate row at the Subscriber.

Give TRQU a Spin

TRQU is well suited for systems that can tolerate some data loss (in case of a conflict) but where a large degree of synchronization is essential. For example, user or group customizations of favorite Web sites might store preferences in SQL Server tables and have two or more sites for redundancy. Conversely, financial transactions can’t tolerate even minimal loss of data and aren’t good candidates for TRQU. In the final article in this series, I’ll describe how to back up and restore replication implementations.

Comments

Plain text