Behind the Scenes

Replication is built directly into SQL Server 7.0 and SQL Server Enterprise Manager; it isn't an add-on. You can use system stored procedures or SQL-Distributed Management Objects (SQL-DMO) to implement replication. But unless you have custom needs, it's easier to use the Create Publication Wizard.

For data distribution, SQL Server uses a publisher/subscriber metaphor. The publisher (the source database) makes data available to subscribers. Replication also creates a distributor, which manages the movement of data between the publisher and subscribers. In replication, site autonomy and the latency of data modifications between the publisher and subscriber are the tradeoffs, which the requirements of your distributed environment can help you balance.

Articles, which are based on tables, define the data to publish. A publication can have many articles, but subscriptions are based on publications, not articles. After you create the subscription, the subscriber synchronizes with the publisher. An initial snapshot usually performs this synchronization. SQL scripts and the bulk copy program (bcp) utility create the table for the article in the subscriber and load it with a snapshot of the data on the publisher. The snapshot agent connects to the publisher and writes a copy of each article's table schema to an .sch file on the distributor. If indexes and declarative referential integrity (DRI) components will also be replicated, the agent scripts out the selected indexes to a file with an *.idx extension on the distributor. If all subscribers are SQL Server machines, the snapshot is stored as a native .bcp file. If any subscriber is a heterogeneous data source, the snapshot is stored as a character mode (.txt) file. Together, the .sch and .bcp files act as a synchronization set that corresponds to the state of a table at a given point in time. Each article in a publication has a synchronization set whose files reside in a subdirectory of the distribution database's working directory.

Significantly, the directory where replication files reside is not secured by default. Microsoft recommends that you apply discretionary access control to the universal naming convention (UNC) share (default <drive>$ with a path name \\<computername>\<drive>$\Mssql7\Repldata) that the distributor uses to store snapshot files. This UNC share is an administrative or hidden share. Create an explicit nonadministrative share to the path listed above. If the distributor server is a Windows 9x machine, the snapshot folder defaults to using the <drive> without a share and a path of <drive>:\Mssql7\Repldata. To create pull subscriptions on a Windows 9x server, you must share the folder, thereby making it accessible to replication agents running at the publisher and subscribers. The replication agents need the necessary directory permissions to read and write to the files during the replication process. If you share the snapshot folder incorrectly or assign it incorrect directory permissions, replication fails. In Windows 9x, SQL Server Agent and the replication agents run under the security account of the user logging on to Windows. On NT platforms, the default setting lets replication agents run under the SQLServerAgent service login. However, you can configure the agents to use separate login IDs. Using our clients as an example, if some field reps were using NT Workstation laptops, you'd have to configure the Merge Agent to log in with the field user account to make the dynamic filtering work properly.

Subscriptions to merge publications use a priority setting for data-modification conflicts between subscribers. If two or more subscribers try to change the same data simultaneously, the subscriber with the highest priority wins and can change the data. You can assign this priority to each subscription individually or control it at the publisher. As you might guess, conflict resolution can get confusing when everyone is trying to update the same data. As with any distributed system, the clearer the ownership of data, the better. Although merge replication is designed to deal with conflicts, a setup with multiple updates to large data sets by many users quickly becomes difficult to manage. So, to avoid conflicts, publish only the minimum necessary, and maintain clear ownership of data.

Comments

Plain text