Getting Started with Transactional Replication

SQL Server replication encompasses a set of capabilities that have yet to be fully replaced by other features found natively in SQL Server. Some of the common use cases include the ability to distribute data to separate SQL Server instances and databases for reporting and for offloading activity away from a primary production database. Replication can bring data nearer to the users in distributed environments and enable scaling out activity by multicasting to multiple destinations. And more controversially, some have even found a place for replication from a disaster recovery or high availability perspective. (For more information about this particular subject, see Paul S. Randal's blog post "In defense of transactional replication as an HA technology.")

Although the learning curve for replication can be intimidating, the successful use of replication comes down to understanding the components involved. The best way to gain this understanding is to practice basic deployments, then investigate the associated user schema modifications, metadata objects, SQL Server Agent jobs, and associated agent settings. To help you, I'll first briefly go over replication terminology and describe how the various components work together toward a solution. I'll then step you through a standard deployment of transactional replication, which is a common type of replication.

Understanding the Basics of Replication

There are different types of replication, so I'll start off describing the feature set at a high level. Replication is a bundling of technologies that enable you to copy, distribute, and synchronize specific types of database objects and their associated data and dependencies from one database to one or more databases on the same SQL Server instance or different SQL Server instances (which is more common). The SQL Server instance hosting the database from which you'll be distributing data is called the Publisher. Within a user database, you can define one or more publications, which are a logical bundling of one or more articles. An article is a specific object within the publication database that you want to distribute to another database. Some of the permitted article object types include user-defined tables, stored procedures, and views.

Replication requires a separate database to house both metadata and in-flight data. This database is called the distribution database, and the SQL Server instance that hosts it is called a Distributor. The Distributor can be the same instance as the Publisher, a separate SQL Server instance, or the instance where the data is being distributed to. The decision of where to place the distribution database isn't always consciously considered, but when it is, it's typically due to performance-overhead considerations or availability concerns (e.g., if transactional replication is used in conjunction with database mirroring).

The entity that receives the data from the Publisher is called the Subscriber. The Subscriber can be the same SQL Server instance as the Publisher, the same SQL Server instance as the Distributor, or a separate SQL Server instance altogether. A specific Subscriber is defined by adding a subscription to a specific publication. The subscriber database can contain both replicated and nonreplicated objects and can host more than one subscription from different publications.

A subscription can be defined as push (data is pushed to the subscription database from the Distributor) or pull (data is pulled to the subscription database from the Subscriber). Push subscriptions are more common with snapshot and transactional replication deployments. Pull subscriptions are more common with merge replication because the subscriber might be disconnected more frequently and might need to control when data is refreshed on-demand. The push/pull decision is sometimes also based on the capacity and overhead of the various servers participating in the replication topology.

External executables, which are called replication agents, move the data from the Publisher to the Distributor and then to the Subscriber. The type of replication agent is dependent upon the type of replication being used.

Choosing a Replication Type

There are three major types of replication: snapshot, transactional, and merge. There are also variations based on these, such as peer-to-peer replication, but I won't be elaborating on them here.

Snapshot replication lets you distribute data from a specific point in time. That snapshot is distributed, but it's not updated further by the replication process unless a new snapshot is created and applied again to the Subscriber. Snapshot replication is most commonly used to establish the article schema and associated data on the subscribers for transactional and merge replication.

Merge replication allows Subscribers to directly modify the subscription data (articles) and then synchronize the modified rows with the Publisher. As mentioned previously, the Subscribers might be offline for periods of time, reconnect periodically, and synchronize on-demand. At this point, their changes are synchronized with the Publisher and vice versa. This introduces the possibility of data conflicts, where a Publisher and one or more Subscribers attempt to modify the same set of data.

This last type of replication—and the one I'll be discussing from this point forward—is transactional replication. When architected properly, transactional replication can provide low-latency streaming of data modifications made at the Publisher to one or more Subscribers. Replicated data is typically treated as read-only on the Subscriber. Modifications to Subscriber data will cause the overall data set to get out of sync with the Publisher data. Transactional replication offers the now deprecated "updatable subscriptions" feature, which allows Subscribers to replicate data from the Subscriber back to the Publisher. Peer-to-peer replication (a SQL Server Enterprise edition feature) also allows bi-directional replication of data modifications of transactional replication databases.

Implementing Transactional Replication

Because this is an introduction to transactional replication, I'll show you how to use various SQL Server Management Studio (SSMS) wizards to implement this functionality. (You can also use scripts. For more information, see the sidebar "Wizards or Scripts?")

Wizards or Scripts?
DBAs can use a combination of both SQL Server Management Studio (SSMS) wizards and scripts to implement and support transactional replication. If you're serious about supporting replication for mission-critical purposes, be sure to understand both methods. Use the wizards as a quick way to pre-generate the scripts, then test and modify them accordingly. Use the scripts to automate deployments, validate desired settings, and facilitate fast recovery. So, while you're stepping through the sample replication deployment, I encourage you to also use the script option in the various Wizard Actions screens in parallel so you can see what's happening behind the scenes.

If you want to follow along, I suggest that you use an environment similar to the following:

This demo uses two SQL Server 2008 R2 SP1 instances on separate servers in the same domain. (Note that the Enterprise edition isn't a requirement.) Virtual servers are fine, assuming they're properly networked. Some figures will reference my test servers, which are named SQLSKILLS-NODE1 (Publisher and Distributor) and SQLSKILLS-NODE2 (Subscriber).
Both SQL Server instances need the replication components installed. If you selected the SQL Server Replication option in the Feature Selection page when installing SQL Server, the replication components are installed. If you didn't select this option during installation, you can install the replication components separately.
The Agent service must be started on the Publisher SQL Server instance.
The test domain should ideally have a separate domain account for the Snapshot Agent, Log Reader Agent, and Distribution Agent processes. For example, I provisioned three accounts: SQLskills\SQLskillsSnapshotAGT, SQLskills\SQLskillsLogReaderAGT, and SQLskills\SQLskillsDistAGT.
The subscription will be on a database named AWReporting.

The SQL Server 2008 R2 version of the AdventureWorks sample database will be used for the publication database. You're going to replicate the table definition and associated data from the following tables in the AdventureWorks database:

[Person].[Address]
[Person].[Contact]
[Person].[CountryRegion]
[Person].[StateProvince]
[HumanResources].[Employee]
[HumanResources].[EmployeeAddress]

In addition, you'll replicate the [HumanResources].[vEmployee] view definition. Although a view doesn't contain data, you can still replicate the object definition, assuming you've included all dependent objects in the publication.

Configuring the SQL Server Instance for Publication

Here are the steps to configure a SQL Server instance as a new Publisher:

1. Using SSMS, click the Connect drop-down box in Object Explorer (left pane) and select the SQL Server instance that will be the Publisher and Distributor.

2. In Object Explorer, expand the SQL Server instance node. Right-click the Replication folder and select Configure Distribution.

3. In the Configure Distribution Wizard screen, click Next. This screen won't appear if you previously disabled it by clicking the Do not show this starting page again option.

4. In the Distributor screen, which Figure 1 shows, you designate whether this SQL Server instance will act as its own Distributor or if another server will act as the Distributor for local publications. For this demo, have the publisher act as its own Distributor and click Next.

Figure 1: Designating that the SQL Server instance will act as its own Distributor

5. If not already configured, the SQL Server Agent Start screen lets you select whether to automatically start the SQL Server Agent service when the computer is started. Select the Yes, configure the SQL Server Agent service to start automatically option and click Next.

6. The Snapshot Folder screen designates the root location for snapshot storage. For this demo, use the default local path. Choosing a local path will raise the warning that the folder doesn't support pull subscriptions. (Those require a network share.) Write down the default local path, because you'll need it later. Click Next.

7. The Distribution Database screen shows options for configuring the name of the distribution database and database file locations. In a production system, you'll want to ensure that you place the distribution database files on an I/O path that can meet throughput and latency requirements based on the replication topology volume. For this scenario, keep the defaults and click Next.

8. In the Publishers screen, you specify which servers can use the Distributor for publications. In this example, the desired SQL Server instance will already be selected. Clicking the ellipsis (...) to the right of the Distribution Database column shows additional settings, such as the Agent Connection Mode (specifies how to connect to the Publisher) and Default Snapshot Folder. Click Next.

9. In the Wizard Actions screen, you tell the wizard to automatically configure distribution, generate a script that you can use to configure distribution, or both. In this case, keep the default and click Next.

10. In the Complete the Wizard screen, select Finish. You'll see the status of the steps as they execute. Click Close when the wizard successfully completes all the steps.

Enabling a Database for Publication

The following steps detail how to enable a database for publication:

1. In Object Explorer, expand the SQL Server instance node you just enabled as a Publisher. Right-click the Replication folder and select Publisher Properties.

2. In the Publisher Properties dialog box, notice that there are two options in the Select a page pane on the left. The General page shows the Distributor settings for the specific publishers. For this demo, make sure the Distributor setting states This server acts as its own Distributor.

3. Select Publication Databases in the Select a page pane. You'll see a list of all databases available for replication. Select the check box that intersects the AdventureWorks row and Transactional column, as Figure 2 shows. Click OK.

Figure 2: Selecting the database to replicate

Configuring Security for the Agents

The following steps describe how to provision the minimum required permissions to the three replication agent service accounts used in this demo's topology:

1. On the Publisher SQL Server instance, execute the code in Listing 1 in a new query window. This code will create the Snapshot Agent login, create a user, and add db_owner membership in the distribution and publication databases.

USE [master];
GO
CREATE LOGIN [SQLskills\SQLskillsSnapshotAGT] FROM WINDOWS;

USE [distribution];
GO
CREATE USER [SQLskills\SQLskillsSnapshotAGT]
  FOR LOGIN [SQLskills\SQLskillsSnapshotAGT];

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsSnapshotAGT';

USE [AdventureWorks];
GO
CREATE USER [SQLskills\SQLskillsSnapshotAGT]
  FOR LOGIN [SQLskills\SQLskillsSnapshotAGT];

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsSnapshotAGT';

2. Execute the code in Listing 2. This code will create the Log Reader Agent login, create a user, and add db_owner membership in the distribution and publication databases.

USE [master];
GO
CREATE LOGIN [SQLskills\SQLskillsLogReaderAG] FROM WINDOWS;

USE [AdventureWorks];
GO
CREATE USER [SQLSKILLS\SQLskillsLogReaderAG]
  FOR LOGIN [SQLSKILLS\SQLskillsLogReaderAG];

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsLogReaderAG';

USE [distribution];
GO
CREATE USER [SQLSKILLS\SQLskillsLogReaderAG]
  FOR LOGIN [SQLSKILLS\SQLskillsLogReaderAG];

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsLogReaderAG';

3. Execute the code in Listing 3. This code will create the Distribution Agent login, create a user, and add db_owner membership in the distribution database. Note that SQL Server Books Online (BOL) states that the Distribution Agent account needs to be a member of the publication access list (PAL). The PAL secures access to the publication and prevents unauthorized logins from subscribing. There are idiosyncrasies in PAL behavior for some scenarios, so to learn more, read my blog post "When is the Publication Access List required?"

USE [master];
GO
CREATE LOGIN [SQLskills\SQLskillsDistAGT] FROM WINDOWS;

USE [distribution];
GO
CREATE USER [SQLskills\SQLskillsDistAGT]
  FOR LOGIN [SQLskills\SQLskillsDistAGT]

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsDistAGT';

4. Connect to the Subscriber instance and execute the code in Listing 4. This code will create the Distribution Agent login, create a user, and add db_owner permissions in the subscription database.

USE [master];
GO
CREATE LOGIN [SQLskills\SQLskillsDistAGT] FROM WINDOWS;

USE [AWReporting];
GO
CREATE USER [SQLSKILLS\SQLskillsDistAGT]
  FOR LOGIN [SQLSKILLS\SQLskillsDistAGT]

EXEC sp_addrolemember N'db_owner',
  N'SQLskills\SQLskillsDistAGT'

5. Assign the Snapshot Agent account write permissions to the snapshot folder.

6. Assign the Distribution Agent account read permissions to the snapshot folder.

Although you assigned db_owner in various places, this is the official minimum required permissions. Compared with sysadmin permissions on a single account, what you've done is indeed a more secure configuration.

Creating a Publication

Follow these steps to create a new publication:

1. In Object Explorer, expand the Publisher SQL Server instance and expand the Replication folder. Right-click the Local Publications folder and select New Publication.

2. In the New Publication Wizard screen, select Next (assuming you haven't previously disabled this screen).

3. In the Publication Database screen, select AdventureWorks. Notice that you can't select multiple databases, because a single publication is scoped to a single database. Click Next.

4. In the Publication Type screen, select Transactional publication and click Next.

5. In the Articles screen in Figure 3, expand Tables and select the tables listed previously in the "Implementing Transactional Replication" section. Afterward, scroll down, expand Views, and select vEmployee (HumanResources).

Figure 3: Selecting the tables to replicate

6. In the Objects to publish list, click the top node value of Tables, click the Article Properties button, and select Set Properties of All Table Articles. Review the various article settings in the Properties for All Table Articles dialog box, which Figure 4 shows. Click OK.

Figure 4: Reviewing the article settings

7. In the Articles screen, select Next.

8. The Article Issues screen warns you that because you chose a view for an article, you should ensure that all view-referenced objects are available on the Subscriber. Click Next.

9. In the Filter Table Rows screen, you can add predicates that filter which rows within a table should be replicated. Be aware that filters can add latency in high throughput topologies. Make no changes, and click Next.

10. The Snapshot Agent screen lets you specify when to run the Snapshot Agent and create the new snapshot. Select the Create a snapshot immediately and keep the snapshot available to initialize subscriptions option. When you choose this option, you're actually setting the immediate_sync property to True, which impacts data retention in the distribution database. If you have a high throughput topology, you need to consider the effect that this setting might have. Click Next.

11. The Agent Security screen specifies the Snapshot Agent and Log Reader Agent credentials. Deselect the Use the security setting from the Snapshot Agent check box and click the Security Settings button for the Snapshot Agent.

12. In the Snapshot Agent Security dialog box in Figure 5, select the Run under the following Windows account option, enter the account and password, and click OK.

Figure 5: Entering the Snapshot Agent credentials

13. In the Agent Security screen, click the Security Settings button for the Log Reader Agent. Enter the account and password. Click OK, then Next.

14. In the Wizard Actions screen, select the Create the publication option and click Next.

15. In the Complete the Wizard screen, enter the publication name (I chose Pub_AdventureWorks) and click Finish. You'll see the status of the steps as they execute. Click Close when the wizard successfully completes all the steps.

Creating the Push Subscription

Follow these steps to create a new push subscription:

1. From the Replication folder on the Publisher/Distributor, expand Local Publications, right-click [AdventureWorks]:Pub_AdventureWorks, and select New Subscriptions.

2. In the New Subscription Wizard screen, click Next (assuming you haven't previously disabled this screen).

3. In the Publication screen, ensure the new publication is selected. Click Next.

4. The Distribution Agent Location screen specifies the type of subscription (push or pull). Keep the default and click Next.

5. In the Subscribers screen, click Add Subscriber, then Add SQL Server Subscriber. In the Connect to Server dialog box, connect to the Subscriber SQL Server instance.

6. In the Subscribers screen, there will now be a second server listed. Select the subscription database for that server from the drop-down list. Click Next.

7. In the Distribution Agent Security screen (see Figure 6), click the ellipsis to bring up the Distribution Agent Security dialog box. In it, enter the Distribution Agent account and password. Click OK.

Figure 6: Configuring security for the Distribution Agent

8. In the Distribution Agent Security screen, click Next.

9. In the Synchronization Schedule screen, leave the value at the default (run continuously) and click Next.

10. In the Initialize Subscriptions screen, keep the default to initialize the subscription immediately. Click Next.

11. In the Wizard Actions screen, keep the default Create the subscription(s) option selected and click Next.

12. On the Complete the Wizard page, select Finish. After all the steps have been successfully completed, click Close.

Testing the Replication Deployment

Now you should test the replication deployment to see whether it's working as intended. Follow these steps:

1. In the publication database, execute the code:

INSERT [Person].[Address]
(AddressLine1, City, StateProvinceID, PostalCode)
VALUES
('2222 Test Drive', 'Minneapolis', 36, 55410);

2. Wait about 10 seconds, then execute the following SELECT statement on the Subscriber:

SELECT AddressID
FROM [Person].[Address]
WHERE AddressLine1 = '2222 Test Drive';

If everything was implemented correctly, the newly inserted row should be returned.

3. Test the replicated view on the Subscriber by running the code:

SELECT COUNT(*)
FROM HumanResources.vEmployee;

When I ran this code, I received 290 rows and no failure message, indicating that all dependent objects were there.

Learning Through Practice

This introduction to implementing transactional replication only scratches the surface of what you'll need to understand in order to deploy and support transactional replication in your own environment. Although the scope of what you'll need to learn might seem daunting, the best way to grow your replication skills is to practice them in service of your company's application and business objectives.

Comments

Plain text