Database Mirroring for Disaster Recovery

Database mirroring is primarily an availability feature, but it has been gaining popularity as a disaster-recovery option. Asynchronous mirroring, in particular, is being leveraged for disaster recovery in scenarios where log shipping might have been used in the past. Asynchronous mirroring offers several enhancements over log shipping, making it an attractive option.

Asynchronous Mirroring vs. Log Shipping

Asynchronous mirroring and log shipping share several common traits:

Both can be used for increased availability and for disaster recovery.
Both protect at the database level. Anything external to the database must be synchronized separately.
Both provide limited point-in-time, read-only capabilities.
Neither one supports the simple recovery model.

Although they’re similar, each technology has some unique advantages over the other. The most obvious advantage with log shipping is its ability to log ship to multiple secondary databases. In addition, log shipping supports several features that asynchronous mirroring doesn’t, such as the bulk-logged recovery model, delayed replaying of log records, and FILESTREAM support. The features supported by asynchronous mirroring but not log shipping include simple reversal of roles, database snapshots, near real-time transfer of transactions, automatic resynchronization of partners after a role change, support for failover of a replication publication, and transparent client redirection.

One of the most compelling reasons to use asynchronous mirroring instead of log shipping is simplified management. This is particularly true for those times when you need to temporarily fail over the disaster-recovery server for maintenance or some other reason.

Understanding Asynchronous Mirroring

When I talk with administrators about implementing asynchronous mirroring as a disaster-recovery solution, one of their primary questions is about the potential for data loss. Not many people fully understand the nature of the data loss potential, and several misconceptions have arisen as a result. If you want to understand how data loss can occur, you first need to understand how the mirroring modes operate.

It’s easier to understand how the mirroring modes work if you think of mirroring as a system of queues. There’s a send queue on the principal server (aka principal) and a redo queue on the mirror server (aka mirror). The send queue’s job is to send log records to the redo queue as they are generated. The redo queue accepts the log records, hardens them to the mirror’s log, and sends back acknowledgements. The mirroring queues aren’t real service broker queues. The best way to describe them is as special views into the transaction log. The send queue is looking directly at the principal’s log file, and the redo queue is looking directly at the mirror’s log file.

With synchronous mirroring, when a transaction is committed, the log record is sent to the redo queue. The redo queue hardens it to the log and sends an acknowledgement back to the principal. When the principal receives the acknowledgement, it completes the commit operation on the principal. If the hardening fails on the mirror, the mirroring session will be suspended until the cause of the failure can be found and corrected. If the principal doesn’t receive confirmation from the mirror, the connection times out and the principal transitions to a disconnected state. When the principal is disconnected, transactions continue to be queued in the send queue and mirroring continues to operate in asynchronous mode until the session reconnects and the mirror catches up with the principal. Figure 1 illustrates how synchronous mirroring works.

Figure 1: How synchronous mirroring works

When mirroring is operating asynchronously, the principal doesn’t wait for a transaction to be hardened on the mirror. As soon as it sends its log record, it completes the commit operation on the principal. If the record hardening fails on the mirror, the mirror disconnects from the mirroring session and the session is suspended. Activity on the principal continues as normal, and transactions continue to be queued in the send queue. When the condition that caused the failure is corrected, the mirroring session can be resumed. Figure 2 illustrates how asynchronous mirroring works.

Figure 2: How asynchronous mirroring works

Figure 2 also shows when data loss might occur. If the principal fails or crashes when running asynchronously, there’s a limited window for potential data loss. Data that has been committed on the principal but has not yet been hardened to the log on the mirror could be lost if you force service on the mirror. If you wait for the principal to return to service, committed transactions won’t be lost.

A big misconception that many people have about the potential data loss is exactly when that loss might occur. If the principal is down and you force service on the mirror, data loss has not yet occurred. If data is going to be lost, it will be lost when the principal comes back online, reconnects, and takes over the mirroring role. Resuming the mirroring session in this scenario is a manual process. Data will not be lost until you resume the session and the principal transitions to the mirror role. When the principal transitions to the mirror role, it performs a synchronization check and any extra transactions it has are rolled back.

Another misconception is that the only option you have for failing over to the mirror is to force service, allowing for data loss. Technically, this is true for asynchronous mirroring. However, if you perform a manual failover, you can avoid this data loss.

Performing a Manual Failover

The process for manually failing over an asynchronously mirrored database is straightforward: You switch to synchronous mode before failing over, perform a traditional failover, and switch back to asynchronous mode after the failover. Here are the steps:

Change the mirroring mode (also known as the safety level) to synchronous using the command
```
ALTER DATABASE  SET SAFETY FULL;
```
where is the name of the mirrored database.
Wait for the databases to become synchronized.
Perform the manual failover with the command
```
ALTER DATABASE  SET PARTNER FAILOVER;
```
After the failover is complete, change the mirroring mode back to asynchronous using the command
```
ALTER DATABASE  SET SAFETY OFF;
```

When you change the operating mode from asynchronous to synchronous, the mirroring session will perform a quick synchronization check. If you attempt to fail over before this check occurs, the attempt will fail and you’ll receive an error message stating that the database wasn’t ready. The check generally takes a fraction of a second, so this failure is difficult to catch unless you run the command to switch the operating mode and the failover command in the same batch. If you encounter this error, simply reissue the failover command. The command should successfully execute on the second attempt.

You can see the synchronization check occur by monitoring the Pages Sent/sec performance counter of the SQLServer:Database Mirroring object. In the sys.dm_os_performance_counters Dynamic Management View (DMV), this counter shows the total number of synchronization checks since the SQL Server service was restarted. If you view this counter in Performance Monitor, the counter shows the total number of synchronization checks per second rather than a cumulative count, so be sure to use the DMV.

The query in Listing 1(below) uses the sys.dm_os_performance_counters DMV to check the Pages Sent/sec counter. You can find a full explanation of Pages Sent/sec and the other database mirroring performance counters in my blog post “SQLU HA/DR Week – Database Mirroring Performance Counters”.

Automating Failovers

Many environments aren’t simple single-database environments. They might contain several mirrored databases that use both synchronous and asynchronous mirroring. Mirroring failovers should be as quick as possible, particularly in complex environments in which a single application might require the availability of multiple databases. This can be achieved through automation.

An automation script for failing over asynchronous mirrors needs to perform the following steps:

Identify the databases to fail over.
Set SAFETY to FULL for all asynchronous mirrors.
Allow the mirrors that were running asynchronously to synchronize.
Fail over all mirrored databases.
Allow the failovers to complete and the mirror to synchronize.
Set SAFETY to OFF for all databases that were originally running asynchronously.

Above via the Download the Code link, you’ll find a script (Failover.sql) that you can use to fail over all mirrored databases on a server. This script requires that a linked server already exist on the current principal server. Failover.sql includes several time loops to allow certain processes time to complete. To ensure that you don’t wait too long for any one process to complete, you might need to reconfigure the maximum timeout for the time loops, as explained in the comments of the script.

In Pro SQL Server 2008 Mirroring (Apress, 2009), I provide a stored procedure that can be used for failing over all databases or a database by name. A similar stored procedure can also be found in my blog post “Database Mirroring: Maximizing Availability Through Automation”.

Synchronizing External Objects

No matter what technology you use for disaster recovery, you need to synchronize objects external to the database before a failover. Several types of objects might need to exist on the failover server in order for applications to operate properly. They include:

Logins
Credentials
Operators
Alerts
Linked servers
Server triggers
Server audits
SQL Server jobs
Database Mail profiles
Server configuration settings (e.g., max text replication size if a replication publisher)
File and backup paths
Windows shares
Custom drivers or add-ons
Windows services

One problem that seems to stymie a lot of people is that SQL Server authentication logins don’t work when they fail over to the mirror database. This problem is caused by mismatched SIDs. Windows logins get their SIDs from Active Directory (AD). SQL Server authenticated logins get their SIDs from the server on which they’re created. If you create identical SQL Server logins on two different SQL Server instances, you’ll get two different SIDs. Database users map to server logins by the SID, not the name.

A popular solution for this problem is to run a procedure to remap the database user to the login, thus updating the database user with a new SID. However, this is just a temporary solution. When you fail back to the original principal, you have to do it again. A better solution is to create the login on the mirror using the same SID as the login on the principal. To do so, you need to query the principal for the logins, their SIDs, and their password hashes. After you have this information, you can re-create the logins using the existing SIDs and passwords. The command to re-create the logins should follow the syntax:

CREATE LOGIN [SQLUser]

  WITH PASSWORD =

   HASHED,

  SID = ;

I created a stored procedure named dba_CopyLogins that handles Windows logins, SQL Server authenticated logins, server roles, and explicitly granted permissions. You can schedule this stored procedure to execute daily during the nightly maintenance window. The procedure relies on a linked server existing on the principal server. You simply pass in the name of the linked server to the principal as the @PartnerServer parameter. The 139794.zip file includes dba_CopyLogins.sql. You can also download it from my blog post “Transferring Logins to a Database Mirror”.

Asynchronous Mirroring as a Disaster-Recovery Solution

As technologies advance, the lines between availability and recovery are getting blurred. Log shipping and database mirroring are increasingly being used to fill both roles. Failover clustering and synchronous mirroring are the mainstays of high availability, whereas log shipping and asynchronous mirroring have been relegated to secondary availability options or primary recoverability options. If you’re currently using log shipping for disaster recovery, I encourage you to consider using asynchronous mirroring instead. I think you will appreciate its enhanced manageability and features.

Listing 1: Query to Check the Pages Sent/sec Performance Counter

DECLARE @ObjName sysname;

-- Set to "MSSQL$" if a named instance.

SET @ObjName =  N'MSSQL$' + CAST(SERVERPROPERTY('InstanceName')

  AS sysname);

-- If not a named instance (NULL), set to "SQLServer".

SET @ObjName = ISNULL(@ObjName, N'SQLServer');

-- Complete setting the object name of the counter.

SET @ObjName = @ObjName + N':Database Mirroring'


SELECT counter_name, instance_name, cntr_value

  FROM sys.dm_os_performance_counters

  WHERE object_name = @ObjName

  AND counter_name = 'Pages Sent/sec'

Comments

Plain text