Because email is a mission-critical application, Microsoft has invested a lot of engineering talent as well as money over the years to provide Microsoft Exchange Server with the ability to resist different types of failure and deliver a highly available service. Exchange Server 2007 was a watershed for high availability in many ways because of the introduction of log replication technology in local continuous replication (LCR), cluster continuous replication (CCR), and standby continuous replication (SCR). Now Exchange Server 2010 takes a new approach to high availability by introducing the Database Availability Group (DAG), which is based on many of these same log replication techniques.
However, working with DAGs introduces new concepts, design challenges, and operational concerns that administrators have to understand before bringing a DAG into production. This article covers the underlying concept and explains Microsoft's motivation for the introduction of DAGs in Exchange 2010. A future article from Paul Robichaux will discuss how to build your first DAG.
High Availability Goals for Exchange 2010
Microsoft's first goal with the Exchange 2010 availability story was to improve on the Exchange 2007 high-availability features. The Exchange 2007 implementation is a little immature and overly complex. Having three different types of log replication is confusing, and the lack of automatic database-level failovers and of a GUI to control end-to-end operations from creation to failover are the hallmarks of a V1.0 implementation.
These limitations aside, the underlying technology works: copying transaction logs from a source to a target server, validating their content, then replaying that content to update passive copies of databases. Microsoft's decision to focus on continuous log replication as the basis for high availability in Exchange 2010 is understandable, and the developers have delivered a more manageable and complete solution. Exchange 2010 doesn't support LCR, CCR, or SCR, but as we'll see, the DAG is more than an adequate replacement.
Microsoft's second development goal was to include sufficient functionality in Exchange 2010 to let customers build highly available infrastructures without having to invest in expensive third-party add-on products. Although there's no doubt that third-party technology boasts its own set of useful availability features, especially when coupled with high-end storage systems, Microsoft has a large and diverse Exchange customer base, not all of which can afford to invest in the financial and administrative cost of deploying add-on technology. Having a solid set of high-availability features built in to the product and administered through the standard management interfaces—Exchange Management Console (EMC) and Exchange Management Shell (EMS)—increases the attractiveness of Exchange as a platform, removes complexity, and avoids cost for customers in the small-to-midsized business (SMB) segment as well as for a large number of enterprise customers.
Finally, Microsoft wanted to let customers deploy highly available servers incrementally. In previous versions of Exchange, you had to do a considerable amount of preparation to deploy a highly available solution. For example, to deploy clustered Exchange servers, you had to ensure that suitable hardware was available, then install a Windows cluster, then install Exchange with the correct switches to create virtual Exchange servers running on the cluster and connected to cluster resources such as shared storage. This isn't a process you undertake without planning.
The concept of incremental deployment as implemented in Exchange 2010 is that you can deploy typical Exchange Mailbox servers first, then decide to include those servers in a DAG as the need arises to incorporate more high availability into the environment. You can also gradually expand the DAG to include more servers or more database copies to add resilience against different failure scenarios as time, money, and hardware allows.
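To give a flavor of what incremental deployment looks like in EMS, the following sketch creates a DAG and then adds two existing Mailbox servers to it (the DAG, server, and witness names are hypothetical, and Paul Robichaux's follow-up article will cover the full procedure):

New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2

The point is that EX1 and EX2 are ordinary Mailbox servers; nothing about their original installation changes when they join the DAG.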
Microsoft introduced storage groups as the basis for database management in Exchange 2000. Databases fitted inside storage groups, which belonged to servers. All the databases in a storage group shared a common set of transaction logs, and transactions from all the databases in the storage group were intermixed in the logs. Storage groups were sometimes convenient, but eventually Microsoft determined that they introduced an extra layer of complication for administrators, and the process to remove storage groups from the product began in Exchange 2007. It therefore comes as no surprise that storage groups disappear in Exchange 2010.
Defining a DAG
Fundamentally, a DAG is a collection of databases and database copies that are shared across as many as sixteen servers. The DAG differentiates between a primary database—the one that you originally create and users currently connect to—and the copies that you subsequently create on other servers. The DAG can swap the database copies into place to become the primary database following a failure of the primary database. The failure might be a complete server failure that renders all of the databases on the server inaccessible or a storage failure that affects just one database. In either case, the DAG is capable of detecting the failure and taking the necessary action to bring appropriate database copies online to restore service to users.
Servers within a DAG can support other roles, but each server must have the Mailbox role installed because it has to be able to host a mailbox database. Servers can also be on different subnets and span different Active Directory (AD) sites as long as sufficient bandwidth is available. Microsoft's recommendation is that all servers in a DAG share a network with a round-trip latency of 250 milliseconds or less. An Exchange 2010 server running the Enterprise edition can support as many as 50 active databases, but the Standard edition is limited to 5 databases. When you include passive database copies that a server hosts for other servers, this number increases to as many as 100 total databases on the Enterprise edition.
The introduction of the DAG smashes the link between a database and the owning server to make portable databases the basic building block for high availability in Exchange 2010. This development is probably the most fundamental architectural change Microsoft has made in Exchange 2010.
Windows Clustering
Under the hood, the DAG uses Windows failover cluster technology to manage server membership within the DAG, to monitor server heartbeats so that it knows which servers in the DAG are healthy, and to maintain a quorum. The big differences from clustering as implemented in earlier versions of Exchange are that there's no concept of an Exchange virtual server or a clustered mailbox server, nor are there any cluster resources allocated to Exchange apart from an IP address and network name. Another important management difference is that you never need to manage cluster nodes, the network, or storage resources using the Windows cluster management tools because everything is managed through Exchange.
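For instance, to check the DAG's membership and basic settings you stay in EMS rather than opening the Windows cluster administration tools; a minimal sketch, assuming a DAG named DAG1:

Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name, Servers, WitnessServer, DatabaseAvailabilityGroupIpAddresses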
The dependency on Windows clustering means that you can add Mailbox servers to a DAG only if they're running on the Enterprise edition of Windows Server 2008 (SP2 or R2), which is where the failover clustering feature is available; the edition of Exchange 2010 itself doesn't affect DAG membership. It also means that all of the DAG member servers must be part of the same domain. You should also run the same version of the OS on all the DAG member servers; you definitely can't mix Windows 2008 SP2 and Windows 2008 R2 within the same DAG, and it makes good sense to keep all the servers in the organization at the same software level.
Transaction Log Replication
Within the DAG, Exchange maintains the copies of the databases through a process of log replication. Transaction logs generated on the active server are copied by the Microsoft Exchange Replication service (MSExchangeRepl) running on each of the servers that maintain passive mailbox database copies, where the logs are validated and then replayed to update the passive copies. The DAG is the boundary of data replication for transaction logs. In other words, you can't replicate logs to a server in a different DAG and have Exchange replay the logs into a database replica there. It follows that before you can create a copy of a database, the server that hosts the database must belong to a DAG, and the target server must be a member of the same DAG.
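Creating and checking a copy is a simple EMS operation once both servers belong to the same DAG. A sketch, with hypothetical database and server names:

Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2
Get-MailboxDatabaseCopyStatus -Identity "DB1\EX2" | Format-List Status, CopyQueueLength, ReplayQueueLength

CopyQueueLength and ReplayQueueLength show how many transaction logs are waiting to be copied to and replayed on the target server, which is a quick way to see whether replication is keeping up.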
Figure 1 shows an example of a DAG containing three servers, each hosting two databases. Each of the databases is replicated to one other server to provide a basic level of robustness to a server outage. If server 1 fails, thus halting service to databases 1 and 2, the Active Manager process, which I'll discuss shortly, reroutes user connections to pick up the copies of the databases on servers 2 and 3. Users connected to database 1 are redirected to server 2 and users connected to database 2 go to server 3. Similarly, if the disk holding database 2 on server 1 fails, Active Manager detects the problem and reroutes traffic to server 3.
In Figure 1, each database has just one copy. You might decide that the probability that more than one server will ever fail at the same time is negligible, so it's sufficient to rely on the single additional copy. However, if the DAG extended across more than one data center, you would probably configure every database to replicate to all servers. In this scenario, copies of databases 1 and 2 would be present on server 3 so that if servers 1 and 2 were both unavailable, users could still get to their data by using the copies hosted on server 3.
The number of copies you can create for an individual database is limited only by the number of available servers in the DAG, disk space, and available bandwidth. The high network bandwidth available within a data center means that disk space is likely to be the biggest constraint. This issue is somewhat mitigated by the ability to deploy databases on low-cost drives, provided there is sufficient rack space, power, and cooling within the data center to support the disks.
As an example, consider an environment with 15 servers in a DAG. There are 110 active databases, each with 2 passive copies, for a total of 330 database copies in the environment. The databases and copies are distributed evenly across all servers so that each server supports 22 databases. Some of these databases are active and supporting users; others are passive copies replaying transactions from their active counterparts. Each server has 18TB of storage. Having three copies of each database is a reasonable approach to ensuring high resilience against a wide range of failures, but plan your design so that a failure that affects a rack can't remove all access to a database. In other words, you shouldn't place all the servers that host an active database and its passive copies in the same rack.
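A back-of-the-envelope check of those numbers (a sketch only; real designs also have to account for content indexes, logs, and free-space overhead):

$totalCopies = 110 * (1 + 2)         # 110 active databases plus 2 passive copies each = 330 copies
$perServer   = $totalCopies / 15     # 330 copies / 15 servers = 22 databases per server
$perDatabase = 18000 / $perServer    # 18TB per server / 22 databases = roughly 818GB per database copy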
Active Manager
Active Manager is a new component that runs as part of the replication service process on every server within a DAG. Active Manager is the orchestrator for Exchange 2010 high availability; it decides which database copies are active and which are passive—this happens automatically and doesn't require administrative input. However, administrators can dictate the preferred order of activation for database copies and dictate that some copies are never activated.
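For example, to prevent a particular copy from being activated automatically (perhaps a copy kept in a secondary data center), you can block activation for that copy while replication continues; the names here are hypothetical:

Suspend-MailboxDatabaseCopy -Identity "DB1\EX3" -ActivationOnly
Resume-MailboxDatabaseCopy -Identity "DB1\EX3"

The -ActivationOnly switch leaves log copying and replay running, so the copy stays current; it simply takes the copy out of consideration during automatic failovers until you resume it.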
Active Manager runs on all servers within a DAG. One server in the DAG holds the primary active manager (PAM) role, and all the others act as standby active managers (SAMs). Whether in PAM or SAM mode, each server continually monitors its databases at both the Information Store and Extensible Storage Engine (ESE) levels to detect failures. When a failure is detected, the detecting server asks the PAM to perform a failover. If the server that hosts the PAM is itself offline, another server seizes the role to become the PAM and then brings the appropriate database copies online.
The PAM owns the cluster quorum resource for the default cluster group that underpins the DAG. The PAM is responsible for processing topology changes that occur within the DAG and making decisions about how to react to server failures, such as deciding to perform an automatic transition of a passive copy of a database to become active because the server that currently hosts the active copy is unavailable for one reason or another. When a new database copy has been successfully mounted, the PAM updates the RPC Client Access service with details of the server that hosts the newly activated copy so that client connections can be directed to the correct server.
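You can see which server currently holds the PAM role without going near the cluster tools; a minimal sketch, again assuming a DAG named DAG1:

Get-DatabaseAvailabilityGroup -Identity DAG1 -Status | Format-List Name, PrimaryActiveManager, OperationalServers

The -Status switch makes Exchange query the live state of the DAG, so the output reflects the current PAM owner and which member servers are operational.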
Automatic Database Transitions
The replication service monitors database health to ensure that active databases are properly mounted and available and that ESE has signaled no I/O or corruption errors on a server. If an error is detected, the replication service notifies Active Manager, which begins the process of selecting the best possible available copy, then makes that copy active to take the place of the failed database.
To make its choice, Active Manager creates a sorted list of available copies. It ignores servers that are unreachable and copies whose activation is temporarily blocked. The list is sorted by how up-to-date each copy is, to minimize data loss. Active Manager then applies a set of criteria to make the final determination, working through the criteria until a copy is selected; up to twelve different checks are performed to locate the best possible database copy. If more than one copy meets the same criteria, the Activation Preference value is used to break the tie and make the final selection.
The Activation Preference is a numeric property of a database copy that administrators use to control the order in which Exchange activates copies. For example, if a database fails and there are two copies, one with activation preference of 2 and the other with activation preference of 3, Exchange activates the copy with the lower activation preference, 2. This decision assumes that both copies are healthy (they've been replicating and replaying transaction logs to keep the database up-to-date); Exchange never activates an unhealthy database if a healthy copy is available.
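Activation preference is a per-copy setting that you adjust with EMS; a short sketch with hypothetical names:

Set-MailboxDatabaseCopy -Identity "DB1\EX2" -ActivationPreference 2
Get-MailboxDatabase -Identity DB1 | Format-List Name, ActivationPreference

The second command lists every copy of DB1 with its preference number, which is an easy way to confirm the activation order you've designed.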
An automatic failover can't occur if no database copy is considered satisfactory. If that happens, the administrator has to take action either to fix the problem with the original database or to bring one of the database copies to a state where it meets the required criteria.
After Active Manager determines the best copy to activate, it instructs the replication service on that server to attempt to copy any missing transaction logs from available sources. Assuming that all transaction logs can be retrieved, the Store on the selected server can mount the database with no data loss and then accept client connections. If some logs are missing, the Store applies the AutoDatabaseMountDial setting to decide whether to mount the database. AutoDatabaseMountDial is a property of a Mailbox server that you can manipulate with the Set-MailboxServer cmdlet. The default value is BestAvailability, meaning that a database can mount if up to 12 transaction logs are missing.
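Adjusting the dial is a one-line change per server; a sketch with a hypothetical server name:

Set-MailboxServer -Identity EX2 -AutoDatabaseMountDial GoodAvailability
Get-MailboxServer -Identity EX2 | Format-List Name, AutoDatabaseMountDial

GoodAvailability allows a mount with up to 6 missing logs, and Lossless refuses to mount unless every log is available, trading availability for zero data loss.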
An administrator can mount a database that can't be mounted automatically by Active Manager. For example, Exchange won't activate a database copy if its content index isn't up-to-date. You can force Exchange to activate the copy with the Move-ActiveMailboxDatabase cmdlet. In this instance, you'd specify the -SkipClientExperienceChecks parameter to tell Exchange that it's OK to ignore the content index. The developers' choice of "SkipClientExperienceChecks" for the parameter name reflects their view that having a content index available is important to deliver the full client experience. However, when a database is down, most administrators want to restore basic mailbox connectivity immediately and worry about slow or incomplete searches due to an out-of-date content index afterward.
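A manual switchover of this kind is a single command; a sketch with hypothetical database and server names:

Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer EX3 -SkipClientExperienceChecks

Searches against DB1 may be slow or incomplete until the content index on EX3 catches up, but mailbox access is restored immediately.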
As soon as the RPC Client Access layer is aware of the transition, it begins to redirect clients to the newly activated database. Client response to a transition is dependent on the client platform and version. Microsoft Office Outlook clients working in Cached Exchange Mode issue a notification that they have lost connectivity and then reconnect when the database is back online. Outlook 2010 is slightly different; it suppresses messages about lost connectivity for what are regarded as trivial reasons such as a network glitch, so you see a notification only when connectivity is reestablished.
Following a successful database mount, the Store requests the transport dumpster to recover any messages that were in transit. Active Manager also notifies the RPC Client Access service that a different copy of the database is now active so that it can begin to reroute client connections to that database.
When the fault is repaired on the original server and it comes back online, its copy of the database is passive and is obviously outdated compared with the other copies. The Store runs through a divergence detection process, then performs an incremental reseed to bring the database up-to-date. The first step is to determine the divergence point, which is done by comparing the transaction logs on the server with the logs on a server that hosts a current copy. The Store works out which database pages have changed after the divergence point, requests copies of the changed pages from an up-to-date copy, and applies them until the repaired copy is synchronized with the other copies. (Microsoft's overall goal for a failover is to restore service to users within 30 seconds.) The repaired database remains a passive copy until the administrator decides to make it the primary copy again.
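When the repaired server returns, you can watch its copy catch up and then switch the database back at a convenient time; a sketch with hypothetical names:

Get-MailboxDatabaseCopyStatus -Identity "DB1\EX1" | Format-List Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer EX1

Waiting until the copy and replay queues drain (and the content index is healthy) before moving the active copy back keeps the switchover quick and avoids triggering the mount dial logic described earlier.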
Big Promise from DAGs
There's no doubt that the introduction of the DAG in Exchange 2010 is big news. It's a fundamental change in the architecture of the Information Store and it lets administrators who might not have considered implementing highly available Exchange organizations revisit the topic because high availability is now baked into Exchange. The question is how effective the promise proves to be in production. We'll know the answer only after we see various DAG designs at work, the operational issues they provoke, and how they survive the inevitable failures that occur during deployments.