Deploying Database Availability Groups in Exchange Server 2010

Microsoft adds new features to every Exchange Server release. Some of these features are destined to be quietly ignored and eventually retired—remember the Exchange Network News Transfer Protocol (NNTP) server? Others go out in a quick blaze of glory, such as active/active clustering. Still others introduce fundamental changes in the way we design and deploy Exchange. Exchange 2010’s new database availability group (DAG) feature falls into that last category. The idea of providing mailbox resiliency by distributing multiple copies of mailbox databases throughout the Exchange organization is solid, and its implementation in Exchange 2010 marks a major change for high-availability designs.

Tony Redmond’s “Exchange 2010: High Availability with DAGs” describes the technical fundamentals behind DAGs. If you’re not familiar with the basic underlying concepts, it’s worth a read before tackling this article, in which I’ll focus on how to deploy simple DAGs. But first, let's talk about prerequisites and other considerations.

DAG Prerequisites

The first, and biggest, prerequisite for DAG deployment is simple: You must be using Windows Server 2008 or 2008 R2 Enterprise Edition. If you have Standard Edition deployed, you won’t be able to make that server a DAG member unless you reinstall Windows; there’s no in-place upgrade from Standard to Enterprise. Unfortunately, that means that a Standard Edition server that’s already running Exchange can't become a DAG member until you rebuild it on Enterprise Edition. This predicament has affected many sites that experimented with early deployments of Exchange 2010 on Server 2008 Standard, intending to upgrade their mailbox servers to DAG membership later.

From a network standpoint, DAG prerequisites are fairly straightforward. Exchange 2010 uses slightly different terminology from Exchange 2007. The MAPI network on a DAG member is for communicating with other Exchange servers and Active Directory (AD), whereas the replication network is for database replication traffic. In a significant change from Exchange 2007, Exchange 2010 now supports the use of a single network interface for both MAPI and replication networks, although the preferred design is still to use separate NICs and networks for those two functions. If the MAPI network interface fails, the server will fail its databases over to another DAG member. However, if the replication interface fails, replication traffic will silently move over to the MAPI network, reverting to the replication network when it becomes available again.

You can specify multiple replication networks, which is useful for complex topologies. However, every member of a given DAG must have the same number of networks defined. All members of a DAG should be able to communicate with one another with a round-trip network latency of no more than 250ms, but Microsoft warns that overall network performance is important, too, not just the latency measurements.

There are a couple more restrictions to keep in mind. All the members of a given DAG must be members of the same AD domain, although different DAGs can be members of different domains. DAG names must be unique within the organization and no longer than 15 characters. (The DAG name limit is the last vestige of WINS remaining in Exchange. Perhaps the next version will get rid of it altogether.)
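As a quick illustration of the name-length rule, here is a minimal sketch of a pre-flight check you could run before creating a DAG. Test-DagNameLength is a hypothetical helper written for this article, not an Exchange cmdlet, and it checks only the 15-character limit; confirming that the name is unique in your organization would still require a separate lookup.

```powershell
# Hypothetical helper (not part of Exchange): returns $true when a proposed
# DAG name fits within the 15-character limit described above.
function Test-DagNameLength {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )
    return ($Name.Length -ge 1 -and $Name.Length -le 15)
}

Test-DagNameLength -Name 'Seattle01'             # 9 characters, within the limit
Test-DagNameLength -Name 'SeattleMailboxDAG01'   # 19 characters, too long
```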

Can I Get a Witness?

Before we dive into building your first DAG, we need to talk about the role of the witness server. If you’re familiar with clustering, you’ll recognize the underlying concept of a quorum resource. The quorum is essentially a way for all the nodes in the cluster to know which nodes are in the cluster and which are active or failed at a given moment. Although Microsoft avoids the word cluster when discussing DAGs, DAG members still need a way to tell which DAG nodes are active and which aren’t. The Active Manager keeps track of this status, but it needs a way to store that status information in DAGs that have an even number of members.

Enter the witness server, which is nothing more than a file share on any Windows server in the forest. Microsoft recommends that you put the witness on a hub transport server so that it remains under the control of an Exchange administrator. The witness for a DAG can't reside on any mailbox server that’s a member of the same DAG. (It's legal to put the witness for one DAG on a member of a different DAG, but from a resiliency standpoint, that isn’t a great idea.) Exchange must be able to administer the server hosting the witness, so the Exchange Trusted Subsystem security group must be a member of that server's local Administrators group. That’s already the case on Exchange 2010 servers, but if you’re locating the witness on another type of server, you'll need to add the group yourself.

When you create a DAG, you can optionally specify the server and directory to use for the witness. If you don’t specify these parameters, Exchange will try to locate the witness on a hub transport server that doesn’t also have the mailbox server role and will put the witness on the first one it finds.

When you create a DAG with an even number of nodes, Exchange automatically creates the witness share and related data on the server you specify. If you change a DAG with an even number of nodes by adding or removing a node so that it has an odd number of nodes, Exchange will helpfully remove the witness. You can see where this is going: If you grow your DAGs by adding individual nodes, prepare for a lot of excess flailing as Exchange adds and removes the witness each time the number of member servers changes.
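To make the even/odd rule concrete, here is a small sketch that models when the witness is in use as a DAG grows one node at a time. Test-DagUsesWitness is a hypothetical helper of my own, not an Exchange cmdlet, and the upper bound of 16 reflects the maximum number of members in an Exchange 2010 DAG.

```powershell
# Hypothetical helper (not part of Exchange): models the rule described above,
# where a DAG with an even number of members relies on the witness and a DAG
# with an odd number of members does not.
function Test-DagUsesWitness {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 16)]
        [int]$MemberCount
    )
    return ($MemberCount % 2 -eq 0)
}

# Growing a DAG one node at a time flips the witness on and off at every step.
foreach ($count in 1..4) {
    '{0} member(s): witness in use = {1}' -f $count, (Test-DagUsesWitness -MemberCount $count)
}
```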

Suppose you want to build a two-node DAG—the simplest possible configuration. The DAG itself will have a unique name and IP address, which might be static or DHCP-assigned. If you have multiple subnets in your MAPI network, you need one IP address in each subnet for the DAG. To keep things simple, we’ll assume that our MAPI network has a single subnet, 172.16.250.x.

The first step is to install Windows Enterprise Edition on the DAG member servers. Once that’s done, we can install Exchange 2010’s mailbox server role. At that point, the fun begins! There are several distinct steps to getting a DAG up and running: First you create the DAG itself, then you add servers to it, then you add mailbox database copies and allow seeding and replication to complete. Let’s tackle these steps in order.

Creating a New DAG

You can create DAGs by using the New-DatabaseAvailabilityGroup cmdlet or the New Database Availability Group wizard in the Exchange Management Console. Figure 1 shows the first page of the wizard with a DAG name, witness server name, and witness directory specified. Click the New button to create the DAG object in AD, and it will immediately appear in the Database Availability Groups tab of the Mailbox view of the Organization Configuration node in the Exchange Management Console, as Figure 2 shows. This view shows you the DAGs that exist in your organization and which servers are in them. If you want to see which mailbox databases are in which DAGs, you’d use the Database Management tab, which lists all of the known mailbox databases and shows which server (or servers) hosts each one.

Figure 1: The New Database Availability Group wizard

The newly created DAG is just an AD container object; it doesn’t yet have any servers or networks assigned to it. By default, your new DAG will get an IP address via DHCP. You can assign a static IP address after the fact with the Set-DatabaseAvailabilityGroup cmdlet’s -DatabaseAvailabilityGroupIpAddresses parameter, which sets IP addresses only for the MAPI network; don’t use it to assign IP addresses for the replication network. It’s usually easier, though, to assign the address when you create the DAG from the Exchange Management Shell, as follows:

New-DatabaseAvailabilityGroup -Name Seattle01 -WitnessServer einstein -DatabaseAvailabilityGroupIpAddresses 172.16.250.105
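If the DAG already exists and picked up a DHCP address, the after-the-fact approach might look like the following sketch. It assumes a DAG named Seattle01 and an unused address in the 172.16.250.x MAPI subnet used in this walkthrough; substitute your own values. The wildcard in the Format-List call is there because I'm not assuming the exact name of the address property in the output.

```powershell
# Assumption: a DAG named Seattle01 exists and 172.16.250.105 is a free
# address on the MAPI network. Run from the Exchange Management Shell.
Set-DatabaseAvailabilityGroup -Identity Seattle01 -DatabaseAvailabilityGroupIpAddresses 172.16.250.105

# Review the result, including the witness settings Exchange selected.
Get-DatabaseAvailabilityGroup -Identity Seattle01 |
    Format-List Name, WitnessServer, WitnessDirectory, *Ip*Addresses
```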

Managing DAG Membership

Of course, the new DAG can’t do anything until you add servers to it. To do so, you can use the Manage Database Availability Group Membership wizard in EMC (right-click the DAG object to start it) or the Add-DatabaseAvailabilityGroupMember cmdlet. In either case, all you need to do is pick the mailbox servers you want to add to the DAG, and Exchange does the rest of the work.
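From the Exchange Management Shell, adding members is one command per server. This is a minimal sketch; MBX1 and MBX2 are hypothetical mailbox server names, and Seattle01 is the DAG name used earlier in this article.

```powershell
# Assumption: MBX1 and MBX2 are mailbox servers running Windows Enterprise
# Edition with the Exchange 2010 mailbox server role installed.
Add-DatabaseAvailabilityGroupMember -Identity Seattle01 -MailboxServer MBX1
Add-DatabaseAvailabilityGroupMember -Identity Seattle01 -MailboxServer MBX2

# Confirm that both servers now appear as members of the DAG.
Get-DatabaseAvailabilityGroup -Identity Seattle01 | Format-List Name, Servers
```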

Figure 2: A view of multiple DAGs

This explanation is deceptively simple because Exchange is actually doing an awful lot of work behind the scenes:

The system installs the Windows Failover Clustering (WFC) component of Windows Server if it’s not already installed. This is roughly equivalent to setting up a Windows cluster, so it's nice to have this happen automatically.
The system creates a new Windows failover cluster object, using the name you specified for the DAG. Exchange uses only the cluster heartbeat, cluster database, and cluster network list.
The system registers a new A record with the name and IP address of the DAG in DNS.
The system adds the server as a member of the DAG object in AD.
The system updates the WFC database to include the new server and the databases that are mounted on it.
Active Manager receives information about the new database copies on the newly added node.

When you add more nodes to the DAG, the same basic steps take place. One additional step is required, however: The WFC quorum model might need to change to Node Majority (if the DAG now has an odd number of members) or to Node and File Share Majority, which uses the witness (if the DAG now has an even number of members). You can remove nodes from a DAG, but before you do so, you must remove the replicated mailbox database copies that the node contains. Then use the Remove-DatabaseAvailabilityGroupMember cmdlet or EMC to remove the node.
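Removing a node from the shell follows the same order: database copies first, then the membership. The sketch below assumes a hypothetical server MBX2 that holds a passive copy of a database named DB1 in the DAG Seattle01; repeat the Remove-MailboxDatabaseCopy step for every replicated copy the server hosts.

```powershell
# Assumption: MBX2 holds a passive copy of DB1 and is a member of Seattle01.
# List the database copies on the server so you know what has to go.
Get-MailboxDatabaseCopyStatus -Server MBX2

# Remove the server's copy of each replicated database (copy identities
# take the form Database\Server).
Remove-MailboxDatabaseCopy -Identity 'DB1\MBX2'

# With the copies gone, remove the server from the DAG.
Remove-DatabaseAvailabilityGroupMember -Identity Seattle01 -MailboxServer MBX2
```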

Managing Mailbox Databases

After you create a DAG, you still have some things to do before it begins protecting your mailbox data. Namely, you have to specify which mailbox databases should be replicated to it. To do so, you add mailbox database copies to the DAG with the Add-MailboxDatabaseCopy cmdlet or by right-clicking the database in EMC and using the Add Mailbox Database Copy command.
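In the shell, adding a copy might look like this sketch. DB1 and MBX2 are hypothetical names: DB1 is a database currently mounted on another DAG member, and MBX2 is the member that should host the new copy.

```powershell
# Assumption: DB1 is active on another member of the DAG, and MBX2 has a
# matching path available for the database and log files.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2 -ActivationPreference 2

# Watch the new copy as it seeds and then starts replaying logs.
Get-MailboxDatabaseCopyStatus -Identity 'DB1\MBX2'
```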

Adding a mailbox database copy to a DAG member server instructs that server to start maintaining a replicated copy of the database. This procedure takes place in two phases. First, the database copy must be seeded: Exchange copies data from an existing replica to the new replica. Second, once the system has streamed the full database to the new target, it copies each newly generated log file over a TCP socket connection, then replays it into the replica copy.

The seeding process as outlined in the Microsoft article "Managing Mailbox Database Copies" (technet.microsoft.com/en-us/library/dd335158.aspx) contains 28 distinct steps. Here's a brief summary:

Exchange performs several prerequisite checks to ensure that the source database and log files exist, and that the target location can receive the seeded data.
The Microsoft Exchange Replication service on the target requests that seeding start.
The Microsoft Exchange Replication service on the source suspends replication of the source database.
The target sends a request to start the actual flow of seeding data.
The source server opens a backup session using the familiar Extensible Storage Engine (ESE) streaming backup API.
The data collected from the source database is fed to the Microsoft Exchange Replication service, which feeds it over the replication network to the replication service on the target.
Once the complete database has been seeded, the newly created database copy moves to its final location.

The amount of time required for seeding will vary depending on the size of the database and the amount of bandwidth available on the replication network. It’s impossible to give even a rough estimate of the time required without knowing these two factors.
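Once you do know both factors, a back-of-the-envelope estimate is simple arithmetic: database size divided by usable bandwidth. The following sketch is a hypothetical helper of my own, not an Exchange cmdlet, and the Efficiency parameter is an assumed fudge factor for protocol overhead and competing traffic, so treat the result as a rough lower bound rather than a prediction.

```powershell
# Hypothetical helper (not part of Exchange): estimates seeding time from the
# database size and the nominal bandwidth of the replication network.
function Get-SeedingTimeEstimate {
    param(
        [Parameter(Mandatory = $true)][double]$DatabaseSizeGB,
        [Parameter(Mandatory = $true)][double]$BandwidthMbps,
        [double]$Efficiency = 0.7   # assumed share of the link that seeding actually gets
    )
    if ($DatabaseSizeGB -lt 0 -or $BandwidthMbps -le 0 -or $Efficiency -le 0) {
        throw 'Size must not be negative; bandwidth and efficiency must be greater than zero.'
    }
    $bitsToCopy    = $DatabaseSizeGB * 1GB * 8              # 1GB = 1,073,741,824 bytes
    $bitsPerSecond = $BandwidthMbps * 1000000 * $Efficiency
    return [TimeSpan]::FromSeconds($bitsToCopy / $bitsPerSecond)
}

# Example: a 100GB database over a dedicated gigabit replication network.
Get-SeedingTimeEstimate -DatabaseSizeGB 100 -BandwidthMbps 1000
```

With Efficiency set to 1.0, the same 100GB database over a gigabit link works out to a little over 14 minutes, which is the best case; real seeding will take longer.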

Sometimes a database copy will diverge from the original master copy. This can happen as the result of a network interruption or a hardware failure, for example. When a copy diverges, it needs to be reseeded. Exchange will automatically seed a database copy only when you first create it; if a copy later diverges, you must reseed it manually. To do so, you can use either the Update-MailboxDatabaseCopy cmdlet or the Update Database Copy wizard in the EMC.
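A manual reseed from the shell might look like the sketch below. It assumes a diverged copy of a hypothetical database DB1 on server MBX2. Note that -DeleteExistingFiles discards the existing database and log files at the target, so be sure you're pointing at the right copy.

```powershell
# Assumption: the copy of DB1 on MBX2 has diverged and needs a fresh seed.
# Replication for the copy must be suspended before it can be reseeded.
Suspend-MailboxDatabaseCopy -Identity 'DB1\MBX2' -SuspendComment 'Reseeding diverged copy'

# Reseed the copy, removing the old files at the target first. Replication
# resumes on its own afterward unless you add -ManualResume.
Update-MailboxDatabaseCopy -Identity 'DB1\MBX2' -DeleteExistingFiles

# Check that the copy returns to a healthy state.
Get-MailboxDatabaseCopyStatus -Identity 'DB1\MBX2'
```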

Operating with the DAG

After you have the DAG set up and the database copies added, during normal operation you don’t have to do anything other than monitor the health of your replicated databases. The Database Management tab in EMC, which Figure 3 shows, displays the status of each copy of the database: seeding, healthy, dismounted, and so on. The Active Manager will take care of activating the appropriate replica of the database if the active copy fails.
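You can get the same information from the shell, which is handy for scripted health checks. This is a minimal sketch assuming a hypothetical DAG member named MBX1; the second command simply filters for copies that are neither mounted nor healthy.

```powershell
# Assumption: MBX1 is a DAG member hosting one or more database copies.
Get-MailboxDatabaseCopyStatus -Server MBX1 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize

# Show only the copies that need attention.
Get-MailboxDatabaseCopyStatus -Server MBX1 | Where-Object {
    $status = [string]$_.Status
    $status -ne 'Mounted' -and $status -ne 'Healthy'
}
```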

Figure 3: The Database Management view

You can also manually activate an individual database copy with the Move-ActiveMailboxDatabase cmdlet (or from EMC). Manually activating a database copy is called a switchover; when Exchange does the same thing automatically, it’s called a failover. No matter what you call it, the Active Manager infrastructure takes care of notifying other Exchange components (e.g., the RPC client access layer) so that clients are able to connect to the newly activated mailbox copy.
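A switchover from the shell is a single command. In this sketch, DB1 is a hypothetical database whose active copy lives on another DAG member, and MBX2 hosts the healthy passive copy you want to activate.

```powershell
# Assumption: MBX2 holds a healthy passive copy of DB1.
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer MBX2

# Confirm which copy is now mounted.
Get-MailboxDatabaseCopyStatus -Identity DB1 | Format-Table Name, Status -AutoSize
```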

Does It Work?

Does this technology work? Absolutely. I recently deployed Exchange 2010 at a company that had been using a Linux mail server. During the first week after the deployment, someone accidentally unplugged and relocated one of the physical hosts that ran the virtualized mailbox server. No one noticed! There was no impact to users, and the failover wasn’t discovered until someone noticed that the physical server was no longer in its original location.

DAGs provide a very broad set of design options. RAID or just a bunch of disks (JBOD)? How many DAG member servers? How many database copies? Should you colocate the hub, mailbox, and Client Access Server (CAS) roles on two-node DAGs? If so, how do you handle failover and load balancing for the CAS role? One of the best things about the technology underlying the DAG feature set is that it can be adapted to a wide range of situations, including both site resiliency and mailbox high availability. As organizations get more experience designing, deploying, and operating Exchange 2010 with DAGs, I expect we’ll see the emergence of a few basic “building block” designs that can be adapted to individual circumstances—and that in itself would be a major advance for Exchange high-availability design and deployment.
