In January 2006, Dave Thompson, corporate vice president of the division responsible for Exchange Server at Microsoft, hit the road on a press tour to tout the improvements planned for Exchange Server 2007. One thing he mentioned was striking: Exchange 2007 was intended to reduce the cost and complexity of providing high availability for messaging services. This sounded like an excellent idea, but Thompson’s press tour was relatively short on details. Now that Exchange 2007 is available, it’s time to see what the new high-availability features are and how they might affect your deployment plans. Your organization could benefit from adding redundancy to your server roles or from the new replication features, local continuous replication (LCR) and cluster continuous replication (CCR).
Let’s start with clustering, arguably the most complicated (and least understood) of Exchange 2007’s high-availability features. First, there are some new buzzwords for describing clusters. A traditional shared-storage cluster is now referred to in Exchange documentation as a single copy cluster (SSC). That’s because it keeps only a single copy of each data item, which can be owned by only one node in the cluster at a time. However, the new cluster mode, CCR, lets you operate shared-nothing clusters, in which the nodes share no storage: each node maintains its own copy of the data.
Exchange 2007 is missing a feature that Microsoft heavily touted in prior releases of Exchange: the ability to run active/active clusters. All Exchange 2007 clusters must be active/passive, although you can still build clusters with up to eight nodes (known as N+1 clusters because you must have at least one passive node in the cluster). Like its predecessors, Exchange 2007’s clustering implementation uses (and has the same constraints as) the Microsoft Windows Cluster service. For example, all nodes in an Exchange 2007 cluster must be on the same IP subnet; this restriction is part of Windows Cluster service and has nothing to do with Exchange (although, thankfully, this limitation is going away in Longhorn Server). The subnet requirement might slow deployment of Exchange 2007 for companies that are primarily interested in the new high availability features.
Server Roles and High Availability
Exchange 2007 uses a new role-based architecture that separates all Exchange functionality into five roles: Mailbox, Hub Transport, Edge Transport, Unified Messaging (UM), and Client Access. Of these roles, only the Mailbox server role can be installed on a cluster, which is something of a departure from previous versions. The other roles all provide high availability and redundancy through other means:
- You protect the Edge Transport role, which processes mail from the outside world, by deploying multiple Edge Transport servers and distributing load among them: use Network Load Balancing (NLB), a hardware load balancer (which can be configured to stop directing traffic to a failed node), or SMTP mail exchanger (MX) records in a round-robin configuration (although with this approach, some traffic will initially be sent to failed nodes). Your Edge Transport servers essentially have no configuration data except for transport rules and the data they receive from the Hub Transport servers via the Microsoft Exchange EdgeSync service. If an Edge Transport server fails, you can quickly replace it with another server and resume normal operation.
- You protect the UM role by adding multiple UM servers within the same UM dial plan and configuring your PBX or gateway to use round-robin routing to distribute load among them. If one UM server fails, others can take over its call-processing operations. Exchange 2007 lets you share any custom audio prompts you’ve created between UM servers to ensure they’re not lost during a failure. (Remember that UM servers send the voicemail messages they record to a Hub Transport server for delivery.)
- The Client Access server, like the Exchange Server 2003 and Exchange 2000 Server front-end server role, contains no user-specific data. You can use NLB or a third-party hardware load balancer to distribute load across multiple Client Access servers.
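To see why round-robin MX records (mentioned above for the Edge Transport role) still send some initial traffic to failed nodes, consider how a sending SMTP server works through a set of equal-preference MX hosts. The sketch below models that fallback behavior; the host names are illustrative, not real servers.

```python
# Hypothetical equal-preference MX records for a domain protected by
# multiple Edge Transport servers (names are illustrative only).
mx_records = ["edge1.example.com", "edge2.example.com", "edge3.example.com"]

def deliver(message, failed_hosts):
    """Try each equal-preference MX in turn, as a sending SMTP server would.

    Round-robin DNS spreads the *first* attempt across hosts, so a failed
    host still receives some initial connection attempts; delivery succeeds
    only after the sender falls back to a healthy host.
    """
    attempts = []
    for host in mx_records:
        attempts.append(host)
        if host not in failed_hosts:
            return host, attempts  # delivered on this attempt
    return None, attempts          # all hosts down; message queued for retry

delivered_to, tried = deliver("hello", failed_hosts={"edge1.example.com"})
# The sender first tries the failed host, then succeeds on the next one.
```

A hardware load balancer with health checks avoids that wasted first attempt, which is the trade-off the bullet above describes.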
The Hub Transport role deserves special mention because of the way message routing works in Exchange 2007. Hub Transport servers are associated with Active Directory (AD) sites. If a user whose mailbox is in site 1 sends a message to a user whose mailbox is in site 2, the site 1 Mailbox server attempts to route the message to a Hub Transport server in the same site. The Hub Transport server that receives the message contacts a Hub Transport server in the destination site if there’s a direct site link or in an intermediate site if not. Unlike Exchange 2003, where you had to explicitly define bridgeheads for message transport, the Hub Transport architecture automatically finds and uses Hub Transport servers across multiple sites without the use of NLB or clustering. To add Hub Transport redundancy, just make sure you install more than one Hub Transport server in each AD site; Exchange automatically distributes load among them.
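The routing decision described above — deliver directly when a site link exists, otherwise relay through an intermediate site — can be sketched as a path search over the site-link topology. This is a simplified model (breadth-first search stands in for AD's site-link cost calculation, and the site names are hypothetical):

```python
from collections import deque

# Hypothetical AD site-link topology (site names are illustrative).
site_links = {
    "Site1": ["Site2", "Site3"],
    "Site2": ["Site1"],
    "Site3": ["Site1", "Site4"],
    "Site4": ["Site3"],
}

def next_hop(source, destination):
    """Pick the site whose Hub Transport server should receive the message next.

    If a direct site link exists, deliver straight to the destination site;
    otherwise relay via the first intermediate site on the shortest path.
    """
    if destination in site_links[source]:
        return destination
    seen, queue = {source}, deque([[source]])
    while queue:
        path = queue.popleft()
        for neighbor in site_links[path[-1]]:
            if neighbor == destination:
                return (path + [neighbor])[1]  # first hop after the source
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no route between the sites

# Site1 -> Site2 is a direct link; Site1 -> Site4 must relay through Site3.
```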
If you’re using clustering now, you might be wondering how your existing cluster will map to Exchange 2007 roles. The answer is simple: Your existing server becomes a clustered mailbox server, and the other roles (at a minimum, a Client Access server and a Hub Transport server) must be installed on a nonclustered server. Remember that you can combine roles, so that the Hub Transport and Client Access roles can be on the same server; nonetheless, this essentially means that you might need additional servers as part of your upgrade. Say you have a single, two-node Exchange 2003 cluster. You can replace it with a two-node Exchange 2007 Mailbox cluster, but you'll need an additional server for the Client Access and Hub Transport roles. However, if you have multiple Mailbox servers, you still need only one additional server for the Client Access and Hub Transport roles, unless you want to add redundancy for those roles, too.
Local Continuous Replication
One request from administrators using earlier versions of Exchange was clear: They wanted a simpler way to provide resiliency against failures on a single server. Clustering provides resiliency against failure of certain hardware components, but many organizations don’t want to spend the money or take on the responsibility of purchasing and maintaining clustered systems. To meet this demand, Microsoft introduced LCR. When you turn on LCR for a storage group (SG), Exchange begins by copying the SG’s database to a separate volume on the same server. After that copy is made (a process known as seeding), the existing set of transaction logs is copied, too. When the initial copying is finished, the database is said to be in replay, and each new generation of transaction logs is copied to the backup volume as soon as it’s closed. In other words, when Exchange opens a new log file, it immediately copies the previously open log file to the LCR target volume. In case of a failure, you can quickly switch to the backup copy, either by editing the log file and database paths on the SG or by copying the database from its backup location to the original path.
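The seed-then-ship cycle described above can be modeled in a few lines. This is a toy sketch of the behavior, not Exchange's actual implementation: seed the database once, copy each log generation to the backup volume as soon as it closes, and note that anything still in the open generation is lost on failure.

```python
class LcrStorageGroup:
    """Toy model of LCR for one storage group (a sketch, not Exchange code)."""

    def __init__(self):
        self.active_db = []    # committed transactions in the live database
        self.open_log = []     # the currently open log generation
        self.copied_logs = []  # closed generations shipped to the LCR volume
        self.passive_db = None

    def seed(self):
        # Initial copy of the database to the separate volume ("seeding").
        self.passive_db = list(self.active_db)

    def write(self, txn):
        self.open_log.append(txn)
        self.active_db.append(txn)

    def roll_log(self):
        # Exchange opens a new log file and immediately copies the one it
        # just closed to the LCR target volume.
        self.copied_logs.append(list(self.open_log))
        self.open_log = []

    def recover(self):
        # On failure, replay the shipped logs against the seeded copy.
        recovered = list(self.passive_db)
        for log in self.copied_logs:
            recovered.extend(log)
        return recovered

sg = LcrStorageGroup()
sg.seed()
sg.write("txn1"); sg.write("txn2"); sg.roll_log()
sg.write("txn3")  # still in the open log generation, never shipped
# recover() returns only txn1 and txn2; txn3 is lost with the failed volume.
```

The last line illustrates LCR's exposure: data in the not-yet-closed log generation isn't protected, which is why log files are kept small (more on that below).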
LCR is designed to do three things:
- It provides transaction log shipping and replay to give you a faster recovery time in case of a database failure.
- It reduces the need for frequent full backups; you should still perform regular full backups, but LCR can act as your primary means of recovery after a failure.
- It lets you back up, test, or replicate the copy of the database instead of the primary database. You can take backups of the LCR copy, for example, without impacting the performance of the primary database because the backup copy is on a separate volume.
LCR takes place on a single server; you can’t use it to copy data from one physical server to another. Therefore, you might wonder how LCR compares with disk mirroring, which is usually the preferred means of keeping a continually updated backup copy of data on a single server. Disk mirroring is definitely useful because it provides hardware-level protection. However, Exchange doesn’t know about the mirroring, and vice versa, so the way you monitor and manage mirrored volumes has nothing to do with how you manage the data that’s stored on the volumes. The fact that Exchange doesn’t know about the mirroring also means that hardware failures are transparent to Exchange, a major point in this method's favor. The drawback is that you can’t work independently with the mirrored copy of the volume; it must stay in sync with the original. LCR changes the situation by making the backup copy of the database a visible file system object that you can copy, move, or test when necessary. Also, disk mirroring blindly copies whatever junk is on the source of the mirror set to the backup. By contrast, LCR is transaction-based, so physical or low-level corruptions introduced by things such as buggy drivers might not be replicated to the LCR copy.
LCR has its limitations. First, you can protect only one database per SG. If you want to protect multiple databases, you’ll have to put them in their own SGs. This isn’t a big restriction given that Exchange 2007 Enterprise Edition supports up to 50 SGs; however, adding more SGs might dictate changes in where and how you store and manage your transaction log files. Second, if you use LCR to protect a public folder database, no other server in the organization can contain a replica of the folders contained in that database. That’s not a major restriction either because public folders already have a high degree of resiliency thanks to their multiple-master replication capability.
Cluster Continuous Replication
CCR is a major departure from the features of Exchange 2003 and Exchange 2000. With those versions of Exchange, the storage on a cluster must be shared between each cluster node so that any node can take possession of the shared-storage resources. Of course, only one node can own any particular resource at a time. This means that your cluster implementation needs shared-storage hardware, which tends to be expensive and is often finicky; in particular, stretching your cluster across a WAN can be difficult because of write latency introduced by synchronous replication, which is the only type of replication supported by Microsoft.
CCR removes that hardware requirement by letting you build clusters where data from the active node is automatically replicated to a passive node, with no physical hardware or storage components in common. This capability provides resiliency against the failure of a single server, just as traditional SCCs do. However, it also lets you build in resiliency against site or data center failures because the two nodes in a CCR cluster don’t have to be in the same physical location (as long as they’re on the same IP subnet, as mentioned earlier). As a bonus, the hardware for nodes in a CCR cluster doesn't have to be identical; in fact, you can get away with having cluster nodes that aren’t on the Hardware Compatibility List at all, although Microsoft doesn’t recommend doing so. Using CCR clusters doesn’t require you to use a SAN, but when you start to add large amounts of DAS to your CCR nodes, the management and maintenance overhead might become prohibitive.
Most importantly, a properly designed CCR implementation will have no single point of failure. This is a major improvement over Exchange 2003 clusters, which at a minimum have the shared quorum volume as a potential failure point. To deliver this benefit, CCR requires the use of a Windows clustering feature known as majority node set clusters. In a majority node set cluster, the quorum resource used in SCCs is joined by a file share witness, which is essentially just a Common Internet File System (CIFS) file share that can be used as the quorum resource but which doesn’t have to be physically owned by any of the cluster nodes. To use a file share witness as part of your cluster, you’ll need Windows Server 2003 R2 or Windows Server 2003 Service Pack 1 with a hotfix (to learn more about the update and download the hotfix, see http://support.microsoft.com/?kbid=921181).
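The arithmetic behind majority node set quorum is worth spelling out, because it shows exactly what the file share witness buys you. A minimal sketch:

```python
def has_quorum(votes_present, total_votes):
    """Majority node set rule: the cluster stays online only while more
    than half of all configured votes are reachable."""
    return votes_present > total_votes // 2

# A two-node cluster alone has two votes: losing either node leaves
# 1 of 2 votes, which is not a majority, so the whole cluster goes down.
#
# A two-node CCR cluster plus a file share witness has three votes:
# losing one node (or the witness share) leaves 2 of 3 votes, and the
# surviving node keeps running. Only a double failure takes it offline.
```

This is why the witness is required for a two-node CCR cluster: without a third vote, a majority node set cluster of two nodes can't survive the loss of either one.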
With CCR, the underlying Windows clustering subsystem still provides failure detection, failover, and failback; CCR protects the Exchange data before a failure occurs and makes a replica available during failover. Operationally, CCR works in much the same way as LCR: You build a cluster, create an SG, and tell Exchange to turn on CCR for the SG. At that point, the database is seeded and the log files are copied. However, the transport method used for log files is somewhat different. With LCR, the logs are copied from one disk to another, but with CCR, the passive node pulls the logs from the active node via a file share. CCR's transport method reduces load on the active node because the passive node is responsible for retrieving the log files. Another difference is that in CCR, each log file is replayed against the passive copy of the database as soon as it arrives. This cuts down failover time by reducing the odds that a lengthy log replay will be required when the CCR copy is mounted as part of the failover process. When a failure occurs, the passive node automatically becomes active, at which point the copy of the data protected by CCR becomes live. CCR will replay all of the transaction files that were successfully replicated, so the amount of data loss is usually limited to whatever was in the most recently opened generation.
CCR has the same restrictions as LCR: Only one database per SG can be replicated, and you can’t mix public folder replication and CCR replication. In addition, you’ll need to create and manage a file share witness for the CCR cluster; this isn't a complex operation, but you’ll need to plan for it. In addition, if you’re using Exchange 2003 or Exchange 2000 clusters, you should be aware that Microsoft doesn’t support in-place upgrades, so you’ll need to move mailboxes from your existing cluster to another server, perform the upgrade, then move them back to the new cluster.
CCR essentially means that Microsoft is providing a fully supported data-replication solution as part of Exchange 2007. It isn’t as flexible as many third-party replication products, but it might reduce the need for additional spending to achieve your high-availability goals.
Exchange 2007 has a few other high-availability features worthy of mention. One change that might sneak up on you is that transaction log files are now fixed in size at 1MB, instead of the original 5MB. Smaller log files make LCR and CCR seeding and replay much more efficient, but the result is that you’ll have five times as many log files for the same amount of server activity. Therefore, Microsoft has also increased the maximum number of log file generations that a single server can have; the limit is now so large that you’re not likely ever to hit it.
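The arithmetic behind that change is simple but worth seeing with concrete numbers (the daily volume below is hypothetical, chosen only for illustration):

```python
# Rough arithmetic for the smaller log files (illustrative numbers only).
daily_log_data_mb = 2000  # hypothetical: 2GB of transaction log data per day

logs_per_day_2003 = daily_log_data_mb // 5  # old 5MB generations -> 400 files
logs_per_day_2007 = daily_log_data_mb // 1  # new 1MB generations -> 2000 files

# Five times as many generations for the same activity, which is why the
# per-server generation limit was raised. The upside: a smaller open log
# means less unreplicated data at risk when LCR or CCR ships closed logs.
```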
Another new feature is the transport dumpster, a storage area for messages transiting a Hub Transport server. The dumpster temporarily keeps copies of in-transit messages; if a CCR failover occurs, the new active node can query the Hub Transport server and ask it to provide new copies of messages that might have been processed by the active node before the failover but hadn't yet been replicated. For example, say that I send a 10MB message to a user on another mailbox server; that’s at least 10 transaction log files' worth of data. If a failure occurs after the first five log files have been copied and replayed on the passive node, the node can ask the dumpster to provide a new copy of the message to ensure that it's properly delivered to the recipient.
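The redelivery handshake described above can be sketched as follows. This is a toy model of the idea, not the actual Exchange interface; the class and method names are hypothetical:

```python
class TransportDumpster:
    """Toy model of the transport dumpster: the Hub Transport server keeps
    recent in-transit messages so a CCR cluster can request redelivery of
    anything that may not have been replicated before failover.
    (Names and structure are illustrative, not actual Exchange APIs.)
    """

    def __init__(self):
        self.recent = []  # (message_id, payload) pairs recently delivered

    def record(self, message_id, payload):
        self.recent.append((message_id, payload))

    def redeliver_after(self, last_replicated_id):
        # After failover, the new active node asks for everything newer
        # than the last message it knows was fully replicated and replayed.
        return [m for m in self.recent if m[0] > last_replicated_id]

dumpster = TransportDumpster()
for i in range(1, 6):
    dumpster.record(i, f"message {i}")

# Failover occurred after message 3 was replicated; 4 and 5 are resent.
resent = dumpster.redeliver_after(3)
```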
Which Should You Choose?
With these new technologies, it can be difficult to decide what to use when.
- SCCs are similar to traditional clustering; they require you to use shared storage, but in return you get more flexibility for node design and no restriction on the number of databases or SGs you use.
- LCR is restricted to use on a single server; therefore, it's appropriate for situations where you need to protect the data on only one server. Although LCR reduces the need for multiple intraday backups, you should still perform regular full backups to ensure that you can recover the other necessary parts of your Exchange system (such as the system state data).
- CCR provides server and site resiliency, although it’s limited to two nodes per cluster. CCR replication is integrated with failover, and you can use Microsoft Volume Shadow Copy Service or third-party replication products to make additional backups of the passive node’s data.
It will be interesting to see how people deploy Exchange 2007's high-availability features in practice and how the reality of their operational use matches up to Microsoft’s intent in designing and building the features in the first place.