MNS and CCR, Part 2

Last week I wrote about Exchange Server 2007's cluster continuous replication (CCR) feature and explained how it's based on Microsoft's majority node set (MNS) clustering technology ("MNS and CCR," June 7, 2007). In an MNS cluster, each node keeps a copy of the quorum resource. CCR lets you use either a three-node MNS cluster or a two-node cluster with a third computer (not part of the cluster) acting as a file share witness (FSW) to hold a redundant copy of the quorum resource. This architecture has some interesting implications when it comes to designing a CCR implementation.

Microsoft's current recommendation is that you put the FSW resource on a Hub Transport server. Choosing which Hub Transport server to use is critical. Let's say you have two separate data centers: Anchorage, Alaska, and Birmingham, Alabama. Each data center has its own Mailbox server and Hub Transport server. You want to set up CCR so that you have an active node in Anchorage and a backup passive node in Birmingham, so you set up an appropriate WAN connection (bearing in mind that we're stuck with keeping CCR nodes on the same IP subnet until the release of Windows Server 2008--formerly code-named Longhorn).

But where should the FSW go? At first, you might think it should go in the Birmingham backup data center so that a site failure at Anchorage won't prevent failover. This seems like a reasonable approach until you consider the likelihood of different failure modes: It's much more likely that you'll have a failure of the WAN or virtual LAN connection between Anchorage and Birmingham than that you'll lose the Anchorage data center altogether. If the WAN connection fails, each side of the cluster will assume that the other side has failed. If the FSW is in Birmingham, which is unreachable from Anchorage, the Birmingham cluster node will become active. When the WAN comes back up, you'll have to deal with the problem of two separate cluster nodes each thinking it's the active node.

Putting the FSW in Anchorage insulates you against WAN failures; if the WAN connection dies, the Anchorage active node continues working normally. However, automatic failover of your Exchange resources to Birmingham isn't possible if the Anchorage site itself fails, because the Birmingham node can't access the FSW. If manual failover is acceptable, you can reconstitute the FSW in Birmingham and fail over to the passive node. (For information on enabling an FSW, see the Microsoft article "How to Configure the File Share Witness.")

There is a solution for automatic failover: Keep the FSW in a third location. Imagine adding a data center in Chicago. Put the FSW in Chicago, and if either Anchorage or Birmingham fails, the remaining node can take ownership of the FSW and continue its usual operations. Of course, this solution depends on the WAN links to Chicago remaining operational as well, so it's not necessarily an improvement over putting the FSW in Birmingham if the stability of your WAN is a problem. With cheap, redundant DSL or cable connections, plus a simple VPN, you can easily build a replacement WAN connection to bring up when necessary, but that's getting a little far afield from the original topic.
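The trade-offs among the three FSW placements come down to counting votes. The following Python sketch is a simplified model, not anything from Exchange or Windows clustering: it assumes three voters (two cluster nodes plus the FSW) and a strict-majority rule, and it only reports which nodes could still see a majority of voters. The function name, the voter names (node_anc, node_bhm, fsw), and the way sites and WAN partitions are represented are all hypothetical, chosen just for illustration.

```python
from typing import Dict, FrozenSet, List


def nodes_with_quorum(
    voter_sites: Dict[str, str],
    cluster_nodes: List[str],
    reachable_groups: List[FrozenSet[str]],
    failed_sites: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the cluster nodes that can still see a majority of voters.

    voter_sites maps each voter (cluster node or FSW) to its site.
    reachable_groups lists sets of sites that can reach one another;
    a WAN failure is modeled by splitting sites into separate groups.
    failed_sites are sites that are down entirely.
    """
    majority = len(voter_sites) // 2 + 1  # 2 of 3 voters in this model
    survivors = []
    for node in cluster_nodes:
        site = voter_sites[node]
        if site in failed_sites:
            continue  # a node in a failed site cannot be active
        # Find the group of sites this node can reach (itself if isolated).
        group = next((g for g in reachable_groups if site in g),
                     frozenset({site}))
        # Count voters that are both reachable and still running.
        votes = sum(1 for s in voter_sites.values()
                    if s in group and s not in failed_sites)
        if votes >= majority:
            survivors.append(node)
    return survivors


def placement(fsw_site: str) -> Dict[str, str]:
    """Hypothetical helper: two-node CCR layout with the FSW at fsw_site."""
    return {"node_anc": "Anchorage", "node_bhm": "Birmingham", "fsw": fsw_site}


NODES = ["node_anc", "node_bhm"]
WAN_DOWN = [frozenset({"Anchorage"}), frozenset({"Birmingham"})]
ALL_UP = [frozenset({"Anchorage", "Birmingham", "Chicago"})]

# FSW in Birmingham, WAN link fails: only Birmingham sees two voters.
print(nodes_with_quorum(placement("Birmingham"), NODES, WAN_DOWN))
# FSW in Anchorage, WAN link fails: Anchorage keeps working.
print(nodes_with_quorum(placement("Anchorage"), NODES, WAN_DOWN))
# FSW in Anchorage, Anchorage site lost: nobody has a majority.
print(nodes_with_quorum(placement("Anchorage"), NODES, ALL_UP,
                        frozenset({"Anchorage"})))
# FSW in Chicago, Anchorage site lost: Birmingham plus FSW is a majority.
print(nodes_with_quorum(placement("Chicago"), NODES, ALL_UP,
                        frozenset({"Anchorage"})))
```

In this model, the empty result for the third scenario corresponds to the case where only a manual failover is possible, and the last scenario shows why a third site restores automatic failover as long as the surviving node can still reach it.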

Does every organization need three data centers to get the most from CCR? Of course not. Many Exchange administrators I've spoken to since Exchange 2007 shipped are interested in using CCR as a replacement for single copy cluster (SCC) deployments with SANs, and they see the multisite ability of CCR as a nice bonus. I think, though, that the standby continuous replication (SCR) feature shipping in Exchange 2007 SP1 will be a game-changer for many of these admins; I'll write more about SCR next week.
