More Battles with the Cluster Monster

Last week, I pointed out some of the pitfalls inherent in clustering (see the first URL below). I mentioned that clustering Exchange Server can't mitigate the biggest single point of failure: administrator error. A poorly designed or administered cluster will reduce uptime instead of increasing it. With that fact in mind, let's talk about design principles that can help you build the best cluster for your buck.

First, remember that clustering isn't a silver bullet. Let's say you put Exchange on a two-node cluster, then lose power to your server room. Only then do you find out that your UPS is bad. The cluster can't help you now. (Even if your UPS is good, you'll be in the same boat if you don't have a way to cleanly shut down the cluster before the UPS power runs out.) Clustering can't entirely protect you from infrastructure failures such as loss of power or cooling, fire, or flooding. Also, you can't cluster every Exchange 2000 Server service. (See the second URL below for a Microsoft article that lists which services you can and can't cluster.)

Second, think carefully about why you want a cluster. If you're trying to reduce unplanned downtime, failover times will be of paramount importance, and the biggest factor in failover time is how many log files the system must play back. Perform full backups frequently to keep your log file count low, and don't turn on circular logging. If you're more interested in providing uninterrupted service during planned maintenance or providing transparent service to users, failover times might not be as important.

Third, bear in mind that clustering can't repeal the laws of nature—or of the Exchange engineering team. One Exchange 2000 server can mount a maximum of four storage groups (SGs). If you have two active/active nodes, each with three SGs, the cluster won't mount two of those SGs when a failover occurs. Oops! Don't put more than two SGs on each cluster node.
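
If you want to sanity-check a design against that limit, the arithmetic is easy to put into a few lines of Python. This is only a rough sketch: the function name and its parameters are made up for illustration, and it assumes a two-node cluster in which the surviving node tries to mount every SG from the failed node on top of its own.

```python
# Limit cited in the column: one Exchange 2000 server mounts at most four SGs.
MAX_SGS_PER_SERVER = 4


def sgs_left_unmounted(surviving_node_sgs, failed_node_sgs, limit=MAX_SGS_PER_SERVER):
    """Return how many storage groups cannot mount after a two-node failover.

    Hypothetical helper: the surviving node keeps its own SGs and tries to
    take over every SG from the failed node, up to the per-server limit.
    """
    total_after_failover = surviving_node_sgs + failed_node_sgs
    return max(0, total_after_failover - limit)


if __name__ == "__main__":
    # Three SGs per node: 3 + 3 = 6, which is two over the limit of four.
    print(sgs_left_unmounted(3, 3))
    # Two SGs per node: 2 + 2 = 4, so everything mounts.
    print(sgs_left_unmounted(2, 2))
```

Any design for which this returns something other than zero will leave SGs offline after a failover.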

Speaking of storage: Because the storage subsystem is the only component whose failure can bring down both nodes in a cluster, make sure you pay careful attention to properly selecting and provisioning that subsystem. Keep your SG log files and databases on separate volumes, and use whatever monitoring tools the storage vendor provides to keep an eye out for incipient failures so that you can fix them before you lose data.

Last, listen to Ed Heinemann, a famous aeronautical engineer and aircraft designer from the 1930s to the 1970s. Heinemann's motto became well-known in the aviation community: "Simplicate and add lightness." It might make an English teacher cringe, but engineers—and probably most systems and network administrators—understand what the man meant: Unnecessary complexity is the enemy of reliability and performance, so simplifying the design is crucial.

The bottom line: If you can clearly articulate why you need clusters, can justify their cost (including care-and-feeding costs such as maintenance agreements and administrator training), and can specify a design that will provide the level of service you need—use clusters. If you can't do these three things—don't use them.

"Fighting the Cluster Monster"
"XGEN: Status of Exchange 2000 Server Components on Cluster Server"
