Troubleshooter: Understanding Atypical Failover Behavior

We're having an odd problem with our Exchange 2000 Server cluster: When a node loses its connection to our LAN, it takes much longer to fail over to the other cluster node than is typical. The private interconnect is OK. Why is this happening?

When the node's public LAN connection is lost, the IP address resource fails. This in turn eventually causes the Exchange virtual server to fail over to the other cluster node. However, the failover is graceful, not the crash dive you see when the heartbeat interconnect is lost. As part of this graceful failover process, the failing server tries to contact a domain controller (DC). Of course, the server can't reach a DC because the server has lost its network connectivity, but the connection attempt must time out before the failover can finish. That's why the failover time seems unusually long compared with the times for other kinds of failures, including failure of the cluster interconnect. (I suspect that this behavior is unintentional, so perhaps Microsoft will change it in a future Exchange release.)
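
To see why one blocked step dominates the total, here is a toy model of the sequence described above. It's only a sketch: the step names and every duration in it are made-up placeholders, not measured Exchange or Cluster service values, and the `dc_timeout` and `dc_reply` parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    seconds: float


def graceful_failover_steps(dc_reachable: bool,
                            dc_timeout: float = 60.0,
                            dc_reply: float = 0.5) -> List[Step]:
    """Model a graceful failover as a serial list of steps.

    All durations are illustrative placeholders, not real product timings.
    """
    # If the DC can't be reached, the attempt has to run out the full timeout
    # before the next step can start.
    dc_wait = dc_reply if dc_reachable else dc_timeout
    return [
        Step("IP address resource fails", 5.0),
        Step("DC contact attempt", dc_wait),
        Step("virtual server goes offline", 10.0),
        Step("virtual server comes online on other node", 20.0),
    ]


def total_seconds(steps: List[Step]) -> float:
    """Steps run one after another, so the total is a simple sum."""
    return sum(step.seconds for step in steps)


if __name__ == "__main__":
    for reachable in (True, False):
        steps = graceful_failover_steps(dc_reachable=reachable)
        label = "DC reachable" if reachable else "DC unreachable"
        print(f"{label}: {total_seconds(steps):.1f} s")
```

Because the steps run serially, the whole difference between the two cases is the timeout minus the normal DC reply time. Whatever the real values are on your cluster, the same arithmetic applies.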
