JSI Tip 6468. Cluster node does NOT fail over when you lose network connectivity?

NOTE: The text in the following Microsoft Knowledge Base article is provided so that the site search can find this page. Please click the Knowledge Base link to ensure that you are reading the most current information.

Microsoft Knowledge Base article Q814459 contains:

SYMPTOMS

When a cluster node loses connectivity with the client-access subnet, the cluster node may not fail over to another node, and all resources may remain in an online state on the original node. For example, a cluster node loses connectivity with the client-access subnet if the network connection to the public network is lost.

CAUSE

This behavior may occur if both cluster nodes lose communication with the client-access cluster subnet. If both cluster nodes of a two-node cluster lose connectivity with the client-access cluster subnet, the Cluster service (Clussvc.exe) determines that the network itself is not responding, rather than a single node's network interface. Because of this, the cluster resources remain with the original node, and no failover occurs.

MORE INFORMATION

The Cluster service tests the ability of each cluster node to communicate with external hosts. An external host is represented by an IP address that has the following characteristics:
  • It is not local to either cluster node. For example, it is not a cluster virtual IP address.
  • It is on the same client-access cluster subnet as both cluster nodes.
  • It currently exists as a destination address in the routing table of either cluster node, and the routing interface is the corresponding local client-access cluster network interface. Or it is currently present as an active TCP connection for either cluster node.
For example, the default gateway is typically used as an external host because it meets all three of these conditions.
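
The three conditions above can be sketched as a small check. This is only an illustration of the stated criteria, not the Cluster service's actual implementation; the function name, its parameters, and the sample addresses are all hypothetical.

```python
import ipaddress


def _as_ips(items):
    """Normalize an iterable of address strings to ip_address objects."""
    return {ipaddress.ip_address(item) for item in items}


def is_external_host(candidate, subnet, local_addresses,
                     routed_destinations, tcp_peers):
    """Return True if `candidate` meets the three conditions listed above.

    Hypothetical inputs, for illustration only:
      subnet              - client-access subnet in CIDR form
      local_addresses     - node addresses and cluster virtual IP addresses
      routed_destinations - destinations routed through the client-access
                            network interface of either node
      tcp_peers           - remote addresses of active TCP connections
    """
    ip = ipaddress.ip_address(candidate)
    # Condition 1: not local to either node (and not a virtual IP address).
    not_local = ip not in _as_ips(local_addresses)
    # Condition 2: on the same client-access subnet as both nodes.
    on_subnet = ip in ipaddress.ip_network(subnet)
    # Condition 3: known through the routing table or an active TCP connection.
    known = ip in _as_ips(routed_destinations) or ip in _as_ips(tcp_peers)
    return not_local and on_subnet and known


# Made-up addresses: two nodes, one virtual IP, and a default gateway.
local = ["10.0.0.11", "10.0.0.12", "10.0.0.50"]
print(is_external_host("10.0.0.1", "10.0.0.0/24", local, ["10.0.0.1"], []))  # True
```

With these sample values the gateway 10.0.0.1 qualifies, while the virtual IP address 10.0.0.50 would be rejected by the first condition.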

The Cluster service tests LAN connectivity by using Internet Control Message Protocol (ICMP) echo requests to determine the scope of the network interface failure. For example, if node A cannot communicate with an external host but node B can, the network interface of cluster node A is determined to be offline, and the network interface of cluster node B is considered online. In this case, if node B is designated as a possible owner of the cluster resources on node A, it takes ownership of the cluster resources that depend on client-access LAN connectivity.

However, if neither cluster node (A or B, in this example) can communicate with an external host, the network interfaces of both node A and node B are considered unreachable, and the network is determined to be down. In this situation, no failover occurs, and you cannot manually move resource groups that contain IP addresses that depend on client-access LAN connectivity to the other node. The move fails because the IP addresses cannot be brought online while the destination network is considered unreachable.
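
The decision described in the two preceding paragraphs can be summarized in a short sketch. It is a simplified model of a two-node cluster, not the Cluster service's code; the function name, parameters, and returned strings are hypothetical. The case in which the owning node still reaches the external host is treated here as healthy, which is an assumption.

```python
def failover_decision(owner_reaches_host, peer_reaches_host,
                      peer_is_possible_owner=True):
    """Model the outcome for resources that depend on client-access LAN
    connectivity, given each node's ICMP echo result to an external host.

    owner_reaches_host     - result for the node that currently owns the resources
    peer_reaches_host      - result for the other node
    peer_is_possible_owner - whether the other node is a possible owner
    """
    if not owner_reaches_host and not peer_reaches_host:
        # Neither node reaches the external host: the network is
        # considered down, so the resources stay where they are.
        return "network down: no failover"
    if not owner_reaches_host and peer_reaches_host:
        # Only the owner's interface is considered offline.
        if peer_is_possible_owner:
            return "fail over to peer"
        return "owner interface offline: peer is not a possible owner"
    # Assumption: the owner still reaches the host, so nothing moves.
    return "owner interface online: no failover"


# Node A owns the resources and loses its link; node B is still connected.
print(failover_decision(owner_reaches_host=False, peer_reaches_host=True))
# Both nodes lose the client-access network (the symptom in this article).
print(failover_decision(owner_reaches_host=False, peer_reaches_host=False))
```

The second call corresponds to the symptom in this article: both nodes lose the client-access network, so the resources remain on the original node.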

For additional information about how Microsoft Windows Cluster Services (MSCS) detects and recovers from network failures, click the following article number to view the article in the Microsoft Knowledge Base:

242600 Network Failure Detection and Recovery in a Two-Node Server Cluster

For additional information about the recommended network configuration on a server cluster, click the following article number to view the article in the Microsoft Knowledge Base:

258750 Recommended Private "Heartbeat" Configuration on a Cluster Server

For additional information about network failure detection on a Microsoft Windows NT 4.0 cluster, click the following article number to view the article in the Microsoft Knowledge Base:

257925 Cluster Server Does Not Detect Network Problems in Windows NT 4.0


