Failed Cluster

A while back, my company installed a new Storage Area Network (SAN). A two-node cluster provided our home office with file and print services and used the fully redundant SAN for storage. Everything ran fine for 2 months; then we experienced a series of unexplained failovers.

Our File Share resource started sporadically failing over to the other node. Sometimes the File Share resource on the other node would also fail, causing the cluster to fail over continuously, bouncing back and forth between the two nodes. The logs showed that the resource group failed because the drive was no longer available. We were baffled because no errors had occurred and the SAN reported that all the drives were fine. If we rebooted the server, the drive was present again and everything went back to normal until the resource failed again.

We discovered that one of our administrators was mapping the F drive through his user properties. The administrator's drive-letter mapping was superseding the SAN's shared F drive. The storage device then lost the drive letter, which failed the resource and everything in its group. After that, the server couldn't assign the F drive letter back to the SAN drive until the node was rebooted.

I have since determined that this scenario isn't limited to clusters. If you log on to a member server that has a drive whose letter conflicts with your home directory mapping, your home directory settings will take that drive offline as well. My advice is to strongly discourage administrators from mapping home directories on their Administrator accounts.
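
One way to catch this kind of collision before it causes a failover is to compare the home-drive letters set on accounts against the drive letters each server already uses. The following is only a minimal sketch in Python: the function names, account names, and drive lists are hypothetical, and the inputs are assumed to have been exported by hand from your directory and your servers rather than queried live.

```python
def normalize_letter(drive):
    """Reduce forms such as 'f', 'F:' or 'F:\\' to the bare uppercase letter."""
    return drive.strip().rstrip("\\").rstrip(":").upper()


def find_home_drive_conflicts(home_drives, server_drives):
    """Return {account: letter} for home-drive mappings that collide
    with a drive letter already in use on the server.

    home_drives   -- dict of account name -> home-drive letter (hypothetical export)
    server_drives -- iterable of drive letters in use on the server (hypothetical export)
    """
    in_use = {normalize_letter(d) for d in server_drives}
    return {
        account: normalize_letter(letter)
        for account, letter in home_drives.items()
        if normalize_letter(letter) in in_use
    }


if __name__ == "__main__":
    # Illustrative data only: an administrator whose home drive is F,
    # checked against a node that uses F for the shared SAN disk.
    accounts = {"admin1": "F:", "user2": "H:"}
    node_drives = ["C:", "F:\\"]
    for account, letter in find_home_drive_conflicts(accounts, node_drives).items():
        print(f"{account}: home drive {letter}: conflicts with a server drive")
```

Any account the check reports is a candidate for a different home-drive letter, or for having the mapping removed from the administrative account altogether.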
