Troubleshooting a clustered installation of SQL Server can be tricky because SQL Server rarely tells you clearly why it failed. If a failure occurs, you’ll typically see the error message “Setup failed to perform required operations on the cluster nodes.” After you click OK in the message window, the installation rolls back and removes all the SQL Server program files and data files from each of the clustered nodes.
The installation problem I encounter most often in a clustered environment is that Setup has difficulty copying the files from one node to another. When you get the preceding error message, don’t click OK right away. First, check in Windows Explorer to see whether all the SQL Server program files copied successfully to the second node. If you don’t see all the files on the target node, check the settings for each network connection on each node to confirm that File and Printer Sharing for Microsoft Networks is enabled. File and printer sharing must be enabled for the files to transmit over the administrative share (for example, C$), which is a hidden share that only administrators can access.
You can test whether communication between the nodes is working by opening a Run prompt (Start, Run) on one node and trying to access the admin share of the other node. For example, if you’re installing the first instance of SQL Server on the C drive in SQL2KNODE1 from the main article’s example, try to access SQL2KNODE2 by typing \\SQL2KNODE2\C$ at a Run prompt on SQL2KNODE1. If you can’t access the target node, you need to figure out what’s preventing the communication. Possible causes you might explore include:
- The firewall might be too restrictive, which could prevent you from copying data from one node to another.
- Services running on either node might prevent files from copying. You need to stop all unnecessary services before the installation. Services I’ve had problems with include any firewall services, SNMP, Tivoli or other monitoring services, and vendor-specific services such as Compaq Insight Manager.
- You’ve performed security hardening on one or both nodes. This means that you might have customized the security on your system to reject unsigned certificates, for example. If this is the case, log on to each node with the account you typed in Step 4 in the main article (the account used to copy the files between the nodes) and watch for pop-up messages about such things as unsigned certificates. If you see such a pop-up message on the primary node, it will also appear on the secondary node. If no one is logged on interactively to acknowledge the messages, the installation will time out and fail.
- The account you designated to copy the files between the nodes might not have sufficient permissions to copy the files or access the registry.
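If you have to repeat the admin-share test after each change you make, a short script can save some typing. The following Python sketch is only an illustration, not part of the installation procedure: it builds the \\SQL2KNODE2\C$ path from the example above and tries to list it. The function names admin_share_path and check_share are hypothetical helpers I made up for this example, and the script assumes that Python is available on the node and that you run it under the account that copies the files.

```python
import os
from typing import Tuple


def admin_share_path(node: str, drive: str = "C") -> str:
    # Build the UNC path of a node's administrative share: \\NODE\C$
    letter = drive.strip().rstrip(":$").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected a single drive letter, got {drive!r}")
    return "\\\\" + node.strip().lstrip("\\") + "\\" + letter + "$"


def check_share(path: str) -> Tuple[bool, str]:
    # Try to list the share; any OS-level failure (no access, no route,
    # path not found) is reported back instead of being raised.
    try:
        os.listdir(path)
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "listing succeeded"


if __name__ == "__main__":
    # Node name and drive come from the article's example; run this on SQL2KNODE1.
    path = admin_share_path("SQL2KNODE2", "C")
    ok, detail = check_share(path)
    print(f"{path}: {'reachable' if ok else 'NOT reachable'} ({detail})")
```

A failed listing tells you only that the share is unreachable from that node under that account. You still have to work through the causes above to find out why.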
If a SQL Server installation fails, also consider removing all the SQL Server directories that the setup process might have left behind, then rebooting and trying the installation again. Microsoft Windows Server 2003 provides elaborate logs that show what the clustering install procedure is doing and where it fails. This makes troubleshooting a clustered installation much easier. But no matter what OS you’re installing SQL Server on, always troubleshoot a clustering problem in the order of the installation. For example, always troubleshoot hardware first, then the OS, the network setup, the Cluster service installation, and finally the SQL Server installation. You’ll find that SQL Server is rarely the problem.
A handy T-SQL function you can use for debugging clustered SQL Servers is fn_virtualservernodes(). You can use the query
SELECT * FROM ::fn_virtualservernodes()
to determine which nodes are possible owners of your SQL Server resource. For example, after installing SQL Server on both nodes, I should see SQL2KNODE1 and SQL2KNODE2 in this query’s results. Run this query after the installation to confirm that the SQL Server instance is performing as expected and that all possible nodes can own the resource.
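On a two-node cluster you can check the results by eye, but with more nodes a small comparison helps. The Python sketch below assumes that you’ve already copied the node names out of the query results by hand. It doesn’t connect to SQL Server, and the function name missing_owners and the sample result list are hypothetical. It compares names without regard to case because Windows computer names aren’t case sensitive.

```python
from typing import Iterable, List


def missing_owners(expected: Iterable[str], returned: Iterable[str]) -> List[str]:
    # Return the expected node names that are absent from the query results.
    # Names are compared case-insensitively, ignoring surrounding whitespace.
    seen = {name.strip().upper() for name in returned}
    return [node for node in expected if node.strip().upper() not in seen]


if __name__ == "__main__":
    expected_nodes = ["SQL2KNODE1", "SQL2KNODE2"]
    # Hypothetical result set: pretend the query listed only one node.
    returned_nodes = ["SQL2KNODE1"]
    missing = missing_owners(expected_nodes, returned_nodes)
    if missing:
        print("Not listed as possible owners:", ", ".join(missing))
    else:
        print("All expected nodes are possible owners.")
```

Any node that the script reports as missing is one to investigate before you rely on failover to that node.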