My organization recently had a problem with performance degradation on clustered servers. We discovered the problem after users reported a verifiable drop in performance on the servers after routine weekend maintenance. Analysis showed that the secondary node owned all the resources in the cluster. Because only one node was managing the supposedly shared resources, that node was carrying the cluster's entire load. The result was a marked degradation in the cluster's performance.
By design, cluster resources are distributed evenly across nodes. Sharing resources among nodes improves scalability and results in better performance. When a server is removed from, then restored to, a cluster, the resources must be reset. In other words, the resources must be reattached to their original owner following maintenance. Our cluster administrator missed this manual step, and the cluster's slow response was the result.
We learned our lesson and decided to automate our maintenance process. I wrote two scripts, which I've dubbed the Cluster Resource Management (CRM) scripts, to first capture all the current resource assignments, then restore those assignments. The first script captures the current cluster resource owner information prior to maintenance. The second script, which you run after maintenance is complete, reassigns resources to their original owner nodes.
Cluster Automation Server
After some research, I found that Microsoft's Cluster Automation Server ActiveX object would provide me with the methods and properties to manage our cluster resources. The cluster administrator package installs Cluster Automation Server, Msclus.dll. The top-level cluster automation server object exposes a complete cluster management interface to scripting languages. By using the methods and properties associated with each Cluster Automation Server object, you can write scripts or an application to automate cluster administration tasks such as retrieving resource status, moving resource groups from one node to another, or failing a resource. A cluster administrator can use the scripts to connect either locally or remotely to a cluster or node. These objects leverage Windows Server 2003's, Windows 2000 Server's, and Windows NT's clustering API. You can find the objects on the Microsoft Developer Network (MSDN) at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mscs/mscs/cluster_management_objects.asp. You can learn more about Cluster Automation Server by visiting http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mscs/mscs/programming_with_cluster_automation_server.asp.
You need to identify which object provides you with the methods or properties for accomplishing your task. Next, you determine how to obtain the object, then invoke the object's desired property or method to complete the task. You can see an example of scripting Cluster Automation Server's object properties and methods at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mscs/mscs/enumerating_objects_with_cluster_automation_server.asp.
Capturing Resource Information—The First Script
Our task required us to place ownership of each resource group on the primary server and move the resource groups back to the primary server after the completion of server maintenance. Moving a resource group also moves the resources contained within the group. The MSCluster.Cluster object provides access to the resource group collection object, whose members expose properties such as OwnerNode.Name and methods such as Move to manage the resources.
Experience has taught me a helpful procedure for troubleshooting code: Use the Option Explicit statement to force declaration of every constant and variable in the script. The statement must appear before any other statement in the script. Accordingly, the script in Listing 1 begins with Option Explicit, followed by the declaration of the machine name for which the script is being run. One possible enhancement is to replace the hard-coded machine name with a prompt that asks the user for the machine name at runtime. After the constants and variables have been declared, Listing 1 creates an instance of the Microsoft Scripting Runtime Library's FileSystemObject object. The script then uses the object's CreateTextFile method to create a log file (clusterstate.log) as a TextStream object. The log file stores all the current resources for this machine. (Additional information about FileSystemObject is available at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/jsobjfilesystem.asp.)
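The prompt-at-runtime enhancement might look roughly like the following. The article's listings are VBScript; this sketch uses Python only to illustrate the logic, and the function name, prompt text, and default handling are hypothetical rather than taken from Listing 1.

```python
def resolve_computer_name(ask=input, default=""):
    """Return the machine name typed by the user, or the default if blank.

    `ask` is any callable that takes a prompt string and returns the reply;
    it defaults to the built-in input() so the script can run interactively.
    """
    answer = ask("Cluster or node name (blank for default): ")
    name = answer.strip()
    return name if name else default

# Interactive use (commented out so the sketch does not block when imported):
# computer_name = resolve_computer_name()
```

Passing the prompt function as a parameter keeps the logic easy to exercise without a console.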
The code at callout A in Listing 1 instantiates the MSCluster.Cluster object. The code at callout B then uses this object to create a connection to the selected cluster node by calling the cluster object's Open method with the ComputerName constant. This constant is set to an empty string. In such a case, the Open method automatically references the cluster settings of the local machine. To connect to a machine other than the local machine, you need to edit the listing and assign a different value to the ComputerName constant.
Assuming that the Open method doesn't return any errors, the connection to the computer is established. The code at callout C enumerates the resource groups in each cluster node. For each resource group, this section of code records the name of every resource contained in the resource group, along with the name of its owner node. Then, using the TextStream object's WriteLine method, the script writes this information to the text file for every resource contained in each resource group. After all resource information has been written to the text file in a comma-delimited format, as Figure 1 shows, the script closes the text file and releases the memory before ending.
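To make the comma-delimited output concrete, here is a minimal sketch of the write step. It is Python rather than the VBScript of Listing 1, and it doesn't talk to a cluster: the dictionary of names and owner nodes is hypothetical sample data standing in for what the enumeration would return, and the function name is an assumption.

```python
import io

def write_cluster_state(owners, stream):
    """Write one 'name,owner' line per entry; return the number of lines."""
    count = 0
    for name, owner in owners.items():
        if "," in name or "," in owner:
            # A comma inside a field would break a simple comma split on reload.
            raise ValueError(f"comma not allowed in field: {name!r}, {owner!r}")
        stream.write(f"{name},{owner}\n")
        count += 1
    return count

# Hypothetical sample data; a real run would take these from the cluster.
sample = {"Cluster Group": "NODE1", "SQL Group": "NODE2"}
buffer = io.StringIO()  # stands in for the clusterstate.log text stream
write_cluster_state(sample, buffer)
print(buffer.getvalue(), end="")
```

Rejecting embedded commas is a design choice of this sketch, made because the restore step splits each line on commas.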
Restoring Resource Node Ownership—The Second Script
After the script in Listing 1 has captured resource assignments for the cluster, the cluster administrator can take a server out of the cluster to perform whatever maintenance is necessary. When maintenance is complete, the administrator can bring the machine back online and, as the final maintenance step, run the script in Listing 2 to restore the resources to their original nodes.
As callout A in Listing 2 shows, the script references the FileSystemObject object. A prerequisite to restoring the resources assigned to a cluster node is the input file, clusterstate.log. Therefore, prior to opening the text file for reading, the script checks for the input file's existence by calling the FileSystemObject object's FileExists method. If the input file doesn't exist, the script displays an error message on the console, then quits. When the input file exists, the script instantiates the cluster object and connects to the local machine, as callout A shows. The cluster object is created before the input file is read so that the object isn't created repeatedly in a loop.
When the script has determined that the input file exists and opens it, the script uses the Do...Loop statement to process the text file line by line. The loop continues to the end of the file, as marked by the TextStream object's AtEndOfStream property, which callout B shows.
Because the data is stored in the text file in a comma-delimited format, as I explained earlier, Listing 2 uses VBScript's Split function to separate the resource name from the resource owner node name. The Split function returns the split data as an array of strings. In Listing 2, the first element in the arrLine array is the resource name, and the second element is the node name.
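The read-and-split step can be sketched as follows. This is a Python illustration of the same idea, not the VBScript from Listing 2; the function name and the sample lines are hypothetical, and the check for exactly two fields is an addition of this sketch.

```python
import io

def read_cluster_state(stream):
    """Yield (name, node) pairs from comma-delimited lines, skipping blanks."""
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        parts = line.split(",")
        if len(parts) != 2:
            raise ValueError(f"expected 'name,node', got: {line!r}")
        # First element is the saved name, second is its owner node.
        yield parts[0].strip(), parts[1].strip()

# Hypothetical file contents standing in for clusterstate.log.
saved = io.StringIO("Cluster Group,NODE1\nSQL Group,NODE2\n")
for name, node in read_cluster_state(saved):
    print(name, "->", node)
```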
Now that the data is in an appropriate format, the script can move the resources to their original owner nodes. The MSCluster.Cluster object's ResourceGroups property returns a ClusResGroups collection object. The collection object's Item property returns a ClusResGroup object, which represents a single group in that collection. The ClusResGroup object's Move method moves that group's resources, as callout C shows.
The Move method supports two parameters: Timeout and ClusterNode. Timeout is the time in seconds that the method waits for the move to finish; if the move hasn't completed by then, the method reports the operation as pending (True). ClusterNode is the name of the node to which the resource group should be moved. This parameter is optional. (You can find additional information about the Move method at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mscs/mscs/clusresgroup_move.asp.)
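The restore logic around a move call can be modeled as below. This is a Python sketch under stated assumptions, not Listing 2: the `move` callable is a hypothetical stand-in for the real Move call, the 60-second default timeout is arbitrary, and skipping groups that already sit on their saved node is an addition of this sketch.

```python
def restore_owners(saved, current, move, timeout=60):
    """Move each group whose current owner differs from its saved owner.

    saved and current map a group name to a node name. move(group, node,
    timeout) is supplied by the caller and should return True when the move
    is still pending after the timeout. Returns (moved, pending) name lists.
    """
    moved, pending = [], []
    for group, node in saved.items():
        if current.get(group) == node:
            continue  # already on its original owner, nothing to do
        if move(group, node, timeout):
            pending.append(group)
        else:
            moved.append(group)
    return moved, pending

def demo_move(group, node, timeout):
    """Hypothetical stand-in that pretends every move finishes in time."""
    print(f"moving {group} to {node} (timeout {timeout}s)")
    return False

print(restore_owners({"SQL Group": "NODE2"}, {"SQL Group": "NODE1"}, demo_move))
```

Collecting the pending groups separately gives an administrator a list to recheck instead of assuming every move finished.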
For our purposes, we needed to move each resource group to a specific node. Therefore, in Listing 2, the node is passed as a parameter. However, if the node information is omitted from the script (i.e., not passed as a parameter), the Move method moves the resource group to the best available node according to the following criteria:
- the list of currently active nodes
- the group's preferred node entries
- the set of nodes listed as possible owners
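The article names these criteria but doesn't say how the cluster service weighs them. Purely as an illustration, one plausible ordering is sketched below in Python: take the first preferred node that is both active and a possible owner, otherwise any active possible owner. The function name and this ordering are assumptions, not documented behavior.

```python
def pick_best_node(active, preferred, possible):
    """Return a target node name, or None if no eligible node is active.

    active:    node names currently up, in any order
    preferred: the group's preferred nodes, most preferred first
    possible:  node names allowed to own the group
    """
    eligible = [node for node in active if node in possible]
    for node in preferred:
        if node in eligible:
            return node
    # No preferred node is available; fall back to any eligible node.
    return eligible[0] if eligible else None

# Hypothetical node names for illustration.
print(pick_best_node(["NODE1", "NODE2"], ["NODE2", "NODE1"], {"NODE1", "NODE2"}))
```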
After moving all the resources, the script closes the open input file and releases the memory before ending.
Tying Up Loose Ends
After going through my company's regular procedure of moving a product from the development environment to infrastructure to user acceptance test (UAT), my CRM scripts were finally put into production. To see a script work and resolve a problem gives you a good feeling—a feeling of accomplishment that makes you proud. The production team reported that they had no problems after implementing my CRM scripts. As a matter of fact, the cluster administrators were happy that their pre- and post-maintenance work was reduced to a simple double click.
However, we wanted to eliminate even that double click. To do so, we automated the maintenance process by adopting HP's Remote Deployment Pack (RDP). (For more information on RDP, visit http://h18013.www1.hp.com/products/servers/management/rdp/index.html.) Implementing RDP also gave us opportunities to automate tasks such as remotely imaging the servers and remotely deploying patches. As you know, more automation means more scripts. I say, "Keep them coming."