People have been talking about Microsoft's Wolfpack initiative ever since Microsoft announced its intent to develop a new clustering standard for Windows NT in October 1995 (for information about Wolfpack's origins, see Mark Smith, "Closing In on Clusters," August 1996). Since then, Wolfpack has evolved, and we're finally getting close to an actual 1.0 product release--Microsoft plans to release the software sometime this summer. But does this first version of Wolfpack live up to Microsoft's claims for a mission-critical solution? To help answer that question, the Windows NT Magazine Lab tested Wolfpack Beta 2 as a high-availability file and SQL server.
Although I tested a beta release of Wolfpack, the software worked with few difficulties. Beta 2 incorporates about 90 to 95 percent of the functionality of the release to manufacturing (RTM) version, so Microsoft still has a few changes to make before it ships the software.
Before we look at the software, you need to be aware of Wolfpack's hardware requirements. First, you can't use just any two servers to build a cluster. Microsoft will support only tested and certified configurations from its forthcoming Wolfpack Hardware Compatibility List (WHCL--Microsoft will post this list on its Web site when it releases Wolfpack).
Microsoft has created a Hardware Compatibility Test (HCT) CD-ROM that vendors--and end users--can use to test and verify Wolfpack-compatible systems and configurations. Keep in mind that Microsoft will support Wolfpack only on configurations that pass. Microsoft has shipped a beta version of this CD-ROM to about 30 system vendors and will make the beta available on the Microsoft Web site when it releases Wolfpack. Amdahl, Compaq, Dell, Digital Equipment, Fujitsu, HP, IBM, NCR, Siemens Nixdorf, Stratus, and Tandem have all publicly announced their intentions to certify Wolfpack clustering solutions. The bottom line: Don't assume you can upgrade any NT server to use Wolfpack.
Compared with other clustering solutions such as Octopus and Vinca, Wolfpack has greater hardware dependencies and higher startup costs. You can use off-the-shelf hardware, but only if Microsoft has certified it for use with Wolfpack. When you add up your initial investment in disk controllers, disk arrays, and NICs, you begin to realize that Wolfpack is for serious IS shops. I doubt that many vendors will make much of an effort to certify older hardware, so unless you are lucky enough to already own current server models, you'll probably have to add two servers to your purchase order. A one-man shop running a Web server out of the basement can implement a cluster just as easily as Chase Manhattan can, but implementing any cluster takes dollars and a great deal of forethought.
If you plan to purchase hardware that you might later cluster with Wolfpack, review Wolfpack's requirements now and follow them. Installing and configuring Wolfpack is not that hard, but you must set up your existing servers according to those requirements. If you don't follow the proper setup procedures, you'll have to back out of your entire installation and start over. This backpedaling can mean a lot of downtime while you reconfigure and reload your systems and data. If you're already running a beta Wolfpack cluster, note that the RTM version of Wolfpack will let you upgrade and preserve your existing cluster setup.
The Windows NT Magazine Lab reviewed Wolfpack Beta 2 on two IBM PC Server 704 systems (each with one 200MHz Pentium Pro processor, 128MB of RAM, two Intel Pro/100B NICs, and two Adaptec 2940W SCSI controllers) and a PC Server 3518 external disk array with 12 hard disks. This configuration, which Figure 1 shows, is just one of the clustering solutions that IBM plans to certify for Wolfpack. (For more information about how the Lab tested Wolfpack, see "Testing Wolfpack, LifeKeeper, Standby Server for NT, and NT Cluster-in-a-Box.")
Setting Up Wolfpack
Most of the problems I ran into during the review of Wolfpack relate to the fact that I was testing a beta release. Many of the bugs that I encountered (such as corrupting NT by using Beta 2 to uninstall Beta 1) will be long gone by the time you see the RTM version. However, starting with a clean install of NT Server 4.0 and Service Pack 2 (SP2) or SP3 doesn't hurt.
Wolfpack requires an external SCSI disk array, so even if you have 12 available drive bays in your server, you can't use them with Wolfpack. The software requires this configuration so that both server nodes in the cluster can attach to the same disks, as you see in Figure 1.
For Wolfpack to work, you must place all files and application components that you want the cluster to protect on the shared disk array and configure volumes and RAID stripe sets before you install Wolfpack. If you make these changes after the fact, you corrupt the partition tables (signature files) on the disks and make the disks completely unusable (reformat time!). Microsoft plans to fix this problem before it releases the final version of Wolfpack.
Installing Wolfpack is easy--the CD-ROM automatically starts (AutoRun) and puts you right into the Configuration dialog box. To set up the cluster, you first need to make sure both servers are part of a domain (no workgroups). You also need to have a dedicated administrative service account with Domain Admin group membership (such as Wolfpackadmin), and you must create a password for this account. When I reached this point in the installation, I didn't have the proper administrative account set up and the installation crashed. Microsoft states that it will fix the RTM version so that the installer will exit gracefully if you haven't met all the administrative requirements. Finally, you must have a free IP address for the administration services and an available quorum drive (a disk spindle that Wolfpack uses to determine whether another server is up or down) on the shared array so Wolfpack can store cluster administration files. The only other component you must set up is networking.
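The prerequisites above amount to a checklist. The following Python sketch restates that checklist as a preflight function you might run against your own inventory notes before launching setup. It is not part of Wolfpack: the function name, its parameters, and the message wording are hypothetical, and the real installer performs its own checks.

```python
import ipaddress
from typing import Dict, List, Optional, Set


def missing_prereqs(
    node_domains: Dict[str, Optional[str]],  # node name -> domain (None = workgroup)
    service_account: Optional[str],
    account_groups: Set[str],
    account_has_password: bool,
    admin_ip: Optional[str],
    ips_in_use: Set[str],
    quorum_disk_on_shared_array: bool,
) -> List[str]:
    """Return the setup prerequisites that are not yet met (hypothetical check)."""
    problems: List[str] = []
    # Both servers must be part of a domain; workgroups are not allowed.
    for node, domain in node_domains.items():
        if domain is None:
            problems.append(f"{node} is in a workgroup; join it to a domain")
    # A dedicated service account with Domain Admins membership and a password.
    if not service_account:
        problems.append("create a dedicated administrative service account")
    else:
        if "Domain Admins" not in account_groups:
            problems.append(f"add {service_account} to the Domain Admins group")
        if not account_has_password:
            problems.append(f"set a password for {service_account}")
    # A free IP address for the administration services.
    if admin_ip is None:
        problems.append("reserve a free IP address for the administration services")
    else:
        try:
            ipaddress.ip_address(admin_ip)
        except ValueError:
            problems.append(f"{admin_ip} is not a valid IP address")
        else:
            if admin_ip in ips_in_use:
                problems.append(f"{admin_ip} is already in use")
    # A quorum drive on the shared array for cluster administration files.
    if not quorum_disk_on_shared_array:
        problems.append("set aside a quorum drive on the shared array")
    return problems
```

An empty list means the checklist is satisfied; passing node_domains={"NODE1": "CORP", "NODE2": None} (hypothetical names) flags NODE2 as a workgroup member.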
You can use your network in several ways, depending on how many heartbeats you want to set up (for a definition of heartbeats and other clustering-related terms, see "Clustering Terms and Technologies," page 62). The easiest approach is to use your servers' usual LAN connection for all cluster services, user connections, and the heartbeat--but this configuration leaves you with a big single point of failure: the LAN connection. Alternatively, you can have a dedicated connection or network crossover just for the heartbeat and use the LAN connection for everything else. Microsoft strongly recommends that vendors certify Wolfpack clusters with a private interconnect between the servers.
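To see why a single LAN connection is a single point of failure, consider how a node might decide that its partner is down. The Python sketch below assumes (as an illustration, not Wolfpack's documented algorithm) that a node declares its peer failed only when heartbeats have stopped arriving on every network it listens on. The network names and the timeout value are hypothetical.

```python
from typing import Dict


def peer_is_down(last_heartbeat: Dict[str, float], now: float, timeout_s: float) -> bool:
    """Return True only if the peer has been silent on every monitored network.

    last_heartbeat maps a network name (hypothetical, e.g. "lan" or "private")
    to the time, in seconds, when the last heartbeat from the peer arrived on it.
    """
    return all(now - seen > timeout_s for seen in last_heartbeat.values())


# LAN only: a failed hub silences the only path, so a healthy peer looks dead.
lan_only = {"lan": 100.0}
# LAN plus a private crossover link: the private link still carries heartbeats.
with_private = {"lan": 100.0, "private": 109.5}
```

With now=110.0 and timeout_s=5.0, peer_is_down(lan_only, 110.0, 5.0) returns True even though the peer may be healthy, while peer_is_down(with_private, 110.0, 5.0) returns False because the private interconnect is still carrying heartbeats.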
To configure Wolfpack to work with your network, enter your network information (IP address, whether you are using a private interconnect, and a name for the network resource) and a name (alias) for your new cluster, and you're finished with the first server node. To set up the second server node, tell the server node what cluster to join, and Wolfpack takes care of the rest. After you reboot the server nodes (one at a time) and load the management utility (Cluster Admin under Administrative Tools), you can start setting up cluster groups.
Configuring a Cluster
Screen 1 shows the basic GUI for the Cluster Administrator tool. The tool is easy and straightforward to use, but more documentation on how to set it up for specific purposes (e.g., fault-tolerant disk shares, SQL Server) would have been nice--the price of testing a beta version, I suppose.
The Cluster Administrator gives you an NT Explorer tree of cluster nodes (those servers that share clustered information), groups, and resources. You can move these groups and resources around, create new ones, move them from one cluster node to the other, take them online and offline, and so forth.
Wolfpack comes with several generic resource types, such as fault-tolerant disk set, generic service, and generic application, and specific resource types, such as Internet Information Server (IIS). Strangely enough, Wolfpack doesn't include a resource type for SQL Server (you have to create a group that includes individual resources for the SQL Executive and SQL Server services). Microsoft plans to provide setup instructions and a pre-customized resource DLL for SQL Server 6.5 with Wolfpack 1.0. The Cluster Administrator, which you can also install on a remote NT workstation for remote administration, works like any other Windows application (e.g., you can right-click objects to bring up menus of actions, double-click objects for attributes, and move and copy objects).
All the disks on your shared array show up in the Cluster Administrator as individual groups, which you can reconfigure to suit your needs. Once Wolfpack has ownership of these groups, you can't administer them from NT Explorer. In fact, if you don't bring these groups online first, you won't be able to access them at all.
After you properly configure the system, you can create failover groups for applications, services, shares, and so forth. These failover groups are virtual servers (aliases) that your users connect to, instead of connecting directly to resources on an individual server. That way, when the primary system fails, the alias moves to the other node in the cluster and continues working.
Microsoft designed wizards into the Cluster Administrator tool to automate and simplify cluster configuration. Screen 2 shows the Preferred Owners dialog box in the group wizard, which lets you create a new group alias and assign a primary server. Screen 3 shows the New Resource dialog box in the resource wizard, which lets you select the type of resource (application, fault-tolerant disk set, service, etc.), the group the resource belongs to, and the resource's dependencies (the other services or resources that must be online before you can activate this one). For example, SQL Server resources have dependencies: The disk volume that holds the data devices must come online before the SQL Server and SQL Executive services can start and the cluster software can bring the database online on the secondary node.
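A group's dependencies form a graph, and the cluster software must bring resources online in an order that respects it, then take them offline in the reverse order. The following Python sketch models the SQL Server example; it requires Python 3.9 or later for the standard graphlib module, and the resource names are hypothetical labels rather than Wolfpack resource type names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each resource maps to the resources that must be online before it can start.
# Names are illustrative; both services wait for the disk with the data devices.
sql_group: Dict[str, Set[str]] = {
    "Data disk": set(),
    "SQL Server service": {"Data disk"},
    "SQL Executive service": {"Data disk"},
}


def online_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order resources so every dependency comes online before its dependents.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Take dependents offline first, then the resources they rely on."""
    return list(reversed(online_order(deps)))
```

online_order(sql_group) always starts with "Data disk"; the two services may follow in either order because neither depends on the other in this simplified model.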
You also need to be aware of your security configuration. Wolfpack requires that your servers belong to a domain, so you want to let the domain handle system security. However, if you make any specific user logon or other changes to one server, you need to make identical changes to the other server. File, database, and other security objects will transfer with the file share on the cluster, but user security will not.
Testing a Cluster
I tested Wolfpack in the Lab by failing over disk shares and SQL Server. During the tests, I failed over the disk shares from one system to the other--either through the Wolfpack software or by shutting down the primary server--without a problem. In both cases, failover took about 30 seconds.
Unfortunately, our evaluation cycle was cut short by a catastrophic hardware failure in the disk array, which limited our SQL Server failover tests. This example serves as a lesson and an important warning about fault tolerance and clustering solutions: All NT clustering solutions remove many, but not all, single points of failure.
For example, if your shared disk array bites the big one, your cluster does too. The same principle is true for a LAN connection--even in an active/active configuration, if the hub or switch attached to one of the server nodes fails, the clustering software doesn't know that it needs to fail over and the client loses the connection.
Wolfpack doesn't reconnect you to a resource after a failover, but it returns the resource to the last state saved on a shared disk before the system failure. So if you experience an interruption in service, you must reconnect yourself, but Wolfpack does its best to let you pick up where you left off and continue working.
Keep in mind that the way your system reacts when the failover occurs depends on the application you're using. For example, SQL Server 6.5 contains no cluster-aware code. If the cluster server crashes during a transaction, SQL Server rolls back the database to the point before you started the transaction, so when the database fails over and comes online on the secondary server, you don't see any trace of the query or commit. (However, you can log the SQL Server rollback, which might let you recover lost transactions.) In the meantime, SQL Server freezes or reports a service failure, and you must wait for the secondary server to come up before you can reconnect, hit Resend, and resubmit the request.
If you fail over a file share, your users might never know anything happened--they won't have to reconnect to the share because the alias simply moves from one server to the other. For example, if you are working in Excel, your system caches your spreadsheet while you have it open. If the file server crashes and you try to save the file before the service comes online on the secondary server, you get a path not found error. Once the share is available again, you should be able to save the file without any problem. A server crash while you're saving a file might corrupt the file, but this limitation has more to do with how the individual application, not Wolfpack, handles files.
Many actions can cause a failover to occur. You can manually fail over a server so that you can perform maintenance on it without a significant service interruption; your SCSI controller can fail, causing applications to stop; or you can lose the interconnect (or multiple interconnects). An application can trigger a failover if the service (such as SQL Server or IIS) stops.
You can configure Wolfpack-protected resources for automatic or manual failover. Manual failover might be preferable when a server keeps crashing for unknown reasons and you don't want the service switching back and forth multiple times in a short period. Automatic failover is useful for a power failure or other short-term problem. To have a group automatically return to its original server once that server comes back online, select Allow Failback in the group's Properties page and designate that server as the preferred server.
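The trade-off between automatic and manual failover can be expressed as a small policy. The following Python sketch is an illustration under stated assumptions, not Wolfpack's implementation: it allows automatic failover only up to a hypothetical number of times within a hypothetical time window (beyond that, the group waits for an administrator, which avoids the back-and-forth switching described above), and it models Allow Failback as a move back to the preferred server once that server is online again.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupPolicy:
    preferred_owner: str
    allow_failback: bool = False
    max_failovers: int = 3      # hypothetical threshold
    window_s: float = 3600.0    # hypothetical window, in seconds
    recent: List[float] = field(default_factory=list)

    def may_fail_over(self, now: float) -> bool:
        """Record and allow an automatic failover unless the threshold is reached."""
        # Forget failovers that happened outside the sliding window.
        self.recent = [t for t in self.recent if now - t < self.window_s]
        if len(self.recent) >= self.max_failovers:
            return False  # leave the group for manual intervention
        self.recent.append(now)
        return True

    def failback_target(self, current_owner: str, preferred_online: bool) -> Optional[str]:
        """Return the node to move the group back to, or None to stay put."""
        if self.allow_failback and preferred_online and current_owner != self.preferred_owner:
            return self.preferred_owner
        return None
```

With these defaults, a fourth failure within an hour makes may_fail_over return False instead of moving the group yet again.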
Wolfpack supports only an active/standby configuration for specific applications such as SQL Server and Exchange. This limitation is application-specific because SQL Server 6.5 and Exchange 5.0 are not fully cluster aware. These applications don't support multiple instances of the same program running on the same server (without manipulating directories, Registry keys, and DLLs, which is how other solutions such as LifeKeeper offer active/active configurations). IIS does allow for active/active operation on a Wolfpack cluster. You can set up IIS to be active on both nodes, each running one or more virtual servers that you can fail over. To set up IIS in an active/active configuration, you use Wolfpack's group wizard. SQL Server 7.0 will offer several enhancements that will address these issues. Despite these application-specific limitations, you can run file shares and printers simultaneously on both nodes with no problems--Wolfpack won't drop connections on the secondary node as it takes over services from the primary.
Building cluster-friendly applications is up to the user and vendor community. Client applications need to allow for automatic reconnect when necessary or graceful exits when their resources disappear for a short time. Microsoft has created a software development kit (SDK) so that you can design cluster-aware applications or middleware recovery kits. The documented Wolfpack APIs also let you customize the resource wizard and create new resource types.
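On the client side, automatic reconnect can be as simple as retrying an operation against the cluster alias for about as long as a failover takes. The Python sketch below is not part of the Wolfpack SDK. It assumes the operation raises OSError (which covers ConnectionError and FileNotFoundError, the path-not-found case) while the resource is moving; the retry count and delay are hypothetical values chosen so that the client waits up to 30 seconds, roughly the failover time the Lab observed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    op: Callable[[], T],
    retries: int = 7,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying while the clustered resource is unavailable.

    With the defaults, the client waits up to 6 x 5 = 30 seconds before
    re-raising the last error so the caller can exit gracefully.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return op()
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller handle it
            sleep(delay_s)  # give the alias time to move to the other node
```

For a file share, op might open and save a file through the cluster alias; for a database client, it might reopen the connection and resubmit the request.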
Wolfpack will be excellent for high availability (99 percent). Its short failover time (approximately a 30-second delay) beats waiting 24 hours for a technician to fix the server.
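It helps to translate "99 percent" into time. The Python arithmetic below is a back-of-the-envelope sketch that ignores leap years and assumes the availability target covers a full year. It shows that 99 percent availability allows roughly 87.6 hours of downtime a year, so a single 24-hour wait for a technician consumes more than a quarter of that budget, while one 30-second failover uses about a hundredth of a percent of it.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_budget_hours(availability: float) -> float:
    """Hours of downtime per year allowed at a given availability (0 to 1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def share_of_budget(outage_hours: float, availability: float) -> float:
    """Fraction of the yearly downtime budget consumed by one outage."""
    return outage_hours / downtime_budget_hours(availability)


budget = downtime_budget_hours(0.99)
repair_share = share_of_budget(24.0, 0.99)         # one 24-hour repair wait
failover_share = share_of_budget(30 / 3600, 0.99)  # one 30-second failover
```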
Wolfpack Beta 2
Price: To be determined
IBM PC Servers
520-574-4600 or 800-426-3333