Several major platform vendors and business-critical application providers have begun to make formal announcements about their intent to support Microsoft's Wolfpack clustering initiative (for information on this initiative, see Joel Sloss, "Wolfpack Beta 2," June 1997). As a result, many systems administrators and designers are in a panic over planning for the orderly introduction of this new technology. Much of this planning will focus on load distribution and failover scenarios for key applications. Although many administrators and designers see storage planning and data management as minor components of the overall process, these factors are key to the success or failure of a clustering implementation.
As of this writing, Microsoft plans to formally announce Wolfpack's availability this summer. Early adopter partners such as Compaq Computer, Digital Equipment, HP, IBM, NCR, and Tandem, along with other platform partners such as Amdahl, Siemens-Nixdorf, and Stratus, will make available a broad range of Microsoft-certified clustering configurations. At the same time, cluster-aware versions of business-critical applications such as Oracle Parallel Server, SAP R/3, the Microsoft BackOffice Suite, and Computer Associates (CA) Unicenter TNG will complement these hardware announcements. Together, these partners will move the technology into the enterprise quickly. During this time, many MIS executives will face marketing hype that oversells the ease of integrating these clustering solutions. Systems administrators in this situation need to look realistically at integrating clustering storage and data management components (for information about why you would need to implement a clustering solution, see Mark Smith, "Clusters for Everyone," June 1997).
Storage Pragmatics and Challenges
Each vendor that supports Wolfpack will offer a slightly different solution; therefore, you need to consider some common storage issues across the board. One key issue is the need to establish a storage hierarchy and a backup and fault recovery plan early on.
In establishing this storage hierarchy, you take a different approach from the one you typically take with Hierarchical Storage Management (HSM). Instead of focusing on classes of service (i.e., cache/SSD, online, nearline, offline) and the use of various storage devices (e.g., disk, tape, optical) to provide these classes, you need to focus on segregating SCSI disk storage devices and subsystems depending on whether they are server, Wolfpack, or application driven. With application-driven requirements, you also need to determine whether your disk requirements are for raw or formatted (NTFS) drives (raw for Oracle Parallel Server and other business-critical applications that manage their own disk space, and formatted for all other applications). You can then apply optional data protection and availability schemes (i.e., RAID, disk mirroring) to these hierarchies as you see fit.
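This segregation lends itself to a simple inventory model. The sketch below, in Python, tags each drive by what drives its requirement (server, Wolfpack, or application) and, for application-driven drives, whether the drive is raw or NTFS; the drive names, field names, and format choices are illustrative assumptions, not output from any real configuration tool.

```python
# Hypothetical inventory for the storage hierarchy described above.
# Each drive is tagged by its driver (server, wolfpack, or application)
# and its format (raw vs. ntfs). All names here are illustrative only.
DRIVES = [
    {"name": "boot0",  "driven_by": "server",      "format": "ntfs"},
    {"name": "quorum", "driven_by": "wolfpack",    "format": "ntfs"},
    {"name": "ora1",   "driven_by": "application", "format": "raw"},   # app manages its own space
    {"name": "files1", "driven_by": "application", "format": "ntfs"},
]

def drives_for(driven_by, drives=DRIVES):
    """Group drives by class so protection schemes (RAID, mirroring)
    can be applied per class rather than drive by drive."""
    return [d["name"] for d in drives if d["driven_by"] == driven_by]

print(drives_for("application"))  # ['ora1', 'files1']
```

Grouping drives this way makes it straightforward to apply a different RAID or mirroring policy to each class later in the plan.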
Microsoft built Wolfpack Release 1 so that both servers in a Wolfpack pair operate in an active/active mode (for definitions of such terms, see "Clustering Terms and Technologies," June 1997). Each server therefore supports its own set of applications, day-to-day workloads, and licensed application copies, and must be ready to take over the other server's load if that server fails. This configuration requires a worst-case approach when you plan RAM, cache, and disk capacity for each server and storage subsystem.
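The worst-case arithmetic is simple but easy to skip: because either server may have to carry both workloads after a failover, you size each server for the combined load, not its own. The Python sketch below illustrates the calculation; the workload figures are hypothetical planning numbers, not vendor requirements.

```python
# Worst-case capacity sketch for an active/active pair: after a failover,
# the surviving server hosts its own workload plus its partner's, so BOTH
# servers must be sized for the combined load.

def worst_case_requirements(server_a, server_b):
    """Return the capacity each node must be sized for (sum of both loads)."""
    return {key: server_a[key] + server_b[key] for key in server_a}

# Example day-to-day workloads (RAM in MB, disk in GB; hypothetical figures).
node1 = {"ram_mb": 256, "disk_gb": 40}
node2 = {"ram_mb": 128, "disk_gb": 20}

print(worst_case_requirements(node1, node2))  # {'ram_mb': 384, 'disk_gb': 60}
```

The same logic applies to controller cache and licensed application copies: plan each side as if the failover has already happened.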
You also need to analyze which protection or availability scheme works best for each server and storage subsystem. No one scheme is universal, so plan for some type of partitioning or for multiple storage devices or subsystems.
In addition to these concerns, be aware that Wolfpack relies on a quorum drive (a dedicated drive, or spindle, that both servers share to store and retrieve quorum resource log information). This drive is a single point of failure: If it crashes or becomes corrupt, Wolfpack loses all its housekeeping information, and neither failover nor failback can occur. At a minimum, provide hardware mirroring on this drive, or consider backup-on-the-fly, 24-hour-per-day protection.
As a final consideration, the NT Server code (boot disk) and the Wolfpack application layer need to reside on a separate disk drive or subsystem for maximum availability and data integrity.
You can meet most of these requirements through the use of advanced SCSI interconnected RAID systems, which allow for multiple and independent configurations in each RAID chassis cabinet, and additional controllers for multiple data paths. These RAID systems provide multiple levels of component redundancy and fault monitoring, hot-swappable drives and other key components, and a configurable and expandable cache. Manufacturers such as Digital Equipment, Data General, and Symbios Logic provide these systems and are involved in Wolfpack to some extent.
In regard to storage devices, all drives (including the server boot drives) must be dual-ported SCSI or reside on a dual shared SCSI bus. This configuration lets you connect the drives to both servers and their controllers.
This drive and bus arrangement is not unique, but using it for clustering requires more attention to cabling and termination. Make sure that the performance, cache, and capacity of all components match. In addition, these drives and subsystems must match their controllers' SCSI type (e.g., fast-wide-differential, or FWD, drives to FWD controllers), including cabling, terminators, and backplanes. This compatibility requirement applies whether you use the cluster in shared-disk or shared-nothing mode. (As Wolfpack support for Fibre Channel matures, much of this requirement will go away; Fibre Channel will also bring attendant increases in data bandwidth.)
Database applications, such as Oracle Parallel Server, have proprietary file system and data management schemes (e.g., Oracle Enterprise Manager) and use unformatted drives that they format and partition using proprietary schemes. These devices must be dual-ported and use a shared-disk mode, where both servers have access to the same data files (and metadata). The applications also use a distributed lock manager (DLM) to prevent concurrent writes on the same file and to allow for synchronization and serialization on simultaneous reads. Screen 1 shows a Wolfpack cluster in shared disk mode for Oracle Parallel Server.
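The lock semantics a DLM enforces can be sketched compactly: any number of nodes may hold a shared (read) lock on a file, but a write lock is exclusive. The Python class below is a single-process illustration of those semantics only; a real DLM, such as the one Oracle Parallel Server relies on, coordinates these grants across cluster nodes, which this sketch does not attempt.

```python
# Single-process sketch of DLM file-lock semantics: shared reads, exclusive
# writes. Illustrative only; real DLMs arbitrate these grants across nodes.
class FileLockManager:
    def __init__(self):
        self.readers = {}     # file name -> count of shared (read) holders
        self.writers = set()  # file names currently held exclusively

    def acquire_read(self, name):
        if name in self.writers:
            return False      # a writer holds the file; the read must wait
        self.readers[name] = self.readers.get(name, 0) + 1
        return True

    def release_read(self, name):
        self.readers[name] -= 1

    def acquire_write(self, name):
        if name in self.writers or self.readers.get(name, 0) > 0:
            return False      # exclusive access requires no other holders
        self.writers.add(name)
        return True

    def release_write(self, name):
        self.writers.discard(name)

dlm = FileLockManager()
assert dlm.acquire_read("data.dbf")       # simultaneous reads are allowed...
assert dlm.acquire_read("data.dbf")
assert not dlm.acquire_write("data.dbf")  # ...but writes wait for readers
```

The key property is the asymmetry: reads coexist with each other but never with a write, which is exactly what prevents two servers from corrupting the same data file.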
Another important requirement in planning a Wolfpack integration is deciding how to manage data across the cluster. Data management includes load balancing and distribution, and backup and fault recovery. Several software providers are working closely with Microsoft and its hardware partners to develop cluster-aware versions of data management products. These products include integrated network management applications (e.g., CA Unicenter TNG) and backup and restore applications and utilities.
All these applications look beyond the cluster's single system image to manage each server in the cluster at the appropriate level. They also monitor and manage the load across the cluster, identify areas that need attention, and control each server during operations such as full and incremental backup and restore, and recovery of lost data. Finally, these products let systems administrators take down one server in the cluster for routine maintenance or software upgrades, or perform incremental or full backups on one server, without losing access to critical applications and data.
Have a Plan
Most pitfalls surrounding cluster implementation result from poor planning. To eliminate the potential for such catastrophes, I recommend that you develop a storage and data management plan. Your plan needs to consider
- Storage hierarchy layout and capacity planning (per drive or spindle)
- SCSI bus and cabling layout, ID numbering (higher IDs get higher priority, with ID 7 highest), naming convention, and termination planning
- RAID level, mirroring, and cache-size planning (read and write-back)
- Backup timetables and execution windows
- Defragmentation and file migration
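Several items in this checklist lend themselves to a mechanical sanity check before you power anything on. The Python sketch below validates a hypothetical shared-bus plan for three of the failure modes discussed earlier: duplicate SCSI IDs, drives that don't match the controller's SCSI type, and missing termination. The field names and the bus description are illustrative assumptions, not the format of any real planning tool.

```python
# Hypothetical sanity check for a shared SCSI bus plan: unique IDs, drive
# type matching the controller (e.g., FWD drives on an FWD controller),
# and termination at both ends of the bus. Field names are illustrative.

def check_bus_plan(bus):
    problems = []
    ids = [dev["scsi_id"] for dev in bus["devices"]]
    if len(ids) != len(set(ids)):
        problems.append("duplicate SCSI IDs on shared bus")
    for dev in bus["devices"]:
        if dev["type"] != bus["controller_type"]:
            problems.append(f"{dev['name']}: type {dev['type']} does not match controller {bus['controller_type']}")
    if not bus.get("terminated_both_ends"):
        problems.append("bus not terminated at both ends")
    return problems

bus = {
    "controller_type": "FWD",
    "terminated_both_ends": True,
    "devices": [
        {"name": "quorum", "scsi_id": 0, "type": "FWD"},
        {"name": "data1",  "scsi_id": 1, "type": "FWD"},
    ],
}
print(check_bus_plan(bus))  # []  (an empty list means no problems found)
```

A checklist you can run is a checklist that actually gets run; even a simple script like this catches the cabling and ID mistakes that otherwise surface as intermittent bus errors in production.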
Regardless of the Wolfpack solution you choose, you need extensive and common-sense planning up front to put these systems into production quickly and to realize their benefits.