Digital Clusters for Windows NT

What are 99.9% PC/LAN server up-time and availability worth to you? More to the point, can you afford to bet your business on Windows NT?

Many companies have their LAN, databases, and all other business functions on NT systems. But companies such as financial institutions question whether NT is ready for prime-time, mission-critical applications. When you rely on computers for your accounting, product development, human resources management, data management, and now sales through the Internet, your systems must be operational 24 hours a day, seven days a week. Failure is not an option.

Clustering, which has been around in Unix and VMS for more than 10 years, is one technology for achieving near 99.9% server up-time. By letting you duplicate a mission-critical system, this technology guarantees availability, so you can bet your business on your OS.

Now clustering is coming to NT. Although this technology is not on the grand scale of its Unix or VMS predecessors, clustering offers functionality heretofore unknown to PC operating systems and represents a big step for NT toward availability worthy of those major-league, mission-critical enterprise applications. By having two computers instead of just one to support a task, you double your chances for meeting the goal of 99.9% server up-time.

Clustering 101
Before I get into the specifics of Digital's cluster solution, let me explain some clustering terminology: load balancing, primary server, failover (or secondary) server, failover, and failback. You can set up each server so that all five terms apply to it.

For example, suppose you use SQL Server for your accounting and order fulfillment departments and you have two databases that you want to protect by implementing a cluster. In a single-cluster environment (two servers), you can manually load balance--divide the work between the two servers--by installing SQL on both machines. Make one the accounting database's primary server--the system with principal ownership and management responsibility for a resource--and the other system the ordering database's primary server. Then, set up each system to have a primary disk (or disks) on the shared storage array (a chassis housing shared disk drives where cluster software stores and shunts data between systems). This disk will serve as the database device. So far, this configuration is no different from setting up two independent servers, except that the shared disks are on a subsystem physically connected to both servers.

Now, you set up the cluster by configuring each machine to be the other machine's failover (secondary) server--the system that will inherit ownership and responsibility for a resource. When one system (the primary server) goes down, its resources fail over--relocate from the faulty system to the operational one--and the service (such as a database) keeps running on the failover server. When the primary server comes back online, the service fails back--cluster resources automatically migrate from the failover server back to the primary server.
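
To make these roles concrete, here's a minimal sketch (in Python, with hypothetical server names) of how ownership of a resource moves during failover and failback. This is purely illustrative--Digital's actual logic lives in kernel-mode cluster software, not in application code like this:

    # A minimal sketch of failover/failback for one resource. The class and
    # server names are hypothetical; Digital's logic lives in kernel-mode
    # cluster software, not in application code like this.

    class ResourceGroup:
        def __init__(self, name, primary, secondary):
            self.name = name
            self.primary = primary      # server with principal ownership
            self.secondary = secondary  # server that inherits on failure
            self.owner = primary        # current owner of the resource

        def fail_over(self):
            """The primary went down: the secondary inherits the resource."""
            if self.owner == self.primary:
                self.owner = self.secondary
                print(f"{self.name}: failed over to {self.secondary}")

        def fail_back(self):
            """The primary is back online: the resource migrates home."""
            if self.owner == self.secondary:
                self.owner = self.primary
                print(f"{self.name}: failed back to {self.primary}")

    # Each server is the other's failover server, as in the SQL example:
    accounting = ResourceGroup("accounting-db", primary="NODE1", secondary="NODE2")
    ordering = ResourceGroup("ordering-db", primary="NODE2", secondary="NODE1")

    accounting.fail_over()   # NODE1 dies: NODE2 now runs both databases
    accounting.fail_back()   # NODE1 returns: accounting migrates home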

The failover server is not just a cold standby server (as with Novell): The server performs meaningful work and provides more than disk mirroring or single-system availability through hot-swappable disks. The open architecture of both the software and the off-the-shelf hardware means that you have scalability built in. You can add disk storage almost ad infinitum, and you can add functionality with more CPUs and peripherals such as printers and tape drives.

Digital's Configuration
Digital Clusters for Windows NT consists of two servers, a network connection, cluster software, and an external disk array with SCSI adapters. (Although the 1.0 product release supports hardware-based RAID, the Digital BA356 storage subsystem doesn't. A future product release will have a built-in RAID 5 controller. Also, version 1.0 does not support software-based RAID through NT.) A key feature of this clustering solution (Digital will contribute this feature to Microsoft's Wolfpack standard--Mark Smith explains Wolfpack in "Closing In on Clusters," page 51) is that Digital's clustering can use off-the-shelf hardware for disks, network cards, and SCSI controllers.

Digital will officially support only its listed hardware (AlphaServer 1000, 1000A, 400, 2000, 2100, 2100A, 4100, Prioris ZX Pentium, Prioris ZX Pentium Pro, Prioris HX, Prioris XL), but the software works on other systems, too. You can use any two servers running NT Server 3.51 with Service Pack 4, but the clustered CPUs must be the same. You can't mix Intel with Alpha because of differences in how the NT File System (NTFS) handles file tags (information on permissions, groups, etc.) and page logs on Intel and RISC platforms. The two clustered systems don't have to be similarly configured (one can be a dual Pentium and the other a quad Pentium Pro), but on each machine, you have to install the same software (SQL Server, Oracle7 Workgroup Server, or any other application) you intend to fail over from one system to the other.

The disk array is a BA356, which is part of the Prioris kit you buy from Digital, without disk drives. This standard external storage chassis has a multichannel-capable, Fast and Wide, differential SCSI-2 backplane--you can have as many SCSI channels on it as you have drives and controllers in your two servers. You can set up the disk array to be either in the middle of the SCSI chain between the two servers or at the end. Where you put the array depends on whether you leave the terminators installed and whether you use what Digital calls a trilink SCSI adapter. This adapter is a Y connector from the disk array to the two servers. You can order a standard cluster kit from Digital that comes with cables, terminators, and an Adaptec 2944W Fast and Wide differential SCSI-2 controller for each server.

The network connection is just a medium for a heartbeat between the two machines. The heartbeat lets each machine know the other is alive. If one disappears, the failover begins, and the remaining system takes over all assigned functions.

This connection can be either a dedicated direct link with a basic 10Mbit Ethernet card or your usual high-speed LAN connection. Beware of using your usual LAN, because your domain controller and competition for your Ethernet media can introduce extra delays that add to the 20- to 30-second failover time. Also, a failure in the part of your network between the clustered machines will initiate a failover: Each cluster machine will think the other is dead, so the clustering software on each server will drop ownership of the disks and leave them offline to prevent data corruption. Digital recommends a direct, standalone connection between the two servers for best performance.
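
To make the heartbeat concrete, here's a minimal sketch of how such a monitor could work. The address, port, interval, and miss limit are assumptions for illustration; the 20- to 30-second failover time suggests a timeout in that range, but Digital doesn't publish the actual protocol:

    # A minimal sketch of a cluster heartbeat monitor. The address, port,
    # interval, and miss limit are illustrative assumptions, not Digital's
    # actual protocol, which runs inside the cluster software.
    import socket
    import time

    PEER = ("10.0.0.2", 5000)   # the other cluster member, on the dedicated link
    INTERVAL = 1.0              # seconds between heartbeats (assumption)
    MISS_LIMIT = 20             # ~20 seconds of silence before declaring it dead

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 5000))
    sock.settimeout(INTERVAL)

    missed = 0
    while missed < MISS_LIMIT:
        sock.sendto(b"alive", PEER)     # tell the peer this node is up
        try:
            sock.recvfrom(16)           # the peer's heartbeat arrived in time
            missed = 0
            time.sleep(INTERVAL)
        except socket.timeout:
            missed += 1                 # a missed beat; keep counting

    # Silence can mean the peer died -- or the link did (a partition).
    # That ambiguity is why the software drops the shared disks offline
    # rather than letting both servers grab them and corrupt the data.
    print("Heartbeat lost: initiating failover")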

The cluster software is where all the magic occurs. This software acts as a shim--a layer of new code added without disrupting existing OS code. The shim gives the SCSI drivers and the network layers in the OS the means to carry out the clustering capabilities. The software also has an administration tool for setting up drives, failover scripts, and other characteristics of the cluster (such as its network alias and administrator login and password). The logic behind the cluster's operation is complex, but the user and administrator aspects are simple.

The Technology
Let's get down to the business of understanding how Digital's clustering works. To the user, a cluster of two computers and a disk array appears as one cluster alias with shares. A new network path for Digital Clusters for Windows NT shows up in your browse list, as you see in Screen 1. Users connect to the alias instead of directly to each server. For the users, that's all there is to it--they won't even know the names of the two servers.

Digital Clusters for Windows NT follows ideas such as objects and groups that are already in NT. An object refers to a server, a disk, the cluster alias, or failover scripts and shares. Groups are where you assign objects that will fail from one system to the other. Although these groups aren't the same ones you need in user administration, the concepts are the same. For example, you create a cluster group, assign it a primary drive, create the shares, and assign applications such as SQL Server. You assign script objects to cluster groups, and these objects control what happens when a system fails. (This idea can be confusing because you assign the script to the server with primary control, but the script runs on the server that remains after a failure!)
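
As a rough picture of how objects and groups relate, the sketch below describes one failover group as a data structure. Every name is hypothetical--in the real product you build groups through the GUI administration tool, not a configuration file:

    # A rough data-structure view of one cluster group and its objects.
    # All names are hypothetical; in the real product you build groups
    # through the GUI administration tool, not a configuration file.

    cluster_group = {
        "name": "accounting",
        "primary_server": "NODE1",        # owns the group in normal operation
        "failover_server": "NODE2",       # inherits the group on a failure
        "objects": {
            "disks": ["shared-disk-0"],   # a drive on the shared storage array
            "shares": ["ACCTDATA"],       # shares users reach via the alias
            "applications": ["SQL Server"],
            # The script object is assigned to the group (and thus to the
            # primary server), but it runs on the server that survives.
            "failover_script": "notify_and_restart.cmd",
        },
    }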

The cluster wizardry is in the dynamic link libraries (DLLs) you install on the server and client. On the server side of a basic cluster, you have a SCSI storage shim and a network shim. The SCSI shim lets the two servers reside on the same physical SCSI bus without arguing over bus and disk ownership. The servers can't share the drives simultaneously; otherwise, they would corrupt the data. Instead, a primary server failure causes the disks to go offline, and control shifts from one server in the cluster to the other. The network shim on the servers lets you create the cluster alias that users reference through the new cluster domain.

On the client side, a DLL lets the system see the alias and treat both servers as one. Without the client software, you can see only individual servers, and the cluster alias doesn't appear in your network browse list when you try to connect to its network shares.

Failover Manager software on each server monitors the other and manages access to shared resources. The Failover Manager uses the network and storage shims and the Cluster Failover Manager Database (CFMD) to orchestrate the policies regarding the failover of cluster groups (and included objects). Figure 1 shows the Failover Manager architecture. Each machine contains complementary information that lets the Failover Manager manage the process of moving resources back and forth.

Another component of the cluster is the failover script. A script is a command that you assign to the primary server but that runs on the failover server. Although you don't use a scripting language (you enter commands as you do from a command prompt), you can initiate multiple actions when a server fails or comes back online. One example is a net send command that issues a network message to all users about the system failure (or system recovery). You can also execute an application that performs certain administrative functions. Such an application can be useful for failing over applications that the clustering software doesn't support, or for failing over IP addresses if you are running Microsoft's Internet Information Server (IIS) or Exchange on the cluster. Screen 2 shows the script administrator.
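
For instance, a failover script might launch a small applet like the following sketch. NT's net send command is real; the service name and message text are illustrative assumptions:

    # A minimal sketch of an administrative applet that a failover script
    # might launch. NT's "net send" command is real; the service name and
    # message text are illustrative assumptions.
    import subprocess

    def on_failover(failed_server, surviving_server):
        # Broadcast a network message to all users about the failure.
        subprocess.run([
            "net", "send", "*",
            f"{failed_server} is down; services moved to {surviving_server}.",
        ])
        # Start a service the cluster software doesn't directly support,
        # so it comes up on the surviving server.
        subprocess.run(["net", "start", "MyUnsupportedService"])

    on_failover("NODE1", "NODE2")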

Installation and Configuration
When you buy Digital's cluster kit, you get the cluster server and client software, SCSI adapters, cables, terminators, and external storage cabinet; you must buy the servers, the disk drives, and NT Server separately. Although you have several server options, make sure your configuration lets you manually load balance your cluster and has enough power to handle the secondary load when one system goes down: Don't use a single-processor 100-MHz Pentium system as the failover for a quad-processor 200-MHz Pentium Pro.

Setting up a cluster is easy; configuring actions upon failure such as SQL database failover is more complicated. To help, Digital provides a GUI administration tool that lets you set up your cluster objects and groups, create failover scripts, etc. Screen 3 shows the GUI tool. NT's standard File Manager (which you can access from the Tools menu) lets you establish network shares for whole drives or directories. All file system attributes and security remain intact, and you can manage them as you do for any nonclustered drive.

Making failover groups, assigning drives, and entering scripts are all point-and-click (and drag) operations, with little typing necessary even for the scripts. For example, when you install a drive, you tell the cluster which bus the drive is on (because your server now has more than one SCSI bus), assign the drive to a primary server and a failover server (and run Disk Administrator from the primary server to format it, etc.), and put the drive in a group. The cluster drives must be on a bus separate from the drive or drives containing the OS and application software.

The GUI administration tool presents three views of cluster resources: system, cluster, and class. The system view lets you see the cluster from a physical hardware perspective (system names, SCSI adapters, and disks). The cluster view shows you the cluster from a failover group perspective (defined groups, included disks, applications, etc.). The class view presents the cluster from the perspective of available cluster objects, without regard to physical location and grouping, and shows all objects such as group lists, SQL objects, and scripts.

You can use the GUI administration tool to perform manual failover for administration purposes when you are servicing the machine and need to take it offline, and for manual failback. You can disable failover and failback entirely if you expect the system to be up and down several times in a short period.

You can install client software to support automatic failover for any Windows for Workgroups, Windows 95, or NT client. Both the server and client components support common protocols (TCP/IP, Internet Packet eXchange/Sequenced Packet eXchange--IPX/SPX--and NetBEUI) with hooks for Simple Network Management Protocol (SNMP) server/cluster management through Digital's ServerWORKS Manager 2.0. This capability gives you failover of NTFS shares, SQL 6.5 and Oracle7 Workgroup Server 7.1 and 7.2 databases, and any applications that you launch or close with a script. Note, however, that Digital clustering does not support failover for DOS, OS/2, or Mac OS clients. Users on such systems can still connect directly to the servers and access network shares, but they will have to manually reconnect to the remaining server after a primary failure.

You can upgrade an existing server installation for clustering or start from scratch. Each server has its own primary disk for the OS, applications, and so forth, and only data is on the shared drives. Whether you upgrade or set up a new cluster, you need to install the SCSI controllers in each system, set up a separate direct network connection (recommended over using your LAN) between the servers, and connect the storage subsystem. The software uses InstallShield on both the server and the clients, so installation is easy; Digital provides free software licenses for as many clients as you need.

When you set up a cluster, it has its own administrator login and password, separate from the domain accounts (although you can match your domain administrator login, the separate login and password add a level of security). However, all domain user accounts still function, and each machine keeps its own administrator login.

Application and database failover configuration depends on the program, such as SQL Server 6.5 or Oracle7 Workgroup Server 7.1. SQL 6.5 includes stored procedures that support clustering, so you can set up a primary database server and a failover directly through the database application.

This first product release of Digital Clusters for Windows NT does have a few hitches. For clustering to work on the client side, users must access the server through a network redirector modified by a network shim (a kernel-mode component). So after you install the cluster server and client software, users must enter the common cluster name for failover to function. The client Name service intercepts user universal naming convention (UNC) requests for cluster resources and sends them to the server Name service, which translates the cluster alias into the UNC server address. The server Name service passes the UNC address of the cluster member that is exporting the resource back to the client Name service, which in turn passes the UNC back to the client redirector.
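
In outline, the translation works like the sketch below. All names are made up, and the real Name services are components of the cluster software, not Python functions--this only illustrates the alias-to-UNC mapping just described:

    # An outline of the alias translation described above. All names are
    # made up; the real Name services are components of the cluster
    # software, not Python functions.

    # Server-side view: which cluster member currently exports each share.
    EXPORTS = {
        ("NTCLUSTER", "ACCTDATA"): "NODE1",   # accounting share on NODE1
        ("NTCLUSTER", "ORDERS"): "NODE2",     # orders share on NODE2
    }

    def resolve_unc(path):
        """Translate \\\\alias\\share into the UNC of the exporting server."""
        alias, share = path.lstrip("\\").split("\\", 1)
        server = EXPORTS[(alias.upper(), share.upper())]
        return f"\\\\{server}\\{share}"

    print(resolve_unc(r"\\NTCLUSTER\ACCTDATA"))   # -> \\NODE1\ACCTDATA
    # After a failover, the server-side mapping changes, and the same
    # alias resolves to \\NODE2\ACCTDATA -- users never see server names.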

This process lets a user connect to a share through an alias rather than directly to a specific server. If users access a server resource directly (through the Connect Network Drive option and browse list) and the system fails, they lose the connection as if no cluster existed. The only way around this problem is to educate the users not to connect directly to the servers.

Performance
You can perform manual load balancing by assigning specific database work or file services to specific servers and disks. Digital Clusters for Windows NT 1.0 doesn't support dynamic load balancing (where the cluster uses any available hardware for processing overflow, regardless of the administrative setup). However, you will see dynamic load balancing within the next couple of years and automatic IP failover for Web and email applications in version 1.1. You can script any applications that the cluster software doesn't directly support, but here are some caveats: If you are clustering an application server, you have to design applets (that a script executes) to manage your applications during a failover. Even then, users can still experience service interruptions. For now, you are better off leaving any compute work on your clients and data on the cluster servers.

When a failure occurs, users will experience a 20- to 30-second delay during the failover (you can adjust this setting on the server). Local applications will keep running on the client, and a well-behaved, or cluster-aware, application will at most give a message such as "Network connection not ready: retry?"

Windows NT Magazine Lab ran several tests using a Prioris ZX5133MP/4 (quad-processor 133-MHz Pentium) and a ZX6166MP/2 (dual-processor 166-MHz Pentium Pro). In our tests, SQL failed over without complaint, and a test application (from a Digital demo CD) halted for about 30 seconds and then continued. We didn't test performance because the machines weren't set up for it. Digital says failover time will be the same no matter what hardware you run the cluster on.

So what data is lost during a failure? It depends on what you're doing. In a database, the application might roll back the logs to reconstruct the data, and you will lose any cached information from a query (especially if you're in the middle of a query at failure). An application such as Microsoft Word (if run from the client only) will pause and perhaps display the "retry connection" message; however, if you're in the middle of a save, you can lose your changes. If Word runs from the server, the program will crash entirely.

Other problems occur with nonsupported applications on the clustered servers. If a client runs an application connected to a cluster resource that contains open files or named pipes--an interprocess communication mechanism--a failover will break the pipe. Reads or writes to the cluster resource will fail if the client application doesn't handle the I/O properly. The individual client application is responsible for handling the error, then closing and reopening the file or named pipe on the failover server. A properly written program running on a client connected to the cluster alias automatically performs these steps, transparently to the end user. This sequence of steps maintains file system integrity.
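
Here's a hedged sketch of what handling the I/O properly might look like from the client's side. The retry budget and names are illustrative assumptions; a real NT client would work with Win32 handles and error codes rather than Python file objects:

    # A sketch of the close-and-reopen retry that a well-behaved client
    # performs when failover breaks its handle. The retry budget and path
    # are illustrative; a real NT client works with Win32 error codes.
    import time

    FAILOVER_WINDOW = 30   # seconds -- roughly the failover delay above

    def read_with_retry(open_file, path):
        """Read from a cluster share, reopening the file if failover breaks it."""
        deadline = time.time() + FAILOVER_WINDOW
        while True:
            try:
                return open_file.read()
            except OSError:                 # broken pipe or lost handle
                if time.time() >= deadline:
                    raise                   # failover took too long; give up
                open_file.close()
                time.sleep(2)               # wait for the surviving server
                # Reopen through the cluster alias, never a server name, so
                # the handle lands on whichever member now owns the disk.
                open_file = open(path, "rb")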

The clustering software automatically disconnects users from the failed server and reconnects them to the remaining server, so failover and failback don't affect the user much. For situations other than those described above, Digital makes no provisions for preserving application context. You may get dumped out of the application and have to reconnect to your file, but this situation is no worse than when a single server goes down--it's just a whole lot faster and doesn't require manual administrator intervention.

Wolfpack
Digital is working closely with Microsoft and other vendors to develop a clustering standard code-named Wolfpack. In fact, Digital and Microsoft are working so closely that Wolfpack and even NT Server will directly incorporate some of Digital's concepts. Digital's products will include all the Wolfpack standards plus some of Digital's own enhancements. Figure 2 illustrates Digital's Wolfpack strategy. Digital will provide a wizard to help its NT cluster customers migrate to Wolfpack after its release. If any functionality in Digital's product doesn't make it into Wolfpack, Digital will provide that functionality as a low-cost add-on called the NT Cluster Plus Pack.

Digital's goal in this cooperation is to avoid leaving customers out in the cold if they go with Digital's first release. Digital Clusters for Windows NT 1.0 will interoperate and scale according to the Wolfpack APIs and standards, so customers will be able to upgrade cleanly--without losing their investment in existing hardware and software.

What makes Wolfpack so special when clustering has been around for so long? The PC/LAN environment has never had such functionality in a nonproprietary, hardware-independent, and inexpensive standardized technology, which is what Wolfpack will be. Other availability solutions such as Novell's NetWare SFT III require specific software (and sometimes hardware) that not everybody supports. When you own the OS (e.g., NetWare, VMS), you can easily design whatever you want into it and be proprietary about the technology. Standardizing and supporting end users in any configuration they want is not easy if you also want to provide an upgrade and product-interoperability path. Wolfpack aims to achieve this goal, and Digital is betting heavily on its success.

Present and Future
Digital is following the Unix and VMS path toward full NT cluster functionality. Digital's VMS clustering product is the roadmap for its NT-based clustering products. Digital Clusters for Windows NT 1.0 is the first product release. Digital will introduce RAID support and ServerWORKS Manager 2.0 integration in a third-quarter update. Version 1.1 will include NT 4.0 support and IP failover for Lotus Notes, Microsoft Exchange, and IIS.

Digital is aiming its NT-based clustering at mission-critical environments, such as finance, medical, and utilities, which need near 99.9% server up-time. In an operating environment that needs basic failover capabilities for SQL Server and file services, Digital has hit its mark. Digital's next challenge is to deliver products that address the remaining issues of dynamic load balancing and IP failover.

However, phase 1 is an excellent start. Because Digital Clusters for NT is hardware independent and relatively inexpensive when compared to other NT hardware vendor solutions or Novell, it's already a major step toward reaching the Wolfpack goals.

Digital Clusters for Windows NT
System Requirements: Per server: network card, SCSI adapter, NT Server 3.51 with Service Pack 4. Per cluster: external storage system, SCSI-2 disks
Digital Equipment * 800-354-9000 or 800-344-4825
Web: http://www.windowsnt.digital.com/clusters/default.htm or http://www.digital.com (To find a local reseller)
Price: Software: $995 per server; Prioris Kit: $3000 - $4500