Fault Tolerance on the Rise

The field of 24 x 7 fault-tolerant computer vendors isn't crowded. Compaq's NonStop Integrity systems, Sun Microsystems' variation of its SPARC system, and Marathon Technologies' Intel-based system for Windows NT are the main solutions. Into this small field, Stratus Computer is about to make a big move by marketing midrange and low-end Windows 2000 (Win2K) fault-tolerant computers. At Comdex/Spring 2000, Stratus unveiled its plans to release three systems.

This move is quite a switch for a $1 billion company that has made its mark in the high-end server market for more than 20 years selling the Stratus Virtual Operating System (VOS) and UNIX boxes. For the development and release of the new products, Stratus intends to follow Intel's processor roadmap—first offering products based on the 32-bit Intel Architecture (IA-32), then offering IA-64-based products after 64-bit processors become available.

Stratus' product lineup includes the ftServer 5200, the ftServer 6500, and an unnamed low-end product. The ftServer 5200 (code-named Melody) will come with two 550MHz Pentium III Xeon processors, 512KB cache, a maximum of 2GB of RAM, 10 PCI slots, a dual modular redundancy (DMR, i.e., two boards) or triple modular redundancy (TMR, i.e., three boards) configuration, and 48-disk RAID 1 storage. The product will cost between $30,000 and $35,000 and will be available in August 2000. The ftServer 6500 (code-named Liberty) will come with four 750MHz Pentium III Xeon processors, 1MB to 2MB cache, a maximum of 4GB of RAM, 10 PCI slots, a DMR or TMR configuration, and 48-disk RAID 1 storage. This product will cost between $40,000 and $45,000 and will be released in September 2000. Stratus hasn't released the details of the low-end server (code-named Tune). However, the entry price will be between $15,000 and $20,000, and the product will be available in the fourth quarter of 2000.

Stratus' Win2K systems are unique in several respects. Stratus duplicates or triplicates all the components of the system, then uses a clock to synchronize the separate motherboards. Each system operates as one system image that has proprietary cross-checking hardware. Because Stratus' computers operate as one system, they require only one OS license. Stratus builds these systems from off-the-shelf components but adds proprietary device identification, I/O hardware, and cross-checking application-specific integrated circuits (ASICs). In a DMR system, a proprietary algorithm compares the system state against a standard to check for failure. In a TMR system, the algorithm is simpler and polls the boards to determine whether one board doesn't match the other two.
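The TMR comparison described above can be illustrated with a generic two-out-of-three majority vote. The following Python sketch is only an illustration of that general technique: Stratus' actual cross-checking runs in proprietary hardware, and the function name tmr_vote and the idea of comparing one output value per board are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def tmr_vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], Optional[int]]:
    """Generic 2-of-3 majority vote (hypothetical helper, not Stratus' algorithm).

    Returns (agreed_value, index_of_dissenting_board).
    The index is None when all three boards agree.
    Both entries are None when no two boards agree.
    """
    if len(outputs) != 3:
        raise ValueError("TMR voting needs exactly three board outputs")

    # Find the most common output and how many boards produced it.
    value, count = Counter(outputs).most_common(1)[0]

    if count == 3:
        return value, None  # all boards match
    if count == 2:
        # Exactly one board disagrees with the other two; report which one.
        faulty = next(i for i, out in enumerate(outputs) if out != value)
        return value, faulty
    return None, None  # three different answers: no majority exists


if __name__ == "__main__":
    print(tmr_vote([7, 7, 7]))
    print(tmr_vote([7, 9, 7]))
```

With three boards, a single mismatch identifies the odd board directly. With only two boards, a plain comparison can show that the boards disagree but not which one is wrong, which is consistent with the DMR configuration needing a separate check of system state.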

Stratus has engineered numerous other fault-tolerant mechanisms into its computers. When an application running on a non-fault-tolerant system fails, the Windows OS dumps the system state to disk and reboots. Stratus' systems keep the system state in persistent memory so that when a system fails, the reboot occurs from system memory. The system performs a session dump after the reboot so that the administrator can analyze the results.

Because 50 percent of Win2K's blue screens result from device-driver failures, Stratus has tried to harden the drivers. Stratus employs rigorous board and driver certification that requires driver source code and often asks that drivers have checkpoints to diagnose driver and board conditions, such as ID number, revision number, and errors. Stratus also created a hardware protection boundary that isolates the driver and causes the device, instead of the system, to fail. When the system detects a driver writing to protected memory, it intercepts the call, takes the peripheral offline, and switches over to a backup. To address the 25 percent of blue screens that result from DLL failures, Stratus wrote an add-on DLL manager software package.

Each Stratus system has a dial-up modem connection to a Stratus service center that knows the identity of every hardware and software component in the system. The systems also include diagnostic software and hardware that Stratus' worldwide service centers manage remotely. As a result, Stratus can offer a system that guarantees customers 99.999 percent availability, which corresponds to about 5 minutes of downtime per year. Bringing down a Stratus system requires severe operator error, and many of Stratus' UNIX customers have run for years without a failure. Stratus charges 15 percent of a server's cost annually to monitor it, and more than 90 percent of the company's customers buy this service.
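The 5-minute figure follows from simple arithmetic on the 99.999 percent guarantee. The short Python sketch below shows the calculation. It assumes a 365-day year, and the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Convert an uptime guarantee (in percent) to allowed downtime per year."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # "Five nines" allows a little over 5 minutes of downtime per year.
    print(round(downtime_minutes_per_year(99.999), 2))
```

For comparison, the same calculation for 99.9 percent gives roughly 526 minutes, or almost 9 hours, of downtime per year.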

When a device fails, a Stratus service center automatically takes the device offline and ships a replacement to the customer overnight. Stratus claims that often the first time a customer knows a component has failed is when the replacement arrives. Additionally, the configured components show a green, yellow, or red status light. Therefore, an untrained person can go to the failed component, pull it out, and insert the replacement.

David Flawn, Stratus' vice president of worldwide business development, explained that Stratus' systems provide a strong fault-tolerant alternative to other vendors' clustered Win2K systems because clusters are difficult to set up, difficult to maintain, and require specially written software to run. Flawn said, "The ftServers are fault-tolerant hardware and provide full compatibility with standard software."

By entering the Win2K market with low price points for its ftServers, Stratus will compete with mainstream server vendors such as IBM, Compaq, Hewlett-Packard (HP), and Dell. Steve Kiely, Stratus president and CEO, anticipates that many deployments, such as those at application service providers (ASPs) and ISPs and those running Microsoft Exchange Server and Microsoft BackOffice, can benefit from mission-critical hardware. Stratus customers can also move UNIX applications to less-expensive Win2K systems.

The Stratus systems will be among the first systems to receive Windows 2000 Datacenter Server (Datacenter) certification. Stratus has high hopes for the ftServer systems and expects to sell a significant number of systems in the next year. This volume will change the sales model for the company and give it a dramatically bigger profile in the market. Stratus will sell its servers through channel partners and might seek distribution through OEMs. Stratus might become an important competitor in the Windows server space.

