Floods, fires, earthquakes, power outages, and software and hardware failures are reminders of why disaster readiness and recovery are so important. Maintaining business continuity in the face of this adversity could mean the difference between weathering the storm and going out with the lights.
Enterprise IT groups know and handle this challenge well, but it can be quite difficult for smaller organizations to meet 99.0 percent uptime requirements, let alone 99.999 percent. Cost and complexity barriers keep many businesses from trying high-availability solutions at all, forcing IT staff to use manual, administrator-intensive detection, remediation, and recovery processes.
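To put those percentages in perspective, the short Python sketch below converts an uptime target into a yearly downtime budget. It assumes a 365-day year and treats all downtime as unplanned; the function name is mine, chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


# Compare the targets mentioned in this article.
for target in (99.0, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:,.1f} minutes of downtime per year")
```

Under these assumptions, 99.0 percent uptime leaves room for roughly 87.6 hours of downtime per year, whereas 99.999 percent leaves only a little more than 5 minutes.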
Many forms of high-availability solutions exist today, ranging from software-based solutions to mission-critical solutions that offer hardware-level redundancy and failover. The trick is to pick the right one for your organization, thereby achieving the desired availability without breaking your IT budget. As with network security, the more you can afford the better off you'll be, but there is a tipping point at which you're throwing good money after bad. In other words, your particular business might not require extreme measures. I recently took a look at Stratus Technologies' Avance high-availability software, one of the midrange solutions that can deliver availability at near-enterprise levels, but without the million-dollar outlay.
CIOs often call on systems administrators to reduce costs but still boost IT reliability. Administrators in small-to-midsized businesses (SMBs) tend to feel this crunch more acutely, because delivering fault tolerance can more than double the cost of the existing infrastructure for backup servers, redundant networking, and so on. Although native technologies in Windows Server are capable of getting you part of the way there, they fall short of the instantaneous failover that's needed for demanding workloads -- and demanding CIOs.
Stratus aims to solve this conundrum through a hardware-agnostic, yet not entirely hardware-independent, software-based availability package for SMBs. Stratus has made its name in enterprise-class high-availability solutions for more than 30 years, keeping the lights on 24x7 for critical human services, such as 911 call centers, hospitals, and utilities.
Avance combines a software offering with proactive management (which can even be monitored by Stratus remotely) and hardware redundancy. An Avance high-availability cluster provides near-zero failover and recovery times, with near-zero client impact (including stateful applications) using real-time monitoring and data replication. If you're running a heterogeneous environment, you'll also appreciate Avance's support for Linux server platforms (e.g., Red Hat, CentOS) and applications. Avance uses CentOS 5.5 and Citrix Systems XenServer virtualization technologies to abstract hardware from software, providing a foundation for transparently migrating OS and application workloads between physical systems in the event of a failure.
You can use most off-the-shelf server, networking, and storage hardware, as long as any two systems you cluster are similar enough that a hardware mismatch doesn't cause driver problems (and thus a crash). In addition, both machines must use the same RAID configuration. One benefit of this clustering approach is that you don't need to purchase a dedicated storage array for data, because replication between servers occurs over the wire.
The downside is that you still need an equivalently configured second server as a hot standby. Note that you won't have an active-active performance cluster. For more information, see the sidebar "How Avance Works."
For expediency, I started with two white-box Intel servers, which were supplied by Stratus. Each server had an S5520UR motherboard, dual quad-core Xeon X5560 processors, 24GB of memory, and 2TB of disk space. You can gain additional hardware resiliency if you select a chassis with hot-swappable components (e.g., CPU, RAM), RAID controllers, redundant power supplies, failover NICs, and so forth; doing so will reduce the likelihood of a single-server failure. This isn't required, however, because the solution's real-time monitoring includes more than 150 different metrics and predictive analytics that will trigger a live migration if a fault is either detected or about to take place.
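The idea of migrating on a detected or predicted fault can be illustrated with a minimal Python sketch. This is not Avance's actual logic: the metric names, the warning and failure thresholds, and the functions are all hypothetical, and a real product would weigh far more than three metrics.

```python
# Hypothetical metrics and thresholds, chosen only for illustration.
THRESHOLDS = {
    "cpu_temp_celsius": {"warn": 80.0, "fail": 95.0},
    "disk_reallocated_sectors": {"warn": 10, "fail": 100},
    "ecc_errors_per_hour": {"warn": 5, "fail": 50},
}


def assess(sample: dict) -> str:
    """Classify one metric sample as 'ok', 'predicted_fault', or 'fault'."""
    status = "ok"
    for name, value in sample.items():
        limits = THRESHOLDS.get(name)
        if limits is None:
            continue  # ignore metrics that have no thresholds defined
        if value >= limits["fail"]:
            return "fault"  # a hard failure outranks any warning
        if value >= limits["warn"]:
            status = "predicted_fault"  # trending toward failure
    return status


def should_migrate(sample: dict) -> bool:
    """Migrate the workload on either a detected or a predicted fault."""
    return assess(sample) != "ok"
```

In this sketch, a reading between the warning and failure thresholds counts as a predicted fault, so the workload would move before the component actually fails.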
Your dual-server configuration doesn't need to be any different from your standard build, with the exception of a dedicated gigabit Ethernet port on each machine for management and data replication, which is referred to as the "Sync" link. A single link is sufficient, but Stratus recommends redundant Sync links to improve performance and fault tolerance. The servers can also be completely headless (after initial setup), because all maintenance operations are performed through a web-based console.
Avance installation is straightforward and uses a self-imaged DVD. It automates setup for both servers through a single process, but you should reformat the machine if you're repurposing older hardware. (You can't change out the hardware on an existing OS platform build or migrate it from another machine unless it's identical hardware and already virtualized.) Adding the second machine to form a cluster is achieved by a fast software install driven from the primary node. When you join the second server to the cluster, an automated synchronization process images and configures it.
Avance's instant data replication between nodes means each server is always up to date. When a hardware failure, predicted failure, or planned shutdown occurs, the second machine simply picks up where the first one left off. This lets you carry out whatever maintenance is required on the first node without a service interruption. When you're done, you can manually flip the workload back to the first node or leave the workload on the second node, letting it migrate back to the first node only if a failure is detected on the second one.
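The failover and failback behavior described above can be modeled as a small state machine. The Python sketch below is a toy model under my own assumptions, not Avance's implementation; the class, method, and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TwoNodeCluster:
    """Toy model of an active/standby pair (illustrative only)."""
    nodes: tuple = ("node0", "node1")
    active: str = "node0"
    healthy: dict = field(default_factory=lambda: {"node0": True, "node1": True})

    def standby(self) -> str:
        """Return the node that is not currently running the workload."""
        return self.nodes[1] if self.active == self.nodes[0] else self.nodes[0]

    def take_offline(self, node: str) -> None:
        """Model a failure, predicted failure, or planned shutdown of one node."""
        self.healthy[node] = False
        if node == self.active:
            peer = self.standby()
            if not self.healthy[peer]:
                raise RuntimeError("no healthy node left to run the workload")
            self.active = peer  # the standby picks up the workload

    def bring_online(self, node: str) -> None:
        """A repaired node rejoins as standby; the workload stays where it is."""
        self.healthy[node] = True

    def fail_back(self) -> None:
        """Optional manual step: move the workload to the other node if healthy."""
        peer = self.standby()
        if self.healthy[peer]:
            self.active = peer


cluster = TwoNodeCluster()
cluster.take_offline("node0")   # workload moves to node1
cluster.bring_online("node0")   # node0 returns as the standby
cluster.fail_back()             # administrator chooses to flip back
```

The point of the model is that failback is a separate, optional step: bringing the repaired node online never moves the workload by itself.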
Each virtualized server workload can be locked down and protected with anti-malware solutions and the like, but host-level intrusions are bad news. XenServer isolates virtual machines (VMs) from each other, and Stratus has invested in hardening the host configuration.
For additional protection, you could deploy a full application-layer firewall and place your servers in a demilitarized zone (DMZ), which is a common topology. Alternatively, you could deploy a dedicated security VM through which all traffic gets routed. However, a bare-metal hypervisor with no native application operations and a separate management server would provide a better overall (albeit more costly) security posture. The Avance console has an inactivity timeout feature but lacks any token-based or multi-factor authentication capabilities.
Every task in Avance is possible through the web management console, saving you from having to sit in the wiring closet with the machines. It's also convenient if you employ a services management vendor to remotely maintain your IT infrastructure. Although Avance doesn't natively support IP filtering to restrict administration to specific IP addresses, requiring access through a VPN and firewall provides similar protection.
Using an easy-to-understand layout, the UI gives you quick access to Avance's default dashboard, which provides alerts, configuration details, and drilldown pages for managing both physical and virtual cluster attributes. You also have quick access to pages in which you can manage physical machines, set up storage groups and volumes to dedicate resources to specific workloads, lay out virtual networks, manage users, build VMs, and more. Most operations are driven by easy-to-use wizards that automate the tasks.
In keeping with the fully virtualized nature of the solution, you can create virtual CD installation points accessible by specific VMs, as Figure 1 shows. They can be used as either direct copies of .iso software media or downloadable installs by both servers and virtualized desktops. Although it might not be advisable from a security point of view, you can make physical components such as USB storage available to individual workloads.
Avance actively monitors a variety of categories, enabling a full range of fault detection, whether physical or virtual. As with some out-of-band (OOB) management solutions, predictive filters can help identify when something bad is about to happen, instead of just waiting for a failure. With this fair warning, you can get ahead of the problem before a catastrophic event occurs that even Avance can't handle. That said, if you're using the right combination of metrics, which depend on the specific hardware and OS, I'm not sure what such an event could be.
To test Avance, I did a number of disagreeable things to the servers. I removed network cables, unplugged the power cord, killed VMs, and so forth. I even went so far as to hard-crash both machines at the same time by yanking out all power cords, even to the redundant power supplies (causing them to emit a variety of plaintive beeps). Impressively, nothing bad ever seemed to happen. Killing one entire server produced a warning in the console, as Figure 2 shows, but neither the management application nor the workloads (such as the Remote Desktop Services session) seemed to notice.
The VMs seamlessly kept going. When I brought the failed primary server back online, it quietly rejoined the cluster, resynchronized its data, and took its place as the new secondary node. I had difficulty thinking of anything else I could break without physically damaging the hardware.
Given these capabilities, what could you use Avance for, beyond the obvious uptime enhancements? As I previously mentioned, there are other forms of fault tolerance and clustering available, some of which might be better suited to certain workloads or situations. Areas in which Avance would be a natural fit include:
- 99.99 percent application availability
- Remote-site redundancy
- Small or branch-office resiliency
- Small- to average-size workloads (e.g., Microsoft Exchange Server, Microsoft SharePoint, customer relationship management (CRM) software, limited-scale database environments)
- Private cloud
Areas in which a different approach (or perhaps the more advanced enterprise-class V Series offering from Stratus) would be best include:
- High-throughput transaction processing
- Data warehousing
- Real-time computing
- High-capacity distributed applications or enterprise-scale deployments (e.g., multi-server email or database environments)
- Public cloud
Note that there isn't a facility for managing multiple Avance deployments through a single console. Thus, building one large cluster of powerful machines would be better than using several smaller clusters in a demanding environment.
Avance Lives Up To Stratus' Reputation
Avance lives up to the reputation established by Stratus' more advanced availability solutions. Avance also provides capabilities you'd normally expect in much higher-priced packages. Thanks to its focus on failover and ease of use, Avance lets smaller IT shops with limited resources or training up-level their service offerings and greatly enhance disaster readiness.
But perhaps a more important question might be, "Would I install this in my data center?" The answer is yes, I would.