Over the past couple of months, I’ve been reviewing high-availability servers in the SQL Server Magazine and Windows IT Pro labs. Two of the servers I’ve recently looked at are the NEC Express5800/R320 and the Stratus ftServer 4500. Both depart from standard servers in several ways, the most significant being that they’re designed primarily for high availability, and their system design clearly reflects this goal.
The primary difference between these systems and standard servers is architectural: in both the NEC Express5800/R320 and the Stratus ftServer 4500, all of the system components are duplexed. In other words, a single high-availability server is composed of two separate and distinct motherboards, and each motherboard has its own set of CPUs, RAM, power supply, and storage. The two sets of CPUs are kept in lockstep, so if there’s a hardware failure, the backup set of system components immediately takes over, and the server continues to provide uninterrupted service. Both of these systems can provide five nines (99.999 percent) of availability. These systems are more expensive than standard servers, but if you need a fault-tolerant server, they’re worth the extra cost.
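To put five nines in concrete terms, the following minimal Python sketch converts an availability percentage into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; by this arithmetic, 99.999 percent availability works out to roughly 5.26 minutes of downtime per year.

```python
# Minutes in a year, assuming a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year that an availability level permits."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare two nines through five nines.
for level in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(level)
    print(f"{level}% availability allows about {minutes:.2f} minutes of downtime per year")
```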
Other High-Availability Options
Of course, specialized hardware isn’t the only route to high availability. Windows Failover Clustering is Microsoft’s primary high-availability solution and is designed to protect against unplanned downtime caused by server failure. However, it can also provide increased availability for planned downtime, and it lets you perform rolling upgrades, where you can manually fail over to a backup node and upgrade the original server while the backup node handles the application workload. Although Windows Failover Clustering no longer requires specialized hardware, it does require multiple servers, which need to be configured with enough available capacity to handle the additional workload after a failover happens.
SQL Server provides database mirroring and log shipping as ways to increase the availability of your applications. Like Windows Failover Clustering, these technologies are primarily designed to provide protection from unplanned downtime. Unlike clustering, which protects the whole server, database mirroring and log shipping both provide protection at the database level. Database mirroring allows for automatic failover (when it runs in high-safety mode with a witness server), but it’s up to you to create the logins and configure the other server-level settings on the mirror that are required to handle a failover. Log shipping is primarily designed for site recovery and disaster recovery scenarios in which your data is transferred to a backup system at another site. Log shipping doesn’t have an automatic failover option. Although I’ve presented these three solutions as separate options, there’s nothing stopping you from combining them.
Weighing the Cost
When you’re determining which of these types of high-availability solutions best fits your company’s needs, you need to weigh the price of the availability solution against the cost of downtime. Some applications can experience significant amounts of downtime with no cost other than some end-user inconvenience. For this type of application, regular backups might be a good enough precaution against unexpected failure.
However, mission-critical and ecommerce applications can have huge costs associated with downtime. For many companies, if their web application or its back-end database is down, a major portion of their income is cut off. For instance, a PayPal outage in 2009 was estimated to have resulted in the loss of $2,000 per second, or $7.2 million per hour. And this only counted the loss of direct revenue opportunities. For companies such as PayPal, the cost of downtime is extremely high and certainly worth the price of implementing a solution that delivers five nines of availability.
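The per-second and per-hour figures above are the same rate in different units, and the same arithmetic can estimate the direct cost of an outage of any length. The following Python sketch shows the conversion. The function names are hypothetical, and the calculation assumes that revenue is lost at a constant rate for the whole outage, which is a simplification.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600


def loss_per_hour(loss_per_second: float) -> float:
    """Convert a per-second revenue loss into a per-hour loss."""
    return loss_per_second * SECONDS_PER_HOUR


def outage_cost(loss_per_second: float, outage_minutes: float) -> float:
    """Estimate direct revenue lost during an outage, assuming a constant loss rate."""
    return loss_per_second * outage_minutes * SECONDS_PER_MINUTE


# The article's figure: $2,000 per second.
print(f"Loss per hour: ${loss_per_hour(2000):,}")
print(f"Cost of a 5-minute outage: ${outage_cost(2000, 5):,}")
```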
Finding the Right Solution
Although not every company needs or can afford five nines of availability, high-availability solutions are within the reach of almost all organizations. Even small-to-midsized businesses (SMBs) can have the need for high availability. Although the direct costs of an outage might not be as high for a small business as they are for an enterprise company such as PayPal, the overall impact of a critical application outage on the business might be more significant. Hardware-based solutions, such as the NEC Express5800/R320 and the Stratus ftServer 4500, tend to run in the $30,000 to $40,000 range, but they can offer a load-and-go type of solution that has little complexity and can leverage the vendor’s support organization. Solutions such as Windows Failover Clustering and database mirroring can cost less, but the burden of bearing the extra complexity falls on the customer. In either case, although there’s a price for higher availability, the cost of the downtime it prevents can easily offset that price.
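One rough way to weigh price against downtime is a break-even calculation: how many hours of avoided downtime it takes to recover the price of the solution. The Python sketch below uses the $30,000 to $40,000 hardware range mentioned above. The $10,000-per-hour downtime cost is a purely hypothetical figure chosen for illustration, as is the function name, and the calculation ignores ongoing costs such as support contracts and administration.

```python
def break_even_hours(solution_price: float, downtime_cost_per_hour: float) -> float:
    """Return the hours of avoided downtime needed to recover the solution's price."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime cost per hour must be positive")
    return solution_price / downtime_cost_per_hour


# Hypothetical downtime cost, for illustration only; substitute your own estimate.
HYPOTHETICAL_COST_PER_HOUR = 10_000

# Price range for the hardware-based solutions discussed in the article.
for price in (30_000, 40_000):
    hours = break_even_hours(price, HYPOTHETICAL_COST_PER_HOUR)
    print(f"A ${price:,} server pays for itself after {hours:.1f} hours of avoided downtime")
```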