Every business that uses Microsoft SQL Server to manage mission-critical processes should have both a high availability and a disaster recovery (HA/DR) plan that lets it continue operating in the event of an unplanned outage. One of the key drivers for an HA/DR plan is the availability service level agreement (SLA), which defines how much downtime the business can tolerate. These agreements are essentially statements of business value: the higher the cost of downtime, the shorter the SLA downtime window must be. Most availability SLAs are expressed as a number of "nines of availability" (see table). In a previous blog I talked about "Top 5 Things to Look for in a Disaster-Recovery-as-a-Service (DRaaS) Provider" as an alternative to building your own DR site. As a companion to that post, we'll look at the implications of downtime and SLAs for your business.
The following table shows some of the most commonly used SLA downtime values:
Note: These values assume a 30-day month and a 365-day year.
As you can see, an SLA with a "3 nines" uptime target means that the contracted service can be down for no more than about 43 minutes per month. As a benchmark, Gartner considers an unplanned availability SLA of 99.98 percent, or about 9 minutes of downtime per month, to be "Best-in-Class."
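The arithmetic behind these downtime windows is easy to check. The following is a minimal Python sketch, assuming the same 30-day month and 365-day year as the note above; the function name `allowed_downtime_minutes` is made up for illustration and is not taken from any SLA tool.

```python
# Sketch: convert an availability target into an allowed downtime window.
# Assumes a 30-day month and a 365-day year, as in the note above.
MINUTES_PER_MONTH = 30 * 24 * 60    # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 minutes


def allowed_downtime_minutes(availability_percent):
    """Return (minutes per month, minutes per year) of permitted downtime."""
    unavailable_fraction = (100 - availability_percent) / 100
    return (
        unavailable_fraction * MINUTES_PER_MONTH,
        unavailable_fraction * MINUTES_PER_YEAR,
    )


# Illustrative targets: 2 to 5 nines, plus the 99.98 percent benchmark.
for target in (99.0, 99.9, 99.98, 99.99, 99.999):
    per_month, per_year = allowed_downtime_minutes(target)
    print(f"{target}% -> {per_month:.2f} min/month, {per_year:.2f} min/year")
```

For a "3 nines" target this gives about 43.2 minutes per month, and for 99.98 percent about 8.6 minutes per month, which matches the rounded figures quoted above.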
What's your SQL Server uptime target? How do you compare to your peers? Paul Randal published the results of a survey on target SQL Server uptime in June 2014, and the findings were interesting. Thirty-four percent of respondents had an uptime SLA of "4 nines" or greater, and another 32 percent had a "3 nines" uptime target. We know from InformationWeek research that SQL Server is the dominant database used for mission-critical functions (see chart), so it shouldn't be a surprise that a majority of the SQL Server users who responded have SLAs that Gartner would grade as either "Outstanding" or "Best-in-Class." The obvious implication is that the cost of downtime for those critical functions is high.
That illustrates the importance of an HA/DR plan: because the cost of downtime is so high, the goal is to cut the time needed to restore IT services that depend on your critical database instances from hours per month to minutes. According to a 2013 study by the Ponemon Institute and Emerson Network Power, the average cost of data center downtime across industries was approximately $7,900 per minute (a 41 percent increase from $5,600 per minute in 2010). The average reported incident lasted 86 minutes, for an average cost per incident of approximately $690,200, compared to 2010's average of 97 minutes at approximately $505,500. For a total data center outage, which had an average recovery time of 119 minutes, average costs were approximately $901,500. (In 2010, it was 134 minutes at about $680,700.) The study also found that those surveyed averaged one complete data center outage and three partial data center outages per year.
Applying these numbers to the uptime survey results shows what is at stake. A "4 nines" target allows roughly 52 minutes of downtime per year, which at $7,900 per minute comes to about $410,800. In other words, the SQL Server users with the strictest uptime targets are trying to keep their yearly cost of downtime to $410,800 or less, about 40 percent below the data center average of $690,200 per incident. How does that match up to your expectations, and to your management's? More importantly, are you confident that you can achieve it?
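To run the same comparison against your own uptime target, here is a rough Python sketch. It assumes the study's average of $7,900 per minute applies to your environment, which it may not, and it treats the $410,800 figure as 52 minutes of downtime at that rate. The names `yearly_downtime_cost` and `COST_PER_MINUTE_USD` are illustrative only.

```python
# Sketch: estimate yearly downtime cost from minutes of downtime per year.
# The per-minute and per-incident figures are the 2013 study averages quoted
# above; substitute your own numbers where you have them.
COST_PER_MINUTE_USD = 7_900
AVG_INCIDENT_COST_USD = 690_200


def yearly_downtime_cost(downtime_minutes_per_year,
                         cost_per_minute=COST_PER_MINUTE_USD):
    """Return the estimated yearly cost of downtime in US dollars."""
    return downtime_minutes_per_year * cost_per_minute


# A "4 nines" target allows about 52 minutes of downtime in a 365-day year.
four_nines_cost = yearly_downtime_cost(52)

# Relative saving compared with the average cost of a single incident.
reduction = 1 - four_nines_cost / AVG_INCIDENT_COST_USD

print(f"4 nines: ${four_nines_cost:,} per year")
print(f"Reduction vs. average incident: {reduction:.1%}")

# A "3 nines" target allows about 525.6 minutes per year (8.76 hours).
three_nines_cost = yearly_downtime_cost(525.6)
print(f"3 nines: ${three_nines_cost:,.0f} per year")
```

The gap between the two targets is the point: under the same per-minute assumption, one extra "nine" cuts the yearly exposure by a factor of ten.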
Until next time…