Google sets the pace in achieving 99.9%+ SLA performance for cloud-based application suites

Despite IBM’s claim that Lotus Live is a true contender, most companies considering cloud-based applications regard Google Apps and Office 365 as the only games in town. Given Microsoft’s difficulties in achieving its SLA, it’s worth looking at the SLA record of its major competitor. Some interesting facts come into focus if you look at the information presented in the Google Enterprise blog.

First, in 2011 Google decided that they would no longer exclude planned maintenance from the calculation of downtime. Scheduled maintenance has always been the get-out clause for service providers, but now Google says that all downtime counts when they calculate their SLA. Google further claims that they are the first major cloud provider to eliminate maintenance windows from their SLA. By comparison, the latest version of the Service Level Agreement for Microsoft Online Services (dated June 1, 2011) says that “Downtime means the total minutes in a month during which the aspects of a service specified … are unavailable, multiplied by the number of affected users, excluding (1) Scheduled Downtime…”
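To make the difference between the two approaches concrete, here is a minimal sketch of the user-minutes calculation described in the quoted Microsoft definition, with a switch for counting scheduled downtime the way Google now says it does. The `Incident` type, the function name, and the sample figures are all hypothetical, and the sketch ignores the parts of the definition elided in the quote.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One period of unavailability (hypothetical structure for illustration)."""
    minutes: int          # how long the service was unavailable
    affected_users: int   # how many users were affected
    scheduled: bool       # True for planned maintenance


def downtime_user_minutes(incidents, count_scheduled):
    """Sum minutes x affected users, optionally skipping scheduled maintenance."""
    return sum(
        i.minutes * i.affected_users
        for i in incidents
        if count_scheduled or not i.scheduled
    )


# Invented sample month: one maintenance window and one unplanned outage.
incidents = [
    Incident(minutes=120, affected_users=500, scheduled=True),
    Incident(minutes=30, affected_users=200, scheduled=False),
]

excluding_scheduled = downtime_user_minutes(incidents, count_scheduled=False)
including_scheduled = downtime_user_minutes(incidents, count_scheduled=True)

print(excluding_scheduled)  # 6000 user-minutes
print(including_scheduled)  # 66000 user-minutes
```

With these invented numbers, the same month looks eleven times worse once the maintenance window counts, which is why the measurement rule matters when comparing published figures.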


It seems that Google has created a competitive advantage in how it measures its SLA. However, whether or not scheduled downtime counts in the measurement of an SLA really doesn’t matter if real outages occur. In this respect, Google goes on to state that Gmail achieved availability of 99.984% in 2010 for both consumer and business users. In human terms this means that Gmail was unavailable for approximately seven minutes per month. I doubt that real-life people noticed the seven minutes, as some of this time will have accumulated through glitches that last only a few seconds (the Internet, as we all know, is prone to glitches) and some of it will probably have occurred while users were asleep.

In fact, outages really only become bothersome when they last longer than the time it takes an average user to go and get a cup of coffee, or whatever your beverage of choice is. The logic here is that users will recognize that a problem is happening, put it down to their PC, local network, local IT, or something else, and then go and get a drink. If the problem persists after they return refreshed and ready to work, they get annoyed. On the other hand, if service has been resumed, they are happy. The two outages suffered by Office 365 in August and September 2011, which resulted in a total of circa 330 minutes of downtime (if you were one of the users affected by both outages), demonstrate the unwanted attention and stress that flow from extended outages that fail my "is it time for a coffee" test. For those not close to a calculator, 330 minutes is equivalent to roughly 47 months of outage at the 99.984% level.
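For anyone who wants to check the arithmetic, here is a small sketch that converts an availability percentage into minutes of downtime per month and then expresses the 330 minutes as months of downtime at Gmail's 2010 level. It assumes a 30-day month (43,200 minutes), and the function names are my own.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60


def downtime_minutes_per_month(availability_pct, minutes_in_month=MINUTES_PER_MONTH):
    """Minutes of downtime per month implied by an availability percentage."""
    return minutes_in_month * (100.0 - availability_pct) / 100.0


def equivalent_months(outage_minutes, availability_pct):
    """How many months of downtime at the given availability add up to the outage."""
    return outage_minutes / downtime_minutes_per_month(availability_pct)


gmail_2010 = downtime_minutes_per_month(99.984)   # roughly 6.9 minutes
sla_target = downtime_minutes_per_month(99.9)     # roughly 43.2 minutes
office_365 = equivalent_months(330, 99.984)       # between 47 and 48 months

print(f"Gmail 2010 at 99.984%: {gmail_2010:.1f} minutes per month")
print(f"A 99.9% SLA allows:    {sla_target:.1f} minutes per month")
print(f"330 minutes equals:    {office_365:.1f} months at the 99.984% level")
```

Using the unrounded 6.9 minutes gives a figure a little under 48 months; dividing by the rounded seven minutes gives the 47 quoted above. Either way, a 99.9% target leaves room for only about 43 minutes of downtime in a month, so 330 minutes across two months overshoots it comfortably.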

Microsoft hopes that Office 365 will deliver better reliability than BPOS and that it won't have to refund users with the kind of 25% credits for monthly subscriptions that it's been forced to pay for the two incidents to date. Even though it's been off to a bad start, I think that Office 365 will prove to be much more reliable over the long term. And if it does, then Google won’t be able to point out salient points such as:

Comparable data for Microsoft BPOS® is unavailable, though their service notifications show 113 incidents in 2010: 74 unplanned outages, and 33 days with planned downtime.

I have not been through the BPOS data to validate Google’s claim but I imagine that they wouldn’t make such an assertion without covering themselves with chapter and verse.

In terms of 2011 performance, in a September 27 article about their new status dashboard for Google Apps, Google states that they achieved availability of 99.99% for Gmail in the first six months of the year, or about five minutes of downtime per month. I assume that this is measured against their new SLA calculation, which includes scheduled downtime, so this is really a terrific performance.

Gmail isn’t perfect and it has its ups and downs too. The disappearing inbox syndrome suffered by some 20,000 users on 28 February 2011 (a software update was later blamed) is an example of where Google has run into choppy waters. However, you can’t deny that Google has set the pace for SLA delivery for cloud email and application suite services and that Office 365 has work to do to get close to Google’s record.

Cloud email still occupies only a small, though growing, portion of the overall market. Customer faith and trust that cloud-based solutions really work will increase over time as performance demonstrates the worth and reliability of the solutions, allied to some of the attractive financial aspects that salespeople invariably focus upon. Office 365 has started poorly and Microsoft now has to focus on achieving its 99.9% SLA target for the last quarter of 2011, then the first six months of 2012, then for all of 2012, and on into the future. And once it has a truly unimpeachable record of delivery, the other strengths of Office 365 such as class-leading clients, user familiarity with the Office applications, and the potential of SharePoint will make it an even fiercer competitor than it is today.

