Microsoft is giving many of its Windows Azure customers a 33 percent credit for the February 29 service disruption caused by a Leap Day bug. Bill Laing, corporate vice president of Microsoft's Server and Cloud Division, published an official statement detailing the outage. Here's an excerpt from Laing's statement that explains how the Leap Day bug hit Azure:
"The leap day bug immediately triggered at 4:00 PM PST, February 28 (00:00 UST February 29th) when GAs [guest agents] in new VMs [virtual machines] tried to generate certificates. Storage clusters were not affected because they don’t run with a GA, but normal application deployment, scale-out and service healing would have resulted in new VM creation. At the same time many clusters were also in the midst of the rollout of a new version of the FC [Fabric Controller], HA [host agent] and GA. That ensured that the bug would be hit immediately in those clusters and the server HI threshold hit precisely 75 minutes (3 times 25 minute timeout) later at 5:15PM PST. The bug worked its way more slowly through clusters that were not being updated, but the critical alarms on the updating clusters automatically stopped the updates and alerted operations staff to the problem. They in turn notified on-call FC developers, who researched the cause and at 6:38PM PST our developers identified the bug."
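For developers wondering how a date can break certificate generation, here is a minimal Python sketch. The excerpt above doesn't say how the guest agent computed its certificate dates, so the "add one to the year" logic below is an assumption that illustrates a common form of leap day bug, and every function name is hypothetical. The last helper simply reproduces the timeline arithmetic from the statement (three 25-minute timeouts after 4:00 PM PST).

```python
from datetime import date, datetime, timedelta


def naive_one_year_later(d: date) -> date:
    """Hypothetical buggy logic: keep month and day, bump the year.

    Raises ValueError for February 29, because the following
    year is not a leap year and has no February 29.
    """
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """One possible fix: fall back to February 28 when the target
    year has no February 29."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def threshold_time(start: datetime, timeout_minutes: int = 25,
                   attempts: int = 3) -> datetime:
    """Time at which `attempts` consecutive timeouts have elapsed."""
    return start + timedelta(minutes=timeout_minutes * attempts)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_one_year_later(leap_day)
    except ValueError as err:
        print("naive calculation failed:", err)
    print("safe calculation:", safe_one_year_later(leap_day))

    # 4:00 PM PST on February 28, as a naive (timezone-free) datetime.
    bug_start = datetime(2012, 2, 28, 16, 0)
    print("threshold hit at:", threshold_time(bug_start).strftime("%I:%M %p"))
```

Falling back to February 28 is only one way to handle the missing date; rolling forward to March 1 or adding a fixed number of days are equally reasonable, depending on how the expiration date is used.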
The service disruption occurred in several regions around the world and lasted until 2:57 A.M. PST on February 29, which left several customers without access to their cloud applications. Because of the "extraordinary nature" of the outage, Microsoft is providing the credit to all customers of Windows Azure Compute, Access Control, Service Bus, and Caching for the entire affected billing month, regardless of whether their service was affected. "Microsoft recognized that this outage had a significant impact on many of our customers. We stand behind the quality of our service and our Service Level Agreement (SLA), and we remain committed to our customers," Laing said.
DevProConnections would like to hear your thoughts about the Windows Azure outage. Were you affected by the service disruption? Add a comment to this blog post or tweet us at @devproconnect. For detailed information about the cause of the disruption, see "Leap Year and Windows Azure Cloud Outage: Root Cause Analysis" on Talkin' Cloud.