You're probably familiar with the old saying, "Fool me once, shame on you; fool me twice, shame on me." This week has me wondering whether anyone in the cloud has ever heard this saying.
I'm not talking about the highly publicized outages of the Microsoft Office 365 services in August and September of this year. Both of those outages were clearly growing pains, and Microsoft has maintained solid uptime since then. No, I'm talking about the recent outages at a number of tier 1 Internet providers, apparently caused by a buggy firmware update for routers manufactured by Juniper Networks. On November 7, a firmware update caused a number of routers to crash while the routers were attempting to exchange routing table updates via the Border Gateway Protocol (BGP). This crash resulted in an annoying chain of failures: When a router to which the firmware had been applied received a specific BGP update, it would fail and reboot itself. As the firmware was more widely deployed, the outage spread.
I haven't seen any credible estimates of the percentage of Internet users affected, but the footprint of the outage was large enough that Research In Motion (RIM) was forced to deny that its BlackBerry service was down, and many customers of Time Warner Cable – not a small company – were unable to access the Internet at all.
What does this have to do with the cloud? Consider the plight of Time Warner Cable business customers who use Google Apps, Office 365, or LotusLive. (OK, I made that up; not very many people use LotusLive.) Through no fault of their own, and through no fault of their email hosting provider, they lost the ability to access their messaging data. For cloud customers who use BlackBerry devices, the apparent failure of the RIM network added insult to injury. Thus far it's not entirely clear what happened there, but for cloud customers who couldn't use either their desktop or their mobile devices to access email, it must have been quite frustrating.
Of course, this incident doesn't mean that cloud services are inherently bad. Customers who run their own mail servers but still depend on Internet connectivity for business operations face the same risk; after all, you could argue that there's little difference between an outage that prevents you from exchanging mail with other Internet users and one that knocks out all online access to your mail. In either case, caching clients such as Outlook will continue to give you access to your existing mail, so it's plausible that many affected users could still get their work done.
Instead of using this outage as a stick to beat cloud messaging providers, it's more useful to consider it a wake-up call. If your business operations really depend on Internet access – and if you're using cloud-based messaging, they probably do – then you should consider whether, and how, you need to provide redundant access to the Internet.
Redundant Internet access can be as simple as provisioning a second lower-bandwidth connection from a different provider or as complex as duplicating your entire Internet-access system. The point isn't how you provide this redundancy; it's that you consider whether you have a business requirement to do so and then act accordingly. Don't wait for the next outage, because that outage could happen at any time.
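To make the simple end of that spectrum concrete, here is a minimal Python sketch of how a small health check might choose between a primary connection and a backup from a different provider. Everything specific in it is an assumption for illustration: the uplink labels, the source IP addresses (taken from documentation-only ranges), the probe target, and the helper names tcp_probe and pick_uplink are all hypothetical. It also assumes that each provider's connection has its own local IP address and that the machine's routing sends traffic bound to that address out through the matching connection; binding a source address alone doesn't guarantee that.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical uplink description: (label, local source IP on that provider's link).
Uplink = Tuple[str, str]


def tcp_probe(source_ip: str, host: str = "example.com",
              port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection from source_ip to host:port succeeds.

    Assumption: routing on this machine sends traffic bound to source_ip
    out through the corresponding provider's connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=(source_ip, 0)):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def pick_uplink(uplinks: Sequence[Uplink],
                probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the label of the first uplink whose probe succeeds, else None."""
    for label, source_ip in uplinks:
        if probe(source_ip):
            return label
    return None


if __name__ == "__main__":
    # Illustrative addresses only; replace with the real local IPs of each link.
    links = [("primary", "192.0.2.10"), ("backup", "198.51.100.10")]
    print(pick_uplink(links) or "no working uplink")
```

The uplinks are listed in order of preference, so the lower-bandwidth backup is chosen only when the primary probe fails. In practice this decision is usually made by a router or firewall rather than a script, but the logic is much the same.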
In closing: Today, November 10, is the 236th birthday of the United States Marine Corps. On this occasion, I want to send a huge shout of "Semper Fi!" out to all Marines, present and past. As the traditional toast at Marine dining-in events says, "God bless the United States, and success to the Marines!"