Trusting the Cloud
May 3, 2011
Last week, I wrote about the industry's overreaction to news that Apple (and, it turns out, Google and Microsoft) was tracking smartphone users' locations using the devices' integrated GPS, 3G, and Wi-Fi antennas. My assertions about Apple's official response have since been proven correct: Apple says it isn't tracking anyone, thank you very much, and it will update the phone's software in the coming weeks to curtail the storing of location information. As expected.
So problem solved, I guess. But there were some noteworthy cloud-related failures last week that speak to a much wider problem we're going to experience more and more moving forward. And that is that we are increasing our reliance on connected systems in an age when secure, always-on connectivity is a myth, at best.
I'm speaking about Sony's egregious PlayStation Network (PSN) hack, which exposed the personal data of over 77 million customers to a single nefarious individual, and Amazon's epic Amazon Elastic Compute Cloud (EC2) outage, which lasted several days and took down a number of other sites and services that rely on EC2.
Both events are unique, cautionary tales. But maybe not of the kind you're imagining.
For Sony, the issue was security 101: The company apparently didn't encrypt much user data on its own servers, and after being hacked, Sony eventually informed its customers that the hacker had obtained "profile data, including purchase history and billing address, and PlayStation Network/Qriocity password security answers." Originally, the company also told customers that credit card numbers (excluding the security code) and expiration dates may have been obtained as well; this past weekend, it admitted that as many as 10 million credit card numbers could have been stolen.
Even though Sony's security policies and crisis handling skills don't equate to any inherent failing in cloud computing, many are using this event as a justification for not trusting their vital personal information online. After all, if a company as big as Sony can screw this up, how can they trust any company? But the reality here is that Sony just screwed up, and correctly protecting customer information shouldn't have been difficult. That this happened almost a decade after Microsoft's security wake-up call, resulting in Trustworthy Computing, is embarrassing and suggests that not everyone was paying attention. I bet many firms are right now testing their own security controls to ensure they don't get Sony'd.
Amazon's outage is perhaps harder to explain, but slightly more understandable. Embarrassingly, for me, it came just days after I complimented an Amazon employee on his company's ability to securely and reliably deliver cloud computing services at a scale that is beyond the capability of the competition. He told me that it all came out of the company's experience with online retailing at a massive scale.
Once the back patting had stopped, however, EC2 went down for the count and stayed that way for days. It took down a number of partner sites too, most famously Foursquare (the social networking darling du jour) and Reddit (a news aggregation site).
According to Amazon, this kind of failure should never happen, and it appears to have been caused, in part, by human error. Long story short, a routine capacity upgrade triggered a "network event" ("mistakes were made," I guess), causing a mirroring network to take on the main load, become overwhelmed, and then trigger its own cascading series of mistakes.
Lesson learned: Amazon has fixed the previously hidden design flaw in this system and claims (correctly, I think) that such an outage will never happen again. But Amazon's biggest sin here was similar to Sony's at the start of the PSN outage: It simply wasn't transparent enough quickly enough. Customers were freaking out and Amazon wasn't providing enough information. Heck, it wasn't providing any information.
Every time something like this happens, my tech press and blogger peers break out the "Chicken Little" headlines and write pointless articles with rhetorical titles, such as, "Does Amazon Outage Prove Immaturity of Cloud Computing?" or whatever. This supposed dialog—it's just click bait, plain and simple—avoids a very real truth. And that is that companies like Amazon—or Microsoft, Google, and perhaps even Sony—are far better equipped to deal with problems when they happen than are most private IT organizations. These types of failures are still uncommon and are in fact newsworthy only for that very reason.
So the next time Gmail doesn't work for a few hours, or Exchange Online experiences intermittent connectivity, or whatever, just remember this: If it was happening inside your own firewall, you'd have to fix it. Infrastructure belongs in the hands of dedicated infrastructure providers. Neither of these episodes offers any reason to suddenly think otherwise.