Several events from the past few weeks deserve attention because of their similarities, their closeness in time, and their implications for the future.
The least important of the events happened in early August, when Google mistakenly identified one of its own blogs as spam and deleted it. The blog covered Google's custom search engine technology, and although the deletion didn't have a huge impact on customers, it did come as a surprise that a major technology company--one that considers itself to be on the extreme cutting edge--managed to make such a mistake. Obviously, some of Google's technology is flawed. Fortunately, the incident didn't affect a heavily relied-upon part of that technology.
At roughly the same time, Cisco Systems' entire Web site became unavailable after an electrical failure at one of the company's data centers. According to the company's blog, "The issue occurred during preventative maintenance of one of our data centers when a human error caused an electrical overload on the systems. This caused Cisco.com and other applications to go down. Because of the severity of the overload, the redundancy measures in some of the applications and power systems were impacted as well, though the system did shut down as designed to protect the people and the equipment. As a result, no data were lost and no one was injured. Cisco has plans already in process to add additional redundancies to increase the resilience of these systems."
Cisco's site failure was indeed a serious problem. Imagine the worldwide impact if that outage had occurred while customers were trying to download a recently released security patch for a vulnerability that was being actively exploited.
Next on the list is Skype, which managed to take down its entire worldwide peer-to-peer network last month. Because of flaws in its "supernode" software design, the company essentially created a situation in which a Denial of Service (DoS) condition arose simply because many people were rebooting their computers at about the same time. As a result, Skype's VoIP network--which the company would surely like the majority of us to depend on for day-to-day voice communication--became useless for three days.
Yet another outage occurred when an Internet backbone cable was cut. The cut took down major portions of networks operated by Level 3, Cogent, and TeliaSonera, all of which provide Internet connectivity to many endpoints. After the cut was discovered, repair crews inadvertently replaced the damaged cable with another damaged cable and didn't discover the second fault until the repair failed to restore service. As a result, the outage lasted far longer than it should have, and many entities had no Internet connectivity in the meantime. This particular incident wasn't any one company's fault; however, it's noteworthy as yet another outage with considerable impact.
If those events weren't strangely coincidental enough already, there's more. Microsoft recently made mistakes that rendered a large number of people's Windows systems nearly useless. According to Microsoft (at the URL below), "preproduction code was accidentally sent to production servers" and the code just happened to handle the company's Windows Genuine Advantage (WGA) technology. The overall effect was that for a short period of time, the affected Windows systems could not be activated, and for a long period of time (nearly 20 hours), Windows systems could not be validated.
Think of the implications of these incidents, and ask yourself, "How secure is my enterprise if it relies increasingly on software as a service?" For John Dvorak's take on this issue, see "Don't Trust the Servers" at the following URL.