Despite what some advertisements might lead you to believe, you need more than a large insurance policy to get things back to normal when disaster strikes. And in some cases, you simply can't bring a business back to where it was before the disaster.
Mike Hotek, a principal mentor and colleague of mine at Solid Quality Learning, is one of the world's leading experts in SQL Server replication and high availability. As a consultant, Hotek works with companies of all sizes, from Fortune 100 firms to organizations with only a dozen people. Twelve of Hotek's client companies were housed in the World Trade Center in New York on September 11, 2001. Of those 12 companies, seven were destroyed that day 2 years ago. Although some data might have survived, no people from those companies remained to care about the information.
Survivors from each of the other five companies contacted Hotek for help. The first company called 2 days after the attack. Hotek spent only half a day working with that company because every piece of data the company owned, and any information about how to recreate that data, had been lost when the Twin Towers collapsed. The business didn't survive. Two of the other four companies were in similar situations: they simply didn't have anything left with which to restart their businesses. The remaining two companies survived, primarily because they had implemented Hotek's suggestions for disaster prevention and preparedness. Hotek described to me his experiences with these surviving businesses.
During the first week after the attacks, Hotek slept at most an hour each night. Despite all the disaster planning and preparation his clients had done, recovery took weeks of hard work. However, both companies were minimally functional within a week because of a fortuitous coincidence. Both companies had ordered new hardware before the attack, so in the week after September 11, they contacted the hardware vendors and redirected the hardware to the companies' new locations. Those preexisting hardware orders became the starting point for their new systems. One company was fully functional in about 18 days; the other was operating normally in about a month.
The magnitude of this particular disaster made full site recovery different in two ways from recovery after other disasters that require a complete system rebuild. First, because of the widespread devastation, no one had any expectation of how quickly the systems might be back up. No company officers or stockholders were breathing down the necks of Hotek and the survivors, asking "How much longer?" That the companies even had survivors was a miracle, so the news that the businesses might be operational again was beyond anyone's expectations. As a result, no external pressure forced the companies to hurry their recovery.
The second way this recovery differed from typical scenarios was more personal. Although many disaster-prevention checklists specify making sure that more than one person has all the system passwords and knows where the backups are stored, few people really plan for what they would do if the entire IT staff were no longer available. Both surviving companies lost most of their IT staffs, and the remaining people were in shock. Fortunately, in both cases, Hotek also had most of the necessary information because he'd helped the companies set up their disaster-recovery plans. For both companies, he was the one person who knew the technology requirements of the business, which is why he gave up sleep for a while. Although company officers and stockholders didn't expect an immediate recovery, Hotek realized that he needed to help the companies recover as quickly as possible. He took these six basic steps to get both businesses functioning again:
- Determined what skills were available among the survivors
- Procured funding for recovery efforts
- Found a new site and ordered new hardware for the most crucial systems
- Hired contractors to help rebuild the systems
- Located all the backups and process documents and determined what data was still available
- Got the surviving IT staff and the contractors working
For both companies, part of the recovery effort involved planning a whole new information-management infrastructure that built in disaster-prevention and recovery strategies, including both hot- and warm-standby systems. Hotek explained that he became a replication expert because of all the work he does for disaster preparedness. When he sets up a replicated system, he gains the immediate benefit of having a distributed, scalable system, and he has a warm standby to turn to if some systems are lost.
Two years later, both companies are still in business and restaffed, although they are slightly smaller today than they were before September 11. Hotek still works with the companies, performing checkups and reviews. Because he designed and implemented their systems, he still has specific knowledge about them that no one else has. But he believes his role will phase out over the next year or so, at which time he plans to turn over his knowledge to in-house IT staff. Today, both companies have redundant offsite hardware that they can bring in on short notice. They both have distance clusters 25 miles from their main sites, and they use log shipping to keep the distance clusters in sync with the main systems. Data-loss exposure is now limited to about 5 minutes. Ironically, both companies had this level of redundancy in the planning stages 2 years ago and had expected to begin implementation in October 2001.
I asked Hotek if he'd learned any lessons about disaster prevention from his experiences after September 11. He said the most important lesson was that what he'd been telling his clients for years was true—no scenario is too far-fetched to consider when preparing for disaster. As Hotek knew all along, the factors that helped the surviving companies get back up and running were planning, preparation, and perseverance.