We don't know when the next disaster--in whatever form--is coming, but we do know that there are disasters in our future. Whatever the situation, the service cycle associated with disaster recovery is critical. Whether companies are using traditional disaster recovery products and processes, disaster recovery as a service (DRaaS) or some combination thereof, there are some important lessons--born of experience--that they can learn from.
Much of that wisdom comes from system architects, as well as those who do DRaaS for a living. One of these fellows is colleague Doug Theis of Expedient (formerly nFrame), who’s director of market strategy for the organization hosting my lab’s extensive gear. Many of the points made herein are based on Theis' experience with client "scar tissue."
1. The disaster isn’t over until the recovery + 10 weeks.
There's the warning, the event and its elastic duration, the aftermath and the recovery. And then there's the actual recovery, which is the return to normalcy--whatever normalcy is. For example, floods wipe out equipment, but they also destroy employees' homes. Supply chains get disrupted, and recover in fits and starts. Even if you have power and plumbing, food and other resources may be limited until new deliveries of supplies can be made. Your plans must reflect not just the event cycle of a disaster, but also the localized recovery of supply chains and logistics. This can take weeks and even months. It's important to remember that normalcy doesn't return when the Red Cross and FEMA leave; it arrives weeks after that.
2. Regulatory and compliance edicts are still in place.
While there may be leniency in reports and auditor visits, you’re still subject to the same regulatory requirements before, during and after a disaster. It may be tempting to cut regulatory corners while recovering from a disaster, but there's no such thing as a compliance pass.
3. Testing is not optional.
Even the best-laid plans need testing. It costs money to perform disaster recovery testing, and it takes time, but you can’t prove the value of assets and planning unless you’ve done a real drill. Taking notes on what worked, what didn’t and why will provide insight into many facets of what will enable business continuity--and at what costs. Every test is a new opportunity to smooth future disruptions and disasters.
4. Inter-disciplinary continuity planning needs to be organizational DNA.
Companies are more likely to survive a disaster situation if everyone—across company disciplines—knows what to do before, during and after an event. Unfortunately, at many companies, disaster recovery plans are tribal knowledge, and that knowledge disappears when employees change roles or leave the company altogether. Tribal knowledge is invaluable, but it must also pass from generation to generation. Ensure that disaster recovery plans are well-documented, purposefully distributed (including acknowledgement of receipt) and regularly updated. This process also requires habitually updating DR resources, and doing things twice--always updating the secondary environment. This habit needs to be in your organization’s DNA.
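The "do things twice" habit can be spot-checked automatically. As one minimal sketch--assuming configuration for each environment lives in plain files under two hypothetical directories, `primary/` and `secondary/`--a drift check can flag any file that differs or is missing between them:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_drift(primary: Path, secondary: Path) -> list[str]:
    """List config files that differ or are missing between environments.

    Hypothetical convention: both trees hold the same relative file layout,
    so any mismatch means the secondary was not updated in lockstep.
    """
    primary_files = {p.relative_to(primary) for p in primary.rglob("*") if p.is_file()}
    secondary_files = {p.relative_to(secondary) for p in secondary.rglob("*") if p.is_file()}
    drift = []
    for rel in sorted(primary_files | secondary_files, key=str):
        a, b = primary / rel, secondary / rel
        if not a.exists() or not b.exists():
            drift.append(f"{rel}: missing on {'secondary' if a.exists() else 'primary'}")
        elif file_digest(a) != file_digest(b):
            drift.append(f"{rel}: contents differ")
    return drift
```

Run as a scheduled job, an empty result means the two environments are (at least at the file level) in step; any output is a to-do item before the next drill.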
5. DR is an all-or-nothing proposition.
Resource duplication doesn’t confer instant protection. More importantly, IT is one element of business continuity, and DR planning isn’t just IT’s job. In fact, if the entire organization is not involved and engaged, the plan is flawed from the get-go.
That’s because any organization, no matter what the size, is complex. If planners don’t write the DR script in a way that accounts for and accommodates all people, processes, supply chains, logistics and products, gaps will likely appear when the plan has to be put into action. IT often leads DR planning, but all stakeholders should lead the larger business continuity effort, with IT aligning with the approach. It’s not just IT continuity; it’s business continuity.
6. Failback is often more complicated than failover.
Unless a plan is fully tested, most organizations don’t have a clear understanding of the effort needed to recover, according to Theis. “Even when they get to the test, they don’t always understand--actually how long is recovery? Where are the critical points? Where are the IT staff qualified to do this?” he said. “The mediocre DR model is a second stack of gear somewhere. They often don’t test fully. It’s low value compared to the other 40 projects on their docket. There’s really no realistic effort.”
When failback is not fully tested or understood, DR often becomes a heroic effort, Theis added. “Do you really want to fail over at 2 a.m.? Are you thinking clearly at that hour? Do you know what the implications are?”
Real cyclical simulation is critical, especially since failback is often more complicated than failing over. “It is a hairy, hairy deal,” said Theis.
7. Syncing = success.
If an organization’s applications are out of sync, it may not find out until the worst possible time. As part of every failover and failback drill, organizations should check to see that synchronization efforts are complete and transaction-intact—to the point that there will be no business interruptions during failover and failback. Lacking synchronization, neither failover nor failback will work, because the premise of a working IT infrastructure is broken.
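That check can be as simple as comparing sequence numbers. A minimal sketch, assuming each application can report the last transaction committed on the primary and the last one applied on the DR replica (hypothetical fields; real values might come from a database's replication status view or a replication appliance's API):

```python
from dataclasses import dataclass


@dataclass
class SyncStatus:
    app: str
    primary_seq: int  # last transaction committed on the primary
    replica_seq: int  # last transaction applied on the DR replica


def out_of_sync(statuses: list[SyncStatus], max_lag: int = 0) -> list[str]:
    """Return the apps whose replica trails the primary beyond the allowed lag.

    max_lag expresses the tolerated recovery point (in transactions); 0 means
    the drill demands fully synchronized, transaction-intact replicas.
    """
    return [s.app for s in statuses if s.primary_seq - s.replica_seq > max_lag]
```

Anything this returns during a drill is a gap to close before a real event, when the same lag would mean lost transactions rather than a line in a report.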
8. Continuity plans don’t live on in perpetuity.
Companies that employ DIY DR often have continuity plans that are 5 years old—“covered in dust on somebody’s shelf in an obscure office,” said Theis. Especially in this day and age of constantly churning technology and business considerations—not to mention mergers and acquisitions and other major events--plans have to be updated and reviewed on a regular basis, with changes documented constantly.
9. Dependencies dictate plans.
Most organizations divide into two IT categories--top tier and everything else--a key consideration when developing, testing and executing a DR plan. But people tend to forget what these categories comprise and which is which. Understanding the dependencies within each tier is very important--perhaps more important than anything else. The prep work of figuring out these dependencies is arduous but required in order to fail over and back in a way that makes sense for the business and is supported by DR plans, products and people.
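Once the dependencies are mapped, the recovery order falls out of them mechanically. A small sketch, using an illustrative (entirely hypothetical) dependency map and Python's standard-library topological sorter to produce a start order in which every service comes up after the things it needs:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each app lists the services
# that must already be running before it can start.
deps = {
    "web-frontend": {"order-api", "auth"},
    "order-api": {"database"},
    "auth": {"database"},
    "database": set(),
}


def recovery_order(deps: dict[str, set[str]]) -> list[str]:
    """Start order that brings dependencies up before their dependents."""
    return list(TopologicalSorter(deps).static_order())
```

The same map, walked in reverse, gives a sane shutdown order for failback, and `TopologicalSorter` will raise an error if the map contains a cycle--itself a useful finding during the prep work.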
10. People do need people.
Cross-training staff and making plans readily available for access are critical steps in the DR planning process. So is considering the human side to DR. For example, people with mobility or childcare/dependent care needs may not be available to help during an actual DR process. Those who are available to help need a script to follow. Employees in disaster situations also need compassion: People may be working under stressful personal and professional circumstances related to the disaster. Graciousness is a virtue any time, but especially during periods of duress. Flexibility counts.