A quick look at the news on any given day is reminder enough of why your company needs a disaster recovery plan. Chances are, you won't ever experience a Level Four disaster, such as a terrorist bombing or a natural disaster like a hurricane or flood. But even the smaller-scale Level One, Two, or Three disasters that you're more likely to encounter, such as power outages and server malfunctions, can paralyze business operations unless you've developed a plan for rapidly restoring IT services. Table 1 lists and describes the four disaster levels. You probably already have a disaster recovery plan, but it's wise to review it periodically and update it to accommodate changes in your business. Drawing on my experience in developing disaster recovery plans for clients, I've compiled a list of the 10 steps an organization of any size should follow when creating a new disaster recovery plan or revising an existing plan.
1. Review Your Backup Strategy
I generally recommend that clients perform full daily backups of all essential servers and data resources. Stay away from incremental and differential backups. In an emergency, you don't want to have to search for not only the last good full backup but also the five incremental backups necessary to complete the restore.
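To make the restore-time argument concrete, here is a minimal Python sketch that counts which backup sets you would need to locate for a restore under each scheme. The function name `restore_chain` and the list-of-strings schedule format are hypothetical, chosen only for illustration, and the sketch assumes a schedule doesn't mix incremental and differential backups after the same full backup.

```python
from typing import List


def restore_chain(schedule: List[str]) -> List[int]:
    """Return the indexes of the backup sets needed to restore to the latest backup.

    `schedule` lists backup kinds in chronological order ("full",
    "incremental", or "differential"); the index is the day number.
    """
    if "full" not in schedule:
        raise ValueError("a restore needs at least one full backup")

    # Position of the most recent full backup.
    last_full = len(schedule) - 1 - schedule[::-1].index("full")
    later_days = range(last_full + 1, len(schedule))

    incrementals = [d for d in later_days if schedule[d] == "incremental"]
    differentials = [d for d in later_days if schedule[d] == "differential"]
    if incrementals and differentials:
        raise ValueError("mixed schedules are not modeled in this sketch")

    if differentials:
        # A differential holds everything changed since the last full backup,
        # so only the newest one is needed.
        return [last_full, differentials[-1]]
    # Each incremental holds only the changes since the previous backup,
    # so every one of them is needed, in order.
    return [last_full] + incrementals


daily_full = ["full"] * 6
weekly_full_plus_incrementals = ["full"] + ["incremental"] * 5

print(len(restore_chain(daily_full)))                     # sets to locate
print(len(restore_chain(weekly_full_plus_incrementals)))  # sets to locate
```

With daily full backups the chain is always a single set; with a weekly full backup and daily incrementals it grows by one set each day until the next full backup.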
If you're running Microsoft Exchange Server or Microsoft SQL Server, consider making hourly backups of transaction logs so that you'll be able to restore your system to within 1 hour of when it crashed. One-step restore backup solutions are useful, but make sure you know how to manually recover the server should you have to restore your data on a different server platform. Store at least one tape off site weekly and store on-site tapes in a fireproof safe that's rated for data media. Always keep on hand a working tape drive that can read your existing tapes, so that you don't discover during a restore that nothing you own can read an outdated tape format. Consider leveraging Microsoft features and add-ons available for Windows Server 2003 and Windows 2000 Server (availability varies by version), such as Microsoft Remote Installation Services (RIS), offline folders, Microsoft Volume Shadow Copy Service (VSS), and Windows Server Update Services (WSUS), to aid in the recovery process and help get your network up and running again.
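The "within 1 hour" figure follows from simple arithmetic on backup times, which the following Python sketch illustrates. The function names and the sample dates are hypothetical, and the sketch assumes every log backup since the full backup is intact and restorable in sequence.

```python
from datetime import datetime, timedelta
from typing import List


def recovery_point(crash: datetime, full_backup: datetime,
                   log_backups: List[datetime]) -> datetime:
    """Latest point in time you can restore to: the newest log backup taken
    after the full backup and no later than the crash, or the full backup
    itself if no such log backup exists."""
    usable = [t for t in log_backups if full_backup < t <= crash]
    return max(usable, default=full_backup)


def data_loss_window(crash: datetime, full_backup: datetime,
                     log_backups: List[datetime]) -> timedelta:
    """How much work is lost between the recovery point and the crash."""
    return crash - recovery_point(crash, full_backup, log_backups)


# Hypothetical example: nightly full backup at 01:00, log backups every hour.
full = datetime(2005, 1, 3, 1, 0)
hourly_logs = [full + timedelta(hours=h) for h in range(1, 24)]
crash = datetime(2005, 1, 3, 14, 40)

print(data_loss_window(crash, full, hourly_logs))  # with hourly log backups
print(data_loss_window(crash, full, []))           # full backup only
```

In this example the hourly log backups shrink the loss window from most of a business day to the minutes since the last log backup.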
2. Make Lots of Lists
Your disaster recovery plan can't have too much documentation. To recover gracefully from a disaster, you need to adequately document equipment, network layouts, applications, and technical and business procedures that you'll need to reconstruct your business. Here are some items you should document and make available and accessible to those who need this information (employees, consultants, service providers):
Business locations. Document the business address, phone number, fax number, and building management contact information. I suggest including a map to the business's location(s) that includes surrounding geographic areas.
Equipment list. Compile an inventory listing of all the network components at each business location. Include the model, manufacturer, description, serial number, and cost for each network component. Many software products are available that can help you create and maintain such an inventory. Often this type of list is necessary for insurance purposes, so it might already exist at your company (e.g., in the financial department). Should a disaster affect your company, this list will be useful for determining what IT will need to order to replace damaged equipment.
Application list. Make a list of business-critical applications running at each location. If necessary, include in this list specific recovery instructions for ERP applications. For major applications, include technical support contact information, account numbers, and—for applications that have a maintenance agreement—service contract information.
Essential vendor list. Compile a list of essential vendors—that is, those necessary for your business operations. Consider establishing lines of credit with these vendors in case bank funds aren't readily available after a disaster occurs.
Critical customer list. Compile a list of customers for whom your company provides business-critical services. Designate someone in your company to notify these customers of your business status after a disaster has occurred and provide estimates of when your firm will become fully operational.
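If you'd rather keep the equipment list in a structured form than in a word-processing document, a sketch like the following shows one way to turn it into a replacement order after a disaster. The `Component` record and both function names are hypothetical; the fields mirror the ones suggested for the equipment list (model, manufacturer, description, serial number, and cost), plus the business location.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Component:
    location: str
    manufacturer: str
    model: str
    description: str
    serial_number: str
    cost: Decimal  # purchase or replacement cost, as recorded for insurance


def replacement_order(inventory: Iterable[Component],
                      damaged_serials: Set[str]) -> List[Component]:
    """Return the inventory entries whose serial numbers were reported damaged."""
    return [c for c in inventory if c.serial_number in damaged_serials]


def replacement_cost(order: Iterable[Component]) -> Decimal:
    """Total cost of the components that must be reordered."""
    return sum((c.cost for c in order), Decimal("0"))


# Hypothetical sample data for illustration only.
inventory = [
    Component("Main office", "ExampleCo", "SRV-100", "File server",
              "SN-0001", Decimal("4200.00")),
    Component("Main office", "ExampleCo", "SW-24", "24-port switch",
              "SN-0002", Decimal("650.00")),
    Component("Branch office", "ExampleCo", "FW-10", "Firewall",
              "SN-0003", Decimal("900.00")),
]

order = replacement_order(inventory, {"SN-0001", "SN-0002"})
print(replacement_cost(order))
```

Using `Decimal` rather than floating-point numbers keeps the totals exact, which matters if the same list feeds an insurance claim.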
3. Diagram Your Network
Use a software package such as Microsoft Office Visio 2003 to draw detailed diagrams for all networks in your organization, including both LANs and WANs.
LAN diagram. Construct a detailed diagram of the network layout for each business location, such as the sample diagram that Figure 1 shows. Make sure the diagram corresponds to the physical layout of the office (as opposed to a logical diagram of the network) to make it easier for someone who's unfamiliar with the office layout to find items. This diagram should show all network components and briefly describe each component and the OS version.
WAN diagram. If your company has a WAN, this diagram should include all WAN locations, similar to the WAN diagram in Figure 2. If you're on a VPN, the diagram should include the IP address, model, serial number, and firmware revision of the firewalls; WAN default gateway; VPN policies; and local IP subnet. Also document the firewall configuration(s) in both electronic and hard-copy form. If your company uses frame relay, be sure to document all the router configurations electronically and in hard copy. Make sure you've recorded all the Data Link Connection Identifier (DLCI) and circuit number information. On the diagram, also include information about the WAN carrier, including the carrier's name, technical support phone number, account number, and circuit ID.
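Because the WAN documentation has so many required details, it can help to check each site's record for gaps. The Python sketch below is one hedged way to do that; the field names, the `missing_fields` function, and the sample values (which use reserved documentation addresses) are all hypothetical and simply mirror the items listed above.

```python
from typing import Dict, List

# Items the WAN diagram should record for a VPN-connected site.
REQUIRED_FIELDS = (
    "firewall_ip",
    "firewall_model",
    "firewall_serial",
    "firewall_firmware",
    "wan_default_gateway",
    "vpn_policies",
    "local_subnet",
    "carrier_name",
    "carrier_support_phone",
    "carrier_account_number",
    "circuit_id",
)


def missing_fields(site: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in a site record."""
    return [name for name in REQUIRED_FIELDS if not site.get(name, "").strip()]


# Hypothetical, partially documented site.
branch_office = {
    "firewall_ip": "192.0.2.1",
    "firewall_model": "FW-10",
    "firewall_serial": "SN-0003",
    "firewall_firmware": "",
    "wan_default_gateway": "192.0.2.254",
    "local_subnet": "198.51.100.0/24",
    "carrier_name": "Example Carrier",
}

print(missing_fields(branch_office))
```

Running a check like this during each plan review is a cheap way to catch a firmware revision or circuit ID that was never written down.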
4. Go Wireless
If a disaster makes it impossible for your business to operate in its regular location, consider using wireless equipment to restore the network quickly. Be sure to purchase equipment that supports the Wi-Fi Protected Access 2 (WPA2) security standard, which can secure the network with a pre-shared key, because you probably won't have the infrastructure in place to perform full-blown Extensible Authentication Protocol-Transport Layer Security (EAP-TLS) authentication.
5. Assign a Disaster Recovery Administrator
I suggest you assign a primary and secondary disaster recovery administrator for each business location. Each disaster recovery administrator should have the other's contact information. Ideally the disaster recovery administrators should live close to the office location so they can easily reach the office in the event of a major disaster. The administrators are responsible for declaring the disaster, defining the disaster level, assessing and documenting damage, and coordinating the recovery effort. The administrators should have a good overall understanding of business operations, know how to prioritize office services, and be familiar with all operational aspects of the business location.
6. Assemble Teams
When a major disaster strikes, expect confusion, panic, miscommunication, disruption in services, and other uncontrollable forces that will counter your efforts to get your company up and running. You can minimize many of these challenges through sound disaster planning and communicating the plan to employees before disaster strikes. Verify that everyone involved with disaster recovery is aware of your business's disaster plan and knows their role in disaster recovery. The disaster recovery administrator should divide up business-recovery tasks that will need to be performed and assign employees to teams that will carry out those tasks. Here are some suggested teams; you should develop your own list of disaster recovery teams that cover areas of responsibility specific to your business.
Damage assessment/notification team: collects information about the initial status of the damaged area and communicates this information to appropriate members of staff and management. This team compiles information from all areas of the business, including accounting, business operations, IT, vendors, and customers. Following the assessment, the team oversees any salvage operations, such as salvaging equipment, office supplies, and backup tapes. Team members should be authorized to purchase replacements for equipment and supplies damaged during the disaster. This team will become the replacement team after the salvage operations are finished.
Office space/logistics team: assists the disaster recovery administrator in locating temporary office space in the event of a Level Four disaster. Team members are responsible for transporting co-workers and equipment to the temporary site and are authorized to contract with moving companies and laborers as necessary to relocate to the temporary site.
Employee team: oversees employee issues, such as staff scheduling, payroll functions, and staff relocation.
Technology team: orders replacement equipment and restores computer systems; reestablishes telephone service and Internet and VPN connections.
Public relations team: communicates with the public about estimated reopening time and rescheduling of private-party appointments.
Safety and security team: ensures the safety of all employees during the entire disaster recovery process. This team decides who will and won't have access to the affected location and is responsible for notifying employees of any safety hazards that exist in the building and ensuring that the site is secure to prevent looting.
Office supplies team: orders new furniture, office supplies, and forms that are necessary to resume normal business operations.
7. Create a Disaster Recovery Web Site
Consider developing a Web site where employees, vendors, and customers can obtain up-to-date information about the company after a disaster. This Web site should be a mirrored site that's cohosted at two geographically separate business locations. On the Web site, the disaster recovery team will post damage assessments for business locations, each location's operational status, and when and where employees should report for work. The Web site should also include an interface where the disaster recovery administrator can post timestamped messages about the recovery effort. You might choose to make some of this information publicly accessible, but a majority of these pages should require a logon and should be protected with a Secure Sockets Layer (SSL) certificate. This site should also contain the latest copy of the disaster recovery plan in PDF format.
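The timestamped message interface doesn't need to be elaborate. As a rough illustration only, the following Python sketch models the message log behind such a page; the `StatusBoard` class, its methods, and the sample messages are hypothetical, and a real site would add the logon and SSL protection described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class StatusBoard:
    """In-memory log of recovery messages, each stamped with a UTC time."""
    entries: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def post(self, author: str, text: str,
             when: Optional[datetime] = None) -> None:
        """Record a message; the timestamp defaults to the current UTC time."""
        if when is None:
            when = datetime.now(timezone.utc)
        if when.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        # Store everything in UTC so staff at different sites see one timeline.
        self.entries.append((when.astimezone(timezone.utc), author, text))

    def render(self) -> List[str]:
        """Return the messages as display lines, newest first."""
        newest_first = sorted(self.entries, key=lambda e: e[0], reverse=True)
        return [f"[{when:%Y-%m-%d %H:%M} UTC] {author}: {text}"
                for when, author, text in newest_first]


board = StatusBoard()
board.post("dr-admin", "Damage assessment under way.",
           datetime(2005, 3, 1, 9, 30, tzinfo=timezone.utc))
board.post("dr-admin", "Main office closed; report to the temporary site.",
           datetime(2005, 3, 1, 14, 0, tzinfo=timezone.utc))

for line in board.render():
    print(line)
```

Storing timestamps in UTC is a design choice for the sketch: when two business locations sit in different time zones, a single reference clock avoids confusion about which message is the most recent.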
8. Test Your Recovery Plan
Most IT pros face Level One and Two disasters regularly and can quickly respond to such events. Level Three and Four disasters include "acts of God" and other factors that are out of your control. To respond to these more serious disasters, your disaster recovery plan should carefully organize and assign whatever resources you do have control over in such situations. Once you've devised a disaster recovery plan, you should test it regularly and revise it as necessary. When you test the plan, create different scenarios that simulate Level One through Level Four disasters. You might find it helpful to discuss your plan with other IT pros to learn what worked and didn't work in their disaster recovery plans. For a real-world example of an IT pro who put her company's disaster recovery plan in action during Hurricane Charley last year, see "Riding Out the Storm," March 2005, InstantDoc ID 45263.
9. Develop a Hacking Recovery Plan
Hack attacks fall within the scope of a disaster recovery plan. For a discussion about some special considerations for addressing hack attacks in your disaster recovery plan, see the Web-exclusive sidebar "Planning for a Hack Attack," http://www.windowsitpro.com, InstantDoc ID 47392.
10. Make the DRP a Living Document
Review the disaster recovery plan at least once a year. If your company or network changes frequently, you should probably review the plan semiannually or even quarterly. Remember, an out-of-date disaster recovery plan is almost as useless as no disaster recovery plan at all.
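A review schedule is easy to let slip, so it can be worth automating the reminder. The sketch below is a minimal Python example of computing the next review date for the annual, semiannual, and quarterly cadences mentioned above; the function names and the cadence labels are hypothetical.

```python
import calendar
from datetime import date

# Review cadences from the text, expressed in months.
REVIEW_INTERVAL_MONTHS = {"annual": 12, "semiannual": 6, "quarterly": 3}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review(last_review: date, cadence: str) -> date:
    """Date by which the plan should next be reviewed."""
    return add_months(last_review, REVIEW_INTERVAL_MONTHS[cadence])


def is_overdue(last_review: date, cadence: str, today: date) -> bool:
    """True if the next review date has already passed."""
    return today > next_review(last_review, cadence)


print(next_review(date(2005, 8, 31), "semiannual"))
print(is_overdue(date(2005, 1, 15), "quarterly", today=date(2005, 6, 1)))
```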
Project Snapshot: How to
PROBLEM: You need a plan for responding to major and minor disasters to let your company restore IT and business operations as quickly as possible.
WHAT YOU NEED: Disaster recovery planning team; data- and system-backup procedures; detailed documentation of business and IT information (e.g., procedures, equipment, networks, customer and vendor contacts)
DIFFICULTY: 3 out of 5